Using dbt with GlareDB

GlareDB can serve as a destination warehouse for dbt, letting you build transformation pipelines over data from multiple sources without additional ETL infrastructure. That saves the time and cost of configuring data extraction with tools like Fivetran or Stitch. And because GlareDB speaks the Postgres protocol, you can use the existing dbt Postgres adapter without anything new. This guide walks you through setting up and using dbt with GlareDB.

Prerequisites

  • dbt Core with the Postgres adapter installed (pip install dbt-postgres)
  • A GlareDB Cloud account (sign up at console.glaredb.com)
  • Basic familiarity with dbt concepts

Setup

1. Initialize a dbt Project

Create a new dbt project:

dbt init my_glaredb_project

2. Configure dbt Credentials

Add your GlareDB credentials to your profiles.yml file. You can find your connection details in GlareDB Cloud by clicking the “Connect” button and selecting the Postgres tab.

dbt_glaredb_quickstart:
  outputs:
    dev:
      dbname: org_name/deployment_name
      host: proxy.glaredb.com
      pass: your_password
      port: 6543
      schema: public
      threads: 1
      type: postgres
      user: your_username
  target: dev

Note: When using your GlareDB Cloud connection string (which looks like glaredb://user_name:password@org_name.proxy.glaredb.com:6543/deployment_name), specify your dbname as org_name/deployment_name and the host as proxy.glaredb.com.

3. Test Your Connection

Verify your connection:

dbt debug

Creating dbt Models

Basic Model Example

First, define your sources in models/properties.yml:

version: 2

sources:
  - name: glaredb
    schema: public
    database: default
    tables:
      - name: sales_data

Then, create a staging model in models/staging_sales.sql:

SELECT
    borough,
    neighborhood,
    building_class_category,
    residential_units,
    commercial_units,
    total_units,
    land_square_feet,
    gross_square_feet,
    year_built,
    sale_price,
    sale_date,
    latitude,
    longitude,
    bin,
    bbl
FROM {{ source('glaredb', 'sales_data') }}
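
With the source and staging model defined, you can build the model with standard dbt commands (--select limits the run to a single model; plain dbt run builds everything):

dbt run --select staging_sales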

Working with Multiple Data Sources

GlareDB allows you to join data from different sources, such as external databases and remote files, directly in your dbt models. Here’s an example that joins the staging model with a lookup table in an external Postgres database using GlareDB’s read_postgres function:

SELECT
    lookup.borough_name,
    sales.neighborhood,
    sales.building_class_category,
    sales.residential_units,
    sales.commercial_units,
    sales.total_units,
    sales.land_square_feet,
    sales.gross_square_feet,
    sales.year_built,
    sales.sale_price,
    sales.sale_date,
    sales.latitude,
    sales.longitude,
    sales.bin,
    sales.bbl
FROM {{ ref('staging_sales') }} AS sales
JOIN read_postgres(
    'postgres://demo:demo@pg.demo.glaredb.com/postgres',
    'public',
    'borough_lookup'
) AS lookup
ON sales.borough = lookup.borough_id

If you’ll be querying the same external Postgres database often, you may want to register it as an external data source in GlareDB so you don’t have to pass credentials on every call. You can set this up with:

CREATE EXTERNAL DATABASE IF NOT EXISTS my_postgres
FROM postgres
OPTIONS(
    connection_string = 'postgresql://user:pass@host:5432/dbname'
);

Once configured, you can incorporate the external database in your model:

SELECT
    lookup.borough_name,
    sales.neighborhood,
    sales.building_class_category,
    sales.residential_units,
    sales.commercial_units,
    sales.total_units,
    sales.land_square_feet,
    sales.gross_square_feet,
    sales.year_built,
    sales.sale_price,
    sales.sale_date,
    sales.latitude,
    sales.longitude,
    sales.bin,
    sales.bbl
FROM {{ ref('staging_sales') }} sales
JOIN my_postgres.public.borough_lookup lookup
ON sales.borough = lookup.borough_id

Advanced Features

Using Pre-hooks to Configure Data Sources

You can use dbt pre-hooks to make sure that any external databases are set up before running your models:
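
For example, here is a minimal sketch that registers the external Postgres database from the previous section in a pre-hook before the model builds; the connection details and the model body below are placeholders:

{{ config(
    materialized='view',
    pre_hook=[
        "CREATE EXTERNAL DATABASE IF NOT EXISTS my_postgres
         FROM postgres
         OPTIONS(
             connection_string = 'postgresql://user:pass@host:5432/dbname'
         );"
    ]
) }}

SELECT
    lookup.borough_name,
    sales.sale_price,
    sales.sale_date
FROM {{ ref('staging_sales') }} sales
JOIN my_postgres.public.borough_lookup lookup
ON sales.borough = lookup.borough_id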


File Upload Integration

GlareDB Cloud supports file uploads that can be referenced in your dbt models:

SELECT * FROM cloud_upload('myfile.xlsx')
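
For example, you might wrap an uploaded file in a staging model so that downstream models can reference it with ref(); the file and model names here are placeholders for whatever you uploaded in GlareDB Cloud:

-- models/staging_uploaded_file.sql
{{ config(materialized='table') }}

SELECT * FROM cloud_upload('myfile.xlsx')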

COPY TO Functionality

You can use GlareDB’s COPY TO feature in post-hooks to export transformed data to destinations outside your GlareDB warehouse:

{{ config(
    materialized='view',
    post_hook=[
        "COPY public.my_model_name TO 's3://bucket/my_model_name.csv' FORMAT csv
         CREDENTIALS my_aws_creds ( region '<aws_region>' );"
    ]
) }}

Best Practices

  1. Use explicit schemas in your model references to avoid ambiguity when working with multiple data sources
  2. Leverage dbt’s documentation features to track data lineage across different sources (see the sketch after this list)
  3. Consider using GlareDB’s external database feature for frequently accessed data sources
  4. Use pre-hooks to ensure data sources are properly configured before model execution
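
As an illustration of point 2, you could extend models/properties.yml with descriptions for the source and the staging model; dbt picks these up when generating documentation and lineage (the description text is illustrative):

version: 2

sources:
  - name: glaredb
    schema: public
    database: default
    tables:
      - name: sales_data
        description: "Raw NYC property sales loaded into GlareDB"

models:
  - name: staging_sales
    description: "Staged sales records used by downstream models"

You can then run dbt docs generate and dbt docs serve to browse the generated documentation and lineage graph.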

Troubleshooting

Common issues and their solutions:

  • Connection Issues: Ensure your connection string is properly formatted and includes the correct organization/deployment names
  • Performance: Consider materializing intermediate models as tables for frequently accessed data (see the sketch below)
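
For example, a sketch of changing the default materialization for the whole project in dbt_project.yml (assuming the project name from step 1); you can also set materialized='table' in an individual model’s config block:

# dbt_project.yml
models:
  my_glaredb_project:
    +materialized: table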

Additional Resources