Working with your data
After setting up some data sources, it’s time to begin querying your data. GlareDB is designed to run analytical queries that access data from multiple sources, and writing SQL queries that reference multiple sources is easy.
- Referencing data sources in queries
- Querying multiple data sources
- Creating views referencing data sources
Referencing data sources in queries
Data sources are either external tables (see CREATE EXTERNAL TABLE) or external database (see CREATE EXTERNAL DATABASE). Getting a list of data sources connected to your deployment can be done with the following SQL query.
select datasource, name, object_type
from glare_catalog.external_datasources;
datasource | name | object_type
------------+-------+-------------
postgres | users | table
bigquery | bq | database
mongo | atlas | database
postgres | qa | database
(4 rows)
The object_type
column indicates whether the data source is an external table or an external database. In this example, there’s three external databases and one external table.
When a data source is an external table, it may be queried with just its name. For example, querying the data source named users
:
select * from users;
When a data source is an external database, a fully qualified name must be used to reference the table of interest. For example, querying an account_signup
table inside a events
schema in the bq
data source:
select * from bq.events.account_signup;
Querying multiple data sources
The real power of GlareDB is being able to work with multiple data sources in a single query. Let’s work through an example using a BigQuery and Postgres data source.
select datasource, name, object_type
from glare_catalog.external_datasources;
datasource | name | object_type
------------+-------+-------------
bigquery | bq | database
postgres | prod | database
(2 rows)
Internally at GlareDB, we want to be able be able to answer the question “How many GlareDB deployments were created per organization after the organization has upgraded their billing plan?”
Our production Postgres holds information about deployments and organizations, and our BigQuery instance stores all events about when things happened. So to answer this question, we can join a table from our BigQuery data source onto tables inside our Postgres data source:
select count(*) as num_deployments, o.id
from bq.events.billing_plan_upgraded as b
inner join prod.public.organizations as o on b.org_id = o.id
inner join prod.public.deployments as d on o.id = d.org_id
where d.created_at > b.timestamp
group by o.id
Notice we’re able reference data sources everywhere you would normally reference tables in a typically SQL database.
Creating views referencing data sources
Any valid SELECT query can be used when defining a view. This enables us to create a view that can reference any number of data sources.
The query defined in the previous section can be turned into a view with no changes:
create view num_deployments_created_after_upgrade(num_deployments, org_id) as (
select count(*), o.id
from bq.events.billing_plan_upgraded as b
inner join prod.public.organizations as o on b.org_id = o.id
inner join prod.public.deployments as d on o.id = d.org_id
where d.created_at > b.timestamp
group by o.id)
And now we’re able to query the view without having to specify individual data sources to query.
select * from num_deployments_created_after_upgrade;
Using views to define queries that reference multiple data sources is a powerful way of modeling data for use in upstream clients and applications. A single view could be used for generating a report in Tableau and also be accessible in psql
or other tools.
See CREATE VIEW for the reference doc on views.