Table of contents
Quickstart with SQL using the GlareDB CLI
-
Install and start the GlareDB binary by running the following in a shell. When starting GlareDB, use
-f
and specify a path so that GlareDB will create a directory to persist your tables after you close the session. You can also leave this blank for an in-memory-only session.> curl https://glaredb.com/install.sh | sh > ./glaredb -f ./my_db_path
-
Start writing SQL! You can try out the below to query a Parquet file hosted on GitHub over HTTP. This file contains NYC real estate transactions from January 2022.
SELECT * FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet');
-
Download a file and query it locally. Here is a file with the next month of transaction data. After downloading it, query it using its relative or absolute path.
-- Relative path to the same directory from which you started the GlareDB CLI SELECT * FROM read_parquet('./nyc_sales-2022_02.parquet'); -- Absolute path to the file SELECT * FROM read_parquet('/Users/username/Downloads/nyc_sales-2022_02.parquet');
-
Query a remote Postgres database using the
read_postgres()
function:SELECT * FROM read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', --connection 'public', --schema name 'customer' --table name );
-
Join between the local Parquet file and the remote Postgres database to get a count of number of sales per NYC borough, and include the names of the NYC borough where the sales took place:
SELECT COUNT(sales.sale_date), lookup.borough_name FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales JOIN read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', 'public', 'borough_lookup' ) lookup ON sales.borough = lookup.borough_id GROUP BY lookup.borough_name;
-
Save your data for easy access later by creating a table with the previous query:
CREATE TABLE sales_aggregate_by_borough AS SELECT COUNT(sales.sale_date), lookup.borough_name FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales JOIN read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', 'public', 'borough_lookup' ) lookup ON sales.borough = lookup.borough_id GROUP BY lookup.borough_name; SELECT * FROM sales_aggregate_by_borough;
-
Or export your data to a Parquet (or other) file for easy access later by wrapping your query in a
COPY TO
statement:COPY ( SELECT COUNT(sales.sale_date), lookup.borough_name FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales JOIN read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', 'public', 'borough_lookup' ) lookup ON sales.borough = lookup.borough_id GROUP BY lookup.borough_name ) TO './sales_aggregate_by_borough.parquet'; SELECT * FROM './sales_aggregate_by_borough.parquet';
Quickstart with SQL using Python in a Jupyter Notebook
-
Create and activate a new Python environment with:
> python -m venv .venv > source .venv/bin/activate
-
Install GlareDB, PyArrow, and Jupyter Notebooks with:
> pip install 'glaredb[pandas]' pyarrow jupyter
-
Open your Jupyter notebook
> jupyter notebook
-
In your notebook, import GlareDB and set up your connection. When instantiating your connection specify a path so that GlareDB will create a directory to persist your tables after you close the session. (You can also leave this blank for an in-memory-only session.)
import glaredb con = glaredb.connect('./my_db_path')
-
Start writing SQL! You can try out the below to query a Parquet file hosted on GitHub over HTTP. This file contains NYC real estate transactions from January 2022.
df = con.sql( """ SELECT * FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet'); """ ).to_pandas() df
-
Download a file and query it locally. Here is a file with the next month of transaction data. After downloading it, query it using its relative or absolute path.
# Relative path to the same directory from which you started the GlareDB CLI df = con.sql( """ SELECT * FROM read_parquet('./nyc_sales-2022_02.parquet'); """ ).to_pandas() # Absolute path to the file df = con.sql( """ SELECT * FROM read_parquet('/Users/username/Downloads/nyc_sales-2022_02.parquet'); """ ).to_pandas() df
-
Query a remote Postgres database using the
read_postgres()
function:df = con.sql( """ SELECT * FROM read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', --connection 'public', --schema name 'customer' --table name ); """ ).to_pandas() df
-
Join between the local Parquet file and the remote Postgres database to get a count of number of sales per NYC borough, and include the names of the NYC borough where the sales took place:
df = con.sql( """ SELECT COUNT(sales.sale_date), lookup.borough_name FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales JOIN read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', 'public', 'borough_lookup' ) lookup ON sales.borough = lookup.borough_id GROUP BY lookup.borough_name; ); """ ).to_pandas() df
-
Save your data for easy access later by creating a table with the previous query:
con.sql( """ CREATE TABLE sales_aggregate_by_borough AS SELECT COUNT(sales.sale_date), lookup.borough_name FROM read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales JOIN read_postgres( 'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres', 'public', 'borough_lookup' ) lookup ON sales.borough = lookup.borough_id GROUP BY lookup.borough_name; """ ) df = con.sql( """ SELECT * FROM sales_aggregate_by_borough; """ ).to_pandas() df
-
Or export your data to a Parquet (or other) file for easy access later by wrapping your query in a
COPY TO
statement:
con.sql(
"""
COPY (
SELECT
COUNT(sales.sale_date),
lookup.borough_name
FROM
read_parquet('https://github.com/GlareDB/tutorial_data/raw/main/quickstart_data/nyc_sales-2022_01.parquet') sales
JOIN
read_postgres(
'postgresql://demo:demo@pg.demo.glaredb.com:5432/postgres',
'public',
'borough_lookup'
) lookup
ON sales.borough = lookup.borough_id
GROUP BY lookup.borough_name
) TO './sales_aggregate_by_borough.parquet';
"""
)
df = con.sql(
"""
SELECT *
FROM './sales_aggregate_by_borough.parquet';
"""
).to_pandas()
df
Next Steps
That’s it! Up next you can:
- Read more about GlareDB Cloud or sign up
- Check out the different data sources GlareDB supports
- Check out the GlareDB YouTube channel for more step-by-step examples
- Star us on GitHub
- Open a GitHub Issue for bug reports or feature requests
- Join our Discord if you have any questions or want to chat