Python Bindings

GlareDB can be used in Python environments along with Pandas and Polars to work with distributed data.

For more information, refer to our in-depth blog post Working with Python.

Install

The glaredb module is available on PyPI https://pypi.org/project/glaredb/. To install the module:

pip install glaredb

Usage

Embedded Connection

Once installed, GlareDB can be used inside your python scripts by importing the glaredb library, and creating a connection using connect.

import glaredb

con = glaredb.connect()
con.sql("select 1").show()

By default, connect will use an in-memory database. To persist data locally, a path may be provided to connect.

import glaredb

con = glaredb.connect("./my_db_path")
con.sql("select * from persisted_table").show()

Cloud Connection

Provide a connection string to connect to GlareDB Cloud.

import glaredb

con = glaredb.connect("glaredb://<user>:<password>@<org>.remote.glaredb.com:6443/<deployment-name>")
con.sql("select * from cloud_table").show()

Connecting to GlareDB Cloud with Python enables Hybrid execution. Queries are optimized to use both cloud and local compute resources.

Querying data

GlareDB is able to query a variety of file types as tables directly.

import glaredb
con = glaredb.connect()

# Query CSV files.
con.sql("select * from './data.csv'").show()

# Query JSON files.
con.sql("select * from './data.json'").show()

# Query Parquet files.
con.sql("select * from './data.parquet'").show()

Polars and Pandas dataframes can be queried as well.

import glaredb
con = glaredb.connect()

# Query a Pandas dataframe
import pandas as pd
pandasdf = pd.DataFrame({"fruits": ["banana", "banana", "apple", "apple", "banana"]})
con.sql("select * from pandasdf")

# Query a Polars dataframe
import polars as pl
polarsdf = pl.DataFrame({"fruits": ["banana", "banana", "apple", "apple", "banana"]})
con.sql("select * from polarsdf")

Note that as with the files, GlareDB can access these dataframes directly once they are variables in the scope of your Python environment.

Converting to a dataframe

In addition to selecting from Polars and Pandas dataframes, your GlareDB connection enables you to quickly convert results to Polars or Pandas dataframes as well.

import glaredb
con = glaredb.connect()

# Convert to a Pandas dataframe
pandas_out = con.sql("select 1").to_pandas()

# Convert to a Polars dataframe
polars_out = con.sql("select 1").to_polars()

Lazy evaluation

The sql method returns the logical plan of the query passed in as an argument. This means that the query is not executed until show() or one of the dataframe conversion methods (e.g. to_pandas()) are called.

This can be used to incrementally build up sql queries by referencing previously defined logical plans created with sql.

import glaredb
import pandas as pd

con = glaredb.connect()

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "b": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)


intermediate = con.sql("select * from df where a > 2;")

# Note that we reference the variable 'intermediate' here.
con.sql("select * from intermediate where fruits = 'apple';").show()

# ┌───────┬────────┬───────┬────────┐
# │ a     │ fruits │ b     │ cars   │
# │ ──    │ ──     │ ──    │ ──     │
# │ Int64 │ Utf8   │ Int64 │ Utf8   │
# ╞═══════╪════════╪═══════╪════════╡
# │ 3     │ apple  │ 3     │ beetle │
# │ 4     │ apple  │ 2     │ beetle │
# └───────┴────────┴───────┴────────┘

Eager evalutation

Eagerly evaulating a query can be done via the execute method. This will execute the provided query to completion. This is useful when the result of a query doesn’t matter (e.g. when creating a table or inserting data into a table).

import glaredb
con = glaredb.connect()

# Create a table.
con.execute("create table my_table (a int)")
# Insert some data.
con.execute("insert into my_table values (1), (2)")

# Query the table we just created. Note that we're using `sql` here because we
# want to show the results.
con.sql("select * from my_table").show()

# ┌───────┐
# │ a     │
# │ ──    │
# │ Int32 │
# ╞═══════╡
# │ 1     │
# │ 2     │
# └───────┘

Example

For more examples, see https://github.com/GlareDB/glaredb/tree/main/py-glaredb/examples

The following is an example that joins a polars data frame with data from a hosted demo Postgres instance.

import glaredb
import polars as pl

# Polars DataFrame created in the script
df = pl.DataFrame(
    {
        "region": [0, 1, 2, 3, 4],
        "population": [10, 20, 30, 40, 50]
    }
)

con = glaredb.connect()

# Join the above Polars DataFrame on data from our demo Postgres instance
result = con.sql(
        """select
            t1.r_regionkey,
            t1.r_name,
            t2.Population
        from
            read_postgres('postgres://demo:demo@pg.demo.glaredb.com/postgres', 'public', 'region') as t1
        join
            df as t2 on t1.r_regionkey = t2.region;"""
).to_polars();

print(result)

The above results in:

shape: (5, 3)
┌─────────────┬───────────────────────────┬────────────┐
│ r_regionkey ┆ r_name                    ┆ population │
│ ---         ┆ ---                       ┆ ---        │
│ i32         ┆ str                       ┆ i64        │
╞═════════════╪═══════════════════════════╪════════════╡
│ 3           ┆ EUROPE                    ┆ 40         │
│ 0           ┆ AFRICA                    ┆ 10         │
│ 2           ┆ ASIA                      ┆ 30         │
│ 4           ┆ MIDDLE EAST               ┆ 50         │
│ 1           ┆ AMERICA                   ┆ 20         │
└─────────────┴───────────────────────────┴────────────┘