read_csv
Read csv files from a local filesystem or supported object stores.
The files may be compressed, for example: .csv.gz
.
URLs provided to read_csv
may also include glob characters to allow scanning multiple files when reading from S3, GCS, or a local filesystem. Globbed paths are not allowed when querying csv files over HTTP/HTTPS.
Note: read_csv
was formerly known as csv_scan
Syntax
-- Single url or path.
read_csv(<url>)
-- Multiple urls or paths.
read_csv([<url>])
-- Using a cloud credentials object.
read_csv(<url>, <credential_object>)
-- Required named argument for S3 buckets.
read_csv(<url>, <credentials_object>, region => '<aws_region>')
-- Pass S3 credentials using named arguments.
read_csv(<url>, access_key_id => '<aws_access_key_id>', secret_access_key => '<aws_secret_access_key>', region => '<aws_region>')
-- Pass GCS credentials using named arguments.
read_csv(<url>, service_account_key => '<gcp_service_account_key>')
Parameter descriptions and which object store it’s relevant to.
Parameter | Object store | Description |
---|---|---|
url | All | The URL or path to a csv file to scan. |
credential_object | S3 and GCS | A database object storing credentials for accessing the object or objects. |
aws_region | S3 | If scanning an object in S3, the region of the bucket. |
aws_access_key_id | S3 | ID of AWS access key with permissions to read from the bucket. |
aws_secret_access_key | S3 | Secret associated with the AWS access key. |
gcp_service_account_key | GCS | A JSON-encoded GCP service account key with access to the bucket. |
Usage
Local files
Local files can be read by passing the path or a globbed path to read_csv
. Paths may be absolute or relative.
-- Read a relative path.
SELECT * FROM read_csv('./my_data.csv');
-- Read all csv files in a directory.
SELECT * FROM read_csv('./directory_of_data/*.csv');
-- Read csv files from multiple directories.
SELECT * FROM read_csv('./**/*.csv');
-- Read multiple explicitly provided files.
SELECT * FROM read_csv(['./directory_of_data/1.csv', './directory_of_data/2.csv']);
Objects in GCS
Objects in GCS can be read by providing the path to the object and credentials for the object to read_csv
. The service account key provided should be JSON encoded.
-- Single object.
SELECT * FROM read_csv('gs://my-bucket/path/object.csv',
service_account_key => '<service_account_key>');
-- Scan objects matching a glob.
SELECT * FROM read_csv('gs://my-bucket/path/prefix_*.csv',
service_account_key => '<service_account_key>');
-- Read all objects in a directory.
SELECT * FROM read_csv('gs://my-bucket/path/*',
service_account_key => '<service_account_key>');
Optionally, a credentials object can be created and used in place of providing the service account key directly.
-- Create the credentials.
CREATE CREDENTIALS my_gcp_creds PROVIDER gcp
( service_account_key '<service_account_key>' );
-- And use them in the scan.
SELECT * FROM read_csv('gs://my-bucket/path/object.csv', my_gcp_creds);
Objects in S3
Objects in GCS can be read by providing the path to the object and credentials for the object, and the bucket region to read_csv
.
SELECT * FROM read_csv('gs://my-bucket/path/object.csv',
access_key_id = '<access_key_id>',
secret_access_key = '<secret_access_key>',
region = 'us-east-1');
-- Scan objects matching a glob.
SELECT * FROM read_csv('gs://my-bucket/path/prefix_*.csv',
access_key_id = '<access_key_id>',
secret_access_key = '<secret_access_key>',
region = 'us-east-1');
-- Read all objects in a directory.
SELECT * FROM read_csv('gs://my-bucket/path/prefix_*.csv',
access_key_id = '<access_key_id>',
secret_access_key = '<secret_access_key>',
region = 'us-east-1');
Optionally, a credentials object can be created and used in place of providing the service account key directly.
-- Create the credentials.
CREATE CREDENTIALS my_aws_creds PROVIDER aws
OPTIONS (
access_key_id = '<access_key_id>',
secret_access_key = '<secret_access_key>',
);
-- And use them in the scan. Note that a region still needs to be provided.
SELECT * FROM read_csv('gs://my-bucket/path/object.csv', my_aws_creds, region => 'us-east-1');
HTTP files
CSV files can also be scanned over HTTP/HTTPS.
-- Read a single file.
SELECT * FROM read_csv('https://my_domain.com/file.csv');
-- Read multiple files
SELECT * FROM read_csv(['https://my_domain.com/1.csv', 'https://my_domain.com/2.csv']);
Globbed paths are not supported when reading csv files over HTTP/HTTPS
Examples
In the following example we use COPY TO to create a csv file and finally read it with read_csv
.
-- Output the file to 'example.csv'
COPY (SELECT * FROM generate_series(1, 10)) TO 'example.csv';
-- And scan it back in.
SELECT * FROM read_csv('./example.csv');
-- Output:
-- ┌─────────────────┐
-- │ generate_series │
-- │ ── │
-- │ Int64 │
-- ╞═════════════════╡
-- │ 1 │
-- │ 2 │
-- │ 3 │
-- │ 4 │
-- │ 5 │
-- │ 6 │
-- │ 7 │
-- │ 8 │
-- │ 9 │
-- │ 10 │
-- └─────────────────┘