DuckDB is an in-process SQL OLAP database management system that can query data in CSV, JSON, and Parquet formats from AWS S3-compatible blob storage. This means you can query data stored in AWS S3, Google Cloud Storage, or Cloudflare R2. You can also use the
`CUBEJS_DB_DUCKDB_DATABASE_PATH` environment variable to
connect to a local DuckDB database.
Prerequisites
- A set of IAM credentials which allow access to the S3-compatible data source. Credentials are only required for private S3 buckets.
- The region of the bucket
- The name of a bucket to query data from
Setup
Manual
Add the following to a `.env` file in your Cube project:
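A minimal sketch of such a file, assuming a private S3 bucket. `CUBEJS_DB_TYPE=duckdb` selects the DuckDB driver; every other value is a placeholder to replace with your own, and the three S3 variables are the ones described under Environment Variables.

```dotenv
# Select the DuckDB driver
CUBEJS_DB_TYPE=duckdb

# Placeholder credentials; only needed for private buckets
CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=your-access-key-id
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=your-secret-access-key

# Placeholder region; use the region your bucket lives in
CUBEJS_DB_DUCKDB_S3_REGION=us-east-1
```

For a public bucket, the two credential lines can be dropped.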
Cube Cloud
In Cube Cloud, select DuckDB when creating a new deployment and fill in the required fields. If you are not using MotherDuck, leave the MotherDuck Token
field blank.
You can also explore how DuckDB works with Cube if you create a demo
deployment in Cube Cloud.
Environment Variables
| Environment Variable | Description | Possible Values | Required |
|---|---|---|---|
| `CUBEJS_DB_DUCKDB_MEMORY_LIMIT` | The maximum memory limit for DuckDB. Equivalent to `SET memory_limit=<MEMORY_LIMIT>`. Default is 75% of available RAM | A valid memory limit | ❌ |
| `CUBEJS_DB_DUCKDB_SCHEMA` | The default search schema | A valid schema name | ❌ |
| `CUBEJS_DB_DUCKDB_MOTHERDUCK_TOKEN` | The service token to use for connections to MotherDuck | A valid MotherDuck service token | ❌ |
| `CUBEJS_DB_DUCKDB_DATABASE_PATH` | The database file path to use for connecting to a local database | A valid DuckDB database file path | ❌ |
| `CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID` | The Access Key ID to use for database connections | A valid Access Key ID | ❌ |
| `CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY` | The Secret Access Key to use for database connections | A valid Secret Access Key | ❌ |
| `CUBEJS_DB_DUCKDB_S3_ENDPOINT` | The S3 endpoint | A valid S3 endpoint | ❌ |
| `CUBEJS_DB_DUCKDB_S3_REGION` | The region of the bucket | A valid AWS region | ❌ |
| `CUBEJS_DB_DUCKDB_S3_USE_SSL` | Whether to use SSL for the connection | A boolean | ❌ |
| `CUBEJS_DB_DUCKDB_S3_URL_STYLE` | The S3 URL style to use | `vhost` or `path` | ❌ |
| `CUBEJS_DB_DUCKDB_S3_SESSION_TOKEN` | The token for the S3 session | A valid Session Token | ❌ |
| `CUBEJS_DB_DUCKDB_EXTENSIONS` | A comma-separated list of DuckDB extensions to install and load | A comma-separated list of DuckDB extensions | ❌ |
| `CUBEJS_DB_DUCKDB_COMMUNITY_EXTENSIONS` | A comma-separated list of DuckDB community extensions to install and load | A comma-separated list of DuckDB community extensions | ❌ |
| `CUBEJS_DB_DUCKDB_S3_USE_CREDENTIAL_CHAIN` | Whether to use the credential chain to resolve secrets for S3 connections | `true`, `false`. Defaults to `false` | ❌ |
| `CUBEJS_CONCURRENCY` | The number of concurrent queries to the data source | A valid number | ❌ |
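As an illustration of how several of these variables combine, here is a sketch of a `.env` file for a local database file with a few tuning options. The file path, memory limit, extension name, and concurrency value are all hypothetical choices, not recommendations.

```dotenv
# Select the DuckDB driver
CUBEJS_DB_TYPE=duckdb

# Hypothetical path to a local DuckDB database file
CUBEJS_DB_DUCKDB_DATABASE_PATH=/data/analytics.duckdb

# Cap DuckDB memory instead of relying on the default (75% of available RAM)
CUBEJS_DB_DUCKDB_MEMORY_LIMIT=4GB

# Comma-separated extensions to install and load (example value)
CUBEJS_DB_DUCKDB_EXTENSIONS=spatial

# Number of concurrent queries to the data source (example value)
CUBEJS_CONCURRENCY=4
```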
Pre-Aggregation Feature Support
count_distinct_approx
Measures of type `count_distinct_approx` can
be used in pre-aggregations when using DuckDB as a source database. To learn
more about DuckDB's support for approximate aggregate functions, see the
DuckDB documentation.
Pre-Aggregation Build Strategies
To learn more about pre-aggregation build strategies, head
here.
| Feature | Works with read-only mode? | Is default? |
|---|---|---|
| Batching | β | β |
| Export Bucket | - | - |