DuckDB connector

Query a DuckDB database file (or an in-memory DB) directly. CSV is a thin subclass of this connector, so you get the same engine — plus DuckDB's ability to read Parquet, JSON, and remote files via its extensions.

# sources.yaml
main:
  type: duckdb
  path: data/warehouse.duckdb    # omit for an in-memory database
Key Purpose
path DuckDB file (omit for in-memory).
csv_views Optional {view: csv_path} map to attach.

Resilience: if a query invalidates the connection (a fatal DuckDB error), query() rebuilds the connection and retries once — so one bad query can't break every later query on the long-lived connection.

Querying JSON and nested data #

DuckDB reads JSON (and Parquet) straight from a path or URL — no load step. Use an in-memory connector (omit path) and do the reading inside the SQL:

# sources.yaml
main:
  type: duckdb          # no `path:` → in-memory; files are read in the query
SELECT unnest(matches) AS m
FROM read_json_auto('data/x.json', maximum_object_size=20000000)

Remote files work too — DuckDB autoloads the httpfs extension on first use:

SELECT unnest(matches) AS m
FROM read_json_auto('https://example.com/x.json', maximum_object_size=20000000)

Working with the parsed structure:

  • Flatten arrays with unnest() — one row per element.
  • Access struct fields with a dot: m.score.ft.
  • Lists are 1-indexed: m.score.ft[1] is the first element (not [0]).
  • Quote reserved words used as identifiers or aliases with double quotes: m."group", … AS "minute", … AS "type", … AS "time".

Note

A ${param} always substitutes a quoted string literal (injection-safe — see Queries). To match it against a numeric column or key, compare as text: WHERE CAST(id AS VARCHAR) = '${id}'.

End to end — flatten a nested array, then filter one record by a route ${id}:

WITH games AS (
  SELECT unnest(games) AS g
  FROM read_json_auto('data/games.json', maximum_object_size=20000000)
)
SELECT
  g."group"      AS "group",      -- reserved word → quoted identifier/alias
  g.minute       AS "minute",
  g.score.ft[1]  AS home_goals    -- struct field + 1-indexed list element
FROM games
WHERE CAST(g.id AS VARCHAR) = '${id}'
ORDER BY g.minute

Extra: none — in the core install.

Generated