How fast is DuckDB on these files?

This dashboard reads the raw Datablist CSVs directly — no database to stand up, no ETL. Dashdown's csv connector loads each file into an embedded DuckDB as a table, and every chart you see is a SQL query DuckDB runs over those tables.

Rows in DuckDB
Customers · 333 MB CSV
Organizations · 135 MB CSV
Products · 155 MB CSV

Measured on the 2 M-row customer file #

Running a few representative aggregations over the 333 MB / 2,000,000-row customers.csv with DuckDB (Apple Silicon, multi-threaded):

Query Rows scanned Time
count(*) 2,000,000 ~0.14 s
Top countries (GROUP BY country) 2,000,000 ~0.12 s
Sign-ups per month (date_trunc + GROUP BY) 2,000,000 ~0.13 s
Top companies (GROUP BY company, 1.16 M distinct) 2,000,000 ~0.14 s

The one-time cost of loading all three CSVs (≈ 620 MB) into DuckDB tables is ~2 s; after that, aggregations are sub-second.

Try it live

Run dashdown serve . and change a filter on any dataset page — DuckDB re-runs the SQL on the full dataset between keystrokes. On this statically deployed copy, every query was pre-computed once at build time and the result snapshots ship as JSON, so the page loads instantly with no server in the loop.

Why it's quick #

  • Columnar + vectorized. DuckDB reads only the columns a query touches and processes them in batches, so a GROUP BY over one column never pays for the other eleven.
  • Multi-threaded scans. A single aggregation fans out across cores — the 333 MB scan is shared, not serial.
  • No copy, no server. The CSV is the database. There's no load step into Postgres, no network round-trip — DuckDB runs in-process next to the dashboard.
Generated