So you self-hosted DuckDB.
Now what?

Mehdi Ouazza · MotherDuck · mehdio.com

DuckCon 7 · Amsterdam · 24 June 2026

QR code · Grab the slides and the ducks 👆

Live demo

one prompt
# paste to Claude (MotherDuck MCP)
Using the MotherDuck MCP, load and inspect this public dataset:
s3://noaa-ghcn-pds/parquet/by_year/
Figure out what's in it, then help me understand how temperature has evolved over the years in the EU.
Create a Flight to ingest/update the data, then a Dive to visualize the results.
Claude · MotherDuck

Who am I?

Mehdi Ouazza
  • Husband and father of two.
  • 12y+ in data as a data / platform engineer.
  • From on-prem Spark clusters at AXA to cloud data platforms at Klarna, Back Market, and Trade Republic.
  • Joined MotherDuck in 2023 as its first DevRel. If you type duckdb on YouTube, you'll see my face. Sorry.
  • All my content → mehdio.com

Everybody is at a different stage
of their DuckDB journey.

Buff dog vs sad dog meme
"I ran Quack and DuckLake in prod before 1.0."
"It said the database is locked… does someone have the password?"

Where are you on the road?

More concurrent writers push you right; more users push you up. These two axes decide your architecture.

more users / reads ↑
vanilla wall
single-node .duckdb
partition per slice
Quack server
DuckLake + Postgres
a database company
more concurrent writers →
1 writer · many writers
single node handles it · crosses the wall: protocol or catalog · you run the whole database

Where the formula tips

Self-host cost = your rate × time building and maintaining.
Buy cost = managed price + value lost to its limits.

Chart · y-axis: total cost (your time + money) ↑ · x-axis: how long & how big you operate it → (day 1 · small → years in · at scale) · curves: self-host, managed + limits · regions: BUILD, BUY · marker: break-even
self-host: your time, forever · managed: a price, plus its limits
Watch for this icon: a box you build and/or maintain.
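The two cost formulas on this slide can be sketched as a small comparison. This is a minimal sketch, not the talk's model: the function names and every number in the example (hourly rate, hours, managed price, value lost to limits) are hypothetical placeholders.

```python
def self_host_cost(hourly_rate: float, hours_building_and_maintaining: float) -> float:
    """Self-host cost = your rate x time building and maintaining."""
    return hourly_rate * hours_building_and_maintaining


def buy_cost(managed_price: float, value_lost_to_limits: float) -> float:
    """Buy cost = managed price + value lost to its limits."""
    return managed_price + value_lost_to_limits


def cheaper_option(build: float, buy: float) -> str:
    """Say which side of the break-even point the two totals put you on."""
    if build == buy:
        return "break-even"
    return "build" if build < buy else "buy"


# Hypothetical monthly figures, for illustration only.
build = self_host_cost(hourly_rate=100.0, hours_building_and_maintaining=20.0)
buy = buy_cost(managed_price=500.0, value_lost_to_limits=300.0)
decision = cheaper_option(build, buy)
```

With these made-up inputs the self-host side costs 2000.0 and the buy side 800.0, so `decision` is "buy". Shrink the maintenance hours or raise the value lost to limits and the answer flips.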

00

The honeymoon

Laptop · the honeymoon · full 810M-row scan · lower is better

laptop · DuckDB
M5 Max · 64 GB
Q1 · full scan · SELECT … FROM lineitem
local SSD
.duckdb file · native format
4.8 s
parquet file · raw, same SSD
8.5 s

10 ms point reads · full 810M-row scan in ~4.8 s — native .duckdb is ~1.8× faster than parquet.

-- TPC-H Q1 · full 810M-row lineitem scan + grouped aggregation
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)      AS qty,
       SUM(l_extendedprice) AS revenue,
       AVG(l_discount)      AS avg_disc,
       COUNT(*)             AS cnt
FROM   lineitem    -- native .duckdb table  |  or read_parquet('…')
WHERE  l_shipdate <= DATE '1998-09-02'   -- ~98% of rows: real full scan
GROUP BY 1, 2 ORDER BY 1, 2;

TPC-H SF135 · 810M-row lineitem · parquet 20 GB · native .duckdb 23 GB
M5 Max · 64 GB · DuckDB 1.5.3 · warm · 18 threads

01

Move the data
to the cloud

Schema · keep compute local

your laptop
DuckDB
reads
S3 bucket
your data
gates
Auth / IAM
who reads what
connect
teammates
DuckDB clients

One bucket, then an auth box in front of it. Two pieces to operate already.

02

Move the compute
to the cloud

Schema · offload compute → an interface + a second auth

clients
users + agents
connect
Interface
SDK · DuckDB ext · WASM · REST · MCP
auth → compute
server ×N
your compute
auth → storage
Auth / IAM
reach the data
reads
S3
your data

Benchmark · storage tier · 21 GB · lower is better

EC2 · DuckDB
m6id.2xlarge · 8 vCPU · 32 GB
same query · SELECT … FROM lineitem
local NVMe
instance store
EFS
S3
local NVMe
~10s
EFS · cold
70s
S3 · every query
105s

Read straight from S3 and it doesn't warm up on its own — ~105 s every run. You cache the hot set locally, or it stays slow.
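Caching the hot set can be as simple as "download once, reuse the local copy". A minimal sketch, assuming a hypothetical `fetch` callable that returns an object's bytes (in practice, an S3 download). `cached_path`, the cache layout, and the key name are illustrative only and are not a DuckDB or MotherDuck API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_path(remote_key: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local copy of remote_key, downloading it only on a cache miss."""
    local = cache_dir / hashlib.sha256(remote_key.encode()).hexdigest()
    if not local.exists():
        # Write to a temp name first so a crashed download never looks like a hit.
        partial = local.with_suffix(".tmp")
        partial.write_bytes(fetch(remote_key))
        partial.replace(local)
    return local


# Demo with a fake fetch (hypothetical stand-in for the slow remote read).
fetch_calls = []


def fake_fetch(key: str) -> bytes:
    fetch_calls.append(key)
    return b"parquet bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = cached_path("part=0/data.parquet", cache, fake_fetch)
    second = cached_path("part=0/data.parquet", cache, fake_fetch)
    same_file = first == second
    payload = second.read_bytes()
```

The second call never touches `fake_fetch`: only the first read pays the remote cost. A real version would stream large files instead of holding them in memory and would need an eviction policy for the local disk.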

Benchmark · compute · same 21 GB · lower is better

coordinator
python · boto3 · reduce
fan out 32 shards · part=0 … part=31
λ × 32
1 shard ~690 MB → partial SUM/COUNT
EC2 m6id · 8 vCPU / 32 GB · NVMe
~10s
1 big Lambda · max 10 GB RAM · ~6 vCPU · 1 worker
~100s
32 λ fan-out · warm · 32 × ~6 vCPU
5.7s
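The coordinator pattern on this slide (fan out the shards, get back partial SUM/COUNT, reduce) can be sketched locally. Threads stand in for the 32 Lambda invocations and tiny integer lists stand in for the ~690 MB shards. `aggregate_shard`, `reduce_partials`, and `fan_out` are hypothetical names, and no boto3 call is made.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    total: int
    count: int


def aggregate_shard(rows: list) -> Partial:
    """Work done by one worker: a partial SUM and COUNT over its own shard."""
    return Partial(total=sum(rows), count=len(rows))


def reduce_partials(partials: list) -> tuple:
    """Coordinator step: merge partials; AVG is derived from SUM and COUNT."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    avg = total / count if count else None
    return total, count, avg


def fan_out(shards: list, max_workers: int = 32) -> tuple:
    """Run one task per shard in parallel, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(aggregate_shard, shards))
    return reduce_partials(partials)


# Toy data: 32 shards (part=0 ... part=31) of 4 integers each.
shards = [list(range(part, part + 4)) for part in range(32)]
total, count, avg = fan_out(shards)
```

Workers return SUM and COUNT rather than AVG because averages of shards cannot be merged correctly when shard sizes differ, while sums and counts always can.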

03

Scale up,
or out

scale up ↑

one bigger box + k8s. users fight over it.

or

scale out →

×N

many small workers + a coordinator to run.

Either way: another block to operate.

Watch out for the k8s iceberg

k8s iceberg meme

04

Many writers

Cross the wall · a protocol · 8 writers · higher is better

client
client
client
8 DuckDB procs
quack://
Auth · proxy
TLS + tokens
(web server)
serve
Quack server
1 DuckDB · write lock
m6id · 8 vCPU/32 GB
writes
storage
.duckdb on NVMe
vanilla DuckDB · 2nd writer
× locked
Quack · 8 writers · 16k inserts, 0 conflicts
6.2M rows/s

One server serializes every write. You moved the wall — but Quack's token isn't real auth: you run a reverse proxy in front for TLS + security.
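"One server serializes every write" can be illustrated with a queue in front of a single writer thread. This is a toy model of the idea, not Quack's protocol or implementation: all names are hypothetical, and the 8 × 2,000 split is an assumption chosen only because it adds up to the 16k inserts on the slide.

```python
import queue
import threading


def run_single_writer(num_clients: int = 8, inserts_per_client: int = 2000) -> list:
    """Many clients submit inserts; exactly one thread applies them, in order."""
    inbox = queue.Queue()
    table = []  # touched only by the writer thread, so no lock conflicts

    def writer() -> None:
        while True:
            item = inbox.get()
            if item is None:  # sentinel: every client has finished
                break
            table.append(item)

    def client(client_id: int) -> None:
        for seq in range(inserts_per_client):
            inbox.put((client_id, seq))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    clients = [threading.Thread(target=client, args=(i,)) for i in range(num_clients)]
    for thread in clients:
        thread.start()
    for thread in clients:
        thread.join()
    inbox.put(None)
    writer_thread.join()
    return table


rows = run_single_writer()
```

Every insert lands exactly once and each client's inserts stay in order, but throughput is bounded by the one writer. That is the wall, moved behind a server.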

Cross the wall · a catalog · 8 writers · higher is better

EC2 box · 8 writers
m6id · 8 vCPU/32 GB
auth
Auth / IAM
catalog + S3 creds
commit
Postgres catalog
RDS · arbitrates ACID
S3
Parquet data
vanilla DuckDB · 2nd writer
× locked
DuckLake · 8 writers · batched, 0 conflicts
2M rows · 1M/s

A shared catalog gives you ACID multi-writer. What you now run: a Postgres catalog, the auth to it, and batch + retry on writes.
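The "batch + retry on writes" part can be sketched generically. This is a minimal sketch under stated assumptions: `CommitConflict` is a hypothetical stand-in for whatever error your client raises when a commit loses a race, `commit` is any callable that writes one batch, and the attempt count and delays are arbitrary. None of these names come from DuckLake.

```python
import time


class CommitConflict(Exception):
    """Hypothetical stand-in for a transaction-conflict error."""


def batched(rows: list, size: int):
    """Yield rows in chunks so each commit carries many inserts."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def commit_with_retry(commit, batch, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call commit(batch); on conflict, back off exponentially and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return commit(batch)
        except CommitConflict:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a fake commit that loses the race twice, then succeeds.
attempts = []


def flaky_commit(batch):
    attempts.append(len(batch))
    if len(attempts) < 3:
        raise CommitConflict("simulated conflict")
    return len(batch)


delays = []  # record the back-off delays instead of actually sleeping
committed = commit_with_retry(flaky_commit, [1, 2, 3], sleep=delays.append)
chunks = list(batched([1, 2, 3, 4, 5], size=2))
```

Bigger batches mean fewer commits and fewer chances to conflict. The retry loop covers the conflicts that remain.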

DuckLake works — so which storage? · lower is better

1 box
m6id.2xlarge · 8 vCPU · 32 GB
same query · SELECT … FROM lineitem
.duckdb
local NVMe
.duckdb
on S3
Parquet
on S3
DuckLake
PG + S3
.duckdb on NVMe · internal + local
5.1s · 1×
.duckdb on S3 · internal, remote
88s
Parquet on S3 · raw, reference
105s
DuckLake on S3 · parquet + catalog
103s

Format buys you ~15%. Location buys you ~17×.
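The two ratios follow from the timings on this slide. A quick worked check, using only the four numbers shown; the variable names are illustrative.

```python
# Timings from the slide, in seconds.
timings_s = {
    ".duckdb on NVMe": 5.1,
    ".duckdb on S3": 88.0,
    "Parquet on S3": 105.0,
    "DuckLake on S3": 103.0,
}

# Format: same location (S3), native .duckdb vs raw Parquet.
format_saving = 1 - timings_s[".duckdb on S3"] / timings_s["Parquet on S3"]

# Location: same format (.duckdb), local NVMe vs S3.
location_speedup = timings_s[".duckdb on S3"] / timings_s[".duckdb on NVMe"]

print(f"format saves {format_saving:.0%}, location is {location_speedup:.1f}x faster")
```

Computed exactly, the format saving is about 16% and the location speedup about 17.3×, in line with the rounded figures on the slide.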

05

Many readers

Benchmark · 128 readers · p99 · lower is better

self-host box
m6id · 8 vCPU / 32 GB
vs
MotherDuck
scales out
self-host box · 184 q/s, capped
833 ms
MotherDuck · 463 q/s, scaling
307 ms

p99 = the slowest 1% of queries — the tail your users actually feel.
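To make the p99 definition concrete, here is a minimal nearest-rank percentile sketch. The latency values are made up for illustration and are not the benchmark's data; `percentile` is a hypothetical helper, and other percentile conventions (for example, interpolating ones) give slightly different numbers.

```python
import math


def percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 fast queries and 2 slow ones.
latencies_ms = [20] * 98 + [300, 900]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

Here the median is 20 ms and the mean is 31.6 ms, yet p99 is 300 ms. The tail is invisible in the average, which is why the benchmark reports p99.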

One box is shared CPU — it caps. MotherDuck gives each user isolated compute with hypertenancy.

Bonus · many readers · push the compute into the browser

browser
DuckDB-WASM
browser
× N readers — compute on each laptop
range reads
S3
parquet
hot / small data · in the browser
instant
21 GB full scan · in the browser
× tab OOM
offload to cloud · dual execution
MotherDuck runs it

Browser tab caps near 4 GB (32-bit WASM); past that, offload to the cloud.
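The 4 GB figure is the size of a 32-bit address space (2^32 bytes). A rough routing heuristic built on it might look like the sketch below. `where_to_run` and the example sizes are hypothetical, a real tab runs out of memory well before the theoretical ceiling, and this is not how MotherDuck's dual execution decides.

```python
WASM32_ADDRESS_SPACE_BYTES = 2 ** 32  # 4 GiB: the most a 32-bit pointer can address


def where_to_run(estimated_bytes: int, budget_bytes: int = WASM32_ADDRESS_SPACE_BYTES) -> str:
    """Keep small working sets in the browser; send anything larger to the cloud."""
    return "browser" if estimated_bytes < budget_bytes else "cloud"


small_hot_set = where_to_run(200 * 10**6)  # hypothetical 200 MB of hot data
full_scan = where_to_run(21 * 10**9)       # the 21 GB full scan from the slide
```

In practice you would pass a much lower `budget_bytes` than the theoretical ceiling to leave headroom for the page itself.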

And once people depend on you

Interface

how users + agents reach it
  • a UI or API: run queries, deploy jobs, expose services
  • custom UI, client library, or a DuckDB extension
  • REST · API keys · MCP for agents

Observability

what's it doing
  • logs on query usage
  • dashboards you hand back to your users

Reliability

don't lose the data
  • snapshots: transient storage made durable
  • backups, point-in-time recovery, failover

You quietly built a database company

Storage
Auth
Access
Interface
Catalog
Scale
Observability
Reliability

MotherDuck

The same boxes, run for you

Clients
UI / SDKs / CLI
+ DuckDB-WASM (in-browser)
Postgres endpoint
MCP server
Dual Execution
MotherDuck
Governance
auth · access · observability
Ducklings
serverless compute · scale
Data platform
Dives
notebooks + viz
Flights
python pipelines
Catalog
transactions · ACID
Storage + DuckLake
durable · backups · reliability
Query & ingest
External
S3 / GCS / Azure
3rd-party integrations
BI · orchestration · ingestion
Databases
DuckLake BYOB

Now — get out there and build.

Slides (real one)
QR code to these slides
Thanks.

Go run your own demo.

I don't always commit, but when I do, I break the build