So you self-hosted DuckDB.
Now what?

Mehdi Ouazza · MotherDuck · mehdio.com

DuckCon 7 · Amsterdam · 24 June 2026

Grab the slides and the ducks 👆

Live demo

one prompt

# paste to Claude (MotherDuck MCP)
Using the MotherDuck MCP, load and inspect this public dataset: s3://noaa-ghcn-pds/parquet/by_year/ Figure out what's in it, then help me understand how temperature has evolved over the years in the EU. Create a Flight to ingest/update the data, then a Dive to visualize the results.

Who am I?

Husband and father of two.
12+ years in data, as a data and platform engineer.
From on-prem Spark clusters at AXA to cloud data platforms at Klarna, Back Market, and Trade Republic.
Joined MotherDuck in 2023 as its first DevRel. If you've ever searched "duckdb" on YouTube, you've seen my face. Sorry.
All my content → mehdio.com

Everybody is at a different stage
of their DuckDB journey.

"I ran Quack and DuckLake in prod before 1.0."

"It said the database is locked… does someone have the password?"

Where are you on the road?

More concurrent writers push you right; more users push you up. Those two axes decide your architecture.

Figure: the road on two axes. x: more concurrent writers (1 writer → many writers). y: more users / reads.
Single node handles it: single-node .duckdb, partition per slice.
The vanilla wall.
Crosses the wall with a protocol or a catalog: Quack server, DuckLake + Postgres.
You run the whole database: a database company.

Where the formula tips

Self-host cost = your rate × time building and maintaining.
Buy cost = managed price + value lost to its limits.

self-host: your time, forever · managed: a price, plus its limits
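To see where the formula tips for your own team, the two lines above can be turned into a small comparison. A minimal Python sketch: splitting self-host time into a one-off build and a monthly maintenance cost is an assumption, and every number in the usage example is a hypothetical placeholder, not a figure from this talk.

```python
from dataclasses import dataclass


@dataclass
class SelfHost:
    hourly_rate: float               # what an hour of your team's time costs
    build_hours: float               # one-off: building the platform
    maintain_hours_per_month: float  # ongoing: keeping it running

    def cost(self, months: int) -> float:
        # Self-host cost = your rate × time building and maintaining
        return self.hourly_rate * (self.build_hours + self.maintain_hours_per_month * months)


@dataclass
class Managed:
    price_per_month: float       # the managed service's bill
    lost_value_per_month: float  # value lost to the service's limits

    def cost(self, months: int) -> float:
        # Buy cost = managed price + value lost to its limits
        return (self.price_per_month + self.lost_value_per_month) * months


def cheaper(self_host: SelfHost, managed: Managed, months: int) -> str:
    return "self-host" if self_host.cost(months) < managed.cost(months) else "managed"


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    diy = SelfHost(hourly_rate=100, build_hours=160, maintain_hours_per_month=20)
    buy = Managed(price_per_month=2000, lost_value_per_month=500)
    print(diy.cost(12), buy.cost(12), cheaper(diy, buy, 12))
```

With these made-up inputs the managed option wins over 12 months (40,000 vs 30,000); cutting the maintenance hours far enough is what flips it.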

Watch for this icon: a box you build and/or maintain.

00

The honeymoon

Laptop · the honeymoon · full 810M-row scan · lower is better

Setup: laptop · DuckDB · M5 Max · 64 GB · local SSD · Q1 full scan (SELECT … FROM lineitem)
.duckdb file (native format): 4.8 s
parquet file (raw, same SSD): 8.5 s

10 ms point reads · full 810M-row scan in ~4.8 s — native .duckdb is ~1.8× faster than parquet.

-- TPC-H Q1 · full 810M-row lineitem scan + grouped aggregation
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)      AS qty,
       SUM(l_extendedprice) AS revenue,
       AVG(l_discount)      AS avg_disc,
       COUNT(*)             AS cnt
FROM   lineitem    -- native .duckdb table  |  or read_parquet('…')
WHERE  l_shipdate <= DATE '1998-09-02'   -- ~98% of rows: real full scan
GROUP BY 1, 2 ORDER BY 1, 2;

Dataset: TPC-H SF135 · 810M-row lineitem · parquet 20 GB · native .duckdb 23 GB. Machine: M5 Max · 64 GB · DuckDB 1.5.3 · warm · 18 threads.

01

Move the data
to the cloud

Schema · keep compute local

Diagram: your laptop (DuckDB) reads your data from an S3 bucket. An Auth / IAM box gates who reads what. Teammates connect with their own DuckDB clients.

One bucket, then an auth box in front of it. Two pieces to operate already.
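The Auth / IAM box is, at its core, a deny-by-default "who reads what" check. In practice this lives in IAM or bucket policies rather than in your own code; the Python sketch below only illustrates the prefix logic you end up maintaining, with a hypothetical bucket and hypothetical users.

```python
from dataclasses import dataclass, field


@dataclass
class ReadPolicy:
    """Hypothetical 'who reads what' table: user -> allowed S3 prefixes."""
    grants: dict = field(default_factory=dict)

    def allow(self, user: str, prefix: str) -> None:
        # Normalise to a trailing slash so "sales" does not also match "sales_secret".
        if not prefix.endswith("/"):
            prefix += "/"
        self.grants.setdefault(user, []).append(prefix)

    def can_read(self, user: str, s3_uri: str) -> bool:
        # Deny by default: only URIs under a granted prefix are readable.
        return any(s3_uri.startswith(p) for p in self.grants.get(user, []))


if __name__ == "__main__":
    policy = ReadPolicy()
    policy.allow("ana", "s3://acme-lake/sales")  # hypothetical bucket and user
    print(policy.can_read("ana", "s3://acme-lake/sales/2026.parquet"))
    print(policy.can_read("ana", "s3://acme-lake/sales_secret/x.parquet"))
    print(policy.can_read("bob", "s3://acme-lake/sales/2026.parquet"))
```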

02

Move the compute
to the cloud

Schema · offload compute → an interface + a second auth

Diagram: clients (users + agents) connect to an Interface (SDK · DuckDB ext · WASM · REST · MCP). Auth → compute gates access to your compute (server ×N). Auth → storage (Auth / IAM) lets the servers reach the data, and they read your data from S3.

Benchmark · storage tier · 21 GB · lower is better

Setup: EC2 · DuckDB · m6id.2xlarge · 8 vCPU · 32 GB · same query (SELECT … FROM lineitem)
local NVMe (instance store): ~10 s
EFS (cold): 70 s
S3 (every query): 105 s

Read straight from S3 and nothing warms up on its own: ~105 s on every run. Either you cache the hot set locally, or it stays slow.
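One way to cache the hot set locally is a read-through cache in front of object storage: serve repeat reads from local memory or disk, go to S3 only on a miss, and evict the least recently used objects when the budget is full. A minimal in-memory Python sketch of that idea; `fetch` is a hypothetical stand-in for an S3 GET, and this is not how DuckDB itself caches remote reads.

```python
from collections import OrderedDict


class HotSetCache:
    """Read-through LRU cache for remote objects, capped by a byte budget."""

    def __init__(self, fetch, budget_bytes):
        self._fetch = fetch            # hypothetical: key -> bytes (e.g. wraps an S3 GET)
        self._budget = budget_bytes
        self._used = 0
        self._entries = OrderedDict()  # key -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch(key)             # slow path: go to object storage
        if len(data) <= self._budget:       # objects bigger than the budget are not cached
            self._entries[key] = data
            self._used += len(data)
            while self._used > self._budget:  # evict least recently used until it fits
                _, evicted = self._entries.popitem(last=False)
                self._used -= len(evicted)
        return data
```

The second read of a hot object never touches S3, which is exactly what the plain "read straight from S3" setup cannot do.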

Benchmark · compute · same 21 GB · lower is better

Diagram: a coordinator (python · boto3 · reduce) fans out 32 shards (part=0 … part=31) to λ × 32. Each Lambda scans 1 shard (~690 MB) and returns a partial SUM/COUNT.
EC2 m6id (8 vCPU / 32 GB · NVMe): ~10 s
1 big Lambda (max 10 GB RAM · ~6 vCPU · 1 worker): ~100 s
32 λ fan-out (warm · 32 × ~6 vCPU): 5.7 s
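The 32-Lambda run works because SUM and COUNT are decomposable: each worker aggregates its own shard and the coordinator only adds up small partials. A minimal Python sketch of that map/reduce step, assuming in-memory lists as shards and a thread pool standing in for the Lambda invocations (the setup on the slide used a boto3 coordinator and shards on S3).

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(values):
    """One worker: scan its shard and return a partial (SUM, COUNT)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def reduce_partials(partials):
    """Coordinator: add up partial SUMs and COUNTs, derive AVG only at the end."""
    partials = list(partials)
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    avg = total / count if count else float("nan")
    return total, count, avg


def fan_out(shards, workers=32):
    # Threads stand in for Lambda invocations here; each gets one shard.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return reduce_partials(pool.map(scan_shard, shards))


if __name__ == "__main__":
    print(fan_out([[1, 2, 3], [4, 5], [6]]))  # (21.0, 6, 3.5)
```

Averaging the per-shard averages instead would give (2 + 4.5 + 6) / 3 ≈ 4.17 rather than 3.5; shipping SUM and COUNT is what keeps the reduce correct.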

03

Scale up,
or out

Scale up ↑: one bigger box + k8s. Users fight over it.

or

Scale out → (×N): many small workers + a coordinator to run.

Either way: another block to operate.

Watch out for the k8s iceberg

04

Many writers

Cross the wall · a protocol · 8 writers · higher is better

Diagram: a client (8 DuckDB procs) connects over quack:// through an Auth · proxy (TLS + tokens, a web server), which serves a Quack server (1 DuckDB · write lock · m6id · 8 vCPU/32 GB) that writes to storage (.duckdb on NVMe).
vanilla DuckDB, 2nd writer: × locked
Quack · 8 writers (16k inserts, 0 conflicts): 6.2M rows/s

One server serializes every write. You moved the wall, but Quack's token isn't real auth: you still run a reverse proxy in front of it for TLS and security.
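"One server serializes every write" is the classic single-writer queue: any number of clients submit batches, and one thread that owns the only write handle applies them in arrival order, so there is nothing to conflict on. A minimal Python sketch of that pattern; sqlite3 from the standard library stands in for the .duckdb file, and none of this is Quack's actual protocol or implementation.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel telling the writer thread to finish


class SingleWriter:
    """One thread owns the only write connection; everyone else enqueues."""

    def __init__(self):
        self._queue = queue.Queue()
        self.row_count = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # sqlite3 stands in for the .duckdb file; the connection never leaves this thread.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE events (client INTEGER, seq INTEGER)")
        while True:
            batch = self._queue.get()
            if batch is _STOP:
                break
            conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
            conn.commit()  # batches are applied strictly one at a time
        self.row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        conn.close()

    def submit(self, rows):
        self._queue.put(rows)

    def close(self):
        self._queue.put(_STOP)
        self._thread.join()


def client(writer, client_id, batches, batch_size):
    for b in range(batches):
        writer.submit([(client_id, b * batch_size + i) for i in range(batch_size)])


def run_demo(n_clients=8, batches=100, batch_size=20):
    writer = SingleWriter()
    threads = [
        threading.Thread(target=client, args=(writer, cid, batches, batch_size))
        for cid in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    writer.close()  # all batches are queued before the stop sentinel
    return writer.row_count
```

`run_demo()` lands 8 × 100 × 20 = 16,000 rows with no locking errors, because only one thread ever writes. The cost is that total write throughput is capped by that one thread and that one box.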

Cross the wall · a catalog · 8 writers · higher is better

Diagram: an EC2 box (8 writers · m6id · 8 vCPU/32 GB) authenticates through Auth / IAM (catalog + S3 creds), commits to a Postgres catalog (RDS · arbitrates ACID), and stores Parquet data on S3.
vanilla DuckDB, 2nd writer: × locked
DuckLake · 8 writers (batched, 0 conflicts): 2M rows · 1M/s

A shared catalog gives you ACID multi-writer transactions. What you now run: a Postgres catalog, the auth in front of it, and batching plus retries on writes.
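On the writer side, batching and retrying usually means two things: group rows so each catalog commit carries more work, and retry a batch with backoff when its commit loses a race to another writer. A minimal Python sketch under stated assumptions: `commit` is whatever function runs one batch as a single transaction, and `CommitConflict` is a hypothetical exception standing in for however your DuckLake client reports a conflicting commit.

```python
import random
import time


class CommitConflict(Exception):
    """Hypothetical: raised when another writer committed a conflicting change first."""


def batched(rows, size):
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


def commit_with_retry(commit, batch, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Try to commit one batch; back off and retry on conflict. Returns attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            commit(batch)
            return attempt
        except CommitConflict:
            if attempt == max_attempts:
                raise
            # Exponential backoff with full jitter, so retrying writers spread out.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


def write_all(commit, rows, batch_size=1000, **retry_kwargs):
    """Write rows in batches (fewer, larger commits); return total commit attempts."""
    return sum(commit_with_retry(commit, batch, **retry_kwargs)
               for batch in batched(rows, batch_size))
```

The `sleep` parameter is injectable so the retry path can be tested without actually waiting.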

DuckLake works — so which storage? · lower is better

Setup: 1 box · m6id.2xlarge · 8 vCPU · 32 GB · same query (SELECT … FROM lineitem)
.duckdb on local NVMe (internal format, local): 5.1 s · 1×
.duckdb on S3 (internal format, remote): 88 s
Parquet on S3 (raw, reference): 105 s
DuckLake on S3 (Parquet + PG catalog): 103 s

Format buys you ~15% (88 s for .duckdb vs 105 s for Parquet, both on S3). Location buys you ~17× (5.1 s on local NVMe vs 88 s on S3).

05

Many readers

Benchmark · 128 readers · p99 · lower is better

self-host box (m6id · 8 vCPU / 32 GB) vs MotherDuck (scales out)
self-host box (184 q/s, capped): 833 ms
MotherDuck (463 q/s, scaling): 307 ms

p99 = the latency within which 99% of queries finish; only the slowest 1% take longer. That tail is what your users actually feel.

One box means shared CPU, so throughput caps. MotherDuck gives each user isolated compute with hypertenancy.
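To make the p99 definition concrete, here is one common way to compute it, the nearest-rank method. Other definitions (for example, linear interpolation between ranks) give slightly different numbers on small samples, and this is not necessarily how the benchmark above was measured.

```python
import math


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

Read this way, a p99 of 833 ms means 99% of queries in that run finished within 833 ms.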

Bonus · many readers · push the compute into the browser

Diagram: N browsers each run DuckDB-WASM (× N readers, compute on each laptop) and issue range reads against parquet on S3.
hot / small data (in the browser): instant
21 GB full scan (in the browser): × tab OOM
offload to cloud (dual execution): MotherDuck runs it

A browser tab caps at ~4 GB of memory (32-bit WASM). Past that, offload to the cloud.
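The rule on this slide is essentially a size check. A minimal Python sketch of that routing decision, assuming you can estimate how many bytes a query will touch; the 50% headroom factor is an arbitrary assumption, and MotherDuck's dual execution makes this placement decision in its own planner, not with this function.

```python
GIB = 1024 ** 3


def choose_engine(estimated_bytes, tab_limit_bytes=4 * GIB, headroom=0.5):
    """Hypothetical router: run in the browser only if the working set fits with room to spare."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    # Keep headroom: the data and intermediate query state share the same WASM memory.
    if estimated_bytes <= tab_limit_bytes * headroom:
        return "browser"
    return "cloud"
```

A 200 MB hot set stays in the tab; the 21 GB full scan goes to the cloud.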

And once people depend on you…

Interface

how users + agents reach it

a UI or API: run queries, deploy jobs, expose services
custom UI, client library, or a DuckDB extension
REST · API keys · MCP for agents

Observability

what's it doing

logs on query usage
dashboards you hand back to your users

Reliability

don't lose the data

snapshots: transient storage made durable
backups, point-in-time recovery, failover
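Point-in-time recovery starts with one lookup: the newest snapshot taken at or before the moment you want back. A minimal Python sketch of just that step, with hypothetical snapshot timestamps; actually restoring the snapshot is the part you also get to build and test.

```python
import bisect
from datetime import datetime


def snapshot_for(snapshot_times, target):
    """Return the newest snapshot time at or before `target` (the restore point)."""
    ordered = sorted(snapshot_times)
    i = bisect.bisect_right(ordered, target)  # snapshots <= target are ordered[:i]
    if i == 0:
        raise LookupError("no snapshot at or before the requested time")
    return ordered[i - 1]


if __name__ == "__main__":
    # Hypothetical: a snapshot every 6 hours.
    snaps = [datetime(2026, 6, 1, h) for h in (0, 6, 12, 18)]
    print(snapshot_for(snaps, datetime(2026, 6, 1, 13, 30)))
```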

You quietly built a database company

Storage

Auth

Access

Interface

Catalog

Scale

Observability

Reliability

MotherDuck

The same boxes, run for you

Clients: UI / SDKs / CLI · DuckDB-WASM (in-browser) · Postgres endpoint · MCP server
↔ Dual Execution
MotherDuck
Governance: auth · access · observability
Ducklings: serverless compute · scale
Data platform: Dives (notebooks + viz) · Flights (python pipelines)
Catalog: transactions · ACID
Storage + DuckLake: durable · backups · reliability
↔ Query & ingest
External: S3 / GCS / Azure · 3rd-party integrations (BI · orchestration · ingestion) · Databases · DuckLake BYOB

Now — get out there and build.


Thanks.

mehdio.com · motherduck.com

Go run your own demo.

