JouleDB schema design
The JouleDB physical store is a hybrid: each table sits in row format AND in column format simultaneously, and the planner picks per query. Most schema-design choices that work for Postgres work here; some choices unlock the columnar path that would otherwise stay dormant.
The dual format, briefly
Each table is partitioned in row chunks (~64 MB) and column chunks (Arrow-IPC, same physical files). Writes go to a row-format write-ahead log and flush to both formats. The query planner reads EXPLAIN-equivalent statistics — cardinality, predicate selectivity, projection list — and picks the format whose plan costs fewer joules.
You can see which path was taken in the execution_path field of EXPLAIN ANALYZE: row-scan, columnar-scan, or hybrid (joins between).
Column-friendly table design
To unlock the columnar path for analytical queries:
- Wide tables OK. Row stores penalize wide schemas because each row read pulls the whole row. The columnar path only reads referenced columns, so width is free.
- Use specific types.
INT4,INT8,DATE,TIMESTAMP,NUMERIC(p, s)all compress hard column-wise.JSONBdoesn't — if a JSON field is queried often, extract its hot keys into typed columns. - Order by a clustering key. Inserts batched by time (
created_at) or by tenant (tenant_id) cluster the columnar chunks; predicate pushdown can skip whole chunks. Set the clustering key viaCLUSTER BY:
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
tenant_id INT4 NOT NULL,
event_type TEXT NOT NULL,
occurred_at TIMESTAMP NOT NULL,
payload JSONB
) CLUSTER BY (tenant_id, occurred_at);
Vector columns (pgvector)
JouleDB ships pgvector-compatible. Use VECTOR(n) as the type. Index with ivfflat for small-to-medium sets, hnsw for large + high-recall:
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
document_uri TEXT,
embedding VECTOR(1536)
);
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
For 1M+ vectors with sub-100ms recall, the hnsw indexes have ~10× lower joule cost per query than ivfflat, at higher index-build cost.
Indexing strategy
Three indexes are usually enough for analytic+OLTP hybrid tables:
- Primary key (you get this automatically).
- One predicate-friendly index for hot OLTP lookups.
- A composite index for the dominant analytical predicate, e.g.
(tenant_id, occurred_at).
JouleDB chunks already store min/max statistics per column per chunk, so adding more indexes often costs more in storage joules than it saves in query joules. Use jc db slowlog --by energy --top 50 to see what's actually expensive before adding indexes.
Partitioning
For tables > 100 GB, declarative range partitioning is essential. Standard Postgres syntax:
CREATE TABLE events (...) PARTITION BY RANGE (occurred_at);
CREATE TABLE events_2026q1 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026q2 PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
Old partitions can move to cold storage tier with a one-line policy:
ALTER TABLE events_2025q1 SET ACCESS METHOD jouledb_cold;
Cold-tier reads are ~3× more energy per byte but storage is 1/6 the cost; the math depends on read frequency.
Migrations
Standard Postgres migration tooling works: dbmate, Atlas, Flyway, Goose, sqlx-cli, prisma migrate. JouleDB-specific extensions (CLUSTER BY, jouledb_cold) are parsed as no-ops by Postgres-only tools; either run them in a separate migration step or use a tool that respects ALTER directives it doesn't recognize.
Energy-aware query patterns
- Project sparingly.
SELECT *on a wide columnar table reads every chunk;SELECT a, breads two. - Push down predicates. The planner uses chunk min/max for
WHEREpushdown.WHERE date > '2026-06-01'on a clustered table can skip 95% of chunks. - Avoid pre-aggregating with materialized views unless the underlying scan is genuinely expensive. The columnar path is fast enough for most "pre-aggregated" use cases.
- Use prepared statements. Plan caching saves 0.001 J per call, which adds up across millions of calls.
Limits to be aware of
- Triggers are supported but slow the columnar path; the planner falls back to row-format if a trigger would fire.
- Foreign keys are enforced at the row layer; the columnar layer trusts row-level constraint state.
- Logical replication is row-format only. Use
jc db snapshotfor cross-region columnar replication.
Reading the slowlog
jc db slowlog main --by energy --top 50 --since 7d
# qid total_J p99_ms exec_path normalised_sql
# qhash_a82c… 412.31 223 columnar-scan SELECT region, sum(revenue) ...
# qhash_b91d… 287.05 18 row-lookup SELECT * FROM users WHERE id=$1
# ...
The first query is doing 412 J of work over 7 days — that's the one to think about. The second is millions of cheap lookups; not the place to optimize.