JouleDB schema design

The JouleDB physical store is a hybrid: each table sits in row format AND in column format simultaneously, and the planner picks per query. Most schema-design choices that work for Postgres work here; some choices unlock the columnar path that would otherwise stay dormant.

The dual format, briefly

Each table is partitioned in row chunks (~64 MB) and column chunks (Arrow-IPC, same physical files). Writes go to a row-format write-ahead log and flush to both formats. The query planner reads EXPLAIN-equivalent statistics — cardinality, predicate selectivity, projection list — and picks the format whose plan costs fewer joules.

You can see which path was taken in the execution_path field of EXPLAIN ANALYZE: row-scan, columnar-scan, or hybrid (joins between).

Column-friendly table design

To unlock the columnar path for analytical queries:

CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    tenant_id INT4 NOT NULL,
    event_type TEXT NOT NULL,
    occurred_at TIMESTAMP NOT NULL,
    payload JSONB
) CLUSTER BY (tenant_id, occurred_at);

Vector columns (pgvector)

JouleDB ships pgvector-compatible. Use VECTOR(n) as the type. Index with ivfflat for small-to-medium sets, hnsw for large + high-recall:

CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    document_uri TEXT,
    embedding VECTOR(1536)
);

CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

For 1M+ vectors with sub-100ms recall, the hnsw indexes have ~10× lower joule cost per query than ivfflat, at higher index-build cost.

Indexing strategy

Three indexes are usually enough for analytic+OLTP hybrid tables:

  1. Primary key (you get this automatically).
  2. One predicate-friendly index for hot OLTP lookups.
  3. A composite index for the dominant analytical predicate, e.g. (tenant_id, occurred_at).

JouleDB chunks already store min/max statistics per column per chunk, so adding more indexes often costs more in storage joules than it saves in query joules. Use jc db slowlog --by energy --top 50 to see what's actually expensive before adding indexes.

Partitioning

For tables > 100 GB, declarative range partitioning is essential. Standard Postgres syntax:

CREATE TABLE events (...) PARTITION BY RANGE (occurred_at);
CREATE TABLE events_2026q1 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026q2 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

Old partitions can move to cold storage tier with a one-line policy:

ALTER TABLE events_2025q1 SET ACCESS METHOD jouledb_cold;

Cold-tier reads are ~3× more energy per byte but storage is 1/6 the cost; the math depends on read frequency.

Migrations

Standard Postgres migration tooling works: dbmate, Atlas, Flyway, Goose, sqlx-cli, prisma migrate. JouleDB-specific extensions (CLUSTER BY, jouledb_cold) are parsed as no-ops by Postgres-only tools; either run them in a separate migration step or use a tool that respects ALTER directives it doesn't recognize.

Energy-aware query patterns

Limits to be aware of

Reading the slowlog

jc db slowlog main --by energy --top 50 --since 7d

# qid             total_J   p99_ms   exec_path     normalised_sql
# qhash_a82c…     412.31    223      columnar-scan SELECT region, sum(revenue) ...
# qhash_b91d…     287.05    18       row-lookup    SELECT * FROM users WHERE id=$1
# ...

The first query is doing 412 J of work over 7 days — that's the one to think about. The second is millions of cheap lookups; not the place to optimize.