Tutorial: Build a RAG app

This tutorial builds retrieval-augmented generation (RAG) over a corpus of PDFs. Joule Cloud Object Store holds the documents, JouleDB stores the embeddings and runs the vector search, and Inference answers the questions. End to end: ~120 lines of Python, three deploy steps.

Architecture

Upload PDFs to jc://my-rag-corpus/
Index step: a Function triggers on bucket upload, chunks the PDF, generates embeddings via POST /v1/embeddings, writes to JouleDB
Query step: a Function receives a question, embeds it, finds the top-K chunks via JouleDB vector index, calls POST /v1/chat/completions with the retrieved context
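
To make the query step concrete, here is a minimal in-memory sketch of top-K retrieval by cosine distance, the same ordering that the pgvector <=> operator produces in the SQL later in this tutorial. The helper names (cosine_distance, top_k) and the toy 2-dimensional vectors are illustrative assumptions, not part of the platform; in the real app JouleDB does this work.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=6):
    """rows is a list of (content, embedding); return the k closest contents."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [content for content, _ in ranked[:k]]

# Toy data (hypothetical): real embeddings in this tutorial have 1536 dimensions.
rows = [
    ("margins", [1.0, 0.0]),
    ("headcount", [0.0, 1.0]),
    ("revenue", [0.9, 0.1]),
]
print(top_k([1.0, 0.05], rows, k=2))  # prints ['margins', 'revenue']
```

A linear scan like this is exact but touches every row; the vector index in Step 1 trades a little recall for speed on larger corpora.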

Step 1 — set up the bucket and database

jc storage create my-rag-corpus --region eu-fi
jc db create rag --region eu-fi --size 50GB

# enable pgvector on JouleDB
psql "postgresql://jc_…@db.greenjoules.cloud:5432/rag" << 'EOF'
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  document_uri TEXT NOT NULL,
  ordinal INT NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL
);

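-- ivfflat derives its cluster centroids from the rows present at build time,
-- so rebuild this index (REINDEX) once the corpus has been loaded.
CREATE INDEX chunks_embedding_idx ON chunks USING ivfflat (embedding vector_cosine_ops);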
EOF
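
The index above uses ivfflat defaults. As a rough sketch of how one might size it, the helper below encodes the starting points suggested in the pgvector documentation: lists = rows / 1000 up to one million rows, sqrt(rows) beyond that, and probes = sqrt(lists) at query time. The function name ivfflat_params is hypothetical, and whether JouleDB exposes these settings unchanged is an assumption to verify.

```python
import math

def ivfflat_params(row_count):
    """Return (lists, probes) starting points for an ivfflat index."""
    if row_count <= 1_000_000:
        lists = row_count // 1000
    else:
        lists = math.isqrt(row_count)
    lists = max(lists, 1)                # at least one list
    probes = max(math.isqrt(lists), 1)   # at least one probe
    return lists, probes

for rows in (500, 50_000, 4_000_000):
    lists, probes = ivfflat_params(rows)
    print(f"{rows} rows -> lists={lists}, probes={probes}")
```

In plain pgvector these map to WITH (lists = ...) on CREATE INDEX and SET ivfflat.probes = ... per session. Treat the numbers as a first guess and measure recall on your own corpus.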

Step 2 — index Function (trigger on bucket upload)

# index.py
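import io
import os
import urllib.request

import psycopg
import pypdf
from openai import OpenAI

client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)
DB = os.environ["DATABASE_URL"]

def chunk(text, n=800):
    """Split text into consecutive pieces of at most n characters."""
    return [text[i:i+n] for i in range(0, len(text), n)]

def handler(req):
    """Triggered when an object lands in my-rag-corpus. The trigger payload has:
       { bucket, key, presigned_url } """
    payload = req.json()
    # Fetch the PDF with a plain HTTP GET. The presigned URL carries its own
    # authorization, so there is no need to reach into the API client's private
    # HTTP session.
    with urllib.request.urlopen(payload["presigned_url"]) as resp:
        blob = resp.read()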
    reader = pypdf.PdfReader(io.BytesIO(blob))
    text = "\n".join(p.extract_text() or "" for p in reader.pages)

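    # Skip whitespace-only chunks (e.g. scanned pages with no extractable text).
    chunks = [c for c in chunk(text) if c.strip()]
    if not chunks:
        return {"indexed": 0}
    embeds = client.embeddings.create(
        model="text-embedding-3-small", input=chunks,
    ).data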

    with psycopg.connect(DB) as con, con.cursor() as cur:
        for i, (c, e) in enumerate(zip(chunks, embeds)):
            cur.execute(
                "INSERT INTO chunks (document_uri, ordinal, content, embedding) "
                "VALUES (%s, %s, %s, %s::vector)",
                (f"jc://{payload['bucket']}/{payload['key']}", i, c, e.embedding),
            )
        con.commit()
    return {"indexed": len(chunks)}

# deploy as a Function triggered by the bucket
jc fn deploy index.py \
  --route /index \
  --runtime python-3.13 \
  --memory 2GB \
  --on bucket/my-rag-corpus/object-created \
  --env DATABASE_URL=$(jc db url rag)
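
The chunk() helper in index.py cuts at fixed 800-character boundaries, so a sentence that straddles a boundary ends up split across two chunks. One common variation is to let consecutive chunks overlap. The sketch below is an illustration under that assumption; chunk_with_overlap and its overlap parameter are hypothetical names, not part of the tutorial's code.

```python
def chunk_with_overlap(text, n=800, overlap=100):
    """Split text into windows of at most n characters, each sharing
    `overlap` characters with the previous window."""
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < n")
    step = n - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + n])
        if start + n >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks

print(chunk_with_overlap("abcdefghij", n=4, overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlap increases the number of chunks, and therefore the embedding work per document, so it is a trade-off between retrieval quality and indexing cost.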

Step 3 — ask Function (HTTP)

# ask.py
import os
from openai import OpenAI
import psycopg

client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)
DB = os.environ["DATABASE_URL"]

def handler(req):
    question = req.json()["question"]
    q_embed = client.embeddings.create(
        model="text-embedding-3-small", input=question,
    ).data[0].embedding

    with psycopg.connect(DB) as con, con.cursor() as cur:
        cur.execute(
            "SELECT content, document_uri FROM chunks "
            "ORDER BY embedding <=> %s::vector LIMIT 6",
            (q_embed,),
        )
        ctx = cur.fetchall()

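    # Label each chunk with its document URI so the model has something to cite.
    context = "\n---\n".join(f"[{uri}]\n{content}" for content, uri in ctx)
    prompt = ("Answer the question using ONLY the provided context. Cite sources.\n\n"
              f"Context:\n{context}"
              f"\n\nQuestion: {question}\n\nAnswer:")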

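    # The parsed completion object does not carry HTTP headers, so request the
    # raw response and parse it: headers come from raw, the completion from parse().
    raw = client.chat.completions.with_raw_response.create(
        model="auto",
        messages=[{"role": "user", "content": prompt}],
    )
    r = raw.parse()
    return {
        "answer": r.choices[0].message.content,
        "sources": sorted({c[1] for c in ctx}),
        "joules_used": raw.headers.get("x-energy-joules"),
    }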

jc fn deploy ask.py \
  --route /ask \
  --runtime python-3.13 \
  --memory 1GB \
  --env DATABASE_URL=$(jc db url rag)

Step 4 — drop some PDFs in

jc storage cp ./annual-report-2025.pdf jc://my-rag-corpus/
jc storage cp ./annual-report-2024.pdf jc://my-rag-corpus/

The index Function fires automatically on each upload, then embeds the chunks and writes them to JouleDB. Run jc fn logs index to follow progress.

Step 5 — query it

curl https://<account>.fn.greenjoules.cloud/ask \
  -H "content-type: application/json" \
  -d '{"question": "What was the operating margin in Q4 2024?"}'

# {
#   "answer": "...",
#   "sources": ["jc://my-rag-corpus/annual-report-2024.pdf"],
#   "joules_used": "0.47"
# }
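
The same call can be made from Python. This is a minimal sketch using only the standard library; the function names build_ask_request and ask are hypothetical, the account value stands in for the <account> placeholder in the URL above, and the optional X-Customer-Tag header is the one mentioned under Production hardening.

```python
import json
import urllib.request

def build_ask_request(account, question, customer_tag=None):
    """Build (but do not send) the POST request for the ask Function."""
    url = f"https://{account}.fn.greenjoules.cloud/ask"
    headers = {"content-type": "application/json"}
    if customer_tag is not None:
        headers["X-Customer-Tag"] = customer_tag
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def ask(account, question, customer_tag=None):
    """Send the question and return the decoded JSON response."""
    req = build_ask_request(account, question, customer_tag)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Usage (requires a deployed ask Function):
#   result = ask("my-account", "What was the operating margin in Q4 2024?")
#   print(result["answer"], result["sources"], result["joules_used"])
```

Splitting request construction from sending keeps the first half testable without network access.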

The bill

For a 100-document corpus and 1000 queries/day, this stack typically costs $4-12/mo:

Object Store: ~0.0X J/GB-second · tiny for < 10 GB of PDFs
Embedding indexing: one-time joules per document, archived
JouleDB vector reads: ~0.001 J / similarity search
Inference: ~0.5 J / answer with the model that auto typically routes to

Every line above is itemized on the receipt. Compare to your current pipeline.
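
As a quick sanity check on where the energy goes, the per-query figures above can be multiplied out for the 1000 queries/day scenario. This sketch assumes a 30-day month and covers only the two line items that have per-query numbers; it does not convert joules to dollars, since no rate is given here.

```python
QUERIES_PER_DAY = 1000      # scenario from this section
JOULES_PER_ANSWER = 0.5     # approximate inference energy per answer
JOULES_PER_SEARCH = 0.001   # approximate energy per similarity search
DAYS_PER_MONTH = 30         # assumption: 30-day billing month

def monthly_joules(queries_per_day, joules_per_unit, days=DAYS_PER_MONTH):
    """Energy per month for a per-query cost at a given daily volume."""
    return queries_per_day * days * joules_per_unit

inference = monthly_joules(QUERIES_PER_DAY, JOULES_PER_ANSWER)
search = monthly_joules(QUERIES_PER_DAY, JOULES_PER_SEARCH)

print(f"inference:     {inference:.0f} J/month")
print(f"vector search: {search:.0f} J/month")
print(f"search share:  {search / (inference + search):.2%}")
```

That works out to roughly 15,000 J for inference against about 30 J for vector search, so at these rates inference dominates the per-query energy.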

Production hardening

Add an X-Customer-Tag header to each ask call to attribute energy to your own end customers
Rate-limit by sender via the portal's per-token energy budget
Periodically REINDEX the pgvector index so recall stays high as the corpus changes
Cache embedding results for repeat queries in the same DB
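
The caching item above could look roughly like this. It is a sketch only: EmbeddingCache and cache_key are hypothetical names, a plain dict stands in for a table in the same database, and embed_fn represents whatever function calls the embeddings endpoint. Normalizing case and whitespace before hashing is an assumption about what should count as a repeat query.

```python
import hashlib

def cache_key(question):
    """Hash a case- and whitespace-normalized form of the question."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class EmbeddingCache:
    def __init__(self, embed_fn, store=None):
        self.embed_fn = embed_fn                       # computes an embedding on a miss
        self.store = {} if store is None else store    # stand-in for a DB table
        self.misses = 0

    def get(self, question):
        key = cache_key(question)
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(question)
        return self.store[key]

# Demo with a fake embedding function (hypothetical, for illustration only).
cache = EmbeddingCache(lambda q: [float(len(q))])
cache.get("What was the operating margin?")
cache.get("  what was the   operating margin? ")
print(cache.misses)  # prints 1
```

In ask.py, the equivalent would wrap the client.embeddings.create call so that a repeated question skips the embedding request and goes straight to the vector search.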