Tutorial: Build a RAG app
Retrieval-augmented generation over a corpus of PDFs. Joule Cloud Object Store holds the docs, JouleDB stores the embeddings and runs the vector search, and Inference answers the questions. End-to-end: ~120 lines of Python, three deploy steps.
Architecture
- Upload PDFs to jc://my-rag-corpus/
- Index step: a Function triggers on bucket upload, chunks the PDF, generates embeddings via POST /v1/embeddings, and writes them to JouleDB
- Query step: a Function receives a question, embeds it, finds the top-K chunks via the JouleDB vector index, and calls POST /v1/chat/completions with the retrieved context
Step 1 — set up the bucket and database
jc storage create my-rag-corpus --region eu-fi
jc db create rag --region eu-fi --size 50GB
# enable pgvector on JouleDB
psql "postgresql://jc_…@db.greenjoules.cloud:5432/rag" << 'EOF'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
document_uri TEXT NOT NULL,
ordinal INT NOT NULL,
content TEXT NOT NULL,
embedding VECTOR(1536) NOT NULL
);
CREATE INDEX chunks_embedding_idx ON chunks USING ivfflat (embedding vector_cosine_ops);
EOF
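The ivfflat index above is created with pgvector's default settings on an empty table. As a rough guide, the pgvector documentation suggests starting with rows / 1000 lists for tables up to 1M rows and sqrt(rows) beyond that, and building the index once the table holds data. The helper below is a minimal sketch of that rule of thumb; the function name is hypothetical and the result is a starting point to tune, not a guarantee.

```python
import math


def suggested_ivfflat_lists(row_count: int) -> int:
    """Starting value for ivfflat's `lists` option (pgvector rule of thumb)."""
    if row_count <= 1_000_000:
        # Small and medium tables: roughly one list per 1000 rows, at least 1.
        return max(1, row_count // 1000)
    # Large tables: square root of the row count.
    return int(math.sqrt(row_count))


# Example: a corpus that produced 25,000 chunks.
print(suggested_ivfflat_lists(25_000))  # 25
```

The value would be passed as WITH (lists = N) on the CREATE INDEX statement.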
Step 2 — index Function (trigger on bucket upload)
# index.py
import io
import os

import httpx  # installed as a dependency of the openai package
import psycopg
import pypdf
from openai import OpenAI

client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)
DB = os.environ["DATABASE_URL"]


def chunk(text, n=800):
    """Split text into consecutive pieces of at most n characters."""
    return [text[i:i+n] for i in range(0, len(text), n)]


def handler(req):
    """Triggered when an object lands in my-rag-corpus. The trigger payload has:
    { bucket, key, presigned_url } """
    payload = req.json()
    # Download the PDF with a plain HTTP client instead of the OpenAI client's private transport.
    resp = httpx.get(payload["presigned_url"])
    resp.raise_for_status()
    reader = pypdf.PdfReader(io.BytesIO(resp.content))
    text = "\n".join(p.extract_text() or "" for p in reader.pages)
    chunks = chunk(text)
    if not chunks:
        # No extractable text (e.g. a scanned PDF): nothing to embed.
        return {"indexed": 0}
    embeds = client.embeddings.create(
        model="text-embedding-3-small", input=chunks,
    ).data
    uri = f"jc://{payload['bucket']}/{payload['key']}"
    with psycopg.connect(DB) as con, con.cursor() as cur:
        for i, (c, e) in enumerate(zip(chunks, embeds)):
            cur.execute(
                "INSERT INTO chunks (document_uri, ordinal, content, embedding) "
                "VALUES (%s, %s, %s, %s)",
                (uri, i, c, e.embedding),
            )
        con.commit()
    return {"indexed": len(chunks)}
# deploy as a Function triggered by the bucket
jc fn deploy index.py \
  --route /index \
  --runtime python-3.13 \
  --memory 2GB \
  --on bucket/my-rag-corpus/object-created \
  --env DATABASE_URL="$(jc db url rag)"
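The chunk helper in index.py cuts the text every 800 characters, so a sentence that straddles a boundary is split between two chunks. One common mitigation is to let neighbouring chunks overlap. The sketch below is an optional variant, not part of the tutorial's code; the function name and the 100-character overlap are assumptions to tune for your corpus.

```python
def chunk_with_overlap(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into pieces of at most `size` characters.

    Each piece after the first starts `overlap` characters before the
    previous piece ended, so boundary sentences appear in both.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            # This piece already reaches the end; avoid a redundant tail chunk.
            break
        start += step
    return chunks
```

Overlap increases the number of chunks, and therefore the embedding work per document, by roughly size / (size - overlap).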
Step 3 — ask Function (HTTP)
# ask.py
import os

import psycopg
from openai import OpenAI

client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)
DB = os.environ["DATABASE_URL"]


def handler(req):
    question = req.json()["question"]
    q_embed = client.embeddings.create(
        model="text-embedding-3-small", input=question,
    ).data[0].embedding
    # <=> is pgvector's cosine-distance operator: smaller means more similar.
    with psycopg.connect(DB) as con, con.cursor() as cur:
        cur.execute(
            "SELECT content, document_uri FROM chunks "
            "ORDER BY embedding <=> %s::vector LIMIT 6",
            (q_embed,),
        )
        ctx = cur.fetchall()
    # Label each chunk with its URI so the model has something to cite.
    context = "\n---\n".join(f"[{uri}]\n{content}" for content, uri in ctx)
    prompt = ("Answer the question using ONLY the provided context. Cite sources.\n\n"
              "Context:\n" + context
              + f"\n\nQuestion: {question}\n\nAnswer:")
    # with_raw_response exposes the HTTP headers; .parse() returns the usual completion object.
    raw = client.chat.completions.with_raw_response.create(
        model="auto",
        messages=[{"role": "user", "content": prompt}],
    )
    r = raw.parse()
    return {
        "answer": r.choices[0].message.content,
        "sources": sorted({uri for _, uri in ctx}),
        "joules_used": raw.headers.get("x-energy-joules"),
    }
jc fn deploy ask.py \
  --route /ask \
  --runtime python-3.13 \
  --memory 1GB \
  --env DATABASE_URL="$(jc db url rag)"
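To make the retrieval step in ask.py concrete: the ORDER BY embedding <=> ... LIMIT 6 clause ranks rows by cosine distance (1 minus cosine similarity) and keeps the closest six. The pure-Python sketch below computes the same ranking by brute force over an in-memory list. It is for illustration only; the function names are hypothetical, and the ivfflat index returns approximate results while this version is exact.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=6):
    """rows is a list of (content, embedding); return the k nearest to query."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Toy 2-dimensional "embeddings" standing in for the 1536-dimensional ones.
rows = [
    ("north", [0.0, 1.0]),
    ("east", [1.0, 0.0]),
    ("north-east", [1.0, 1.0]),
]
nearest = top_k([1.0, 0.1], rows, k=2)
print([content for content, _ in nearest])  # ['east', 'north-east']
```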
Step 4 — drop some PDFs in
jc storage cp ./annual-report-2025.pdf jc://my-rag-corpus/
jc storage cp ./annual-report-2024.pdf jc://my-rag-corpus/
The index Function fires automatically on each upload, embeds the chunks, and writes them to JouleDB. Run jc fn logs index to follow progress.
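A long PDF can produce thousands of chunks, and embedding endpoints commonly cap how many inputs one request may carry. The cap for POST /v1/embeddings is not stated here, so the batch size of 256 below is an assumption. This sketch splits the chunk list into batches and takes the embedding call as a parameter, which keeps it independent of any particular client; both function names are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 256,  # assumed limit; check the service's documented cap
) -> List[List[float]]:
    """Embed all chunks, one request per batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

In index.py, embed_batch could wrap the existing call, for example a function that returns [d.embedding for d in client.embeddings.create(model="text-embedding-3-small", input=list(batch)).data].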
Step 5 — query it
curl https://<account>.fn.greenjoules.cloud/ask \
-H "content-type: application/json" \
-d '{"question": "What was the operating margin in Q4 2024?"}'
# {
# "answer": "...",
# "sources": ["jc://my-rag-corpus/annual-report-2024.pdf"],
# "joules_used": "0.47"
# }
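The same request can be made from Python with only the standard library. This is a minimal sketch: build_ask_request and ask are hypothetical helper names, and base_url is whatever your Function's URL is (the https://<account>.fn.greenjoules.cloud form shown in the curl call).

```python
import json
import urllib.request


def build_ask_request(base_url: str, question: str) -> urllib.request.Request:
    """Build the POST /ask request with a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 30.0) -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_ask_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling ask(base_url, "What was the operating margin in Q4 2024?") should return a dict with the answer, sources, and joules_used keys shown above.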
The bill
For a 100-document corpus and 1000 queries/day, this stack typically costs $4-12/mo:
- Object Store: ~0.0X J/GB-second · tiny for < 10 GB of PDFs
- Embedding indexing: one-time joules per document, archived
- JouleDB vector reads: ~0.001 J / similarity search
- Inference: ~0.5 J / answer with the model that auto typically routes to
Every line above is itemized on the receipt. Compare to your current pipeline.
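As a worked example of the per-query figures above, the sketch below adds up the monthly energy for 1000 queries/day. It covers only the two per-query line items (vector search and inference); storage, indexing, and the query-embedding call are left out because no per-unit figure is given for them, and a 30-day month is an assumption. The dollar amount cannot be derived from these numbers alone, since the price per joule is not stated here.

```python
QUERIES_PER_DAY = 1000
DAYS_PER_MONTH = 30          # assumption: a 30-day month
JOULES_PER_SEARCH = 0.001    # JouleDB similarity search, from the list above
JOULES_PER_ANSWER = 0.5      # inference per answer, from the list above


def monthly_query_joules(queries_per_day: int = QUERIES_PER_DAY,
                         days: int = DAYS_PER_MONTH) -> float:
    """Energy for search plus inference across a month of queries."""
    return queries_per_day * days * (JOULES_PER_SEARCH + JOULES_PER_ANSWER)


joules = monthly_query_joules()
kwh = joules / 3.6e6  # 1 kWh = 3.6 million joules
print(f"{joules:.0f} J per month, {kwh:.4f} kWh")
```

That works out to about 15,030 J per month, roughly 0.004 kWh, for the query path.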
Production hardening
- Add an X-Customer-Tag header to each ask call to attribute energy to your own end-customer
- Rate-limit by sender via the portal's per-token energy budget
- Periodically REINDEX the pgvector index: ivfflat fixes its cluster centroids when the index is built, so recall can degrade as the corpus grows
- Cache embedding results for repeat queries in the same DB
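The caching item can be sketched as follows. Questions are normalized (case and whitespace) and hashed together with the model name, and the embedding call runs only on a miss. Everything here is hypothetical illustration: the class and function names are made up, and the in-memory dict stands in for a table in the same database keyed on the hash.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(question: str, model: str) -> str:
    """Stable key: model name plus the lowercased, whitespace-collapsed question."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


class EmbeddingCache:
    """Wrap an embedding function so repeat questions skip the API call."""

    def __init__(self, embed: Callable[[str], List[float]], model: str):
        self._embed = embed
        self._model = model
        self._store: Dict[str, List[float]] = {}  # stand-in for a DB table
        self.misses = 0

    def get(self, question: str) -> List[float]:
        key = cache_key(question, self._model)
        if key not in self._store:
            # Cache miss: compute once and remember the result.
            self.misses += 1
            self._store[key] = self._embed(question)
        return self._store[key]
```

Including the model name in the key keeps cached vectors from being reused after the embedding model changes.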