Migrate from Replicate
Replicate gives you per-second-billed GPU model APIs — SDXL, FLUX, Llama, Whisper, etc. Joule Cloud serves the same kinds of models, with energy-priced minutes instead of GPU-seconds. For the models we host on the router, migration is just an endpoint swap. For your own custom Cog models, you ship a container.
Inference API: same shape
Replicate uses a prediction-based API; ours follows the OpenAI shape. For models we already host (Llama, Mistral, DeepSeek, FLUX, SDXL, Whisper), point your inference SDK at api.greenjoules.cloud/v1 with model: <name>:
Image generation (FLUX or SDXL)
# Replicate
- import replicate
- out = replicate.run(
- "black-forest-labs/flux-schnell",
- input={"prompt": "a cat on a piano", "num_outputs": 1},
- )
# Joule Cloud (OpenAI image-gen shape)
+ from openai import OpenAI
+ client = OpenAI(base_url="https://api.greenjoules.cloud/v1", api_key="jc_…")
+ r = client.images.generate(
+ model="flux-schnell",
+ prompt="a cat on a piano",
+ n=1,
+ )
Audio transcription (Whisper)
# Replicate
- replicate.run("openai/whisper", input={"audio": open("clip.mp3","rb")})
# Joule Cloud
+ client.audio.transcriptions.create(
+ model="whisper-large-v3",
+ file=open("clip.mp3", "rb"),
+ )
Custom Cog models
Cog containerizes your Python model. The Cog-built image is OCI-shaped, so it can deploy onto Joule Cloud Compute directly:
# Build with Cog as usual
cog build -t ghcr.io/me/my-model:1.0
docker push ghcr.io/me/my-model:1.0
# Deploy
cat > invisible.hcl <<EOF
workload "my-model" {
image = "ghcr.io/me/my-model:1.0"
region = "eu-fi"
gpu = "a100"
memory = "40 GB"
}
route "my-model.example.com" {
to = workload.my-model
https = true
}
EOF
invisible deploy
Your model is now reachable at https://my-model.example.com/predictions with the Cog-defined shape; or wrap a thin OpenAI-shape adapter in front of it if you want it routable via api.greenjoules.cloud/v1 with the rest of your inference traffic.
What you keep
- The Cog tooling for building custom models
- GPU types: A10, A100, H100, H200, MI300
- Per-prediction billing semantics — no idle reservation
What you gain
- Energy-priced GPU minutes — you pay for what the silicon used, not the per-second tier price
- For listed models, openAI-shape API integrates with the rest of your inference stack with no Replicate-specific SDK
- Region pinning + per-call carbon receipt