Migrate from Replicate

Replicate gives you per-second-billed GPU model APIs — SDXL, FLUX, Llama, Whisper, etc. Joule Cloud serves the same kinds of models, with energy-priced minutes instead of GPU-seconds. For the models we host on the router, migration is just an endpoint swap. For your own custom Cog models, you ship a container.

Inference API: same shape

Replicate uses a prediction-based API; ours follows the OpenAI shape. For models we already host (Llama, Mistral, DeepSeek, FLUX, SDXL, Whisper), point your inference SDK at api.greenjoules.cloud/v1 with model: <name>:

Image generation (FLUX or SDXL)

# Replicate
- import replicate
- out = replicate.run(
-     "black-forest-labs/flux-schnell",
-     input={"prompt": "a cat on a piano", "num_outputs": 1},
- )

# Joule Cloud (OpenAI image-gen shape)
+ from openai import OpenAI
+ client = OpenAI(base_url="https://api.greenjoules.cloud/v1", api_key="jc_…")
+ r = client.images.generate(
+     model="flux-schnell",
+     prompt="a cat on a piano",
+     n=1,
+ )

Audio transcription (Whisper)

# Replicate
- replicate.run("openai/whisper", input={"audio": open("clip.mp3","rb")})

# Joule Cloud
+ client.audio.transcriptions.create(
+     model="whisper-large-v3",
+     file=open("clip.mp3", "rb"),
+ )

Custom Cog models

Cog containerizes your Python model. The Cog-built image is OCI-shaped, so it can deploy onto Joule Cloud Compute directly:

# Build with Cog as usual
cog build -t ghcr.io/me/my-model:1.0
docker push ghcr.io/me/my-model:1.0

# Deploy
cat > invisible.hcl <<EOF
workload "my-model" {
  image  = "ghcr.io/me/my-model:1.0"
  region = "eu-fi"
  gpu    = "a100"
  memory = "40 GB"
}

route "my-model.example.com" {
  to = workload.my-model
  https = true
}
EOF
invisible deploy

Your model is now reachable at https://my-model.example.com/predictions with the Cog-defined shape; or wrap a thin OpenAI-shape adapter in front of it if you want it routable via api.greenjoules.cloud/v1 with the rest of your inference traffic.

What you keep

The Cog tooling for building custom models
GPU types: A10, A100, H100, H200, MI300
Per-prediction billing semantics — no idle reservation

What you gain

Energy-priced GPU minutes — you pay for what the silicon used, not the per-second tier price
For listed models, openAI-shape API integrates with the rest of your inference stack with no Replicate-specific SDK
Region pinning + per-call carbon receipt