Routing & placement

The job of the router is simple to state: send each request to the cheapest capable silicon currently available on the mesh. Doing it well is the actual engineering. This page explains how it works so you can predict where your workloads land.

Inference: cost tiers

Each inference request is classified at the gateway into one of four cost tiers before model selection:

Tier | What it covers | Typical energy | Where it can run
L0 (lookup) | cache hits, key-value reads, tiny embeddings | ~0.01 J | any node, including ARM edge
L1 (extraction) | short summarization, classification, NER | ~0.05 J | CPU / small GPU
L2 (aggregation) | RAG queries, mid-context summarization | ~0.3 J | mid-tier GPU (L4, A10, L40S)
L3 (reasoning) | long-context reasoning, code gen, planning | ~6 J | top-tier GPU (H100, H200, B200, MI300)
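
To get a feel for how the tiers add up, here is a rough back-of-the-envelope sketch that multiplies the typical energy figures from the table by a request mix. The request counts are a made-up example, and the function name estimate_joules is hypothetical; real per-request energy is whatever the receipt reports, not these typical values.

```python
# Typical per-request energy by tier, in joules (figures from the table above).
TYPICAL_JOULES = {"L0": 0.01, "L1": 0.05, "L2": 0.3, "L3": 6.0}


def estimate_joules(request_counts):
    """Return a rough energy total for a mix of requests keyed by tier."""
    return sum(TYPICAL_JOULES[tier] * count for tier, count in request_counts.items())


# Hypothetical daily mix: mostly cheap lookups, a handful of reasoning calls.
mix = {"L0": 10_000, "L1": 2_000, "L2": 500, "L3": 50}
total = estimate_joules(mix)
print(f"{total:.0f} J")  # prints "650 J"
```

In this made-up mix, the 50 L3 requests account for 300 J of the 650 J total, which is why the tier a request lands in matters more than the raw request count.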

Classification is fast (sub-millisecond) and conservative — if there's any doubt, the router upgrades the tier rather than risking a quality regression. You can read the tier the router picked from the X-Tier response header.
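
The "upgrade when in doubt" behavior can be pictured with a small sketch. This is only an illustration: the confidence score, the 0.9 threshold, and the function name conservative_tier are assumptions made for the example, not the router's actual classifier.

```python
TIERS = ["L0", "L1", "L2", "L3"]


def conservative_tier(candidate, confidence, threshold=0.9):
    """Bump the candidate tier up one level when confidence is below threshold.

    `confidence` and `threshold` are hypothetical inputs for illustration.
    """
    index = TIERS.index(candidate)
    if confidence < threshold and index < len(TIERS) - 1:
        index += 1  # prefer spending more energy over risking quality
    return TIERS[index]


print(conservative_tier("L1", confidence=0.95))  # confident: stays at L1
print(conservative_tier("L1", confidence=0.60))  # doubtful: upgraded to L2
```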

What "cheapest capable" means

For a given tier, the router maintains a live ranking of every node in the mesh that can serve it, scored on:

The top-ranked node gets the request. If it fails health-check mid-flight, the request fails over to the next-ranked node automatically.
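
A minimal sketch of the rank-then-fail-over behavior, assuming each node already carries a single composite score (lower is better). The Node type, the score field, and the send stand-in are all hypothetical; the sketch does not model how the router actually computes its ranking.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    score: float          # hypothetical composite score; lower ranks first
    healthy: bool = True


class NodeUnavailable(Exception):
    """Raised by the stand-in dispatcher when a node fails its health check."""


def send(node, request):
    """Stand-in for real dispatch: fails if the node is unhealthy."""
    if not node.healthy:
        raise NodeUnavailable(node.name)
    return f"{request} served by {node.name}"


def route(nodes, request):
    """Try nodes in rank order, failing over to the next one on error."""
    for node in sorted(nodes, key=lambda n: n.score):
        try:
            return send(node, request)
        except NodeUnavailable:
            continue  # fall through to the next-ranked node
    raise RuntimeError("no capable node available")


nodes = [Node("node-a", 1.0, healthy=False), Node("node-b", 2.0)]
print(route(nodes, "req-1"))  # node-a is unhealthy, so node-b serves it
```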

Workloads: carbon-aware placement

For container and function workloads (not inference), placement happens at deploy time and is then re-evaluated periodically: the workload may migrate if the scheduler finds a meaningfully better node. Set the placement strategy in invisible.hcl:

workload "site" {
  image  = "nginx:alpine"
  region = "auto"   # cheapest, cleanest grid available
  # region = "eu"    # constrain to EU jurisdictions
  # region = "eu-fi" # pin to Finland
}

With region = "auto", the carbon-aware scheduler picks a node with low grid carbon intensity at deploy time. A batch job started in Virginia may finish in Helsinki if Nordic wind drops the energy cost meaningfully.
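
One way to picture "migrate only if meaningfully better" is a threshold on relative improvement. The sketch below is an assumption-laden illustration: the 20% threshold, the node names, and the carbon-intensity numbers are invented for the example and are not measurements or the scheduler's real policy.

```python
def pick_node(intensity_by_node):
    """Return the node with the lowest grid carbon intensity."""
    return min(intensity_by_node, key=intensity_by_node.get)


def placement_after_review(current, intensity_by_node, min_improvement=0.2):
    """Return the node the workload should run on after a periodic review.

    Migrates only when the best node improves on the current one by at
    least `min_improvement` (a hypothetical relative threshold).
    """
    current_intensity = intensity_by_node[current]
    if current_intensity <= 0:
        return current  # already as clean as it gets
    best = pick_node(intensity_by_node)
    saving = (current_intensity - intensity_by_node[best]) / current_intensity
    return best if best != current and saving >= min_improvement else current


# Illustrative gCO2/kWh values only; not real grid data.
grid = {"us-virginia": 400.0, "eu-helsinki": 100.0, "eu-frankfurt": 380.0}
print(placement_after_review("us-virginia", grid))   # eu-helsinki
print(placement_after_review("eu-frankfurt", grid, min_improvement=0.9))  # eu-frankfurt
```

A threshold like this keeps a workload from bouncing between nodes whose grids differ by only a few percent.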

What you can control

What | How
Pin a specific model | model: "llama-3.3-70b-instruct" instead of "auto"
Constrain to a region or jurisdiction | region: "eu-fi" in invisible.hcl
Set an energy ceiling per workload | energy_budget: "10 kJ/day"
Override tier classification | X-Force-Tier: L3 request header (audit-logged)
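
As a small illustration of the tier override, the helper below builds the X-Force-Tier request header and rejects values outside L0 to L3. The helper name force_tier_headers is hypothetical; only the header name and the tier labels come from this page, and how you attach the headers depends on your HTTP client.

```python
VALID_TIERS = {"L0", "L1", "L2", "L3"}


def force_tier_headers(tier):
    """Return request headers that ask the router to use a specific tier."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(VALID_TIERS)}")
    # Overrides are audit-logged, so validate before sending.
    return {"X-Force-Tier": tier}


print(force_tier_headers("L3"))  # {'X-Force-Tier': 'L3'}
```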

How to verify the routing

Every response carries the routing decision in headers:

X-Tier: L2
X-Routed-To: nebius/eu-helsinki/h100-sxm5
X-Routing-Reason: cheapest-capable

The same routing-decision metadata lives on the receipt for that request. If a request lands somewhere unexpected, the receipt explains why.
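
If you want to check routing programmatically, a sketch like the following pulls the three headers out of a response. It assumes the headers arrive as a plain mapping, and it guesses that X-Routed-To is laid out as provider/location/hardware based on the example value above; that layout is an assumption, and parse_routing and RoutingDecision are hypothetical names.

```python
from typing import NamedTuple


class RoutingDecision(NamedTuple):
    tier: str
    provider: str
    location: str
    hardware: str
    reason: str


def parse_routing(headers):
    """Extract the routing decision from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumed layout: provider/location/hardware.
    provider, location, hardware = lowered["x-routed-to"].split("/", 2)
    return RoutingDecision(
        tier=lowered["x-tier"],
        provider=provider,
        location=location,
        hardware=hardware,
        reason=lowered["x-routing-reason"],
    )


decision = parse_routing({
    "X-Tier": "L2",
    "X-Routed-To": "nebius/eu-helsinki/h100-sxm5",
    "X-Routing-Reason": "cheapest-capable",
})
print(decision.tier, decision.location)  # L2 eu-helsinki
```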

Next

To see what the per-request record contains, read Energy receipts. To see the underlying joule measurement, read What is a joule, here.