Dynamic Routing — Conditional Models, Budget Fallback & A/B Testing

Task

Build a dynamic route that routes different users to different AI models, falls back to a cheaper model when a budget is exceeded, and A/B tests new models on a percentage of traffic.

Why

Not every user needs the same model. Premium users may get a larger, more capable model, while free-tier users get a smaller, cost-effective one. When the premium model budget runs out, you want automatic fallback — not errors. And when evaluating a new model, you want to test it on a percentage of traffic before full rollout.

Dynamic routing lets you build all of this in the dashboard — no code changes, instant rollback, versioned history.

What You Are Configuring

A dynamic route named smart-route with three routing patterns:
1. Conditional routing — user-admin gets the large 70B model, everyone else gets the small 1B model
2. Budget fallback — the admin's 70B model has a cost cap; when exceeded, traffic falls back to the 1B model silently
3. A/B testing — non-admin traffic is split 50/50 between two models

Production pattern

The conditional user_id == "user-admin" is a simplified lab version of what you would build with Cloudflare Access groups. Premium users (identified by JWT claims or Access groups) get routed to the premium model. Budget fallback ensures cost stays controlled even for premium tiers. A/B splits let you test new models on a percentage of traffic before full rollout.

Step 1: Prepare Your Gateway

Before creating the dynamic route, ensure previous controls do not interfere.

Navigate to AI > AI Gateway > bootcamp-lab > Settings
If you have any rate limiting or spend limits configured from previous exercises, temporarily disable them or increase the values
Leave caching enabled or disabled — it will not interfere with routing

tip

You can also leave guardrails and DLP enabled. They apply regardless of which model serves the request — that is the point of gateway-level controls.

Step 2: Create a Dynamic Route

Navigate to AI > AI Gateway > bootcamp-lab
Click Dynamic Routes in the left menu
Sekect Start from scratch

Field	Value
Route name	`smart-route`

Create dynamic route dialog

Click Create
You are now in the visual flow builder

Visual flow builder

Step 3: Configure Conditional Routing by User ID

Build the first routing pattern: route user-admin to the large model, everyone else to the small model.

In the flow builder, add a Conditional node

Add conditional node

Configure the condition:

Field	Value
Condition field	`metadata.user_id`
Operator	`equals`
Value	`user-admin`

Configure conditional node

On the True branch (user-admin), add a Model node:

Field	Value
Provider	`Workers AI`
Model	`@cf/meta/llama-3.3-70b-instruct-fp8-fast`

Configure model node for large model

On the False branch (everyone else), add a Model node:

Field	Value
Provider	`Workers AI`
Model	`@cf/meta/llama-3.2-1b-instruct`

Configure model node for small model

Click Save to deploy the route

Step 4: Test Conditional Routing

In the AI Gateway Explorer app, change the Model dropdown to Dynamic Route: smart-route
Set User ID to user-anna
Click the NZ Travel quick prompt and click Send
Check the response — note the Model shown in the response panel

Response with small model

Now change User ID to user-admin
Click Send with the same prompt

Response with large model

Expected Result

user-anna → response served by @cf/meta/llama-3.2-1b-instruct (small model)
user-admin → response served by @cf/meta/llama-3.3-70b-instruct-fp8-fast (large model)

Step 5: Verify Model Selection in Logs

Navigate to AI > AI Gateway > bootcamp-lab > Logs
Find the two recent requests

Request	User ID	Model	Cost
First	`user-anna`	`@cf/meta/llama-3.2-1b-instruct`	Lower
Second	`user-admin`	`@cf/meta/llama-3.3-70b-instruct-fp8-fast`	Higher

Use the filters to display the requests based on the metadata value

Same gateway, same route, same prompt — different users get different models. The routing decision is visible in every log entry. The cost difference between the 3B and 70B model is also visible.

user-anna

Logs showing user-anna request

user-admin

Logs showing user-admin request

This is the proof point

Same gateway, same route, same prompt — different users get different models. The routing decision is visible in every log entry. The cost difference between the 1B and 70B model is also visible.

Step 6: Add Budget-Aware Fallback

Now add a cost cap to the admin's premium model. When the budget is exceeded, traffic falls back to the cheaper model automatically — no errors.

Go back to Dynamic Routes > smart-route > Edit
On the True branch (user-admin), insert a Budget Limit node before the 70B model node:

Field	Value
Budget	`$1`
Window	`1 hour`
Key	`metadata.user_id`

Configure the fallback: when budget is exceeded → route to a Model node with @cf/meta/llama-3.2-1b-instruct

Budget limit node configuration

Click Save & Deploy

The admin branch now looks like:

user-admin → Budget Limit ($1/hr)
                ├── within budget → 70B model
                └── exceeded ────→ 1B model (fallback)

Custom Cost Override

Before testing the budget fallback, you need to understand the Custom Cost feature and why we use it in this lab.

What is `cf-aig-custom-cost`?

AI Gateway allows you to override the default per-token pricing on any request using the cf-aig-custom-cost header. This sets custom per_token_in and per_token_out values that AI Gateway uses for cost tracking, spend limits, and budget calculations — instead of the model's actual pricing.

Real-world use cases

Use Case	How Custom Cost Helps
Negotiated enterprise pricing	Your organisation has a discounted rate with a provider. Override the default pricing so gateway analytics and spend limits reflect your actual costs, not list prices.
Internal chargeback	Different teams are charged different rates. Attach custom cost per team so each team's budget tracks their internal rate.
Cost modelling	Simulate what would happen to your spend limits if token prices changed. Override costs to model price increases without waiting for actual pricing updates.
Fine-tuned model pricing	You are running a fine-tuned model variant with different economics. Custom cost lets you track the true cost even though the gateway does not know your model's pricing.

Why we use it in this lab

Workers AI pricing is very low — Llama 3.2 3B costs $0.051 per million input tokens and $0.34 per million output tokens. A typical request costs less than $0.0001. To reach even a $1 budget limit, you would need to send thousands of requests.

To test budget enforcement quickly, we use the Custom Cost toggle in the Explorer app. At $0.20 per token, a single request "costs" approximately $50 in the gateway's accounting — instantly triggering any budget limit.

Custom costs appear underlined in the gateway logs to distinguish them from actual model pricing.

warning

Remember to disable the Custom Cost toggle after testing budget features. Leaving it enabled will cause all subsequent requests to report inflated costs in your gateway analytics.

Step 7: Test Budget Fallback

In the Explorer app, enable the Custom Cost toggle and set both Token In and Token Out to $0.20
Set User ID to user-admin and Model to Dynamic Route: smart-route

Custom cost toggle

Send a prompt — the inflated custom cost will burn through the budget in 1–2 requests
Send a second prompt and watch the Model field in the Explorer response panel

Expected Result

Early requests: served by @cf/meta/llama-3.3-70b-instruct-fp8-fast (70B)
After budget depleted: served by @cf/meta/llama-3.2-1b-instruct (1B)
No errors — the user still gets a response, just from a smaller model

Budget fallback in action

Step 8: Verify Budget Fallback in Logs

Go to Logs and filter by metadata.user_id = user-admin or scroll through recent requests

Logs showing model transition from 70B to 3B after budget fallback

Early requests	Later requests
Model: `llama-3.3-70b-instruct-fp8-fast`	Model: `llama-3.2-1b-instruct`
Higher cost per request	Lower cost per request
Status: Success	Status: Success

Spend limits vs. budget routing

This is the critical difference:

Spend limits (Page 4) → hard block → returns 429 Too Many Requests → user sees an error
Budget-aware dynamic routing → soft redirect → cheaper model → user still gets a response

Spend limits protect your wallet. Dynamic routing protects the user experience.

Step 9: Review the Complete Route

Go to Dynamic Routes > smart-route
Review the complete visual flow:

The route flow you built:

                    ┌──────────────┐
                    │  Conditional  │
                    │  user_id ==   │
                    │ "user-admin"  │
                    └──────┬───────┘
                     yes   │   no
              ┌────────────┤────────────┐
              ▼                         ▼
     ┌────────────────┐        ┌──────────────┐
     │  Budget Limit   │        │    Model      │
     │  $0.01 / 1hr    │        │  Llama 3.2 3B │
     └───────┬────────┘        └──────────────┘
       ok    │  exceeded
       ▼     ▼
  ┌────────┐ ┌────────┐
  │  70B   │ │   3B   │
  │ model  │ │fallback│
  └────────┘ └────────┘

Version history

Dynamic routes are versioned. Each time you save changes, a new version is created. You can instantly roll back to a previous version if something goes wrong — no code deploys needed.

A/B Testing with Percentage Routing

Not a hands-on exercise

This section is knowledge only. Percentage-based routing is currently available via the API (BETA) and is not yet configurable in the dashboard visual flow builder. The goal is to understand the capability so you can position it in customer conversations.

What is percentage routing?

A Percentage node in a dynamic route splits traffic probabilistically across multiple model branches. This enables:

Gradual model rollout — send 10% of traffic to a new model, monitor quality, then increase to 50%, then 100%
A/B testing — compare two models side-by-side on real production traffic
Canary deployments — test a model update on a small percentage before full rollout

How it works

A Percentage node defines two or more output branches, each with a traffic percentage that sums to 100%. For each request, the gateway randomly selects a branch based on the configured weights.

Example: 50/50 split between two models

Branch	Percentage	Model
Branch A	`50%`	`@cf/meta/llama-3.2-3b-instruct`
Branch B	`50%`	`@cf/deepseek-ai/deepseek-r1-distill-qwen-32b`

In the gateway logs, you would see roughly half the requests served by each model — useful for comparing response quality, latency, and cost between models on real traffic.

API configuration example

Percentage routing can be configured via the AI Gateway API by defining a route with a percentage node. Here is an illustrative example of how the default (non-admin) branch in your smart-route could be extended with a 50/50 split:

{
  "nodes": [
    {
      "type": "conditional",
      "condition": "metadata.user_id == 'user-admin'",
      "true_branch": "budget_node",
      "false_branch": "percentage_node"
    },
    {
      "id": "percentage_node",
      "type": "percentage",
      "branches": [
        { "weight": 50, "next": "model_3b" },
        { "weight": 50, "next": "model_deepseek" }
      ]
    },
    {
      "id": "model_3b",
      "type": "model",
      "model": "@cf/meta/llama-3.2-3b-instruct"
    },
    {
      "id": "model_deepseek",
      "type": "model",
      "model": "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b"
    }
  ]
}

info

The JSON structure above is illustrative. Refer to the Dynamic Routing documentation for the exact API schema. When percentage routing is available in the dashboard visual builder, you will be able to configure it the same way you configured the conditional and budget nodes in this lab.

Customer Talk Track

"What happens when your primary AI model is over budget, or you want to test a new model without risking all your traffic? AI Gateway dynamic routing lets you build conditional logic, budget-aware fallbacks, and A/B test splits via the API — all without code changes. Versioned rollback and every routing decision is logged. You can give premium users a premium model and automatically degrade gracefully when budgets are hit.

And for transient failures — timeouts, provider outages — AI Gateway automatically retries requests with configurable backoff, before the user ever sees an error. Retries plus dynamic routing plus caching gives you a full resilience stack for AI inference."

Beyond Dynamic Routing: Automatic Retries

Not a hands-on exercise

This section is knowledge only — there are no steps to configure. The goal is to understand the full resilience capabilities AI Gateway provides beyond dynamic routing, so you can position them in customer conversations and PoC scoping.

Dynamic routing handles persistent failures — model over budget, provider unavailable, cost-driven fallback. But what about transient failures — a brief timeout, a 503 from an overloaded provider, a momentary network glitch?

AI Gateway supports automatic retries at the gateway level. When an upstream provider returns an error, the gateway retries the request based on the retry policy you configure — without any client-side changes.

Retry Configuration

You can configure retries in the gateway Settings or override per-request using headers:

Setting	Dashboard Path	Per-Request Header	Range
Retry count	Settings > Retry Requests	`cf-aig-max-attempts`	1–5 attempts
Retry delay	Settings > Retry Requests	`cf-aig-retry-delay`	100–5000 ms
Backoff strategy	Settings > Retry Requests	`cf-aig-backoff`	`constant`, `linear`, `exponential`
Request timeout	Settings > Retry Requests	`cf-aig-request-timeout`	milliseconds

Backoff strategies explained:

Constant — same delay between each retry (e.g., 500ms, 500ms, 500ms)
Linear — delay increases by the base amount each attempt (e.g., 500ms, 1000ms, 1500ms)
Exponential — delay doubles each attempt (e.g., 500ms, 1000ms, 2000ms) — best for overloaded providers

On the final retry attempt, the gateway waits for the request to complete regardless of how long it takes.

How Retries and Dynamic Routing Work Together

Retries and dynamic routing solve different problems and complement each other:

Capability	Handles	Example
Automatic retries	Transient failures — timeouts, 503s, brief provider issues	Provider has a momentary spike; retry succeeds on second attempt
Dynamic routing	Persistent failures — model over budget, provider down, cost-driven fallback	Primary model budget exceeded; route to cheaper fallback model
Caching	Repeat queries — same prompt served from cache	Identical question asked again; served instantly at $0 cost

Full resilience stack

AI Gateway provides three layers of resilience for AI inference:

Automatic retries — retry transient failures (timeouts, 503s) with configurable backoff, no client changes needed
Dynamic routing — fail over to a different model or provider when the primary is unavailable or over budget
Caching — serve cached responses when the model is completely unavailable, or for repeat queries at $0 cost

Together, these ensure your AI application stays responsive even when upstream providers have issues. All three are configured at the gateway level — no application code changes required.

Validation

Dynamic route smart-route created
user-anna routed to the small model
user-admin routed to the large model
Model selection differences visible in gateway logs
Budget fallback configured ($0.01/hr for admin)
After budget depletion, user-admin automatically falls back to the small model with no errors
Budget fallback transition visible in logs (70B → 3B)
Complete route flow visible in the visual builder
Understand how A/B percentage routing works (knowledge section)

Troubleshooting

Dynamic route not routing to the correct model

Verify the Model dropdown in the Explorer app is set to Dynamic Route: smart-route
Check that the conditional node uses metadata.user_id (not user_id)
Ensure the User ID in the Explorer app matches exactly (e.g., user-admin not admin)
Review the route in the visual builder to confirm the flow logic

Budget fallback not triggering

The $0.01 budget may not be exhausted yet — send more requests
Budget tracking is eventually consistent — a burst of concurrent requests may briefly exceed the limit
Check that the Budget Limit node's Key field is set to metadata.user_id
Verify the fallback branch connects to the correct 3B model node

"Model not found" error with dynamic route

The model name in the Explorer app must be dynamic/smart-route (not just smart-route)
Verify the route name in the dashboard matches exactly
Check that the route has been saved (not still in draft)

Task​

Why​

What You Are Configuring​

Step 1: Prepare Your Gateway​

Step 2: Create a Dynamic Route​

Step 3: Configure Conditional Routing by User ID​

Step 4: Test Conditional Routing​

Expected Result​

Step 5: Verify Model Selection in Logs​

Step 6: Add Budget-Aware Fallback​

Custom Cost Override​

What is cf-aig-custom-cost?​

Real-world use cases​

Why we use it in this lab​

Step 7: Test Budget Fallback​

Expected Result​

Step 8: Verify Budget Fallback in Logs​

Step 9: Review the Complete Route​

A/B Testing with Percentage Routing​

What is percentage routing?​

How it works​

API configuration example​

Customer Talk Track​

Beyond Dynamic Routing: Automatic Retries​

Retry Configuration​

How Retries and Dynamic Routing Work Together​

Validation​

Troubleshooting​

Task

Why

What You Are Configuring

Step 1: Prepare Your Gateway

Step 2: Create a Dynamic Route

Step 3: Configure Conditional Routing by User ID

Step 4: Test Conditional Routing

Expected Result

Step 5: Verify Model Selection in Logs

Step 6: Add Budget-Aware Fallback

Custom Cost Override

What is `cf-aig-custom-cost`?

Real-world use cases

Why we use it in this lab

Step 7: Test Budget Fallback

Expected Result

Step 8: Verify Budget Fallback in Logs

Step 9: Review the Complete Route

A/B Testing with Percentage Routing

What is percentage routing?

How it works

API configuration example

Customer Talk Track

Beyond Dynamic Routing: Automatic Retries

Retry Configuration

How Retries and Dynamic Routing Work Together

Validation

Troubleshooting