Skip to main content

Dynamic Routing — Conditional Models, Budget Fallback & A/B Testing

Task

Build a dynamic route that routes different users to different AI models, falls back to a cheaper model when a budget is exceeded, and A/B tests new models on a percentage of traffic.

Why

Not every user needs the same model. Premium users may get a larger, more capable model, while free-tier users get a smaller, cost-effective one. When the premium model budget runs out, you want automatic fallback — not errors. And when evaluating a new model, you want to test it on a percentage of traffic before full rollout.

Dynamic routing lets you build all of this in the dashboard — no code changes, instant rollback, versioned history.

What You Are Configuring

  • A dynamic route named smart-route with three routing patterns:
    1. Conditional routinguser-admin gets the large 70B model, everyone else gets the small 1B model
    2. Budget fallback — the admin's 70B model has a cost cap; when exceeded, traffic falls back to the 1B model silently
    3. A/B testing — non-admin traffic is split 50/50 between two models
Production pattern

The conditional user_id == "user-admin" is a simplified lab version of what you would build with Cloudflare Access groups. Premium users (identified by JWT claims or Access groups) get routed to the premium model. Budget fallback ensures cost stays controlled even for premium tiers. A/B splits let you test new models on a percentage of traffic before full rollout.


Step 1: Prepare Your Gateway

Before creating the dynamic route, ensure previous controls do not interfere.

  1. Navigate to AI > AI Gateway > bootcamp-lab > Settings
  2. If you have any rate limiting or spend limits configured from previous exercises, temporarily disable them or increase the values
  3. Leave caching enabled or disabled — it will not interfere with routing
tip

You can also leave guardrails and DLP enabled. They apply regardless of which model serves the request — that is the point of gateway-level controls.


Step 2: Create a Dynamic Route

  1. Navigate to AI > AI Gateway > bootcamp-lab
  2. Click Dynamic Routes in the left menu
  3. Sekect Start from scratch
FieldValue
Route namesmart-route

Create dynamic route dialog

  1. Click Create
  2. You are now in the visual flow builder

Visual flow builder


Step 3: Configure Conditional Routing by User ID

Build the first routing pattern: route user-admin to the large model, everyone else to the small model.

  1. In the flow builder, add a Conditional node

Add conditional node

  1. Configure the condition:
FieldValue
Condition fieldmetadata.user_id
Operatorequals
Valueuser-admin

Configure conditional node

  1. On the True branch (user-admin), add a Model node:
FieldValue
ProviderWorkers AI
Model@cf/meta/llama-3.3-70b-instruct-fp8-fast

Configure model node for large model

  1. On the False branch (everyone else), add a Model node:
FieldValue
ProviderWorkers AI
Model@cf/meta/llama-3.2-1b-instruct

Configure model node for small model

  1. Click Save to deploy the route

Step 4: Test Conditional Routing

  1. In the AI Gateway Explorer app, change the Model dropdown to Dynamic Route: smart-route
  2. Set User ID to user-anna
  3. Click the NZ Travel quick prompt and click Send
  4. Check the response — note the Model shown in the response panel

Response with small model

  1. Now change User ID to user-admin
  2. Click Send with the same prompt

Response with large model

Expected Result

  • user-anna → response served by @cf/meta/llama-3.2-1b-instruct (small model)
  • user-admin → response served by @cf/meta/llama-3.3-70b-instruct-fp8-fast (large model)

Step 5: Verify Model Selection in Logs

  1. Navigate to AI > AI Gateway > bootcamp-lab > Logs
  2. Find the two recent requests
RequestUser IDModelCost
Firstuser-anna@cf/meta/llama-3.2-1b-instructLower
Seconduser-admin@cf/meta/llama-3.3-70b-instruct-fp8-fastHigher
Use the filters to display the requests based on the metadata value

Same gateway, same route, same prompt — different users get different models. The routing decision is visible in every log entry. The cost difference between the 3B and 70B model is also visible.

user-anna

Logs showing user-anna request

user-admin

Logs showing user-admin request

This is the proof point

Same gateway, same route, same prompt — different users get different models. The routing decision is visible in every log entry. The cost difference between the 1B and 70B model is also visible.


Step 6: Add Budget-Aware Fallback

Now add a cost cap to the admin's premium model. When the budget is exceeded, traffic falls back to the cheaper model automatically — no errors.

  1. Go back to Dynamic Routes > smart-route > Edit
  2. On the True branch (user-admin), insert a Budget Limit node before the 70B model node:
FieldValue
Budget$1
Window1 hour
Keymetadata.user_id
  1. Configure the fallback: when budget is exceeded → route to a Model node with @cf/meta/llama-3.2-1b-instruct

Budget limit node configuration

  1. Click Save & Deploy

The admin branch now looks like:

user-admin → Budget Limit ($1/hr)
├── within budget → 70B model
└── exceeded ────→ 1B model (fallback)

Custom Cost Override

Before testing the budget fallback, you need to understand the Custom Cost feature and why we use it in this lab.

What is cf-aig-custom-cost?

AI Gateway allows you to override the default per-token pricing on any request using the cf-aig-custom-cost header. This sets custom per_token_in and per_token_out values that AI Gateway uses for cost tracking, spend limits, and budget calculations — instead of the model's actual pricing.

Real-world use cases

Use CaseHow Custom Cost Helps
Negotiated enterprise pricingYour organisation has a discounted rate with a provider. Override the default pricing so gateway analytics and spend limits reflect your actual costs, not list prices.
Internal chargebackDifferent teams are charged different rates. Attach custom cost per team so each team's budget tracks their internal rate.
Cost modellingSimulate what would happen to your spend limits if token prices changed. Override costs to model price increases without waiting for actual pricing updates.
Fine-tuned model pricingYou are running a fine-tuned model variant with different economics. Custom cost lets you track the true cost even though the gateway does not know your model's pricing.

Why we use it in this lab

Workers AI pricing is very low — Llama 3.2 3B costs $0.051 per million input tokens and $0.34 per million output tokens. A typical request costs less than $0.0001. To reach even a $1 budget limit, you would need to send thousands of requests.

To test budget enforcement quickly, we use the Custom Cost toggle in the Explorer app. At $0.20 per token, a single request "costs" approximately $50 in the gateway's accounting — instantly triggering any budget limit.

Custom costs appear underlined in the gateway logs to distinguish them from actual model pricing.

warning

Remember to disable the Custom Cost toggle after testing budget features. Leaving it enabled will cause all subsequent requests to report inflated costs in your gateway analytics.


Step 7: Test Budget Fallback

  1. In the Explorer app, enable the Custom Cost toggle and set both Token In and Token Out to $0.20
  2. Set User ID to user-admin and Model to Dynamic Route: smart-route

Custom cost toggle

  1. Send a prompt — the inflated custom cost will burn through the budget in 1–2 requests
  2. Send a second prompt and watch the Model field in the Explorer response panel

Expected Result

  • Early requests: served by @cf/meta/llama-3.3-70b-instruct-fp8-fast (70B)
  • After budget depleted: served by @cf/meta/llama-3.2-1b-instruct (1B)
  • No errors — the user still gets a response, just from a smaller model

Budget fallback in action


Step 8: Verify Budget Fallback in Logs

  1. Go to Logs and filter by metadata.user_id = user-admin or scroll through recent requests

Logs showing model transition from 70B to 3B after budget fallback

Early requestsLater requests
Model: llama-3.3-70b-instruct-fp8-fastModel: llama-3.2-1b-instruct
Higher cost per requestLower cost per request
Status: SuccessStatus: Success
Spend limits vs. budget routing

This is the critical difference:

  • Spend limits (Page 4) → hard block → returns 429 Too Many Requests → user sees an error
  • Budget-aware dynamic routing → soft redirect → cheaper model → user still gets a response

Spend limits protect your wallet. Dynamic routing protects the user experience.


Step 9: Review the Complete Route

  1. Go to Dynamic Routes > smart-route
  2. Review the complete visual flow:

The route flow you built:

                    ┌──────────────┐
│ Conditional │
│ user_id == │
│ "user-admin" │
└──────┬───────┘
yes │ no
┌────────────┤────────────┐
▼ ▼
┌────────────────┐ ┌──────────────┐
│ Budget Limit │ │ Model │
│ $0.01 / 1hr │ │ Llama 3.2 3B │
└───────┬────────┘ └──────────────┘
ok │ exceeded
▼ ▼
┌────────┐ ┌────────┐
│ 70B │ │ 3B │
│ model │ │fallback│
└────────┘ └────────┘
Version history

Dynamic routes are versioned. Each time you save changes, a new version is created. You can instantly roll back to a previous version if something goes wrong — no code deploys needed.


A/B Testing with Percentage Routing

Not a hands-on exercise

This section is knowledge only. Percentage-based routing is currently available via the API (BETA) and is not yet configurable in the dashboard visual flow builder. The goal is to understand the capability so you can position it in customer conversations.

What is percentage routing?

A Percentage node in a dynamic route splits traffic probabilistically across multiple model branches. This enables:

  • Gradual model rollout — send 10% of traffic to a new model, monitor quality, then increase to 50%, then 100%
  • A/B testing — compare two models side-by-side on real production traffic
  • Canary deployments — test a model update on a small percentage before full rollout

How it works

A Percentage node defines two or more output branches, each with a traffic percentage that sums to 100%. For each request, the gateway randomly selects a branch based on the configured weights.

Example: 50/50 split between two models

BranchPercentageModel
Branch A50%@cf/meta/llama-3.2-3b-instruct
Branch B50%@cf/deepseek-ai/deepseek-r1-distill-qwen-32b

In the gateway logs, you would see roughly half the requests served by each model — useful for comparing response quality, latency, and cost between models on real traffic.

API configuration example

Percentage routing can be configured via the AI Gateway API by defining a route with a percentage node. Here is an illustrative example of how the default (non-admin) branch in your smart-route could be extended with a 50/50 split:

{
"nodes": [
{
"type": "conditional",
"condition": "metadata.user_id == 'user-admin'",
"true_branch": "budget_node",
"false_branch": "percentage_node"
},
{
"id": "percentage_node",
"type": "percentage",
"branches": [
{ "weight": 50, "next": "model_3b" },
{ "weight": 50, "next": "model_deepseek" }
]
},
{
"id": "model_3b",
"type": "model",
"model": "@cf/meta/llama-3.2-3b-instruct"
},
{
"id": "model_deepseek",
"type": "model",
"model": "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b"
}
]
}
info

The JSON structure above is illustrative. Refer to the Dynamic Routing documentation for the exact API schema. When percentage routing is available in the dashboard visual builder, you will be able to configure it the same way you configured the conditional and budget nodes in this lab.


Customer Talk Track

"What happens when your primary AI model is over budget, or you want to test a new model without risking all your traffic? AI Gateway dynamic routing lets you build conditional logic, budget-aware fallbacks, and A/B test splits via the API — all without code changes. Versioned rollback and every routing decision is logged. You can give premium users a premium model and automatically degrade gracefully when budgets are hit.

And for transient failures — timeouts, provider outages — AI Gateway automatically retries requests with configurable backoff, before the user ever sees an error. Retries plus dynamic routing plus caching gives you a full resilience stack for AI inference."


Beyond Dynamic Routing: Automatic Retries

Not a hands-on exercise

This section is knowledge only — there are no steps to configure. The goal is to understand the full resilience capabilities AI Gateway provides beyond dynamic routing, so you can position them in customer conversations and PoC scoping.

Dynamic routing handles persistent failures — model over budget, provider unavailable, cost-driven fallback. But what about transient failures — a brief timeout, a 503 from an overloaded provider, a momentary network glitch?

AI Gateway supports automatic retries at the gateway level. When an upstream provider returns an error, the gateway retries the request based on the retry policy you configure — without any client-side changes.

Retry Configuration

You can configure retries in the gateway Settings or override per-request using headers:

SettingDashboard PathPer-Request HeaderRange
Retry countSettings > Retry Requestscf-aig-max-attempts1–5 attempts
Retry delaySettings > Retry Requestscf-aig-retry-delay100–5000 ms
Backoff strategySettings > Retry Requestscf-aig-backoffconstant, linear, exponential
Request timeoutSettings > Retry Requestscf-aig-request-timeoutmilliseconds

Backoff strategies explained:

  • Constant — same delay between each retry (e.g., 500ms, 500ms, 500ms)
  • Linear — delay increases by the base amount each attempt (e.g., 500ms, 1000ms, 1500ms)
  • Exponential — delay doubles each attempt (e.g., 500ms, 1000ms, 2000ms) — best for overloaded providers

On the final retry attempt, the gateway waits for the request to complete regardless of how long it takes.

How Retries and Dynamic Routing Work Together

Retries and dynamic routing solve different problems and complement each other:

CapabilityHandlesExample
Automatic retriesTransient failures — timeouts, 503s, brief provider issuesProvider has a momentary spike; retry succeeds on second attempt
Dynamic routingPersistent failures — model over budget, provider down, cost-driven fallbackPrimary model budget exceeded; route to cheaper fallback model
CachingRepeat queries — same prompt served from cacheIdentical question asked again; served instantly at $0 cost
Full resilience stack

AI Gateway provides three layers of resilience for AI inference:

  1. Automatic retries — retry transient failures (timeouts, 503s) with configurable backoff, no client changes needed
  2. Dynamic routing — fail over to a different model or provider when the primary is unavailable or over budget
  3. Caching — serve cached responses when the model is completely unavailable, or for repeat queries at $0 cost

Together, these ensure your AI application stays responsive even when upstream providers have issues. All three are configured at the gateway level — no application code changes required.


Validation

  • Dynamic route smart-route created
  • user-anna routed to the small model
  • user-admin routed to the large model
  • Model selection differences visible in gateway logs
  • Budget fallback configured ($0.01/hr for admin)
  • After budget depletion, user-admin automatically falls back to the small model with no errors
  • Budget fallback transition visible in logs (70B → 3B)
  • Complete route flow visible in the visual builder
  • Understand how A/B percentage routing works (knowledge section)

Troubleshooting

Dynamic route not routing to the correct model
  • Verify the Model dropdown in the Explorer app is set to Dynamic Route: smart-route
  • Check that the conditional node uses metadata.user_id (not user_id)
  • Ensure the User ID in the Explorer app matches exactly (e.g., user-admin not admin)
  • Review the route in the visual builder to confirm the flow logic
Budget fallback not triggering
  • The $0.01 budget may not be exhausted yet — send more requests
  • Budget tracking is eventually consistent — a burst of concurrent requests may briefly exceed the limit
  • Check that the Budget Limit node's Key field is set to metadata.user_id
  • Verify the fallback branch connects to the correct 3B model node
"Model not found" error with dynamic route
  • The model name in the Explorer app must be dynamic/smart-route (not just smart-route)
  • Verify the route name in the dashboard matches exactly
  • Check that the route has been saved (not still in draft)