Fallback Policies automatically retry your requests with different models if one fails, ensuring your application stays reliable even when individual providers have issues.

How It Works

  1. Your request goes to the primary model first
  2. If it fails (timeout, rate limit, error, etc.), the router immediately tries the next model
  3. This continues down the chain until a model successfully responds
  4. Your application receives the successful response without knowing about the failures
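The chain-walking behavior above can be sketched as a simple loop. This is an illustrative sketch only, not Requesty's actual implementation; `call_model` is a hypothetical stand-in for a single provider call:

```python
# Illustrative sketch of the fallback loop (not Requesty's actual code).
def route_with_fallback(chain, request, call_model):
    """Try each model in order; return the first successful response."""
    errors = []
    for model in chain:
        try:
            return call_model(model, request)  # success ends the chain
        except Exception as exc:  # timeout, rate limit, provider error, ...
            errors.append((model, exc))
    # Every model failed: surface the collected failures to the caller.
    raise RuntimeError(f"All models failed: {errors}")
```

The caller only ever sees the first successful response or, if the whole chain is exhausted, a single error carrying the accumulated failures.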

Benefits

  • Higher success rates - No more failed requests due to provider issues
  • Zero downtime - Automatic failover without code changes
  • Cost optimization - Start with cheaper models, fall back to premium ones only when needed
  • No stalled workflows - Your users never see “model unavailable” errors

Creating a Fallback Policy

Step 1: Create the Policy

  1. Go to Routing Policies
  2. Click “Create Policy”
  3. Select “Fallback Chain” as the policy type

Step 2: Configure Your Fallback Chain

Example Setup:
  • Policy Name: sonnet
  • Fallback Chain:
    • anthropic/claude-sonnet-4-5 (1 retry)
    • bedrock/claude-sonnet-4-5@eu-central-1 (1 retry)
Each model can have multiple retries. The router will:
  1. Try anthropic/claude-sonnet-4-5 once
  2. If it fails, retry anthropic/claude-sonnet-4-5 one more time
  3. If still failing, move to bedrock/claude-sonnet-4-5@eu-central-1 and try twice
  4. Continue down the chain until success
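The per-model retry counts expand the chain into one ordered list of attempts. A minimal sketch of that expansion (the `(model, retries)` tuple representation is an assumption for illustration, not Requesty's internal format):

```python
def expand_attempts(chain):
    """Expand (model, retries) pairs into the ordered list of attempts.

    Each model gets its initial try plus its configured retries before
    the router moves on to the next entry in the chain.
    """
    attempts = []
    for model, retries in chain:
        attempts.extend([model] * (1 + retries))  # initial try + retries
    return attempts

# The example policy above: two models, one retry each.
chain = [
    ("anthropic/claude-sonnet-4-5", 1),
    ("bedrock/claude-sonnet-4-5@eu-central-1", 1),
]
```

With one retry each, `expand_attempts(chain)` lists each model twice, in order, matching steps 1-4 above.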

Step 3: Use the Policy in Your Code

This is the critical step: You need to change your model parameter to reference your policy. After creating a policy named sonnet, you’ll see it in your models list as:
policy/sonnet
Update your code to use this model identifier:
from openai import OpenAI

client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your-requesty-api-key"
)

response = client.chat.completions.create(
    model="policy/sonnet",  # ← Use your policy name here
    messages=[{"role": "user", "content": "Hello!"}]
)
How to find your policy reference:
  1. Go to your Routing Policies
  2. Click the copy button next to your policy name
  3. Paste it directly into your model parameter

Use Cases

Cost-Effective GPT Chain

Start with cheaper models, only use expensive ones if needed:
Policy: cost-effective-gpt
├─ openai/gpt-4o-mini (2 retries)
├─ openai/gpt-4o (1 retry)
└─ openai/gpt-5.2 (1 retry)

Multi-Provider Reliability

Distribute across providers for maximum uptime:
Policy: multi-provider-safe
├─ openai/gpt-5.2 (1 retry)
├─ anthropic/claude-sonnet-4-5 (1 retry)
└─ google/gemini-2.5-pro (1 retry)

Regional Failover

Try regional endpoints before falling back to global:
Policy: regional-claude
├─ bedrock/claude-sonnet-4-5@eu-central-1 (2 retries)
└─ anthropic/claude-sonnet-4-5 (2 retries)

How Retries Work

Each model in the chain can have 0-10 retries. The router uses:
  • Exponential backoff - Wait time increases between retries (500ms → 1s → 2s → 4s)
  • Jitter - Random variation (±10%) to prevent thundering herd
  • Immediate failover - On non-retryable errors (invalid request, auth failure)
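The documented schedule can be reproduced with a small helper. This is a sketch based on the numbers above; the 500ms base and ±10% jitter come from the documentation, while everything else is an assumption:

```python
import random

def backoff_delay(attempt, base=0.5, jitter=0.10):
    """Delay in seconds before retry number `attempt` (0-indexed).

    Doubles each time (0.5s, 1s, 2s, 4s, ...) with +/-10% random
    jitter to avoid synchronized retry storms ("thundering herd").
    """
    delay = base * (2 ** attempt)
    return delay * random.uniform(1 - jitter, 1 + jitter)
```

For example, the fourth attempt waits roughly 4 seconds, landing anywhere between 3.6s and 4.4s once jitter is applied.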
Model Compatibility: Make sure all models in your fallback chain support your request parameters (context length, features like streaming, tool calling, etc.). If a model can’t handle the request, the policy will skip to the next model without warning.

Key Selection (BYOK)

For each model, you can choose which API key to use:
  • Requesty provided key - Use Requesty’s managed keys (default)
  • My own key - Use your Bring-Your-Own-Key (BYOK) credentials
  • Try Requesty provided key first, then use my own - Fallback to BYOK if Requesty key fails
  • Try my own key first, then Requesty’s - Prefer BYOK, fallback to Requesty

Monitoring & Debugging

Track your fallback policy performance:
  1. Go to Analytics
  2. Filter by your policy name
  3. See which models succeeded, failed, and how often fallback occurred

FAQ

What happens if every model in the chain fails?
The request returns an error with details about the last model attempted. You’ll see all the failures in your request logs.

Can a policy reference another policy?
Yes! A fallback policy can reference another policy as one of its fallback options. For example:
Policy A (fallback):
├─ openai/gpt-4
└─ policy/multi-provider-backup  ← Another policy

Am I charged for failed attempts?
No. You only pay for successful requests that return tokens. Failed attempts don’t incur costs.

How do I edit an existing policy?
Click the edit icon next to your policy in the Routing Policies page. Changes take effect immediately - no code deployment needed.
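A policy that references another policy effectively flattens into one longer chain. A hedged sketch of that resolution (the dict representation and the "policy/" prefix convention here are illustrative assumptions, not Requesty's internal format):

```python
def flatten_policy(name, policies):
    """Recursively expand nested policy references into one flat chain.

    `policies` maps a policy name to its ordered fallback entries; an
    entry beginning with "policy/" is treated as a reference to another
    policy and expanded in place.
    """
    flat = []
    for entry in policies[name]:
        if entry.startswith("policy/"):
            flat.extend(flatten_policy(entry[len("policy/"):], policies))
        else:
            flat.append(entry)
    return flat

# The "Policy A" example above, with a hypothetical backup chain:
policies = {
    "policy-a": ["openai/gpt-4", "policy/multi-provider-backup"],
    "multi-provider-backup": ["anthropic/claude-sonnet-4-5",
                              "google/gemini-2.5-pro"],
}
```

Here `flatten_policy("policy-a", policies)` tries openai/gpt-4 first, then works through the backup policy's models in their configured order.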