How It Works
- You assign weights to each model (e.g., 70%, 20%, 10%)
- Each incoming request is consistently routed to one model based on the distribution
- Requests with the same trace_id or user_id always go to the same model (consistency guaranteed; see the sketch below)
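The FAQ below notes that Requesty uses xxhash for this. As a rough illustration of the technique (not Requesty's actual implementation), deterministic weighted selection can be sketched like this:

```python
# Illustration of deterministic, weight-based model selection.
# Requires the third-party xxhash package (pip install xxhash).
import xxhash

WEIGHTED_MODELS = [  # (model, weight) pairs summing to 100
    ("model-a", 70),
    ("model-b", 20),
    ("model-c", 10),
]

def route(trace_id: str) -> str:
    """Map an ID onto the 0-99 range and walk the cumulative weights."""
    bucket = xxhash.xxh64(trace_id.encode()).intdigest() % 100
    cumulative = 0
    for model, weight in WEIGHTED_MODELS:
        cumulative += weight
        if bucket < cumulative:
            return model
    return WEIGHTED_MODELS[-1][0]  # unreachable when weights sum to 100

# The same trace_id always yields the same model:
assert route("user-42") == route("user-42")
```

Because the hash is a pure function of the ID, roughly 70% of distinct IDs land on model-a, 20% on model-b, and 10% on model-c, while any single ID is pinned to exactly one of them.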
Benefits
- A/B Testing - Compare model performance with real traffic
- Gradual Rollouts - Send 10% to new model, 90% to stable model
- Cost Optimization - Route most traffic to cheaper models
- Consistent Experiences - Same user always gets same model (maintains conversation context)
- Policy Rollouts - Load balance between entire routing policies, not just models
Creating a Load Balancing Policy
Step 1: Create the Policy
- Go to Routing Policies
- Click "Create Policy"
- Select "Load Balancing" as the policy type

Step 2: Configure Weights
Example Setup:
- Policy Name: sonnet-distribution
- Load Balancing:
  - anthropic/claude-sonnet-4-5: 50% (weight: 50)
  - bedrock/claude-sonnet-4-5@eu-central-1: 50% (weight: 50)
Step 3: Use the Policy in Your Code
After creating the policy, reference it with policy/your-policy-name:
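A minimal sketch using the OpenAI Python SDK; the base URL and environment variable name are assumptions here, so use the values from your Requesty dashboard:

```python
import os
from openai import OpenAI

# Point the OpenAI SDK at Requesty's OpenAI-compatible router.
# Base URL and env var name are assumptions; take them from your dashboard.
client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],
    base_url="https://router.requesty.ai/v1",
)

# Reference the load balancing policy instead of a concrete model ID.
response = client.chat.completions.create(
    model="policy/sonnet-distribution",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

The request is otherwise a normal chat completion; only the model field changes.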
Consistency Guarantee
Load balancing uses deterministic hashing to ensure the same user always gets the same model:
- With trace_id: All requests with the same trace_id route to the same model
- Without trace_id: Requesty generates a unique request_id for each request
- ✅ Multi-turn conversations stay on the same model (preserves context)
- ✅ User sessions get consistent behavior
- ✅ A/B test groups are stable
Maintaining Consistency Across Requests
To keep a user on the same model across multiple requests, pass a trace_id:
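For illustration, a sketch with the OpenAI Python SDK; passing trace_id via the request body is an assumption, so check the Requesty docs for the exact field name and transport:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],    # assumed env var
    base_url="https://router.requesty.ai/v1",  # assumed endpoint
)

# Reuse one stable ID per user so the hash, and therefore the model,
# never changes between that user's requests.
user_id = "user-1234"  # e.g. your internal user ID

response = client.chat.completions.create(
    model="policy/sonnet-distribution",
    messages=[{"role": "user", "content": "Continue our chat."}],
    # Sending trace_id in the body via extra_body is an assumption;
    # confirm the exact field in the Requesty docs.
    extra_body={"trace_id": user_id},
)
```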
Pro Tip: Use your internal user ID as the trace_id to ensure each user gets a consistent model experience while still benefiting from A/B testing.
Load Balancing Between Policies
You can load balance between entire routing policies, not just individual models. This is powerful for:
- Canary deployments of policy changes
- A/B testing different routing strategies
- Gradual migration from one policy to another
Example: Policy Rollout
Let's say you have two fallback policies: Policy A (stable) and Policy B (experimental). Create a load balancing policy named gradual-rollout that sends most traffic to Policy A and a small share to Policy B (for example 90/10; sketched below), then reference policy/gradual-rollout in your code. As you gain confidence, adjust the weights to 50/50, then 0/100.
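No code-level policy format is documented on this page, so the following is a purely hypothetical data sketch, only to make the rollout's weight structure concrete (the real policy is created in the dashboard):

```python
# Hypothetical illustration only. Policies are configured in the Requesty
# dashboard; this dict just visualizes the 90/10 rollout described above.
gradual_rollout = {
    "name": "gradual-rollout",
    "type": "load_balancing",
    "targets": [
        {"policy": "policy-a-stable", "weight": 90},        # current, trusted
        {"policy": "policy-b-experimental", "weight": 10},  # under evaluation
    ],
}
```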
Use Cases
A/B Testing New Models
Compare GPT-5.2 vs Gemini 2.5 Pro on real traffic, for example with a 50/50 split.
Gradual Model Rollout
Carefully introduce a new model, gradually increasing traffic to gpt-5.2 as you validate quality.
Cost-Optimized Distribution
Route most traffic to cheaper models, some to premium (for example 80/20).
Multi-Provider Redundancy
Distribute the same model across multiple providers for resilience, e.g. anthropic/claude-sonnet-4-5 and bedrock/claude-sonnet-4-5@eu-central-1 at 50/50. The sketch below gathers example weight tables for all four use cases.
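As a purely illustrative sketch: only the model IDs also named on this page are real, the rest are placeholders, and actual weights are set in the dashboard:

```python
# Hypothetical weight tables for the four use cases above.
# "openai/" and "google/" prefixes are assumed to follow the same
# provider/model convention as the IDs shown elsewhere on this page.
USE_CASE_WEIGHTS = {
    "ab_test": {
        "openai/gpt-5.2": 50,
        "google/gemini-2.5-pro": 50,
    },
    "gradual_rollout": {
        "openai/gpt-5.2": 10,        # new model being validated
        "your-stable-model": 90,     # placeholder for the current model
    },
    "cost_optimized": {
        "your-cheap-model": 80,      # placeholder
        "your-premium-model": 20,    # placeholder
    },
    "multi_provider": {
        "anthropic/claude-sonnet-4-5": 50,
        "bedrock/claude-sonnet-4-5@eu-central-1": 50,
    },
}
```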
Key Selection (BYOK)
For each model in your load balancing policy, you can choose:
- Requesty provided key - Use Requesty's managed keys (default)
- My own key - Use your BYOK credentials
Monitoring & Analytics
Track your load balancing performance:
- Go to Analytics
- Filter by your policy name
- See the actual distribution of requests across models
- Compare latency, cost, and success rates between models
FAQ
How does consistent hashing work?
Requesty uses the xxhash algorithm on your trace_id (or request_id if no trace_id) to deterministically select a model. The same ID always produces the same hash, which maps to the same model.
What happens if I change the weights?
Changing weights will re-distribute traffic. Some users may switch to different models. If you need stability, avoid changing weights frequently, or use separate policies for stable vs experimental traffic.
Can I load balance and have fallback?
Yes! Create a load balancing policy whose targets are themselves fallback policies (for example, 50% to one fallback chain and 50% to another). This gives you both load balancing AND automatic failover.
Do all models need to be compatible?
Yes. All models in a load balancing policy should support the same request format and features. Don't mix chat models with embedding models, or models with different context lengths.
How do I ensure exactly 20% of users see the new model?
Use a stable trace_id (like a user ID). With 100+ unique users, the distribution will converge to your configured weights (e.g., 20%). With small sample sizes, expect ±5% variance. The short simulation below illustrates this convergence.
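For intuition, a small self-contained simulation (Python's built-in sha256 standing in for xxhash, purely for illustration) that assigns synthetic user IDs to a 20/80 split and prints the observed share:

```python
import hashlib

# Simulate hash-based assignment of users to a 20/80 split.
# hashlib.sha256 stands in for xxhash; any stable hash behaves the same.
WEIGHTS = {"new-model": 20, "stable-model": 80}

def pick_model(trace_id: str) -> str:
    digest = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    bucket = digest % 100  # map the hash onto the 0-99 weight range
    cumulative = 0
    for model, weight in WEIGHTS.items():
        cumulative += weight
        if bucket < cumulative:
            return model
    return model  # unreachable when weights sum to 100

for n in (10, 100, 10_000):
    hits = sum(pick_model(f"user-{i}") == "new-model" for i in range(n))
    print(f"{n} users: {hits / n:.1%} on new-model")
```

With 10 users the share can be far from 20%; by 10,000 users it sits very close to the configured weight.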