Requesty's auto caching automatically caches long system prompts and repeated content to reduce costs on providers that support prompt caching (Anthropic, Gemini). This is especially effective for applications with large knowledge bases or system prompts: cache hits are billed at a fraction of the normal input token cost.
The router provides an auto_cache flag that allows you to explicitly control the caching behavior for your requests on supported providers.
How Auto Cache Works
The auto_cache flag is a boolean parameter that can be sent within a custom requesty field in your request payload.
- "auto_cache": true: Instructs the router to attempt to cache the request content with the provider. If similar content has been cached previously, it may be served from the cache (depending on the provider's caching strategy and TTL).
- "auto_cache": false: Instructs the router to bypass any automatic caching logic for this specific request and always fetch a fresh response from the provider.
- auto_cache not provided: The router falls back to a default caching behavior, which can depend on the origin of the request (e.g., calls from Cline or Roo Code default to caching).
This flag provides an explicit override to the default caching logic determined by the request origin or other implicit factors.
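The three cases above can be sketched as request payloads. This is a minimal illustration using a hypothetical `build_payload` helper; the `requesty` and `auto_cache` field names match the docs, everything else is scaffolding:

```python
def build_payload(messages, auto_cache=None):
    """Build a chat completion payload, optionally pinning auto_cache."""
    payload = {
        "model": "anthropic/claude-3-7-sonnet-latest",
        "messages": messages,
    }
    if auto_cache is not None:
        # Explicit override: True attempts caching, False bypasses it.
        payload["requesty"] = {"auto_cache": auto_cache}
    # If auto_cache is omitted entirely, the router applies its
    # origin-based default behavior.
    return payload

msgs = [{"role": "user", "content": "Tell me a joke."}]
print(build_payload(msgs, auto_cache=True)["requesty"])  # {'auto_cache': True}
print("requesty" in build_payload(msgs))                 # False
```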
How to Use Auto Cache
To use the auto_cache flag, include it within the requesty object in your request.
```json
{
  "model": "openai/gpt-4",
  "messages": [{"role": "user", "content": "Tell me a joke."}],
  "requesty": {
    "auto_cache": true
  }
}
```
Example with Auto Cache
This example demonstrates how to set the auto_cache flag using the OpenAI Python client. The requesty field is passed via the client's extra_body parameter.
```python
import openai

requesty_api_key = "YOUR_REQUESTY_API_KEY"  # Safely load your API key

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url="https://router.requesty.ai/v1",
)

system_prompt = "YOUR ENTIRE KNOWLEDGEBASE"  # Replace this with your actual long prompt

response = client.chat.completions.create(
    model="vertex/anthropic/claude-3-7-sonnet",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "requesty": {
            "auto_cache": True
        }
    }
)

print("Response:", response.choices[0].message.content)
```
JavaScript
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_REQUESTY_API_KEY",
  baseURL: "https://router.requesty.ai/v1",
});

// Make request with auto_cache enabled
const response = await client.chat.completions.create({
  model: "anthropic/claude-3-7-sonnet-latest",
  messages: [
    { role: "system", content: "YOUR ENTIRE KNOWLEDGEBASE" },
    { role: "user", content: "What is the capital of France?" }
  ],
  requesty: {
    auto_cache: true
  }
});

console.log("Response:", response.choices[0].message.content);
```
Important Notes
- Explicit Control: auto_cache provides explicit control. true attempts to cache; false prevents caching on providers where cache writes incur extra costs.
- Default Behavior: If auto_cache is not specified in the requesty field, the caching behavior reverts to the router's defaults, which may depend on the request origin.
- Provider Support: This flag is respected by providers/models where cache writes incur extra costs, e.g., Anthropic and Gemini.
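Since the flag only matters on providers with billable cache writes, one practical pattern is to opt in per model. The sketch below is an assumption-laden illustration: the provider prefixes in `CACHING_PROVIDERS` are examples, not an authoritative list, and `requesty_extra_body` is a hypothetical helper:

```python
# Provider prefixes assumed to support prompt caching (illustrative only).
CACHING_PROVIDERS = ("anthropic/", "vertex/anthropic/", "google/")

def requesty_extra_body(model: str) -> dict:
    """Return an extra_body dict that opts into caching on supported models."""
    if model.startswith(CACHING_PROVIDERS):
        return {"requesty": {"auto_cache": True}}
    # Explicitly disable elsewhere to avoid unintended cache-write charges.
    return {"requesty": {"auto_cache": False}}

print(requesty_extra_body("anthropic/claude-3-7-sonnet-latest"))
# {'requesty': {'auto_cache': True}}
```

The returned dict can be passed directly as `extra_body` to the OpenAI Python client, as in the example above.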
Managed Caching
If you want Requesty to manage caching on your behalf, including custom TTL, cache warming, or advanced caching strategies, reach out to support@requesty.ai and we'll help you set it up.

Last modified on April 8, 2026