Create Response
Send input to an OpenAI-compatible model using the Responses API format and receive a response.
Send input to an OpenAI-compatible model and receive a response. This endpoint follows the OpenAI Responses API format and supports all OpenAI models that expose the Responses API natively, as well as compatible models from other providers through Requesty’s routing.Documentation Index
Fetch the complete documentation index at: https://docs.requesty.ai/llms.txt
Use this file to discover all available pages before exploring further.
Base URL
Authentication
The Responses endpoint accepts either OpenAI-style bearer auth or Anthropic-stylex-api-key auth. Use whichever your client library expects.
Headers
| Header | Required | Description |
|---|---|---|
Authorization | ✅ * | Bearer token with your Requesty key |
x-api-key | ✅ * | Your Requesty API key (alternative) |
Content-Type | ✅ | Must be application/json |
Authorization or x-api-key.
Example Request
Using the OpenAI SDK
The Responses endpoint is fully compatible with the official OpenAI SDK. Just pointbase_url at Requesty:
Model Selection
You can use any model available in the Model Library. Requesty translates the request shape for non-OpenAI providers automatically.- OpenAI Models:
openai-responses/gpt-5,openai-responses/gpt-5-mini,openai-responses/gpt-4.1,openai-responses/gpt-4o - Anthropic Models:
anthropic/claude-sonnet-4-5,anthropic/claude-opus-4 - Google Models:
google/gemini-2.5-pro,google/gemini-2.5-flash - Other Providers:
mistral/mistral-large-2411,meta/llama-3.3-70b-instruct
response.* event stream), use the openai-responses/ prefix. The standard openai/ prefix routes through Chat Completions under the hood.Input Formats
Theinput field accepts either a plain string or an array of input items. Use the array form for multi-turn conversations, tool results, and rich content.
String input
Multi-turn input
Instructions
Use theinstructions parameter to set a system-level prompt that applies to the entire request. It is equivalent to a system or developer message at the start of the conversation.
Streaming
Enable streaming by settingstream: true. The response is delivered as Server-Sent Events using the OpenAI Responses event format (response.created, response.output_text.delta, response.completed, etc.).
usage block with cost on streaming requests, no additional parameter is required. The response.completed event includes the full usage object.
Vision Support
Send images using theinput_image content type. You can pass an image URL or a base64 data URL.
PDF Support
Send PDFs using theinput_file content type. You can provide the PDF as either a base64 data URL or a remote URL.
Tool Use
Define tools the model may call. The Responses API uses a flatter shape than Chat Completions:name, description, and parameters live at the top level of each tool entry.
function_call_output item in input:
Reasoning
For reasoning-capable models (e.g.openai-responses/gpt-5, openai-responses/o3), configure reasoning effort and the optional summary:
effort:low,medium, orhigh. Lower effort produces faster responses with fewer reasoning tokens.summary:auto,concise, ordetailed. Controls whether the model returns a reasoning summary alongside the final answer.
Structured Outputs
Settext.format to enforce JSON-mode or a strict JSON Schema on the output.
Response Format
A successful response follows the OpenAI Responses format:cost field inside usage is a Requesty extension and reports the USD cost of the request. It is returned by default on non-streaming responses, and on the final response.completed event when streaming. See Cost Tracking.
Error Handling
The API returns standard HTTP status codes:200- Success400- Bad Request (invalid parameters)401- Unauthorized (invalid API key)403- Forbidden (insufficient permissions)429- Rate Limited500- Internal Server Error
Key Differences from OpenAI Chat Completions
inputinstead ofmessages: Accepts a string or a list of typed items (messages, tool calls, tool results, reasoning).instructionsinstead of system messages: System prompts are passed via the top-levelinstructionsfield.- Flat tool shape: Tools declare
name,description, andparametersdirectly, without the nestedfunctionwrapper. - Content types are prefixed:
input_text,input_image,input_filefor user inputs;output_textandoutput_refusalfor model outputs. - Event-typed streaming: Streaming uses named events (
response.created,response.output_text.delta,response.completed) rather than choice deltas. max_output_tokensinstead ofmax_tokens: Caps the total of visible and reasoning tokens.
Headers
Your Requesty API key. Alternative to the standard Authorization: Bearer header.
Body
The model to use for the response. To route OpenAI models through their native Responses API, use the openai-responses/ prefix (e.g. openai-responses/gpt-5).
"openai-responses/gpt-5"
Text, image, or file inputs to the model. Either a plain string or an array of typed input items.
"Tell me a three sentence bedtime story about a unicorn."
Inserts a system (or developer) message as the first item in the model's context.
Upper bound for the number of tokens that can be generated, including visible output tokens and reasoning tokens.
x >= 1If true, the response is streamed to the client as it is generated using server-sent events.
Sampling temperature between 0 and 2. Higher values produce more random output.
0 <= x <= 2Nucleus sampling: consider tokens with cumulative probability mass up to top_p.
0 <= x <= 1Whether to allow the model to run tool calls in parallel.
Controls which (if any) tool is called by the model.
auto, none, required Tools the model may call.
Reasoning configuration for reasoning-capable models.
Output text configuration, including structured output format.
Specify additional output data to include in the model response.
Set of key-value pairs that can be attached to the request.
Whether to store the generated model response for later retrieval via API.
The truncation strategy to use for the model response.
A unique identifier representing your end-user.
Response
Response
Unique identifier for this response.
Object type.
response Unix timestamp (in seconds) of when the response was created.
Model ID used to generate the response.
Status of the response generation.
completed, failed, in_progress, incomplete Output items from the model. Typically one or more message, function_call, or reasoning items.