Create Speech
Synthesizes audio from input text using a text-to-speech model. By default the response is a binary audio stream in the requested format. When stream_format is sse, the response is a Server-Sent Events stream of speech.audio.delta and speech.audio.done events with base64-encoded audio chunks.
Synthesize natural-sounding speech from text using OpenAI's text-to-speech models through Requesty's routing.
Base URL
Authentication
Include your Requesty API key in the request headers:
Example Request
Save the returned audio bytes to a file such as speech.mp3.
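As a rough illustration of such a request, the sketch below builds the HTTP call with Python's standard library. The base URL, the /audio/speech path, and the Bearer authorization scheme are assumptions based on the OpenAI-compatible layout and are not shown on this page. The helper names build_speech_request and save_speech are hypothetical.

```python
import json
import urllib.request

# Assumption: the base URL is not shown on this page. This value follows the
# OpenAI-compatible layout and should be checked against your account settings.
BASE_URL = "https://router.requesty.ai/v1"


def build_speech_request(api_key: str, text: str,
                         model: str = "openai/gpt-4o-mini-tts",
                         voice: str = "alloy",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed Bearer scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str = "speech.mp3") -> int:
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return out.write(response.read())
```

Calling save_speech(build_speech_request(api_key, "Hello from Requesty")) would then write the binary response to speech.mp3, provided the assumed URL and header scheme match your setup.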
OpenAI SDK
The endpoint is fully compatible with the OpenAI SDK. Just point the client at Requesty's base URL:
Supported Models
Browse the full catalog on the Speech model library. Today the available speech models are all from OpenAI:
| Model | Best for | Notes |
|---|---|---|
| openai/gpt-4o-mini-tts | Most use cases | Highest quality. Supports instructions and SSE streaming. |
| openai/tts-1 | Real time, low latency | Lightweight, no instructions, no SSE. |
| openai/tts-1-hd | Higher fidelity offline use | No instructions, no SSE. |
Dated snapshots (for example, openai/gpt-4o-mini-tts-2025-12-15) are also available when you need a stable model version.
Voices
The following voices are available across the supported models. Audio previews are on the OpenAI text to speech guide.
alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
Voice Steering with instructions
Use instructions to steer tone, accent, pacing, and emotion. Only openai/gpt-4o-mini-tts supports this field. It is ignored by openai/tts-1 and openai/tts-1-hd.
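Because the endpoint is described as OpenAI SDK compatible, a steering call could look roughly like the sketch below. It assumes client is an openai.OpenAI instance already pointed at Requesty's base URL (not shown on this page). The helper name speak_with_instructions and the choice of the coral voice are illustrative only.

```python
def speak_with_instructions(client, text: str, instructions: str,
                            path: str = "speech.mp3") -> str:
    """Call client.audio.speech.create() with steering instructions and save the audio.

    client is assumed to be an openai.OpenAI instance configured with a
    Requesty API key and Requesty's base URL.
    """
    response = client.audio.speech.create(
        model="openai/gpt-4o-mini-tts",  # the only listed model that honours instructions
        voice="coral",                   # illustrative choice from the voice list
        input=text,
        instructions=instructions,
    )
    # The SDK returns a binary response wrapper; read() yields the audio bytes.
    with open(path, "wb") as out:
        out.write(response.read())
    return path
```

For example, speak_with_instructions(client, "Welcome back!", "Speak in a warm, friendly tone.") would write the steered audio to speech.mp3.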
Output Formats
Set response_format to control the audio container of the returned bytes.
| Format | Content-Type | Notes |
|---|---|---|
| mp3 (default) | audio/mpeg | Compressed. Good for storage and general playback. |
| opus | audio/opus | Compressed, very low latency. Good for streaming. |
| aac | audio/aac | Compressed, broad device compatibility. |
| flac | audio/flac | Lossless compression. |
| wav | audio/wav | Uncompressed. Easy to decode. |
| pcm | audio/pcm | Raw 24 kHz, 16 bit, mono PCM samples. Lowest latency for real time pipelines. |
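Since pcm output has no header, a player needs to be told the sample layout. The sketch below wraps raw PCM bytes in a WAV container using the 24 kHz, 16 bit, mono parameters from the table. Little-endian signed samples are an assumption, and pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24_000) -> bytes:
    """Wrap raw 16 bit mono PCM samples in a WAV container.

    Assumes the samples are signed little-endian, which is what WAV expects.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 16 bit = 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The result can be written to a .wav file or handed to any player that accepts WAV data.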
Streaming with Server-Sent Events
Set stream_format to sse to receive a Server-Sent Events stream of speech.audio.delta events with base64-encoded audio chunks, terminated by a speech.audio.done event with usage information. Only openai/gpt-4o-mini-tts supports SSE.
stream_format is optional and most clients should omit it. Without it, every supported model returns the raw audio bytes in the requested response_format. Set stream_format to sse only with openai/gpt-4o-mini-tts to opt in to the SSE event stream. Setting sse with openai/tts-1 or openai/tts-1-hd, or audio with openai/gpt-4o-mini-tts, returns a 400.
Decode each delta.audio field from base64 and concatenate the bytes to get the full audio payload.
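The decode-and-concatenate step can be sketched as follows. This assumes each event arrives as a data: line containing JSON with a type field and, for deltas, a base64 audio field. That wire shape follows the OpenAI event format and is not spelled out on this page. collect_sse_audio is a hypothetical helper name.

```python
import base64
import json
from typing import Iterable


def collect_sse_audio(lines: Iterable[str]) -> bytes:
    """Concatenate the decoded audio from speech.audio.delta events.

    Assumes each event is a "data: {...}" line whose JSON has a "type"
    field and, for delta events, a base64-encoded "audio" field.
    """
    audio = bytearray()
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event: lines
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        event = json.loads(payload)
        if event.get("type") == "speech.audio.delta":
            audio.extend(base64.b64decode(event["audio"]))
        elif event.get("type") == "speech.audio.done":
            break  # the done event carries usage; no more audio follows
    return bytes(audio)
```

The lines argument can be any iterable of decoded text lines, such as the line iterator of an HTTP streaming response.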
Speed
Use speed to scale playback (0.25 to 4.0, default 1.0).
Pricing
Speech models are priced per character of input for character-billed models, and per token for token-billed models. The exact rate per model is on the Speech model library. Charges appear in your usage dashboard immediately after the request completes.
Error Handling
The API returns standard HTTP status codes:
- 200: Success
- 400: Bad Request (invalid parameters, unsupported response_format, or unsupported stream_format for the chosen model)
- 401: Unauthorized (invalid API key)
- 404: Model not found or not approved for your organization
- 429: Rate limited
- 500: Internal Server Error
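One way a client might act on these codes is sketched below. The retry policy (retrying only 429 and 500, with capped exponential backoff) is an assumption for illustration and not something this page prescribes. All names are hypothetical.

```python
# Status codes listed above, with their documented meanings.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad Request",
    401: "Unauthorized",
    404: "Model not found or not approved for your organization",
    429: "Rate limited",
    500: "Internal Server Error",
}

# Assumption: only rate limits and server errors are worth retrying;
# 400, 401, and 404 need a change to the request or the account.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Return True if this attempt (1-based) failed in a retryable way."""
    return status in RETRYABLE_STATUSES and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff delay for the given 1-based attempt, capped at cap."""
    return min(cap, base * (2 ** (attempt - 1)))
```

A caller would sleep for backoff_seconds(attempt) between attempts while should_retry returns True, and surface the error otherwise.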
client.audio.speech.create() method directly.
Authorizations
API key for authentication
Body
model
The text-to-speech model to use, prefixed with the provider slug. Currently only OpenAI models are supported.
Example: "openai/gpt-4o-mini-tts"
input
The text to synthesize into speech. Maximum length is 4096 characters.
Maximum length: 4096. Example: "The quick brown fox jumped over the lazy dog."
voice
The voice to use when generating the audio.
Available options: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse. Example: "alloy"
instructions
Additional steering for the voice (tone, accent, pacing). Supported by openai/gpt-4o-mini-tts only. Ignored by openai/tts-1 and openai/tts-1-hd.
Example: "Speak in a warm, friendly tone."
response_format
The audio container format for the synthesized output.
Available options: mp3, opus, aac, flac, wav, pcm. Example: "mp3"
speed
Playback speed of the generated audio. 1.0 is normal speed.
Required range: 0.25 <= x <= 4. Example: 1
stream_format
Optional and not recommended for most clients. Omit this field to get the default response shape: raw audio bytes in the requested response_format. Set to sse only with openai/gpt-4o-mini-tts to receive a Server-Sent Events stream of speech.audio.delta and speech.audio.done events with base64-encoded audio chunks. The router rejects sse with openai/tts-1 or openai/tts-1-hd, and rejects audio with openai/gpt-4o-mini-tts.
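The constraints in this section can be checked client-side before sending a request. The sketch below is a hypothetical pre-flight validator, not the router's actual logic. It matches model names exactly, so dated snapshot names would need to be added to the check.

```python
from typing import List

VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
SSE_MODEL = "openai/gpt-4o-mini-tts"


def validate_speech_body(body: dict) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if len(body.get("input", "")) > 4096:
        problems.append("input exceeds 4096 characters")
    if "voice" in body and body["voice"] not in VOICES:
        problems.append("unknown voice")
    if body.get("response_format", "mp3") not in FORMATS:
        problems.append("unsupported response_format")
    if not 0.25 <= body.get("speed", 1.0) <= 4.0:
        problems.append("speed must be between 0.25 and 4.0")
    # stream_format rules as documented: sse only with gpt-4o-mini-tts,
    # and audio is rejected with gpt-4o-mini-tts.
    stream_format = body.get("stream_format")
    model = body.get("model")
    if stream_format == "sse" and model != SSE_MODEL:
        problems.append("sse is only supported by " + SSE_MODEL)
    if stream_format == "audio" and model == SSE_MODEL:
        problems.append("audio stream_format is rejected for " + SSE_MODEL)
    return problems
```

An empty list means the body passes these local checks; the router may still reject it for reasons not covered here.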
Response
Audio bytes stream (when stream_format is audio) or Server-Sent Events stream (when stream_format is sse).
The response is of type file.