Speech to Text
Create Transcription
Transcribes audio into text using a speech-to-text model. The audio file is sent as multipart/form-data.
POST
Transcribe audio into text using OpenAI's speech-to-text models through Requesty's routing.
Base URL
Authentication
Include your Requesty API key in the request headers.
Example Request
The endpoint accepts multipart/form-data. Send the audio as the file field and the model identifier as the model field.
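As a rough illustration of the request shape described above, the following sketch assembles a multipart/form-data POST with the file and model fields using only the Python standard library. The base URL, the /audio/transcriptions path, and the Bearer authorization scheme are assumptions based on OpenAI API conventions and are not stated on this page; build_transcription_request is a hypothetical helper name.

```python
import io
import urllib.request
import uuid

# Assumptions: the base URL, path, and Bearer scheme below are placeholders
# modeled on OpenAI-compatible APIs. Confirm them against your account settings.
BASE_URL = "https://router.requesty.ai/v1"
TRANSCRIPTIONS_PATH = "/audio/transcriptions"


def build_transcription_request(api_key, model, filename, audio_bytes):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Text part: the model identifier goes in the "model" field.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File part: the audio goes in the "file" field.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        BASE_URL + TRANSCRIPTIONS_PATH,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would send it; the response body can then be decoded with json.loads().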
OpenAI SDK
The endpoint is fully compatible with the OpenAI SDK. Point the client at Requesty's base URL.
Supported Models
Browse the full catalog on the Transcription model library. Today the available transcription models are all from OpenAI:
| Model | Best for | Billing |
|---|---|---|
| openai/gpt-4o-transcribe | Highest accuracy, multilingual | Token-based |
| openai/gpt-4o-mini-transcribe | Fast and cost-efficient | Token-based |
| openai/whisper-1 | Drop-in replacement for legacy Whisper | Duration-based (per second of audio) |
Dated snapshots (for example, openai/gpt-4o-mini-transcribe-2025-12-15) are also available when you need a stable model version.
Supported Audio Formats
The file field accepts the following formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
The maximum upload size per request is 32 MB. For longer recordings, split the audio into chunks and concatenate the resulting transcripts on your side.
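One possible way to do the splitting, sketched here for uncompressed WAV input only, is to cut on frame boundaries with the standard wave module so that every chunk is itself a valid WAV file. The helper names split_wav and join_transcripts are hypothetical. Compressed formats such as mp3 or ogg cannot be cut at arbitrary byte offsets and would need an audio tool instead.

```python
import io
import wave

MAX_UPLOAD_BYTES = 32 * 1024 * 1024  # the 32 MB limit stated above


def split_wav(wav_bytes, max_chunk_bytes=MAX_UPLOAD_BYTES):
    """Split WAV audio into standalone WAV chunks of at most max_chunk_bytes."""
    header_allowance = 64  # a little more than the 44-byte header wave writes
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frame_size = params.sampwidth * params.nchannels
        frames_per_chunk = (max_chunk_bytes - header_allowance) // frame_size
        if frames_per_chunk <= 0:
            raise ValueError("max_chunk_bytes is too small for one audio frame")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)  # same channels, sample width, and rate
                dst.writeframes(frames)  # frame count in the header is corrected on close
            chunks.append(out.getvalue())
    return chunks


def join_transcripts(texts):
    """Concatenate per-chunk transcripts in order, skipping empty ones."""
    return " ".join(t.strip() for t in texts if t.strip())
```

A fixed-size cut can land in the middle of a word, so splitting at silences (or overlapping chunks slightly) may give cleaner joins than this sketch does.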
Language Hint
Set language to the ISO 639-1 code of the spoken language to improve accuracy and latency. When omitted, the model auto-detects the language.
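A small sketch of how the optional hint might be added to the form fields: the helper below (transcription_fields, a hypothetical name) includes language only when it is supplied and checks that it has the two-letter shape of an ISO 639-1 code. It checks the shape only and does not verify that the code is actually assigned to a language.

```python
import re

# ISO 639-1 codes are two letters, for example "en", "fr", "ja".
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def transcription_fields(model, language=None):
    """Return the text form fields for a transcription request."""
    fields = {"model": model}
    if language is not None:
        code = language.strip().lower()
        if not _ISO_639_1_SHAPE.fullmatch(code):
            raise ValueError(f"expected a two-letter ISO 639-1 code, got {language!r}")
        fields["language"] = code
    # With no language given, the field is left out so the model auto-detects.
    return fields
```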
Response Format
The response is always a JSON object with the transcribed text and a usage block. The usage block has two possible shapes depending on the model:
Token usage (gpt-4o-transcribe, gpt-4o-mini-transcribe)
Duration usage (whisper-1)
Use the type discriminator to decide how to render or aggregate usage on your side.
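The sketch below shows one way to aggregate usage across several responses by branching on the type field. The field names are assumptions borrowed from the OpenAI Audio Transcriptions API ("tokens" with total_tokens, "duration" with seconds), since the example payloads are not reproduced here; summarize_usage is a hypothetical helper.

```python
def summarize_usage(responses):
    """Sum token-based and duration-based usage separately."""
    totals = {"tokens": 0, "seconds": 0.0}
    for response in responses:
        usage = response.get("usage") or {}
        kind = usage.get("type")
        if kind == "tokens":
            # Assumed shape for gpt-4o-transcribe and gpt-4o-mini-transcribe.
            totals["tokens"] += usage.get("total_tokens", 0)
        elif kind == "duration":
            # Assumed shape for whisper-1.
            totals["seconds"] += usage.get("seconds", 0.0)
        else:
            raise ValueError(f"unrecognized usage type: {kind!r}")
    return totals
```

Keeping the two totals apart avoids adding tokens to seconds, which are billed on different scales.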
Pricing
Transcription models are priced either per token of input audio (for gpt-4o-transcribe and gpt-4o-mini-transcribe) or per second of input audio (for whisper-1). The exact rate per model is on the Transcription model library. Charges appear in your usage dashboard immediately after the request completes.
Error Handling
The API returns standard HTTP status codes:
| Status | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad Request (missing file or model, unsupported audio format) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found or not approved for your organization |
| 413 | Payload Too Large (audio file exceeds 32 MB) |
| 429 | Rate limited |
| 500 | Internal Server Error |
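A client might act on the status codes above roughly as follows. This is a sketch of one retry policy, not a documented recommendation: it treats 429 and 500 as transient and retries them with exponential backoff, and fails immediately on the others. The send callable, TranscriptionError, and call_with_retries are hypothetical names; send is assumed to return a (status, body) pair.

```python
import time

# Policy assumption: only rate limits and server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500}


class TranscriptionError(Exception):
    """Raised when a request fails permanently or retries run out."""

    def __init__(self, status, body):
        super().__init__(f"transcription failed with HTTP {status}")
        self.status = status
        self.body = body


def call_with_retries(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns 200, backing off on retryable statuses."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        # 400, 401, 404, and 413 will not succeed on a retry of the same request.
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            raise TranscriptionError(status, body)
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

A 413 is better handled before sending, by checking the file size against the 32 MB limit and splitting the audio.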
This endpoint is fully compatible with the OpenAI Audio Transcriptions API. You can use the OpenAI SDK's client.audio.transcriptions.create() method directly.
Authorizations
API key for authentication
Body
multipart/form-data
file: The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. Maximum upload size is 32 MB.
model: The speech-to-text model to use, prefixed with the provider slug. Currently only OpenAI models are supported. Example: "openai/gpt-4o-transcribe"
language: The language of the input audio in ISO 639-1 format (for example, en, fr, ja). Supplying the language improves accuracy and latency. Auto-detected when omitted.
Last modified on May 2, 2026