Requesty normalizes the schema across models and providers, so you don’t waste time with custom integrations.

Endpoints

Requesty provides two main endpoints:

Chat Completions (/v1/chat/completions)

For generating text completions and conversations with AI models.

Embeddings (/v1/embeddings)

For creating vector embeddings from text, which can be used for semantic search, similarity matching, and other AI applications.

Chat Completions Request Structure

Your request body to /v1/chat/completions closely follows the OpenAI Chat Completion schema:
  • Required Fields:
    • messages: An array of message objects with role and content
    • Roles can be user, assistant, system, or tool
    • model: The model name. If omitted, defaults to the user’s or payer’s default model. Here is a full list of the supported models.
  • Optional Fields:
    • prompt: Alternative to messages for some providers.
    • stream: A boolean to enable Server-Sent Events (SSE) streaming responses.
    • max_tokens, temperature, top_p, etc.: Standard language model parameters.
    • tools / functions: Enables function calling with a defined schema. See OpenAI’s function calling documentation for the structure of these requests.
    • tool_choice: Specifies how tool calling should be handled.
    • response_format: For structured responses (supported by some models only).
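
Because the schema matches OpenAI’s, any OpenAI-compatible client can send these requests. Below is a minimal Python sketch; the router base URL and the REQUESTY_API_KEY environment variable are illustrative assumptions, so check your dashboard for the exact values.

import os

from openai import OpenAI

# Point the standard OpenAI client at Requesty.
# Base URL and env var name are assumptions for illustration.
client = OpenAI(
    api_key=os.environ["REQUESTY_API_KEY"],
    base_url="https://router.requesty.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=200,
    temperature=0.7,
)
print(response.choices[0].message.content)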

Example Request Body

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 200,
  "temperature": 0.7,
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City and state"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
Here, we also provide a tool (get_current_weather) that the model can call if it decides the user’s request involves weather data. Note that some request fields require a different SDK method: if you use response_format, you’ll need to call client.beta.chat.completions.parse instead, and you may want to define your structure with a Pydantic model (Python) or a Zod schema (JavaScript).
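
As a minimal sketch of that structured-output flow with the OpenAI Python SDK (the model name and schema here are illustrative, and client is the configured client from the sketch above):

from pydantic import BaseModel

# Illustrative schema for a structured answer.
class CapitalAnswer(BaseModel):
    country: str
    capital: str

completion = client.beta.chat.completions.parse(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    response_format=CapitalAnswer,  # a Pydantic model instead of a raw JSON schema
)
answer = completion.choices[0].message.parsed  # parsed into a CapitalAnswer instance
print(answer.capital)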

Response Structure

The response is normalized to an OpenAI-style ChatCompletion object:
  1. Streaming: If stream: true, responses arrive incrementally as SSE events with data: lines; a consumption sketch follows this list. See the Streaming documentation for details.
  2. Function Calls (Tool Calls): If the model decides to call a tool, it will return a function_call in the assistant message. You then execute the tool, append the tool’s result as a role: "tool" message, and send a follow-up request. The LLM will then integrate the tool output into its final answer.
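
For item 1, consuming a streamed response with the Python SDK looks roughly like this (reusing the client from the earlier sketch):

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,  # responses now arrive as SSE chunks
)
for chunk in stream:
    # Each event carries an incremental delta; some chunks have no content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)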

Non-Streaming Response Example

{
  "id": "chatcmpl-xyz123",
  "object": "chat.completion",
  "created": 1687623702,
  "model": "openai/gpt-4o",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  },
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}
Function Call Example: If the model decides it needs the weather tool:
{
  "id": "chatcmpl-abc456",
  "object": "chat.completion",
  "created": 1687623800,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{ "location": "Boston, MA"}"
        }
      },
      "finish_reason": "function_call"
    }
  ]
}
You would then call the get_current_weather function externally, get the result, and send it back as:
{
  "model": "openai/gpt-4o",
  "messages": [
    {"role": "user", "content": "What is the weather in Boston?"},
    {
      "role": "assistant",
      "content": null,
      "function_call": {
        "name": "get_current_weather",
        "arguments": "{ "location": "Boston, MA" }"
      }
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "content": "{"temperature": "22", "unit": "celsius", "description": "Sunny"}"
    }
  ]
}
The next completion will return a final answer integrating the tool’s response.
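
Put together, the round trip looks roughly like the Python sketch below: detect the tool call, run the function locally, append the result, and request the final answer. The weather lookup is a stub, and note that while the JSON examples above show the legacy function_call field, OpenAI-style SDKs surface tool_calls (with a tool_call_id) when you send the modern tools parameter, which is what this sketch handles.

import json

# Assumes `client` from the earlier sketch; `tools` repeats the request example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

def get_current_weather(location: str, unit: str = "celsius") -> str:
    # Stub: a real implementation would call a weather API.
    return json.dumps({"temperature": "22", "unit": unit, "description": "Sunny"})

messages = [{"role": "user", "content": "What is the weather in Boston?"}]
first = client.chat.completions.create(
    model="openai/gpt-4o", messages=messages, tools=tools
)
msg = first.choices[0].message

if msg.tool_calls:  # the model decided to call a tool
    messages.append(msg)  # keep the assistant turn in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_current_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties this result to the call
            "content": result,
        })
    # Follow-up request: the model integrates the tool output.
    final = client.chat.completions.create(model="openai/gpt-4o", messages=messages)
    print(final.choices[0].message.content)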

Embeddings Request Structure

Your request body to /v1/embeddings follows the OpenAI Embeddings schema:
  • Required Fields:
    • input: The text to embed. Can be a string, array of strings, array of tokens, or array of token arrays
    • model: The model name to use for embedding generation (e.g., openai/text-embedding-3-small)
  • Optional Fields:
    • dimensions: The number of dimensions for the output embeddings (only supported in text-embedding-3 and later models)
    • encoding_format: The format to return embeddings in (float or base64, defaults to float)
    • user: A unique identifier representing your end-user

Example Embeddings Request Body

{
  "model": "openai/text-embedding-3-small",
  "input": "The food was delicious and the service was excellent.",
  "encoding_format": "float"
}
For multiple texts:
{
  "model": "openai/text-embedding-3-small",
  "input": [
    "The food was delicious and the service was excellent.",
    "The restaurant had poor service and cold food.",
    "Amazing atmosphere with friendly staff."
  ],
  "encoding_format": "float"
}
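
With the Python client configured earlier, the same request is a short sketch:

# Reuses `client` from the chat completions sketch.
resp = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input=[
        "The food was delicious and the service was excellent.",
        "The restaurant had poor service and cold food.",
    ],
    encoding_format="float",
)
vectors = [item.embedding for item in resp.data]  # one vector per input string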

Embeddings Response Structure

The response is normalized to an OpenAI-style Embedding object:
{
  "data": [
    {
      "embedding": [0.0023064255, -0.009327292, ...],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "openai/text-embedding-3-small",
  "object": "list",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
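
To put these vectors to work for the semantic search and similarity matching mentioned earlier, compare them with cosine similarity. A minimal sketch, reusing vectors from the previous example:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means identical direction; values near 0 mean unrelated texts.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Similar texts score closer to 1.
score = cosine_similarity(vectors[0], vectors[1])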