These tokens offer insight into the model’s reasoning process, providing a transparent view of its thought steps. Since Reasoning Tokens are considered output tokens, they are billed accordingly.

To enable reasoning, specify reasoning_effort with one of the supported values in your API request.

Notes

  • OpenAI does NOT share the actual reasoning tokens. You will not see them in the response.
  • DeepSeek reasoning models enable reasoning automatically; you don’t need to specify anything in the request to enable it.
  • When using DeepSeek or Anthropic models, the reasoning content in the response is returned under the ‘reasoning_content’ field, as shown in the sketch after this list.
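
As a minimal sketch, here is how that field could be read through the OpenAI SDK pointed at Requesty. The model name and prompt are placeholders, and since ‘reasoning_content’ is a provider-specific field, the sketch reads it defensively:

import openai

client = openai.OpenAI(
    api_key="YOUR_REQUESTY_API_KEY",  # Safely load your API key
    base_url="https://router.requesty.ai/v1"
)

completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-0",
    reasoning_effort="medium",
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)

message = completion.choices[0].message
# Provider-specific field, populated for DeepSeek and Anthropic models
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)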

Reasoning effort values

Anthropic expects a specific number that sets the upper limit on thinking tokens. The limit must be less than the request’s max tokens value.

OpenAI models expect one of the following ‘effort’ values:

  • low
  • medium
  • high

Google Gemini expects a specific number when using Vertex AI, and supports OpenAI’s reasoning effort values via Google AI Studio (their OpenAI-compatible API).

Requesty introduces an additional ‘effort’ value, ‘max’, which selects the upper limit for models that support budgets.

When using OpenAI via Requesty:

  • If the client specifies a standard reasoning effort string, i.e. “low”/“medium”/“high”, Requesty forwards the same value to OpenAI.
  • If the client specifies the ‘max’ reasoning effort string, Requesty forwards the value ‘high’ to OpenAI.
  • If the client specifies a reasoning budget string (e.g. “10000”), Requesty converts it to an effort, based on the conversion table below (a sketch of this mapping follows the table).

Conversion table from budget to effort:

  • 0-1024 -> “low”
  • 1025-8192 -> “medium”
  • 8193 or higher -> “high”
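
As a sketch, the mapping could be expressed like this in Python (the function name is illustrative, not part of any Requesty API):

def budget_to_effort(budget: int) -> str:
    # Mirrors the conversion table above
    if budget <= 1024:
        return "low"
    if budget <= 8192:
        return "medium"
    return "high"

assert budget_to_effort(10000) == "high"  # e.g. a "10000" budget string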

When using Anthropic via Requesty:

  • If the client specifies a reasoning effort string (“low”/“medium”/“high”/“max”), Requesty converts it to a budget, based on the conversion table below (a sketch follows the table).
  • If the client specifies a reasoning budget string (e.g. “10000”), Requesty passes this value to Anthropic. If the budget is larger than the model’s maximum output tokens, it will automatically be reduced to stay within that token limit.

Conversion table from effort to budget:

  • “low” -> 1024
  • “medium” -> 8192
  • “high” -> 16384
  • “max” -> the model’s max output tokens minus 1 (e.g. 63999 for Sonnet 3.7 or Sonnet 4, 31999 for Opus 4)
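
As a sketch of this conversion, including the ‘max’ case (the function name is illustrative, and the 64000-token max output for Sonnet is an assumption derived from the 63999 figure above):

def effort_to_anthropic_budget(effort: str, max_output_tokens: int) -> int:
    # Mirrors the conversion table above
    table = {"low": 1024, "medium": 8192, "high": 16384}
    if effort == "max":
        # Leave one token of headroom below the model's max output
        return max_output_tokens - 1
    return table[effort]

assert effort_to_anthropic_budget("max", 64000) == 63999  # assuming Sonnet's max output is 64000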

When using Vertex AI via Requesty:

  • If the client specifies a reasoning effort string (“low”/“medium”/“high”/“max”), Requesty converts it to a budget, based on the conversion table below (a sketch follows the table).
  • If the client specifies a reasoning budget string (e.g. “10000”), Requesty passes this value to Google. If the budget is larger than the model’s maximum output tokens, it will automatically be reduced to stay within that token limit.

Conversion table from effort to budget:

  • “low” -> 1024
  • “medium” -> 8192
  • “high” -> 24576
  • “max” -> max output tokens for model
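
A sketch of the Gemini variant; it differs from the Anthropic one only in the “high” value and in “max” using the full max output tokens (the function name is illustrative):

def effort_to_gemini_budget(effort: str, max_output_tokens: int) -> int:
    # Mirrors the conversion table above
    table = {"low": 1024, "medium": 8192, "high": 24576}
    if effort == "max":
        return max_output_tokens  # full max output, no minus-one headroom
    return table[effort]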

This conversion table is consistent with the Google AI Studio documentation.

When using Google AI Studio via Requesty:

Same as using OpenAI. See above.

Reasoning code example

For both examples below, you can use either an OpenAI or an Anthropic reasoning model, such as:

  • “openai/o3-mini”
  • “anthropic/claude-sonnet-4-0”

JavaScript example using reasoning effort

import OpenAI from 'openai';

const requesty_api_key = "YOUR_REQUESTY_API_KEY"; // Safely load your API key

const client = new OpenAI({
    apiKey: requesty_api_key,
    baseURL: 'https://router.requesty.ai/v1',
});

async function testReasoningEffort() {
    try {
        const prompt = `
            Write a bash script that takes a matrix represented as a string with
            format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
        `.trim();

        console.log('Sending request to reasoning model...');

        const completion = await client.chat.completions.create({
            model: "openai/o3-mini",
            reasoning_effort: "medium",
            messages: [
                {
                    role: "user",
                    content: prompt
                }
            ]
        });

        console.log('\nCompletion Response:');
        console.log('-------------------');
        if (completion.choices[0]?.message?.content) {
            console.log(completion.choices[0].message.content);
        }

        console.log('\nToken Usage Details:');
        console.log('-------------------');
        if (completion.usage) {
            const usageDetails = {
                prompt_tokens: completion.usage.prompt_tokens,
                completion_tokens: completion.usage.completion_tokens,
                total_tokens: completion.usage.total_tokens
            };
            console.log(JSON.stringify(usageDetails, null, 2));

            // Log specific reasoning token details if available
            if ('completion_tokens_details' in completion.usage) {
                console.log('\nReasoning Token Details:');
                console.log('----------------------');
                console.log(JSON.stringify(completion.usage.completion_tokens_details, null, 2));
            }
        }

    } catch (error) {
        console.error('Error:', error);
    }
}

testReasoningEffort();

Python example using reasoning budget

import json
import openai

# Safely load your API key
requesty_api_key = "YOUR_REQUESTY_API_KEY"

client = openai.OpenAI(
    api_key=requesty_api_key,
    base_url='https://router.requesty.ai/v1'
)

def test_reasoning_budget():
    try:
        prompt = """
            Write a bash script that takes a matrix represented as a string with
            format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
        """.strip()

        print('Sending request to reasoning model...')

        completion = client.chat.completions.create(
            model="openai/o3-mini",
            reasoning_effort="10000",
            messages=[
                {
                    "role": "user",
                    "content": prompt
                }
            ]
        )

        # Log the completion details
        print('\nCompletion Response:')
        print('-------------------')
        if completion.choices[0].message.content:
            print(completion.choices[0].message.content)

        # Log token usage details
        print('\nToken Usage Details:')
        print('-------------------')
        if completion.usage:
            usage_details = {
                "prompt_tokens": completion.usage.prompt_tokens,
                "completion_tokens": completion.usage.completion_tokens,
                "total_tokens": completion.usage.total_tokens
            }
            print(json.dumps(usage_details, indent=2))

            # Log specific reasoning token details if available
            if completion.usage.completion_tokens_details:
                print('\nReasoning Token Details:')
                print('----------------------')
                print(completion.usage.completion_tokens_details)

    except Exception as error:
        print(f'Error: {str(error)}')

if __name__ == '__main__':
    test_reasoning_budget()