
Chat Completion

/v1/chat/completions

HTTP Request

```bash
curl https://api.apertis.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <APERTIS_API_KEY>" \
  -d '{
    "model": "<MODEL_ALIAS>",
    "messages": [
      {
        "role": "system",
        "content": "<MESSAGES>"
      }
    ]
  }'
```
  • <APERTIS_API_KEY>: Your API key
  • <MODEL_ALIAS>: The alias of the model to use
  • <MESSAGES>: The messages to send to the model
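The same request body can be assembled programmatically before sending. A minimal Python sketch (the `build_chat_request` helper is illustrative, not part of the API):

```python
import json

def build_chat_request(model, messages, **options):
    """Assemble the JSON body for POST /v1/chat/completions.

    `options` carries optional parameters such as temperature or stream.
    """
    body = {"model": model, "messages": list(messages)}
    body.update(options)
    return body

payload = build_chat_request(
    "<MODEL_ALIAS>",
    [{"role": "system", "content": "<MESSAGES>"}],
)
print(json.dumps(payload, indent=2))
```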

Optional Headers

| Header | Type | Description |
| --- | --- | --- |
| `X-Timeout` | integer | Total request timeout in milliseconds. Caps the entire request lifecycle, including all channel retries. Range: 5,000–300,000 ms. Default: 60,000 ms (60 s). See Request Timeout. |

Optional Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | number | Sampling temperature (0–2). Default: 1 |
| `max_tokens` | integer | Maximum number of tokens in the response |
| `top_p` | number | Nucleus sampling threshold (0–1) |
| `stream` | boolean | Enable streaming. Default: `false` |
| `compression` | object | Context compression configuration |
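The documented ranges can be checked client-side before a request is sent. A sketch (this validation is illustrative; the API performs its own checks):

```python
def validate_options(options):
    """Check optional chat-completion parameters against the documented ranges."""
    t = options.get("temperature")
    if t is not None and not (0 <= t <= 2):
        raise ValueError("temperature must be in [0, 2]")
    p = options.get("top_p")
    if p is not None and not (0 <= p <= 1):
        raise ValueError("top_p must be in [0, 1]")
    m = options.get("max_tokens")
    if m is not None and (not isinstance(m, int) or m <= 0):
        raise ValueError("max_tokens must be a positive integer")
    return options

validate_options({"temperature": 0.7, "top_p": 0.9, "stream": True})
```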

Context Compression

Add a compression object to automatically summarize older conversation history and reduce token usage for long conversations:

```bash
curl https://api.apertis.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <APERTIS_API_KEY>" \
  -d '{
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "compression": {"enabled": true, "model": "gpt-4.1-mini"}
  }'
```

See Context Compression for full documentation.
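Since compression pays off mainly for long conversations, a client might enable it only past a certain history length. A sketch (the 20-message threshold is an arbitrary choice, not an API rule):

```python
def with_compression(body, summarizer_model, min_messages=20):
    """Return a copy of the request body with compression enabled
    once the conversation reaches `min_messages` messages."""
    body = dict(body)
    if len(body.get("messages", [])) >= min_messages:
        body["compression"] = {"enabled": True, "model": summarizer_model}
    return body

short = with_compression(
    {"model": "gpt-4.1-mini", "messages": [{"role": "user", "content": "Hi"}]},
    "gpt-4.1-mini",
)
# `short` has no "compression" key; a 20+ message history would get one
```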

Request Timeout

Use the X-Timeout header to limit the total time a request can take, including all internal channel retries. This prevents long waits when multiple upstream providers are slow or unavailable.

```bash
curl https://api.apertis.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <APERTIS_API_KEY>" \
  -H "X-Timeout: 20000" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The same header can be passed through the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-apertis-key",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"X-Timeout": "20000"}  # 20 seconds
)
```
| Setting | Value |
| --- | --- |
| Default | 60,000 ms (60 seconds) |
| Minimum | 5,000 ms (5 seconds) |
| Maximum | 300,000 ms (5 minutes) |
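A client can clamp a caller-supplied value to these bounds before setting the header; whether the API clamps or rejects out-of-range values is not specified here, so clamping client-side sidesteps the question. A sketch:

```python
DEFAULT_MS, MIN_MS, MAX_MS = 60_000, 5_000, 300_000

def timeout_header(requested_ms=None):
    """Build the X-Timeout header, clamped to the documented range."""
    if requested_ms is None:
        ms = DEFAULT_MS
    else:
        ms = max(MIN_MS, min(MAX_MS, requested_ms))
    return {"X-Timeout": str(ms)}

timeout_header(2_000)  # below the minimum -> clamped to {"X-Timeout": "5000"}
```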

When the timeout is exceeded, the API returns HTTP 408 with diagnostic headers:

| Header | Description |
| --- | --- |
| `X-Timeout-Attempts` | Number of upstream channels attempted |
| `X-Timeout-Elapsed-Ms` | Actual elapsed time in milliseconds |
The error body:

```json
{
  "error": {
    "message": "Total request timeout (20000ms) exceeded after 3 channel attempts. Set X-Timeout header to adjust.",
    "type": "timeout",
    "code": "request_timeout"
  }
}
```
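A client can detect this case by checking the HTTP status code together with the error `code` field. A sketch, assuming the response body has already been parsed into a dict:

```python
def is_request_timeout(status_code, body):
    """True if the response is the 408 request-timeout error described above."""
    err = (body or {}).get("error", {})
    return status_code == 408 and err.get("code") == "request_timeout"

body = {
    "error": {
        "message": "Total request timeout (20000ms) exceeded after 3 channel attempts. Set X-Timeout header to adjust.",
        "type": "timeout",
        "code": "request_timeout",
    }
}
is_request_timeout(408, body)  # -> True
```

On a timeout, a reasonable response is to retry once with a larger `X-Timeout` value, within the documented maximum.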
Streaming & Thinking Models

For streaming requests, the timeout only applies to the initial connection phase. Once the first response chunk arrives, the timer is stopped and the stream can run as long as needed. This ensures thinking models (e.g., claude-opus-4-6) are not interrupted mid-generation.

See also: Fallback Models for per-model fallback timeout configuration.