Features
Fallbacks
Automatically retry with alternative models when the primary returns an error or times out. Never lose a request to provider outages.
Automatic fallbacks
When using model: "auto", fallbacks are enabled automatically. If the selected model returns a 503 or times out, Auraon transparently retries with the next best model at no additional latency.
Manual fallback chains
For specific model requirements, use the x-auraon-fallbacks header to define your own fallback chain:
bash
curl https://api.auraon.ai/v1/chat/completions \
-H "Authorization: Bearer ar-your-key" \
-H "x-auraon-fallbacks: gpt-4o,claude-sonnet-4-6,llama-3-3-70b" \
-d '{"model": "claude-opus-4", "messages": [...]}'If claude-opus-4 fails, Auraon tries gpt-4o, then claude-sonnet-4-6, and so on.
Fallback triggers
503 / 502Provider is down or overloaded
429Primary model rate limit exceeded
TimeoutResponse exceeds configured timeout (default: 30s)
Context overflowPrompt exceeds model's context window
Retry configuration
python
response = client.chat.completions.create(
model="gpt-4o",
messages=[...],
extra_headers={
"x-auraon-fallbacks": "claude-sonnet-4-6,llama-3-3-70b",
"x-auraon-timeout": "15000", # 15 second timeout
"x-auraon-max-retries": "2",
}
)