Fallbacks

Automatically retry with an alternative model when the primary model returns an error or times out, so a provider outage doesn't cause your request to fail.

Automatic fallbacks

When you set model: "auto", fallbacks are enabled automatically. If the selected model returns a 503 or times out, Auraon transparently retries the request with the next best model, without an extra round trip from your client.
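With model: "auto" this retry loop runs on Auraon's side, so you never write it yourself. The sketch below only models the idea client-side: try each candidate in order and return the first success. The names model-a and model-b, RetryableError, and call_model are hypothetical placeholders, not part of Auraon's API, and the fixed list stands in for Auraon's own choice of "next best" model.

```python
from typing import Callable, Sequence, Tuple


class RetryableError(Exception):
    """Hypothetical stand-in for a 503 or a timeout from a provider."""


def complete_with_fallbacks(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model)
        except RetryableError as exc:
            last_error = exc  # remember the failure and move on to the next model
    raise RuntimeError(f"all {len(models)} models failed") from last_error


# Usage: a stub provider whose first choice is "down".
def fake_call(model: str) -> str:
    if model == "model-a":
        raise RetryableError("503 Service Unavailable")
    return f"reply from {model}"


print(complete_with_fallbacks(["model-a", "model-b"], fake_call))
# prints ('model-b', 'reply from model-b')
```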

Manual fallback chains

To control exactly which models are tried, set the x-auraon-fallbacks header to a comma-separated list of models, in the order they should be tried:

```bash
curl https://api.auraon.ai/v1/chat/completions \
  -H "Authorization: Bearer ar-your-key" \
  -H "Content-Type: application/json" \
  -H "x-auraon-fallbacks: gpt-4o,claude-sonnet-4-6,llama-3-3-70b" \
  -d '{"model": "claude-opus-4", "messages": [...]}'
```

If claude-opus-4 fails, Auraon tries gpt-4o, then claude-sonnet-4-6, then llama-3-3-70b.
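Because the chain is just an ordered, comma-separated string, it can help to build it from a list in code and check the resulting try order. A minimal sketch: fallback_header and attempt_order are hypothetical helpers, not part of any Auraon SDK, and attempt_order only mirrors the ordering described above (the request's model first, then the header's models left to right).

```python
from typing import List


def fallback_header(models: List[str]) -> str:
    """Join model IDs into an x-auraon-fallbacks value: comma-separated, no spaces."""
    if not models:
        raise ValueError("fallback chain must contain at least one model")
    for model in models:
        if not model or "," in model:
            raise ValueError(f"invalid model id: {model!r}")
    return ",".join(models)


def attempt_order(primary: str, header_value: str) -> List[str]:
    """Models in the order they are tried: the request's model, then the header's."""
    # strip() makes this sketch tolerant of stray spaces; whether Auraon
    # accepts spaces in the header is not stated here, so the builder emits none.
    fallbacks = [m.strip() for m in header_value.split(",") if m.strip()]
    return [primary] + fallbacks


header = fallback_header(["gpt-4o", "claude-sonnet-4-6", "llama-3-3-70b"])
print(header)  # gpt-4o,claude-sonnet-4-6,llama-3-3-70b
print(attempt_order("claude-opus-4", header))
# ['claude-opus-4', 'gpt-4o', 'claude-sonnet-4-6', 'llama-3-3-70b']
```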

Fallback triggers

- 503 / 502: Provider is down or overloaded
- 429: Primary model rate limit exceeded
- Timeout: Response exceeds the configured timeout (default: 30s)
- Context overflow: Prompt exceeds the model's context window
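The four triggers can be read as a simple decision rule. The sketch below restates them as a hypothetical client-side function, fallback_reason, useful for logging why a request fell back; it is not Auraon's implementation, and the token counts you pass in are assumptions supplied by the caller. Anything outside the list (for example a 400) returns None here, since the list does not say how Auraon treats other errors.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 30.0  # documented default timeout


def fallback_reason(
    status_code: Optional[int] = None,
    elapsed_s: float = 0.0,
    timeout_s: float = DEFAULT_TIMEOUT_S,
    prompt_tokens: Optional[int] = None,
    context_window: Optional[int] = None,
) -> Optional[str]:
    """Return which fallback trigger applies, or None if none of them does."""
    # Context overflow can be detected before the request is even sent.
    if prompt_tokens is not None and context_window is not None and prompt_tokens > context_window:
        return "context overflow"
    if elapsed_s > timeout_s:
        return "timeout"
    if status_code in (502, 503):
        return "provider down or overloaded"
    if status_code == 429:
        return "rate limit exceeded"
    return None


print(fallback_reason(503))                   # provider down or overloaded
print(fallback_reason(200, elapsed_s=31.0))   # timeout
print(fallback_reason(200, elapsed_s=12.0))   # None
```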

Retry configuration

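Timeouts (in milliseconds) and retry limits are also set per request through headers. The example assumes client is an OpenAI Python SDK client pointed at the base URL used in the curl example, with the headers passed via extra_headers:

```python
from openai import OpenAI

# OpenAI SDK client aimed at the Auraon endpoint from the curl example
client = OpenAI(base_url="https://api.auraon.ai/v1", api_key="ar-your-key")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_headers={
        "x-auraon-fallbacks": "claude-sonnet-4-6,llama-3-3-70b",
        "x-auraon-timeout": "15000",   # timeout in milliseconds (15 s)
        "x-auraon-max-retries": "2",   # cap on retry attempts
    },
)
print(response.choices[0].message.content)
```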
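If your code keeps timeouts in seconds, a small helper can convert them and keep every header value a string, as HTTP headers require. A sketch: auraon_retry_headers is a hypothetical helper, and it only reproduces the header names and the millisecond unit shown in the example above.

```python
from typing import Dict, Sequence


def auraon_retry_headers(
    fallbacks: Sequence[str],
    timeout_s: float,
    max_retries: int,
) -> Dict[str, str]:
    """Build the x-auraon-* retry headers; all values are strings."""
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return {
        "x-auraon-fallbacks": ",".join(fallbacks),
        # The header takes milliseconds, so 15 s becomes "15000".
        "x-auraon-timeout": str(int(round(timeout_s * 1000))),
        "x-auraon-max-retries": str(max_retries),
    }


print(auraon_retry_headers(["claude-sonnet-4-6", "llama-3-3-70b"], 15, 2))
```

Passing extra_headers=auraon_retry_headers(["claude-sonnet-4-6", "llama-3-3-70b"], 15, 2) produces the same three headers as the example above.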