Home/Docs/Failover Policy

Failover Policy & Circuit Breaker Mechanics

Selixes employs a dual-stage circuit breaker architecture designed to handle upstream outages, rate limits, and latency spikes without failing back to the calling client application.

How the Circuit Breaker Works

Every upstream request is monitored for status codes (5xx, 429) and execution duration. If the rolling error rate exceeds the trip threshold within a configured sliding window, the circuit transitions from Closed to Open.

  • Trip Threshold: Default is 40% error rate over a rolling 10-second window.
  • Recovery Period: After 30 seconds of outage isolation, the circuit goes Half-Open, dispatching a single probe request to check upstream recovery.

Configuring Custom Failover Headers

Control the failover behavior and retry policy on a per-request basis using standard gateway headers:

# 1. Enable circuit breaker and standby routing
x-selixes-failover-policy: failover-to-standby

# 2. Configure max retries for the primary provider
x-selixes-max-retries: 3

# 3. Configure connection timeouts in milliseconds
x-selixes-timeout-ms: 8000

Standby Provider Fallbacks

When the primary provider (e.g. OpenAI GPT-4o) fails, Selixes seamlessly translates request bodies and paths to standby providers, checking targets in priority order:

  1. Primary Target: OpenAI gpt-4o
  2. Standby Target A: Anthropic claude-3-5-sonnet-latest
  3. Standby Target B: Google Gemini gemini-1.5-pro
  4. Emergency Fallback: Local continuity node (Ollama llama3.1:8b)