The Error Format Minefield
Let me describe a scenario that has probably already bitten you—or will very soon.
You build an AI agent. It talks to the OpenAI API. Everything works perfectly. Your error handling is clean: you catch the standard {"error": {"message": "...", "type": "...", "code": "..."}} schema, log it, retry if appropriate, and move on.
Then you decide to add a second provider. Maybe you want Ollama for local inference, or Anthropic for Claude, or MiniMax for a cost-effective alternative. You point your agent at a proxy that routes to all of them.
And everything works great—until the first time a provider fails in a non-standard way.
Here’s what a remote Ollama endpoint actually returns when it rate-limits you:
{"error": "you (sebastianschkudlara) have reached your weekly usage limit, upgrade for higher limits"}
Notice something? That’s not the OpenAI error schema. There’s no type field. No code field. The error key maps to a raw string instead of an object.
Your agent’s error parser, which expects response.error.message, hits a TypeError. The entire agent workflow crashes. If this were running in the background at 2 AM, you wake up to silence and a dead process.
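The failure mode is easy to reproduce in a few lines of Python. This is an illustrative parser, not code from any particular agent framework — the naive version (`payload["error"]["message"]`) raises a TypeError on Ollama's raw-string error, while a defensive version has to branch on the shape:

```python
import json

def parse_error_message(body: str) -> str:
    """Extract a human-readable message from a provider error body.

    A naive parser assumes the OpenAI shape -- payload["error"]["message"] --
    and raises TypeError when "error" maps to a raw string instead of an object.
    """
    payload = json.loads(body)
    err = payload.get("error")
    if isinstance(err, dict):   # OpenAI-style: {"error": {"message": ...}}
        return err.get("message", "unknown error")
    if isinstance(err, str):    # Ollama-style: {"error": "raw string"}
        return err
    return "unknown error"

# OpenAI-style body parses cleanly:
openai_body = '{"error": {"message": "rate limited", "type": "rate_limit_error", "code": "429"}}'
# Ollama-style body would crash a naive payload["error"]["message"] lookup:
ollama_body = '{"error": "you have reached your weekly usage limit"}'
```

And this is only a two-provider branch — every new provider format means another `isinstance` check.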
It Gets Worse at Scale
This isn’t just an Ollama problem. Every provider has its own creative interpretation of how to format errors:
- Anthropic returns errors as {"type": "error", "error": {"type": "rate_limit_error", "message": "..."}} — a completely different nesting structure
- Some self-hosted endpoints return raw HTML error pages when they crash, because the web server (nginx, caddy) catches the failure before the model server does
- Custom vLLM deployments sometimes return plain-text stack traces with a 500 status code
In a multi-provider setup, your error handling code needs to account for all of these formats. That’s a maintenance nightmare. Every new provider you add potentially introduces a new error schema that could crash your parsing logic.
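A normalization shim — whether it lives in a proxy or in your own client — ends up looking something like the sketch below. This is an illustrative Python version of the idea, not SwitchAILocal's actual implementation (which runs in Go); it coerces the four shapes described above into one OpenAI-style object:

```python
import json

def normalize_error(status: int, body: str, provider: str) -> dict:
    """Coerce any upstream error body into a single OpenAI-style schema.

    Handles OpenAI error objects, Ollama raw strings, Anthropic's nested
    {"type": "error", "error": {...}} shape, and non-JSON bodies
    (HTML error pages, plain-text stack traces).
    """
    message = body.strip()
    try:
        payload = json.loads(body)
        err = payload.get("error")
        if isinstance(err, dict):   # OpenAI / Anthropic nesting
            message = err.get("message", message)
        elif isinstance(err, str):  # Ollama raw string
            message = err
    except (json.JSONDecodeError, AttributeError):
        pass                        # HTML or plain text: keep the raw body
    return {
        "error": {
            "message": f"{provider} returned status {status}: {message}",
            "type": "server_error",
            "code": "internal_server_error",
        }
    }
```

The key property: whatever garbage comes back upstream, the caller always receives `error.message`, `error.type`, and `error.code`.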
And if you’re using a Python-based proxy like LiteLLM, you’re depending on that proxy to catch and normalize every possible provider error format. Given that LiteLLM supports 100+ providers, the surface area for edge cases is enormous. GitHub issue discussions regularly surface cases where specific provider errors slip through unnormalized.
The Schema Translation Layer
This is one of those problems that sounds trivial until you’re debugging it at 3 AM. The solution is conceptually simple: your proxy should intercept every upstream response and guarantee that what reaches your application always conforms to a single, predictable schema.
SwitchAILocal does this natively. During a recent load test, my remote Ollama endpoint hard-banned my API key with that custom error string. But my agents never saw the raw response. The proxy caught it at the edge and transformed it into a clean OpenAI-compatible error:
{
  "error": {
    "message": "Ollama returned status 429: {\"error\":\"you have reached your weekly usage limit\"}",
    "type": "server_error",
    "code": "internal_server_error"
  }
}
My agent’s existing error handler parsed this without issue. It logged the failure, backed off, and continued processing on the next available provider. No crash. No custom error-handling branch for Ollama. The same code path that handles OpenAI errors also handles Ollama errors, Anthropic errors, and any future provider I might add.
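With the schema guaranteed, the agent-side handler stays a single code path. Here is a minimal Python sketch of that pattern; the names (`call_with_failover`, `providers`) are illustrative, not part of SwitchAILocal's or any provider's real API:

```python
import time

def call_with_failover(providers, request, max_retries=3):
    """Try each provider in turn, backing off on normalized errors.

    `providers` is a list of callables returning (status, payload), where
    the error payload always follows the OpenAI schema -- the proxy
    guarantees it, so one parser covers every provider.
    """
    for provider in providers:
        for attempt in range(max_retries):
            status, payload = provider(request)
            if status == 200:
                return payload
            # Same lookup for OpenAI, Ollama, Anthropic, or anything else:
            message = payload["error"]["message"]
            print(f"provider error (attempt {attempt + 1}): {message}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("all providers exhausted")
```

No per-provider `except` branches, and adding a provider adds zero error-handling code.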
Why This Matters for Agentic Workflows
Modern AI agents are increasingly multi-step and multi-provider. A single agent task might:
- Query a fast model for planning (MiniMax)
- Call a reasoning model for complex analysis (Claude)
- Hit a local model for code generation (Ollama)
- Use an embedding endpoint for RAG retrieval
If any of those four providers returns a non-standard error format, and your proxy doesn’t normalize it, the entire chain breaks.
In a 10-step agentic workflow where each step adds 40–200ms of proxy overhead (the measured range for Python-based gateways), error normalization needs to happen without adding even more latency. SwitchAILocal handles this in the Go request pipeline—same goroutine, same memory allocation, no extra round-trip. The normalization overhead is effectively zero.
Write Your Error Handler Once
The point isn’t that other gateways can’t do error normalization. Many do some form of it. The point is that when you combine schema translation with a 30 MB memory footprint, sub-millisecond proxy overhead, and a native circuit breaker that prevents cascading failures—you get an infrastructure layer that quietly handles the ugly parts of multi-provider AI routing so you can focus on building the actual agent logic.
SwitchAILocal is open-source and growing fast. If you’ve been bitten by provider error inconsistencies, or you just want a proxy that doesn’t eat a gigabyte of RAM, come check it out:
- 📚 Documentation: ail.traylinx.com/introduction
- 💻 GitHub: github.com/traylinx/switchAILocal
PRs, issues, and stars are all welcome. Let’s make multi-provider AI routing less painful for everyone.
Sebastian Schkudlara