
Every AI Provider Returns Errors Differently. That's a Huge Problem.

Sebastian Schkudlara · Mar 29, 2026 · 4 mins read

The Error Format Minefield

Let me describe a scenario that has probably already bitten you—or will very soon.

You build an AI agent. It talks to the OpenAI API. Everything works perfectly. Your error handling is clean: you catch the standard {"error": {"message": "...", "type": "...", "code": "..."}} schema, log it, retry if appropriate, and move on.

Then you decide to add a second provider. Maybe you want Ollama for local inference, or Anthropic for Claude, or MiniMax for a cost-effective alternative. You point your agent at a proxy that routes to all of them.

And everything works great—until the first time a provider fails in a non-standard way.

Here’s what a remote Ollama endpoint actually returns when it rate-limits you:

{"error": "you (sebastianschkudlara) have reached your weekly usage limit, upgrade for higher limits"}

Notice something? That’s not the OpenAI error schema. There’s no type field. No code field. The error key maps to a raw string instead of an object.

Your agent’s error parser, which expects response.error.message, hits a TypeError. The entire agent workflow crashes. If this were running in the background at 2 AM, you wake up to silence and a dead process.
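To make the failure mode concrete, here is a minimal sketch (the parser names are mine, not from any real SDK) of why a parser written against the OpenAI schema breaks on that Ollama response, and what a tolerant version looks like:

```python
import json

def parse_error_naive(body: str) -> str:
    # Assumes the OpenAI schema: {"error": {"message": ..., "type": ..., "code": ...}}
    payload = json.loads(body)
    return payload["error"]["message"]  # TypeError when "error" is a raw string

def parse_error_defensive(body: str) -> str:
    # Tolerates an error object, a bare string, or a non-JSON body
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return body  # HTML error page or plain-text stack trace
    err = payload.get("error", payload) if isinstance(payload, dict) else payload
    if isinstance(err, str):
        return err                      # Ollama-style: {"error": "..."}
    if isinstance(err, dict):
        return err.get("message", str(err))  # OpenAI-style nesting
    return str(err)

ollama_body = '{"error": "you have reached your weekly usage limit"}'
assert parse_error_defensive(ollama_body) == "you have reached your weekly usage limit"
# parse_error_naive(ollama_body) raises TypeError: string indices must be integers
```

This is exactly the defensive boilerplate you end up copy-pasting into every agent once a second provider enters the picture.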


It Gets Worse at Scale

This isn’t just an Ollama problem. Every provider has its own creative interpretation of how to format errors:

  • Anthropic returns errors with {"type": "error", "error": {"type": "rate_limit_error", "message": "..."}} — a completely different nesting structure
  • Some self-hosted endpoints return raw HTML error pages when they crash, because the web server (nginx, caddy) catches the failure before the model server does
  • Custom vLLM deployments sometimes return plain-text stack traces with a 500 status code

In a multi-provider setup, your error handling code needs to account for all of these formats. That’s a maintenance nightmare. Every new provider you add potentially introduces a new error schema that could crash your parsing logic.
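To illustrate what that normalization layer has to juggle, here is a rough sketch of a hypothetical normalize_error function (not SwitchAILocal's actual implementation) that folds the formats above into the OpenAI schema:

```python
import json

def normalize_error(status: int, body: str) -> dict:
    """Fold assorted provider error formats into the OpenAI error schema.
    Illustrative sketch only; real gateways handle many more cases."""
    message = body.strip()
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # HTML error page or plain-text stack trace

    if isinstance(payload, dict):
        err = payload.get("error")
        if isinstance(err, dict):       # OpenAI- / Anthropic-style nested object
            message = err.get("message", json.dumps(err))
        elif isinstance(err, str):      # Ollama-style bare string
            message = err

    return {
        "error": {
            "message": f"upstream returned status {status}: {message}",
            "type": "rate_limit_error" if status == 429 else "server_error",
            "code": str(status),
        }
    }
```

Even this toy version needs three branches for three providers; multiply that by 100+ upstreams and the maintenance burden becomes obvious.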

And if you’re using a Python-based proxy like LiteLLM, you’re depending on that proxy to catch and normalize every possible provider error format. Given that LiteLLM supports 100+ providers, the surface area for edge cases is enormous. GitHub issue discussions regularly surface cases where specific provider errors slip through unnormalized.


The Schema Translation Layer

This is one of those problems that sounds trivial until you’re debugging it at 3 AM. The solution is conceptually simple: your proxy should intercept every upstream response and guarantee that what reaches your application always conforms to a single, predictable schema.

SwitchAILocal does this natively. During a recent load test, my remote Ollama endpoint hard-banned my API key with that custom error string. But my agents never saw the raw response. The proxy caught it at the edge and transformed it into a clean OpenAI-compatible error:

{
  "error": {
    "message": "Ollama returned status 429: {\"error\":\"you have reached your weekly usage limit\"}",
    "type": "server_error",
    "code": "internal_server_error"
  }
}

My agent’s existing error handler parsed this without issue. It logged the failure, backed off, and continued processing on the next available provider. No crash. No custom error-handling branch for Ollama. The same code path that handles OpenAI errors also handles Ollama errors, Anthropic errors, and any future provider I might add.
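With a single guaranteed schema, the agent-side handler collapses to one code path. A hedged sketch of that "log, back off, fall through" loop (the provider list and call_provider callable are placeholders, not a real API):

```python
import time

PROVIDERS = ["openai", "anthropic", "ollama"]  # hypothetical routing order

def call_with_fallback(call_provider, prompt: str, base_delay: float = 1.0):
    """Try each provider in order; on a normalized error, back off and fall through.
    call_provider(name, prompt) must return a dict in the OpenAI response/error schema."""
    for attempt, name in enumerate(PROVIDERS):
        response = call_provider(name, prompt)
        err = response.get("error")
        if err is None:
            return response
        # Same parsing logic regardless of which upstream actually failed
        print(f"{name} failed ({err['type']}): {err['message']}")
        time.sleep(min(base_delay * 2 ** attempt, 8))  # capped exponential backoff
    raise RuntimeError("all providers failed")
```

The key property: this function never needs to know which provider produced the error, because the proxy has already made them indistinguishable.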


Why This Matters for Agentic Workflows

Modern AI agents are increasingly multi-step and multi-provider. A single agent task might:

  1. Query a fast model for planning (MiniMax)
  2. Call a reasoning model for complex analysis (Claude)
  3. Hit a local model for code generation (Ollama)
  4. Use an embedding endpoint for RAG retrieval

If any of those four providers returns a non-standard error format, and your proxy doesn’t normalize it, the entire chain breaks.
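A toy version of that chain makes the stakes concrete (the model names and client.chat signature are illustrative); with a normalizing proxy in front, a failure at any step surfaces as the same structured error instead of an unhandled exception:

```python
def run_chain(client, task: str) -> dict:
    """Toy 4-step agent chain; each step hits a different upstream model.
    client.chat(model, prompt) returns an OpenAI-style dict for success OR failure."""
    steps = [
        ("minimax-m1", f"Plan: {task}"),         # 1. fast planning model
        ("claude-sonnet", f"Analyze: {task}"),   # 2. reasoning model
        ("ollama/qwen-coder", f"Code: {task}"),  # 3. local code model
        ("text-embedding-3-small", task),        # 4. embedding for RAG retrieval
    ]
    results = []
    for model, prompt in steps:
        response = client.chat(model, prompt)
        if "error" in response:
            # One schema, one branch: no per-provider special cases
            return {"failed_at": model, "error": response["error"]}
        results.append(response)
    return {"results": results}
```

Without normalization, step 3's bare-string Ollama error would blow past this check as a parsing exception instead of landing in the clean failed_at branch.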

In a 10-step agentic workflow where each step adds 40–200ms of proxy overhead (the measured range for Python-based gateways), error normalization needs to happen without adding even more latency. SwitchAILocal handles this inline in the Go request pipeline: same goroutine, no extra allocation, no extra round-trip. The normalization overhead is effectively zero.


Write Your Error Handler Once

The point isn’t that other gateways can’t do error normalization. Many do some form of it. The point is that when you combine schema translation with a 30 MB memory footprint, sub-millisecond proxy overhead, and a native circuit breaker that prevents cascading failures—you get an infrastructure layer that quietly handles the ugly parts of multi-provider AI routing so you can focus on building the actual agent logic.

SwitchAILocal is open-source and growing fast. If you’ve been bitten by provider error inconsistencies, or you just want a proxy that doesn’t eat a gigabyte of RAM, come check it out:

PRs, issues, and stars are all welcome. Let’s make multi-provider AI routing less painful for everyone.

Bridging Architecture & Execution

Struggling to implement Agentic AI or Enterprise Microservices in your organization? I help CTOs and technical leaders transition from architectural bottlenecks to production-ready systems.

View My Full Profile & Portfolio
Hi, I am Sebastian Schkudlara, the author of Jevvellabs. I hope you enjoy my blog!