Let’s be honest: the local AI landscape is plagued by “Dumb Pipes.”
You know the drill: you run a proxy, it takes a request, matches a regex, and forwards it to some model. It works. It’s fast. But it’s also deaf, dumb, and blind.
It doesn’t remember that you prefer Claude for Python but Llama 3 for creative writing. It doesn’t know that your local Ollama instance is currently struggling or that a specific provider is timing out. It certainly doesn’t care that you’re about to hit a hard API quota.
switchAILocal started as one of those pipes. But today, we’re changing the game. We’re killing the pipe and building a Gateway.
The Paradigm Shift: Stateless to Stateful
The biggest maturity milestone for any infrastructure is the move from Stateless (fresh start every time) to Stateful (learning and remembering).
We are introducing the Option C Hybrid Architecture. This isn’t just a fancy name—it’s a philosophical stance on how AI infrastructure should behave.
```mermaid
graph TD
    User[User Request] --> Proxy[Go Proxy Core]
    Proxy -->|Reflex Tier| Reflex[Regex Matcher]
    Proxy -->|Semantic Tier| Brain[Intelligence Service]
    Brain -->|Query| State[(State Box)]
    State -->|Context| Brain
    Brain -->|Routing Decision| Proxy
    Proxy -->|Forward| Model[AI Provider]
```
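To make the Reflex Tier concrete, here is a toy version of the regex path in Go. The rule table, patterns, and function names are all made up for illustration; they are not switchAILocal's actual routing rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// reflexRules is an illustrative rule table: the first pattern that
// matches the prompt wins, and no state is consulted.
var reflexRules = []struct {
	re    *regexp.Regexp
	model string
}{
	{regexp.MustCompile(`(?i)\b(def |import |python)\b`), "claude"},
	{regexp.MustCompile(`(?i)\b(story|poem|write)\b`), "llama3"},
}

// reflexMatch is the Tier 1 path: pure regex, instant, stateless.
// The semantic tier (not shown) would only be consulted when no
// rule fires and intelligence is enabled.
func reflexMatch(prompt string) (string, bool) {
	for _, r := range reflexRules {
		if r.re.MatchString(prompt) {
			return r.model, true
		}
	}
	return "", false
}

func main() {
	model, ok := reflexMatch("import numpy and fix this")
	fmt.Println(model, ok) // claude true
}
```

The point of this tier is that it has no dependencies at all: no network calls, no embeddings, no state box. That is what makes it a safe floor to fall back to.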
The 3 Core Pillars
- Plugin Independence (The Spine): The core proxy functionality must never break. If the brain crashes (or is simply turned off), the spine keeps you walking.
- Graceful Degradation (The Safety Net): Features are tiers, not dominoes.
  - Tier 3 (Cognitive): “Let me think about this…” (Smart but slow)
  - Tier 2 (Semantic): “I’ve seen this before!” (Fast and smart)
  - Tier 1 (Reflex): “Just do it.” (Instant)
  If Tier 3 fails, we drop to Tier 2. If that fails, Tier 1 takes over. You never get a 500 instead of a routing decision.
- Service Independence (The Opt-In): You don’t need the brain. By default, switchAILocal remains the lightweight, blazing-fast tool you love. But flip the `intelligence: true` switch, and the system wakes up.
What’s New?
We’ve rewritten the rulebook with a new Intelligence Service in Go:
- Discovery Service: It doesn’t wait for config files. It proactively scans your ports (Ollama, LM Studio) and finds models.
- Dynamic Matrix: Static configs are dead. The router builds a living routing matrix based on what’s actually alive.
- Semantic Tier: We embedded a vector engine directly into the binary. It “understands” your prompt’s intent in <20ms.
The Agentic Difference
This is the difference between a tool handling your traffic and an agent managing your workflow.
Your proxy should know your intent (“This looks like a complex refactoring task”). It should check its memory (“The user usually selects Sonnet 3.5 for this”). It should verify health and quotas (“Sonnet is up, but my quota is nearly exhausted; can DeepSeek handle this just as well?”).
And then? It should just work.
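That preference-then-health walk can be sketched as follows. `Candidate`, `pick`, and the quota threshold are hypothetical stand-ins for whatever state switchAILocal actually keeps.

```go
package main

import "fmt"

// Candidate is a provider the agent might route to (illustrative).
type Candidate struct {
	Name      string
	Healthy   bool
	QuotaLeft int // remaining requests before a hard cap
}

// pick walks the user's remembered preference order and skips anything
// unhealthy or nearly out of quota, mirroring the intent -> memory ->
// health sequence described above.
func pick(preferred []Candidate, minQuota int) string {
	for _, c := range preferred {
		if c.Healthy && c.QuotaLeft >= minQuota {
			return c.Name
		}
	}
	return "local-fallback" // always have a local answer
}

func main() {
	prefs := []Candidate{
		{Name: "sonnet-3.5", Healthy: true, QuotaLeft: 3}, // preferred, but low quota
		{Name: "deepseek", Healthy: true, QuotaLeft: 500},
	}
	fmt.Println(pick(prefs, 10)) // deepseek
}
```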
This is Part 1 of a series. Next up: A deep dive into the 20ms routing engine that makes this possible.
Sebastian Schkudlara