Routing should feel instantaneous. If you have to wait for your router to “think,” you’ve already lost the game.
The mandate for Cortex Router Phase 2 was simple but brutal: Route intelligently in under 20 milliseconds.
We didn’t want standard, brittle “keyword matching.” We wanted true Semantic Understanding. But we also couldn’t afford the massive latency of an LLM call for every single request.
The solution? A biological architecture. We mimicked the human brain’s own efficiency layers.
The 4 Tiers of Cognition
We built a tiered system where every request fights its way up the evolutionary ladder.
Tier 0: The Semantic Cache (< 1ms)
“Déjà vu.” Before doing any work, the system hits the vector cache. If we’ve seen a semantically similar prompt (Cosine Similarity > 0.95), we don’t think. We react. We replay the exact successful routing decision from last time.
- Latency: effectively zero — a hit replays the cached decision in under a millisecond.
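The cache lookup can be sketched roughly like this. The `RoutingDecision` shape, the `cacheEntry` layout, and the linear scan are illustrative assumptions — a production cache would sit behind Redis or an ANN index rather than scanning a slice:

```go
package main

import "math"

// RoutingDecision is what a cache hit replays (hypothetical shape).
type RoutingDecision struct {
	Provider string
	Model    string
}

type cacheEntry struct {
	vec      []float32
	decision RoutingDecision
}

// SemanticCache stores prompt embeddings alongside past routing decisions.
type SemanticCache struct {
	entries   []cacheEntry
	threshold float32 // 0.95 per the tier description
}

// cosine computes cosine similarity; assumes equal-length vectors.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// Lookup scans cached embeddings; any match above the threshold
// short-circuits the rest of the routing pipeline.
func (c *SemanticCache) Lookup(promptVec []float32) (RoutingDecision, bool) {
	for _, e := range c.entries {
		if cosine(promptVec, e.vec) > c.threshold {
			return e.decision, true
		}
	}
	return RoutingDecision{}, false
}

// Store records a decision for replay on future similar prompts.
func (c *SemanticCache) Store(vec []float32, d RoutingDecision) {
	c.entries = append(c.entries, cacheEntry{vec, d})
}
```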
Tier 1: The Reflex Tier (< 1ms)
“The Reptilian Brain.” If the cache misses, we drop to optimized Regex patterns for immediate safety.
- Security: Catches PII (API keys, email addresses) before they ever hit a model.
- Safety: Spots massive binary pastes that would choke an LLM.
- Overrides: Respects explicit `model="gpt-4"` demands from the user.
This is the fail-safe. If everything else burns, the Reflex Tier still keeps the traffic moving.
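A minimal sketch of the reflex checks, assuming hypothetical patterns — the key/email regexes, the override syntax, and the binary-paste heuristic here are illustrative, not the production rule set:

```go
package main

import "regexp"

// Illustrative reflex patterns; real coverage would be far broader.
var (
	reAPIKey   = regexp.MustCompile(`\bsk-[A-Za-z0-9]{20,}\b`) // e.g. OpenAI-style keys
	reEmail    = regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.]+`)
	reOverride = regexp.MustCompile(`model\s*=\s*"([^"]+)"`)
)

type ReflexResult struct {
	PIIDetected   bool
	ForcedModel   string // non-empty if the user demanded a specific model
	BinaryBlocked bool
}

// Reflex runs the sub-millisecond checks: PII, binary pastes, explicit overrides.
func Reflex(prompt string) ReflexResult {
	r := ReflexResult{
		PIIDetected: reAPIKey.MatchString(prompt) || reEmail.MatchString(prompt),
	}
	if m := reOverride.FindStringSubmatch(prompt); m != nil {
		r.ForcedModel = m[1]
	}
	// Crude binary-paste heuristic: a high ratio of non-printable bytes.
	nonPrintable := 0
	for _, b := range []byte(prompt) {
		if b < 9 || (b > 13 && b < 32) {
			nonPrintable++
		}
	}
	r.BinaryBlocked = len(prompt) > 0 && nonPrintable*10 > len(prompt)
	return r
}
```

Because this is plain RE2 over the raw prompt, it stays under a millisecond regardless of prompt content — which is exactly what a fail-safe tier needs.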
Tier 2: The Semantic Tier (< 20ms)
“The Instinct.” This is the breakthrough. We run a quantised embedding model via ONNX Runtime directly inside the Go binary. We vectorise your prompt and compare it against a pre-computed registry of 15+ Intents (Coding, Creative, Reasoning) and Skills (SQL Optimization, Go Refactoring).
If we find a high-confidence match (> 0.85), we bypass the classification LLM entirely.
- “Write a Python script…” -> Vector Match: CODING -> Route to Claude 3.5 Sonnet.
- Cost: $0.00. Time: 12ms.
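The match-and-bypass step might reduce to something like this. The `Intent` registry shape and `MatchIntent` helper are assumed names; in the real system the centroids would be pre-computed by the ONNX embedding model:

```go
package main

import "math"

// Intent pairs a label with a pre-computed centroid embedding
// (hypothetical registry shape).
type Intent struct {
	Label    string
	Centroid []float32
}

// cosine computes cosine similarity; assumes equal-length vectors.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// MatchIntent returns the best intent only if it clears the confidence bar
// (0.85 per the article); otherwise the caller falls through to the
// Cognitive Tier.
func MatchIntent(promptVec []float32, registry []Intent, threshold float32) (string, bool) {
	best, bestScore := "", float32(-1)
	for _, in := range registry {
		if s := cosine(promptVec, in.Centroid); s > bestScore {
			best, bestScore = in.Label, s
		}
	}
	return best, bestScore >= threshold
}
```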
Tier 3: The Cognitive Tier (200ms+)
“The Prefrontal Cortex.” Only when the prompt is truly ambiguous do we wake up the Router LLM (Gemma 2 or Haiku). But we changed the game here too. We don’t just ask for a classification. We implemented a Verification Loop.
If the Router is unsure (Confidence < 0.60), it triggers a “Reasoning Check”—a secondary prompt to verify its own logic. “Measure twice, cut once.”
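The verification loop can be sketched as follows. The `Classifier` struct and its function fields are hypothetical stand-ins; in production they would wrap calls to Gemma 2 or Haiku:

```go
package main

// Classification is the Router LLM's answer (hypothetical shape).
type Classification struct {
	Intent     string
	Confidence float64
}

// Classifier abstracts the two LLM calls so the loop is testable.
type Classifier struct {
	Classify func(prompt string) Classification
	Verify   func(prompt string, first Classification) Classification // the "Reasoning Check"
}

// ClassifyWithVerification implements "measure twice, cut once": if the
// first pass scores below 0.60, a secondary prompt asks the model to
// verify its own logic.
func ClassifyWithVerification(c Classifier, prompt string) Classification {
	first := c.Classify(prompt)
	if first.Confidence >= 0.60 {
		return first
	}
	return c.Verify(prompt, first)
}
```

The design choice here is that the expensive second call only fires for the small fraction of prompts the router itself flags as uncertain.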
The Dynamic Matrix: Survival of the Fittest
Static config.yaml files are a relic.
The Intelligence Service runs a background process called the Capability Analyzer. It constantly tests your available providers.
- Is Ollama responding?
- Is the Gemini quota full?
- Is the latency on DeepSeek spiking?
It builds a Dynamic Matrix. When the router says “I need a Coder,” it doesn’t look at a stale config file. It asks the Matrix: “Who is the best available Coder on the network right now?”
If your primary model dies, the system doesn’t error out. It seamlessly degrades (or promotes) the next best “Coder” in the matrix. You never even notice the switch.
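Selection against the matrix might look like this. The `Provider` fields and the `BestFor` helper are illustrative assumptions about what the Capability Analyzer feeds into the matrix:

```go
package main

// Provider is one row of the Dynamic Matrix (hypothetical shape),
// kept fresh by the Capability Analyzer's background probes.
type Provider struct {
	Name      string
	Healthy   bool               // liveness probe passing?
	QuotaFull bool               // rate limit exhausted?
	Skills    map[string]float64 // capability -> fitness score
}

// BestFor picks the highest-scoring healthy provider for a capability,
// so a dead primary silently degrades to the next best candidate.
func BestFor(matrix []Provider, skill string) (Provider, bool) {
	var best Provider
	found := false
	for _, p := range matrix {
		if !p.Healthy || p.QuotaFull {
			continue
		}
		score, ok := p.Skills[skill]
		if !ok {
			continue
		}
		if !found || score > best.Skills[skill] {
			best, found = p, true
		}
	}
	return best, found
}
```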
Data Structures
For the engineers, here’s what the brain looks like in Go:
```go
type IntelligenceService struct {
	discovery  *DiscoveryService
	capability *CapabilityAnalyzer
	matrix     *DynamicMatrixBuilder
	embedding  *EmbeddingEngine // ONNX Runtime
	semantic   *SemanticTier    // Vector logic
	cache      *SemanticCache   // Redis / in-memory
	confidence *ConfidenceScorer
	verifier   *Verifier
}
```
This isn’t just routing. It’s orchestration. It’s the difference between a switchboard operator and a traffic controller.
Sebastian Schkudlara