A proxy without memory is essentially a digital goldfish. It swims around, processes a request, and then immediately forgets that it ever happened.
If you route a complex task to the wrong model and it fails, a standard proxy doesn’t learn from that mistake. If you intervene and manually select a different model, the proxy doesn’t notice your preference.
We looked at the core autonomous agent architecture and asked: What if our infrastructure had a memory?
The Observer-Critic Duality
We adopted a core agentic concept: Separation of Doing and Watching.
- The Observer: Watches every transaction. Records the inputs, the decisions, and most importantly, the outcomes.
- The Critic: Reviews those recorded outcomes and decides how future routing decisions should change.
This duality powers our five new “High-Impact Patterns.”
1. The Memory System 🧠
“I remember you.” We built a persistent log (JSONL) of every routing decision. But it’s not just a log; it’s a Behavioral Profile.
- Observation: “User `sk-123` rejects 80% of Llama 3 coding responses.”
- Action: “Automatically downgrade Llama 3 confidence for `sk-123` on future coding tasks.”
Now, when you come back, the system is already tuned to you.
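A minimal sketch of how such a behavioral profile could be built. The `memory.jsonl` path, the `log_decision` / `confidence_adjustments` helpers, and the record schema are illustrative assumptions, not Cortex's actual API:

```python
import json
from collections import defaultdict
from pathlib import Path

MEMORY_PATH = Path("memory.jsonl")  # assumed location of the routing log

def log_decision(api_key: str, task: str, model: str, accepted: bool) -> None:
    """Append one routing decision and its outcome to the persistent JSONL log."""
    entry = {"api_key": api_key, "task": task, "model": model, "accepted": accepted}
    with MEMORY_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def confidence_adjustments(min_samples: int = 5, threshold: float = 0.8) -> dict:
    """Behavioral profile: downgrade (key, task, model) combos the user keeps rejecting."""
    if not MEMORY_PATH.exists():
        return {}
    stats = defaultdict(lambda: [0, 0])  # [rejections, total]
    for line in MEMORY_PATH.read_text().splitlines():
        e = json.loads(line)
        key = (e["api_key"], e["task"], e["model"])
        stats[key][1] += 1
        stats[key][0] += 0 if e["accepted"] else 1
    # e.g. {("sk-123", "coding", "llama3"): -0.2} once the rejection rate crosses 80%
    return {key: -0.2 for key, (rejected, total) in stats.items()
            if total >= min_samples and rejected / total >= threshold}
```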
2. Heartbeat Monitoring ❤️
“Zero Downtime.” Reactive error handling is lazy engineering. Our Heartbeat Monitor pings every provider every 5 minutes. It maintains a Liveness Cache. If Ollama crashes, the monitor spots it. It removes Ollama from the routing pool. When your request comes in 1 second later, it doesn’t fail. It routes to the backup. You (the user) never even knew there was a crisis.
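A rough sketch of a heartbeat loop feeding a liveness cache. The provider endpoints and the `is_alive` / `routing_pool` names are assumptions for illustration:

```python
import threading
import time
import urllib.error
import urllib.request

# Assumed health-check endpoints; real providers would differ.
PROVIDERS = {
    "openai": "https://api.openai.com/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}
LIVENESS = {name: True for name in PROVIDERS}  # the liveness cache

def is_alive(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered (even a 401 means it is up)
    except OSError:
        return False  # refused connection, timeout, DNS failure, ...

def heartbeat(interval: int = 300) -> None:
    """Every 5 minutes, ping each provider and refresh the liveness cache."""
    while True:
        for name, url in PROVIDERS.items():
            LIVENESS[name] = is_alive(url)
        time.sleep(interval)

def routing_pool() -> list[str]:
    """Only providers that passed the latest heartbeat stay in the pool."""
    return [name for name, alive in LIVENESS.items() if alive]

threading.Thread(target=heartbeat, daemon=True).start()
```

Because the cache is refreshed in the background, the routing decision itself never has to wait on a failing provider.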
3. Steering Rules ⚙️
Sometimes you don’t want AI magic. You want deterministic rules. We introduced `steering.yaml`.
```yaml
rules:
  - if: context.contains("CONFIDENTIAL")
    then: use_model("local:llama3")
  - if: time.between("09:00", "17:00")
    then: prefer("fastest")
```
It’s Policy-as-Code for your AI gateway.
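The exact semantics of the conditions aren't spelled out here, but a toy evaluator that handles just the `context.contains(...)` and `time.between(...)` forms above (assuming PyYAML for parsing; `apply_steering` is a hypothetical name) might look like:

```python
from datetime import datetime

import yaml  # assumes PyYAML is installed

def load_rules(path: str = "steering.yaml") -> list[dict]:
    with open(path) as f:
        return yaml.safe_load(f)["rules"]

def apply_steering(context: str, rules: list[dict]) -> str | None:
    """Return the first matching directive, or None to fall back to the router."""
    now = datetime.now().strftime("%H:%M")
    for rule in rules:
        cond, action = rule["if"], rule["then"]
        # context.contains("..."): match if the quoted marker appears in the prompt
        if cond.startswith("context.contains(") and cond.split('"')[1] in context:
            return action
        # time.between("HH:MM", "HH:MM"): match if the current time is in the window
        if cond.startswith("time.between("):
            start, end = cond.split('"')[1], cond.split('"')[3]
            if start <= now <= end:
                return action
    return None
```

With the rules above, a prompt containing "CONFIDENTIAL" would return `use_model("local:llama3")` before the AI router is ever consulted.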
4. The Hook System 🪝
“Automate the Pain Away.” This is “GitHub Actions” for your prompts.
- Trigger: `quota_warning(Provider: OpenAI)`
- Action: `slack_notify("Refilling credits…")`
- Action: `switch_profile("Use OpenRouter Backup")`
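A minimal sketch of how such a hook registry could work; the `on` decorator, the `fire` function, and the action bodies are hypothetical stand-ins for real Slack and profile integrations:

```python
from collections import defaultdict
from typing import Callable

HOOKS: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def on(trigger: str):
    """Register an action to run when a trigger fires."""
    def register(action: Callable[[dict], None]):
        HOOKS[trigger].append(action)
        return action
    return register

def fire(trigger: str, event: dict) -> None:
    """Run every action registered for this trigger, in order."""
    for action in HOOKS[trigger]:
        action(event)

@on("quota_warning")
def notify(event: dict) -> None:
    print(f"slack_notify: Refilling credits for {event['provider']}…")

@on("quota_warning")
def failover(event: dict) -> None:
    print("switch_profile: Use OpenRouter Backup")

fire("quota_warning", {"provider": "OpenAI"})
```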
5. The Learning Engine 📈
“Compound Interest on Intelligence.” This is the long game. The Learning Engine analyzes the Memory logs offline. It looks for patterns—biases, recurring failures, hidden successes. It updates the Confidence Weights in the Cortex Router. The system literally gets smarter the longer you use it.
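An offline analysis pass might look like the sketch below. It reuses the illustrative JSONL schema from the memory example above, and the `retrain_weights` name and `weights.json` output file are assumptions:

```python
import json
from collections import defaultdict
from pathlib import Path

def retrain_weights(memory_path: str = "memory.jsonl",
                    weights_path: str = "weights.json") -> dict:
    """Offline pass: turn observed success rates into router confidence weights."""
    outcomes = defaultdict(lambda: [0, 0])  # [successes, total] per (task, model)
    for line in Path(memory_path).read_text().splitlines():
        e = json.loads(line)
        key = f'{e["task"]}:{e["model"]}'
        outcomes[key][1] += 1
        outcomes[key][0] += int(e["accepted"])
    weights = {key: round(wins / total, 3)
               for key, (wins, total) in outcomes.items() if total}
    Path(weights_path).write_text(json.dumps(weights, indent=2))
    return weights
```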
The Half-Written Brain Problem
Building stateful systems is dangerous. What if power fails while writing memory? We solved this with the Atomic Durability Pattern (inspired by database design).
- Write to `memory.json.tmp`.
- `fsync()` (force flush to disk hardware).
- Atomic `rename()` to `memory.json`.
We never corrupt the brain.
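A minimal, POSIX-flavored sketch of that write-temp / fsync / atomic-rename sequence (the `save_memory` helper is illustrative):

```python
import json
import os

def save_memory(state: dict, path: str = "memory.json") -> None:
    """Crash-safe save: write a temp file, flush it to disk, then atomically rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(json.dumps(state))
        f.flush()
        os.fsync(f.fileno())   # force the bytes onto the disk hardware
    os.replace(tmp_path, path)  # atomic rename: readers see old or new, never half-written
    # Also fsync the directory so the rename itself survives power loss (POSIX).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```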
Conclusion
We are moving away from “Smart Proxies.” A proxy is a middleman. We are building an Intelligent Partner. One that watches your back, learns your quirks, and fixes problems before you even know they exist.