A proxy without memory is essentially a digital goldfish. It swims around, processes a request, and then immediately forgets that it ever happened.
If you route a complex task to the wrong model and it fails, a standard proxy doesn’t learn from that mistake. If you intervene and manually select a different model, the proxy doesn’t notice your preference.
We looked at the core autonomous agent architecture and asked: What if our infrastructure had a memory?
The Observer-Critic Duality
We adopted a core agentic concept: Separation of Doing and Watching.
- The Observer: Watches every transaction. Records the inputs, the decisions, and most importantly, the outcomes.
- The Critic: Reviews those recorded outcomes and decides how future routing decisions should change.
This duality powers our five new “High-Impact Patterns.”
1. The Memory System 🧠
“I remember you.” We built a persistent log (JSONL) of every routing decision. But it’s not just a log; it’s a Behavioral Profile.
- Observation: “User `sk-123` rejects 80% of Llama 3 coding responses.”
- Action: “Automatically downgrade Llama 3 confidence for `sk-123` on future coding tasks.”
Now, when you come back, the system is already tuned to you.
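A minimal sketch of how such a behavioral profile could be built. The `memory.jsonl` path, the `log_decision` / `confidence_adjustments` helpers, and the record schema are illustrative assumptions, not Cortex's actual API:

```python
import json
from collections import defaultdict
from pathlib import Path

MEMORY_PATH = Path("memory.jsonl")  # assumed location of the routing log

def log_decision(api_key: str, task: str, model: str, accepted: bool) -> None:
    """Append one routing decision and its outcome to the persistent JSONL log."""
    entry = {"api_key": api_key, "task": task, "model": model, "accepted": accepted}
    with MEMORY_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def confidence_adjustments(min_samples: int = 5, threshold: float = 0.8) -> dict:
    """Behavioral profile: downgrade (key, task, model) combos the user keeps rejecting."""
    if not MEMORY_PATH.exists():
        return {}
    stats = defaultdict(lambda: [0, 0])  # [rejections, total]
    for line in MEMORY_PATH.read_text().splitlines():
        e = json.loads(line)
        key = (e["api_key"], e["task"], e["model"])
        stats[key][1] += 1
        stats[key][0] += 0 if e["accepted"] else 1
    # e.g. {("sk-123", "coding", "llama3"): -0.2} once the rejection rate crosses 80%
    return {key: -0.2 for key, (rejected, total) in stats.items()
            if total >= min_samples and rejected / total >= threshold}
```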
2. Heartbeat Monitoring ❤️
“Zero Downtime.” Reactive error handling is lazy engineering. Our Heartbeat Monitor pings every provider every 5 minutes. It maintains a Liveness Cache. If Ollama crashes, the monitor spots it. It removes Ollama from the routing pool. When your request comes in 1 second later, it doesn’t fail. It routes to the backup. You (the user) never even knew there was a crisis.
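A rough sketch of a heartbeat loop feeding a liveness cache. The provider endpoints and the `is_alive` / `routing_pool` names are assumptions for illustration:

```python
import threading
import time
import urllib.error
import urllib.request

# Assumed health-check endpoints; real providers would differ.
PROVIDERS = {
    "openai": "https://api.openai.com/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}
LIVENESS = {name: True for name in PROVIDERS}  # the liveness cache

def is_alive(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered (even a 401 means it is up)
    except OSError:
        return False  # refused connection, timeout, DNS failure, ...

def heartbeat(interval: int = 300) -> None:
    """Every 5 minutes, ping each provider and refresh the liveness cache."""
    while True:
        for name, url in PROVIDERS.items():
            LIVENESS[name] = is_alive(url)
        time.sleep(interval)

def routing_pool() -> list[str]:
    """Only providers that passed the latest heartbeat stay in the pool."""
    return [name for name, alive in LIVENESS.items() if alive]

threading.Thread(target=heartbeat, daemon=True).start()
```

Because the cache is refreshed in the background, the routing decision itself never has to wait on a failing provider.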
3. Steering Rules ⚙️
Sometimes you don’t want AI magic. You want deterministic rules. We introduced `steering.yaml`.
```yaml
rules:
  - if: context.contains("CONFIDENTIAL")
    then: use_model("local:llama3")
  - if: time.between("09:00", "17:00")
    then: prefer("fastest")
```
It’s Policy-as-Code for your AI gateway.
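The exact semantics of the conditions aren't spelled out here, but a toy evaluator that handles just the `context.contains(...)` and `time.between(...)` forms above (assuming PyYAML for parsing; `apply_steering` is a hypothetical name) might look like:

```python
from datetime import datetime

import yaml  # assumes PyYAML is installed

def load_rules(path: str = "steering.yaml") -> list[dict]:
    with open(path) as f:
        return yaml.safe_load(f)["rules"]

def apply_steering(context: str, rules: list[dict]) -> str | None:
    """Return the first matching directive, or None to fall back to the router."""
    now = datetime.now().strftime("%H:%M")
    for rule in rules:
        cond, action = rule["if"], rule["then"]
        # context.contains("..."): match if the quoted marker appears in the prompt
        if cond.startswith("context.contains(") and cond.split('"')[1] in context:
            return action
        # time.between("HH:MM", "HH:MM"): match if the current time is in the window
        if cond.startswith("time.between("):
            start, end = cond.split('"')[1], cond.split('"')[3]
            if start <= now <= end:
                return action
    return None
```

With the rules above, a prompt containing "CONFIDENTIAL" would return `use_model("local:llama3")` before the AI router is ever consulted.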
4. The Hook System 🪝
“Automate the Pain Away.” This is “GitHub Actions” for your prompts.
- Trigger: `quota_warning(Provider: OpenAI)`
- Action: `slack_notify("Refilling credits…")`
- Action: `switch_profile("Use OpenRouter Backup")`
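A minimal sketch of how such a hook registry could work; the `on` decorator, the `fire` function, and the action bodies are hypothetical stand-ins for real Slack and profile integrations:

```python
from collections import defaultdict
from typing import Callable

HOOKS: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def on(trigger: str):
    """Register an action to run when a trigger fires."""
    def register(action: Callable[[dict], None]):
        HOOKS[trigger].append(action)
        return action
    return register

def fire(trigger: str, event: dict) -> None:
    """Run every action registered for this trigger, in order."""
    for action in HOOKS[trigger]:
        action(event)

@on("quota_warning")
def notify(event: dict) -> None:
    print(f"slack_notify: Refilling credits for {event['provider']}…")

@on("quota_warning")
def failover(event: dict) -> None:
    print("switch_profile: Use OpenRouter Backup")

fire("quota_warning", {"provider": "OpenAI"})
```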
5. The Learning Engine 📈
“Compound Interest on Intelligence.” This is the long game. The Learning Engine analyzes the Memory logs offline. It looks for patterns—biases, recurring failures, hidden successes. It updates the Confidence Weights in the Cortex Router. The system literally gets smarter the longer you use it.
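An offline analysis pass might look like the sketch below. It reuses the illustrative JSONL schema from the memory example above, and the `retrain_weights` name and `weights.json` output file are assumptions:

```python
import json
from collections import defaultdict
from pathlib import Path

def retrain_weights(memory_path: str = "memory.jsonl",
                    weights_path: str = "weights.json") -> dict:
    """Offline pass: turn observed success rates into router confidence weights."""
    outcomes = defaultdict(lambda: [0, 0])  # [successes, total] per (task, model)
    for line in Path(memory_path).read_text().splitlines():
        e = json.loads(line)
        key = f'{e["task"]}:{e["model"]}'
        outcomes[key][1] += 1
        outcomes[key][0] += int(e["accepted"])
    weights = {key: round(wins / total, 3)
               for key, (wins, total) in outcomes.items() if total}
    Path(weights_path).write_text(json.dumps(weights, indent=2))
    return weights
```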
The Half-Written Brain Problem
Building stateful systems is dangerous. What if power fails while writing memory? We solved this with the Atomic Durability Pattern (inspired by database design).
- Write to `memory.json.tmp`.
- `fsync()` (force flush to disk hardware).
- Atomic `rename()` to `memory.json`.
We never corrupt the brain.
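A minimal, POSIX-flavored sketch of that write-temp / fsync / atomic-rename sequence (the `save_memory` helper is illustrative):

```python
import json
import os

def save_memory(state: dict, path: str = "memory.json") -> None:
    """Crash-safe save: write a temp file, flush it to disk, then atomically rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(json.dumps(state))
        f.flush()
        os.fsync(f.fileno())   # force the bytes onto the disk hardware
    os.replace(tmp_path, path)  # atomic rename: readers see old or new, never half-written
    # Also fsync the directory so the rename itself survives power loss (POSIX).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```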
Conclusion
We are moving away from “Smart Proxies.” A proxy is a middleman. We are building an Intelligent Partner. One that watches your back, learns your quirks, and fixes problems before you even know they exist.