To understand how teams keep this under control in production, we sat down with Scott Breitenother and Sid Sijbrandij, co-founders of Kilo, an open-source coding agent that runs through a lot of these loops every day.
The caching finding surprised me: 80%+ reuse and the bill still stays high, when most of us treat caching as the main lever. The point that lands hardest is "route on a signal you already have": most teams know whether a call is planning vs. a simple edit but they just never pass that down to the router.
Curious how you handle the quality hit when a tier swaps model families mid-task and has to drop the intermediate reasoning... do you tend to re-plan or just eat the context loss?
Very detailed, and useful for figuring out how to navigate this chaotic landscape of choices for your build.
https://substack.com/@zpftechnologies?r=86qmqm&utm_medium=ios&utm_source=stories&shareImageVariant=blur
Step into the frontier of quantum discoveries. Join our community advancing boundary-conditioned ZPF research—subscribe to support the mission and stay connected.
Love this. Thank you.
Great framing! Static tier routing is table stakes now; the next layer is dynamic: factor in live provider health, rate limits, and whether the cheaper model actually produced equivalent outcomes.
That's what we're building at LOCO-Agent (open source):
github.com/ArielSmoliar/loco-agent