5 Comments
Mitchell Kosowski

The caching finding surprised me: 80%+ reuse and the bill still stays high, when most of us treat caching as the main lever. The point that lands hardest is "route on a signal you already have": most teams know whether a call is planning vs. a simple edit but they just never pass that down to the router.

Curious how you handle the quality hit when a tier swaps model families mid-task and has to drop the intermediate reasoning... do you tend to re-plan or just eat the context loss?
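To make the "route on a signal you already have" point concrete, here is a minimal sketch of what passing that signal down could look like. It assumes the caller already knows the task kind; the tier names (`large-tier`, `small-tier`) and the `Call` and `route` names are hypothetical placeholders, not anything from the article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskKind(Enum):
    PLANNING = "planning"
    SIMPLE_EDIT = "simple_edit"


# Hypothetical tier names; substitute whatever models your stack actually uses.
TIER_FOR_KIND = {
    TaskKind.PLANNING: "large-tier",
    TaskKind.SIMPLE_EDIT: "small-tier",
}


@dataclass
class Call:
    prompt: str
    # The signal the caller already has; None means it was never passed down.
    kind: Optional[TaskKind] = None


def route(call: Call, default_tier: str = "large-tier") -> str:
    """Pick a tier from the caller-supplied task kind, else use the default."""
    if call.kind is None:
        return default_tier
    return TIER_FOR_KIND.get(call.kind, default_tier)
```

The only design choice here is that an untagged call falls back to the larger tier, so a missing signal costs money rather than quality.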

Clint Cain

Very detailed, and it helps identify how to navigate this chaotic landscape of choices for your build.

ZPF

https://substack.com/@zpftechnologies?r=86qmqm&utm_medium=ios&utm_source=stories&shareImageVariant=blur

Step into the frontier of quantum discoveries. Join our community advancing boundary-conditioned ZPF research—subscribe to support the mission and stay connected.

Jay Webster

Love this. Thank you.

Ariel Smoliar

Great framing! Static tier routing is table stakes now; the next layer is dynamic: factor in live provider health, rate limits, and whether the cheaper model actually produced equivalent outcomes.

That's what we're building at LOCO-Agent (open source):

github.com/ArielSmoliar/loco-agent