Discussion about this post

Scenarica

The honest answer to your closing question is the one that isn't on the list: the exit condition. The while-loop framing is correct, but it hides the thing that actually kills agents in production. Knowing when to stop is significantly harder than knowing what to do next. An agent that can't recognise that its output is good enough, or that the task is genuinely impossible, will burn tokens, take destructive actions, or loop indefinitely. Planning, tools, and memory all solve the "what next" problem. The exit condition solves the "when to stop" problem, and in my experience that's where roughly 70% of production failures originate.
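
To make the point concrete, here is a minimal sketch of a loop that carries its stop conditions explicitly instead of relying on `while True`. Everything in it is hypothetical: `run_agent`, `step`, `is_good_enough`, and `StopReason` are illustrative names, not part of the post or of any framework, and the convention that `step` returns `None` when it believes the task is impossible is an assumption of this sketch.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class StopReason(Enum):
    DONE = "output judged good enough"
    IMPOSSIBLE = "task judged impossible"
    BUDGET = "step budget exhausted"


def run_agent(
    step: Callable[[Optional[str]], Optional[str]],
    is_good_enough: Callable[[str], bool],
    max_steps: int = 10,
) -> Tuple[StopReason, Optional[str]]:
    """Run `step` until one of three explicit exit conditions fires.

    `step` receives the previous output and returns a new one, or None
    to signal that it believes the task cannot be completed (an
    assumption of this sketch, not a standard convention).
    """
    output: Optional[str] = None
    for _ in range(max_steps):
        candidate = step(output)
        if candidate is None:
            # Exit 1: the agent gives up instead of looping forever.
            return StopReason.IMPOSSIBLE, output
        output = candidate
        if is_good_enough(output):
            # Exit 2: the output passes the "good enough" check.
            return StopReason.DONE, output
    # Exit 3: hard budget, so a confused agent cannot burn tokens indefinitely.
    return StopReason.BUDGET, output
```

The caller always learns why the loop ended, which makes it possible to treat "ran out of budget" differently from "finished".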

The memory split is also missing a category that's more important in practice than either short-term or long-term: episodic memory, a record of what I tried, what failed, and why. Without it, an agent in a failing loop will attempt the same broken approach repeatedly because it has no record of having already tried it. The Reflexion mention in planning gets closest, but in production episodic memory is the difference between an agent that converges on a solution and one that oscillates between the same two broken states until you kill it manually.
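
A rough sketch of what that record could look like, assuming approaches can be identified by a plain string key. `EpisodicMemory` and `next_untried` are hypothetical names made up for illustration, not an API from Reflexion or any agent library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EpisodicMemory:
    """Records each attempted approach and why it failed."""

    failures: Dict[str, str] = field(default_factory=dict)

    def record_failure(self, approach: str, reason: str) -> None:
        self.failures[approach] = reason

    def already_failed(self, approach: str) -> bool:
        return approach in self.failures


def next_untried(candidates: List[str], memory: EpisodicMemory) -> Optional[str]:
    """Return the first candidate with no recorded failure, or None."""
    for approach in candidates:
        if not memory.already_failed(approach):
            return approach
    # Every candidate has failed before: a signal to stop or escalate,
    # not to start the same cycle again.
    return None
```

When `next_untried` returns `None`, the agent has run out of new ideas, which is itself a useful stop signal.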

Ex-Consultant in Tech

I think the hardest part is evaluation. Most agent writeups assume the agent knows what “good” means. In practice, that’s the squishiest part. The model can make a plan, call tools, summarize results, and still be optimizing for the wrong definition of done.

That’s where agents get weird. If you don’t externalize that judgment into checks, tests, rubrics, budgets, constraints, and human review points, the agent just invents its own grading system.
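
As a small illustration of externalizing that judgment, the sketch below keeps the definition of done in named checks that live outside the model. The check names and thresholds are invented for the example, and `evaluate` is a hypothetical helper, not a function from any evaluation library.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]


def evaluate(output: str, checks: Dict[str, Check]) -> List[str]:
    """Return the names of the checks the output fails (empty list means done)."""
    return [name for name, check in checks.items() if not check(output)]


# Hypothetical rubric for a summarization task; the limits are arbitrary.
checks: Dict[str, Check] = {
    "non_empty": lambda out: bool(out.strip()),
    "within_length_budget": lambda out: len(out) <= 200,
    "mentions_source": lambda out: "source:" in out.lower(),
}
```

Because the failing check names come back as data, they can be fed to the agent as feedback or routed to a human review point.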
