“Nobody at Anthropic programmed Claude to think a certain way. They trained it on data, and it developed its own strategies, buried inside billions of computations. For the people who built it, this could feel like an uncomfortable black box. Therefore, they decided to build something like a microscope for AI, a set of tools that would let them trace the actual computational steps Claude takes when it produces an answer”
This was a really well-worded explanation right there in the opener. I think way too many people have misconstrued this fact. The emergent properties of AI are one of the things that make these systems so interesting.
"The philosopher Harry Frankfurt had a word for this kind of output. He called it bullshitting."
I'm not a scientist, not a coder, not a computer guy - nothing to describe why I am even here. Maybe I just stranded here because I'm genuinely curious and hence interested in the human mind... and this line above was written for me. Thanks! I'll copy this for further discussions with stubborn but self-admitting 'intellectuals'.
That line itself made me think: ah, it's learning to be like us humans.
Hey, check your DMs.
Great breakdown of Claude's reasoning patterns. We went a step further and analyzed the API traffic to extract the actual system prompts, all 24 tools, and turn-by-turn session traces. If you want to see what makes it work under the hood: https://agenticloopsai.substack.com/p/disassembling-ai-agents-part-2-claude
Any possibility that Claude stole the "rabbit" / "grab it" rhyme from Public Enemy? https://genius.com/Public-enemy-dont-believe-the-hype-lyrics
Claude thinks exactly like I think! I narrow down the parameters in which an answer will be found and then I zero in on the exact answer! And Claude agrees with me that’s the way to do it!
Great breakdown. I've been working on a harness of sorts, trying to understand whether, if you let AI do what it wants in a repo, a human-in-the-loop is even necessary with the right harness - for self-healing repositories.
However, it has led me down a rabbit hole of burnt tokens and confusion. I need to read more on the topic. I think harness engineering will be the future standard, and I'm starting to think it all comes down to the harness.
Check it out or fork it if you want a nice challenge!
https://github.com/dkyazzentwatwa/flow-healer
200 agents on one codebase is a coordination problem before it's an engineering problem — at some point you're not shipping code, you're running a small economy with merge conflicts as the currency.
The hallucination-as-misfired-recognition finding is the one that stuck with me. I run Claude sessions continuously (hundreds per week, automated) and the pattern is consistent: hallucinations cluster around entities the model almost recognizes.
Names that are close to real people, URLs that are plausible but wrong, version numbers from adjacent releases. Knowing it's a "known entity" feature misfiring rather than random fabrication actually changed how I structure prompts.
I front-load specific identifiers now (exact version numbers, full URLs) so the recognition circuit fires correctly instead of guessing. Reduced hallucinations noticeably. The poetry planning finding is wild though. Planning rhymes before generating intermediate lines suggests something closer to intent than pure next-token prediction.
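For anyone wanting to try this, here is a minimal sketch of what "front-loading exact identifiers" can look like in practice. The `build_prompt` helper, the package name, and the URL are all hypothetical illustrations, not anything from the original post or a real library:

```python
# Hedged sketch: prepend exact, verbatim identifiers (versions, URLs) to a
# prompt so the model anchors on known entities instead of pattern-matching
# its way to a plausible-but-wrong near miss. All names below are made up.

def build_prompt(question: str, facts: dict[str, str]) -> str:
    """Front-load exact identifiers before the actual question."""
    header = "\n".join(f"{key}: {value}" for key, value in facts.items())
    return f"Known facts (use these verbatim):\n{header}\n\nQuestion: {question}"

prompt = build_prompt(
    "Summarize the breaking changes in this release.",
    {
        "package": "example-lib",                                # hypothetical
        "version": "2.14.0",                                     # exact, not "latest 2.x"
        "changelog": "https://example.com/changelog/2.14.0",     # full URL, not a stub
    },
)
print(prompt)
```

The design choice is simply that every fact the model might otherwise "almost recognize" is stated exactly once, up front, before the open-ended part of the request.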
Excellent post! Loved the insights.
I won't lie: I'm a bit surprised and concerned that they created a Frankenstein without knowing how it works.
Here’s the thing nobody tells you when you graduate from “I deploy to a VPS” to “I’m cloud-native now”:
Kubernetes is not a more reliable version of your old server. It’s a fundamentally different relationship with reliability. And if you approach it the same way, your pods will keep dying and you’ll keep losing sleep.
Let’s talk about it.
https://rakiabensassi.substack.com/p/the-kubernetes-mortality-rate-everything?utm_campaign=post-expanded-share&utm_medium=web