Discussion about this post

Jorrit

Very interesting take. I have written this article about Cloudflare and would be happy if you could check it out :)

https://marketsminds.substack.com/p/deep-dive-cloudflare-the-internets?utm_campaign=post-expanded-share&utm_medium=post%20viewer

Pawel Jozefiak

Strong point, and it matches my own tests. I tested Sonnet 4.6 on real coding and long-context workflows, and the pattern was consistent: results improved when I enforced agent handoff boundaries and explicit success checks. When context got noisy, reliability dropped even if raw speed looked good. I documented both experiments, including a personal context test, here: https://thoughts.jock.pl/p/sonnet-46-two-experiments-one-got-personal

Have you found a reliable way to catch silent failures early? I would love to compare notes on your exact setup.
