<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[ByteByteGo Newsletter]]></title><description><![CDATA[Explain complex systems with simple terms, from the authors of the best-selling system design book series. Join over 1,000,000 friendly readers.]]></description><link>https://blog.bytebytego.com</link><image><url>https://substackcdn.com/image/fetch/$s_!1eXV!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8a5609ae-1239-4400-9491-6010a15c4d60_504x504.png</url><title>ByteByteGo Newsletter</title><link>https://blog.bytebytego.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 20 Jun 2026 01:53:16 GMT</lastBuildDate><atom:link href="https://blog.bytebytego.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[ByteByteGo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[alex@bytebytego.com]]></webMaster><itunes:owner><itunes:email><![CDATA[alex@bytebytego.com]]></itunes:email><itunes:name><![CDATA[Alex Xu]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alex Xu]]></itunes:author><googleplay:owner><![CDATA[alex@bytebytego.com]]></googleplay:owner><googleplay:email><![CDATA[alex@bytebytego.com]]></googleplay:email><googleplay:author><![CDATA[Alex Xu]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Observability for Beginners: Logs, Metrics, Traces, and Everything Around Them]]></title><description><![CDATA[In this article, we will look at the basics of observability in detail with concepts like logs, metrics, and traces explained in 
detail.]]></description><link>https://blog.bytebytego.com/p/observability-for-beginners-logs</link><guid isPermaLink="false">https://blog.bytebytego.com/p/observability-for-beginners-logs</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Thu, 18 Jun 2026 15:31:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!r5Ej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Observability for Beginners: Logs, Metrics, Traces, and Everything Around Them</span></p><p style="text-align: justify;"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">A running service generates events constantly.</span></p><p style="text-align: justify;"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Requests arrive, functions run, errors appear, and each one is a thing that happened at a specific time with a specific context and a specific outcome.</span></p><p style="text-align: justify;"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Logs, metrics, and traces are three ways of looking at this same stream. A log captures one event as a line of text, a metric counts or aggregates many events, and a trace links related events as they move across services. 
Most of the concepts in observability, including cardinality, sampling, and correlation, are consequences of this structure.</span></p><p style="text-align: justify;"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">In this article, we will look at the basics of observability in detail with concepts like logs, metrics, and traces explained in detail.</span></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r5Ej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r5Ej!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 424w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 848w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 1272w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r5Ej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png" width="1456" height="1686" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1686,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:699245,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/202526911?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r5Ej!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 424w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 848w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 1272w, https://substackcdn.com/image/fetch/$s_!r5Ej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefd64dd9-b0a6-4fc8-8e27-2a929a3b5eef_2650x3068.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2 style="text-align: justify;"><span data-color="rgb(0, 0, 0)" style="color: rgb(0, 0, 0);">Events</span></h2>
      <p>
          <a href="https://blog.bytebytego.com/p/observability-for-beginners-logs">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[LAST CALL FOR ENROLLMENT: Build with Claude Code - Cohort 2]]></title><description><![CDATA[We&#8217;re launching Cohort 2 of our 2-day intensive, cohort-based course, Build with Claude Code, taught by John Kim, who has trained hundreds of engineers at Meta to use Claude Code in real production workflows.]]></description><link>https://blog.bytebytego.com/p/last-call-for-enrollment-build-with</link><guid isPermaLink="false">https://blog.bytebytego.com/p/last-call-for-enrollment-build-with</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Wed, 17 Jun 2026 15:31:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wXDW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;re launching Cohort 2 of our 2-day intensive, cohort-based course, <strong>Build with Claude Code</strong>, taught by John Kim, who has trained hundreds of engineers at Meta to use Claude Code in real production workflows.</p><p>The course kicks off on June 18th, and <strong>enrollment closes in less than 24 hours</strong>. 
If you&#8217;ve been thinking about leveling up how you and your team work with Claude Code, this is the moment.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c2-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/claude-c2-substack"><span>Check it out now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wXDW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wXDW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png" width="1000" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!wXDW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1456w" sizes="100vw" loading="lazy" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"></figcaption></figure></div><p>A few things you&#8217;ll learn:</p><ul><li><p>The agentic loop, context engineering, and memory layers that make Claude Code useful for real projects</p></li><li><p>How to build with Claude Code Skills, MCPs, and hooks to give Claude the tools and feedback loops it needs to self-correct</p></li><li><p>Parallel development with Git worktrees, subagents, and agent teams</p></li><li><p>A capstone project where you ship something real on your own stack</p></li></ul><p>The course includes live sessions, assignments, and office hours, so there&#8217;s plenty of room to ask questions and get unstuck.</p><p>The second cohort runs June 18 to 19, 2026. 
If you want to learn everything from the fundamentals of Claude Code to advanced production workflows, including working with large codebases, this could be a great way to level up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c2-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/claude-c2-substack"><span>Check it out now</span></a></p>]]></content:encoded></item><item><title><![CDATA[How Open-Weight Models Changed the AI Landscape]]></title><description><![CDATA[In this article, we will look at how open-weight models have transformed the AI landscape.]]></description><link>https://blog.bytebytego.com/p/how-open-weight-models-changed-the</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-open-weight-models-changed-the</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Tue, 16 Jun 2026 15:31:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5d9m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/Datadog_061626">5 Facts on Real World DevSecOps in 2026 (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Datadog_061626" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6mN2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 424w, 
https://substackcdn.com/image/fetch/$s_!6mN2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!6mN2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!6mN2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6mN2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png" width="1200" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:237118,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Datadog_061626&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6mN2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!6mN2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!6mN2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!6mN2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcc7206d-ad63-4375-92a7-7597a803b121_1200x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Datadog analyzed tens of thousands of production applications to reveal where risk is actually showing up and what it means for teams handling security today. Download the full report to dive into the key findings, including:</p><ul><li><p>Why 87% of orgs have exploitable vulnerabilities in production, and how end-of-life runtimes and outdated dependencies are quietly driving that number</p></li><li><p>How to cut alert noise by 80% by focusing only on vulnerabilities that pose real business risk</p></li><li><p>The hidden dangers of day-one library, AMI, and Docker image updates, and how supply chain attacks exploit them</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Datadog_061626&quot;,&quot;text&quot;:&quot;Get the report&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/Datadog_061626"><span>Get the report</span></a></p><div><hr></div><p style="text-align: justify;">In December 2024, an AI lab called DeepSeek released a 671-billion-parameter language model along with a technical report describing exactly how they built it. Six months later, a different team called Moonshot AI used that report as a starting point. They scaled the design to a trillion parameters, ran into a training instability problem that emerged at that scale, invented a new optimizer to solve it, and shipped their own model. 
Eight months after that, a third team called Zhipu AI integrated a different DeepSeek innovation into their architecture and contributed a new training framework of their own.</p><p style="text-align: justify;">These three teams belong to different organizations. Even so, they were indirectly collaborating in public through model releases, with each company learning from what its predecessors had published. This has been made possible by the rise of open-weight models, where even competitors get to learn from each other. The pace of that kind of collaboration has changed over the past eighteen months, and the reasons trace back to the architecture and training choices these teams made in the open.</p><p style="text-align: justify;">In this article, we will look at how open-weight models have transformed the AI landscape.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from various sources. Please comment if you notice any inaccuracies.</em></p><h2 style="text-align: justify;">Open Weight vs Closed Weight</h2><p style="text-align: justify;">Every modern large language model has two important things behind it:</p><ul><li><p style="text-align: justify;">The first is the trained parameters, which are the numbers, often hundreds of billions of them, that the model learned during training. The parameters are what make the model &#8220;know&#8221; things.</p></li><li><p style="text-align: justify;">The second is everything that produced those parameters, including the training data and the training code.</p></li></ul><p style="text-align: justify;">A closed-weight model is one that the company keeps behind an API. The user sends some text to the official API endpoint, the company&#8217;s servers run the model on their hardware, and a response comes back. 
The parameters stay with the organization, and running the model on personal hardware or adjusting it for a specific task remains out of reach.</p><p style="text-align: justify;">An open-weight model is one where the company has published the trained parameters. Anyone can download them, run the model on their own hardware, and adjust it for their own data. The training data and the full training code, however, usually stay private.</p><p style="text-align: justify;">The term is &#8220;open weight&#8221; rather than &#8220;open source&#8221; for this reason.</p><p style="text-align: justify;">In traditional software, &#8220;open source&#8221; means the full source code is available to inspect and reproduce. Most AI models marketed as open source are actually open weight, where the trained model is public, while the process that produced it remains closed. This distinction matters because the published weights, paired with detailed technical reports, are what allow other teams to study a design and build on top of it.</p><p style="text-align: justify;">Different open-weight models also use different licenses, ranging from very permissive ones like MIT and Apache 2.0 to custom licenses with specific commercial restrictions, so the practical freedoms vary across the ecosystem.</p><p style="text-align: justify;">See the diagram below that shows the difference between accessing a closed-weight model and an open-weight model:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5d9m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5d9m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 424w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 848w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5d9m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d908f910-b679-4901-b51e-9a1204909118_2182x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:148342,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5d9m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 424w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 848w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!5d9m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd908f910-b679-4901-b51e-9a1204909118_2182x1096.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2 style="text-align: justify;">The MoE Architecture</h2><p style="text-align: justify;">Every major open-weight LLM released at the frontier in 2025 and 2026 shares the same architectural skeleton. It is called a Mixture-of-Experts transformer, or MoE for short.</p><p style="text-align: justify;">Modern LLMs are built from stacked &#8220;blocks.&#8221; Each block has two main parts: an attention layer that figures out which previous words matter for the next one, and a feed-forward layer that then transforms each word&#8217;s representation independently.</p><p style="text-align: justify;">In a regular (&#8220;dense&#8221;) model, every parameter activates for every word the model processes, so adding parameters to make the model smarter raises the cost of running it in direct proportion. With hundreds of billions of parameters, this becomes impractical.</p><p style="text-align: justify;">MoE solves this by replacing the single feed-forward layer in each block with several smaller &#8220;expert&#8221; sub-networks, plus a small routing component that picks which experts to use for each word. The model can store knowledge across many experts while only computing a few of them per word.</p><p style="text-align: justify;">This is why two numbers matter for every MoE model:</p><ul><li><p style="text-align: justify;">The first is total parameters, which represent the model&#8217;s full memory footprint and knowledge capacity.</p></li><li><p style="text-align: justify;">The second is the active parameters, which represent how much of the model actually computes per word. 
Active parameters drive inference speed and per-query cost.</p></li></ul><p style="text-align: justify;">DeepSeek V3, for example, has 671 billion total parameters but only 37 billion active per word. Kimi K2 has a trillion total, but 32 billion active. Qwen3 has 235 billion total and 22 billion active. When comparing the cost of running these models, the active counts are what matter, rather than the totals. A trillion-parameter model and a 235-billion-parameter model can cost roughly the same per query if their active counts are similar.</p><p style="text-align: justify;">See the diagram below that shows how an MoE block works:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XZAv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XZAv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 424w, https://substackcdn.com/image/fetch/$s_!XZAv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 848w, https://substackcdn.com/image/fetch/$s_!XZAv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 1272w, https://substackcdn.com/image/fetch/$s_!XZAv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XZAv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png" width="1456" height="963" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:963,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:175382,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XZAv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 424w, https://substackcdn.com/image/fetch/$s_!XZAv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 848w, https://substackcdn.com/image/fetch/$s_!XZAv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XZAv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb5f771f-4357-48c4-a41e-6a6edad58048_2340x1548.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">Beginners often assume that experts in MoE specialize by topic, with a math expert, a code expert, and so on. The reality differs from that picture. The router picks experts per word, rather than per question, and the patterns experts specialize in are mostly outside human interpretation. 
The routing is fine-grained, and a single sentence will pass through many different combinations of experts as it is generated.</p><p style="text-align: justify;">The MoE architecture explains why every frontier open-weight team is using roughly the same approach. The interesting differences lie in the choices teams make within that shared design, and those choices show up most clearly in three places.</p><div><hr></div><h2><a href="https://go.bytebytego.com/Descope_061626">Tips to take AI agents from playground to production (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Descope_061626" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EJG1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 848w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 1272w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EJG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png" width="1456" 
height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1757005,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Descope_061626&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201810207?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!EJG1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 848w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 1272w, https://substackcdn.com/image/fetch/$s_!EJG1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fece6da8b-feff-455a-ab01-9fe5edb0c805_1600x840.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every organization is exploring how to adopt AI agents or MCP servers, but how many of them are in production?</p><p>And if they aren&#8217;t in production, how likely is it that authentication, access control, and agentic identity concerns are the reason?</p><p>Watch this on-demand webinar from Descope to learn:</p><ul><li><p>Real-world MCP and agentic AI use cases</p></li><li><p>Identity challenges that prevent production-readiness</p></li><li><p>Actionable tips to build secure, scalable AI agents and MCP servers</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Descope_061626&quot;,&quot;text&quot;:&quot;Watch the webinar&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" 
data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/Descope_061626"><span>Watch the webinar</span></a></p><div><hr></div><h2 style="text-align: justify;">Attention Strategies</h2><p style="text-align: justify;">The first place is in how teams handle attention.</p><p style="text-align: justify;">Every time the model generates a word, it looks back at every previous word in the conversation to figure out what comes next. To avoid recomputing this lookback at every step, the model caches information from earlier words. This cache is called the KV-cache, short for &#8220;keys and values,&#8221; and it grows as the conversation grows. For long conversations, the KV-cache becomes the main memory bottleneck.</p><p style="text-align: justify;">Three different strategies have emerged for managing it.</p><ul><li><p style="text-align: justify;"><strong>Grouped-Query Attention (GQA):</strong> Shares cached information across groups of attention heads, which reduces memory usage with a relatively simple implementation. Qwen3 and Llama 4 both use GQA. It is the easiest of the three to engineer and the most widely adopted across the industry.</p></li><li><p style="text-align: justify;"><strong>Multi-Head Latent Attention (MLA):</strong> Compresses the cached information into a smaller latent representation before storing it, then decompresses when the model needs to use it. MLA was introduced by DeepSeek and adopted by Kimi K2. It saves more memory than GQA, while adding computational work for the compression and decompression steps.</p></li><li><p style="text-align: justify;"><strong>Sparse Attention:</strong> Selects only the most relevant previous words to attend to, instead of attending to every one. DeepSeek introduced their version, called DeepSeek Sparse Attention, in V3.2. Zhipu AI&#8217;s GLM-5 adopted it shortly after. 
Sparse attention is most useful when the context is very long, where attending to everything becomes expensive, although it requires careful design to avoid skipping important tokens.</p></li></ul><p style="text-align: justify;">Each strategy is a rational choice depending on what the team is optimizing for, whether that is engineering simplicity, memory efficiency, or context length.</p><p style="text-align: justify;">See the diagram below that shows the three attention strategies side by side:</p><h2 style="text-align: justify;">Expert Count and Sparsity</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TLuu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TLuu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 424w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 848w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TLuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png" width="1456" height="780" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:182744,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TLuu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 424w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 848w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!TLuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ed6a6-52bb-4ec5-a08a-e14d9d8f8fc9_2454x1314.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">The second place where teams diverge is how aggressively they use the MoE pattern.</p><p style="text-align: justify;">Across the major open-weight models released in 2025 and 2026, the number of experts ranges from 16 to 384. That wide spread reflects a real disagreement about how far to push sparsity.</p><p style="text-align: justify;">At a fixed compute budget, increasing the number of experts can lower training and validation loss, meaning the model learns better from the same amount of compute. The tradeoff is memory. 
More total experts mean more total parameters that need to live in memory, even if only a few of them fire per word. Kimi K2&#8217;s trillion total parameters require a multi-GPU cluster regardless of how few experts activate per token, while Llama 4 Scout&#8217;s 109 billion total parameters fit on a single high-memory server. Both belong to the same architectural family, although the deployment realities differ significantly.</p><p style="text-align: justify;">A separate disagreement exists around whether to include a &#8220;shared expert&#8221; that processes every word and provides a baseline capability floor. DeepSeek V3, Llama 4, and Kimi K2 include one. Qwen3 dropped it after using one in their previous Qwen2.5-MoE, and their technical report stays silent on why. Consensus in the field on shared experts remains elusive, which is worth flagging because beginners often assume technical questions in well-resourced labs are settled. Many of them remain open.</p><p style="text-align: justify;">Llama 4 takes a particularly distinctive approach. Rather than making every layer in the model an MoE layer, Llama 4 alternates between dense and MoE layers. It also routes each word to only one routed expert, plus the shared one, instead of eight as in DeepSeek V3. 
The result is fewer active experts per word, with each expert being larger, which represents a different architectural bet from the rest of the field.</p><p style="text-align: justify;">See the diagram below that shows how the expert count varies across these models:</p><h2 style="text-align: justify;">Training Approaches</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D1uM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D1uM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 424w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 848w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D1uM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png" width="1456" height="937" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:937,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:162536,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D1uM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 424w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 848w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!D1uM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3752a12e-93ff-4b95-846a-9661737c326d_2042x1314.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">The third place where teams diverge is training.</p><p style="text-align: justify;">Architecture is one half of how a model behaves. Training is the other half, and lately it has become where the more meaningful differences live.</p><p style="text-align: justify;">Pre-training is the part where the model learns to predict the next word across trillions of tokens of text. Pre-training gives the model its base knowledge of language and the world. The scale varies between teams, with DeepSeek V3 trained on 14.8 trillion tokens and Qwen3 trained on up to 36 trillion. The general approach, however, remains similar across teams.</p><p style="text-align: justify;">Post-training is everything that happens afterward, and it is where models now diverge the most. 
Three techniques deserve attention here.</p><ul><li><p style="text-align: justify;">The first is reinforcement learning with verifiable rewards. The model produces an output, and the output gets checked for objective correctness. Did the code compile? Did the math answer come out right? The model is rewarded for correct outputs and adjusted away from wrong ones. This was the breakthrough behind DeepSeek R1, and elements of it have been adopted across the open-weight ecosystem.</p></li><li><p style="text-align: justify;">The second is distillation. A very large &#8220;teacher&#8221; model is trained, and its outputs are then used to train smaller &#8220;student&#8221; models. Llama 4 was co-distilled from a 2-trillion-parameter teacher called Behemoth during pre-training itself. Qwen3 distills from its flagship model down to smaller members of the family.</p></li><li><p style="text-align: justify;">The third is synthetic agentic data. Teams build simulated environments loaded with real tools like APIs, shells, and databases, and reward the model for completing tasks in those environments. Kimi K2&#8217;s technical report describes a large-scale pipeline that generates tool-use demonstrations across simulated and real-world environments.</p></li></ul><p style="text-align: justify;">Beyond these techniques, the training infrastructure itself has become a meaningful contribution. Two examples from the recent ecosystem stand out.</p><ul><li><p style="text-align: justify;"><strong>MuonClip:</strong> Kimi K2&#8217;s team developed this optimizer because their training run was hitting instability at the trillion-parameter scale. 
With MuonClip, they trained on 15.5 trillion tokens without a single loss spike.</p></li><li><p style="text-align: justify;"><strong>Slime:</strong> Zhipu AI built this asynchronous reinforcement learning framework to improve training throughput, which allows more training iterations within the same compute budget.</p></li></ul><p style="text-align: justify;">Both contributions may end up being more reusable across the ecosystem than any specific architectural choice. Architecture is converging while training is now where teams place their different bets.</p><p style="text-align: justify;">See the diagram below that shows where these training approaches sit in the overall pipeline:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J74S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J74S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 424w, https://substackcdn.com/image/fetch/$s_!J74S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 848w, https://substackcdn.com/image/fetch/$s_!J74S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 1272w, 
https://substackcdn.com/image/fetch/$s_!J74S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J74S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png" width="1456" height="926" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180651,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J74S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 424w, https://substackcdn.com/image/fetch/$s_!J74S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 848w, 
https://substackcdn.com/image/fetch/$s_!J74S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 1272w, https://substackcdn.com/image/fetch/$s_!J74S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7609b24-803e-4875-ba20-af8f7d880edb_2434x1548.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2 style="text-align: justify;">The Borrow-and-Build Pattern</h2><p style="text-align: justify;">A pattern runs through all three of these approaches when looked at together.</p><ul><li><p 
style="text-align: justify;">DeepSeek V2 introduced MLA.</p></li><li><p style="text-align: justify;">DeepSeek V3 kept MLA and refined the MoE design, training the resulting model for approximately 5.5 million dollars on 14.8 trillion tokens.</p></li><li><p style="text-align: justify;">Moonshot AI&#8217;s Kimi K2 used DeepSeek V3 as a starting point, scaled the design to a trillion parameters, and contributed MuonClip when the scale-up surfaced new training instability.</p></li><li><p style="text-align: justify;">DeepSeek V3.2 introduced sparse attention.</p></li><li><p style="text-align: justify;">Zhipu AI&#8217;s GLM-5 adopted that sparse attention approach and contributed Slime, a new framework for the post-training phase.</p></li></ul><p style="text-align: justify;">Each team built on the previous team&#8217;s published innovations, and each added something the next team could build on in turn. These innovations all depend on published weights and detailed technical reports.</p><p style="text-align: justify;">See the diagram below that shows how innovations have traveled between teams over the past eighteen months:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4ZoH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4ZoH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 424w, 
https://substackcdn.com/image/fetch/$s_!4ZoH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 848w, https://substackcdn.com/image/fetch/$s_!4ZoH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!4ZoH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4ZoH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png" width="1456" height="752" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:236463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201655845?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!4ZoH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 424w, https://substackcdn.com/image/fetch/$s_!4ZoH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 848w, https://substackcdn.com/image/fetch/$s_!4ZoH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 1272w, https://substackcdn.com/image/fetch/$s_!4ZoH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff2a1a98d-887d-42ca-a0cf-101c7a0e3c2c_2512x1298.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">This is an observation about the open-weight ecosystem rather than a verdict that open-weight is &#8220;winning&#8221; or that closed-weight teams have fallen behind. Closed-weight teams are doing different work, optimized for different things, and many of their innovations stay private by choice. The open-weight ecosystem, however, has produced a kind of technical conversation that previously took place only on a much smaller scale.</p><h2 style="text-align: justify;">Conclusion</h2><p style="text-align: justify;">The specific models covered here will likely be overtaken in months. The framework for reading them, however, will probably hold.</p><p style="text-align: justify;">Modern open-weight LLMs share a common skeleton, the MoE transformer, where total parameters and active parameters represent two different costs. 
Within that skeleton, teams place distinctive bets in three places:</p><ul><li><p style="text-align: justify;">The first is the attention strategy, choosing among GQA, MLA, and sparse attention.</p></li><li><p style="text-align: justify;">The second is how aggressively to use sparsity, which ranges across the field from 16 experts to 384.</p></li><li><p style="text-align: justify;">The third is the post-training approach, drawing from reinforcement learning, distillation, synthetic agentic data, or novel training infrastructure.</p></li></ul><p style="text-align: justify;">The license under which a model is released determines what teams and individuals can actually do with it, and &#8220;open weight&#8221; remains technically narrower than the traditional notion of open source.</p><p style="text-align: justify;">The most interesting development in this period of AI engineering is the chain of innovations that open-weight models are inspiring through their published designs.</p><p style="text-align: justify;">References</p><ul><li><p><a href="https://arxiv.org/abs/2412.19437">DeepSeek V3 technical report</a></p></li><li><p><a href="https://arxiv.org/abs/2405.04434">DeepSeek V2 technical report (introduces MLA)</a></p></li><li><p><a href="https://arxiv.org/abs/2507.20534">Kimi K2 technical report</a></p></li><li><p><a href="https://arxiv.org/abs/2505.09388">Qwen3 technical report</a></p></li><li><p><a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/">Llama 4 announcement (Meta)</a></p></li><li><p><a href="https://github.com/zai-org/GLM-V">GLM-5 model card and technical details (Z.ai)</a></p></li><li><p><a href="https://arxiv.org/abs/2501.12948">DeepSeek-R1 technical report</a></p></li><li><p><a href="https://api-docs.deepseek.com/news/news250929">DeepSeek V3.2 (Sparse Attention) announcement</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[A Guide to AI Inference Engineering]]></title><description><![CDATA[In this article, we will walk 
through how inference works and why the field&#8217;s optimization techniques exist.]]></description><link>https://blog.bytebytego.com/p/a-guide-to-ai-inference-engineering</link><guid isPermaLink="false">https://blog.bytebytego.com/p/a-guide-to-ai-inference-engineering</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Mon, 15 Jun 2026 15:31:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_p6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">Every time an LLM generates a response, two operations run in sequence on the same GPU. The first processes the input prompt and emits a single token. The second produces every token after that, one at a time.</p><p style="text-align: justify;">From the outside, they look like stages of one process. However, inside the hardware, they have opposite bottlenecks. One is limited by raw compute. The other is limited by how fast data moves through memory. Most of the engineering work that makes production AI systems fast exists because of this split, and the techniques used to handle it are what inference engineering is built around.</p><p style="text-align: justify;">Inference engineering is the discipline of running trained AI models in production efficiently. 
The work spans low-level GPU code, model serving frameworks, and the cloud infrastructure that ties them together. Engineers in this field optimize for some combination of latency, throughput, cost, and quality, with the specific mix depending on the product they support. A few years ago, this work happened almost entirely inside frontier AI labs. Today, it has become a broad specialty that any company running serious AI workloads invests in.</p><p style="text-align: justify;">In this article, we will walk through how inference works and why the field&#8217;s optimization techniques exist.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from various sources. Please comment if you notice any inaccuracies.</em></p><h2 style="text-align: justify;">The Rise of Inference Engineering</h2><p style="text-align: justify;">Three years ago, inference engineering was a specialty practiced almost entirely inside frontier AI labs. The work concerned a small group of engineers building closed models that the rest of the industry consumed through APIs. That picture has shifted dramatically since 2024.</p><p style="text-align: justify;">Open models drove the change. Hugging Face, the public registry for AI models, now hosts well over two million open models, roughly 25 times what existed five years ago. 
Open releases like DeepSeek V3 have closed the capability gap with closed models, giving companies a real choice between paying for a closed API and running an open model themselves.</p><p style="text-align: justify;">Self-hosting open models brings three operational advantages over closed APIs:</p><ul><li><p style="text-align: justify;">Latency profiles can be tuned for the workload pattern of a specific product, whereas public APIs optimize for general throughput across many customers.</p></li><li><p style="text-align: justify;">Uptime can reach four nines or better with dedicated deployments, comparing favorably to the two nines typical of public APIs.</p></li><li><p style="text-align: justify;">Costs typically drop by around 80 percent at scale once volume justifies the engineering investment.</p></li></ul><p style="text-align: justify;">The result is that companies across many categories now build serious inference stacks, including AI-native startups, established products integrating AI into existing workflows, and even traditionally cautious sectors like healthcare.</p><p style="text-align: justify;">Cursor offers a representative example. The team built Composer 2.0 on top of an open model, applying extensive inference engineering to deliver autocomplete latency below what closed APIs offer.</p><h2 style="text-align: justify;">The Two Phases of LLM Inference</h2><p style="text-align: justify;">Understanding why inference engineering looks the way it does starts with understanding what actually happens when a prompt arrives at an LLM. The process splits into two phases with very different physical demands on the GPU.</p><p style="text-align: justify;">A token is the atomic unit that an LLM works with. Roughly, it is a word or word fragment. The word &#8220;inference&#8221; might be one token, while &#8220;engineering&#8221; might break into two. 
Performance metrics that mention tokens per second are counted in this unit.</p><p style="text-align: justify;">The first phase is called prefill.</p><p style="text-align: justify;">The model takes the entire input prompt and runs all of its tokens through every layer of weights in parallel. Two outputs come out of this burst, namely the first token of the response and the KV cache, which is a structure that stores intermediate values from the attention mechanism so they can be referenced as more tokens get generated.</p><p style="text-align: justify;">Prefill is compute-bound. The GPU&#8217;s math units are the limiting factor because all input tokens are processed together at each layer of the model, and throwing more raw computational power at this phase makes it faster. The metric that captures prefill performance is time to first token, or TTFT. That brief pause between sending a prompt to ChatGPT and seeing the first token appear is prefill in action.</p><p style="text-align: justify;">The second phase is the decode phase. The model generates each subsequent token one at a time, running a full forward pass through every layer of weights for every token. Each new token depends on every token before it, which makes the process fundamentally sequential, and the GPU does this thousands of times for a long response.</p><p style="text-align: justify;">Decode is memory-bandwidth-bound. Math throughput sits mostly idle while the GPU spends its cycles reading model weights from memory for each forward pass, with the bottleneck living in data movement rather than arithmetic. The metric that captures decode performance is tokens per second, or TPS. 
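</p><p style="text-align: justify;">A back-of-envelope calculation shows why the two phases land on opposite bottlenecks. The sketch below is a rough estimate rather than a benchmark: the model size, the GPU figures, and the helper names step_times and bottleneck are all hypothetical, and the model ignores attention cost and KV cache reads. It compares the time one forward pass spends on arithmetic with the time it spends streaming the weights from memory.</p><pre>
```python
# Back-of-envelope roofline estimate for prefill vs decode.
# All hardware and model numbers below are hypothetical placeholders.

PARAMS = 70e9              # model parameters (assumed dense model)
BYTES_PER_PARAM = 2        # 16-bit weights
PEAK_FLOPS = 1.0e15        # assumed GPU math throughput, FLOP/s
MEM_BANDWIDTH = 3.0e12     # assumed GPU memory bandwidth, bytes/s


def step_times(tokens_in_pass):
    """Return (compute_seconds, memory_seconds) for one forward pass."""
    # Roughly 2 FLOPs per parameter per token (one multiply, one add).
    compute_s = 2 * PARAMS * tokens_in_pass / PEAK_FLOPS
    # The weights are read from memory once per pass, however many tokens share it.
    memory_s = PARAMS * BYTES_PER_PARAM / MEM_BANDWIDTH
    return compute_s, memory_s


def bottleneck(tokens_in_pass):
    """Name whichever resource takes longer for this pass."""
    compute_s, memory_s = step_times(tokens_in_pass)
    return "compute" if compute_s > memory_s else "memory bandwidth"


prefill = bottleneck(4096)   # whole prompt processed in one pass
decode = bottleneck(1)       # one new token per pass
print("prefill is limited by", prefill)
print("decode is limited by", decode)
```
</pre><p style="text-align: justify;">Under these assumed numbers, streaming the weights takes about 0.047 seconds per pass, which would cap a single unbatched stream at roughly 21 tokens per second no matter how much math throughput is available.</p><p style="text-align: justify;">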
The streaming pace of a long response is the decode phase at work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_p6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_p6C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 424w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 848w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_p6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png" width="1456" height="921" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128568,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_p6C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 424w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 848w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!_p6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b8d857b-1ce9-4f39-93b5-67fcb663b2d4_1970x1246.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">Since prefill and decode have opposite bottlenecks, a technique that accelerates one phase often has minimal impact on the other. This is why benchmarks report TTFT and TPS as separate numbers, with performance on each phase measured independently.</p><p style="text-align: justify;">This split is also the structural insight that organizes the rest of inference engineering. Once prefill and decode are understood as two distinct operations, the field&#8217;s techniques sort themselves into three groups: those that accelerate prefill, those that accelerate decode, and those that rebalance the two against each other.</p><p style="text-align: justify;">The picture above is somewhat simplified. 
Real inference engines layer batching, scheduling, and other machinery on top, but the prefill-decode split holds underneath all of it, which is why it serves as the foundation for the rest of this article.</p><h2 style="text-align: justify;">Optimization Techniques</h2><p style="text-align: justify;">With the prefill-decode split in mind, the major techniques in inference engineering become much easier to organize. Each one accelerates a specific phase, attacks both for different reasons, or restructures the system around the split itself.</p><p style="text-align: justify;">Let us cover each of the six techniques in detail.</p><h3 style="text-align: justify;">Batching</h3><p style="text-align: justify;">Batching is the most basic way to scale a single GPU&#8217;s output. The inference engine weaves multiple requests together, token by token, so one GPU can serve many users at once. Throughput rises significantly because each pass over the model weights now produces a token for every request in the batch, so the GPU&#8217;s compute capacity is put to work instead of sitting idle while weights stream in from memory.</p><p style="text-align: justify;">The cost is paid in per-user latency.</p><p style="text-align: justify;">A single user on an unbatched system gets the lowest possible response time, while the same user on a heavily batched system waits longer because the GPU is also serving other requests. 
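</p><p style="text-align: justify;">A toy cost model makes the throughput-versus-latency trade concrete. The sketch below is illustrative only: the two timing constants and the function names are made up for the example, and it assumes each decode step pays one fixed cost for reading the weights plus a small extra cost per request in the batch.</p><pre>
```python
# Toy cost model for one decode step shared by a batch of requests.
# Both timing constants are made-up placeholders, not measurements.
WEIGHT_READ_S = 0.040    # time to stream the model weights once per step
PER_REQUEST_S = 0.002    # extra time each request in the batch adds to the step


def decode_step_seconds(batch_size):
    # The weight read is paid once per step; per-request work scales with the batch.
    return WEIGHT_READ_S + PER_REQUEST_S * batch_size


def tokens_per_second(batch_size):
    """Return (per_user_tps, total_tps) for a given batch size."""
    step_s = decode_step_seconds(batch_size)
    per_user = 1.0 / step_s            # each request gets one token per step
    total = batch_size / step_s        # the GPU emits batch_size tokens per step
    return per_user, total


results = {b: tokens_per_second(b) for b in (1, 8, 32)}
for batch_size, (per_user, total) in results.items():
    print(f"batch={batch_size:2d}  per-user={per_user:6.1f} tok/s  total={total:6.1f} tok/s")
```
</pre><p style="text-align: justify;">In this toy model, total tokens per second climbs as the batch grows while each individual user&#8217;s streaming rate falls, which is the shape of the trade-off even though real engines have far more moving parts.</p><p style="text-align: justify;">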
This trade-off is the primary tension that every other technique navigates around, and different products land at different points on the spectrum, with consumer chat tools favoring lower latency and batch processing pipelines favoring higher throughput.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2TLa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2TLa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 424w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 848w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2TLa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png" width="1456" height="860" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2TLa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 424w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 848w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!2TLa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9cf704-cfea-4b7f-8b4c-6c3554ea184a_1856x1096.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3 style="text-align: justify;">Prefix Caching</h3><p style="text-align: justify;">Prefix caching accelerates prefill by reusing KV cache values across requests. When two prompts share an opening segment, like a long system prompt that is identical across thousands of requests, the engine computes that prefix once and reads from cache thereafter. This is why API providers charge less for cached input tokens.</p><p style="text-align: justify;">The catch is that the cache helps from the start of the sequence up to the first non-matching token. If the very first token differs between two prompts, prefix caching delivers zero savings even when the rest of the sequence is identical. 
Therefore, prompt structure has direct cost and latency implications, and putting variable user input late in the prompt while keeping shared content early gives the cache something to work with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SnEz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SnEz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 424w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 848w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SnEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png" width="1456" height="719" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:719,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172245,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SnEz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 424w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 848w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!SnEz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb87de856-3775-4ada-b41b-4c99125f3a7e_2220x1096.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3 style="text-align: justify;">Quantization</h3><p style="text-align: justify;">Quantization helps both phases of inference, though for different reasons.</p><p style="text-align: justify;">The basic move is storing model weights in a lower-precision number format. Most modern models are trained in 16-bit floating point, and quantization compresses those values down to 8-bit or 4-bit representations, so the weights occupy less memory and require less data movement.</p><p style="text-align: justify;">Prefill speeds up because lower-precision math operations run faster on the specialized math units inside modern GPUs. Decode speeds up because smaller weights mean fewer bytes to load from memory on each forward pass, which eases the memory-bandwidth bottleneck. 
A typical step down in precision yields roughly 30 to 50 percent better performance, with the exact gain depending on the model and the technique applied.</p><p style="text-align: justify;">The cost is potential quality degradation, and different parts of a model tolerate quantization differently.</p><p style="text-align: justify;">Linear weights handle it well, activations are somewhat more sensitive, the KV cache is more sensitive still, and attention layers are the most sensitive of all. The reason is that small precision errors in attention layers compound across the sequence of tokens, with each token&#8217;s calculation building on the previous ones, so even small errors snowball into meaningful quality loss over a long response.</p><p style="text-align: justify;">Most production setups leave attention at full precision for this reason. The bulk of the engineering work in quantization comes down to figuring out which parts to compress and how aggressively.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mo_d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mo_d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 424w, https://substackcdn.com/image/fetch/$s_!Mo_d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 848w, 
https://substackcdn.com/image/fetch/$s_!Mo_d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 1272w, https://substackcdn.com/image/fetch/$s_!Mo_d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mo_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png" width="1456" height="884" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:884,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126115,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mo_d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 424w, 
https://substackcdn.com/image/fetch/$s_!Mo_d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 848w, https://substackcdn.com/image/fetch/$s_!Mo_d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 1272w, https://substackcdn.com/image/fetch/$s_!Mo_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbba455a-53ba-455f-9086-0f0d9e13346a_1970x1196.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3 style="text-align: justify;">Speculative Decoding</h3><p style="text-align: justify;">Speculative decoding accelerates decode by exploiting an asymmetry. Generating a token from scratch is expensive, while verifying whether a candidate token matches what the main model would produce is much cheaper. Sudoku is a useful analogy: solving the puzzle takes effort, while checking a finished puzzle is fast.</p><p style="text-align: justify;">In speculative decoding, a smaller draft model predicts the next several tokens, and the main model verifies all of them in a single forward pass, accepting draft tokens up to the first one that differs from its own prediction and discarding the rest. The result is multiple tokens emerging per forward pass through the main model, where one would normally appear.</p><p style="text-align: justify;">Speculative decoding improves TPS while leaving TTFT unchanged, because prefill still runs normally. The technique also works best at smaller batch sizes, when the GPU has spare compute capacity to spend on verification. 
At larger batch sizes, when many requests are being served at once, the GPU is already saturated, and speculation gets dynamically disabled because every cycle is needed for the main workload.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FKnr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FKnr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 424w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 848w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FKnr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png" width="1456" height="1161" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1161,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151026,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FKnr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 424w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 848w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!FKnr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7a4980d-059e-4337-a525-e3800ffe57cd_1540x1228.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3 style="text-align: justify;">Parallelism</h3><p style="text-align: justify;">Parallelism techniques let large models run across multiple GPUs when a single one falls short, either because the model is too big to fit in memory or because single-GPU latency is too high. Two main approaches dominate the open model landscape: tensor parallelism and expert parallelism.</p><p style="text-align: justify;">Tensor parallelism splits each layer of the model across multiple GPUs. Every GPU holds a fragment of every layer, and the GPUs share the work for each forward pass. This requires high-bandwidth interconnects between the GPUs, like NVIDIA&#8217;s NVLink, because results need to be combined after every layer. 
Tensor parallelism is the default choice for serving very large dense models, where the bandwidth-hungry communication is offset by the speedup from sharing per-layer work.</p><p style="text-align: justify;">Expert parallelism applies specifically to mixture-of-experts models, where only a subset of the model&#8217;s parameters activate for each token. Different experts get distributed across different GPUs, and tokens get routed to whichever experts they need. The communication overhead is lower than tensor parallelism because experts operate independently, which makes expert parallelism well-suited for multi-node deployments where bandwidth is more limited. Most production deployments combine both, using tensor parallelism within a node and expert parallelism across nodes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gjho!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gjho!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 424w, https://substackcdn.com/image/fetch/$s_!gjho!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 848w, https://substackcdn.com/image/fetch/$s_!gjho!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gjho!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gjho!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png" width="1456" height="1061" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1061,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:198965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gjho!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 424w, https://substackcdn.com/image/fetch/$s_!gjho!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 848w, 
https://substackcdn.com/image/fetch/$s_!gjho!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 1272w, https://substackcdn.com/image/fetch/$s_!gjho!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f1d8e3a-4d25-4c1a-b925-7d022891fa30_1970x1436.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3 style="text-align: justify;">Disaggregation</h3><p style="text-align: justify;">Disaggregation takes the prefill-decode split 
literally. The idea is to run prefill on one set of GPUs and decode on another, with the KV cache shipped between them over the network. Each set uses hardware tuned to its specific bottleneck, and each set scales independently based on its own traffic pattern.</p><p style="text-align: justify;">The flow becomes a three-step process:</p><ul><li><p style="text-align: justify;">The prefill engine takes the input sequence and produces both the first token and the KV cache.</p></li><li><p style="text-align: justify;">The cache gets sent over a fast interconnect to the decode engine.</p></li><li><p style="text-align: justify;">The decode engine handles every subsequent token.</p></li></ul><p style="text-align: justify;">In conditional disaggregation, short or already-cached requests skip the handoff entirely and run on the decode engine alone, which performs better on real-world traffic that mixes long and short prompts.</p><p style="text-align: justify;">Disaggregation is the most architectural of the techniques covered here. It treats prefill and decode as separate services with separate operational concerns, giving operators independent levers to scale each one. 
Companies running large-scale inference often consider this a near-mandatory step once their workload mix is well understood.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qsld!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qsld!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 424w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 848w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qsld!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png" width="1456" height="763" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131269,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qsld!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 424w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 848w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!Qsld!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cda55a4-3e8b-4095-9b61-7382ba00c1c3_2220x1164.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2 style="text-align: justify;">When to Invest in Inference Engineering</h2><p style="text-align: justify;">Putting these techniques into production is a serious task, and combining them adds further complexity. The question every engineering team has to answer is whether to take on this work or whether off-the-shelf APIs are still the right choice. The answer depends on the stage of the product.</p><p style="text-align: justify;">Early in building an AI product, off-the-shelf APIs from established providers are almost always the right choice. Meaningful optimization requires real constraints to work against, and early-stage products tend to have fuzzy assumptions about traffic patterns, latency requirements, and unit economics. 
Engineering effort at this stage is better spent shipping product, since the complexity of running a custom inference stack slows down iteration when iteration speed is what actually matters.</p><p style="text-align: justify;">Three signals usually indicate the equation has shifted:</p><ul><li><p style="text-align: justify;">API costs have grown into a meaningful expense line.</p></li><li><p style="text-align: justify;">Latency requirements have moved past what closed APIs can deliver.</p></li><li><p style="text-align: justify;">Reliability needs have started to exceed what vendor SLAs offer.</p></li></ul><p style="text-align: justify;">Cursor handled this transition well. Sub-second autocomplete latency was the product itself, and closed APIs aim for general throughput across many customers, while a code completion model demands a specific shape of speed. Self-hosting an open model and applying inference engineering across the stack made the latency target reachable, and the investment paid back because the constraints were real and the workload was well understood.</p><h2>Conclusion</h2><p style="text-align: justify;">LLM inference is two operations with opposite physical constraints.</p><p style="text-align: justify;">Prefill is compute-bound and runs once per request. Decode is memory-bandwidth-bound and runs once per token. 
Most of the techniques in inference engineering exist because of this split, and grasping it makes the rest of the field much easier to navigate.</p><p style="text-align: justify;">Each technique covered above fits into the prefill-decode framework:</p><ul><li><p style="text-align: justify;">Batching trades per-user latency for total throughput.</p></li><li><p style="text-align: justify;">Prefix caching cuts prefill work when prompts share opening segments.</p></li><li><p style="text-align: justify;">Quantization compresses model weights to help both phases.</p></li><li><p style="text-align: justify;">Speculative decoding squeezes more tokens out of decode by exploiting idle compute.</p></li><li><p style="text-align: justify;">Parallelism scales models across multiple GPUs.</p></li><li><p style="text-align: justify;">Disaggregation runs prefill and decode on separate hardware altogether.</p></li></ul><p style="text-align: justify;">Layered on top of all this is the build-versus-buy question. 
Off-the-shelf APIs remain the right choice for most products in their early stages, while self-hosting starts to make sense when API costs grow into a real expense line, when latency requirements outgrow what closed APIs can deliver, or when reliability needs exceed vendor SLAs.</p><p style="text-align: justify;"><strong>References:</strong></p><ul><li><p><a href="https://huggingface.co/docs/hub/index">Hugging Face Hub Documentation</a></p></li><li><p><a href="https://arxiv.org/abs/2412.19437">DeepSeek-V3 Technical Report</a></p></li><li><p><a href="https://cursor.com/blog/composer">Cursor &#8212; Composer: Building a fast frontier model with RL</a></p></li><li><p><a href="https://cursor.com/blog/composer-2">Cursor &#8212; Introducing Composer 2</a></p></li><li><p><a href="https://arxiv.org/abs/2401.09670">DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving</a></p></li><li><p><a href="https://arxiv.org/abs/2309.06180">Efficient Memory Management for Large Language Model Serving with PagedAttention</a></p></li><li><p><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching">Anthropic &#8212; Prompt Caching Documentation</a></p></li><li><p><a href="https://developer.nvidia.com/tensorrt">NVIDIA TensorRT Documentation</a></p></li><li><p><a href="https://arxiv.org/abs/2211.17192">Fast Inference from Transformers via Speculative Decoding</a></p></li><li><p><a href="https://arxiv.org/abs/1909.08053">Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism</a></p></li><li><p><a href="https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/tensor_parallel.html">NVIDIA Megatron-Core Developer Guide &#8212; Tensor Parallel</a></p></li><li><p><a href="https://arxiv.org/abs/1701.06538">Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer</a></p></li><li><p><a href="https://www.nvidia.com/en-us/data-center/nvlink/">NVIDIA NVLink and NVLink 
Switch</a></p></li><li><p><a href="https://docs.anthropic.com">Anthropic Claude API Documentation</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[EP218: The Typical AI Agent Stack, Explained]]></title><description><![CDATA[Over to you: Which layer of the stack do you think is the hardest to get right in production?]]></description><link>https://blog.bytebytego.com/p/ep218-the-typical-ai-agent-stack</link><guid isPermaLink="false">https://blog.bytebytego.com/p/ep218-the-typical-ai-agent-stack</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Sat, 13 Jun 2026 15:31:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!N2N1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/Descope_061326">Run your customer auth with AI agents (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Descope_061326" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3lYw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!3lYw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!3lYw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3lYw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3lYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png" width="1200" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1619611,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Descope_061326&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201645658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3lYw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!3lYw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!3lYw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!3lYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff57c2c9d-8a4b-4b07-b750-72fb0d208a9f_1200x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Coding agents are here to stay, but vibe-coding auth is dangerous business. 
Connect your AI assistants to the Descope MCP server instead!</p><p>This remote MCP server connects agents to the Descope identity platform, giving them the ability to read docs, inspect project config, manage users and tenants, configure authentication flows, review audit logs, and make changes to your identity infrastructure, all through natural language and from a single session.</p><p>Descope is trusted by thousands of businesses including GoFundMe, GoodRx, Linktree, and Databricks.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Descope_061326&quot;,&quot;text&quot;:&quot;Get started with 100+ prompt examples&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/Descope_061326"><span>Get started with 100+ prompt examples</span></a></p><div><hr></div><p>This week&#8217;s system design refresher:</p><ul><li><p>How to Run LLMs Locally (YouTube video)</p></li><li><p>The Typical AI Agent Stack, Explained</p></li><li><p>Understanding Git Reset Modes</p></li><li><p>How NAT Works</p></li><li><p>Final Week to Enroll: Build with Claude Code</p></li><li><p>We&#8217;re hiring at ByteByteGo</p></li></ul><div><hr></div><h2>How to Run LLMs Locally (Great For Learning and Privacy)</h2><div id="youtube2-U8lGbSaCCYI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;U8lGbSaCCYI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/U8lGbSaCCYI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h2>The Typical AI Agent Stack, Explained</h2><p>Most people think an AI agent is just a clever prompt and an LLM. 
The reality is much deeper. There's an entire architecture working behind the scenes to make it all run.</p><p>The diagram below shows the full AI Agent Stack. At the core is the Agent Runtime that runs a ReAct loop, and three other layers feed into it. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N2N1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N2N1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N2N1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg" width="1284" height="1536" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1284,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;graphical user interface&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="graphical user interface" title="graphical user interface" srcset="https://substackcdn.com/image/fetch/$s_!N2N1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!N2N1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5edb76e4-d060-48d2-bd73-afe04f1cff5a_1284x1536.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI Agent Runtime: The LLM thinks about what to do, picks a tool, observes the result, then reflects and decides the next step. This loop repeats until the goal is reached. </p><p>Model Layer (the brain): The underlying LLMs that power reasoning.</p><p>Tool Layer (the hands): How the agent interacts with the real world: search, APIs, code execution, data access.</p><p>Memory Layer (the notebook): Short-term working memory for the current task, long-term semantic memory for knowledge, and transactional memory for state.</p><p>Wrapping everything is the Observability &amp; Safety Layer. 
This is what keeps agents debuggable, evaluable, cost-aware, and safe in production.</p><p>Over to you: Which layer of the stack do you think is the hardest to get right in production?</p><div><hr></div><h2><a href="https://go.bytebytego.com/Unleashed_061326">FeatureOps Summit 2026 - Feature management in the AI Era (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Unleashed_061326" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xQ3q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xQ3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png" width="1200" height="1200" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256380,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Unleashed_061326&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198890332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!xQ3q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!xQ3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7fe5105-3674-489a-ba00-e9e871fe1b21_1200x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Speed without control is a false economy. As AI code-generation accelerates software delivery, the FeatureOps Summit 2026 is here to ensure that when we ship more, we break less. This premier virtual event brings together engineers, architects, and product leaders from companies like Wayfair, Visa, Mintlify, Lloyds, and many others, to explore the infrastructure of fearless delivery.</p><p><strong>Key Themes:</strong></p><p><strong>AI Safety Nets:</strong> Guardrails for the flood of automated code.<br><strong>Edge Resilience:</strong> Sub-millisecond evaluation at scale.<br><strong>Continuous Flow:</strong> Moving past the &#8220;fixed-release&#8221; mindset. 
Register today to master the tools and patterns required for a fail-safe release environment.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Unleashed_061326&quot;,&quot;text&quot;:&quot;Register Today&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/Unleashed_061326"><span>Register Today</span></a></p><div><hr></div><h2>Understanding Git Reset Modes</h2><p>git reset has three main modes: --soft, --mixed, and --hard. Each one moves HEAD (and the current branch with it), but they differ in what happens to the index and working directory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wI2g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wI2g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 424w, https://substackcdn.com/image/fetch/$s_!wI2g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 848w, https://substackcdn.com/image/fetch/$s_!wI2g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 1272w, https://substackcdn.com/image/fetch/$s_!wI2g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wI2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png" width="1456" height="1826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:411552,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201645658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wI2g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 424w, https://substackcdn.com/image/fetch/$s_!wI2g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 848w, https://substackcdn.com/image/fetch/$s_!wI2g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wI2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148840a3-d7df-4308-adad-c227f4d280e8_2360x2960.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p>git reset --soft: Moves HEAD only. The index and working directory stay as-is, so the changes from the undone commits remain staged. Use this when you want to recommit with different changes or a different message.</p></li><li><p>git reset --mixed (default): Moves HEAD and resets the index to match the target commit, but leaves the working directory alone. 
Your changes become unstaged: still there, just no longer queued for commit.</p></li><li><p>git reset --hard: Moves HEAD and resets both the index and the working directory to match the target commit. Any uncommitted changes to tracked files are gone.</p></li></ol><p>Over to you: Which reset mode do you use the most, and has &#8220;--hard&#8221; ever cost you a day of work?</p><div><hr></div><h2>How NAT Works</h2><p>Every device in your home probably shares the same public IP, yet each one browses, streams, and connects independently. This is handled by NAT (Network Address Translation), a technique that runs quietly on the router of almost every home network.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CKOS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CKOS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 424w, https://substackcdn.com/image/fetch/$s_!CKOS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 848w, https://substackcdn.com/image/fetch/$s_!CKOS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 1272w, https://substackcdn.com/image/fetch/$s_!CKOS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CKOS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png" width="1456" height="1826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:376977,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201645658?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CKOS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 424w, https://substackcdn.com/image/fetch/$s_!CKOS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 848w, https://substackcdn.com/image/fetch/$s_!CKOS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CKOS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d2521b0-1dd8-4b28-afbd-34f2bb44ee50_2360x2960.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s the reason IPv4 hasn&#8217;t run out completely, and why your router can hide dozens of devices behind a single public IP.</p><ul><li><p>The Core Idea: Inside your local network, devices use private IP addresses that never leave your home or office. 
Your router, however, uses a single public IP address when talking to the outside world.</p></li></ul><p>NAT rewrites each outbound request so it appears to come from that public IP address, assigning a unique port mapping for every internal connection.</p><p>Outbound NAT (Local to Internet): When a device sends a request,</p><ul><li><p>NAT replaces the private IP address with the public one</p></li><li><p>Assigns a unique port so it can track the connection</p></li><li><p>Sends the packet out to the internet as if it originated from the router</p></li></ul><p>Reverse NAT (Internet to Local): When the response returns,</p><ul><li><p>NAT checks its translation table</p></li><li><p>Restores the original private IP address and port</p></li><li><p>Delivers the packet to the correct device on the local network</p></li></ul><p>Over to you: Have you ever run into tricky NAT edge cases? Port forwarding? Double NAT? Video calls breaking? Online gaming problems?</p><div><hr></div><h2>Final Week to Enroll: Build with Claude Code</h2><p>We&#8217;re launching a new 2-day intensive, cohort-based course called Build with Claude Code, taught by John Kim, who has trained hundreds of engineers at Meta to use Claude Code in real production workflows.</p><p>The course kicks off <strong>June 18th</strong>, and <strong>enrollment closes in less than a week</strong>. 
If you&#8217;ve been thinking about leveling up how you and your team work with Claude Code, this is the moment.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c2-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/claude-c2-substack"><span>Check it out now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wXDW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wXDW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png" width="1000" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!wXDW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 424w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 848w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!wXDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe86264-5a05-4a02-a10f-ac7daf221ca7_1000x1000.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><p>A few things you&#8217;ll learn:</p><ul><li><p>The agentic loop, context engineering, and memory layers that make Claude Code useful for real projects</p></li><li><p>How to build with Claude Code Skills, MCPs, and hooks to give Claude the tools and feedback loops it needs to self-correct</p></li><li><p>Parallel development with Git worktrees, subagents, and agent teams</p></li><li><p>A capstone project where you ship something real on your own stack</p></li></ul><p>The course includes live sessions, assignments, and office hours, so there&#8217;s plenty of room to ask questions and get unstuck.</p><p>The first cohort starts in just a few days: May 28 to 29, 2026. 
If you want to learn everything from the fundamentals of Claude Code to advanced production workflows, including working with large codebases, this could be a great way to level up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c2-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/claude-c2-substack"><span>Check it out now</span></a></p><div><hr></div><h2>We&#8217;re Hiring at ByteByteGo</h2><p>We&#8217;re looking for multiple part-time instructors to teach AI and engineering cohort-based live courses.</p><p>This is a great fit if you love teaching, enjoy sharing what you know, and want a meaningful side thing alongside your main work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cVPV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cVPV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cVPV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cVPV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!cVPV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cVPV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg" width="1280" height="1581" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1581,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;table&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="table" title="table" srcset="https://substackcdn.com/image/fetch/$s_!cVPV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cVPV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cVPV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!cVPV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F451f54fc-95ec-43bf-8ad0-fdeada5fcdb5_1280x1581.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The role has some upfront time investment to get familiar with the curriculum and prepare, but after that, it&#8217;s designed to be a limited commitment (2-5 hours bi-weekly). 
It offers stable income, good upside, and a chance to share your knowledge while working with ambitious learners.</p><p>We&#8217;re especially looking for instructors in:</p><ul><li><p>Building Production-Grade AI Systems</p></li><li><p>System Design</p></li><li><p>AI Security &amp; LLM Red-Teaming</p></li><li><p>AI Evals Intensive</p></li><li><p>AI Cost Optimization</p></li><li><p>Agentic AI Coding</p></li><li><p>Build with Codex</p></li><li><p>AI for Engineering Leaders</p></li><li><p>AI Automation</p></li><li><p>Others, please suggest</p></li></ul><p>Ideal instructors are hands-on, clear communicators, and excited to teach.</p><p>If this sounds like you, email us at <strong><a href="mailto:jobs@bytebytego.com">jobs@bytebytego.com</a></strong> with your background, the topics you&#8217;d be excited to teach, and any teaching, writing, or speaking samples.</p>]]></content:encoded></item><item><title><![CDATA[Must-Know Deployment Strategies: From Big-Bang to Progressive Delivery]]></title><description><![CDATA[In this article, we&#8217;ll go through the main deployment strategies used in production today, looking at how each one works, what it costs, and when it makes sense to use.]]></description><link>https://blog.bytebytego.com/p/must-know-deployment-strategies-from</link><guid isPermaLink="false">https://blog.bytebytego.com/p/must-know-deployment-strategies-from</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Thu, 11 Jun 2026 15:31:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VACI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">Deployment is the moment when code stops being a developer&#8217;s problem and becomes everyone&#8217;s. 
It is the act of taking something that worked on a build server and putting it in front of real users, on real infrastructure, and handling real traffic. For a long time, this moment was riskier than it had any reason to be, and the strategies we will discuss in this article are what teams built to take that risk down.</p><p style="text-align: justify;">Several distinct strategies are in common use today, and each one is an answer to a specific problem that the previous approaches couldn&#8217;t solve well enough. Some reduce the blast radius (the number of users affected) when a deploy goes wrong. Others separate the moment the code reaches production from the moment users actually see it.</p><p style="text-align: justify;">In this article, we&#8217;ll go through the main deployment strategies used in production today, looking at how each one works, what it costs, and when it makes sense to use.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VACI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VACI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 424w, https://substackcdn.com/image/fetch/$s_!VACI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 848w, 
https://substackcdn.com/image/fetch/$s_!VACI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 1272w, https://substackcdn.com/image/fetch/$s_!VACI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VACI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png" width="1456" height="1798" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:388869,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201528937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VACI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 424w, 
https://substackcdn.com/image/fetch/$s_!VACI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 848w, https://substackcdn.com/image/fetch/$s_!VACI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 1272w, https://substackcdn.com/image/fetch/$s_!VACI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e15c3c2-bc34-4a4a-a698-0372c5c9f238_2484x3068.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2 style="text-align: justify;">Big-Bang Deployment</h2>
      <p>
          <a href="https://blog.bytebytego.com/p/must-know-deployment-strategies-from">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Love Teaching? ByteByteGo Is Hiring Part-Time AI & Engineering Instructors]]></title><description><![CDATA[We&#8217;re looking for multiple part-time instructors to teach AI and engineering cohort-based live courses.]]></description><link>https://blog.bytebytego.com/p/love-teaching-bytebytego-is-hiring</link><guid isPermaLink="false">https://blog.bytebytego.com/p/love-teaching-bytebytego-is-hiring</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Wed, 10 Jun 2026 15:02:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Mcb6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;re looking for multiple part-time instructors to teach AI and engineering cohort-based live courses.</p><p>This is a great fit if you love teaching, enjoy sharing what you know, and want a meaningful side thing alongside your main work.</p><p>The role has some upfront time investment to get familiar with the curriculum and prepare, but after that, it&#8217;s designed to be a limited commitment (2-5 hours bi-weekly). 
It offers stable income, good upside, and a chance to share your knowledge while working with ambitious learners.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mcb6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mcb6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 424w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 848w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 1272w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mcb6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png" width="1456" height="1792" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1792,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:654685,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/201465119?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mcb6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 424w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 848w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 1272w, https://substackcdn.com/image/fetch/$s_!Mcb6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86aa77cf-d704-453b-af2f-ca143b036b16_1492x1836.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>We&#8217;re especially looking for instructors in:</p><p>- Building Production-Grade AI Systems</p><p>- System Design</p><p>- AI Security &amp; LLM Red-Teaming</p><p>- AI Evals Intensive</p><p>- AI Cost Optimization</p><p>- Agentic AI Coding</p><p>- Build with Codex</p><p>- AI for Engineering Leaders</p><p>- AI Automation</p><p>- Others, please suggest</p><p>Ideal instructors are hands-on, clear communicators, and excited to teach.</p><p>If this sounds like you, email us at <strong>jobs@bytebytego.com</strong> with your background, the topics you&#8217;d be excited to teach, and any teaching, writing, or speaking samples.</p><p>Got someone in mind for any of these topics? We&#8217;d appreciate the intro. 
Thank you.</p>]]></content:encoded></item><item><title><![CDATA[What Salesforce Learned from 20,000 Enterprise Agent Deployments]]></title><description><![CDATA[We sat down with John Kucera, Salesforce&#8217;s CPO of Agentforce, to learn what separates agents that deliver real business value from those that stall after a good demo.]]></description><link>https://blog.bytebytego.com/p/what-salesforce-learned-from-20000</link><guid isPermaLink="false">https://blog.bytebytego.com/p/what-salesforce-learned-from-20000</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Tue, 09 Jun 2026 15:07:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eobd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/WorkOS_060926Headline">WorkOS launches auth.md - an open protocol for agent registration (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/WorkOS_060926CTA" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zvx3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Zvx3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!Zvx3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Zvx3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Zvx3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133944,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/WorkOS_060926CTA&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/200800137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zvx3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Zvx3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 848w, 
https://substackcdn.com/image/fetch/$s_!Zvx3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!Zvx3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4af97bf2-5e1d-4826-9db3-f54c37e4526d_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Sign-up forms were built for humans in browsers, so how do AI agents programmatically register with services?</p><p>Enter 
auth.md. When you expose a single, machine-readable Markdown file at your service root, AI agents can dynamically discover your OAuth Protected Resource Metadata, parse required scopes, and authenticate seamlessly.</p><p>With native support in WorkOS AuthKit, you can now implement this protocol out of the box, giving AI tools a standardized, secure way to log in to your application.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/WorkOS_060926CTA&quot;,&quot;text&quot;:&quot;Read the auth.md docs&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/WorkOS_060926CTA"><span>Read the auth.md docs</span></a></p><div><hr></div><p>Everyone is building agents. Most of them will fail at scale. Not because the technology doesn&#8217;t work, but because teams don&#8217;t know what happens after the demo.</p><p>That&#8217;s the lesson from Salesforce, which has over 20,000 enterprise customers running Agentforce in production. 
Their support agent alone has handled over three million conversations.</p><p>We sat down with <a href="https://www.linkedin.com/in/johnkucera/">John Kucera</a>, Salesforce&#8217;s CPO of Agentforce, to learn what separates agents that deliver real business value from those that stall after a good demo.</p><h2>What is Salesforce?</h2><p>Salesforce is the enterprise software leader, and its Agentic Enterprise Architecture defines how AI agents are built and deployed across business operations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eobd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eobd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 424w, https://substackcdn.com/image/fetch/$s_!eobd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 848w, https://substackcdn.com/image/fetch/$s_!eobd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 1272w, https://substackcdn.com/image/fetch/$s_!eobd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!eobd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png" width="1456" height="821" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eobd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 424w, https://substackcdn.com/image/fetch/$s_!eobd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 848w, https://substackcdn.com/image/fetch/$s_!eobd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 1272w, https://substackcdn.com/image/fetch/$s_!eobd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F184b7bf6-1c10-4645-9afe-72a5d68a7756_1508x850.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://www.salesforce.com/blog/frontier/">Figure 1</a>: Salesforce Agentic Enterprise Architecture</figcaption></figure></div>
<p>The Agentic Enterprise Architecture has four layers. At the top is the engagement layer, where users interact with agents through their everyday tools like Slack, chat, or messaging apps. Below that is the agent layer, where the AI reasoning and decision-making happen. This is where agents are built, monitored, and orchestrated.</p><p>Below the agent layer is the system of work, which incorporates the apps trusted by, and tailored for, your department and your industry. These are the business applications where real work gets done, like resolving a support case, processing a return, or updating a sales pipeline. 
Lastly, the context layer provides agents with the data and metadata they need to ground their actions in real context, ensuring decisions are informed by the specific business operations.</p><p>A trust layer spans the entire stack, supporting multiple LLM providers and enforcing the guardrails we cover later in this article.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YeXB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YeXB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 424w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 848w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YeXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png" width="1456" height="890" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:890,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YeXB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 424w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 848w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 1272w, https://substackcdn.com/image/fetch/$s_!YeXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf2070a3-4473-4f91-96f6-6a4add38addc_2048x1252.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 2: How a Request Flows Through Agentforce</figcaption></figure></div>
<p>Together, these layers let customers go from idea to working agent without building the infrastructure from scratch. Agentforce provides the reasoning, the data access, the business applications, and the trust controls as a connected platform. But having the right architecture is only part of the story. Once 20,000 customers started deploying agents on this platform, Salesforce discovered something that reshaped how they think about the entire product: the hardest part isn&#8217;t building the agent. It&#8217;s what happens after you ship it.</p><p><strong>What is Agentforce?</strong></p><p>Agentforce is Salesforce&#8217;s platform for building and deploying AI agents in the enterprise. 
Rather than a single model or chatbot, it&#8217;s a layered architecture designed to embed agentic AI across Salesforce&#8217;s entire ecosystem, including sales, commerce, and services.</p><p>Agentforce elevates every experience by bringing together humans, applications, AI agents, and data. Now any company can safely deploy agents that work for its customers, suppliers, and employees 24/7. Teams can manage the complete agent development lifecycle with a robust set of tools to build, test, deploy, manage, and orchestrate AI agents at scale.</p><h2>Why Most Enterprise Agents Fail</h2><p>Agents built on LLMs are flexible by design. They can interpret a wide range of inputs and decide what to do in real time. But that flexibility comes with a tradeoff. Because LLMs are non-deterministic, the same question can lead an agent to take different steps each time. Across Salesforce&#8217;s deployments, this was one of the most common challenges: keeping agent behavior consistent and reliable, especially in high-stakes workflows.</p><p>The reason this is so hard comes down to how AI agents differ from traditional software. 
In traditional software, the effort distribution looks roughly like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BBSG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BBSG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BBSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BBSG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!BBSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62399f82-a331-4628-8b48-1371b813fed1_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3: Effort Distribution in Traditional Software vs. AI Systems</figcaption></figure></div>
<p>With traditional software, 90% of the work happens before launch. You gather requirements, design the architecture, build it, and test it. After launch, you&#8217;re mostly in maintenance mode. With AI agents, this ratio flips. As Kucera put it: &#8220;In the typical software world, 90% of the work is getting to go live. Whereas in the typical AI agent, 90% of the work is after you go live to manage and improve the agent.&#8221;</p><p>This is the main reason most enterprise agents fail. Teams follow the traditional software playbook and assume the hard work is done once the agent is live. It&#8217;s not. That&#8217;s when it starts.</p><p>Modern tooling makes this worse. You can build a working agent in an afternoon. The demo handles the typical questions well. Leadership sees it and greenlights production. 
But typical questions are only a minority of what real users will ask. The majority involves edge cases, ambiguous phrasing, and cross-domain questions. That&#8217;s where the agent earns or loses trust.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B2oi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B2oi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 424w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 848w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 1272w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B2oi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png" width="1456" height="629" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:629,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B2oi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 424w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 848w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 1272w, https://substackcdn.com/image/fetch/$s_!B2oi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59bb9940-d19b-4a7d-9b16-6a92b25e6888_2048x885.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 4: Pre-launch vs. post-launch</figcaption></figure></div>
<p>The teams that succeed treat launch as the starting line. That means two things: 1) getting the pre-launch foundations right so they can iterate quickly, and 2) budgeting the majority of effort for post-launch work and continuous improvement.</p><p>These production realities forced Salesforce to evolve Agentforce itself. Models are inherently probabilistic, predicting the next best response rather than executing fixed logic. That makes them powerful for reasoning and natural interaction, but enterprises still need deterministic systems underneath for consistency and trust. The future of enterprise AI is the combination of both: deterministic workflows set the guardrails, while probabilistic AI adds adaptability and contextual reasoning on top. Features like Agent Script and Hybrid Reasoning are the direct result of watching 20,000 deployments hit this wall. 
The rest of this article covers the pre-launch and post-launch lessons that shaped that evolution.</p><h2>Pre-Launch: What to Get Right Before You Ship</h2><p>If 90% of the work is post-launch, then the goal of pre-launch isn&#8217;t to build the perfect agent. It&#8217;s to build an agent you can effectively iterate on. That means choosing the right scope, defining how you&#8217;ll measure success, and putting the trust, security, and safety guardrails in place from the start.</p><h3>1. Start Small and Focused</h3><p>Kucera put it simply: &#8220;Don&#8217;t boil the ocean.&#8221; When building your agent, it&#8217;s tempting to go after something ambitious. Don&#8217;t. Pick a use case that is both high-value and achievable. There are two reasons for this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1iun!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1iun!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!1iun!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!1iun!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1iun!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1iun!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png" width="1456" height="771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1iun!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!1iun!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!1iun!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1iun!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb211c3e-9fe4-4ee8-8768-8bf5dd39f9de_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 5: Scope Selection</figcaption></figure></div><p>First, agent capabilities are still evolving fast. What&#8217;s possible today will look different in six months. If you invest heavily in a complex, multi-step agent now, you may end up rebuilding it as better models and tooling arrive. 
A focused use case gives you real production learnings without overcommitting to today&#8217;s limitations.</p><p>Second, the process of building agents is different from traditional software. Your team needs to learn how to review agent transcripts, figure out why the agent made a wrong decision, and update instructions, tools, and data sources. That learning is faster and lower-risk when the use case is small.</p><p>Starting small pays off soon. Once your team has shipped one agent, measured its impact, and learned the iteration cycle, scaling to the next use case is much faster. But even a focused agent needs a clear definition of success.</p><h3>2. Tie Agent to a KPI</h3><p>A common failure pattern across Salesforce&#8217;s customer base is that teams push an agent to production without defining what success actually means. Without a measurable goal, there&#8217;s no way to know if the agent is working or drifting.</p><p>This is where <a href="https://www.salesforce.com/news/stories/agentic-work-units/">Agentic Work Units</a> (AWUs) become critical. Introduced by Salesforce, AWUs are discrete units of meaningful work completed by an agent and provide a standardized way to measure actual task completion. 
They give teams a consistent framework to quantify value beyond activity or interactions, focusing instead on whether the agent is truly getting work done.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ySLU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ySLU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ySLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ySLU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!ySLU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8723789-8a8d-4c1a-a728-ce154ddba93b_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 6: With vs Without a KPI</figcaption></figure></div><p>Every agent needs a concrete KPI tied to a real business outcome. For Salesforce&#8217;s own support agent at help.salesforce.com, that KPI is containment rate.</p><p>Containment rate means the percentage of cases fully resolved by the agent without human follow-up. A user asks &#8220;How do I reset my password?&#8221; The agent gives a clear answer, the user solves their problem, and never comes back for the same issue. That&#8217;s contained. 
But if the user returns the next day asking the same thing, the agent failed to actually help.</p><p>AWUs complement this by turning those outcomes into a consistent measurement of work completed (how many user intents were fully resolved per interaction). That lets teams track not just whether a case was contained, but how efficiently and reliably the agent is performing work at scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!auBv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!auBv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!auBv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!auBv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!auBv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!auBv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png" width="1456" 
height="771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!auBv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!auBv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!auBv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!auBv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd32a702a-85c6-4abd-9b5f-ddc4b9bd258b_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 7: Containment Rate</figcaption></figure></div><p>The KPI also drives your post-launch iteration. When your team reviews transcripts and decides what to fix next, the KPI tells you what matters. A tone issue might be annoying, but a logic error that tanks your containment rate gets fixed first. Over time, AWUs become a shared language across product, engineering, and operations for evaluating agent performance, complementing traditional KPIs like containment rate with a more granular view of agent productivity.</p><p>With scope and KPIs in place, the last pre-launch foundation is the trust layer.</p><h3>3. Trust, Security, and Safety at Enterprise Scale</h3><p>Your agent sits between your users and your data, with an LLM in the middle. Data flows in both directions: user queries pull sensitive data into the LLM&#8217;s context, and the LLM&#8217;s responses flow back to users, sometimes triggering real actions. 
Each direction creates different risks, from data privacy leaks on the way in to hallucinated actions on the way out.</p><p>In practice, teams implement guardrails on both sides of the LLM: input guardrails to protect data before it reaches the LLM, and output guardrails to validate the LLM&#8217;s response before it reaches the user.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YF2i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YF2i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 424w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 848w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 1272w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YF2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png" width="1456" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YF2i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 424w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 848w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 1272w, https://substackcdn.com/image/fetch/$s_!YF2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa88f4b95-f8bf-484a-97ad-7d26212ef32f_2048x488.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 8: Input and Output Guardrails</figcaption></figure></div><h4><strong>Input guardrails</strong></h4><p>When your agent needs data to answer a question, that data has to travel from your systems into the LLM prompt. This is where things can leak. 
The core protections are secure data retrieval, zero data retention, and keeping data inside a trusted boundary. Data masking is also available, but as we&#8217;ll see, it comes with a tradeoff that makes it the wrong default for most agents.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kI4a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kI4a!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kI4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kI4a!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!kI4a!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ee3ed6-3022-46c8-9776-40bfbd4ec657_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 9: Sequence of input guardrails</figcaption></figure></div><p>Secure data retrieval means you control exactly how data enters the prompt. Instead of giving the LLM raw access to a database, you route requests through a controlled layer that only returns what the agent is allowed to see.</p><p>Zero data retention is an agreement with your LLM provider that they won&#8217;t store your prompts or responses, and won&#8217;t use them to train future models. Anything sent to an external LLM is covered by this contract: it isn&#8217;t retained, viewed, or used for training once the response is returned. Without it, your customer data could end up embedded in a model that serves other companies.</p><p>Keeping data inside a trusted boundary goes one step further. For the most sensitive workloads, you can route requests to a provider-hosted model that sits inside your platform&#8217;s trust boundary, so the data never crosses the public internet at all. 
On Agentforce, for example, custom actions can call Salesforce-managed models like Anthropic&#8217;s Claude hosted within the Salesforce trust boundary. The data stays inside the boundary, and zero data retention still applies.</p><p>Data masking is the one input guardrail to use with care. It catches sensitive data before it reaches the LLM, for example detecting a social security number and replacing it with a placeholder token. The catch is that masking can strip out the very context the agent needs. If a user asks the agent to build a list of accounts similar to a reference account, but the reference account&#8217;s details are masked, the agent no longer has the information to find the match. For this reason, Agentforce keeps pattern-based and field-based masking off by default for agents, relying on zero data retention and trust-boundary hosting instead. Masking remains a legitimate control where the redacted fields aren&#8217;t needed for reasoning, but it shouldn&#8217;t be the default for agents that depend on rich context.</p><h4><strong>Output guardrails</strong></h4><p>Input guardrails protect data on the way in, but the LLM can still produce bad output. Before a response reaches the user, you need a second set of checks such as tool validation, grounding checks, and content filtering.</p><p>Tool and sub-agent validation catches hallucinated actions, not just hallucinated text. If the agent decides to route to a &#8220;refund_processor&#8221; sub-agent that was never defined, the system should catch that and block it.</p><p>Grounding checks verify the agent isn&#8217;t making up facts from its general training data. If your agent is supposed to answer based on your help docs, the response should only contain information from those docs.</p><p>Content filtering catches harmful or inappropriate content before the user sees it. 
This includes toxicity scoring and other safety classifiers that screen the agent&#8217;s output.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ppGD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ppGD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ppGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ppGD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!ppGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8c4d400-6e78-4e49-bd08-150ff6b3706a_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 10: Sequence of output guardrails</figcaption></figure></div><p>Neither layer alone is enough. PII masking protects your data on the way in but doesn&#8217;t prevent hallucinated tool calls on the way out. Output validation catches bad responses but doesn&#8217;t stop sensitive data from reaching the LLM. Teams implement both of these layers in practice.</p><h2>Post-Launch: Lessons from 20,000 Deployments</h2><p>You&#8217;ve scoped your use case, defined a KPI, and built your trust layer. The agent is live. As we discussed earlier, this is where 90% of the real work begins. The following lessons come from Salesforce&#8217;s experience managing 20,000 enterprise agent deployments once they meet real users.</p><h3>Build a Feedback Loop</h3><p>In traditional software, testing is fairly binary. You have unit tests, integration tests, maybe some latency benchmarks. They pass or they don&#8217;t. With agents, the failure modes are fuzzier. 
Users ask things you didn&#8217;t anticipate. The agent&#8217;s tone drifts from your brand. A retrieved document turns out to be outdated. The agent gets the right answer from the wrong source. This makes the feedback loop an important post-launch investment.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8rA3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8rA3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 424w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 848w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8rA3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png" width="1456" height="804" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8rA3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 424w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 848w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 1272w, https://substackcdn.com/image/fetch/$s_!8rA3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d0713f2-f77b-4a6d-8678-ac70c1e07344_1946x1074.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The feedback loop has four triage categories, each with a different fix.</p><p><strong>1. Tone and brand alignment.</strong> The agent&#8217;s responses don&#8217;t match your company&#8217;s voice. This is especially common with B2C agents where brand consistency matters. The fix is in the system prompt and instructions. Adjust the voice guidelines, add examples of preferred phrasing, and re-test against recent transcripts.</p><p>Agibank is one example. Its FAQ agent pulls accurate answers in real time from a knowledge base stored in Agentforce Service, delivering clear responses while staying human, not robotic. &#8220;Our principal objective is to reduce effort for customers and provide fast, clear answers through automation, while preserving a welcoming, human tone and not being robotic,&#8221; said Akira Vargas Morishita, CX Process &amp; Continuous Improvement Coordinator at Agibank.</p><p><strong>2. 
Logic errors.</strong> The agent calls the wrong tool, reasons incorrectly, or takes too many steps to reach an answer. This shows up as slow or wrong responses. Start by checking tool configurations and instructions. If the same error keeps recurring, that flow is a candidate for deterministic scripting instead of LLM reasoning.</p><p><strong>3. Data quality.</strong> The agent gives a wrong answer not because it hallucinated, but because the source was wrong. Salesforce&#8217;s support agent is grounded in 135,000 help articles, and the team regularly finds outdated or conflicting documents behind flagged responses. The fix isn&#8217;t in the agent. It&#8217;s routing the issue back to the data owners to update or retire the source content.</p><p><strong>4. Coverage gaps.</strong> Users ask things the agent was never designed to handle. This is inevitable and grows with adoption. The fix is either expanding the agent&#8217;s scope or building a clean escalation path to a human. Either way, log the gap so you can track how coverage grows over time.</p><p>Telepass is one example. When a question needs human support, such as troubleshooting a device, Agentforce escalates to a live rep with full context so the customer doesn&#8217;t have to start over. 
Before the chat ends, it offers a short survey to capture feedback and satisfaction, and Telepass uses that data to shape its roadmap for future deployments.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dyiR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dyiR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dyiR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dyiR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!dyiR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d8df889-f329-4262-a4bb-786d504ae4b9_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 11: Summary of categories and fixes</figcaption></figure></div><p>The key insight is that this loop needs to be fast. Across Salesforce&#8217;s customer base, the speed of this feedback loop turned out to be the gate to scaling. Teams that could quickly triage and fix issues gained confidence in their KPIs and got approval to expand. 
Teams with slow loops stayed stuck in pilot mode.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0wsm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0wsm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0wsm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0wsm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!0wsm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb243756-d1b5-4fb2-9c33-6c610286bad7_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 12: Agent Feedback Loop</figcaption></figure></div><p>The feedback loop catches problems. But some problems are better prevented than triaged.</p><h3>Anti-Patterns: What Not to Do</h3><p>Across 20,000 deployments, Salesforce identified three recurring anti-patterns that consistently degraded agent performance. Each one is easy to fall into, hard to diagnose through transcripts alone, and shows up in production as degraded accuracy, slow response times, or both.</p><h4><strong>1. Over-reliance on LLM reasoning where code is better</strong></h4><p>Not every agent decision needs to go through the LLM. 
When a customer asks &#8220;Where&#8217;s my order?&#8221;, the correct sequence of API calls is deterministic: look up the order, get its status, and get the shipment details. Routing this through the LLM&#8217;s reasoning loop means multiple round trips, each adding latency and introducing a chance of error.</p><p>This is the most common anti-pattern, and it directly led Salesforce to build Agent Script, which lets you define deterministic control flow alongside LLM-powered decision-making. It&#8217;s a TypeScript-based scripting framework where you can specify: if the user&#8217;s intent matches X, skip the reasoning loop and immediately execute this sequence of tool calls. The LLM is still there for the parts that genuinely need flexibility, like understanding ambiguous requests or generating natural language responses, but the predictable parts run as code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5t06!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5t06!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!5t06!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 848w, 
https://substackcdn.com/image/fetch/$s_!5t06!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!5t06!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5t06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png" width="1456" height="771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5t06!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!5t06!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 848w, 
https://substackcdn.com/image/fetch/$s_!5t06!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!5t06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6aee061-1a11-4719-99f4-9e8ceb82ecbc_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 13: Reasoning vs. 
Deterministic Code</figcaption></figure></div><p>The general principle for developers is that if you can write the logic as a flowchart, it should probably be code, not a prompt.</p><h4><strong>2. Prompting harder instead of encoding policies</strong></h4><p>This one is subtle because it feels like good prompt engineering. Teams discover the agent does something wrong, so they add a strongly worded instruction. &#8220;NEVER do X.&#8221; &#8220;ALWAYS do Y.&#8221; They use exclamation points, bold text, and capitalization. When the agent still gets it wrong, they add even more emphasis.</p><p>This doesn&#8217;t work reliably. LLMs don&#8217;t respond to emphasis the way humans do. What works is encoding business rules as explicit, structured policies. For example, if you&#8217;re a financial services company that doesn&#8217;t operate in Hawaii, you don&#8217;t want the LLM to figure that out from a prompt instruction. You need a policy that says: if the customer&#8217;s state is Hawaii, return this specific response. 
No LLM judgment involved.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NDIK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NDIK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NDIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NDIK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!NDIK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeded4c3-af3c-406a-8f60-76b9276d2d0d_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 14: Prompting Harder vs Encoding Policies</figcaption></figure></div><p>This is another area where deterministic scripting helps. Instead of hoping the LLM internalizes a rule from natural language, you encode it as a conditional in code. It executes the same way every time.</p><h4><strong>3. Poor context engineering</strong></h4><p>This anti-pattern hurts both accuracy and performance at the same time. Many teams start by passing full, unfiltered API responses into the agent&#8217;s context window. A large e-commerce company, for instance, has a get_orders API call that returns roughly 100K tokens by default.</p><p>This causes two problems. First, the agent has to reason over a much larger input, which slows response time. Second, the noise makes the agent less accurate. 
When the relevant information is buried in hundreds of irrelevant fields, the agent is more likely to pick up the wrong data or miss the right answer entirely.</p><p>The fix is right-sizing your context. For the e-commerce example, that meant trimming the get_orders response from 100K tokens to roughly 2K by returning only the fields the agent actually needs: order ID, current status, expected delivery date, and tracking number. The same principle applies to document retrieval. One insurance company was loading entire policy documents into context to answer a single question. The fix was retrieving only the relevant sections instead of the full document. In both cases, less context meant faster responses and more accurate answers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zr0h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zr0h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!zr0h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!zr0h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zr0h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zr0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png" width="1456" height="771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zr0h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!zr0h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!zr0h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zr0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee148377-4aa4-44d2-a390-a8957bf0fd67_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 15: Context Engineering</figcaption></figure></div><h1><strong>What&#8217;s next?</strong></h1><p>Here are three directions Salesforce sees enterprise agent architecture heading next.</p><h3><strong>1. Multi-Agent Orchestration</strong></h3><p>Most agent deployments today use a single agent per use case. 
The next step is multiple agents working together, where a parent agent coordinates specialized sub-agents, each handling a narrower piece of the problem.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PzPB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PzPB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PzPB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PzPB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!PzPB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d44ed6e-8602-4c9d-aae3-10180e7b6310_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 16: Multi-Agent Orchestration</figcaption></figure></div><p>Salesforce is already building orchestration systems that go three levels deep: a parent agent with sub-agents, where each sub-agent can have its own sub-agents.</p><p>For developers, this changes how you architect agent systems. Instead of one large agent that handles everything, you decompose the problem into specialized agents that hand off to each other. Each agent has a narrower scope, which means simpler instructions, fewer tools, and a smaller context window.</p><h3><strong>2. Agents Beyond the Chat Window</strong></h3><p>Today, most agents live inside a chat widget. But the use cases emerging at enterprise scale go well beyond that. 
They include multi-session tasks that span days (like a tier-2 support case or a return authorization workflow), background agents that run with no user-facing interface at all, and agents that work across channels like phone, email, web, and Slack.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q1og!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q1og!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!q1og!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!q1og!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!q1og!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q1og!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q1og!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!q1og!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!q1og!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!q1og!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F031ac7bd-b939-4c53-98d2-a6a75642aa3c_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 17: Agents Beyond the Chat Window</figcaption></figure></div><p>For developers, the takeaway is to avoid coupling your agent architecture too tightly to a chat interface. The agent&#8217;s logic, tools, and policies should be independent of the delivery channel. Teams building on Agentforce today are already running the same agent across web chat, phone (via Agentforce Voice), and background automation.</p><h3><strong>3. The Pace of Change</strong></h3><p>One thing that came up repeatedly in our conversation is how fast this space is moving. Coding agents, for example, went from basic assistants to highly capable tools in just a few months. The models are getting faster. The tooling is getting better. What&#8217;s considered best practice today may look different six months from now.</p><p>Building enterprise agents is still early. The models, tooling, and best practices are all moving fast. But the core engineering disciplines hold. Start small. 
Measure what matters. Build tight feedback loops. Encode policies in code, not prompts. Keep your context lean. These aren&#8217;t tricks tied to any specific model or framework. They&#8217;re how you build agents that work at scale.</p>]]></content:encoded></item><item><title><![CDATA[Token Spend Out of Control? The Case for Smarter Routing]]></title><description><![CDATA[To understand how teams keep this under control in production, we sat down with Scott Breitenother and Sid Sijbrandij, co-founders of Kilo, an open-source coding agent that runs through a lot of these loops every day.]]></description><link>https://blog.bytebytego.com/p/token-spend-out-of-control-the-case</link><guid isPermaLink="false">https://blog.bytebytego.com/p/token-spend-out-of-control-the-case</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Mon, 08 Jun 2026 15:01:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WLFU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/Agentfield_060826CTA">Code review needed a new architecture. We open-sourced it. 
(Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Agentfield_060826CTA" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wXlq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wXlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1120046,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Agentfield_060826CTA&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/200791851?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wXlq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!wXlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb90216ba-7051-402e-92c7-44f640f3256e_1920x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Code review broke when AI started writing the code.</p><p>AgentField just shipped a multi-agent code reviewer with dynamic meta-orchestration. The planner reads each PR first, then compiles a custom review strategy for it - security agents for auth changes, schema agents for migrations, behavioral agents for refactors. Configurable per team. Deploy it with one docker compose. Runs on open or closed models (Kimi, DeepSeek, Claude). Costs cents per review on open models - no per-seat licenses.</p><p>AgentField&#8217;s <a href="https://go.bytebytego.com/Agentfield_060826Writeup">writeup</a>: the four jobs of code review, which three stay load-bearing once AI writes the first draft, and why static pipelines fail. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Agentfield_060826CTA&quot;,&quot;text&quot;:&quot;&#8594; Star &amp; Deploy&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/Agentfield_060826CTA"><span>&#8594; Star &amp; Deploy</span></a></p><div><hr></div><p>LLM agents can burn millions of tokens on a single task. An agent runs a model in a loop, resends the full context at every step, and usually calls the most expensive model available, so costs climb quickly.</p><p>To understand how teams keep this under control in production, we sat down with <a href="https://www.linkedin.com/in/scottbreitenother/">Scott Breitenother</a> and <a href="https://www.linkedin.com/in/sijbrandij/">Sid Sijbrandij</a>, co-founders of <a href="https://kilo.ai/">Kilo</a>, an open-source coding agent that runs many of these loops every day. The patterns they shared are not specific to coding, and most of them are not unique to Kilo either. Similar approaches show up in tools like Cursor, Cline, and Aider, and in shared infrastructure like OpenRouter and RouteLLM. If you build any agent that makes many model calls, the same ideas apply.</p><h2><strong>Why Running an LLM Agent Gets Expensive</strong></h2><p>A single request to a language model is usually cheap. An agent built on the same model is not. The difference is that an agent makes many calls instead of one, and it tends to send them to the most expensive models available. Both of these drive the cost up.</p><h3><strong>1. Frontier Models Cost a Lot Per Token</strong></h3><p>The most capable models are called frontier models. They sit at the leading edge of what is possible, and they cost the most per token. 
Below them is a range of cheaper models that give up some capability for a lower price, down to small models that are very cheap and still handle simple work well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vzif!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vzif!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 424w, https://substackcdn.com/image/fetch/$s_!vzif!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 848w, https://substackcdn.com/image/fetch/$s_!vzif!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!vzif!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vzif!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png" width="1456" height="930" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vzif!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 424w, https://substackcdn.com/image/fetch/$s_!vzif!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 848w, https://substackcdn.com/image/fetch/$s_!vzif!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!vzif!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55ffcafe-a0c3-4357-ac89-a4e556c8e568_2000x1278.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The cost ladder: frontier vs. small models</figcaption></figure></div><p>The gap across that ladder is large. The top model often costs more than ten times what a small one costs for the same work. 
Teams that use frontier models to power their applications pay frontier prices for everything, which makes the whole system expensive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0QUF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0QUF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 424w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 848w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 1272w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0QUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png" width="830" height="806" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:830,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0QUF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 424w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 848w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 1272w, https://substackcdn.com/image/fetch/$s_!0QUF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1ed4d9-a840-4d57-8ca4-b16845b1a4d6_830x806.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">LLM input/output cost (Source: <a href="https://www.together.ai/pricing">Together AI</a>)</figcaption></figure></div><h3><strong>2. The Agent Loop Multiplies Every Call</strong></h3><p>Frontier models are expensive per token, but in a standard chatbot setting, the cost is manageable. For example, Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. A single question and answer is only a few thousand tokens, so it costs under two cents. 
At that rate, you can ask a lot of questions before the cost matters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6koi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6koi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 424w, https://substackcdn.com/image/fetch/$s_!6koi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 848w, https://substackcdn.com/image/fetch/$s_!6koi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 1272w, https://substackcdn.com/image/fetch/$s_!6koi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6koi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png" width="1456" height="883" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:883,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6koi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 424w, https://substackcdn.com/image/fetch/$s_!6koi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 848w, https://substackcdn.com/image/fetch/$s_!6koi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 1272w, https://substackcdn.com/image/fetch/$s_!6koi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec7bc6ba-18ca-40e7-883a-5222d2a854cb_2048x1242.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A chatbot call costs a few cents</figcaption></figure></div><p>LLM agents are different. They do not produce an answer immediately. They run in a loop. The agent reads the task, takes an action like running a tool or reading a file, looks at the result, and decides what to do next. 
To see why this gets expensive, look at what each step sends to the LLM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PRHV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PRHV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 424w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 848w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PRHV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png" width="1456" height="732" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PRHV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 424w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 848w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!PRHV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F808fb358-8ec5-4a7c-9179-329413eba1e7_2048x1030.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">The agent loop</figcaption></figure></div><p>Since an LLM has no memory of its own, everything has to be bundled into the context on every step. That includes the instructions, the question, tool schemas, tool calls, tool results, and the LLM&#8217;s intermediate thinking. 
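</p><p>To make the cost of bundling everything into every step concrete, here is a small back-of-the-envelope calculation. The numbers are invented for illustration (a 3,000-token starting context and 8,000 tokens added per step); real sessions vary widely.</p>

```python
def input_tokens_per_step(start_tokens: int, added_per_step: int, steps: int) -> list[int]:
    """Input tokens sent on each step when the full context is resent every time."""
    return [start_tokens + i * added_per_step for i in range(steps)]


# Assumed numbers, for illustration only.
per_step = input_tokens_per_step(start_tokens=3_000, added_per_step=8_000, steps=15)

last_request = per_step[-1]  # size of the final request
total_sent = sum(per_step)   # everything the model had to read across the session
print(last_request, total_sent)
```

<p>With these assumed numbers, the final request carries 115,000 tokens and the session as a whole sends 885,000 input tokens. The total grows roughly with the square of the number of steps, because each step pays again for everything that came before it.</p><p>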
The agent resends all of it each turn, so the context grows as the loop runs, and each call to the LLM costs more than the last.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NV5s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NV5s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 424w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 848w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NV5s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png" width="1456" height="846" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:846,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NV5s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 424w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 848w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!NV5s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4e78c04-56bb-4a04-81d7-e9fee5be2ee3_2048x1190.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Context grows with more turns</figcaption></figure></div><p>A session might start at a few thousand tokens. By the time the agent has read a dozen files and run a dozen tools, a single request near the end can carry well over a hundred thousand. As a result, agents can burn many times more tokens than a single chatbot question.</p><p>The growing context is only half of it. The other half is how often the agent calls the model. In a normal chat, it takes time for a person to type a question and effort to read the answer, so they only ask so much. An agent has no such brake. 
An agent that reviews every code change, comments on every commit, and writes a test for every function fires off requests as fast as the software allows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sxkk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sxkk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 424w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 848w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 1272w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sxkk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png" width="1456" height="848" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sxkk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 424w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 848w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 1272w, https://substackcdn.com/image/fetch/$s_!Sxkk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e1add8-6c29-4803-916e-ac040d5bc742_2048x1193.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><h2><a href="https://go.bytebytego.com/Datadog_060825">Your AI pipeline passed every test. Then it hit production. 
(Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Datadog_060825" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZdHh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZdHh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png" width="1080" height="1080" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:406111,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Datadog_060825&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198889640?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZdHh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 424w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 848w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!ZdHh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e5ef085-2c75-447f-83b7-44d99a4ba672_1080x1080.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Datadog&#8217;s free Developer Toolkit for the AI Era gives you four resources to close the gap, from catching flaky tests and CI bottlenecks before they block releases, to instrumenting every LLM call for quality, latency, and cost regressions.</p><p>You&#8217;ll learn how to:</p><ul><li><p>Surface and eliminate CI pipeline failures before they block your AI delivery cycles.</p></li><li><p>Use feature flags to control AI rollouts and DORA metrics to measure exactly how your team is shipping.</p></li><li><p>Score LLM output quality and catch latency drift across every model call in production.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Datadog_060825&quot;,&quot;text&quot;:&quot;Download 
Toolkit&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/Datadog_060825"><span>Download Toolkit</span></a></p><div><hr></div><h2><strong>The Standard Approach: Route Requests to the Right Model</strong></h2><p>A growing context and an agent that runs without a human brake are both inherent to how agents work, so you cannot really send fewer tokens. What you can change is which model receives them, and that is what routing does. A router looks at each request, decides which model is good enough, and sends it there.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_4Kr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_4Kr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 424w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 848w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_4Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_4Kr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 424w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 848w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 1272w, https://substackcdn.com/image/fetch/$s_!_4Kr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3368e83-9408-4ec4-a15c-280eea2f487f_1614x914.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Routing each request to the right model</figcaption></figure></div><p>For example, think about the requests a coding agent sends while working on a task. A few are hard, like designing how a system should be structured. Most are simple, like renaming a variable or summarizing a file. The hard ones need a frontier model. The simple ones do not, but they cost the same if you send them there too. 
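</p><p>A toy calculation shows the size of the gap. The model names, prices, and hard/simple labels below are made up for illustration; a real router has to infer difficulty rather than read it from a label.</p>

```python
# Hypothetical prices in dollars per million input tokens.
PRICE_PER_MILLION = {"frontier": 10.0, "small": 0.5}

# (description, is_hard, input_tokens) for a handful of coding-agent requests.
requests = [
    ("design the module structure", True, 40_000),
    ("rename a variable", False, 20_000),
    ("summarize a file", False, 30_000),
    ("summarize another file", False, 30_000),
]


def cost(model: str, tokens: int) -> float:
    """Dollar cost of sending this many input tokens to the given model."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


def pick_model(is_hard: bool) -> str:
    """Cheapest model assumed able to handle the request."""
    return "frontier" if is_hard else "small"


everything_to_frontier = sum(cost("frontier", tokens) for _, _, tokens in requests)
routed = sum(cost(pick_model(is_hard), tokens) for _, is_hard, tokens in requests)
print(f"{everything_to_frontier:.2f} vs {routed:.2f}")
```

<p>With these assumed prices, sending everything to the frontier model costs about $1.20, while the routed version costs about $0.44, and almost all of that is the one hard request.</p><p>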
Routing saves money by sending each request to the cheapest model that can handle it, so you only pay for a frontier model when you actually need one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FvTv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FvTv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 424w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 848w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 1272w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FvTv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png" width="1456" height="648" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FvTv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 424w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 848w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 1272w, https://substackcdn.com/image/fetch/$s_!FvTv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F565f0632-c328-48ba-a2c7-9029b0a72329_2048x911.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Route requests based on their complexity</figcaption></figure></div><h3><strong>How a Router Works Under the Hood</strong></h3><p>A router needs two things. It needs a way to send requests to many different models. It also needs to decide which model to use for each request. 
These are two separate problems, and keeping them separate makes the design cleaner.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Jxb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Jxb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 424w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 848w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 1272w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Jxb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png" width="1456" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Jxb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 424w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 848w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 1272w, https://substackcdn.com/image/fetch/$s_!5Jxb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d0b64f3-ea91-4850-88ad-61c29abb6fbe_1936x958.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Two components of a router</figcaption></figure></div><p>The first problem is the entry point. Normally each model provider has its own request format, so using several models means writing separate code for each. A single entry point gives you one standard request format, and the router translates that into whatever the chosen provider expects, sends it, and translates the response back. You write in one format. The router talks to all the providers for you.
Without this, routing between models is not practical in the first place.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f_V7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f_V7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 424w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 848w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 1272w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f_V7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png" width="1456" height="613" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:613,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f_V7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 424w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 848w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 1272w, https://substackcdn.com/image/fetch/$s_!f_V7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9b1c091-556d-4d95-b9e3-9ac9c37e7b7f_2048x862.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">One entry point, many providers</figcaption></figure></div><p>The second problem is the decision. Which model should a given request use? In practice, the decision is handled in two ways.</p><p>The first is to route on a signal you already have. If the system already knows what kind of work a request is, it can map that kind of work to a model. A request known to be a planning task maps to a strong reasoning model. A request known to be a simple edit maps to a cheap one. This is reliable and almost free to run, because the decision is just a lookup.
The catch is that you need a trustworthy signal to begin with.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gRKx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gRKx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 424w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 848w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 1272w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gRKx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png" width="1456" height="689" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:689,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gRKx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 424w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 848w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 1272w, https://substackcdn.com/image/fetch/$s_!gRKx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e55b66-79af-46af-9d49-7119b3c9470c_2048x969.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Route on a known signal</figcaption></figure></div><p>The second is to predict the right model from the request itself. The system reads the request, judges how hard it is, and picks the cheapest model likely to answer it well. This works even when you have no prior signal about the request. The cost is that the prediction has to be learned from data and kept current as models change.
A wrong guess sends a hard request to a model that cannot handle it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Si-U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Si-U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 424w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 848w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 1272w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Si-U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25474342-0972-4147-b756-303952888884_2048x871.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Si-U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 424w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 848w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 1272w, https://substackcdn.com/image/fetch/$s_!Si-U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25474342-0972-4147-b756-303952888884_2048x871.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Predict from the request</figcaption></figure></div><p>Most real systems use one entry point with one of these two decision methods on top. The entry point gives access to many models. The decision technique picks among them.</p><h3><strong>How Much Routing Actually Saves</strong></h3><p>Routing saves money because most requests do not need a frontier model, and cheap models have gotten good enough to handle them. The saving is the gap between the frontier price you would have paid and the cheaper price you actually paid, summed over every request that did not need the expensive model.</p><p>The effect of proper routing is quite noticeable. In a widely cited study from researchers at UC Berkeley and Anyscale, a router cut cost by about half while keeping 95% of a frontier model&#8217;s quality. It did this by sending only the hard requests to the frontier model and the rest to a cheaper one.
More broadly, results across the field tend to land between forty and seventy percent cost savings, with little drop in quality on hard tasks.</p><p style="text-align: center;">Router cost saving (Source: <a href="https://github.com/lm-sys/RouteLLM">GitHub</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RGHM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RGHM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 424w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 848w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 1272w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RGHM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png" width="1172" height="1039" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1039,&quot;width&quot;:1172,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RGHM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 424w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 848w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 1272w, https://substackcdn.com/image/fetch/$s_!RGHM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c91602b-6099-4093-a4bd-f51d56010e88_1172x1039.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Those savings come with tradeoffs, though. The decision step adds a little delay to every request and becomes one more thing that can break. A wrong decision hurts quality by sending a hard request to a weak model. A predicting router needs data and upkeep to stay accurate. And switching between different model families inside one task can cause trouble, because the internal reasoning one model produces is not always readable by another. None of these kill the idea. They are why routing is an important but challenging problem to handle well.</p><h2><strong>A Case Study: How Kilo Routes Requests in Production</strong></h2><p>To make this concrete, it helps to look at one system that does it end-to-end. Kilo makes an open-source AI coding agent that drives a model in long loops to write and fix code, and across its user base that adds up to a very high request volume.
To serve that volume without the cost running away, the Kilo team built its own routing layer, the Kilo Gateway, and runs all of its traffic through it. The numbers and design choices in the rest of this section come from that production system.</p><h3><strong>The Gateway</strong></h3><p>Kilo Gateway is the entry point described earlier. It puts one consistent way of making requests in front of more than five hundred models, so switching from one to another is a one-line change. It speaks the same request format most code is already written for, so existing code works by pointing it at the Gateway. It also lets a team use its own provider accounts and pay only for the routing, not a markup on the models.</p><p>The decision layer is the interesting part. Kilo uses the first method: it routes on a signal it already has. Its coding agent always knows what it is doing right now, because it works in distinct modes like planning, writing code, or debugging. The agent sends that mode with each request. The Gateway reads the mode and maps it to a model. 
The mode is a trustworthy signal of how hard the work is, so the system gets most of the benefit of routing without having to guess difficulty from the request text.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3mZJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3mZJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 424w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 848w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 1272w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3mZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png" width="1456" height="654" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:654,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3mZJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 424w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 848w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 1272w, https://substackcdn.com/image/fetch/$s_!3mZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fced505b4-742c-48cf-9570-d9f7dc38b1e6_2048x920.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Kilo Gateway: routing by mode</figcaption></figure></div><p>The routing is organized into tiers a user can pick from. A top tier sends demanding modes like planning and debugging to the strongest model, and sends routine modes like code editing to a capable but cheaper one. A balanced tier sends everything to an economical model. A free tier maps to no-cost models.
A separate internal tier quietly handles background chores, like writing commit messages, with tiny models so they never burn expensive capacity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WLFU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WLFU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 424w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 848w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 1272w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WLFU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png" width="1456" height="1335" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1335,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WLFU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 424w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 848w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 1272w, https://substackcdn.com/image/fetch/$s_!WLFU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb606acb8-099f-4f97-847a-a0b88400c482_2002x1836.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Routing tiers</figcaption></figure></div><p>One design choice explains how the system stays current. The map from a mode to a specific model does not live in the software on your machine. It is served from Kilo&#8217;s own systems and refreshed often. So the underlying models can be swapped as prices and quality change, while the tier you picked stays the same.</p><p>This flexibility has one cost worth understanding. A tier can change models between the turns of a single task, for example moving from one provider&#8217;s model to another&#8217;s as the mode shifts. The problem is that reasoning models produce internal thinking in their own format, and one provider&#8217;s model cannot read the thinking another model wrote. So when the tier switches families mid-task, Kilo has to drop that intermediate reasoning before the next call. 
The agent keeps working, but it loses some of the context it built up, which can cost a little quality on the next step.</p><h3><strong>What Kilo&#8217;s Production Numbers Show</strong></h3><p>Kilo published figures from its own production traffic over the first quarter of 2026. These are the company&#8217;s internal numbers from paid usage, not independently verified, but they are useful because they come from a real workload rather than a benchmark.</p><p>When the team let the Gateway route on its own instead of having users pick a model by hand, the average cost per request dropped by about a third. Kilo found that 80 to 90% of requests do not need frontier models. Across millions of requests, routing those jobs to cheaper models adds up to real savings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9reV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9reV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 424w, https://substackcdn.com/image/fetch/$s_!9reV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 848w, https://substackcdn.com/image/fetch/$s_!9reV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9reV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9reV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png" width="1456" height="1043" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1043,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9reV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 424w, https://substackcdn.com/image/fetch/$s_!9reV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 848w, https://substackcdn.com/image/fetch/$s_!9reV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9reV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9c270f6-4a2e-4a98-8f94-2948da050621_1594x1142.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Auto-routing cuts cost by a third (Kilo internal paid prod traffic, March 2026)</figcaption></figure></div><p>The bigger surprise was how much the choice of tier shaped the cost. 
For the same coding work, running on the cheaper balanced tier cost over ten times less per request than running on the top tier, and that gap showed up across every kind of work the team measured. The smallest background tasks, handled by tiny models, came in at a fraction of a cent. Most of the savings turned out to come not from anything clever, but from simply keeping routine work off the most expensive model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l_--!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l_--!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 424w, https://substackcdn.com/image/fetch/$s_!l_--!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 848w, https://substackcdn.com/image/fetch/$s_!l_--!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 1272w, https://substackcdn.com/image/fetch/$s_!l_--!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!l_--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png" width="1456" height="458" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:458,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l_--!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 424w, https://substackcdn.com/image/fetch/$s_!l_--!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 848w, https://substackcdn.com/image/fetch/$s_!l_--!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 1272w, https://substackcdn.com/image/fetch/$s_!l_--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24f3c149-3df4-42ef-9c37-25246f8d5307_2048x644.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Balanced vs. top tier cost saving</figcaption></figure></div><p>To put a number on what that is worth, the team estimated that forcing its routine traffic onto top-tier models for the quarter would have cost about eighty-seven thousand dollars more. That is roughly what getting routing wrong would have cost them on a real workload.</p><p>The Kilo team also shared a useful finding about caching. Caching, where the system saves repeated context so you do not pay for it twice, is usually seen as the main way to cut the bill. Kilo found that even with cache reuse above eighty percent on many features, total spend stayed high, because there were so many requests and the part of each context that could not be cached was still large. 
Caching clearly helps, but it does not solve the volume problem by itself. Routing works on a different part of the cost, which is why teams tend to use both together.</p><h2><strong>Lessons for Any Team Running Agents at Scale</strong></h2><p>Kilo&#8217;s numbers point to a few lessons that apply to any team running agents at scale, whichever router or models they use.</p><p><strong>Set a budget and treat AI spend like any other infrastructure cost. </strong>It is tempting to just switch to a cheaper model and reduce the per-token rate. But a lower rate usually leads to more usage, since work that was too expensive before now looks affordable. So the total bill often climbs even as each request gets cheaper.</p><p>Pick a monthly budget for the whole workload and treat it as fixed. Then the goal is not the lowest price per request, but the most useful work you can fit inside that budget.</p><p><strong>Measure before you optimize.</strong> Cost is driven by tokens per request, which is mostly a function of context size, not by which part of the product the request came from. Two requests can both be tagged &#8220;chat&#8221; while one carries a thousand tokens and the other a hundred thousand.</p><p>So log the token count of every request, and tag each one with the task type and the feature that sent it. Then add the tokens up per group. The groups that dominate your token total, not the ones that send the most requests, are where your spend actually is, and where routing pays off most.</p><p><strong>Route on the strongest signal you already have.</strong> If your system already knows the task type, like whether a request is planning or a simple edit, route on that directly. It is a static lookup from task type to model: cheap to run, predictable, and easy to debug when a route looks wrong. 
Only fall back to inferring difficulty from the request text when you have no such signal, since that means running a separate classifier you have to train, evaluate, and keep current as models change.</p><p>Either way, the router can only help if there is a real spread of models behind it. Give it access to the full range, from frontier models down to small cheap ones, so it has a meaningfully cheaper option to pick whenever the work allows.</p><h2><strong>Conclusion</strong></h2><p>Today, routing still takes manual work. You have to choose a tier, set up the signal it routes on, and decide which tasks are safe to send to a cheaper model. It works, but it is something you have to stay on top of.</p><p>This will get easier. Routers are starting to make these choices on their own. Instead of being told the task type, a router will read the request, judge how hard it is, and pick the model itself. Over time it will get more precise, choosing a model for each step of a task rather than for the whole task. The goal is for routing to fade into the background, the way load balancing did. You set a budget and a quality bar, and the system handles the rest.</p><p>This matters more every year, because agents keep getting more capable. They run longer, act on their own, and send more tokens with no one watching. As that grows, picking the right model stops being a way to save a little money and becomes what decides whether running an agent is affordable at all. The main takeaway is this: routing is no longer a cost optimization. 
It is becoming part of what makes ambitious agents possible.</p>]]></content:encoded></item><item><title><![CDATA[EP217: Latency vs Throughput vs Bandwidth]]></title><description><![CDATA[Latency, throughput, and bandwidth often get used interchangeably, but each one tells a different story about performance.]]></description><link>https://blog.bytebytego.com/p/ep217-latency-vs-throughput-vs-bandwidth</link><guid isPermaLink="false">https://blog.bytebytego.com/p/ep217-latency-vs-throughput-vs-bandwidth</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Sat, 06 Jun 2026 15:30:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7PuH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/QAWolf_060626Headline">Map workflows, automate E2E tests, and ship faster with QA Wolf (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/QAWolf_060626CTA" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_yQG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!_yQG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!_yQG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_yQG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_yQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90284,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/QAWolf_060626CTA&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199796558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_yQG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!_yQG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 848w, 
https://substackcdn.com/image/fetch/$s_!_yQG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!_yQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7d0b87-f2eb-4e4c-bb77-b4f74c12a109_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://go.bytebytego.com/QAWolf_060626QAWolf">QA Wolf&#8217;s</a> AI agent maps and tests your app&#8217;s most 
complex user flows.</p><p>It turns your prompts into real Playwright and Appium code that runs 12x faster and more reliably than other computer-use agents.</p><p>What sets our AI apart:</p><ul><li><p>Maps <strong>200+ test cases in minutes</strong> instead of weeks of manual planning.</p></li><li><p>Executes tests <strong>12x faster</strong> than computer-use agents.</p></li><li><p>Runs entire suites <strong>100% parallel</strong> with consistent results.</p></li><li><p>Produces open-source tests your team owns, with <strong>zero vendor lock-in</strong>.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/QAWolf_060626CTA&quot;,&quot;text&quot;:&quot;Get started today&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/QAWolf_060626CTA"><span>Get started today</span></a></p><div><hr></div><p>This week&#8217;s system design refresher:</p><ul><li><p>CPU vs GPU vs TPU (YouTube video)</p></li><li><p>Latency vs Throughput vs Bandwidth</p></li><li><p>What is Google&#8217;s TPU?</p></li><li><p>7 Permission Modes Every Claude Code User Should Know</p></li><li><p>Top AI Trends to Watch in 2026</p></li><li><p>We&#8217;re hiring at ByteByteGo</p></li></ul><div><hr></div><h2>CPU vs GPU vs TPU</h2><div id="youtube2-MUWAbpg1xLo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;MUWAbpg1xLo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/MUWAbpg1xLo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><div><hr></div><h2>Latency vs Throughput vs Bandwidth</h2><p>Ever wondered why your app feels slow even when the bandwidth looks fine? 
Latency, throughput, and bandwidth often get used interchangeably, but each one tells a different story about performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y582!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y582!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!Y582!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!Y582!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!Y582!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y582!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png" width="1456" height="1760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:397715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199796558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y582!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!Y582!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!Y582!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!Y582!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b5057ba-3667-446b-9760-b726da1431f4_2484x3002.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Latency is the delay: how long it takes for a single packet to travel from sender to receiver. If your ping shows 40 ms, that&#8217;s round-trip latency.</p><p>Throughput is the actual delivery rate: how much data is successfully transferred per second. If your download shows 62 Mbps, that&#8217;s throughput.</p><p>Bandwidth is the maximum capacity of the link. For example, a 100 Mbps connection can carry at most 100 Mbps under ideal conditions.</p><p>Throughput can never exceed bandwidth, and in practice it is always lower. Network congestion, packet loss, and protocol overhead all reduce throughput, which is why you never actually hit the maximum bandwidth capacity.</p><p>Similarly, low latency doesn&#8217;t always mean high throughput. 
Small payloads, single connections, and tight window sizes can all keep throughput low, which is why fast responses don't guarantee you're sending a lot of data.</p><p>Another way to understand these three concepts: Bandwidth is the highway width. Throughput is the traffic flow. Latency is how long it takes a car to go from A to B.</p><p>All three matter, but they solve different problems.</p><p>Over to you: How do you measure these metrics in a way that actually predicts when things will break?</p><div><hr></div><h2>What is Google&#8217;s TPU?</h2><p>A TPU (Tensor Processing Unit) is Google&#8217;s custom AI chip, designed from scratch for the giant matrix multiplications that modern models live on. GPUs were built for graphics first.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7PuH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7PuH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7PuH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7PuH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!7PuH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7PuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg" width="1280" height="1643" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1643,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;No alternative text description for this image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="No alternative text description for this image" title="No alternative text description for this image" srcset="https://substackcdn.com/image/fetch/$s_!7PuH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7PuH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7PuH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!7PuH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff822afe9-5c16-4d2d-8fa1-f32d03b9743e_1280x1643.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>TPUs were built for deep learning from day one.</p><p>At Cloud Next &#8217;26, Google unveiled its 8th generation, and for the first time it ships in two flavors. TPU 8t is built for training, where raw throughput wins. TPU 8i is built for inference, where latency and chip-to-chip speed matter most. 
</p><p>Both still share the same Axion CPUs, liquid cooling, and software stack, so code written for one runs on the other.</p><p>The diagram is a quick study guide to what&#8217;s the same, what&#8217;s different, and why, based on our understanding of published Google articles.</p><div><hr></div><h2>7 Permission Modes Every Claude Code User Should Know</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bdjh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bdjh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bdjh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png" width="1456" height="1760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289616,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199796558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bdjh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!bdjh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9c8ec17-676c-441a-9c97-d08f8eec804f_2484x3002.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><ol><li><p>plan: The model drafts a plan. Nothing executes until the user approves.</p></li><li><p>default: Standard interactive use. Most tool calls require user approval.</p></li><li><p>acceptEdits: Edits in the working directory are auto-approved. Other shell commands still prompt.</p></li><li><p>auto: An ML classifier decides on requests that miss the fast path.</p></li><li><p>dontAsk: No prompts shown. Deny rules are still enforced.</p></li><li><p>bypassPermissions: Most prompts are skipped. Safety-critical guards still apply.</p></li><li><p>bubble: A subagent escalates its permission request to the parent.</p></li></ol><p>Only 5 modes are user-selectable. 
&#8220;auto&#8221; is gated by a feature flag, and &#8220;bubble&#8221; is internal.</p><p>Over to you: Which mode do you reach for most, and what made you pick it?</p><div><hr></div><h2>Top AI Trends to Watch in 2026</h2><p>2026 is already moving faster than anyone expected. Anthropic released Opus 4.7, OpenAI introduced GPT5.5-Codex, and open-source releases like Kimi K2.5 and GLM-5 showed impressive agentic performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G4Hk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G4Hk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 424w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 848w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 1272w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!G4Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png" width="1456" height="1742" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1742,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:504928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199796558?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G4Hk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 424w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 848w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 1272w, https://substackcdn.com/image/fetch/$s_!G4Hk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac9f91d-6ad9-49c9-b593-97bacf7cf628_2508x3000.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>These launches point to bigger trends. Here are five categories to watch closely in 2026.</p><p><strong>1. Efficient Reasoning:</strong> RLVR-style training (reinforcement learning with verifiable rewards) scales reasoning by automatically verifying math and code outputs. In 2026, expect more adaptive reasoning and extremely sparse architectures. Early signs include Gemini&#8217;s adaptive thinking and Qwen3.5&#8217;s sparse MoE architecture.</p><p><strong>2. Persistent Agents:</strong> Agents now plan in loops with tools and memory instead of just chatting. 
In 2026, expect always-on personal agents that persist across days, have access to your files, and can complete tasks safely. OpenClaw is an early example of this direction.</p><p><strong>3. Repo-Scale Coding:</strong> Coding has moved from autocomplete to multi-file edits with tests, builds, and terminal tools. In 2026, expect agents that understand very large repos and can ship security-aware PRs by default.</p><p><strong>4. Open-Weight Everywhere:</strong> Open-weight models are now strong enough to compete with closed ones. In 2026, expect more of them to get leaner, agent-ready, and easier to deploy. Models like GLM-5 and Kimi K2.5 are already pushing in this direction.</p><p><strong>5. World Models + Physical AI:</strong> Multimodal models have reached impressive quality across vision, image, and video generation. In 2026, expect these models to become the foundation for physical AI and world models, with early examples like Google Genie 3 and humanoid robots already pointing the way.</p><p>Over to you: Which shift do you think will change how teams build products the most in 2026?</p><div><hr></div><h2>We&#8217;re hiring at ByteByteGo</h2><p>We&#8217;re looking for multiple part-time instructors to teach AI and engineering cohort-based live courses.</p><p>This is a great fit if you love teaching, enjoy sharing what you know, and want a meaningful side thing alongside your main work.</p><p>The role has some upfront time investment to get familiar with the curriculum and prepare, but after that, it&#8217;s designed to be a limited commitment (2-5 hours bi-weekly). 
It offers stable income, good upside, and a chance to share your knowledge while working with ambitious learners.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0fGb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0fGb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0fGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg" width="1280" height="1581" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1581,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;table&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="table" title="table" srcset="https://substackcdn.com/image/fetch/$s_!0fGb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0fGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69da25b1-62d2-4921-9be9-93874d3ea577_1280x1581.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>We&#8217;re especially looking for instructors in:</p><ul><li><p>Building Production-Grade AI Systems</p></li><li><p>System Design</p></li><li><p>AI Security &amp; LLM Red-Teaming</p></li><li><p>AI Evals Intensive</p></li><li><p>AI Cost Optimization</p></li><li><p>Agentic AI Coding</p></li><li><p>Build with Codex</p></li><li><p>AI for Engineering Leaders</p></li><li><p>AI Automation</p></li><li><p>Others, please suggest</p></li></ul><p>Ideal instructors are hands-on, clear communicators, and excited to teach.</p><p>If this sounds like you, email us at <strong><a href="mailto:jobs@bytebytego.com">jobs@bytebytego.com</a></strong> with your background, the topics you&#8217;d be excited to teach, and any teaching, writing, or speaking samples.</p>]]></content:encoded></item><item><title><![CDATA[The Path of a Request: A Tour of Modern Web Architecture]]></title><description><![CDATA[In this article, we follow the journey of a web request one hop at a 
time.]]></description><link>https://blog.bytebytego.com/p/the-path-of-a-request-a-tour-of-modern</link><guid isPermaLink="false">https://blog.bytebytego.com/p/the-path-of-a-request-a-tour-of-modern</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Thu, 04 Jun 2026 15:31:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7d4Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">A web page loads in under a second. In that second, a single user request may have passed through roughly ten distinct systems on its way to and from the database. The page feels fast because of how those systems are arranged. Each layer absorbs as much traffic as it can before passing the rest along. Taken together, the layers form a funnel, with most traffic handled long before it reaches the bottom.</p><p style="text-align: justify;">Understanding what each layer does to narrow that funnel can lead to a better grasp of each component of a modern web stack.</p><p style="text-align: justify;">In this article, we follow the journey of a web request one hop at a time. At each stop, we ask two questions. What is this layer doing, and what trade-off is it making? 
The journey starts before the request has fully left the browser, and latency is spent at every hop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7d4Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7d4Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 424w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 848w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 1272w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7d4Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png" width="1456" height="1698" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1698,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133134,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/200598985?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7d4Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 424w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 848w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 1272w, https://substackcdn.com/image/fetch/$s_!7d4Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd486fdf8-79de-429b-b453-67a3af15caed_2250x2624.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2 style="text-align: justify;">DNS</h2>
      <p>
          <a href="https://blog.bytebytego.com/p/the-path-of-a-request-a-tour-of-modern">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How OpenAI Built Its Data Agent]]></title><description><![CDATA[The hardest part of data analysis isn&#8217;t writing SQL. It&#8217;s finding the right tables to use in the first place and understanding semantically how to use data.]]></description><link>https://blog.bytebytego.com/p/how-openai-built-its-data-agent</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-openai-built-its-data-agent</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Wed, 03 Jun 2026 14:50:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bcMJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/GitLab_060326">GitLab Transcend is next week. Built for engineers. Free to attend. Don&#8217;t miss it. (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/GitLab_060326" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!25Oz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 424w, https://substackcdn.com/image/fetch/$s_!25Oz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 848w, https://substackcdn.com/image/fetch/$s_!25Oz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 1272w, 
https://substackcdn.com/image/fetch/$s_!25Oz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!25Oz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:null,&quot;href&quot;:&quot;https://go.bytebytego.com/GitLab_060326&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!25Oz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 424w, https://substackcdn.com/image/fetch/$s_!25Oz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 848w, https://substackcdn.com/image/fetch/$s_!25Oz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 1272w, 
https://substackcdn.com/image/fetch/$s_!25Oz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F143f795a-d166-4805-8909-4ccba1c6cccb_2048x1075.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>New research. New announcements. 
A new chapter for GitLab.</p><p>On June 10, GitLab Transcend streams live from London &#8212; and engineers get a first look at GitLab 19 and Duo Agent Platform advancements before anyone else.</p><p>Including a live demo of GitLab Orbit: a knowledge graph across your entire SDLC so your agents know your pipelines, your security backlog, and what shipped last week. Not just your repo.</p><p>Virtual, free, and just days away.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/GitLab_060326&quot;,&quot;text&quot;:&quot;Register now for free&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/GitLab_060326"><span>Register now for free</span></a></p><div><hr></div><p>OpenAI&#8217;s data platform stores 1.5 exabytes across 90,000 datasets and serves ~4,000 internal users as of May 2026. The team has scaled the platform through enormous growth over the last two years. At this scale, the hardest part of data analysis isn&#8217;t writing SQL. It&#8217;s finding the right tables in the first place and understanding what the data means and how to use it. Many tables look similar but mean different things. What&#8217;s the grain of each table? How do you join them with other data? Analysts can spend hours figuring out which tables to use and how to use them before writing a single line of code.</p><p>Last year, OpenAI&#8217;s data platform team built an in-house agent to fix that. The agent is, in their own words, &#8220;pretty vanilla&#8221;, yet it works reliably across the entire ecosystem. 
And the same investment in Codex that powers the agent has let the team do things most companies consider impossible, like migrating thousands of DAGs, 90,000 tables, and 600 petabytes between clouds in two months.</p><p>We spoke with <a href="https://www.linkedin.com/in/emmaytang/">Emma Tang</a>, Head of Data Platform Engineering at OpenAI, about how the agent works, why a simple architecture is enough at this scale when it rests on strong data infrastructure foundations, the lessons for other teams, and where the platform is headed next. Thanks to Emma for taking the time to share the team&#8217;s work in detail.</p><p>In this article, you&#8217;ll learn:</p><ul><li><p>The architecture behind OpenAI&#8217;s data agent, and why &#8220;vanilla&#8221; is the point.</p></li><li><p>The six layers of context that turn a single LLM into a reliable analyst across 90,000 tables.</p></li><li><p>How a question becomes a verified answer in three steps.</p></li><li><p>Three real Codex use cases inside OpenAI: a 10,000-DAG, 90,000-table cross-cloud migration, hands-off open-source patching, and automated support triage.</p></li><li><p>Five practical lessons for any team building a domain agent, and where OpenAI&#8217;s data platform is headed next.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bcMJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bcMJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 424w, 
https://substackcdn.com/image/fetch/$s_!bcMJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 848w, https://substackcdn.com/image/fetch/$s_!bcMJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!bcMJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bcMJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png" width="1456" height="1881" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1881,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bcMJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 424w, 
https://substackcdn.com/image/fetch/$s_!bcMJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 848w, https://substackcdn.com/image/fetch/$s_!bcMJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!bcMJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c12a1d2-c330-4c2f-84c1-186b06a7b200_1585x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">How OpenAI Built Its Data Agent (High-Level)</figcaption></figure></div><h2>How the Data Agent Works</h2><p>To understand the agent, we will look at three things: what users experience when they ask a question, what architecture supports that experience, and how a request moves through the agent until it returns a verified answer.</p><h3>The User Experience: Ask in Plain English</h3><p>Imagine an engineer or marketer at OpenAI who needs a quick answer. They open Slack and ask their question in plain English. Moments later, the agent replies with its answer, the SQL it ran, and the tables it pulled from. That&#8217;s OpenAI&#8217;s data agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jmI_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jmI_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 424w, https://substackcdn.com/image/fetch/$s_!jmI_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 848w, https://substackcdn.com/image/fetch/$s_!jmI_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jmI_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jmI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png" width="1456" height="763" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jmI_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 424w, https://substackcdn.com/image/fetch/$s_!jmI_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 848w, https://substackcdn.com/image/fetch/$s_!jmI_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jmI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389d1a60-b369-4a66-a1a4-bfda04bc152f_2048x1073.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Data Agent in Slack (Mockup)</figcaption></figure></div><p>The agent sits across the entire data platform and answers questions in natural language. A user can ask in Slack, in a web portal, in their IDE, or in the Codex CLI through MCP. 
The agent figures out which tables are relevant, writes SQL, runs it, checks the result, and returns the answer with its reasoning attached.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n00Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n00Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 424w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 848w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 1272w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n00Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png" width="1456" height="692" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:692,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n00Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 424w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 848w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 1272w, https://substackcdn.com/image/fetch/$s_!n00Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3a5b7c2-1fe0-4f5d-aa8e-61d372b1374a_2048x973.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Data Agent&#8217;s Entry Points and Query Loop</figcaption></figure></div><p>Doing all of this reliably across 90,000 tables sounds like it would need a complex system. The team&#8217;s approach is the opposite of what most people expect. The agent itself is simple. The reliability comes from the engineering around it: careful data acquisition that gives the agent the right context before it ever sees a question. The next sections look at how the agent is built to get that context right.</p><h3>The Architecture: Simple by Design</h3><p>OpenAI&#8217;s architecture is intentionally simple. Before diving deeper, it helps to understand the basic pattern behind agentic systems.</p><p>The basic pattern behind the data agent is an LLM plus a harness. The LLM provides the reasoning. 
The harness provides the tools and the agentic loop that turns reasoning into action.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SojY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SojY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 424w, https://substackcdn.com/image/fetch/$s_!SojY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 848w, https://substackcdn.com/image/fetch/$s_!SojY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!SojY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SojY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png" width="1456" height="879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:879,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SojY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 424w, https://substackcdn.com/image/fetch/$s_!SojY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 848w, https://substackcdn.com/image/fetch/$s_!SojY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!SojY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e8b25f8-0471-41d1-b2e1-06f156b6c6b4_2048x1236.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">An Agent Combines an LLM with a Harness</figcaption></figure></div><p>A harness is needed because an LLM by itself can only predict the next token. It knows a lot, but it cannot run a SQL query or act on the result. The harness fills that gap. 
It gives the model tools it can call, like a database query interface, assembles relevant context, and runs the model in a loop so it can reason, act, observe the result, and act again until the task is done.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!83WQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!83WQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 424w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 848w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!83WQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png" width="1456" height="752" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!83WQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 424w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 848w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!83WQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb14361e2-cf6e-46a7-93c9-74fcff90ea30_2048x1058.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Data Agent Input-Output</figcaption></figure></div><p>Many agent systems become complicated at this point, as shown in the figure below. A team might add a router that sends easy questions to a small, cheap model and hard ones to a larger model. It might mix multiple LLMs, fine-tune models on internal data, or build complex retrieval pipelines with different embedding models for different content types. 
Each choice can help, but each also adds cost, latency, and more ways for the system to fail.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dbZ-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dbZ-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 424w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 848w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 1272w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dbZ-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png" width="1456" height="996" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:996,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dbZ-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 424w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 848w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 1272w, https://substackcdn.com/image/fetch/$s_!dbZ-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fef97db-73ac-4015-87b3-4326ea5a86a4_2048x1401.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Agent Harness Gets Complex Quickly</figcaption></figure></div><p>OpenAI&#8217;s data team took a different approach. They found that a simple architecture works well at their scale, backed by their robust and unified data platform foundation. 
The data agent they developed consists of four main components: a single LLM, a context assembly layer, a carefully curated set of tools, and an agent runtime.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BKUw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BKUw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 424w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 848w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 1272w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BKUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png" width="1456" height="862" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BKUw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 424w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 848w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 1272w, https://substackcdn.com/image/fetch/$s_!BKUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4575cb38-5fb8-4847-96e9-018963d8e25f_2048x1212.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">OpenAI&#8217;s Data Agent Architecture</figcaption></figure></div><p><strong>LLM.</strong> The data agent uses GPT-5.5 as the foundation model for every request. The team relies on the model to produce the right SQL queries, inspect the results, correct the queries, and reason its way to a verified answer.</p><p><strong>Runtime.</strong> The runtime is the orchestrator that drives each request. An LLM on its own only emits text, so something has to act on what it produces. The runtime parses the model&#8217;s output, dispatches the requested calls to tools, feeds the results back into the model, and repeats this loop so the model can reason, act, observe, and act again until the task is done.</p><p><strong>Context Assembly.</strong> This is where the real engineering work lives. A strong model still produces wrong answers without the right context. A bare schema is not enough to tell tables apart. 
For example, two tables may both have a user_id column and look almost identical, yet one includes logged-out users and the other does not. From the schema alone, the model cannot tell which table answers the question, so it may pick the wrong one.</p><p>To build a richer context, the team identified the signals that actually help the model decide which tables to use and what query to generate: a table&#8217;s schema and how people have queried it, notes from the people who own it, and what the pipeline code reveals about how it is built.</p><p>Building on these signals, the agent relies on six layers to assemble the right context when a user query arrives:</p><ul><li><p><strong>Table usage metadata. </strong>The table&#8217;s schema, its lineage, and a history of how people have queried it. Not all queries are equally useful. Queries from popular dashboards written by data scientists rank highest because they tend to be correct and reusable. One-off, exploratory queries rank lower.</p></li><li><p><strong>Human annotations. </strong>Curated descriptions written by table owners that capture business meaning, ownership, criticality, and known caveats that cannot be inferred from schemas or past queries.</p></li><li><p><strong>Codex enrichment.</strong> A nightly Codex job crawls the pipeline code that produces each table. It runs in batches of 100 to 200 tables, with each table taking 5 to 10 minutes. By reading the code, it captures what a table actually contains, how it is derived, how fresh it is, and when to use it instead of a similar table.</p></li><li><p><strong>Institutional knowledge.</strong> Much of the context about the company&#8217;s data lives outside the warehouse, in Slack threads, Google Docs, and Notion pages. 
These documents are ingested and embedded separately, and served through an access-controlled retrieval service, so the agent never surfaces documents a user is not allowed to see.</p></li><li><p><strong>Memory.</strong> Corrections and learnings the agent has saved from prior conversations, scoped at the global or personal level.</p></li><li><p><strong>Runtime context.</strong> When the offline context is missing or stale, the agent queries the warehouse directly, and can also talk to other platform systems like Airflow and Spark to fill the gap.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!REgD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!REgD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 424w, https://substackcdn.com/image/fetch/$s_!REgD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 848w, https://substackcdn.com/image/fetch/$s_!REgD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!REgD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!REgD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png" width="1456" height="988" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:988,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!REgD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 424w, https://substackcdn.com/image/fetch/$s_!REgD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 848w, https://substackcdn.com/image/fetch/$s_!REgD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!REgD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b49023-3589-4ea3-9f1e-71077efc5cc4_2048x1390.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">OpenAI Data Agent&#8217;s Layers of Context</figcaption></figure></div><p>The first three layers, table usage metadata, human annotations, and Codex enrichment, are the ones that describe a table. A daily offline pipeline merges them into a single description per table, and an embedding model embeds that description into one vector per table, stored for retrieval. 
At runtime, when a question comes in, the tables whose descriptions best match the question are retrieved and added to the context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c-Bz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c-Bz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 424w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 848w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c-Bz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png" width="1456" height="878" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c-Bz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 424w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 848w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 1272w, https://substackcdn.com/image/fetch/$s_!c-Bz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb0d8753-5895-435c-bdb2-97a210e0c41d_2048x1235.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Offline Indexing and Runtime Retrieval of Table Descriptions</figcaption></figure></div><p>Memory is the other source from which the context is assembled. 
It holds corrections and learnings saved from past conversations, applied on top of the retrieved descriptions so the agent starts from a more accurate baseline instead of repeating old mistakes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RWmG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RWmG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 424w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 848w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 1272w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RWmG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png" width="1456" height="1129" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1129,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RWmG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 424w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 848w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 1272w, https://substackcdn.com/image/fetch/$s_!RWmG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13e9457b-50f6-4b51-be23-30b3771d462d_2048x1588.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Memory structure: Global vs. personal memory</figcaption></figure></div><p>The figure below shows the overall design of context assembly. Retrieval over the table descriptions identifies the relevant tables, and relevant memory is pulled in as additional context. The last two layers fill the gaps the table store cannot. 
Institutional knowledge is embedded and retrieved through its own access-controlled service, and runtime context is pulled live from the warehouse when the offline description is missing or stale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zQ6F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zQ6F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 424w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 848w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zQ6F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png" width="1456" height="1486" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zQ6F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 424w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 848w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!zQ6F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b978793-a9ab-4db3-8ee1-33da1c0f0bb0_2007x2048.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">End-to-end Context Assembly</figcaption></figure></div><p><strong>Tools.</strong> The agent has access to a small, curated set of 13 tools. 
These cover company context lookups, internal knowledge bases, big data systems like Airflow and Spark, and metadata services.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4UHT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4UHT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 424w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 848w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 1272w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4UHT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png" width="1456" height="740" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:740,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4UHT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 424w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 848w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 1272w, https://substackcdn.com/image/fetch/$s_!4UHT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94fc4d0-3e33-4940-a5e0-dac5785a41eb_2048x1041.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Data Agent&#8217;s Category of Tools</figcaption></figure></div><p>The agent uses them to fetch the information it needs to answer a question and verify its work.</p><p>The four components described above are the whole architecture of the data agent. There is no router, no fine-tuning, and no special post-training. Every question goes to the same model. According to Emma, the simplicity of the data agent is by design. The real engineering work happens at the infrastructure layer, which builds the right foundation for context assembly.</p><h3>The Request Flow: From Question to Verified Answer</h3><p>With the architecture in place, the next question is what happens when a user actually asks something. A question arrives in plain English. The agent&#8217;s job is to turn that question into the right context, run it, and return a verified answer. 
This happens in the following three steps.</p><p><strong>Step 1: Embed the Question.</strong> The user&#8217;s question is converted into a vector using the same embedding model the team used to embed table descriptions offline. This vector is the query that the retrieval step searches with.</p><p><strong>Step 2: Assemble the Context.</strong> The context assembly layer searches the vector store for the table descriptions that best match the question, combining semantic search with exact text matching. It also retrieves relevant institutional knowledge from its own access-controlled service and adds any relevant memory to the context.</p><p><strong>Step 3: Start the Agent Loop.</strong> The agent sends the assembled context to the LLM and runs it in a loop so it can write a SQL query, inspect what the tool execution returns, and try again until the answer is correct.</p><p>That is the full flow. Three steps from question to verified answer. What makes the agent reliable is the quality of the context that flows through the three steps, which depends on the quality of the underlying infrastructure and on how easily the model can reason about it. That comes from the six data layers prepared before any user asks a question. 
This is how the entire company can rely on the agent for critical workloads every day.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rzh0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rzh0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rzh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png" width="1456" height="1621" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1621,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rzh0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Rzh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6702f751-7f90-4893-a4c0-3602d23ecd19_1840x2048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From Question to Verified Answer in Three Steps</figcaption></figure></div><p>For other teams, the useful part of this story is that most components used to build the data agent are available to anyone. GPT-5.5 is on the API. OpenAI&#8217;s embedding API is public. Codex is public. MCP is an open protocol. The data platform team did not have access to anything a serious engineering team could not get. What they had was a unified, clean, and robust foundation, a carefully engineered context layer, and a willingness to keep the agent itself simple.</p><h2>How OpenAI Uses Codex Internally: 3 Use Cases</h2><p>As pointed out in the previous section, the data platform uses Codex to read pipeline code each night, enriching the context the agent retrieves. That is one use case of Codex. 
But Codex supports many other use cases internally, helping run the platform itself.</p><p>Emma pointed out three interesting ones, different in scope but sharing the same pattern:</p><ul><li><p>Migrating 10,000 DAGs and 90,000 tables in two months</p></li><li><p>Releasing open-source patches without humans</p></li><li><p>Closing the support loop</p></li></ul><h3><strong>Use Case 1: Migrating 10,000 DAGs and 90,000 tables in two months</strong></h3><p>OpenAI&#8217;s data platform was running out of capacity on one cloud provider. The team needed to move the data estate to a second cloud, fast. The migration involved 90,000 tables and 600 petabytes of data, plus hundreds of thousands of interdependent workloads.</p><p>The hard part of a migration at this scale is not moving the data. It is the dependency graph. Tables form a DAG. Table B depends on table A. Table C depends on table B. You cannot migrate in arbitrary order. During cutover, some tables live on the old cloud while their downstream consumers are already on the new one. At any point during the cutover, the team must know which copy of each table is the authoritative one, so dependent workloads do not read from a stale source. The team built a system to replicate data across clouds in the correct direction while the migration was in progress. 
It had to handle dependency graphs on the order of 100,000 nodes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05Wp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!05Wp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!05Wp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png" width="1456" height="771" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!05Wp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 424w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 848w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 1272w, https://substackcdn.com/image/fetch/$s_!05Wp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d4a135f-dfab-4da5-be8e-f8872056530d_2048x1085.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Cross-cloud migration before, during, and after cutover</figcaption></figure></div><p>The migration touched hundreds of thousands of workloads, each needing a small code change to point at the new cloud. Filing that many pull requests by hand was not feasible, so Codex generated them instead. Codex Skills then handled the testing and validation for each PR. Around this, the team built a custom system to solve two hard problems: ordering the changes so dependencies migrated in the right sequence, and keeping data consistent while each workload ran against both the old and new cloud during cutover. That system gave Codex the guardrails it needed to operate safely at this scale.</p><p>With a strong team, and Codex doing much of the code changes, the migration finished end to end in roughly two months. 
Comparable cross-cloud migrations at other companies have run for years.</p><h3><strong>Use Case 2: Releasing open-source patches without humans</strong></h3><p>OpenAI&#8217;s data platform runs on more than a dozen open-source tools, including Spark, Kafka, and Flink. The team maintains an internal fork of each tool, modified with custom patches. Every time a new patch is added, it has to be tested against the existing test suites, validated on staging, and rolled to production.</p><p>This kind of work is critical for the platform&#8217;s reliability, but it is also repetitive and time-consuming. The test suites are long, with some taking hours and others running for days. An engineer used to babysit each release, watching the tests, diagnosing failures, and rolling the patch forward step by step. With more than a dozen forks, that work added up to a meaningful share of the team&#8217;s time.</p><p>The team turned the entire cycle over to Codex. A Codex-powered release agent validates patches against the test suites. It diagnoses failures and suggests fixes when something breaks. 
It rolls the patch all the way to production and alerts the team about what it did.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xZgW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xZgW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 424w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 848w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 1272w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xZgW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png" width="1456" height="770" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xZgW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 424w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 848w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 1272w, https://substackcdn.com/image/fetch/$s_!xZgW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F419ab2eb-d0f3-40cb-951a-136e51c6ff97_2048x1083.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Open-source patch releases before and after Codex</figcaption></figure></div><p>The release agent has now been running end to end for three to four months without human involvement and without a single incident. What used to require an engineer per release now runs unattended.</p><h3><strong>Use Case 3: Closing the support loop</strong></h3><p>A platform that serves 5,500 internal users gets a steady stream of support questions. A pipeline fails. A dashboard breaks. A permission link does not work. Every one of these ends up with the platform team, and each one requires investigation before it can be fixed. At scale, that investigation work used to consume a meaningful share of senior engineering time.</p><p>Codex now handles the part of support that used to require investigation. A support bot fields the common questions first and resolves the easy ones directly. 
When the bot cannot resolve an issue, the engineer on call hands it off to Codex with minimal context. Codex investigates, finds the fix, and applies it. The engineer reviews and approves.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O-ZL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O-ZL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 424w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 848w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O-ZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png" width="1456" height="845" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O-ZL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 424w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 848w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!O-ZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe430ea00-36cd-46a1-b97b-73084a39ebf7_2048x1189.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Closing the support loop with Codex</figcaption></figure></div><p>The loop that used to take an engineer a few hours per ticket now lets that engineer dispatch around a hundred fixes per day. The work is not easier. The engineer is amplified.</p><h2>What Other Teams Can Learn From OpenAI&#8217;s Data Agent</h2><p>OpenAI&#8217;s data agent works, but the architecture itself is not what most teams should copy. What is worth borrowing is the set of decisions the team made when they built it. Emma shared five main lessons that apply to any team building a similar system.</p><h3>The data foundation matters more than the agent</h3><p>A coding agent has one source of truth: the repository. A data agent&#8217;s source of truth is the whole company. Every system, every siloed data store, every team&#8217;s conventions, every table defined outside a unified codebase. 
If none of that is legible to a model, no agent architecture will save you.</p><p>OpenAI&#8217;s data is well structured. The team has built best-in-class, industry-standard infrastructure across compute, orchestration, metadata management, storage, and more. There are no duplicated technologies, and the data lake is unified. Every table on the platform is produced by code in a single monorepo, and data engineering teams enforce conventions and police duplicate or unclear columns along the way. On top of that, every table is annotated with its owner, its criticality, and its expected freshness. None of this is glamorous work, but it is what makes a vanilla agent reliable at exabyte scale.</p><p>If your team is considering building a data agent and your data is scattered or inconsistent, the agent is not the first investment. The foundation is.</p><h3>Fewer tools beat more tools</h3><p>The team initially connected the agent to around 40 tools, including metadata systems, orchestration tools, and big data systems. The results were bad. 
The model picked the wrong tool and got confused by overlapping answers from tools that did similar things.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z8F_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z8F_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 424w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 848w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z8F_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png" width="1456" height="845" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z8F_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 424w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 848w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!Z8F_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ae8ff1-56f9-4c64-ac98-876fca556e9c_2048x1189.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Too many overlapping tools return conflicting answers</figcaption></figure></div><p>Capping the agent at around 13 tools per call, and removing overlapping ones, fixed the problem. The lesson is that an agent does not need access to every system in the company. It needs access to the right ones, with no two tools doing the same job.</p><p>In practice, that means avoiding overlap. If two metadata services expose similar information, the agent should only see one. If two ways exist to look up table ownership, pick one. The model is better at reasoning than at choosing between near-duplicate tools.</p><h3>Pick trusted queries for retrieval</h3><p>A natural first instinct for a data agent is to embed all historical queries and use them as context. OpenAI tried this approach, and it did not work. 
Most queries in any company are exploratory one-offs, not canonical examples of how a table should be used.</p><p>The team improved results by ranking past queries by how trustworthy they are. The agent learns from queries the company has already written, but not all of them are worth imitating. Queries behind heavily used dashboards, usually written by data scientists, rank highest, because they tend to be correct and reused often. One-off queries written for a single analysis and never run again rank lowest. Once the team ranked queries this way, the model started copying the good patterns instead of the bad ones.</p><p>The general lesson is that retrieval quality depends on the quality of the corpus you retrieve from. What you feed into retrieval is what you get back from it.</p><h3>Guide the goal, not the path</h3><p>Prescriptive prompts hurt the agent&#8217;s results. The team experimented with detailed step-by-step instructions about how the agent should approach each kind of question. The agent followed the instructions and produced worse answers. High-level guidance worked better. Tell the model what the goal is. Let it reason about how to get there. Give it the right context and the right tools, and trust the reasoning.</p><p>This matches what other teams building agents have found. Modern models are good at planning when they have good information. They do worse when told exactly how to plan.</p><h3>Be more ambitious</h3><p>The cross-cloud migration was considered impossible to finish in a matter of months. The team&#8217;s initial estimate was longer. Emma pushed for two months because the company was running out of capacity and a longer timeline was not an option. The team hit the deadline. 
Comparable migrations at other companies can take years.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YPYK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YPYK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 424w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 848w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 1272w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YPYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png" width="1456" height="978" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:978,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YPYK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 424w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 848w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 1272w, https://substackcdn.com/image/fetch/$s_!YPYK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F508fdfae-08d3-4bf7-b27b-a00f67cd3f4a_2048x1375.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Compounding Gains with Agents</figcaption></figure></div><p>The takeaway the team came away with is that timeline estimates from before Codex no longer apply. If a project sounds like it should take a year, the right question is whether it could be done in a quarter with an agent doing most of the work. The bigger risk is not over-promising; it&#8217;s playing it safe. A team that sticks to old timelines never finds out what the new tools make possible.</p><h2>What Comes Next for OpenAI&#8217;s Data Platform</h2><p>The data agent and the internal Codex use cases are not the end of the story. Emma says two things sit on the team&#8217;s near horizon: custom apps, and platforms keeping pace with users.</p><p><strong>Custom apps generated per question.</strong></p><p>Most analytics tools today give users a fixed set of widgets, like bar charts, line charts, and pivot tables. They are useful, but limited. 
If your question does not fit the available widgets, you have to write a custom script or file a request with the data team.</p><p>OpenAI wants to go further. The agent already builds traditional dashboards on demand, but the next step is freeform ones. Instead of a chart, Codex would build a full React app connected to a backing store and tailored to the exact question asked. Each one takes seconds to build, fits a single user&#8217;s need, and runs on real data with real guardrails.</p><p>When this rolls out, users will no longer pick from a fixed set of widgets. They will describe what they want and the app will appear, generated per question. A marketer who wants to explore campaign performance with a custom filter and a tailored layout will simply ask.</p><p><strong>Users move faster than platforms can keep up.</strong></p><p>The same Codex that powers the data agent has accelerated every team at OpenAI. Frontend engineers vibe-code new UIs in a morning. Researchers spin up custom pipelines on demand. Platform teams cannot move at that speed safely. A bad UI affects a few users, but a bad change to shared infrastructure can take the whole company offline.</p><p>This leads to a mismatch. Users now ship code to the platform faster than the team can review and validate it, and some of that code is written by people who do not fully understand what it does. Emma described cases where a bad Flink job lands on the cluster and brings it down. When the platform team asks what happened, the user replies, &#8220;I don&#8217;t know, I don&#8217;t know how Flink works, it&#8217;s vibe-coded. Can you help fix it?&#8221;</p><p>This is the next problem the data platform team plans to work on. The fix will not be another user-facing agent, but agents on the platform side, designed to triage incoming code, validate it before it runs, and absorb the deluge from AI-amplified users. The previous wave of agents helped users do more. 
The next wave will help platforms keep up.</p>]]></content:encoded></item><item><title><![CDATA[A Practical Guide to Becoming an AI-Native Engineer]]></title><description><![CDATA[This piece is a working guide for engineers who want to land on the productive side of that split.]]></description><link>https://blog.bytebytego.com/p/a-practical-guide-to-becoming-an</link><guid isPermaLink="false">https://blog.bytebytego.com/p/a-practical-guide-to-becoming-an</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Tue, 02 Jun 2026 14:31:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!I3y0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong><a href="https://go.bytebytego.com/You_060226">New Year, New Metrics: Evaluating AI Search in the Agentic Era (Sponsored)</a></strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/You_060226" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ZPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!8ZPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 848w, https://substackcdn.com/image/fetch/$s_!8ZPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8ZPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ZPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1432342,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/You_060226&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/183299050?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8ZPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 424w, https://substackcdn.com/image/fetch/$s_!8ZPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 848w, 
https://substackcdn.com/image/fetch/$s_!8ZPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 1272w, https://substackcdn.com/image/fetch/$s_!8ZPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e07ebdc-da60-480c-874b-162a215a186b_1600x840.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Most teams pick a search provider by running a few test queries and hoping for the best &#8211; a recipe for hallucinations and 
unpredictable failures. <a href="https://go.bytebytego.com/You_060226">This technical guide</a> from <a href="https://go.bytebytego.com/You_060226">You.com</a> gives you access to an exact framework to evaluate AI search and retrieval.</p><p><strong>What you&#8217;ll get:</strong></p><ul><li><p>A four-phase framework for evaluating AI search</p></li><li><p>How to build a golden set of queries that predicts real-world performance</p></li><li><p>Metrics and code for measuring accuracy</p></li></ul><p>Go from &#8220;looks good&#8221; to proven quality.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/You_060226&quot;,&quot;text&quot;:&quot;Learn how to run an eval&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/You_060226"><span>Learn how to run an eval</span></a></p><div><hr></div><p style="text-align: justify;">Few people in tech have a clearer view of AI-native engineering at hyperscale than Shah Rahman. As Global Head of Autonomous ML Iteration &amp; Optimization for Ads at Meta, Shah spends his days architecting AI-native infrastructure and multi-agent systems that make ML iteration reliable across one of the largest production environments on the planet.</p><p style="text-align: justify;">In the piece below, Shah cuts through the &#8220;everyone is an engineer now&#8221; noise and lays out what AI-native engineering actually requires: context engineering, spec-driven development, critical verification, and disciplined problem decomposition. 
He walks through the Agentic Development Life Cycle, the journey that separates real 10x leverage from &#8220;faster failure,&#8221; and the security guardrails that are no longer optional.</p><p style="text-align: justify;">If you&#8217;re moving your engineering org toward becoming AI-native, this is a strong playbook.</p><p style="text-align: justify;">Let&#8217;s get into it.</p><p style="text-align: justify;">For more from Shah, connect with him on <a href="https://www.linkedin.com/in/shahirahman/">LinkedIn</a>.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I3y0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I3y0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 424w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 848w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!I3y0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png" width="1456" height="765" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I3y0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 424w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 848w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!I3y0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bcec742-8919-4c38-9f8b-6697fa0b6423_2048x1076.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">AI generates more than 75% of Google&#8217;s new code. OpenAI and Anthropic claim that almost every line of fresh code that they produce comes from AI. Amazon recently migrated 30,000 of its production applications from Java 8 to Java 17 in a matter of months, a project that would otherwise have taken an estimated 4,500 developer-years.  And Mark Zuckerberg expects that AI agents will be operating as mid-level engineers by the end of 2026.</p><p style="text-align: justify;">Reading those statements, we may feel as if we are looking at the last lines being written on the closing pages of an era. 
Perhaps even the closing pages of a <em>profession</em>.</p><p style="text-align: justify;"><strong>But here&#8217;s the question: If AI writing everything is the answer, then why are most engineering teams shipping more bugs, more incidents, and more technical debt than they shipped two years ago?</strong></p><p style="text-align: justify;">In an April article in the <em>New York Times</em>, Mike Isaac and Erin Griffith gave a name to what&#8217;s happening across the industry. They called it <a href="https://www.nytimes.com/2026/04/06/technology/ai-code-overload.html">code overload</a>.</p><p style="text-align: justify;">The essence of code overload, according to Isaac and Griffith, is that &#8220;tech workers are producing so much code so quickly that it has become too much to handle.&#8221; Teams that have rebuilt their work around the use of AI agents are drowning in code churn and security holes.</p><p style="text-align: justify;">Yet many engineers who have adopted AI agents are pulling ahead of the field, achieving real productivity gains. They are using the same models and the same tools, but they are generating very different outcomes. What explains the gap?</p><p style="text-align: justify;">It comes down to one decision. Real productivity gains come when engineers decide to make the leap from writing code to <em>orchestrating it</em>. This piece is a working guide for engineers who want to land on the productive side of that split. It will cover the practices, guardrails, and mindset shifts that separate AI-native engineering from vibe coding and from the everyday chaos that most teams are now generating at scale.</p><h2>From Engineer to Orchestrator</h2><p style="text-align: justify;">Let me first clarify one thing: engineers are not becoming obsolete. Coding has always been a small part of engineering (20-30% at most). 
This underappreciated reality is more visible when AI agents and tools produce more code, but more code is not necessarily more productive (often it&#8217;s less). This is a critical distinction that the industry is blurring dangerously, and I state that as:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OJwN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OJwN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 424w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 848w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OJwN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png" width="1456" height="788" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OJwN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 424w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 848w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 1272w, https://substackcdn.com/image/fetch/$s_!OJwN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebc9614-2da6-41ac-8f94-858583a3fedc_2048x1108.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">When Andrej Karpathy coined &#8220;vibe coding&#8221; in early 2025, it captured something useful &#8212; the ability for non-engineers to build functional software by describing what they want. That democratization is valuable. But it&#8217;s categorically different from professional AI-native engineering.</p><p style="text-align: justify;">AI-native engineering means commanding and mastering available and emerging AI agents and tools to engineer things that weren&#8217;t possible in the pre-AI era. Knowing how to code remains a fundamental expectation. Without that knowledge, you can build systems using AI &#8212; and that&#8217;s vibe coding. It has its place, but it&#8217;s not engineering.</p><p style="text-align: justify;">The AI-native engineer operates as an <strong>orchestrator</strong> &#8212; someone who can turbocharge 10x engineering into 100x output through proper orchestration of AI agents. 
And that bar continues to rise weekly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5zAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5zAd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 424w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 848w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 1272w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5zAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png" width="1456" height="901" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:901,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5zAd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 424w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 848w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 1272w, https://substackcdn.com/image/fetch/$s_!5zAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F597222bc-27ce-4625-a2aa-4b4a452197c4_2048x1268.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Four Core Practices</h3><h4>1. Synchronized Context Engineering</h4><p style="text-align: justify;">Emerging as a distinct discipline, this is the single most important skill for AI-native engineers. Context engineering means the systematic curation and injection of project-specific information into AI working memory: <em>architectural diagrams</em>, <em>coding standards</em>, <em>business rules</em>, <em>team conventions</em>, and <em>development workflows</em> that are reusable and standardized across your team members.</p><p style="text-align: justify;">The shift from basic &#8220;prompt engineering&#8221; to more sophisticated &#8220;context engineering&#8221; reflects a deeper understanding: the quality of AI output is bounded by the quality of context it receives. 
Teams practicing rigorous context engineering report 40&#8211;50% speed increases and dramatically reduced alignment overhead.</p><p style="text-align: justify;">As context engineering matures, Anthropic&#8217;s MCP &#8212; described as &#8220;USB-C for AI&#8221; &#8212; continues to be a universal standard for connecting agents to external tools and data sources. Context files like CLAUDE.md have become core infrastructure, not optional documentation. This persistent, evolving knowledge layer makes agents genuinely useful within your specific codebase, once you master developing and maintaining this critical context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q-ip!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q-ip!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 424w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 848w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 1272w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!q-ip!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png" width="1456" height="1353" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1353,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q-ip!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 424w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 848w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 1272w, https://substackcdn.com/image/fetch/$s_!q-ip!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88d553b7-fcfd-4c20-b8d9-8db87b0d35ec_2048x1903.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>2. Specification-Driven Development</h4><p style="text-align: justify;">The quality of AI-generated code matches the quality of input specifications. Garbage in, garbage out &#8212; this principle applies with even more force when AI can generate garbage at unprecedented speed and volume.</p><p style="text-align: justify;">Random prompting and vibe coding consistently underperform spec-driven workflows. AI agents get stuck in circular reasoning without clear specifications and instructions that are contained and well-defined. Consider this discipline: define what you want before asking AI to build it, break problems into discrete milestones with clear success criteria, and execute incrementally with validation at each checkpoint. 
Make sure the agent checks all open questions with you and doesn&#8217;t run off on its own to find answers.</p><h4>3. Critical Verification</h4><p style="text-align: justify;">AI-generated code quality approximates that of early-career developers. Research consistently shows that around 45% of AI-generated code contains security flaws. A Stanford study found that developers using AI assistants wrote significantly less secure code and were more confident it was secure &#8212; a dangerous combination.</p><p style="text-align: justify;">Meanwhile, a striking METR randomized controlled trial found experienced open-source developers were actually 19% slower when using AI assistants on familiar codebases. The culprit? Over-reliance without adequate verification. A GitClear study found AI-assisted codebases showed increased &#8220;code churn&#8221; &#8212; code written and then quickly revised or deleted &#8212; suggesting raw output is a poor proxy for productivity.</p><p style="text-align: justify;">In the AI-native era, the bottleneck has permanently shifted from writing code to proving that it works at scale, with reliability and security. When AI generates code quickly, review, testing, and verification of that code become the new rate-limiting factors, and verification becomes non-negotiable.</p><h4>4. Problem Decomposition</h4><p style="text-align: justify;">Avoid over-trusting AI with large, complex problems. Break tasks into AI-manageable chunks where humans handle edge cases, custom logic, and domain-specific aspects while AI agents handle the 70&#8211;80% that is routine implementation. Complex problems lead to context pollution and slop generation that AI agents really struggle to recover from. Compacting and summarizing when context is polluted and shifting to a different session helps, but this discontinuity can be damaging for long-horizon tasks. 
Many of us have wasted hours, if not days, by not decomposing problems and by stubbornly pushing agents along without a well-defined context, reasonable specifications, or verification guardrails.</p><h3>Time Allocation for AI-Native Work</h3><p style="text-align: justify;">I recommend the following split: 40% context-setting, 20% generation and testing iteration, and 40% review and verification. This surprises many developers who spend most of their time in code generation. In practice, the generation step is fast; the verification and context work become the new time sink.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hW8V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hW8V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 424w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 848w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hW8V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png" width="1456" height="1001" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1001,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hW8V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 424w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 848w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!hW8V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94d84be3-4268-40a3-ac09-392c2ea26e64_2048x1408.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The Individual Transformation Journey</h2><h3>Phase 1: Foundation &#8212; should only take a couple of weeks</h3><p style="text-align: justify;">Begin with one primary AI assistant &#8212; pick your favorite one: Codex, Claude Code, or Cursor. Dive deep and build intuition for its capabilities and limitations through daily practice. Set up your workspace, workflow, and initial configurations. You&#8217;ll have to take the leap from the times of manual coding to AI-assisted and AI-generated coding practices. Your goal should be to develop judgment about when AI delivers value versus when it creates more work than it saves. 
Write down your personal notes, iterate, and build a strong foundation.</p><h3>Phase 2: Integration &#8212; should take a month max</h3><p style="text-align: justify;">Adopt structured prompting frameworks. Create project-specific context files encoding team standards and architectural patterns. Implement the &#8220;plan first, then execute, and finally review&#8221; workflow: planning mode generates specifications, execution mode implements, and you review after each atomic task. Establish approval gates and guardrails that prevent agent drift. Skipping the review piles up tech debt that you and your agent will both struggle with downstream.</p><p style="text-align: justify;">The critical practice here is small loops with verification checkpoints. Evidence shows tight human-in-the-loop cycles with limited scope dramatically outperform large autonomous runs, at least for coding tasks. This may feel counterintuitive and slower &#8212; but it produces dramatically better outcomes in practice. Indulging in unplanned, speculative autonomous agent runs will likely produce a large volume of slop that can only be thrown away, forcing you to start all over again. Avoid that before it happens.</p><h3>Phase 3: Mastery &#8212; live on</h3><p style="text-align: justify;">Deploy AI agents for multi-step, multi-file tasks. Implement AI-assisted code review workflows. Use advanced techniques: multi-agent workflows, parallel sessions, and cross-agent verification loops. Every week, we hear about coding agents advancing on benchmarks and solving problems that were never solved before. Stay on top of those developments and embrace what the creators of Claude or Codex are advocating, but adapt their advice to your needs (don&#8217;t follow blindly, as their situations may be wildly different from yours).</p><p style="text-align: justify;"><strong>Target metrics:</strong> 80%+ AI-generated coding rate with less than 20% rewrite rate. 
Achieve this, and you can pull your team toward the same proficiency level rather fast.</p><h2>Team Transformation: The Cultural Foundation</h2><p style="text-align: justify;">Research shows 70% of transformation success comes from operational and cultural change. These changes call for organizational and technical leads to actively model transformation through daily AI usage. At the same time, ensure three critical aspects to establish the AI-native cultural foundation:</p><ol><li><p style="text-align: justify;"><strong>Psychological safety</strong> is paramount. MIT research found 83% of leaders believe psychological safety measurably improves AI initiative success. Celebrate &#8220;AI failure stories&#8221; as learning opportunities. Make this a deliberate practice, not an optional one, and ensure everyone feels included as part of the collective learning and growing exercise.</p></li></ol><ol start="2"><li><p style="text-align: justify;"><strong>Evolved code review</strong> is essential. AI-generated code volume overwhelms traditional human review processes. Redesign review to distinguish AI-generated from human-written code, with separate review rubrics for each. Be especially vigilant about the dangerous combination of AI-generated and AI-reviewed PRs &#8212; these combinations should be explicitly guardrailed and governed where necessary.</p></li></ol><ol start="3"><li><p style="text-align: justify;"><strong>Shared context libraries</strong> become the core currency. Standardize context files, evaluation sets, and agent configurations across teams. Modern tooling enables easy packaging of context through plugins, skills, and commands &#8212; but watch for uncontrolled proliferation, where teams compete for standardization rather than collaborating. 
Don&#8217;t let too many team members&#8217; desire to build their own agents and skills jeopardize your standardized agentic operating environment.</p></li></ol><h2>The Agentic Development Life Cycle (ADLC)</h2><p style="text-align: justify;">Traditional SDLC &#8212; and even extreme agile &#8212; falls short for how AI agents develop software alongside humans. AI-native engineering is evolving toward an Agentic Development Life Cycle that redefines each phase.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5wbm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5wbm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 424w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 848w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5wbm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png" width="1456" height="857" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:857,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5wbm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 424w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 848w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 1272w, https://substackcdn.com/image/fetch/$s_!5wbm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3c0b138-c25e-4b81-a7a0-b4a951e7df60_2048x1206.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Planning</h3><p style="text-align: justify;">This is the most critical step. Use deep research and planning modes with multiple agents for parallel exploration. Specify against codebases, flag ambiguities, decompose into subtasks, and estimate difficulty. Create roadmaps with version milestones that help agents follow through incrementally. A planning agent can assemble findings from multiple exploration agents into a coherent implementation strategy. OpenClaw of Claude can run multiple sub-agents in parallel.</p><h3>Building</h3><p style="text-align: justify;">AI agents handle end-to-end feature implementation like junior or mid-level engineers (as of this writing; I expect that to edge up to senior-engineer level within a year or two). 
The engineer acts as the tech lead, orchestrating multiple agents rather than coding directly. Whether you run agents sequentially or in parallel depends on your roadmap and verification plan. The agentic coding tool landscape has matured rapidly &#8212; Claude Code, Cursor&#8217;s Composer mode, GitHub Copilot&#8217;s Agent Mode, and OpenAI&#8217;s Codex agent all support this pattern with varying strengths. New versions come out every month, so watch closely for new capabilities.</p><h3>Testing</h3><p style="text-align: justify;">This is TDD reincarnated. Agents write test plans first, then implement code. All tests should fail at the beginning and then incrementally pass. Cover unit testing at the atomic level, integration testing across features, and end-to-end testing across the system. Don&#8217;t over-index on unit testing at the expense of integration or system testing.</p><p style="text-align: justify;"><strong>Pro Tip:</strong> Consider separating planning, building, and testing agents. Each agent swarm specializes and develops a deep understanding of your codebase from a different perspective. Planning agents can challenge building agents that take shortcuts, testing agents that skip coverage, or review agents that are biased toward incorrect implementations that merely appear correct. Similarly, review agents can hold every upstream agent accountable for making mistakes or missing steps.</p><h3>Review</h3><p style="text-align: justify;">Deploy agent swarms specializing in key dimensions: <em>functionality</em>, <em>quality</em>, <em>scalability</em>, <em>performance</em>, <em>reliability</em>, <em>security</em>, and <em>privacy</em>. Agents take the first pass and produce reports; humans review each report carefully. 
When one agent discovers an issue &#8212; say, an injection vulnerability &#8212; apply the generalization principle: if one instance exists, others likely do too, and this is your chance to proactively address the whole vulnerability type in your code, not just one or two instances.</p><h3>Documentation</h3><p style="text-align: justify;">Move from post-facto documentation to continuous generation. AI agents generate summaries, key design decisions, architectural diagrams, and changelogs in real time. This flows naturally into API documentation, feature collateral, and customer-facing content. I&#8217;m quite excited about AI tools finally solving the outdated, stale, and inconsistent documentation problem that I&#8217;ve seen myself and my teams suffer from for decades.</p><h3>Codify ADLC</h3><p style="text-align: justify;">Encode your Layer-1 (individual) and Layer-2 (team) practices into maintained, self-evolving context files, skills libraries, and MCP tools. This ensures ADLC adoption scales across the organization rather than remaining tribal knowledge or staying trapped in parts of the org. Promote the ADLC tooling package.</p><h2>What AI-Native Process Actually Looks Like</h2><p style="text-align: justify;">There&#8217;s a seductive narrative in the industry right now: <em>fewer people, less overhead, faster builds</em>. But this narrative conflates construction costs with decision costs. AI has drastically reduced the cost of building, but building represents only 20&#8211;30% of total development costs, and the cost of deciding what to build and what to cut remains largely untouched. 
With the proliferation of code and builders, this becomes a harder problem.</p><p style="text-align: justify;">AI-native process optimization requires redirecting effort from coordinating execution to accelerating learning.</p><h3>The Learning Loop</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BAzE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BAzE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 424w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 848w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BAzE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png" width="1456" height="1001" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1001,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BAzE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 424w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 848w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!BAzE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0e2a68-f217-476c-b546-76753cb9e48f_2048x1408.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">AI compresses the first step dramatically. But compression value depends entirely on execution quality throughout the remaining cycle. Faster building without robust user observation and scope discipline produces faster divergence from genuine product goals. This results in customers not seeing the benefits of AI acceleration.</p><h3>Where AI Creates Genuine Leverage</h3><p style="text-align: justify;"><strong>Cheaper experimentation.</strong> Test more hypotheses per unit time. Over 70% of features never reach real users. AI makes it trivially cheap to test whether something matters before committing to full development. The discipline: kill non-viable concepts ruthlessly.</p><p style="text-align: justify;"><strong>Faster prototyping for user research.</strong> Working prototypes replace documentation. Tools like Vercel&#8217;s v0, Replit Agent, and Bolt.new enable functional prototypes from natural language in minutes. 
This produces superior signal quality from user testing. Encourage everyone to prototype aggressively; make it a habit before you build.</p><p style="text-align: justify;"><strong>Automated boilerplate, not automated judgment.</strong> AI handles undifferentiated work: scaffolding, non-novel code, business logic tests, documentation, and data models. Teams focus on differentiated work: core business logic, empathetic user experiences, novel implementations, and the crucial decision of what to keep or kill.</p><p style="text-align: justify;"><strong>The &#8220;design to 50%&#8221; principle.</strong> Ship minimal functionality enabling core user journeys. Observe where users hesitate, misunderstand, or abandon. This reveals actual product challenges rather than imagined ones. AI brings the cost of this approach close to zero.</p><h2>Guardrails &#8212; Not Optional</h2><p style="text-align: justify;">The security landscape for AI-generated code has become genuinely alarming. The data shows AI-native development speed is creating new attack surfaces faster than manual security review or traditional tools can address. We observed roughly one new insecure AI integration appearing per week in our environment, many resulting in production incidents. Anthropic&#8217;s Daybreak and Mythos are a clear wake-up call for security.</p><h3>Real Incidents, Real Consequences</h3><p style="text-align: justify;"><strong>Chat Integration RCE.</strong> Built in two days using AI, this integration allowed Remote Code Execution through a 2FA bypass and open ACLs. 
It cost tens of hours to detect, mitigate, and fix.</p><p style="text-align: justify;"><strong>Unauthorized Database Access.</strong> An AI coding agent accessed approximately 1,500 secured database tables without proper authorization, exposing sensitive data to prompt injection risks.</p><p style="text-align: justify;"><strong>Google Docs Prompt Injection.</strong> An AI coding agent was led into Remote Code Execution by a prompt injection embedded in a Google Docs document, bypassing input filtering protections entirely.</p><p style="text-align: justify;"><strong>Supply Chain Poisoning.</strong> A new attack vector called &#8220;slopsquatting&#8221; emerged in 2025: AI models hallucinate package names that don&#8217;t exist, and attackers register those names with malicious code. Multiple documented incidents have resulted from this.</p><h3>Emerging Security Controls</h3><p style="text-align: justify;"><strong>Agent Identity and Access Control.</strong> Implement step-up 2FA. Apply the principle of least privilege. Allow no shared credentials or open ACLs. Start with passive, read-only use cases and build confidence before granting read-write or broader access.</p><p style="text-align: justify;"><strong>Data Classification Awareness.</strong> Agents must respect data classifications and sensitive boundaries. &#8220;Agentic Authorization&#8221; is an emerging enterprise challenge where agents can bypass restrictions at a machine speed that human oversight cannot match.</p><p style="text-align: justify;"><strong>Prompt Injection Protection.</strong> External content &#8212; documents, web pages, user inputs &#8212; can contain hidden instructions that hijack agent behavior. Implement input filtering, content validation, and context sanitization. Never auto-execute untrusted commands. Resist the temptation to auto-accept all agent suggestions.</p><p style="text-align: justify;"><strong>Infrastructure Sandboxing.</strong> Agent activities must be observable and auditable. 
Block high-risk production surfaces &#8212; configurations, critical execution, critical storage &#8212; until controls are verified. Use sandboxing and OS-level enforcement.</p><h3>Technical Guardrails</h3><p style="text-align: justify;"><strong>Static analysis integration.</strong> Data shows roughly 30% of Python and 25% of JavaScript AI-generated snippets contain security weaknesses. Centralize advanced static analysis in CI/CD pipelines. Require human review for critical functions: authentication, payments, and PII handling.</p><p style="text-align: justify;"><strong>Automated quality gates.</strong> Implement &#8220;Ralph Loops&#8221;, OpenClaw, or another form of autonomous loop: iterative verification until success criteria are met. Run type checking, linting, and tests before diff submission. Use multi-stage canary systems with stringent gates before production deployment.</p><p style="text-align: justify;"><strong>Skills-based security.</strong> Teach agents secure coding patterns so that they flag common vulnerabilities during generation rather than after. Shift left, but with agents.</p><h3>Organizational Guardrails</h3><p style="text-align: justify;"><strong>Skill atrophy prevention.</strong> Gartner predicts that 50% of organizations will require &#8220;AI-free&#8221; skills assessments by 2026. Treat AI as a learning tool: request explanations alongside generated code. Occasionally, work without AI to preserve foundational abilities. The goal isn&#8217;t Luddism; it&#8217;s insurance against the day your AI tools are unavailable or producing subtly wrong but potentially fatal results.</p><p style="text-align: justify;"><strong>The productivity paradox.</strong> Individual productivity gains from AI tools often fail to materialize at the team and company levels. Focus on end-to-end cycle time and feature velocity, not coding speed alone. 
Adding AI to broken processes yields broken processes that generate more code, faster.</p><h2>The Engineer of 2026 and Beyond</h2><p style="text-align: justify;">The engineers thriving in the new environment treat AI as a collaborative partner for execution while maintaining the systems thinking, critical judgment, and communication skills that no AI can replicate. AI amplifies existing expertise rather than replacing it &#8212; senior engineers achieve dramatically better results because they bring deeper context and sharper judgment.</p><p style="text-align: justify;">Your domain expertise is the key differentiator in AI-native productivity. No AI tool or agent can replace it. So, invest in sharpening your domain skills, whether that&#8217;s math, science, finance, health science, or law. Continuing to uplevel your engineering fundamentals pays recurring dividends in AI effectiveness.</p><p style="text-align: justify;">This is a multi-year transformation, not a one-off tool adoption. Teams treating it as a tooling upgrade consistently fail to realize productivity gains. The organizations that succeed are the ones treating AI-native engineering as a new way of working &#8212; with new practices, new disciplines, and new definitions of what &#8220;amazing&#8221; looks like.</p><p style="text-align: justify;"><strong>Are you building this way yet? If not, ping me, and I&#8217;m happy to <a href="https://www.linkedin.com/in/shahirahman/">have a chat</a>.</strong></p><div><hr></div><p style="text-align: justify;">This is Part 1 of a two-part series. 
Part 2, &#8220;AI-Native Leaders,&#8221; covers the organizational transformation, leadership models, and measurement frameworks required to make AI-native engineering work at scale.</p>]]></content:encoded></item><item><title><![CDATA[How DoorDash Built a Testing System to Evaluate LLMs]]></title><description><![CDATA[In this article, we will learn how they built this flywheel and the key takeaways.]]></description><link>https://blog.bytebytego.com/p/how-doordash-built-a-testing-system</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-doordash-built-a-testing-system</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Sat, 30 May 2026 15:30:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!L2Ta!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/Datadog_060126">How to Track AI ROI in Real Time (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Datadog_060126" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eGG5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 424w, https://substackcdn.com/image/fetch/$s_!eGG5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 848w, 
https://substackcdn.com/image/fetch/$s_!eGG5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 1272w, https://substackcdn.com/image/fetch/$s_!eGG5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eGG5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png" width="1456" height="739" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:739,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1988902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Datadog_060126&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199798126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eGG5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 424w, 
https://substackcdn.com/image/fetch/$s_!eGG5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 848w, https://substackcdn.com/image/fetch/$s_!eGG5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 1272w, https://substackcdn.com/image/fetch/$s_!eGG5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e514377-9065-4d16-b57c-160afec4e714_2800x1422.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Datadog&#8217;s guide shows you how to connect AI spend, infrastructure, and model performance into a single view, so you can catch cost spikes the moment they happen. See how Kevel cut AWS costs by up to $100,000/month after replacing reactive cost reviews with real-time visibility.<br><br>You&#8217;ll learn how to:</p><ul><li><p>Break down AI costs by token, model, provider, and team</p></li><li><p>Get alerted the instant inference volume spikes or API spend exceeds budget</p></li><li><p>Correlate cost increases directly to architecture changes so root-cause analysis takes minutes</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Datadog_060126&quot;,&quot;text&quot;:&quot;Get the guide&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/Datadog_060126"><span>Get the guide</span></a></p><div><hr></div><p style="text-align: justify;">DoorDash&#8217;s customer support chatbot had a hallucination problem. Not the dramatic kind where it invents entire conversations, but the subtle, harder-to-catch kind.</p><p style="text-align: justify;">For example, the chatbot would look at a customer&#8217;s order history, see a delivery status field, misread it, and then confidently suggest a refund policy that didn&#8217;t actually exist. 
The raw data was right there in the chatbot&#8217;s context window, the working memory where an LLM holds everything it needs to generate a response, but having too much information was making things worse.</p><p style="text-align: justify;">For reference, DoorDash is one of the largest food delivery and local commerce platforms in the United States, connecting customers with restaurants and stores through a network of independent delivery drivers called Dashers.</p><p style="text-align: justify;">At that scale, the company handles hundreds of thousands of support contacts every day from customers, merchants, and Dashers, making automated support not just a nice-to-have but a necessity.</p><p style="text-align: justify;">The team could see the problem clearly, but fixing it was a different story. Every change they made to reduce hallucinations in one scenario risked creating new ones in another. They were stuck between two bad options. They could deploy changes to production and hope for the best, which meant risking real customer experiences. Or they could manually test dozens of conversation scenarios for every prompt change, which would take weeks and still might miss things.</p><p style="text-align: justify;">This tension isn&#8217;t unique to DoorDash. It&#8217;s the fundamental challenge anyone faces when they move from traditional deterministic software to LLM-based systems. DoorDash used to run customer support on hand-built decision trees, where every change had a predictable, traceable impact. LLMs replaced that predictability with flexibility and more natural conversations, but they also introduced non-determinism, meaning the same input can produce different outputs each time.</p><p style="text-align: justify;">DoorDash&#8217;s answer to this problem wasn&#8217;t a better chatbot. It was a better system for improving the chatbot, something they call the simulation and evaluation flywheel. 
In this article, we will learn how they built this flywheel and the key takeaways.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from the DoorDash Engineering Team. Please comment if you notice any inaccuracies.</em></p><h2 style="text-align: justify;">What the Flywheel Actually Does</h2><p style="text-align: justify;">The flywheel has two interconnected pieces:</p><ul><li><p style="text-align: justify;">The first is an offline simulator that generates realistic multi-turn customer conversations without involving any real customers.</p></li><li><p style="text-align: justify;">The second is an evaluation framework that automatically grades how the chatbot performed in those conversations.</p></li></ul><p style="text-align: justify;">Together, they create a tight iteration loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4eth!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4eth!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 424w, https://substackcdn.com/image/fetch/$s_!4eth!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 848w, https://substackcdn.com/image/fetch/$s_!4eth!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 1272w, 
https://substackcdn.com/image/fetch/$s_!4eth!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4eth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png" width="1456" height="1028" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1028,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199798126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4eth!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 424w, https://substackcdn.com/image/fetch/$s_!4eth!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 848w, 
https://substackcdn.com/image/fetch/$s_!4eth!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 1272w, https://substackcdn.com/image/fetch/$s_!4eth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f68eb05-7dc8-448a-b5ab-ad59f8da6c84_1688x1192.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Source: <a 
href="https://careersatdoordash.com/blog/doordash-simulation-evaluation-flywheel-to-develop-llm-chatbots-at-scale/">DoorDash Engineering Blog</a></figcaption></figure></div><p style="text-align: justify;">When the team notices a problem, they write an evaluation that captures the specific failure mode they want to fix. A single job trigger then orchestrates the entire pipeline end-to-end, automatically generating test scenarios from historical transcripts, running multi-turn conversations between the simulator and the chatbot, and evaluating the results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!USZ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!USZ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 424w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 848w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 1272w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!USZ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png" width="1456" height="1039" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1039,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101184,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199798126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!USZ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 424w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 848w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 1272w, https://substackcdn.com/image/fetch/$s_!USZ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735b862b-b37e-4817-99a1-1c498fefd6c2_2054x1466.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">Then they modify the prompt or the system architecture, run the simulator again, and check whether the pass rate climbed. If it did, they keep going. If it didn&#8217;t, they try something else. 
They repeat this cycle until the pass rate hits their exit criteria, and then they deploy with confidence that the change actually works.</p><p style="text-align: justify;">The graph below shows the pass rate for the no-hallucination evaluation over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cKsW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cKsW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 424w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 848w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 1272w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cKsW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:235835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199798126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cKsW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 424w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 848w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 1272w, https://substackcdn.com/image/fetch/$s_!cKsW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb6dd70f-ae10-40a8-84d1-cf5046bbbf48_2514x1398.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Source: <a href="https://careersatdoordash.com/blog/doordash-simulation-evaluation-flywheel-to-develop-llm-chatbots-at-scale/">DoorDash Engineering Blog</a></figcaption></figure></div><p style="text-align: justify;">The speed of this loop makes this a powerful approach. DoorDash can run more than 200 simulated conversations in under five minutes and get automated evaluation results immediately.</p><p style="text-align: justify;">In other words, what used to take days of manual testing and review now takes hours. 
And because everything happens offline, they never risk degrading the experience for real customers while they iterate.</p><p style="text-align: justify;">Their evaluation suite has grown to more than 50 evaluations covering hallucination detection, tone assessment, issue classification, and other quality dimensions. Before any change goes to production, it must pass the full suite, which serves as both a quality check and a regression test.</p><p style="text-align: justify;">The flywheel sounds straightforward, but both the simulator and the evaluator required solving genuinely hard problems.</p><div><hr></div><h2 style="text-align: justify;">Simulating Customers That Push Back</h2><p style="text-align: justify;">A static test case can check whether the chatbot gives a reasonable answer to a single message, but it can&#8217;t capture what happens when a frustrated customer pushes back three times, provides additional information mid-conversation, or threatens to escalate.</p><p style="text-align: justify;">DoorDash&#8217;s simulator doesn&#8217;t use scripted messages at all.</p><p style="text-align: justify;">Instead, it uses an LLM to play the customer role, generating dynamic responses based on detailed test scenarios. 
At each turn, the simulator runs through a structured analysis, asking questions such as:</p><ul><li><p style="text-align: justify;">Was the issue addressed?</p></li><li><p style="text-align: justify;">Is the conversation making progress?</p></li><li><p style="text-align: justify;">Does the customer need to provide more information?</p></li><li><p style="text-align: justify;">Is the conversation going in circles?</p></li></ul><p style="text-align: justify;">Based on this analysis, it decides what a realistic customer would say next.</p><p style="text-align: justify;">The test scenarios themselves come from real historical support transcripts, not from engineers imagining what customers might say.</p><p style="text-align: justify;">LLMs analyze past conversations from DoorDash&#8217;s database and extract structured behavioral profiles, including the customer&#8217;s personality traits (frustrated and demanding versus confused and patient), a detailed narrative of the situation, and the specific outcome the customer is seeking. 
This grounds the simulator in actual customer behavior rather than idealized test cases.</p><p style="text-align: justify;">See the diagram below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L2Ta!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L2Ta!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 424w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 848w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 1272w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L2Ta!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png" width="1456" height="1313" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1313,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/199798126?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L2Ta!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 424w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 848w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 1272w, https://substackcdn.com/image/fetch/$s_!L2Ta!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5832df44-5f71-4dcf-b4e9-6f38f771758d_2054x1852.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">The simulator also exhibits realistic escalation patterns. It doesn&#8217;t immediately ask for a manager. Instead, it gives the chatbot multiple chances to resolve the issue, only escalating after repeated unhelpfulness or circular exchanges, and re-engaging when progress becomes clear again. This mirrors how real customers behave.</p><p style="text-align: justify;">For a simulated conversation to be meaningful, the chatbot also needs realistic backend data. It needs to look up delivery status, check refund eligibility, and pull order details. DoorDash handles this through mock data that blends real production data with scenario-specific test data, preserving timestamps and relationships to keep interactions realistic. 
This allows them to test complex edge cases, including fraud scenarios and high-value refunds, that their previous testing infrastructure couldn&#8217;t handle.</p><h2 style="text-align: justify;">Using an LLM to Judge Another LLM</h2><p style="text-align: justify;">Running hundreds of realistic conversations is only useful if you can tell whether the chatbot actually handled them well. However, manually reading through every simulated conversation would defeat the entire purpose of automation. So DoorDash uses an LLM to evaluate the chatbot&#8217;s performance automatically.</p><p style="text-align: justify;">Each evaluation is structured as a function that takes the full conversation transcript (including tool calls and backend responses) along with the relevant company policy, applies a prompt asking whether the chatbot correctly followed that policy, and returns a binary pass or fail with the reasoning behind the judgment.</p><p style="text-align: justify;">The obvious objection here is that this sounds circular. If an LLM caused the problem by hallucinating, why would you trust another LLM to catch the hallucination?</p><p style="text-align: justify;">DoorDash addresses this directly with a concept they call the generator-verifier gap. Acting as a full customer support agent involves complex, multi-step decision-making across a huge range of possible scenarios. That&#8217;s genuinely hard. But verifying a single, narrowly defined behavior is a much simpler task.</p><p style="text-align: justify;">For example, &#8220;Did the chatbot claim the customer was eligible for a refund when the policy says otherwise?&#8221; is a straightforward binary question. The evaluator isn&#8217;t trying to be a better support agent. 
It&#8217;s checking one specific thing at a time, and LLMs are much more reliable at these focused binary judgments than they are at open-ended generation.</p><p style="text-align: justify;">But DoorDash doesn&#8217;t just trust the LLM judge out of the box. They calibrate it against human judgment through a structured process. They collect a sample of conversations, have human experts label each one as pass or fail, run the LLM judge on the same samples, and then measure how often the judge agrees with the humans, how often it misses real problems, and how often it flags problems that aren&#8217;t there. They analyze the reasoning behind any mismatches, revise the evaluation prompt to fix systematic errors, and repeat until the judge reliably matches human expert judgment. This calibration step creates trust in the system.</p><p style="text-align: justify;">The binary nature of the evaluations is important here. DoorDash isn&#8217;t asking the LLM to rate the chatbot&#8217;s performance on a subjective scale of 1 to 10. They&#8217;re asking whether the chatbot followed a specific policy or not. A binary verdict makes calibration faster, makes disagreements easier to diagnose, and produces more reliable judgments.</p><h2>Fixing Hallucinations by Giving the Chatbot Less Information</h2><p style="text-align: justify;">With the simulator generating conversations and the evaluator grading them, DoorDash had a working flywheel.</p><p style="text-align: justify;">During early launches, human reviewers noticed the chatbot was getting overwhelmed by the sheer volume of data in its context window. Order histories, delivery status updates, refund decisions, and tool call results were all being fed directly to the model as raw event logs. The chatbot would misinterpret a field or suggest a policy that didn&#8217;t exist, not because the information was wrong, but because there was too much of it. 
This runs directly counter to the intuition that giving a model more information should produce better results.</p><p style="text-align: justify;">DoorDash hypothesized that the same data that was vital for the chatbot&#8217;s reasoning was becoming noise when it came time to generate a response to the customer. Their solution was an architectural layer they called the &#8220;case state,&#8221; which synthesizes the raw tool history into a structured, intermediate representation. Instead of dumping everything into the context window, the case state distills the relevant facts into a clean format that the chatbot can actually use.</p><p style="text-align: justify;">Getting the case state right required the flywheel. Their first attempts at extraction logic didn&#8217;t work well at all. Some versions left out critical information, causing the chatbot to miss details that were essential for driving resolutions. Other versions remained too noisy or poorly structured, confusing the model in different ways. Since the simulator could generate numerous realistic conversations in minutes, the team experimented with dozens of different context shapes and prompt strategies in a rapid feedback loop. Each iteration took hours instead of the weeks it would have required through manual testing.</p><p style="text-align: justify;">Over 11 iterations, the hallucination evaluation pass rate climbed overall, with a notable dip at iteration 3, where a change temporarily made things worse. That dip shows that improvement isn&#8217;t linear, even with a flywheel, and that part of the flywheel&#8217;s value is catching regressions before they reach real customers.</p><p style="text-align: justify;">The final result was a 90% reduction in hallucinations in simulation, and that improvement carried over into production. 
The strong correlation between their offline metrics and live traffic performance gave the team confidence that the flywheel is a reliable development tool, not just an internal sandbox disconnected from reality.</p><h1 style="text-align: justify;">Conclusion</h1><p style="text-align: justify;">The simulation and evaluation flywheel has fundamentally changed how DoorDash develops and deploys chatbot improvements, compressing iteration cycles from days to hours and giving them a way to validate changes across hundreds of scenarios before any real customer is affected.</p><p style="text-align: justify;">However, the flywheel does come with real tradeoffs worth understanding.</p><p style="text-align: justify;">The main limitation is that it can only catch problems for which you&#8217;ve written evaluations. If a failure mode isn&#8217;t captured by an evaluation, the flywheel is blind to it. DoorDash mitigates this by running a full evaluation suite before every deployment, covering hallucination, tone, and issue classification, but new failure modes can always emerge that existing evaluations don&#8217;t cover. This is why human review remains the starting point for every improvement cycle. Despite all the automation, someone still has to look at real conversations and notice what&#8217;s going wrong.</p><p style="text-align: justify;">Simulation fidelity is another inherent limitation. Even with transcript-derived scenarios and hybrid mock data, synthetic conversations are approximations of real user behavior. DoorDash reports a strong correlation between its offline metrics and production results, which validates the approach, but that correlation isn&#8217;t guaranteed to hold for every type of scenario or every kind of system change.</p><p style="text-align: justify;">There&#8217;s also the question of cost. Running hundreds of LLM-to-LLM conversations per test cycle, plus LLM-as-judge evaluations on each one, requires significant compute. 
For smaller teams or less critical applications, a lighter-weight version with fewer scenarios and more targeted evaluations might be the pragmatic starting point.</p><p style="text-align: justify;">The broader takeaway is that LLM systems require a completely different testing paradigm than traditional software. Since we can&#8217;t trace the branch anymore, we need a feedback loop that lets us simulate, evaluate, and iterate fast enough to build confidence before shipping.</p><p style="text-align: justify;"><strong>References:</strong></p><ul><li><p style="text-align: justify;"><a href="https://careersatdoordash.com/blog/doordash-simulation-evaluation-flywheel-to-develop-llm-chatbots-at-scale/">A Simulation and Evaluation Flywheel to build LLM Chatbots at Scale</a></p></li><li><p style="text-align: justify;"><a href="https://en.wikipedia.org/wiki/LLM-as-a-Judge">LLM as a Judge Pattern</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Must-Know Failure Modes in Distributed Systems]]></title><description><![CDATA[In this article, we will look at the most significant failure mode patterns in distributed systems and the standard approaches to deal with each of them.]]></description><link>https://blog.bytebytego.com/p/must-know-failure-modes-in-distributed</link><guid isPermaLink="false">https://blog.bytebytego.com/p/must-know-failure-modes-in-distributed</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Thu, 28 May 2026 16:31:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VDoG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">What does it mean for a distributed system to be up?</p><p style="text-align: justify;">On a single machine, the answer is straightforward, since a program is either running or it has crashed, 
and the line between the two is usually obvious from a stack trace.</p><p style="text-align: justify;">Distributed systems are not so simple. Every server can report healthy while users are seeing errors, the whole system can be technically working but stuck in a state it cannot recover from on its own, and it can quietly serve wrong data while every dashboard glows green.</p><p style="text-align: justify;">None of these are necessarily bugs in the conventional sense. They are recurring failure patterns that have been showing up across systems for decades, with names, mechanisms, and standard ways of defending against them.</p><p style="text-align: justify;">In this article, we will look at the most significant failure mode patterns in distributed systems and the standard approaches to deal with each of them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VDoG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VDoG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 424w, https://substackcdn.com/image/fetch/$s_!VDoG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 848w, https://substackcdn.com/image/fetch/$s_!VDoG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 1272w, 
https://substackcdn.com/image/fetch/$s_!VDoG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VDoG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png" width="1456" height="1698" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1698,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:431563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VDoG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 424w, https://substackcdn.com/image/fetch/$s_!VDoG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 848w, 
https://substackcdn.com/image/fetch/$s_!VDoG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 1272w, https://substackcdn.com/image/fetch/$s_!VDoG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc1176a-e45f-4b31-b860-38cb99c198bd_2250x2624.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Why Failures in Distributed Systems Are Different</h2>
      <p>
          <a href="https://blog.bytebytego.com/p/must-know-failure-modes-in-distributed">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How Airtable Built the Search Layer Behind Their AI Features]]></title><description><![CDATA[In this article, we will look at how Airtable&#8217;s data infrastructure team built its architecture, the challenges they faced, the tradeoffs they accepted, and why the choices they made only make sense once their data is properly understood.]]></description><link>https://blog.bytebytego.com/p/how-airtable-built-the-search-layer</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-airtable-built-the-search-layer</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Wed, 27 May 2026 15:30:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HKve!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/WorkOS_052726">WorkOS launches auth.md - an open protocol for agent registration (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/WorkOS_052726" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h5lV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 424w, https://substackcdn.com/image/fetch/$s_!h5lV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 848w, 
https://substackcdn.com/image/fetch/$s_!h5lV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 1272w, https://substackcdn.com/image/fetch/$s_!h5lV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h5lV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png" width="1456" height="799" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:799,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:448246,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/WorkOS_052726&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h5lV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 424w, 
https://substackcdn.com/image/fetch/$s_!h5lV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 848w, https://substackcdn.com/image/fetch/$s_!h5lV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 1272w, https://substackcdn.com/image/fetch/$s_!h5lV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb614ba2-3051-4017-81df-cf51c4fe6e26_2386x1310.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Sign-up forms were built for humans in browsers, so how do AI agents programmatically register with services?</p><p>Enter <strong>auth.md.</strong> By exposing a single, machine-readable Markdown file at your service root, AI agents can dynamically discover your OAuth Protected Resource Metadata, parse required scopes, and authenticate seamlessly.</p><p>With native support in WorkOS AuthKit, you can now implement this protocol out of the box, giving AI tools a standardized, secure way to log into your application.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/WorkOS_052726&quot;,&quot;text&quot;:&quot;Read the auth.md docs&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/WorkOS_052726"><span>Read the auth.md docs</span></a></p><div><hr></div><p style="text-align: justify;">Airtable holds embeddings for hundreds of thousands of customer databases, and on any given week, roughly three-quarters of them sit completely idle. This fact, more than any algorithm or vendor choice, decided the architecture behind their semantic search system. The interesting story is not which vector database they picked. It is how one peculiar property of their data forced a specific chain of engineering decisions, each one logical only in light of the one before it.</p><p style="text-align: justify;">Airtable is a platform where customers build their own database-like applications, organized into &#8220;bases&#8221; that often hold hundreds of thousands of rows. Their AI feature, called Omni, lets users ask natural-language questions of their data and get answers back in plain English. A separate feature, linked record recommendations, suggests relationships between rows based on meaning rather than exact text matches. 
Both features depend on the same underlying capability, which is finding the rows in a base that are semantically relevant to a user&#8217;s intent.</p><p style="text-align: justify;">This might sound simple until scale enters the picture. When a base has half a million rows, fitting all of them into a single LLM prompt becomes infeasible. The model has limits on how much context it can absorb, and even if those limits did not exist, sending that much data on every query would be slow and expensive. The system has to find the most relevant rows fast, then hand those rows to the LLM as context.</p><p style="text-align: justify;">In this article, we will look at how Airtable&#8217;s data infrastructure team built its architecture, the challenges they faced, the tradeoffs they accepted, and why the choices they made only make sense once their data is properly understood.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from the Airtable Engineering Team. Please comment if you notice any inaccuracies.</em></p><h2 style="text-align: justify;">The Data and the Constraints</h2><p style="text-align: justify;">The Airtable team anchored their work around four design priorities:</p><ul><li><p style="text-align: justify;">Queries had to return within 500 milliseconds at the 99th percentile, which means the slowest 1 percent of queries still had to come back within that window. 
Anything slower would make the AI features feel sluggish.</p></li></ul><ul><li><p style="text-align: justify;">Writes had to be high-throughput since customer data changes constantly, and embeddings have to keep pace.</p></li><li><p style="text-align: justify;">The system had to scale horizontally to support millions of independent bases.</p></li><li><p style="text-align: justify;">Everything had to be self-hosted because customer data privacy required keeping it all inside Airtable-controlled infrastructure.</p></li></ul><p style="text-align: justify;">Beyond those priorities, Airtable&#8217;s data has three properties worth flagging early:</p><ul><li><p style="text-align: justify;">Customer bases vary enormously in size, with some holding a handful of rows and others holding hundreds of thousands.</p></li><li><p style="text-align: justify;">Each base is isolated, meaning one customer&#8217;s data must never leak into another customer&#8217;s results.</p></li><li><p style="text-align: justify;">Most bases are idle most of the time, a fact that becomes important in a later section.</p></li></ul><p style="text-align: justify;">Before going further, we need to understand what an embedding is.</p><p style="text-align: justify;">An embedding is a list of numbers, typically several hundred or a thousand of them, generated by a neural network. The network is trained so that two pieces of text with similar meanings produce numerically close vectors. An embedding can be thought of as a fingerprint of meaning, where similarity in the numbers reflects similarity in what the text says.</p><p style="text-align: justify;">One important practical fact is that embeddings are typically about ten times the size of the original data they represent, which is why Airtable cannot just store them alongside the source rows in their primary database. 
A separate system is needed, one designed specifically for storing and searching across these large numerical vectors.</p><p style="text-align: justify;">The asynchronous embedding pipeline that generates and updates these vectors as customer data changes is a separate system from the one discussed here, which is the database that stores the embeddings and serves queries against them. After evaluating the landscape in late 2024, Airtable selected Milvus as that database because it supported self-hosting, handled multi-tenancy through its partition model, and let them scale ingestion, indexing, and query execution as separate components. Picking Milvus, though, was the easy part. The hard part was figuring out how to organize Airtable&#8217;s data inside it.</p><p style="text-align: justify;">See the diagram below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AB2i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AB2i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 424w, https://substackcdn.com/image/fetch/$s_!AB2i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 848w, https://substackcdn.com/image/fetch/$s_!AB2i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AB2i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AB2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png" width="1456" height="782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:782,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138669,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AB2i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 424w, https://substackcdn.com/image/fetch/$s_!AB2i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 848w, 
https://substackcdn.com/image/fetch/$s_!AB2i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 1272w, https://substackcdn.com/image/fetch/$s_!AB2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2d47c3-a61a-4cb5-9668-512a3eccc8fc_2284x1226.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2 style="text-align: justify;">Partitioning Strategy</h2><p style="text-align: justify;">The first real architectural question was 
how to slice up customer data so that millions of bases can coexist in one system without leaking into each other.</p><p style="text-align: justify;">Two options were on the table.</p><p style="text-align: justify;">The first option, shared partitions, would put many bases together in the same physical slice and rely on a customer ID filter at query time to keep results separate. This approach uses resources efficiently because no customer needs a dedicated partition, and small bases do not sit around taking up dedicated storage. The cost is that every query carries the overhead of filtering by customer ID, and deleting a customer&#8217;s data becomes complicated because the rows are scattered across shared partitions.</p><p style="text-align: justify;">The second option, one partition per base, gives each base its own physical slice. Queries are naturally isolated because they only ever touch one partition. Deletion is trivial since dropping the partition is enough. The cost is operational. With millions of bases, the database ends up managing millions of partitions, which puts pressure on its internal bookkeeping.</p><p style="text-align: justify;">Airtable picked the second option. The reasoning was that strong physical isolation made permission boundaries obvious, deletion stayed simple, and queries avoided the latency cost of post-query filtering.</p><p style="text-align: justify;">Then the team ran into a problem.</p><p style="text-align: justify;">At around 100,000 partitions inside a single Milvus collection, performance fell off a cliff. Partition creation latency went from about 20 milliseconds to roughly 250 milliseconds. Loading a partition started taking more than 30 seconds. Adding hardware would not have fixed any of this, because the issue was not a shortage of capacity. 
The issue was that too many partitions in one collection overwhelmed the bookkeeping that the database needed to keep them organized.</p><p style="text-align: justify;">The fix was hierarchical capping.</p><p style="text-align: justify;">Each Milvus cluster now holds 400 collections, and each collection holds at most 1,000 partitions, which limits any single cluster to 400,000 bases. As the customer base grows, Airtable provisions new clusters rather than packing more partitions into existing ones.</p><p style="text-align: justify;">The structure trades some operational complexity for predictable performance at every layer. See the diagram below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HKve!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HKve!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 424w, https://substackcdn.com/image/fetch/$s_!HKve!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 848w, https://substackcdn.com/image/fetch/$s_!HKve!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 1272w, https://substackcdn.com/image/fetch/$s_!HKve!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HKve!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png" width="1456" height="1310" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1310,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:195717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HKve!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 424w, https://substackcdn.com/image/fetch/$s_!HKve!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 848w, https://substackcdn.com/image/fetch/$s_!HKve!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 1272w, 
https://substackcdn.com/image/fetch/$s_!HKve!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a5a8e4b-1a40-4586-98a0-201558e6bc18_2112x1900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">Permissions deserve a brief discussion before we move further. Milvus does not know anything about who is allowed to see what data. It just stores embeddings and returns matches. 
Permission checks happen later, when the application layer takes the row IDs returned by Milvus and fetches the actual rows from Airtable&#8217;s primary database. This split keeps the vector search system focused on a single job, which is similarity search, and authorization stays where it has always lived.</p><p style="text-align: justify;">The pattern of hierarchical capping shows up across distributed systems, from sharded relational databases to message broker topics. Any flat namespace eventually hits a wall, and the fix is almost always to introduce another level of grouping above it. This principle is more transferable than the specific numbers.</p><p style="text-align: justify;">See the diagram below that shows the query flow:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jnv3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jnv3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 424w, https://substackcdn.com/image/fetch/$s_!jnv3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 848w, https://substackcdn.com/image/fetch/$s_!jnv3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jnv3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jnv3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png" width="1456" height="806" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:806,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:115189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jnv3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 424w, https://substackcdn.com/image/fetch/$s_!jnv3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 848w, 
https://substackcdn.com/image/fetch/$s_!jnv3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 1272w, https://substackcdn.com/image/fetch/$s_!jnv3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecc858c6-5b49-41ab-8ac5-60b0d7f94a93_2216x1226.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: justify;">Once the data has been sliced up, the next question is how to actually search inside each slice.</p><h2 
style="text-align: justify;">Index Selection</h2><p style="text-align: justify;">Vector search at scale involves an unavoidable tradeoff with three currencies, namely memory, latency, and recall.</p><p style="text-align: justify;">Recall means the percentage of truly relevant results that show up in a query response. Every vector index pays for performance with one of these three currencies, and no option gets all three for free.</p><p style="text-align: justify;">Airtable benchmarked three index types, and the results map cleanly onto this triangle.</p><p style="text-align: justify;">HNSW, which stands for Hierarchical Navigable Small World, builds a graph where similar vectors are connected to each other. A query starts at a small set of entry points near the top of the graph and follows the connections downward, hopping from one vector to its nearest neighbors until it converges on the closest match. HNSW is fast at lookup time, achieves recall in the 99 to 100 percent range, and behaves predictably under load. The cost is that the entire graph has to live in memory, which makes HNSW the most memory-hungry of the three options.</p><p style="text-align: justify;">IVF-SQ8 takes a different approach. The IVF part clusters vectors into groups, so a query only has to search inside the most relevant group rather than the full dataset. The SQ8 part compresses each number in the vector from four bytes to one byte, shrinking the index dramatically. The footprint becomes much smaller, but the compression introduces approximation error that lowers recall.</p><p style="text-align: justify;">DiskANN keeps most of the index on solid-state storage rather than in memory. It scales to enormous datasets per node because holding everything in RAM is not required. 
The cost is that every query touches disk, and disk is slower than memory, so query latency rises.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_yr5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_yr5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 424w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 848w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 1272w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_yr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png" width="1456" height="1157" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1157,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:179910,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_yr5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 424w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 848w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 1272w, https://substackcdn.com/image/fetch/$s_!_yr5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ba5d15c-70f3-49bd-8b57-129c63a6f510_2230x1772.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p style="text-align: justify;">Airtable chose HNSW. Given the priorities from earlier in the design, this was almost the only available answer. A 500-millisecond latency target ruled out DiskANN&#8217;s higher per-query cost. Recall directly determines how good Omni&#8217;s responses feel to users, which makes the high recall of HNSW worth paying for. The memory cost remained a real concern, but Airtable had a separate way to handle it.</p><p>The right index does not exist in the abstract. It exists relative to the priorities and constraints of a specific system. If Airtable&#8217;s latency tolerance had been looser, DiskANN would have been an interesting candidate. If their recall requirement had been lower, IVF-SQ8 would have saved them money. 
None of the three options is universally better than the others.</p><p style="text-align: justify;">This same triangular pattern repeats across systems engineering. Caching works the same way, where hit rate trades against memory and consistency. Database indexes work the same way, where read speed trades against write speed and storage. The technologies stop feeling intimidating once the underlying tradeoff becomes recognizable.</p><h2 style="text-align: justify;">Hot and Cold Data</h2><p style="text-align: justify;">Picking HNSW solved the latency and recall problem, but pushed the entire cost onto memory. Across hundreds of thousands of bases, that memory bill adds up quickly. The team needed a way to shrink it without giving up the index they had just chosen.</p><p style="text-align: justify;">The solution came from looking at how customers actually use Airtable. When the team analyzed access patterns, they found that only about 25 percent of bases were read from or written to in any given week. The other 75 percent sat completely idle. This was not an anomaly. It reflected something real about how people work. Users tend to focus intensively on one base for a stretch of time, set it aside for weeks or months, and then come back when the project requires their attention again.</p><p style="text-align: justify;">Milvus supports offloading partitions from memory to storage and reloading them within seconds. With that capability, Airtable could keep only the hot partitions in memory and push the cold ones out. 
When a user opens a base that has not been touched in weeks, the partition reloads quickly enough that the user notices a brief warm-up rather than a failure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ivTm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ivTm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 424w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 848w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ivTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png" width="1456" height="1087" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1087,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ivTm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 424w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 848w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!ivTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbd37d3a-ccc6-408d-823a-3c79ec9574b2_1980x1478.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p style="text-align: justify;">This approach works for Airtable specifically because their access pattern is bursty and bimodal. If usage were spread evenly across all bases, with every customer constantly touching their data at the same low rate, cold offloading would not save much. The hot set would be the entire dataset. Airtable&#8217;s pattern is the opposite. A small fraction of bases is active at any moment, and the active set rotates over time.</p><p style="text-align: justify;">What made this work was measurement.</p><p style="text-align: justify;">The Airtable engineering team did not guess about access patterns and did not reach for a generic optimization. They looked at the data, found a property of their actual usage, and built around it. 
The HNSW choice became economically viable because of this measurement, and the decisions in this system reinforce each other in a way that would not be obvious from evaluating any one of them in isolation.</p><h2 style="text-align: justify;">Recovery</h2><p style="text-align: justify;">The traditional approach to disaster recovery in databases is backup and restore. Snapshots get taken regularly, stored somewhere safe, and used to rebuild the system if something catastrophic happens. Airtable went in a different direction.</p><p style="text-align: justify;">Their recovery path is to spin up a fresh Milvus cluster and re-embed customer data from the source. The most-used bases get re-embedded first so that most users see normal service quickly. The remaining bases get rebuilt lazily as customers access them. There is some compute cost during recovery and some delay before every base is fully back, but the path is conceptually simple and works across many failure modes at once. Corruption, model migrations, and certain data residency changes all reduce to the same procedure.</p><p style="text-align: justify;">This option is only available because Airtable had already built an asynchronous embedding pipeline as part of earlier work. That pipeline normally generates new embeddings whenever customer data changes, processing them in the background rather than blocking writes. Recovery is not a separate system created for emergencies. It is just the existing pipeline running against an empty cluster.</p><h2 style="text-align: justify;">Conclusion</h2><p style="text-align: justify;">The system built by Airtable involves four major tradeoffs: how to partition the data, which index to use, when to keep data in memory, and how to recover from failure.</p><p style="text-align: justify;">Every one of those decisions traces back to the same upstream fact about Airtable&#8217;s tenants. Their customers run small, isolated bases that are mostly cold most of the time. 
Changing any one of those properties can cause the design to fall apart.</p><p style="text-align: justify;">For example, a workload where every base is hot all the time would make cold offloading useless. A workload requiring strict consistency would not tolerate the asynchronous embedding pipeline. A workload with very small per-customer datasets might benefit more from shared partitions than from one-per-base.</p><p style="text-align: justify;">The technologies Airtable uses, including Milvus, HNSW, and the rest, are interchangeable in principle. The same system could be rebuilt on a different infrastructure, and the architectural reasoning would still hold. What is harder to replicate is the discipline of letting the data drive the architecture rather than the other way around.</p><p style="text-align: justify;"><strong>References:</strong></p><ul><li><p><a href="https://medium.com/airtable-eng/productionizing-semantic-search-how-we-built-and-scaled-vector-infrastructure-at-airtable-180fff11a136">Productionizing Semantic Search: How We Built and Scaled Vector Infrastructure at Airtable</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Airtable">What is Airtable?</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[How Vercel Cut Build Wait Times From 90 Seconds To 5]]></title><description><![CDATA[In this article, we examine the constraints Vercel faced, the choices they made in response, and the optimizations that produced the speedup.]]></description><link>https://blog.bytebytego.com/p/how-vercel-cut-build-wait-times-from</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-vercel-cut-build-wait-times-from</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Tue, 26 May 2026 15:31:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4rvh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p style="text-align: justify;">In November 2023, Vercel quietly shipped an internal platform that cut its build provisioning time from 90 seconds to 5. That sounds like a story about making things faster. It is, but only on the surface. The real story is that Vercel got faster by accepting a harder constraint, building a more complicated foundation, and then layering three separate optimizations on top of it. The 18x improvement is the result.</p><p style="text-align: justify;">Vercel is a deployment platform for web applications. When a developer pushes code to a connected repository, Vercel pulls that code, runs the build process (compiling, bundling assets, packaging the output) on its own servers, and then deploys the result to a global edge network of geographically distributed servers that deliver the site to end users. The build step happens on Vercel&#8217;s infrastructure, which means thousands of customers run their build scripts on machines that Vercel manages. Every push has to feel instant to the developer, has to run safely on shared hardware, and has to scale through traffic spikes without degrading.</p><p style="text-align: justify;">The platform that handles all of this is internally codenamed Hive, and it has been powering Vercel&#8217;s builds since late 2023.</p><p style="text-align: justify;">Hive is the reason behind the 90-to-5 transformation. 
In this article, we examine the constraints Vercel faced, the choices they made in response, and the optimizations that produced the speedup.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from the Vercel Engineering Team. Please comment if you notice any inaccuracies.</em></p><h2>The Trust Problem</h2><p style="text-align: justify;">The architecture rests on a single foundational assumption. Hive operates as if every piece of code it executes might be malicious, running on machines shared by many tenants at once. That assumption influences everything that follows.</p><p style="text-align: justify;">It matters because the trust calculation flips entirely between two situations. When a team runs its own code on its own server, the goal is performance and convenience. The code trusts the machine, and the machine trusts itself. When the code comes from someone else and runs on shared hardware, the calculation changes. The platform has to assume the code might try to break out of its sandbox, read another customer&#8217;s secrets, or interfere with builds running on the same machine. This is hostile multi-tenancy, and it is a different infrastructure problem from running cooperative workloads.</p><p style="text-align: justify;">Vercel sits squarely in this harder category.</p><p style="text-align: justify;">Every customer push is, from Vercel&#8217;s perspective, code written by someone the team has never met, running on a machine that is also running other customers&#8217; code at the same time. The build script could be a normal Next.js compilation, or it could be a deliberately crafted exploit designed to escape the sandbox. 
Vercel has to handle both cases identically, since the platform cannot tell the difference in advance.</p><p style="text-align: justify;">The obvious answer is to run each build inside a Docker container.</p><p style="text-align: justify;">Containers are how modern infrastructure runs isolated workloads, and most engineers reach for them by reflex. The problem is that containers were designed primarily as a packaging tool, with isolation as a useful side effect. Multiple containers on the same machine all share the same Linux kernel, which is the part of the operating system with direct access to the hardware. Any exploit that compromises that kernel can reach everything else on the machine, including the other containers.</p><p style="text-align: justify;">For most workloads, this risk is acceptable, since most workloads are cooperative. A team&#8217;s own microservices have no incentive to attack each other. However, for running strangers&#8217; build scripts at scale, the risk profile is different. A single kernel exploit in one customer&#8217;s build could reach every other customer&#8217;s build on the same machine, and the blast radius would be enormous.</p><p style="text-align: justify;">This is why standard container orchestration was a poor fit.</p><p style="text-align: justify;">Tools like Kubernetes assume cooperative tenants and provide good isolation by default, but not adversarial isolation. Adding hardening on top of Kubernetes was an option, but for a constraint as foundational as tenant isolation, building from primitives gave Vercel more leverage. Containers leave a gap that Vercel could not afford to leave open. 
The question was how to close that gap without giving up the speed that containers provide.</p><p style="text-align: justify;">The diagram below offers some insight:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PY5G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PY5G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 424w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 848w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PY5G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png" width="1456" height="927" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:927,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144844,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PY5G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 424w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 848w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 1272w, https://substackcdn.com/image/fetch/$s_!PY5G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa832b83e-9c73-416c-87d4-a6b5607507d4_2010x1280.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>MicroVMs and Firecracker</h2><p style="text-align: justify;">The traditional alternative to containers is the virtual machine.</p><p style="text-align: justify;">A virtual machine runs a complete operating system on top of a virtualization layer, which means two VMs on the same physical machine each have their own kernel. A kernel exploit in one VM cannot reach the other, since the kernels are genuinely separate. The downside is weight. A traditional VM might take 30 to 60 seconds to boot and consume hundreds of megabytes of memory just to exist. For a workload like web hosting, where a single VM runs for months, that overhead is fine. 
For a workload like running a 2-minute build and then throwing the environment away, it becomes wasteful.</p><p style="text-align: justify;">In late 2018, AWS open-sourced Firecracker, a virtualization tool that strips a VM down to the minimum needed to run one short-lived workload.</p><p style="text-align: justify;">Firecracker microVMs boot in around 125 milliseconds and add only a few megabytes of memory overhead each. They provide VM-level isolation, with separate kernels and a hardware-enforced boundary that the CPU itself maintains, at something close to container-level speed. This is a new shape in the isolation tradeoff space, occupying a corner that did not exist before.</p><p style="text-align: justify;">AWS originally built Firecracker to power Lambda, where it now runs at production scale across millions of concurrent functions. That track record gave Vercel a battle-tested foundation rather than an experimental one.</p><p style="text-align: justify;">Vercel adopted Firecracker as the core of Hive. Each customer build runs in a microVM that Vercel calls a cell, and the relationship between cells and Firecracker processes is strictly one-to-one. Each Firecracker process manages exactly one cell, and each cell handles exactly one build. Inside the cell sits a container that runs the actual build script. The container handles packaging, since it carries all the build tools and dependencies the customer&#8217;s project needs. The microVM handles isolation, since it provides the kernel-level boundary that containers alone cannot. Each layer does what it is good at.</p><p style="text-align: justify;">This setup is the architectural answer to the trust problem.</p><p style="text-align: justify;">Vercel can now run a stranger&#8217;s code with confidence that, even if the code attempts something hostile, it cannot reach beyond the cell it is running in. 
The microVM is the wall, and the wall is enforced by the CPU&#8217;s virtualization features rather than by software alone. Firecracker provides the isolation primitive, while the rest of Hive is the machinery that turns one isolated cell into a system capable of running thousands of builds across the world.</p><h2>Inside Hive</h2><p style="text-align: justify;">Hive has a small vocabulary, since the names map directly to physical and logical pieces of the system.</p><ul><li><p style="text-align: justify;">A Hive is a regional cluster, and multiple Hives can exist in a single region.</p></li><li><p style="text-align: justify;">A Box is a physical machine inside a Hive.</p></li><li><p style="text-align: justify;">A Cell is a microVM running on a Box.</p></li><li><p style="text-align: justify;">The Control Plane is the brain of the cluster, and the API is the entry point that the rest of Vercel&#8217;s systems talk to.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4rvh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4rvh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 424w, https://substackcdn.com/image/fetch/$s_!4rvh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 848w, 
https://substackcdn.com/image/fetch/$s_!4rvh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!4rvh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4rvh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png" width="1456" height="974" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:974,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193443,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4rvh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 424w, 
https://substackcdn.com/image/fetch/$s_!4rvh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 848w, https://substackcdn.com/image/fetch/$s_!4rvh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!4rvh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5280985-afc9-4b10-96b3-91807a497ad7_2010x1344.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">Here is a different view of the same diagram:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mmu8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mmu8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 424w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 848w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mmu8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png" width="1456" height="902" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:902,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106763,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mmu8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 424w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 848w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!Mmu8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c578de-cf87-4731-b4be-2c7bc887cf55_2070x1282.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">When a customer pushes code, Vercel&#8217;s build pipeline (which is a separate system from Hive) decides which Hive to use based on the customer and the build configuration, and then calls that Hive&#8217;s API to request a cell. The Control Plane finds an available cell on one of the Boxes, hands it to the build pipeline, and the build runs inside the cell&#8217;s container. Once the build finishes, the cell is destroyed, and the resources return to the pool.</p><p style="text-align: justify;">Running multiple Hives per region is a deliberate failure isolation choice. If one Hive has a bad day, the others in the same region keep running. 
This is finer-grained reliability than running in multiple cloud regions alone, and it means a single bad deploy or infrastructure incident will not take out an entire customer base.</p><p style="text-align: justify;">Two background programs handle the orchestration inside each Box.</p><ul><li><p style="text-align: justify;">A box daemon runs on the physical machine and handles provisioning, spawning new Firecracker processes, and managing the lifecycle of cells.</p></li><li><p style="text-align: justify;">A cell daemon runs inside each microVM and manages the build container that does the actual work.</p></li></ul><p style="text-align: justify;">The two daemons communicate over a socket connection, which is how the orchestration layer scales without becoming a bottleneck. The pattern matters more than the implementation. Responsibility is split between the host machine and the virtual machine, and each side has a clear job.</p><p style="text-align: justify;">See the diagram below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aue7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aue7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 424w, https://substackcdn.com/image/fetch/$s_!Aue7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 848w, 
https://substackcdn.com/image/fetch/$s_!Aue7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!Aue7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aue7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png" width="1456" height="802" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:802,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Aue7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 424w, 
https://substackcdn.com/image/fetch/$s_!Aue7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 848w, https://substackcdn.com/image/fetch/$s_!Aue7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 1272w, https://substackcdn.com/image/fetch/$s_!Aue7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb234f5d3-0519-47ec-aabf-08824902cb8c_2328x1282.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">Each cell receives dedicated CPU and memory, which means those resources are partitioned cleanly between cells on the same Box. Disk and network throughput, however, are rate-limited based on the Box&#8217;s overall capacity rather than being dedicated to each cell.</p><p style="text-align: justify;">This reflects where multi-tenant isolation hits its practical limits. Some resources are easy to slice up, while others must be shared with quotas because slicing them cleanly would waste too much capacity.</p><p style="text-align: justify;">Another architectural choice is that these cells are ephemeral. Once a build completes, the cell is destroyed rather than reused, even though reusing it would be faster. This is a security choice rather than a performance one. A reused cell would create a path for one customer&#8217;s leftover state, whether that means files on disk, processes still running, or memory contents, to leak into another customer&#8217;s environment. Destroying the cell after every build closes that path entirely.</p><h2>The 90-to-5 Breakdown</h2><p style="text-align: justify;">All of this architecture would still be slow if every build had to spin up a fresh cell from scratch. Even with Firecracker, the cold path takes about 5 seconds per cell, since the system has to boot the microVM, mount the disk, load the build container image, and start the container.</p><p style="text-align: justify;">The 18x improvement in provisioning time, from 90 seconds to 5, comes from three layers.</p><p style="text-align: justify;">The first layer is faster boots. Inside each Box, Vercel optimized the cell startup path itself. The build container image is large, so pulling it fresh on every cold start used to add significant time. 
Vercel now caches the build container image so it loads from a local copy rather than from a remote registry, and that change alone shaved around 45 seconds off VM startup times compared to their previous solution. They also use block device snapshotting, where the disk image of a freshly prepared cell is saved at a known-good moment, and new cells start from that saved copy instead of building up from scratch. These optimizations make the cold path itself dramatically faster.</p><p style="text-align: justify;">The second layer is the warm pool, and this is where most of the speedup happens. Vercel keeps a pool of cells already booted, with the build container image loaded, sitting idle and waiting. When a build comes in, it uses a warm cell and starts running immediately. The 5-second provisioning time only applies when the warm pool is empty, which happens during traffic spikes or for specialized builds like Secure Compute (an enterprise feature with stricter isolation requirements). For the common case, the wait is essentially zero. 
The warm pool means that most builds skip cold-start provisioning entirely.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NmhJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NmhJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 424w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 848w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NmhJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png" width="1456" height="898" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NmhJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 424w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 848w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!NmhJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65cf9302-1666-41b6-971d-48220a5269f8_2216x1366.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">The third layer is Firecracker&#8217;s baseline speed. None of the other features would matter if the underlying virtualization were heavy. Traditional VMs take tens of seconds to boot, which makes warm pools impractical at scale, since the pool would have to be enormous to keep up with demand. Firecracker boots in milliseconds, which is what makes warm pools and snapshotting work in the first place. The whole speed story rests on the foundation Vercel chose at the very beginning.</p><p style="text-align: justify;">Across all builds, Vercel saw a 30% improvement in build performance after switching to Hive. Builds that hit the cold path, where a fresh cell has to be spawned, saw build times drop by about 40%. The provisioning portion of those cold-path builds dropped from 90 seconds to 5. 
None of these numbers came from a single breakthrough. They came from compounding wins on a foundation that allowed them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZNfb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZNfb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 424w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 848w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZNfb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:164959,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZNfb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 424w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 848w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!ZNfb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0759ce90-ef5f-4c89-85d5-754e43428684_2346x1304.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Costs and Tradeoffs</h2><p style="text-align: justify;">Warm pools come with a real cost.</p><p style="text-align: justify;">Keeping cells pre-warmed means Vercel pays for compute that does no useful work most of the time. The amount is substantial, and sizing the pool is a constant balance between waste, where too many idle cells burn money, and tail latency, where too few warm cells leave customers waiting during traffic spikes. For a service with bursty traffic, this is an active operations problem rather than a one-time tuning exercise.</p><p style="text-align: justify;">Building Hive at all was the larger cost. Vercel could have used Kubernetes or ECS and added isolation on top, and they would have shipped something serviceable in a fraction of the time. 
Building from primitives required enormous engineering investment and brought an ongoing maintenance burden that off-the-shelf tools would have absorbed for them. The reason it was worth doing is that owning the substrate gave Vercel leverage that an opinionated platform would have made difficult to get. The team can make decisions like destroying every cell after every build, or tuning the warm pool based on customer build patterns, without working around someone else&#8217;s design.</p><p style="text-align: justify;">The payoff for that leverage is visible in product features. The same architecture that drove the 90-to-5 number also enabled Vercel to ship enhanced build machines for customers who need extra memory or disk, and to support Secure Compute for enterprise customers with stricter isolation requirements. These features would have been much harder to build on top of a third-party platform, since they would have required either accepting the platform&#8217;s constraints or fighting them at every step. Building the foundation paid off in product capabilities, not just performance numbers.</p><p style="text-align: justify;">This architecture makes sense for Vercel because they have hostile multi-tenancy at scale, and because builds are core to their business rather than incidental to it. For a team running its own builds for its own code on its own machines, microVMs would be wildly over-engineered, and containers are perfectly appropriate. The lesson is the connection between threat model and architecture, rather than a generic recommendation to always use microVMs.</p><h2>Conclusion</h2><p style="text-align: justify;">Vercel&#8217;s story illustrates a pattern that repeats across the industry. Threat model drives architecture. Cooperative tenants can run on containers and standard orchestration. Adversarial tenants require microVMs, isolated runtimes, or sandboxed execution environments. 
The shape follows from the assumption.</p><p style="text-align: justify;">The broader takeaway from Hive is that Vercel got faster by accepting a harder problem rather than working around it.</p><p style="text-align: justify;">Containers were inadequate for the trust model, microVMs were the right shape, and the speed came from stacking three separate optimizations on top of a deliberately harder foundation. Starting with the constraint and then optimizing for the speed proved to be a stronger bet than starting with the simple thing and trying to make it run quicker.</p><p style="text-align: justify;"><strong>References:</strong></p><ul><li><p style="text-align: justify;"><a href="https://vercel.com/blog/a-deep-dive-into-hive-vercels-builds-infrastructure">A deep dive into Vercel&#8217;s build infrastructure</a></p></li><li><p style="text-align: justify;"><a href="https://firecracker-microvm.github.io/">What is Firecracker</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[How CockroachDB Built Vector Indexing at Scale]]></title><description><![CDATA[In this article, we will look at how the CockroachDB engineering team built this index and the challenges they faced.]]></description><link>https://blog.bytebytego.com/p/how-cockroachdb-built-vector-indexing</link><guid isPermaLink="false">https://blog.bytebytego.com/p/how-cockroachdb-built-vector-indexing</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Mon, 25 May 2026 15:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RPtz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/Redis_052526">Models are no longer the bottleneck. Agent context is. 
(Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/Redis_052526" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ap-L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 424w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 848w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ap-L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2515752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/Redis_052526&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ap-L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 424w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 848w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 1272w, https://substackcdn.com/image/fetch/$s_!Ap-L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c282789-365f-42f1-91d8-848bed3b0e07_2160x2160.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">Most AI agents don&#8217;t fail because of the model. 
They fail because the context is broken&#8212;stale data, fragmented systems, slow retrieval.</p><p>Join Simba Khadder, Head of AI Product &amp; Director of Software Engineering at Redis, on June 10 to see how to turn scattered enterprise data into live, agent-ready context with Redis Iris.</p><p>You&#8217;ll learn:</p><ul><li><p>The four failure modes of how context breaks in production</p></li><li><p>How to make your enterprise data navigable for runtime</p></li><li><p>How Redis Context Retriever, Search, Data Integration, and Agent Memory work together</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/Redis_052526&quot;,&quot;text&quot;:&quot;Reserve your spot &#8594;&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/Redis_052526"><span>Reserve your spot &#8594;</span></a></p><div><hr></div><p style="text-align: justify;">The CockroachDB team wanted to add vector search to their distributed database, and dozens of well-known algorithms already existed.</p><p style="text-align: justify;">To facilitate the decision-making process, they wrote down a list of architectural requirements, including a refusal to depend on any central coordinator, a refusal to allocate large in-memory caches, a need for real-time updates, an intolerance for hot spots, and a requirement to support sharding. Then they checked the list against the popular options.</p><p style="text-align: justify;">Most failed at least one requirement, and some failed several. 
The team&#8217;s response was to build something new, called C-SPANN, that satisfied every constraint by treating the index as ordinary table data inside CockroachDB rather than as a separate system.</p><p style="text-align: justify;">In this article, we will look at how the CockroachDB engineering team built this index and the challenges they faced.</p><p style="text-align: justify;"><em>Disclaimer: This post is based on publicly shared details from the CockroachDB Engineering Team. Please comment if you notice any inaccuracies.</em></p><h2>Vectors and Approximate Nearest Neighbor Search</h2><p style="text-align: justify;">A vector is a long list of numbers that captures the meaning of something.</p><p style="text-align: justify;">Modern neural networks like the ones behind ChatGPT can take an image, a document, or a snippet of audio and convert it into a vector of floating-point numbers, typically a few hundred to a few thousand dimensions long.</p><p style="text-align: justify;">The useful property of these vectors, often called embeddings, is that similar things produce similar vectors. 
For example, two photos of beaches end up close to each other in this multi-dimensional space, and a photo of a beach and the word &#8220;beach&#8221; end up in roughly the same neighborhood, which is what makes semantic search possible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RPtz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RPtz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 424w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 848w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 1272w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RPtz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png" width="1456" height="1008" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1008,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:171626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RPtz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 424w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 848w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 1272w, https://substackcdn.com/image/fetch/$s_!RPtz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58352b2e-8406-420a-a97d-c11734237c85_2322x1608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">The trick is finding those neighbors quickly when you have billions of vectors to search through.</p><p style="text-align: justify;">Traditional database indexes work because numbers and strings have a natural ordering. You can sort them, store them in a B-tree, and walk that tree to find what you want.</p><p style="text-align: justify;">Vectors do not have that property. Should beach photos come before or after food photos? What about photos of food at the beach? There is no answer, because the data has no inherent sequence, which means a B-tree cannot help you.</p><p style="text-align: justify;">The brute-force alternative is to compare your query vector against every stored vector and return the closest matches. 
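</p><p style="text-align: justify;">As a minimal sketch of that brute-force scan, assuming vectors are plain Python lists and squared Euclidean distance is the similarity measure (the function names are illustrative and do not come from CockroachDB):</p><pre><code class="language-python">import heapq

def squared_distance(a, b):
    # Sum of squared per-dimension differences; the square root is
    # skipped because it does not change the ranking of neighbors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_nearest(query, vectors, k):
    # Exact search: one distance computation per stored vector,
    # so the cost grows linearly with the size of the collection.
    return heapq.nsmallest(k, vectors, key=lambda v: squared_distance(v, query))

stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(brute_force_nearest([0.9, 0.9], stored, 2))
</code></pre><p style="text-align: justify;">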
This works fine for a few thousand vectors, but falls apart somewhere in the tens of thousands, and becomes hopeless once you reach the millions.</p><p style="text-align: justify;">Vector indexes solve this by giving up on exact answers. They find approximate nearest neighbors, accepting a small loss of accuracy in exchange for orders of magnitude better performance. The results are usually close enough that real users cannot tell the difference, and the search runs fast enough to feel instant. That tradeoff between accuracy and speed is the foundation of every vector index, and the interesting engineering question is how you make the rest of the system work around it.</p><p style="text-align: justify;">Even with a good algorithm for finding nearest neighbors, plugging it into a distributed transactional database is its own problem. That is where the CockroachDB story actually begins.</p><h2>Architectural Constraints in a Distributed SQL Database</h2><p style="text-align: justify;">CockroachDB is a distributed SQL database. This means that the data lives across multiple machines, often across regions, and the system is designed to scale linearly. It guarantees transactional consistency and supports real-time updates, and all of this has to keep working when machines die, disks fail, or networks partition.</p><p style="text-align: justify;">These properties impose a set of architectural constraints on any new feature, and a vector index is no exception. The CockroachDB team wrote down six requirements that any candidate algorithm had to satisfy.</p><ul><li><p style="text-align: justify;">The first requirement is that no single node can act as a central coordinator. Any node in the cluster should be able to serve reads and writes, because relying on a single leader to direct traffic creates a bottleneck and a single point of failure.</p></li><li><p style="text-align: justify;">The second requirement is that the index cannot rely on large in-memory structures. 
Index state has to live in persistent storage, since the team could not assume every node has gigabytes of RAM available for caching vectors. They also wanted to avoid the long warm-up times that come with rebuilding in-memory caches after a restart, which matters especially for serverless deployments where nodes spin up and down on demand.</p></li><li><p style="text-align: justify;">The third requirement is that network hops have to stay minimal. Round-trips between nodes are expensive, and any algorithm that requires sequential traversal across the cluster will accumulate latency unpredictably.</p></li><li><p style="text-align: justify;">The fourth requirement is that the index data layout has to be sharding-compatible. Index data has to map naturally to CockroachDB&#8217;s key-value storage so that it can be split, merged, and rebalanced like any other table.</p></li><li><p style="text-align: justify;">The fifth requirement is that the index must avoid creating hot spots. As inserts and queries scale up, the load has to spread across the cluster, because concentrating traffic on a single node defeats the point of running a distributed system in the first place.</p></li><li><p style="text-align: justify;">The sixth requirement is that the index has to support incremental updates. Inserts and deletes need to be applied in real time without blocking queries, requiring batch rebuilds, or degrading search quality over time.</p></li></ul><p style="text-align: justify;">This list rules out the most popular vector indexes.</p><p style="text-align: justify;">HNSW, the graph-based algorithm that powers pgvector, Weaviate, and many other systems, is excellent on accuracy benchmarks but builds its graph in memory and resists sharding. Classic IVF is closer in spirit but assumes a single-node deployment and struggles with dynamic updates. 
Specialized vector databases like Pinecone solve these problems by being separate systems entirely, which works fine if you are willing to keep your vectors in one database and your transactional data in another.</p><p style="text-align: justify;">CockroachDB needed something that handled both inside the same system, with the same guarantees.</p><p style="text-align: justify;">Faced with this list, the team built something new. They called it C-SPANN, and the design choices that make it work are mostly about what it does not try to do.</p><h2>The C-SPANN Architecture</h2><p style="text-align: justify;">C-SPANN borrows ideas from three places.</p><p style="text-align: justify;">Microsoft&#8217;s SPANN paper contributed the tree structure for partitioning vectors, the follow-up SPFresh paper contributed techniques for incremental updates, and Google&#8217;s ScaNN project contributed ideas around quantization.</p><p style="text-align: justify;">The CockroachDB team combined these with their distributed SQL architecture to produce something none of the source papers describe directly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mNGz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mNGz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 424w, https://substackcdn.com/image/fetch/$s_!mNGz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 848w, 
https://substackcdn.com/image/fetch/$s_!mNGz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 1272w, https://substackcdn.com/image/fetch/$s_!mNGz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mNGz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png" width="1456" height="1079" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1079,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:137726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mNGz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 424w, 
https://substackcdn.com/image/fetch/$s_!mNGz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 848w, https://substackcdn.com/image/fetch/$s_!mNGz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 1272w, https://substackcdn.com/image/fetch/$s_!mNGz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8f7a20d-1d32-4e69-ab4a-caca3c50151b_2140x1586.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">At the core is a hierarchical K-means tree. Vectors are grouped into partitions based on similarity, where each partition typically contains dozens to hundreds of vectors and has a centroid that represents the average of the vectors it contains. Think of the centroid as the partition&#8217;s center of mass. Those centroids are themselves grouped into higher-level partitions with their own centroids, and that process repeats until you reach a single root partition at the top.</p><p style="text-align: justify;">The result is a wide, shallow tree. With a fanout of around 100, an index of one million vectors needs only three levels, and an index of ten billion vectors needs only five. Searching the tree means starting at the root, comparing the query vector to the centroids at that level, descending into the closest partition, and repeating until you reach the leaves. At each level, partitions can be processed in parallel, which keeps latency low and predictable. 
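</p><p style="text-align: justify;">The depth figures and the descent described above can be sketched in a few lines of Python. This is a simplified illustration rather than CockroachDB&#8217;s implementation: the dictionary-based tree, the function names, and the single-path greedy descent are assumptions made for brevity, and a real index may explore several partitions per level.</p><pre><code class="language-python">import heapq

def levels_needed(num_vectors, fanout=100):
    # Count tree levels when every partition groups up to `fanout`
    # children. Assumes fanout is 2 or more.
    remaining = max(1, num_vectors)
    levels = 0
    while remaining != 1:
        remaining = -(-remaining // fanout)  # ceiling division: partitions one level up
        levels += 1
    return levels

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(root, query, k):
    # Hypothetical node layout: inner nodes hold "children", leaves hold
    # "vectors", and every child carries the "centroid" of its partition.
    node = root
    while "children" in node:
        node = min(node["children"],
                   key=lambda child: squared_distance(child["centroid"], query))
    # Leaf scan: rank the few candidates in the chosen partition.
    return heapq.nsmallest(k, node["vectors"],
                           key=lambda v: squared_distance(v, query))

print(levels_needed(1_000_000))       # 3
print(levels_needed(10_000_000_000))  # 5
</code></pre><p style="text-align: justify;">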
At the leaves, the system scans a few hundred candidate vectors using SIMD CPU instructions for speed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xwlf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xwlf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 424w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 848w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xwlf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png" width="1456" height="891" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:891,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:170349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xwlf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 424w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 848w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!Xwlf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc676be9-e09e-4f9d-991b-615876cf14d9_2606x1594.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">That much describes the algorithm. The interesting part is what happens to the data structure once it is built.</p><p style="text-align: justify;">Each partition is stored as a self-contained set of key-value rows inside CockroachDB. Partition data lives in CockroachDB ranges, which are the same units of storage that hold every other table in the database. Therefore, the index is not a parallel structure attached to the database. 
It is table data with extra meaning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jEGM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jEGM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 424w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 848w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 1272w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jEGM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png" width="1456" height="1110" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:227290,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jEGM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 424w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 848w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 1272w, https://substackcdn.com/image/fetch/$s_!jEGM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221a2032-c5c8-4fa9-88dc-d6c57c7399e1_2382x1816.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">This decision pays dividends. CockroachDB already knows how to split a range when it grows too large, how to rebalance ranges across nodes when load shifts, and how to cache frequently accessed rows in its block cache.</p><p style="text-align: justify;">All of this setup applies to vector index data automatically, without writing a single line of new infrastructure code. When a new node joins the cluster, ranges containing index partitions get distributed to it the same way ranges containing user tables do. 
When a node restarts, the index is immediately ready to serve queries because it lives on disk, rather than in some warm-up cache that has to be rebuilt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sqIR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sqIR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 424w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 848w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sqIR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png" width="1456" height="674" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102332,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sqIR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 424w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 848w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 1272w, https://substackcdn.com/image/fetch/$s_!sqIR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53dfd128-605c-4837-a5b0-19c289b883d8_2732x1264.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">However, building the index is one thing. Keeping it healthy as data flows in and out is harder, especially when the index needs to be compressed aggressively to stay affordable.</p><h2>Index Maintenance, Quantization, and Multi-Tenant Partitioning</h2><p style="text-align: justify;">A K-means tree is not static. Partitions grow as vectors are inserted and shrink as vectors are deleted, so the system needs background machinery to keep partitions at a reasonable size and to keep vectors grouped with their nearest centroids.</p><p style="text-align: justify;">When a partition grows too large, C-SPANN splits it. 
A balanced variant of the K-means algorithm divides the vectors into two roughly equal groups, each with its own new centroid, and the tree is updated so that future inserts route to whichever new partition is closer. When a partition shrinks too small, the system merges it away and reassigns its vectors to neighboring partitions. Both operations happen in the background to avoid interfering with foreground transactions.</p><p style="text-align: justify;">There is one point worth noting.</p><p style="text-align: justify;">After a split, some vectors in the original partition might actually be closer to a neighboring partition&#8217;s centroid than to either of the two new centroids, and they get reassigned. Likewise, a vector in a neighboring partition might now be closer to one of the new centroids and migrate in. This idea, called nearest partition assignment, comes from the SPFresh paper and is what keeps the index accurate over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IXN9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IXN9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 424w, https://substackcdn.com/image/fetch/$s_!IXN9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 848w, 
https://substackcdn.com/image/fetch/$s_!IXN9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!IXN9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IXN9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png" width="1456" height="789" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184162,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198675655?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IXN9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 424w, 
https://substackcdn.com/image/fetch/$s_!IXN9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 848w, https://substackcdn.com/image/fetch/$s_!IXN9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!IXN9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe62192a-18ff-4780-b7a0-c571b7d62e03_2546x1380.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;">The consequence is that you can start with an empty table, insert millions of vectors, and end up with an accurate, well-balanced index without ever rebuilding it. Background maintenance handles everything incrementally.</p><p style="text-align: justify;">The second operational factor is size. An OpenAI embedding has 1,536 dimensions stored as 2-byte floats, which works out to about 3 KB per vector (1,536 &#215; 2 = 3,072 bytes). A billion vectors at full precision is about 3 TB just for the embeddings, before any indexing overhead is counted. Storing and scanning that much data is expensive both in disk space and in the CPU and memory used during search.</p><p style="text-align: justify;">C-SPANN compresses vectors using a technique called RaBitQ, which reduces each dimension to a single bit. The compressed representation is roughly 200 bytes per vector (1,536 bits is 192 bytes), a 94 percent size reduction. The math behind the compression involves a random orthogonal transform that preserves distances while spreading data evenly across dimensions.</p><p style="text-align: justify;">What matters for the system is that quantization is lossy, so distance estimates from compressed vectors are only approximate. C-SPANN compensates with a reranking step: the system scans quantized vectors to assemble a candidate set, then fetches the original full-precision vectors for those candidates to compute exact distances. Because the final ranking is computed from exact distances, the system can absorb quantization error in the first pass and still return accurate results. The pattern of cheap approximate filtering followed by precise refinement on a small candidate set shows up in many other systems too, and recognizing it here makes it easier to spot elsewhere.</p><p style="text-align: justify;">The third operational reality is multi-tenancy. 
In most real applications, vectors belong to someone, whether a user, a customer, or an organization, and most queries are scoped to a single owner. Mixing one user&#8217;s vectors with another&#8217;s during search is wasteful, and it is also a security problem.</p><p style="text-align: justify;">CockroachDB handles this through prefix columns on the vector index. Here is what the schema looks like.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bf631e44-30a9-48b3-93cf-c71703f7491c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">CREATE TABLE photos (
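  id UUID PRIMARY KEY,
  user_id UUID,                      -- owner of the row; used as the index prefix column
  embedding VECTOR(1536),            -- one 1,536-dimension embedding per photo
  VECTOR INDEX (user_id, embedding)  -- prefix column first, vector column last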
);</code></pre></div><p>A query for one user&#8217;s nearest matches uses pgvector-compatible syntax.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;dockerfile&quot;,&quot;nodeId&quot;:&quot;ea4ff9ee-c7fc-48c5-adbe-360ca87e232a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-dockerfile">SELECT id
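FROM photos
WHERE user_id = $1          -- prefix column: limits the search to a single user
ORDER BY embedding &lt;-&gt; $2   -- &lt;-&gt; is the Euclidean (L2) distance operator; $2 is the query vector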
LIMIT 10;</code></pre></div><p style="text-align: justify;">Behind the scenes, the index maintains a separate K-means tree for each distinct user. Performance scales with how many vectors the user owns, rather than with the total size of the index, so a billion vectors split evenly across a million users behaves, from any one user&#8217;s perspective, like a thousand-vector index.</p><p style="text-align: justify;">Combined with CockroachDB&#8217;s REGIONAL BY ROW tables, prefix columns can also partition the index by geography. For example, a user in Europe gets their data and their index entries stored in a European region, with fast local access and compliance with data domiciling requirements, while the same table serves a US user with equally low latency from US infrastructure. Using region and ownership as prefix columns ahead of the embedding produces an index that is efficient, secure, and locality-aware by default.</p><h2>Conclusion</h2><p style="text-align: justify;">C-SPANN refused several compromises that most vector databases quietly accept.</p><p style="text-align: justify;">Freshness in CockroachDB is real-time and transactional rather than batched or eventually consistent, which means a vector inserted in a transaction becomes searchable as soon as that transaction commits, with the same consistency guarantees as any other write. Scaling is native to the distributed architecture rather than a feature retrofitted onto a single-node system, and vectors live alongside transactional data in the same database, inside the same queries, under the same operational umbrella. Since the index lives on disk, nodes serve queries immediately after a restart without any warm-up phase.</p><p style="text-align: justify;">In return, the team accepted some real limitations. The 25.2 release is a preview, and several optimizations are still being built, including fuller SIMD usage, root partition caching, and complete merge support. 
The current implementation supports only Euclidean distance, with cosine and inner product on the roadmap. Filtering on non-prefix columns is limited today, though that scope is expanding. Also, on raw vector search benchmarks against specialized in-memory systems, C-SPANN does not win on pure latency.</p><p style="text-align: justify;">The tradeoff suggests where this design fits and where it does not. CockroachDB&#8217;s vector index is a strong choice for applications where vectors and transactional data need to coexist, where multi-tenant isolation matters, and where multi-region deployment with data domiciling is a requirement. Specialized vector databases remain a better fit for pure vector workloads with no transactional component, for read-heavy batch-updated datasets where freshness is not a concern, and for cases where every microsecond of search latency is critical.</p><p style="text-align: justify;">The architectural logic underneath all of this is worth keeping in mind. The CockroachDB team treated the vector index as ordinary table data and inherited their existing distributed machinery for free, so splits, caching, sharding, replication, and multi-region behavior all worked from day one because they already worked for everything else in the database. The algorithm is the part that gets the headlines, but the integration is what makes the system possible.</p><p style="text-align: justify;"><strong>References:</strong></p><ul><li><p><a href="https://www.cockroachlabs.com/blog/cspann-real-time-indexing-billions-vectors/">Real-Time Indexing for Billions of Vectors: How we built fast, fresh vector indexing at scale in CockroachDB</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Vector_database">Vector databases</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[EP216: RAGs vs Agents]]></title><description><![CDATA[Ask an LLM about your company's data and it will guess. 
The two patterns that fix this are RAG and agents, and they solve different problems.]]></description><link>https://blog.bytebytego.com/p/ep216-rags-vs-agents</link><guid isPermaLink="false">https://blog.bytebytego.com/p/ep216-rags-vs-agents</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Sat, 23 May 2026 15:31:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eLEi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><a href="https://go.bytebytego.com/QAWolf_052326Headline">Map workflows, automate E2E tests, and ship faster with QA Wolf (Sponsored)</a></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/QAWolf_052326CTA" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jXWJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!jXWJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!jXWJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 1272w, https://substackcdn.com/image/fetch/$s_!jXWJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jXWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png" width="1200" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6039af28-3c33-437b-8459-b826572b02cf_1200x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90284,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/QAWolf_052326CTA&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198874402?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jXWJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 424w, https://substackcdn.com/image/fetch/$s_!jXWJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 848w, https://substackcdn.com/image/fetch/$s_!jXWJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jXWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6039af28-3c33-437b-8459-b826572b02cf_1200x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://go.bytebytego.com/QAWolf_052326QAWolf">QA Wolf&#8217;s</a> AI agent maps and tests your app&#8217;s most complex user flows. 
It turns your prompts into real Playwright and Appium code that runs 12x faster and more reliably than other computer-use agents.</p><p>What sets our AI apart:</p><ul><li><p>Maps <strong>200+ test cases in minutes</strong> instead of weeks of manual planning.</p></li><li><p>Executes tests <strong>12x faster</strong> than computer-use agents.</p></li><li><p>Runs entire suites <strong>100% parallel</strong> with consistent results.</p></li><li><p>Produces open-source tests your team owns, with <strong>zero vendor lock-in</strong>.</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/QAWolf_052326CTA&quot;,&quot;text&quot;:&quot;Get started today&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/QAWolf_052326CTA"><span>Get started today</span></a></p><div><hr></div><p>This week&#8217;s system design refresher:</p><ul><li><p>RAGs vs Agents</p></li><li><p>Build with Claude Code: New Cohort Launch</p></li><li><p>Forward Proxy, Reverse Proxy, and API Gateway Explained</p></li><li><p>How does a request actually travel through Claude Code?</p></li><li><p>How does Claude Code keep long sessions from running out of context?</p></li></ul><div><hr></div><h2>RAGs vs Agents</h2><p>Ask an LLM about your company's data and it will guess. 
The two patterns that fix this are RAG and agents, and they solve different problems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eLEi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eLEi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eLEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png" width="1456" height="1760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!eLEi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!eLEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52ccd4c2-dc53-4b9a-95b9-d87d6c353f88_2484x3002.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>RAG: RAG combines an LLM with retrieval to ground answers in four steps.</p><ul><li><p>Step 1: The user query is embedded and sent to a retrieval step.</p></li><li><p>Step 2: Retrieval pulls the most relevant chunks from a knowledge base (PDFs, wikis, etc.).</p></li><li><p>Step 3: Those chunks are pasted into the prompt as context.</p></li><li><p>Step 4: The LLM writes the answer, grounded in the retrieved text.</p></li></ul><p>One retrieval. One generation. Cheap, predictable, and easy to debug.</p><p>Agents: An agent wraps an LLM in a reasoning loop with tools so it can take action.</p><ul><li><p>Step 1: The user query goes into the agent runtime, a reasoning loop wrapped around an LLM.</p></li><li><p>Step 2: The LLM reads the goal and picks a tool (Read, Write, Edit, Bash, etc.).</p></li><li><p>Step 3: The runtime executes the tool and feeds the result back to the LLM. 
</p></li><li><p>Step 4: The LLM reasons again, picks the next tool, and loops until the task is done.</p></li></ul><p>More flexible. More tokens. Harder to debug because errors compound across steps.</p><p>The rule of thumb: Use RAG when the answer lives in your documents. Use an agent when the answer requires action on other systems.</p><p>Over to you: When do you prefer RAG over an agent?</p><div><hr></div><h2>Build with Claude Code: New Cohort Launch</h2><p>We&#8217;re launching a new 2-day intensive, cohort-based course called Build with Claude Code, taught by John Kim, who has trained hundreds of engineers at Meta to use Claude Code in real production workflows.</p><p><strong>The course starts on May 28.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/claude-c1-substack"><span>Check it out now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/claude-c1-substack" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 424w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 848w, 
https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png" width="1456" height="1801" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1801,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:854119,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198752484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 424w, 
https://substackcdn.com/image/fetch/$s_!0Ym7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 848w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><p>A few things you&#8217;ll learn:</p><ul><li><p>The agentic loop, context engineering, and memory layers that make Claude Code useful for real projects</p></li><li><p>How to build with Claude Code Skills, MCPs, and hooks to give Claude the tools and feedback loops it needs to self-correct</p></li><li><p>Parallel development with Git worktrees, subagents, and agent teams</p></li><li><p>A capstone project where you ship something real on your own stack</p></li></ul><p>The course includes live sessions, assignments, and office hours, so there&#8217;s plenty of room to ask questions and get unstuck.</p><p>The first cohort starts in just a few days: May 28 to 29, 2026. If you want to learn everything from the fundamentals of Claude Code to advanced production workflows, including working with large codebases, this could be a great way to level up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/claude-c1-substack"><span>Check it out now</span></a></p><div><hr></div><h2>Forward Proxy, Reverse Proxy, and API Gateway Explained</h2><p>People mix these up all the time because all three sit between a client and a server. 
The real difference is which side they represent and what problem they solve.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cqW0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cqW0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cqW0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png" width="1456" height="1760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!cqW0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 424w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 848w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 1272w, https://substackcdn.com/image/fetch/$s_!cqW0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e0de719-d0ac-497e-8f9c-b9c841422db8_2484x3002.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A forward proxy sits next to the client. Your laptop sends a request, the proxy forwards it out, and the destination sees the proxy&#8217;s IP instead of yours. Corporate networks use this to enforce policy, block sites, and cache traffic.</p><p>A reverse proxy sits next to the server. The client has no idea how many machines are behind it. The proxy decides which backend handles the request, terminates TLS, and keeps your backend off the public internet. NGINX and HAProxy are common choices here, often with a load balancer in front.</p><p>An API gateway is a reverse proxy that does more than route traffic. It also handles auth, rate limits, API keys, versioning, and request shaping. 
Without it, each microservice has to implement its own version of validation, throttling logic, and request logging.</p><p>A forward proxy represents the client, a reverse proxy represents the server, and an API gateway is what you add when ten services need the same authentication and rate limiting rules applied consistently.</p><p>In most real systems, all three are running at different layers. The forward proxy filters outbound traffic, the reverse proxy fronts the application servers, and the API gateway sits in front of your APIs to enforce policies before requests reach them.</p><p>Over to you: What's your proxy + gateway combo? Always interesting to see what teams pair together.</p><div><hr></div><h2>How does a request actually travel through Claude Code?</h2><p>Most of us type a prompt and watch the magic happen. The diagram below shows what's really going on behind the curtain, based on the Claude Code source code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VfjL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VfjL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VfjL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!VfjL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VfjL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VfjL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg" width="1280" height="1546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1546,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;graphical user interface, application&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="graphical user interface, application" title="graphical user interface, application" srcset="https://substackcdn.com/image/fetch/$s_!VfjL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VfjL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!VfjL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!VfjL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38f19060-e838-440a-971d-bdc1f2d3a167_1280x1546.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let's trace one real request: "Fix the failing test in auth.test.ts."</p><ul><li><p>Step 1: The user sends a prompt to Claude Code 
through their interface.</p></li><li><p>Step 2: The interface (CLI, IDE, or SDK) wraps the prompt with repo and file context and hands it to the agent loop as a request.</p></li><li><p>Step 3: The agent loop plans the next move and proposes an action: Edit(auth.ts, lines 42&#8211;58).</p></li><li><p>Step 4: The permission system checks the proposed action against the rules.</p></li><li><p>Step 5: The approved action becomes a tool call: Edit(auth.ts, patch), dispatched to the matching tool.</p></li><li><p>Step 6: The tool runs in the execution environment (shell, cloud, or sandbox) as a real syscall.</p></li><li><p>Step 7: The execution environment returns a tool result to the agent loop.</p></li><li><p>Step 8: The agent persists the turn to state and streams the final message to the user.</p></li></ul><p>The whole system is just this loop, repeated until the model stops asking for tools.</p><p>Over to you: Which step in this loop do you think is the hardest to get right when building your own coding agent?</p><div><hr></div><h2>How does Claude Code keep long sessions from running out of context?</h2><p>It uses five strategies, run in sequence before every model call. 
Each one only runs if the previous doesn&#8217;t free enough room.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fap_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fap_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 424w, https://substackcdn.com/image/fetch/$s_!fap_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 848w, https://substackcdn.com/image/fetch/$s_!fap_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!fap_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fap_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png" width="1378" height="1390" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1390,&quot;width&quot;:1378,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fap_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 424w, https://substackcdn.com/image/fetch/$s_!fap_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 848w, https://substackcdn.com/image/fetch/$s_!fap_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!fap_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc111a2c6-71bd-45a7-8938-e8bb4f772cbc_1378x1390.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p>Budget Reduction: caps individual tool results. Oversized outputs are swapped for a content reference.</p></li><li><p>Snip: trims the oldest history segments and emits a boundary marker.</p></li><li><p>Microcompact: prunes tool turns by tool_use_id so the prompt cache stays warm.</p></li><li><p>Context Collapse: a read-time projection over the full history.</p></li><li><p>Auto-compact: the last resort. 
It calls the model to produce a full summary of prior turns.</p></li></ol><p>The pattern is lazy degradation: apply the least disruptive strategy first, and escalate only when the cheaper ones prove insufficient.</p><p>Over to you: How often do you run out of context?</p>]]></content:encoded></item><item><title><![CDATA[Build with Claude Code: New Cohort Launch]]></title><description><![CDATA[The first cohort starts in about a week: May 28-29, 2026.]]></description><link>https://blog.bytebytego.com/p/build-with-claude-code-new-cohort</link><guid isPermaLink="false">https://blog.bytebytego.com/p/build-with-claude-code-new-cohort</guid><dc:creator><![CDATA[ByteByteGo]]></dc:creator><pubDate>Fri, 22 May 2026 15:31:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0Ym7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;re launching a new 2-day intensive, cohort-based course called Build with Claude Code, taught by John Kim, who has trained hundreds of engineers at Meta to use Claude Code in real production workflows.</p><p><strong>The course starts on May 28.</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://go.bytebytego.com/claude-c1-substack"><span>Check it out now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://go.bytebytego.com/claude-c1-substack" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 424w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 848w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png" width="1456" height="1801" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1801,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:854119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.bytebytego.com/i/198752484?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Ym7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 424w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 848w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ym7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dca5aa-afd4-4f29-9233-3b5d82c84154_2360x2920.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><p>A few things you&#8217;ll learn:</p><ul><li><p>The agentic loop, context engineering, and memory layers that make Claude Code useful for real projects</p></li><li><p>How to build with Claude Code Skills, MCPs, and hooks to give Claude the tools and feedback loops it needs to self-correct</p></li><li><p>Parallel development with Git worktrees, subagents, and agent teams</p></li><li><p>A capstone project where you ship something real on your own stack</p></li></ul><p>The course includes live sessions, assignments, and office hours, so there&#8217;s plenty of room to ask questions and get unstuck.</p><p>The first cohort starts in just a few days: May 28 to 29, 2026. 
If you want to learn everything from the fundamentals of Claude Code to advanced production workflows, including working with large codebases, this could be a great way to level up.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://go.bytebytego.com/claude-c1-substack&quot;,&quot;text&quot;:&quot;Check it out now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://go.bytebytego.com/claude-c1-substack"><span>Check it out now</span></a></p>]]></content:encoded></item></channel></rss>