Great read. This does a fantastic job explaining the hardware side of the AI revolution, especially why LLMs are fundamentally a hardware problem (data movement and linear algebra), not just a software one.
The CPU vs GPU contrast, memory wall discussion, and breakdown of Tensor Cores and HBM make it clear why this wave of progress was only possible now, and why this moment is special from a technology revolution standpoint. Same for TPUs and systolic arrays: extreme specialization, massive efficiency gains.
As humans, software is where we experience AI, but silicon is where the revolution is actually happening!
Great read, thanks for the writeup! Got a question about "While a standard core completes one floating-point operation per cycle, a Tensor Core executes a 4×4 matrix multiplication involving 64 individual operations (16 multiplies and 16 additions in the multiply step, plus 16 accumulations) instantly." Naively I'd expect 5 individual operations (4 multiplies and 1 addition) for each element in the result matrix, which sums up to 5*16 = 80. Are there any optimization steps that I'm missing?
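For what it's worth, here's the quick Python sketch I used to count the scalar operations myself (my own counting, not from the article), assuming the Tensor Core step is a naive 4×4 multiply-accumulate D = A·B + C. It comes out to 64 multiplies and 64 additions, so I'm guessing the "64" figure counts each multiply-add pair as one fused multiply-add (FMA):

```python
# Naive 4x4 matrix multiply-accumulate, D = A @ B + C, with explicit
# operation counting. The matrix contents are placeholders; only the counts matter.
N = 4
A = [[1.0] * N for _ in range(N)]
B = [[1.0] * N for _ in range(N)]
C = [[0.0] * N for _ in range(N)]
D = [[0.0] * N for _ in range(N)]

multiplies = 0
additions = 0

for i in range(N):
    for j in range(N):
        acc = C[i][j]                 # start from the accumulator matrix
        for k in range(N):
            acc += A[i][k] * B[k][j]  # one multiply and one add per k
            multiplies += 1
            additions += 1
        D[i][j] = acc

print(multiplies, additions)          # -> 64 64, i.e. 64 multiply-add pairs
```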
A bit of history: 1st: ‘computers’ were people trained to solve complex problems (see the movie “Hidden Figures”). 2nd: computers used to be big and expensive, and they needed specially trained people to operate them. Then came mini-computers, which took up less space but had limited function and were used to control machines (look up the Naked Mini).

Then Intel decided, rather than build a complex custom circuit, to use a programmable system to, if my memory serves, run a mainframe disk drive. A couple of hobbyists wrote an article for Popular Electronics using that processor and a bunch of 100-pin connectors (because they found those on sale for the 50 or so kits they thought they would sell). They got a thousand orders within a week and the hobby computer craze started. A couple of kids in California thought they could sell pre-built hobby computers for those who didn’t want to solder their own, and let others write programs for them (you may have heard of that company…Apple).
Up until this point, if you wanted more than one person to have access to a computer (or didn’t want to wait in line for hours to access the mainframe), you had to connect using a simple keyboard and monitor setup.
While we can now hold in our hands computers far more capable than the old mainframes, there are some things your desktop can’t provide: reliability and scale. A computer center (whether it’s company-owned, shared, or a cloud center) has redundant communications links, 24/7 power with regularly tested backups, and reliable hot-swappable components. On a mainframe you can replace processors, memory, and mass storage without shutting anything down. On a smaller scale, what looks like a tower computer, if it’s a server, will have redundant power supplies, error-correcting memory, and mass storage that can survive a disk failure, with a replacement disk swapped in and the array rebuilt in the background. What happens when that redundancy fails? Major airlines have to stand down operations for several days. What happens if a bank’s data center fails? All their ATMs stop issuing money.
This is one of the worst articles I have read from you. It jumped straight into interactions without first explaining the underlying concepts. Basic ideas like multiplication or matrix multiplication should have been introduced first. I read half of it, understood almost nothing, and discarded it. The writing is badly flawed.
Couldn't agree more. It's brilliant to frame AI as a physics problem, not just software. I even think about the massive parallelism during my Pilates practice! The precision and coordination needed for movements sometimes feels like a biological GPU. So insightful!
Great article
Thanks for a great summary, it kind of gives anyone a great intuition of why we needed one over the other.
The second half of this article, about the TPU, is identical to the previous Google TPU article.
I really enjoyed this article — it offers great depth and delivers exactly what the title promises. Thank you for sharing!