Discussion about this post

User's avatar
Yuzu Xu's avatar

The TPU 8t versus 8i split illustrates something Huawei cannot do yet: bifurcate silicon for training versus inference. Ascend 910C handles both because wafer capacity constraints make two specialized product lines impractical. The result is that Chinese AI inference runs on throughput-optimized hardware where latency suffers. Not a design choice. A constraint that the throughput versus latency versus bandwidth framework describes at the silicon layer.

Clint Cain's avatar

4. Open-Weight Everywhere < yes, I think most teams and org will switch to using more open and local models. Mostly cause of PII and just security and confidence anyways.

No posts

Ready for more?