On April 9 local time, at the “Google Cloud Next 25” conference, Google officially launched its first TPU chip purpose-built for AI inference, codenamed “Ironwood” and also known as TPU v7. The chip is designed specifically for deep-reasoning (“thinking”) models and represents a major leap in performance.
The TPU v7 delivers a peak FP8 performance of 4614 TFLOPS per chip, roughly 3600 times that of the second-generation TPU from 2017 and about ten times that of the fifth-generation TPU from 2023. It also scales exceptionally well: the largest configuration links up to 9216 liquid-cooled chips into a single pod with a peak of 42.5 ExaFLOPS, or 42.5 quintillion operations per second, about 24 times the performance of El Capitan, currently the world's most powerful supercomputer.
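As a quick sanity check, the pod-level figure follows directly from multiplying the per-chip peak by the chip count. This is only a back-of-envelope sketch using the numbers quoted above, not an official Google calculation:

```python
# Back-of-envelope check of the pod figure, using only numbers quoted in the text.
per_chip_fp8_tflops = 4614      # Ironwood per-chip FP8 peak (TFLOPS)
chips_per_pod = 9216            # chips in the largest liquid-cooled configuration

pod_peak_exaflops = per_chip_fp8_tflops * 1e12 * chips_per_pod / 1e18
print(f"{pod_peak_exaflops:.1f} ExaFLOPS")  # ~42.5 ExaFLOPS, matching the stated figure
```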
AI is shifting from responsive models that answer on demand to proactive models that generate insights and interpretations on their own. Deep reasoning models such as DeepSeek-R1 and Google's Gemini Thinking typically use MoE (Mixture of Experts) architectures: although only a small fraction of parameters is activated per token, the total parameter count is massive, demanding extensive parallel processing and efficient memory access that no single chip can provide. TPU v7 is designed for exactly these workloads, minimizing data movement and latency on chip during large-scale tensor operations. Compared with the previous-generation TPU v6, its High Bandwidth Memory (HBM) capacity rises to 192 GB per chip, six times that of its predecessor, and per-chip memory bandwidth rises to 7.2 TBps, 4.5 times higher. The system also uses a low-latency, high-bandwidth ICI (Inter-Chip Interconnect) network with bidirectional bandwidth of 1.2 Tbps, 1.5 times that of the previous generation, while delivering twice the performance per watt.
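To see why memory capacity and bandwidth matter so much for this class of model, here is a rough, purely illustrative estimate of how long one chip needs to stream its resident weights during memory-bound decoding. The MoE active fraction below is an assumed example value, not a figure from the announcement:

```python
# Illustrative estimate of memory-bound decode time per token on one chip.
hbm_capacity_gb = 192            # per-chip HBM capacity, from the announcement
hbm_bandwidth_gbps = 7.2 * 1000  # per-chip HBM bandwidth, 7.2 TBps expressed in GB/s
active_fraction = 1 / 8          # hypothetical share of MoE weights read per token

full_sweep_ms = hbm_capacity_gb / hbm_bandwidth_gbps * 1000  # read every resident byte once
moe_sweep_ms = full_sweep_ms * active_fraction               # read only the active experts
print(f"full sweep: {full_sweep_ms:.1f} ms, active subset: {moe_sweep_ms:.1f} ms per token")
# ~26.7 ms vs ~3.3 ms: activating fewer experts cuts the bandwidth bill per token,
# but the full parameter set still has to fit in, and be served from, HBM.
```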
Beyond the hardware upgrades, TPU v7 also benefits from tighter hardware-software co-design. It includes an enhanced SparseCore to handle the massive embeddings common in advanced ranking and recommendation workloads, and it supports Pathways, a machine learning runtime developed by Google DeepMind that enables efficient distributed computation across large numbers of TPU chips.
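The paragraph above describes Pathways only at a high level. The sketch below, written with the JAX library, shows the general shape of the single-program, multi-device execution such a runtime schedules; the mesh layout, array sizes, and sharding choices are illustrative assumptions, not Ironwood or Pathways specifics:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever accelerator chips are visible to this process.
mesh = Mesh(np.array(jax.devices()), ("data",))

n = len(jax.devices())
x = jnp.ones((n * 128, 512))   # toy activations, batch split across chips
w = jnp.ones((512, 256))       # toy weights, replicated on every chip

x = jax.device_put(x, NamedSharding(mesh, P("data", None)))  # shard the batch dimension
w = jax.device_put(w, NamedSharding(mesh, P()))               # keep weights replicated

@jax.jit
def forward(x, w):
    # The compiler and runtime insert any cross-chip communication needed here.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape, y.sharding)
```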
Google plans to integrate TPU v7 into its Google Cloud AI supercomputing platform soon, supporting workloads such as recommendation systems, the Gemini models, and AlphaFold. The announcement quickly set off discussion online, with many arguing that NVIDIA could face significant competitive pressure. On paper, TPU v7's FP8 compute of 4614 TFLOPS slightly exceeds the 4.5 PFLOPS (4500 TFLOPS) stated for NVIDIA's B200, while its memory bandwidth of 7.2 TBps falls slightly short of the B200's 8 TBps, putting the two chips at a roughly comparable level. Beyond Google, Amazon's Trainium, Inferentia, and Graviton chips, along with Microsoft's Maia 100, are also pushing into the AI chip space, signaling increasingly fierce competition in the AI chip market.
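For the per-chip comparison cited above, the ratios work out as follows. This uses only the figures quoted in this article, which are stated peak numbers rather than measured results:

```python
# Ratio check for the TPU v7 vs. B200 comparison, per the figures quoted above.
tpu_v7 = {"fp8_tflops": 4614, "hbm_tbps": 7.2}
b200   = {"fp8_tflops": 4500, "hbm_tbps": 8.0}

print(f"FP8 compute:   {tpu_v7['fp8_tflops'] / b200['fp8_tflops']:.2f}x of B200")  # ~1.03x
print(f"HBM bandwidth: {tpu_v7['hbm_tbps'] / b200['hbm_tbps']:.2f}x of B200")      # 0.90x
```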