Generative AI and large-model technologies are fast becoming one of the most influential technological transformations in history, spreading at unprecedented speed and scale.
From large language models that assist EDA design to digital twins of semiconductor fabs, AI’s empowerment of the semiconductor industry is also pushing that industry toward a turning point of its own.
On one hand, physical and cost limits have brought the traditional path of chip evolution, exemplified by Moore’s Law and semiconductor scaling, to an end, giving rise to alternatives such as chiplets and advanced packaging. On the other hand, AI large models are iterating rapidly, and in many scenarios AI workloads keep pushing up the demands on chip compute and complexity.
What kind of transformation is the chip industry undergoing as the AI era arrives? Companies at the top of the supply chain, such as Arm, often have accurate, forward-looking insight into the industry. Arm recently released an industry report that systematically lays out its views on the chip industry’s development in the AI era, and Kevork Kechichian, Executive Vice President of Solutions Engineering at Arm, discussed the report with media outlets including Jiwei Network.
In Arm’s view, broad ecosystem cooperation, system-level optimization, standardization strategies for interfaces and the like, modular and customizable design, and flexible yet robust security frameworks will be the keys to successful chip design in the AI era.
01
Energy Efficiency Has Become a Primary Concern
In recent years, the global race for AI computing power has become one of the defining features of the AI era. Computational workloads keep growing in scale and complexity, from training massive models to multi-step inference, and all of it carries substantial electricity consumption.
The investments are equally substantial. Based on estimates published on the LessWrong website of the GPU/TPU holdings of the major AI players, by 2025 Microsoft, Google, Meta, Amazon, and xAI will together own more than 12.4 million H100-equivalent GPUs, worth several hundred billion dollars.
Boston Consulting Group, for its part, estimates that by 2030 U.S. data centers will consume 7.5% of all U.S. electricity, about 390 billion kWh, equivalent to the annual consumption of roughly 40 million U.S. households, nearly a third of all households in the country.
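Those figures are easy to cross-check. A minimal sanity check, assuming a rough U.S. average of 10,000 kWh per household per year and about 131 million households (both assumptions, not figures from the report):

```python
# Sanity check of the Boston Consulting Group projection quoted above.
# Assumptions (not from the report): ~10,000 kWh/year per average U.S.
# household, ~131 million U.S. households in total.
projected_dc_kwh = 390e9        # 390 billion kWh projected for 2030
kwh_per_household = 10_000      # assumed average annual household use
us_households = 131e6           # assumed total number of households

equivalent_households = projected_dc_kwh / kwh_per_household
share = equivalent_households / us_households

print(f"Equivalent households: {equivalent_households / 1e6:.0f} million")  # ~39 million
print(f"Share of all U.S. households: {share:.0%}")                         # ~30%, nearly a third
```

The result, about 39 million households or roughly 30% of the total, matches the report’s “nearly a third” framing.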
Clearly, the brute-force path of stacking computing power across thousands of devices and training models for months is not economically sustainable. And in smaller terminal devices, where space and power are even more constrained, the pressure for energy efficiency is just as acute. Hence the urgent demand for smarter, more energy-efficient chip solutions.
Therefore, energy efficiency and power management have become the top concerns driving AI computing and chip design, involving three main factors:
- Computation: AI relies on massive multiply-accumulate (MAC) operations, so chips typically need to integrate highly efficient compute architectures (see the sketch after this list).
- Data Transmission: In most cases, the output of computations needs to be further processed in other components of the chip, so optimizing data transmission and communication flows between components is necessary.
- Cooling: High-performance chips, such as those with integrated computing and memory units, often use packaging methods that minimize latency and power loss, but they also bring challenges in heat dissipation, requiring efficient cooling solutions.
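The first two factors pull against each other: arithmetic is cheap relative to moving its operands. A minimal NumPy sketch, not specific to any Arm product, of the MAC count behind a single matrix multiply and the ideal amount of data that must move to feed it:

```python
import numpy as np

# Toy illustration of compute vs. data movement in AI workloads.
# Assumes the ideal case where each operand crosses the memory
# interface exactly once.
M, K, N = 1024, 1024, 1024
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)

c = a @ b  # under the hood: M * N * K multiply-accumulate (MAC) operations

macs = M * N * K
bytes_moved = a.nbytes + b.nbytes + c.nbytes   # inputs in, result out
print(f"MACs: {macs:.2e}, ideal memory traffic: {bytes_moved / 1e6:.1f} MB")
print(f"Arithmetic intensity: {macs / bytes_moved:.0f} MACs per byte moved")
```

Roughly a billion MACs ride on just a dozen megabytes of ideal traffic here; in practice caches miss and operands are re-fetched, which is exactly why the memory-hierarchy and interconnect optimizations below matter.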
In summary, chip designs increasingly combine optimized memory hierarchies and system-level communication mechanisms to reduce data movement; apply technologies such as die stacking, HBM, and advanced packaging to cut the energy spent transferring data; and rely on mature power-management techniques to hold consumption down and keep efficiency high.
Reducing energy consumption is becoming a systematic undertaking in chip design. Asked how to cut energy use while balancing compute performance and efficiency, Kevork Kechichian offered his view, working up from the bottom of the stack:
- At the lowest level, the transistor layer: collaborate closely with foundries to optimize power and performance, covering both dynamic and leakage power.
- At the architecture level: optimize the instruction sets for CPUs and the various processing engines.
- From SoC design and packaging up to the data center: protect data and its movement, reducing the power consumed as data travels between memory units.
- At the software layer supporting large data centers: apply intelligent load balancing, optimize the handling of different stages of AI, and distribute workloads sensibly to minimize data transfer between nodes (sketched below).
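On that last point, here is a deliberately simplified, hypothetical sketch of locality-aware placement, assigning each task to the node that already holds most of its input data so less of it has to cross the network. Every name and number is invented for illustration; production schedulers weigh far more factors:

```python
# Toy locality-aware scheduler: place each task on the node that already
# holds the most of its input data, subject to node capacity, to reduce
# cross-node data transfer. Purely illustrative.

def place_tasks(tasks, nodes):
    """tasks: {task: {node: input_bytes_already_on_node}};
    nodes: {node: remaining capacity in tasks}. Returns {task: node}."""
    placement = {}
    # Handle the most data-heavy tasks first so they win their preferred node.
    for task, locality in sorted(tasks.items(), key=lambda kv: -sum(kv[1].values())):
        # Prefer the node holding the most input bytes that still has room.
        for node, _ in sorted(locality.items(), key=lambda kv: -kv[1]):
            if nodes.get(node, 0) > 0:
                placement[task] = node
                nodes[node] -= 1
                break
    return placement

tasks = {
    "train_shard_0": {"node_a": 8e9, "node_b": 1e9},
    "train_shard_1": {"node_a": 2e9, "node_b": 9e9},
    "inference_0":   {"node_a": 1e9, "node_b": 2e9},
}
print(place_tasks(tasks, {"node_a": 2, "node_b": 1}))
# {'train_shard_1': 'node_b', 'train_shard_0': 'node_a', 'inference_0': 'node_a'}
```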
02
Standardization Is Imperative
The end of traditional scaling has made advanced packaging one of the key directions for chip evolution in the post-Moore’s Law era, in turn driving the development of chiplet technology, which improves performance and energy efficiency by stacking and interconnecting multiple semiconductor dies.
Chiplets bring new ideas and advantages to chip design. In certain scenarios, for example, chip makers need not redesign a product: they can simply add more chiplets to raise compute and performance, or upgrade existing chiplets to optimize particular components for specific functions. Chiplets also allow more flexible, differentiated design choices, reduce costs, accelerate R&D, and get products to market faster. Moreover, they help improve yield and offer higher reuse potential across different products.
These advantages come with challenges, though, and energy consumption is chief among them. When SoC components are spread across multiple dies, power delivery becomes more complicated; 3D stacking raises power density, straining both power delivery and thermal management; and the interfaces between chiplets raise concerns about latency control, power management, and energy-efficiency optimization.
Kevork Kechichian pointed out that addressing these challenges requires close industry collaboration to establish new models of cooperation and promote reuse of shared results, generating more commercial value for companies. Standardization is critical in this process. As a leading company in chiplet development, Arm has been cooperating across the technology ecosystem, using common frameworks and industry standards to accelerate the chiplet market.
“The true value of advanced packaging and chiplet technologies lies in achieving genuine standardization of design and interfaces, covering everything from integration at the packaging plant to communication between different chiplets in a system. It is therefore crucial to reach consensus with partners on standardization. With standards in place, companies can quickly combine and configure chiplets against different performance requirements, creating chips with distinct performance profiles and gaining a competitive edge in a fast-moving market,” said Kevork Kechichian.
Against this backdrop, Arm launched the Chiplet System Architecture (CSA), which aims to standardize, among other things, how chiplets communicate with one another and across the whole system. Arm is also working with partners to advance initiatives such as the AMBA CHI chip-to-chip (C2C) interconnect protocol, so that a unified interface protocol ensures interoperability between chiplets from different suppliers.
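As a loose software analogy (and only an analogy: this is not the AMBA CHI C2C specification), the value of a shared interface contract is that an integrator can compose parts from different vendors without knowing their internals:

```python
from abc import ABC, abstractmethod

class ChipletLink(ABC):
    """Stand-in for a standardized die-to-die interface contract."""
    @abstractmethod
    def transfer(self, payload: bytes) -> bytes: ...

class VendorACompute(ChipletLink):
    def transfer(self, payload: bytes) -> bytes:
        return payload[::-1]        # pretend compute: reverse the bytes

class VendorBMemory(ChipletLink):
    def transfer(self, payload: bytes) -> bytes:
        return payload + b"!"       # pretend memory-side processing

def build_system(chiplets: list[ChipletLink], data: bytes) -> bytes:
    # The integrator chains dies purely through the shared contract,
    # which is what lets chiplets from different suppliers interoperate.
    for chiplet in chiplets:
        data = chiplet.transfer(data)
    return data

print(build_system([VendorACompute(), VendorBMemory()], b"abc"))  # b'cba!'
```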
“In the past, standardization was often seen as abandoning one’s own IP or competitive advantage. But now, given the high complexity of systems and the evolution of cooperation models, standardization has become even more important—every participant will benefit in multiple ways,” emphasized Kevork Kechichian.
03
Significant Advantages of Customization in the AI Chip Industry
The development of chiplet technologies has paved the way for the rise of custom chips. Today, custom chips are showing strong market demand.
To make AI computing more efficient and better matched to their own business needs, many semiconductor players are exploring and investing in custom chips, above all the world’s four largest cloud service providers, which accounted for nearly half of global cloud-server procurement spending in 2024.
AWS Graviton4, for example, is a custom chip solution built on Arm technology, designed to accelerate data center and AI workloads with significant gains in performance and energy efficiency. In 2023, Microsoft released its first custom chip for cloud services, Microsoft Azure Cobalt, based on Arm Neoverse Compute Subsystems (CSS) and designed for the challenges of complex computing infrastructure. More recently, Google Cloud released its Axion custom chip, also based on the Arm Neoverse platform and aimed at complex server workloads in data centers.
With verified core compute functions and flexible memory and I/O interface configurations, Arm Neoverse CSS significantly accelerates time-to-market. It ensures software consistency while giving SoC designers the flexibility to add custom subsystems around the CSS to create differentiated solutions.
Beyond the hyperscale cloud providers, many smaller companies are also developing their own custom chip solutions for increasingly complex computing needs. With support from Arm technology and Intel Foundry Services (IFS), ASIC design services provider Faraday Technology is developing a 64-core custom SoC for data centers and advanced 5G networks. And South Korean AI chip company Rebellions has announced a new large-scale AI chip platform to improve energy efficiency for AI workloads.
Regarding how to balance the relationship between personalization and generality in custom chips, as well as the issue of high development costs, Kevork Kechichian noted that the key is to ensure high reusability between chips and software. The underlying platform must have a certain level of generality to ensure a degree of reusability across different custom chips, thereby effectively addressing the challenges of cost and time-to-market.
Regarding development costs, Kevork Kechichian stated that it involves both R&D personnel and a large amount of computational resources. To this end, Arm has explored several methods to effectively reduce development investment and significantly shorten partners’ product time-to-market.
“The most basic method is to approach from the platform perspective, identifying reusable modules and resources, ensuring that customization is built on existing foundations rather than starting from scratch. We need to fully assess existing resources and build customized products based on this. It is based on this approach that Arm works closely with SoC and various IP providers to deliver solutions to our partners,” said Kevork Kechichian.
04
Arm: Driving AI Innovation
As the foregoing suggests, whether the issue is energy efficiency, advanced packaging, or the trend toward customization, the complexity of modern chip design increasingly demands systems thinking and closer cooperation among IP providers, foundries, packaging houses, and system integrators. Sitting at the foundation layer of the industry ecosystem gives Arm an advantage here.
Moreover, with its strengths in technological accumulation and innovation, Arm’s position in the AI era industry is becoming increasingly prominent.
On one hand, with the rise of AI, and especially the spread of generative AI and large language models, demand for dedicated AI accelerators is growing ever more urgent. In data centers, for example, workloads impose compute requirements so strict that only dedicated hardware can run them efficiently.
On the other hand, these new workloads fundamentally depend on powerful host processors: whether the accelerator is a GPU, Google’s TPU, Microsoft’s Maia, or AWS’s custom Trainium and Inferentia parts, it needs an excellent host processor to unlock its full compute potential.
Today, processor architecture has become a key determinant of the energy efficiency and performance of AI systems, and Arm, with its innovation, customization, and high energy efficiency, has become a key force in the field. Specifically, the flexibility of Arm’s compute platform shows in three aspects that will effectively support AI innovation.
- Heterogeneous Computing: Arm-based CPUs are becoming ideal companion processors for AI accelerators like GPUs and TPUs, efficiently managing data flows and general computing tasks while addressing bottlenecks and supporting different types of workloads. All of these processors can serve as AI inference engines deployed in SoCs developed by Arm’s partners.
- Inference Efficiency: While training large AI models typically relies on high-performance GPUs, Arm’s high-energy-efficient processors are well-suited for inference tasks at the edge and in data centers.
- Scalability: The Arm architecture supports seamless integration of CPUs, GPUs, and dedicated accelerators, which is crucial for building optimized AI systems and helps make hardware and software development easier for Arm’s partners.
Arm’s solutions focus on three key areas of modern AI computing:
- Continuous Innovation: Arm regularly releases new CPU architectures and supporting features, focusing on promoting custom chip development that can meet the evolving demands of AI workloads.
- Customization Potential: As AI models grow in complexity and scale, Arm’s architecture flexibility allows it to create dedicated solutions for specific AI tasks.
- Outstanding Energy Efficiency: The high energy efficiency of Arm-based processors makes them increasingly valuable in managing the total cost of ownership (TCO) for large-scale AI deployments (a rough sketch follows this list).
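To see why per-chip efficiency moves TCO at scale, here is a back-of-the-envelope sketch in which every number is an assumption chosen for illustration, not a figure from Arm:

```python
# Back-of-the-envelope TCO sketch. All values are illustrative assumptions.
chips = 100_000               # processors in a large AI deployment
watts_saved_per_chip = 40     # assumed per-chip efficiency advantage
usd_per_kwh = 0.08            # assumed industrial electricity rate
pue = 1.3                     # assumed power usage effectiveness (cooling etc.)
years = 4                     # assumed depreciation window

kwh_saved = chips * watts_saved_per_chip / 1000 * 24 * 365 * years * pue
print(f"Energy saved: {kwh_saved / 1e6:.0f} GWh")                           # ~182 GWh
print(f"Electricity cost avoided: ${kwh_saved * usd_per_kwh / 1e6:.0f}M")   # ~$15M
```

A modest 40 W saved per chip, compounded over 100,000 chips and four years, turns into on the order of 180 GWh and tens of millions of dollars, which is why energy efficiency sits at the center of data-center TCO discussions.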
05
Opportunities: From Data Centers to Edge AI
As Arm plays an important role in the chip design process, the arrival of the AI era offers more opportunities for the company.
Today, AI PCs, AI smartphones, and other AI terminal devices are emerging rapidly, and as mobile compute keeps improving, edge AI processing is becoming commonplace. This is largely thanks to chips designed for power-constrained environments that can run a variety of AI workloads on edge devices such as phones. Edge AI’s advantages in latency, privacy, and cost are crucial to delivering faster AI experiences.
At the same time, with the emergence of efficient AI models like DeepSeek, AI is moving steadily toward the edge. Arm’s optimization work with Meta, for instance, enabled the Meta Llama 3.2 large language model to run quickly on Arm-based mobile devices, making prompt processing five times faster, token generation three times faster, and reaching 19.92 tokens per second in the generation phase. That directly cuts the latency of on-device AI workloads and markedly improves the user experience. And the more AI workloads edge devices can absorb, the fewer round trips data makes to the cloud, saving both energy and cost.
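Some quick arithmetic shows what those throughput numbers mean for a user; the reply length below is an assumed example, not a figure from the announcement:

```python
# What 19.92 tokens/s and a 3x generation speedup mean for perceived latency.
tokens_per_second = 19.92   # generation-phase rate cited above
generation_speedup = 3.0    # reported improvement over the unoptimized baseline
reply_tokens = 200          # assumed length of a typical chat reply

optimized_s = reply_tokens / tokens_per_second
baseline_s = reply_tokens / (tokens_per_second / generation_speedup)
print(f"Optimized: {optimized_s:.1f} s, baseline: {baseline_s:.1f} s")
# Optimized: 10.0 s, baseline: 30.1 s -> a wait users notice vs. one they tolerate
```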
Furthermore, with its innovation, customization, and high energy efficiency, Arm has become a key force in data center architecture. Evolving workloads, rapid technological innovation, and growing AI demand keep Arm’s architecture in a pivotal data center role, and global hyperscale cloud providers such as Microsoft, AWS, and Google increasingly rely on custom chip solutions built on it. x86 processors will continue to play an important role, but the momentum behind Arm-based solutions is accelerating.
Mohamed Awad, Arm’s Senior Vice President and General Manager of Infrastructure, has said he expects that by 2025 nearly 50% of the compute shipped to the top hyperscale cloud providers will be Arm-based. At that point, Arm’s journey from mobile devices to the heart of the data center will have reached a significant milestone, heralding a new era driven by diverse, energy-efficient, and highly customized computing built for the ever-evolving needs of the digital age.