Myriad Challenges Face Chipmakers Moving to AI
While suppliers have pivoted resources to produce more AI-focused processors, technical and production issues complicate the transition.
The news by now has become an old refrain: artificial intelligence and machine learning are going to be the game-changer in the electronics industry just as PCs, the Internet, and mobile devices have been in decades past. Semiconductor suppliers continue scrambling to develop next-generation GPUs and other chips to power servers capable of handling generative AI workloads.
But the transition has not been smooth. Monetizing the investment in AI chips is taking time as suppliers are pouring hundreds of millions into new or updated plants and manufacturing lines. Achieving sufficient yield from advanced production processes for new AI chips is not happening overnight. Moreover, there are the ongoing issues of providing enough power to the power-hungry AI processors while achieving proper thermal management to prevent them from overheating.
Robust growth
The Futurum Intelligence report, which came out several months ago, estimates the total market for processor and accelerator chips in AI data centers was $38 billion in 2023 and will grow to $138 billion by 2028. These include general-purpose CPUs; GPUs, now the predominant chipset used in AI applications; XPUs; and public cloud AI accelerators.
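Those two figures imply a compound annual growth rate of roughly 29% over the 2023–2028 span; a quick back-of-the-envelope check, based only on the figures cited above (a sketch, not additional Futurum data):

```python
# Back-of-the-envelope CAGR implied by the Futurum figures cited above.
start_value = 38.0    # $B, 2023 AI data-center processor/accelerator market
end_value = 138.0     # $B, 2028 forecast
years = 5             # 2023 -> 2028

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")   # ~29.4% per year
```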
According to Futurum, 2024 is expected to be a big year for the AI inferencing chip market as chipmakers focus on products that support inference workloads. The report noted that companies like Qualcomm, hyperscalers such as AWS and Meta, and AI chip startups are increasingly focused on developing more energy-efficient chips for AI inferencing.
The report identifies the key vendors in this market as, not surprisingly, Nvidia, Intel, and AMD, with Nvidia holding 75% of the market.
Tough transition for Intel
Perhaps no top-tier chipmaker has had a rougher transition to the AI age than Intel. While the company has for over a year proudly proclaimed its ambitions to capture the AI market, shifting its business strategy has been anything but smooth.
Hit hard by the post-pandemic slump in processors, Intel scrambled to pivot to AI and machine learning, rolling out several AI chips. But Intel has had to make massive investments in R&D and new plants that in many cases have not come online yet, triggering steep losses in recent quarters and forcing the company to cut 15% of its workforce earlier this year. In its just-ended October 2024 quarter, Intel posted another loss as the company’s core Client Computing Group continued to see declining sales. However, Intel’s fledgling Data Center and AI Group saw sales rise 9%, perhaps an encouraging sign that its AI-focused strategy is starting to pay dividends.
Still, Intel faces significant challenges. The company is spinning out its foundry business as an independent entity with the hope the unit can achieve greater operational efficiency, but whether the strategy works in the long run remains to be determined. The chipmaker is going all out to bring out new chips for AI applications. But whether it can make significant headway in a short period of time against Nvidia, as well as archrival AMD, remains to be seen.
For Nvidia, the future would appear bright, but it is not without challenges, either. A recent report by Octopus Intelligence noted that two of Nvidia’s top five customers are double ordering to meet their system needs, and that while the short-term outlook for Nvidia is robust, the longer-term picture is more uncertain: AI training demand is expected to level off, and the revenue trajectory is likely to taper in a few years. And there is the omnipresent threat of increased competition from AMD, Intel, and particularly startups that may have the flexibility to respond to rapidly changing market demands, the report suggested.
Technical challenges
Regardless of vendor, implementing AI-grade chips in robust server applications requires attention to power and thermal management. And, of course, there are the challenges of scaling up production at advanced process nodes.
Thermal management is a key issue because the powerful AI processors generate massive amounts of heat. Because traditional air cooling methods fall short, system designers are looking at liquid and immersion cooling techniques, but those also take up space and increase system complexity.
Semiconductor vendors are also looking at alternative packaging technologies. For instance, Intel earlier this year demonstrated a fully integrated chiplet with optical I/O, which the company says can lead to more scalable AI. Called the Optical Compute Interconnect (OCI), the chiplet contains a photonic integrated circuit, an electrical IC for control, and a path to incorporate a detachable optical connector. OCI can be co-packaged with CPUs, GPUs, IPUs, or systems-on-chip with high bandwidth demands.
Chip vendors are also tackling thermal management earlier in the design cycle, using simulation and co-design tools that can analyze thermal behavior before physical testing, enabling them to spot potential problems earlier and help cut design time.
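As a rough illustration of the first-order analysis such tools automate, a lumped thermal-resistance model relates junction temperature to package power and cooling path; the numbers below are illustrative assumptions, not specifications for any particular accelerator:

```python
# Minimal lumped thermal-resistance sketch: Tj = Ta + P * theta_ja.
# All values are illustrative assumptions, not specs for any real device.
ambient_c = 35.0        # facility air/coolant temperature, degrees C
power_w = 700.0         # assumed accelerator package power, W
theta_air = 0.10        # assumed junction-to-ambient resistance, air cooling, C/W
theta_liquid = 0.05     # assumed resistance with direct liquid cooling, C/W

for label, theta in [("air", theta_air), ("liquid", theta_liquid)]:
    tj = ambient_c + power_w * theta
    print(f"{label:>6} cooling: junction ~{tj:.0f} C")
# Under the air-cooling assumption the junction lands near 105 C, illustrating
# why high-power AI packages push designers toward liquid or immersion cooling.
```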
Power issues
Getting sufficient power to AI chips is another issue. Training and running deep neural networks involve extensive calculations and thus consume large amounts of power. Additional power goes to moving vast amounts of data between memory and compute units.
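To see why data movement is such a large part of the power budget, consider a rough sketch; the per-bit energy figures below are order-of-magnitude assumptions, broadly in line with commonly cited estimates, not measurements of any particular chip:

```python
# Rough sketch of why moving data costs more energy than keeping it on-chip.
# Per-bit energies are order-of-magnitude assumptions (not device specs).
SRAM_PJ_PER_BIT = 0.1    # assumed on-chip SRAM access energy, pJ/bit
DRAM_PJ_PER_BIT = 10.0   # assumed off-chip DRAM/HBM access energy, pJ/bit

bits_moved = 1e12        # hypothetical 1 Tbit of weights/activations per second

sram_watts = bits_moved * SRAM_PJ_PER_BIT * 1e-12
dram_watts = bits_moved * DRAM_PJ_PER_BIT * 1e-12
print(f"On-chip traffic:  ~{sram_watts:.1f} W")
print(f"Off-chip traffic: ~{dram_watts:.1f} W")
# The ~100x gap is why keeping data close to the compute units matters so much.
```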
Scaling chip architectures to handle larger amounts of power is challenging, and problems such as limited battery life in edge devices and higher leakage currents can rear their ugly heads.
One potential solution was recently unveiled by Empower Semiconductor, which develops integrated voltage regulators. The company announced a vertical power-delivery platform that allows scalable, on-demand power for currents upward of 3,000 A. The ultrathin voltage regulator package fits under the processor and eliminates the need for bulky capacitor banks.
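A quick ohmic-loss estimate illustrates why shortening the current path matters at these current levels; the path resistances below are illustrative assumptions, not figures from Empower:

```python
# Sketch of distribution (I^2 * R) loss at AI-accelerator current levels.
# Resistance values are illustrative assumptions, not measured board data.
current_a = 3000.0          # delivery current cited for the Empower platform
lateral_mohm = 0.10         # assumed lateral board/package path resistance, milliohms
vertical_mohm = 0.02        # assumed vertical (regulator-under-die) path resistance

for label, mohm in [("lateral", lateral_mohm), ("vertical", vertical_mohm)]:
    loss_w = current_a ** 2 * (mohm * 1e-3)
    print(f"{label:>8} delivery: ~{loss_w:.0f} W lost in the power path")
# At 3,000 A, even a tenth of a milliohm dissipates ~900 W, so moving the
# regulator directly under the processor pays off quickly.
```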
Ramping up production of GPUs and other chips for AI and machine learning also presents its own challenges. AI chips require smaller process nodes, such as 3 and 5 nm rather than 7 nm, to achieve high performance and energy efficiency. But that requires chipmakers to climb the learning curve to achieve repeatable, high-yield manufacturing, which can be technically challenging and costly.
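The yield pressure is easy to see with a simple defect-density model; the Poisson approximation and defect densities below are illustrative assumptions, not foundry data:

```python
import math

# Poisson die-yield sketch: yield ~ exp(-die_area * defect_density).
# Defect densities are illustrative assumptions, not actual foundry numbers.
die_area_cm2 = 8.0            # large AI accelerator die, roughly 800 mm^2
d0_mature = 0.05              # defects/cm^2 on a mature node (assumed)
d0_new = 0.20                 # defects/cm^2 early in a new node's ramp (assumed)

for label, d0 in [("mature node", d0_mature), ("new node", d0_new)]:
    y = math.exp(-die_area_cm2 * d0)
    print(f"{label}: ~{y:.0%} die yield")
# A big die yielding ~67% on a mature process drops to ~20% early in a new
# node's ramp, which is why climbing the yield learning curve is so costly.
```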
In addition, AI chip production requires advanced packaging technology, as well as advanced test methods able to verify chip performance.
Further down the semiconductor supply chain, there’s the omnipresent concern about supplies of raw materials such as cobalt, tungsten, and rare-earth elements. Many of these materials are also required for other electronic parts, such as batteries, and many are imported, so supply-chain concerns never go away.
Addressing costs
The move to AI and machine learning is undoubtedly a costly one. Not only are the chips themselves specialized and expensive, but both chipmakers and users also face the cost of building effective computing infrastructure.
To try addressing these issues, AMD and Japan’s Fujitsu Limited recently agreed to form a strategic partnership to create computing platforms for AI and high-performance computing (HPC). The partnership, encompassing aspects from technology development to commercialization, will seek to facilitate the creation of open-source, energy-efficient platforms comprising advanced processors with superior power performance and highly flexible AI/HPC software. The agreement also aims to accelerate open-source AI and/or HPC initiatives.
Fujitsu hopes to leverage its FUJITSU-MONAKA, a next-generation Arm-based processor that aims to achieve both high performance and low power consumption. This processor will work with AMD’s Instinct accelerators to achieve large-scale AI workload processing, while attempting to reduce the data center total cost of ownership.