Last May, Sandra Rivera, a senior executive at chip giant Intel, received some alarming news.
Engineers had spent more than five years developing a powerful new microprocessor to handle computing work in data centers and were confident that they had finally got it right. But at a regular morning meeting to discuss the project, signs of a potentially serious technical flaw surfaced.
The problem was so serious that Sapphire Rapids, the microprocessor’s code name, had to be delayed – the latest in a series of setbacks for one of Intel’s most important products in years.
“We’re pretty depressed,” said Ms. Rivera, the Intel vice president who leads its data center and artificial intelligence group. “It was a painful decision.”
The launch of Sapphire Rapids was ultimately pushed from mid-2022 to Tuesday, nearly two years later than once expected. The drawn-out development of the product, which combines four chips in a single package, highlights some of the challenges facing Intel’s turnaround effort at a time when the United States is seeking dominance in basic computer technology.
Since the 1970s, Intel has been a leading maker of the tiny slices of silicon that power most electronic devices, best known for a variety called microprocessors that act as electronic brains in most computers. But in recent years the Silicon Valley company has lost its longstanding lead in manufacturing technology, which helps determine how quickly chips can compute.
Patrick Gelsinger, who became CEO of Intel in 2021, has pledged to restore its manufacturing lead and build new U.S. factories. He was a prominent figure when Congress debated and passed legislation last summer to reduce U.S. reliance on chip manufacturing in Taiwan, which China claims as its territory.
Sapphire Rapids’ bumpy development has implications for whether Intel can rebound and deliver future chips on time. That question could affect PC manufacturers and cloud service providers, not to mention the millions of consumers who rely on online services powered by Intel technology.
“What we want is a predictable, steady pace,” said Kirk Skaugen, vice president of server sales at Lenovo, a Chinese company that is planning 25 new systems based on the new processor. “Sapphire Rapids is the beginning of a journey.”
A newly manufactured silicon wafer containing Sapphire Rapids chips at Intel’s headquarters in Santa Clara, California, this week. Credit: Anastasiia Sapon for The New York Times
The pressure on Intel continues. With demand for chips used in personal computers plummeting, the company faces stiff competition in its most profitable business, server chips. Since Mr. Gelsinger took office, Intel’s market value has fallen by more than $120 billion, and the issue has Wall Street worried.
Intel plans to hold an online event Tuesday to discuss Sapphire Rapids, named after a section of the Colorado River. More formally, the product is called the 4th Gen Intel Xeon Scalable processor.
In an interview, Mr. Gelsinger said that Sapphire Rapids was a success despite the delays. In 2021 he chose Ms. Rivera to take over the unit that developed it, where she has used lessons learned from the experience to change the way Intel designs and tests its products. She said Intel is conducting several internal reviews of Sapphire Rapids and “we’re not done.”
Sapphire Rapids started in 2015 with discussions among a small group of Intel engineers. The product was the company’s first attempt at a new approach to chip design. Companies now routinely pack tens of billions of tiny transistors onto each piece of silicon, but competitors such as Advanced Micro Devices had begun making processors from multiple chips bundled together in plastic packages.
Intel engineers came up with a design with four chips, each containing 15 processor “cores” that act as separate calculators for general-purpose computing tasks. The company also decided to add extra circuit blocks for special tasks, including artificial intelligence and encryption, and for communicating with other components such as chips that store data.
Shlomit Weiss, who co-leads Intel’s design engineering group, said the interplay between so many elements is “very complex.” She added, “Complexity often brings problems.”
The Sapphire Rapids team grappled with bugs, which are flaws caused by designers’ errors or manufacturing glitches that can cause a chip to miscalculate, run slowly or stop working. The team was also affected by delays in the product’s manufacturing process.
But by December 2019, engineers had reached a milestone called “tape-out.” That’s when the electronic files containing a completed design are sent to a factory to make sample chips.
The sample chips arrived in early 2020, just as Covid-19 forced lockdowns. The project’s chief engineer, Nevine Nassif, said engineers soon got the computing cores in Sapphire Rapids communicating with each other. But more work remained than expected.
One of the key tasks was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing jobs and catch errors. Once defects are found and fixed, designs can be returned to the factory to make new test chips, which usually takes more than a month.
Repeating this process led to missed deadlines. Ms. Nassif said Sapphire Rapids was timed to compete with AMD’s Milan processor, which was released in March 2021. But it still wasn’t ready that June, when Intel announced a delay until the following year to allow for further validation.
That’s when Ms. Rivera stepped in. The longtime Intel executive had built a successful networking business before being appointed chief people officer in 2019.
“We needed to get our execution mojo back,” said Mr. Gelsinger. “I needed someone to run to the fire and fix this for me.”
Starting in October 2021, Ms. Rivera and a senior design manager held weekly Sapphire Rapids status meetings at 7 a.m. every Monday.
Then came the discovery of the flaw last May. Ms. Rivera did not describe it in detail, but said that it affected the performance of the processor. In June, she used an investor event to announce a delay of at least a quarter, which pushed the launch of Sapphire Rapids behind that of a competing AMD chip in November.
“We were ready to ship,” said Ms. Nassif. The final delay was “deeply regrettable given all the effort put into it.”
Ms. Rivera learned a series of lessons from the setbacks. The first was that Intel had packed too much innovation into Sapphire Rapids instead of offering a less ambitious product earlier.
She also concluded that the team should have spent more time perfecting and testing its design using computer simulations. Ms. Rivera said it’s cheaper to find faults before they are built into sample chips, and it might have been possible to remove features to simplify the product. She has since moved to strengthen Intel’s simulation and validation capabilities.
“We used to have a lot of this kind of muscle that we let atrophy,” Ms. Rivera said. “We’re rebuilding now.”
She also determined that Intel was planning more products than its engineers and customers could easily handle. So she streamlined the product road map, including pushing Sapphire Rapids’ successor back from 2023 to 2024.
More generally, Ms. Rivera and other Intel executives have pushed the organization to develop better processes for documenting technical issues and sharing this information internally and externally.
Some Intel customers say that communication has gotten better.
“Is everything okay? No,” said Mr. Skaugen of Lenovo, who once ran Intel’s server chip business. “But we were a lot less surprised than we were in the past.”