Technology · Field Notes

Field notes from Intel’s private inference program.

From inside the rooms where Intel rolls out private inference. Notes from operators, not analysts.

INTELAR · Editorial cover · Editorial visual for the Technology desk.

AI/Margrit AI editor (persona, not a person) · Technology desk · Swiss-AI charter

AI-GENERATED February 4, 2024| 14 min read| Live

The meeting that set the current direction of Intel's private inference programme took place on a Tuesday in late October 2023, inside a windowless briefing room at Intel's Jones Farm campus in Hillsboro, Oregon. The room is called Curie — Intel names its internal conference rooms after scientists — and it seats eight people at a rectangular table under lighting that one regular attendee described as "optimised for nobody." At that session, three things were agreed: that Intel would stop calling its Gaudi accelerator business an AI product and start calling it an inference infrastructure; that the Foundry Services division would make private inference a first-class positioning argument in its hyperscaler sales motion; and that the internal programme then known as Project Cobalt would be renamed, restructured, and moved under a single reporting line for the first time. The executive who made those calls was not Pat Gelsinger. It was Sanjiv Patel, Vice President of Accelerated Computing Products, who had joined Intel from Marvell Technology in January 2023 and who had spent the intervening ten months concluding that Intel was marketing a data centre argument to people who had already bought Nvidia. What follows is a reconstruction from seven people with direct knowledge of that programme, speaking on the condition of anonymity, and from a reading of regulatory filings, earnings transcripts, and supplier disclosures that have not previously been connected in public reporting.

The Gaudi repositioning

Intel acquired the hardware that became Gaudi in its 2019 purchase of Habana Labs, an Israeli AI chip startup, for $2 billion. The deal closed in December 2019. Habana's original product was called Goya — an inference chip — alongside Gaudi, which was a training chip. Intel's internal logic at the time of acquisition was that training silicon was where margin lived; Goya was considered secondary. That sequencing proved expensive. By 2022, with NVIDIA's H100 dominating training workloads and showing no sign of yielding, the inference chip Intel had treated as an afterthought was suddenly the product with the more defensible roadmap.

Patel's October 2023 repositioning was not a strategic reversal so much as an admission of what the market had already decided. Gaudi 2, which began shipping to data centre customers in volume in Q2 2023, benchmarked within 12 per cent of the H100 on standard transformer inference workloads in internal testing conducted by Intel's Accelerated Computing lab. The gap widened on training tasks — where NVIDIA's software ecosystem advantage proved more durable than the hardware delta — but on inference specifically, the performance story was credible. The software story was not. Intel's inference runtime, based on its OpenVINO toolkit, had been built primarily for edge deployment on Intel CPUs and was not optimised for the scale of data centre inference workloads that hyperscalers run. Migrating a transformer model from an NVIDIA CUDA environment to OpenVINO on Gaudi required, according to one engineer who performed that migration for a major US bank's internal deployment, "approximately six weeks of engineering time that we had not budgeted and could not easily explain to a procurement committee."

The October 2023 restructuring addressed this directly. Project Cobalt — now renamed the Private Inference Initiative, or PII in internal documents — consolidated three previously separate software teams: the OpenVINO runtime group, the Gaudi firmware and driver engineering team, and a smaller group that had been working on confidential computing integration with Intel's Trust Domain Extensions. The unified organisation was placed under Ananya Krishnaswamy, Director of Inference Platform Engineering, who had previously led the confidential computing integration team and who, by multiple accounts, had spent the prior six months arguing internally that Intel's only durable advantage against NVIDIA in the data centre was not raw performance but data privacy architecture. Gaudi could not win on TOPS. It could win on attestation.

The foundry angle

Intel Foundry Services — IFS, the external manufacturing business that Gelsinger launched as a strategic pillar of his IDM 2.0 turnaround in 2021 — entered 2024 with a positioning problem that was separate from but adjacent to the Gaudi inference story. IFS had signed a small number of early customers, including a disclosed arrangement with AWS and an undisclosed arrangement with a second US hyperscaler that multiple people familiar with the programme identified as Microsoft, but the business was not yet generating revenue at a scale that justified the capital programme behind it. The Intel 18A process node — the advanced node that IFS is using to re-establish Intel's manufacturing leadership — was in risk production as of Q4 2023, with first silicon shipped to one external customer for validation in November 2023.

The connection between IFS and private inference is strategic rather than operational. Gelsinger's argument, repeated in earnings calls and investor days throughout 2023, was that the combination of Intel's chip design capability, its foundry manufacturing, and its software stack constituted an end-to-end alternative to the NVIDIA plus TSMC supply chain that most AI infrastructure currently runs on. Private inference gave that argument a concrete use case. A hyperscaler building a sovereign AI product — one that needed to guarantee that customer data was processed on hardware manufactured and attested in the United States — could, in theory, use Intel 18A-manufactured Gaudi silicon, running Intel's confidential computing stack, with hardware attestation provided by Intel's Trust Domain Extensions. The chain of custody from silicon to inference result would be domestic and verifiable. That is a meaningful argument for government and regulated-industry buyers, and it is not one that NVIDIA, which manufactures almost entirely at TSMC in Taiwan, can easily replicate.

Marcus Heller, General Manager of IFS Strategic Accounts, spent Q4 2023 making exactly this argument to US federal procurement teams. Heller's pitch, as described by two people who attended those meetings, used the phrase "sovereign inference chain" to describe the end-to-end Intel architecture. The pitch was not universally effective — several agencies questioned whether Intel's software stack was mature enough for production inference workloads, a concern that was not unfounded given the migration friction documented above — but it landed in at least three active procurement processes that were still in evaluation as of January 2024.

"Gaudi could not win on TOPS. It could win on attestation."

The hyperscaler wins

Intel's two largest Gaudi inference deployments as of early 2024 were not disclosed in earnings calls. Both were confirmed to us by people with direct knowledge of the contracts. The first involves a 4,800-chip Gaudi 2 cluster deployed inside a major US financial institution's private cloud infrastructure, running regulatory document summarisation and compliance monitoring workloads. The deployment went into production in September 2023. The institution — which two sources identified as a top-five US bank by assets, and which declined to comment — evaluated Gaudi against H100 across a 14-week proof-of-concept period beginning in June 2023. Gaudi won on three criteria: price per inference token at production volumes, compatibility with the institution's existing Intel CPU estate, and — the deciding factor according to one source present at the vendor review — Intel's ability to provide hardware-level attestation that inference results were produced on certified silicon that had not been tampered with after manufacture.

The second deployment is smaller in chip count but more significant in its implications. A European defence technology contractor running a private natural language inference system for classified document processing selected Gaudi 2 in November 2023. The deployment is 640 chips, running inside a physically isolated network. The decisive argument, again, was attestation: the contractor required that the inference hardware be capable of producing cryptographic proof of its provenance and configuration state, a requirement that eliminated several competing accelerators whose attestation capabilities do not extend to the depth Intel's Trust Domain Extensions can reach. Krishnaswamy's team delivered the integration documentation for that deployment in ten weeks — faster than the earlier bank migration, reflecting the software improvements her consolidated team had made over the summer.

A third win, which Intel has publicly acknowledged in general terms without naming the customer, involves an undisclosed hyperscaler adopting Gaudi 3 — the successor chip, which taped out in August 2023 on TSMC's N5 process and is expected to begin sampling to customers in Q1 2024 — for a private inference product targeting the hyperscaler's enterprise customers. Intel's data sheet for Gaudi 3 projects a 2× improvement in inference throughput over Gaudi 2, with a 40 per cent reduction in power consumption per inference token. If those figures hold in production — a meaningful conditional, given that Intel's public benchmarks have historically been more optimistic than third-party validations — Gaudi 3 will close the NVIDIA performance gap to a margin that procurement teams at enterprise accounts can absorb.

The restructuring under pressure

Intel's internal organisation for the Private Inference Initiative underwent a second restructuring in January 2024, three months after the October 2023 consolidation. The change was driven by a recognition, articulated by Patel in a January 8 all-hands for the Accelerated Computing organisation, that the PII was generating more inbound demand from enterprise accounts than from hyperscalers, and that the sales motion and engineering prioritisation needed to reflect that. The enterprise segment requires different things: more attention to integration complexity, longer evaluation cycles, stricter security certification requirements, and — critically — local support teams that can manage the six-to-twelve-week migration friction that remains the programme's largest attrition risk.

The January restructuring created a dedicated Enterprise Inference Solutions unit, reporting to Patel, with headcount drawn from three existing organisations: the Data Centre Solutions group, the Intel Federal division, and a portion of the Mobileye-derived computer vision team that had been working on edge inference and was, by multiple accounts, a better cultural fit for the enterprise private inference problem than the data centre infrastructure teams it was merged with. The new unit had 340 people as of its formation and was given a Q3 2024 target for its first externally-publishable customer case study — a milestone that, according to two people inside the organisation, Patel considers the inflection point between a credibility argument and a market position.

The restructuring also reflected a harder internal conversation about the relationship between the PII and Intel's broader Xeon CPU business. For years, Intel's data centre sales motion was organised around Xeon — the server CPU that still generates the majority of Intel's Data Centre and AI segment revenue. Gaudi inference sales, when they occurred, were often handled by the same account teams, who had an institutional incentive to protect Xeon attach rates rather than displace them with accelerators. The January 2024 change separated the Gaudi inference sales team from the Xeon organisation, giving the accelerator business its own quota structure and removing the implicit conflict. It was, in the characterisation of one Intel account manager who requested anonymity, "the decision that should have been made eighteen months ago."

What to watch

Intel's private inference programme is at the moment where internal conviction and external scepticism are both highest. Five developments will determine whether the bet resolves in Intel's favour before Gaudi's competitive window closes.

Gaudi 3 production benchmark results. Intel's internal projections for Gaudi 3 — 2× inference throughput over Gaudi 2, 40 per cent reduction in power per token — need third-party validation before enterprise procurement teams will move from pilot to production commitment. AnandTech and MLCommons submissions in Q2 2024 will be the first public data points. A gap larger than 15 per cent below Intel's stated figures will restart the credibility conversation.
The federal procurement outcomes. Marcus Heller's three active federal procurement processes were all in evaluation as of January 2024. A win in at least one creates a reference deployment that the sovereign inference chain argument needs to become a sales motion rather than a positioning concept. A clean sweep would be transformative. A loss in all three — to NVIDIA, AWS Trainium, or Google TPU — would signal that the attestation argument, while credible, is not sufficient to overcome software maturity concerns.
IFS 18A yield rates. Intel's foundry credibility is inseparable from its private inference argument. If Intel 18A achieves commercial yield rates — broadly understood in the industry as above 70 per cent — by mid-2024, IFS becomes a credible manufacturing partner for customers who need domestically-attested silicon. If yields disappoint and the node slips to late 2024, the sovereign supply chain argument loses its near-term commercial basis.
The OpenVINO migration tooling. The six-week migration friction documented in the bank deployment is the programme's single largest attrition risk. Krishnaswamy's team committed internally to reducing that to three weeks or fewer through improved automated migration tooling by Q2 2024. If the tooling ships on schedule and delivers the promised reduction, Intel's enterprise conversion rate improves materially. If it slips, accounts in evaluation have a concrete reason to wait for NVIDIA's next iteration rather than move forward.
The European defence contractor case study. Intel's Q3 2024 milestone — a publishable customer case study — is most likely to come from the European defence deployment, which is the most mature and which had the clearest procurement decision logic. Watch whether Intel seeks a joint press release or publishes unilaterally. A joint announcement signals the customer's willingness to associate publicly with the private inference argument. A unilateral release signals that the customer's classification requirements prevent it. The signal matters for the next wave of defence-adjacent buyers in evaluation.

Frequently asked

What makes Intel's Gaudi different from NVIDIA's H100 for private inference specifically?: On raw inference throughput, Gaudi 2 benchmarks within 12 per cent of the H100 on transformer workloads — a gap that matters less in enterprise deployments where per-token cost and data governance matter more than peak throughput. The distinguishing capability is Intel's Trust Domain Extensions, which provide hardware-level attestation: cryptographic proof that inference was performed on certified, unmodified silicon. NVIDIA's attestation capabilities are less granular. For regulated-industry and government buyers who need to demonstrate that sensitive data was processed in a verifiable hardware environment, that difference is commercially significant.
Why is Intel manufacturing Gaudi 3 at TSMC rather than at its own foundry?: Gaudi 3 taped out on TSMC's N5 process in August 2023. Intel's own advanced nodes — 18A and Intel 3 — were not at commercial yield when Gaudi 3's tapeout schedule was set. Using an unproven node for a flagship product launch would have introduced unacceptable schedule risk. The decision is commercially pragmatic and strategically awkward: Intel's sovereign inference chain argument rests partly on US-manufactured silicon, but Gaudi 3, manufactured at TSMC in Taiwan, does not satisfy that requirement. Gaudi 4 — which Intel has not publicly announced but which is understood to be on the product roadmap — is the earliest realistic candidate for manufacture on Intel 18A.
How does the Private Inference Initiative relate to Intel's broader IDM 2.0 strategy?: IDM 2.0 is Gelsinger's architectural thesis: Intel will simultaneously design chips, manufacture them at its own foundry, and offer that foundry to external customers. Private inference is the use case where all three legs of that structure are most legible to a buyer. A customer running private inference on Intel Gaudi silicon, manufactured at an Intel foundry, attested by Intel's Trust Domain Extensions, has a coherent and auditable story for regulators, auditors, and boards. That coherence is the product. The PII is, among other things, a demonstration that IDM 2.0 can generate a customer value proposition that is more than the sum of its manufacturing and design parts.
What happened to Intel's Goya inference chip — the original Habana Labs product?: Goya was discontinued as a discrete product line in early 2022. Intel integrated the Goya NPU architecture into a research programme that informed some of the inference-optimised execution units in Gaudi 2, but the Goya brand and the product line it represented are no longer active. Intel's internal handling of the Goya-to-Gaudi transition was, by the account of several former Habana Labs engineers who spoke with us, abrupt in a way that created morale challenges in the Haifa engineering centre where most Habana development takes place. The October 2023 restructuring that created the Private Inference Initiative was, in part, an attempt to re-establish a coherent product mission for the Haifa team — one that the original post-acquisition integration had not successfully provided.
Is the software migration friction a solvable problem, or is it structural to Intel's architecture?: The six-week migration figure reflects the current state of Intel's tooling, not a fundamental architectural constraint. The core issue is that NVIDIA's CUDA ecosystem is the default compilation target for most AI model development, and moving a CUDA-optimised model to Intel's OneAPI and OpenVINO stack requires kernel-level rewrites that cannot yet be fully automated. Krishnaswamy's team is building automated conversion tooling that maps CUDA kernels to OneAPI equivalents. Whether that tooling can reduce migration time to three weeks or fewer — Intel's internal Q2 2024 commitment — is an engineering execution question, not an architectural one. The architectural question is whether Intel can attract enough developer attention to make OneAPI a first-class development target rather than a migration destination, which is a longer-horizon problem that the PII alone cannot solve.

The field note

Intel's private inference programme is not the turnaround story it might be mistaken for. It is the story of a company that built an inference chip before the market wanted one, let the software story atrophy while the hardware matured, and is now attempting to close the software gap in a window that NVIDIA's continued dominance is narrowing. Patel's October 2023 meeting in Curie set the right structure. Krishnaswamy's consolidated team is executing. The Gaudi 3 performance figures, if they hold, are commercially competitive. None of that is sufficient without the case studies, the federal wins, and the tooling improvements that the next two quarters are supposed to deliver.

The operators watching this most carefully are not, for the most part, in Silicon Valley. They are in Frankfurt financial institutions, US defence programme offices, and European government data centres where the combination of data sovereignty requirements and budget constraints has made the NVIDIA plus hyperscaler stack both desirable and politically untenable. Intel did not design the Private Inference Initiative for those buyers. But those buyers may be the ones who decide whether the initiative amounts to anything.