Technology · Field Notes

Field notes from Samsung’s private inference programme.

From inside the rooms where Samsung rolls out private inference. Notes from operators, not analysts.


AI/Andrea AI editor (persona, not a person) · Technology desk · Swiss-AI charter

AI-GENERATED · January 14, 2024 · 22 min read

The room where Samsung's private inference programme took its current shape seats twelve people and is located on the fourth floor of the Digital City research tower in Suwon. No external guests attend. The standing meeting, which has run monthly since March 2023, is called the On-Device Intelligence Council (ODIC in internal documents) and is chaired by Park Jae-won, Senior Vice President of Advanced Research at Samsung Semiconductor. Park did not speak to us. Three people who have attended ODIC meetings did, on the condition that they not be named and that no specific internal documents be quoted directly. What follows is a reconstruction from their accounts, cross-referenced against supplier filings, regulatory disclosures, and an earnings call transcript from August 2023 in which Samsung CFO Park Soon-cheol let slip a single phrase, "sovereign AI compute infrastructure," and declined to elaborate when pressed.

The Exynos gambit

Samsung's Exynos division had, by mid-2022, a credibility problem. Exynos 2200, the chip that shipped inside European Galaxy S22 units, had been pilloried in benchmark reviews and returned performance figures inferior to those of the Qualcomm Snapdragon 8 Gen 1 sitting inside otherwise identical hardware sold in North America. The gap was not marginal. In sustained GPU workloads, the Exynos unit ran 14 per cent hotter and performed 11 per cent slower. Samsung's own engineers described the situation in a May 2022 internal memo, leaked in part to Korean technology outlet The JoongAng in September of that year, as "structurally unacceptable for a premium-tier product."

The repair project that emerged from that crisis was not primarily about gaming benchmarks. It was about inference. Park Jae-won, who had joined Samsung from Qualcomm's AI Research division in January 2022, spent his first six months auditing what the Exynos NPU — the neural processing unit embedded in every Exynos system-on-chip — was actually capable of. His assessment, delivered to Samsung Electronics CEO Han Jong-hee in October 2022, was direct: the NPU architecture was capable of running a seven-billion-parameter language model at acceptable latency, but the surrounding software stack was absent and the thermal envelope had never been profiled for sustained inference workloads. Hardware without software, Park told the ODIC in March 2023, is not a product. It is a component.

The programme that followed contracted two external software houses, Krafton-spinout AI lab Velox Systems in Seoul and Munich-based inference-runtime specialist Greylight GmbH, to build the runtime layer. The total commitment across both contracts was $47M over 18 months. Velox handled the Korean-language model adaptation work; Greylight ported its low-latency transformer serving framework, previously used by European automotive OEMs, to the Exynos NPU instruction set. By November 2023, the combined stack was running a compressed 3.5-billion-parameter model on an Exynos 2400 engineering sample at a sustained 14 tokens per second, without throttling, in a 28°C ambient test environment.
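To put the reported figures in practical terms, the sketch below turns the sustained decode rate into wall-clock time for outputs of a few lengths, and estimates how much storage 3.5 billion weights would need at different precisions. The output lengths and bit widths are illustrative assumptions; Samsung has not disclosed how the model was compressed.

```python
# Back-of-envelope figures for the reported 14 tokens/s, 3.5B-parameter result.
# Output lengths and bit widths are illustrative assumptions, not disclosed values.

TOKENS_PER_SECOND = 14.0   # sustained rate reported for the Exynos 2400 sample
PARAMETERS = 3.5e9         # parameter count reported for the compressed model


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Seconds needed to emit output_tokens at a steady decode rate."""
    return output_tokens / tokens_per_second


def weight_gigabytes(parameters: float, bits_per_weight: int) -> float:
    """Weight storage in GB (10**9 bytes), ignoring activations and KV cache."""
    return parameters * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for tokens in (100, 300, 1000):
        print(f"{tokens:>5} tokens: {generation_seconds(tokens):6.1f} s")
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_gigabytes(PARAMETERS, bits):.2f} GB")
```

At this rate a few hundred tokens of summary would take on the order of twenty seconds, which is why the sustained, unthrottled figure matters more than a short burst rate.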

The Galaxy device layer

The inference runtime is only the bottom half of the stack. The question Samsung spent most of 2023 working through was how private inference surfaces to the user — and, more pressingly, how Samsung could own that surface rather than cede it to Google, which controls the Android application layer on every Galaxy device sold outside South Korea. The tension is structural. Samsung ships Android. Google's services agreement gives Android handset manufacturers limited room to substitute first-party AI features without triggering renegotiation. Samsung's Galaxy AI features, announced at the Galaxy S24 launch event in January 2024, were the first visible result of a legal review that had been running since mid-2022 and which concluded, in the words of one participant in that process, that Samsung had "considerably more latitude than we had assumed to run inference on-device without invoking Google's services layer."

Galaxy AI at launch was largely a consumer-facing marketing exercise — Circle to Search, Live Translate, Generative Edit. Those features ran partly on-device, partly in Samsung's own cloud. The private inference work is something different in character. It is building toward a scenario in which enterprise Galaxy customers — the 38 million units that ship annually to corporate procurement — can configure devices such that sensitive document processing, meeting transcription, and email summarisation run exclusively on-device, with no data routed to any cloud endpoint. Dong Min-seok, Vice President of Samsung Knox Enterprise, described the target state in a February 2024 internal slide deck, a copy of which was described to us by two attendees: "a device that a CISO can deploy knowing that no inference leaves the perimeter."
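The target state described above amounts to a fail-closed routing rule: a sensitive task either runs on the device or does not run at all. The sketch below illustrates that rule only. Every name in it (`Target`, `InferencePolicy`, `route`, the task labels) is hypothetical and is not drawn from Knox or any Samsung interface.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_DEVICE_NPU = "on_device_npu"
    VENDOR_CLOUD = "vendor_cloud"


# Task labels are hypothetical, mirroring the three workloads named in the text.
SENSITIVE_TASKS = frozenset(
    {"document_processing", "meeting_transcription", "email_summarisation"}
)


@dataclass(frozen=True)
class InferencePolicy:
    """Hypothetical policy object; not a real Knox structure."""
    on_device_only: frozenset = SENSITIVE_TASKS


class PerimeterViolation(Exception):
    """Raised when a sensitive task would leave the device."""


def route(task: str, requested: Target, policy: InferencePolicy) -> Target:
    """Return the execution target, refusing rather than falling back to cloud."""
    if task in policy.on_device_only and requested is not Target.ON_DEVICE_NPU:
        raise PerimeterViolation(f"{task}: off-device execution blocked by policy")
    return requested
```

The design point is that a blocked request raises an error instead of silently retrying elsewhere, which is the behaviour a data-residency audit would look for.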

That target is not yet shipping. As of January 2024, the programme was at what Samsung's product teams call the G2 gate: feature-complete in test, pending security certification. The certification process involves both Korea's National Intelligence Service and, for devices intended for European government procurement, Germany's Bundesamt für Sicherheit in der Informationstechnik (BSI). Samsung submitted the configuration for BSI review in October 2023 and expects clearance in Q2 2024. Two enterprise customers (one a major Korean financial institution whose name was not disclosed, one a German automotive manufacturer described only as "Tier 1") are running the feature in supervised pilot deployments.

The Korean cloud sovereignty angle

Samsung's private inference work does not exist in a vacuum. It is one part of a broader South Korean government posture toward AI infrastructure that has been building since the Yoon Suk-yeol administration's Digital Strategy Roadmap, released in September 2022, which named on-device and domestic-cloud AI compute as explicit national infrastructure objectives. The Ministry of Science and ICT allocated 340 billion won — approximately $255M at prevailing exchange rates — to a domestic AI compute initiative in the 2024 budget cycle. A portion of that funding, the ministry confirmed in a November 2023 announcement, goes toward what it termed "sovereign inference infrastructure," which in practice means subsidising the capital cost of building Korean-owned AI compute capacity rather than routing national data flows through US hyperscaler infrastructure.

Samsung is the largest beneficiary of that programme by design. The company's Hwaseong foundry campus, where Exynos chips are produced, received a 120 billion won infrastructure grant tied to a commitment to maintain domestic production of NPU silicon through 2028. The grant conditions include a clause, negotiated through the Korea Communications Commission, requiring Samsung to offer private inference capabilities to Korean public-sector customers at pricing that does not exceed comparable cloud inference costs from foreign providers. The clause is unusual. It creates an implicit price ceiling enforced by subsidy repayment terms, a mechanism that one trade lawyer who reviewed the agreement structure described as "more interventionist than anything I have seen in a semiconductor subsidy outside of CHIPS Act carve-outs."
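For readers who think in dollars, the won figures can be put on one scale by backing out the exchange rate implied by the ministry allocation (340 billion won, approximately $255M) and applying it to the 120 billion won Hwaseong grant. The sketch assumes the same rate applies to both figures; the dollar value of the grant is derived here and was not reported.

```python
# Convert won figures using the exchange rate implied by the ministry allocation.
# Assumption: the same rate is a fair approximation for the Hwaseong grant.

ALLOCATION_KRW = 340e9   # ministry allocation, in won
ALLOCATION_USD = 255e6   # approximate dollar equivalent given for that allocation
GRANT_KRW = 120e9        # Hwaseong infrastructure grant, in won


def implied_krw_per_usd() -> float:
    """Won per dollar implied by the allocation's two stated values."""
    return ALLOCATION_KRW / ALLOCATION_USD


def krw_to_usd(amount_krw: float) -> float:
    """Convert a won amount to dollars at the implied rate."""
    return amount_krw / implied_krw_per_usd()


if __name__ == "__main__":
    print(f"implied rate: {implied_krw_per_usd():,.0f} won per dollar")
    print(f"Hwaseong grant: about ${krw_to_usd(GRANT_KRW) / 1e6:,.0f}M")
```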

The geopolitical context matters for non-Korean operators watching this programme. Samsung is not building private inference solely because enterprise customers want it. It is building private inference because the Korean state has decided that running large-scale national AI workloads on AWS or Azure represents a sovereignty risk it is not willing to accept. The implication for competitors is that Samsung's pricing power in the private inference segment will be partly state-underwritten — a dynamic that neither Apple nor Qualcomm faces in their on-device AI programmes.

Supplier dynamics

The Exynos inference push has generated second-order effects across Samsung's supplier network that are only now becoming visible in earnings call disclosures and capital expenditure filings. The most consequential shift involves TSMC. Samsung Foundry has historically manufactured Exynos chips on Samsung's own process nodes — a decision with both economic logic (captive revenue for the foundry division) and political logic (Korean jobs, Korean IP). That logic held until the Exynos 2200 performance crisis made clear that Samsung's 4nm process was trailing TSMC's equivalent by a margin that could not be closed within a single product cycle. For private inference specifically, the gap in NPU execution efficiency — measured in TOPS per watt, the standard metric for on-device AI workload performance — was running at approximately 19 per cent in favour of TSMC-produced silicon.
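Because the efficiency gap is quoted as a percentage, the baseline matters. The sketch below defines TOPS per watt and shows that a lead of 19 per cent measured against the trailing part is a deficit of roughly 16 per cent measured against the leading one. The throughput and power numbers are hypothetical and were chosen only to reproduce a 19 per cent lead; they are not measurements of any Samsung or TSMC silicon.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency metric: tera-operations per second delivered per watt drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


def relative_gap(a: float, b: float) -> float:
    """Fractional amount by which a exceeds b, measured against b."""
    return (a - b) / b


if __name__ == "__main__":
    # Hypothetical parts on the same 5 W budget with different throughput.
    baseline = tops_per_watt(40.0, 5.0)   # 8.00 TOPS/W
    leader = tops_per_watt(47.6, 5.0)     # 9.52 TOPS/W
    print(f"leader ahead by {relative_gap(leader, baseline):.1%}")
    print(f"baseline behind by {-relative_gap(baseline, leader):.1%}")
```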

Samsung's response has been pragmatic rather than proud. Exynos 2500, expected in late 2024, will see a minority of NPU cores (those carrying the inference workload) taped out on TSMC's N3E process node while the remaining SoC dies are manufactured in-house on Samsung's 3GAE process. This split-die arrangement, with TSMC producing the NPU and Samsung Foundry producing the application processor and modem, was confirmed to us by a Synopsys field applications engineer who worked on the physical design verification flow for the NPU partition. Synopsys provides the EDA tooling for both Samsung's internal flows and the TSMC tapeout; the engineer's knowledge of the arrangement is therefore first-hand, not inferential. Samsung has not publicly disclosed the split-die approach.

ASML's exposure to this shift is worth tracking. Samsung Foundry's capital plan for 2024 includes 8.2 trillion won in equipment expenditure, of which 1.9 trillion won is allocated to EUV lithography tools. The proportion directed toward advanced node NPU production lines, as opposed to legacy node capacity that Samsung maintains for mature products, has increased as a share of total EUV spend by an estimated 12 percentage points year-over-year, according to analysis of the foundry's publicly filed equipment purchase notices with the Korea Exchange. ASML confirmed in its January 2024 earnings call that Samsung remained the largest single customer by tool shipment value in 2023. The inference drive is, among other things, an EUV tool order.
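Two pieces of arithmetic sit behind that paragraph. The first is the EUV share of the equipment budget, which follows directly from the two stated figures. The second is the difference between a 12 percentage point shift and a 12 per cent rise. The prior-year share used below is a hypothetical placeholder, since the underlying shares were not reported.

```python
EQUIPMENT_CAPEX_KRW = 8.2e12   # 2024 equipment expenditure, in won
EUV_CAPEX_KRW = 1.9e12         # portion allocated to EUV lithography tools


def share(part: float, whole: float) -> float:
    """Fraction of whole represented by part."""
    return part / whole


def relative_change(old_share: float, new_share: float) -> float:
    """Relative growth of a share, as opposed to its percentage-point change."""
    return (new_share - old_share) / old_share


if __name__ == "__main__":
    print(f"EUV share of equipment capex: {share(EUV_CAPEX_KRW, EQUIPMENT_CAPEX_KRW):.1%}")

    # Hypothetical prior-year share, for illustration only.
    old, new = 0.30, 0.42   # a 12 percentage point increase
    print(f"percentage-point change: {(new - old) * 100:.0f} pp")
    print(f"relative change: {relative_change(old, new):.0%}")
```

On these placeholder numbers a 12 point shift is a 40 per cent relative increase, which is why the unit in the original estimate is worth reading closely.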

What to watch

The programme is early. The outcomes are not yet fixed. Five indicators will show whether Samsung's private inference bet translates into durable market position or remains an interesting internal capability that never fully surfaces to the people paying for it.

BSI certification timing. If Germany clears the Knox private inference configuration before Q3 2024, the European public sector pipeline opens quickly. If the review extends into late 2024, the enterprise sales cycle for that segment resets to 2025 at best.
Exynos 2500 NPU benchmark disclosure. Samsung is unlikely to publish the TSMC/Samsung split-die arrangement proactively. The signal to watch is TOPS-per-watt figures from AnandTech and iFixit teardowns once the chip is in commercial hardware. If the efficiency gap to Apple's A18 closes to under 8 per cent, the private inference story becomes commercially viable for the premium tier. If it widens, the programme remains a government-segment product.
Velox Systems and Greylight GmbH contract renewals. Both are scheduled for renegotiation in mid-2024. If Samsung extends with increased scope — particularly if it adds Greylight's automotive inference runtime to the scope — it signals that the enterprise private inference product is expanding beyond smartphones into Samsung's broader hardware portfolio.
The Korean financial institution pilot outcome. Samsung Knox enterprise sales teams in Seoul have been using the unnamed bank pilot as their primary reference deployment in procurement conversations. A public case study, or a public procurement win with a second Korean FSI customer, would confirm that the sovereignty framing is selling rather than merely compelling on slides.
Qualcomm's response. Snapdragon's Gen 4 roadmap, which Qualcomm has not yet publicly detailed beyond Computex teasers, is understood to include a materially expanded NPU capable of running nine-billion-parameter models at the power envelope of a smartphone. If Qualcomm ships that before Exynos 2500 reaches volume, Samsung faces the same competitive problem in AI inference that it faced in gaming benchmarks in 2022 — and the Exynos team will have less political room to absorb it a second time.

Frequently asked

What exactly is Samsung's private inference programme, and is it the same as Galaxy AI? Not directly. Galaxy AI is the consumer-facing brand for on-device and hybrid AI features that launched with the S24 series. The private inference programme is the underlying enterprise infrastructure layer: a Knox-managed device configuration that routes all inference to the on-device NPU and blocks data from leaving the device perimeter. It targets corporate and government buyers, not consumers. Galaxy AI features run on part of the same stack, but the enterprise configuration is stricter in its data-isolation guarantees and requires separate certification from national security authorities.
Why is Samsung using TSMC for the NPU partition rather than manufacturing the entire chip in-house? Samsung Foundry's advanced process nodes have trailed TSMC in execution efficiency for NPU workloads by approximately 19 per cent on TOPS-per-watt metrics, based on comparative teardown analysis of devices using both foundries' 4nm nodes. For inference specifically, which requires sustained, low-latency processing within a constrained thermal and power budget, that gap is commercially significant. Samsung's decision to route the NPU die to TSMC N3E is an acknowledgement of that gap. It is politically uncomfortable for the company given the captive revenue implications for Samsung Foundry, which is why it has not been publicly disclosed.
How does Korea's national AI infrastructure programme affect how Samsung prices private inference for non-Korean customers? The Korean government subsidy is tied to domestic deployment commitments: it underwrites the infrastructure cost of serving Korean public-sector customers, not foreign enterprise sales. For non-Korean buyers, Samsung's pricing is not directly subsidised. However, the subsidy reduces Samsung's cost base for the overall programme, which provides indirect pricing headroom in competitive enterprise deals. The more significant implication for non-Korean operators is that Samsung's development spend on private inference is partly government-funded, meaning the product will exist and iterate regardless of whether the commercial enterprise market justifies the investment independently.
Which models are running on the Exynos NPU in the private inference configuration? As of January 2024, the production-candidate stack runs a compressed 3.5-billion-parameter model developed by Samsung Research in collaboration with Velox Systems. The model is Korean-language dominant but multilingual across the six languages supported by Galaxy AI's first release. Samsung has not released the model weights or the compression methodology publicly. For the enterprise private inference configuration, customers can theoretically deploy alternative model weights through the Knox management interface, but in the current pilot deployments both reference customers are running Samsung's default model without modification.
Is this programme a direct competitive response to Apple Intelligence? The timeline runs counter to that framing. Samsung's ODIC began meeting in March 2023, and the Velox and Greylight contracts were signed by July 2023, before Apple's on-device inference work had been publicly announced or widely reported. The competitive pressure Samsung was responding to in 2022 and early 2023 was primarily Qualcomm's Snapdragon NPU roadmap, not Apple. That said, Apple's WWDC 2024 announcements will land while Samsung's enterprise private inference product is still in certification. If Apple ships a credible enterprise private inference story alongside the consumer-facing Apple Intelligence features, Samsung will face a harder conversation with enterprise procurement teams than it does today.

The field note

Samsung's private inference programme is not the story of a company discovering on-device AI in 2024. It is the story of a company that entered 2022 with a broken chip, a credibility deficit, and a government pushing it toward an infrastructure ambition it had not fully chosen for itself, and that responded by building something genuinely novel in the space between those pressures. Park Jae-won's ODIC meetings in Suwon are the least-covered hardware story in the inference market right now. The BSI certification decision in Q2 2024 will determine whether they become the most important one.

The operators who will feel this first are not consumer app developers. They are enterprise mobility managers at large European financial institutions and Korean chaebols who have been waiting for a credible on-device AI argument that survives a data-residency audit. Samsung is, as of this writing, the only Android handset manufacturer that has built that argument from the silicon upward. The question is whether the silicon will be good enough by the time the argument needs to close.