Technology · Briefing

Qualcomm rolls out private inference.

The short version: Qualcomm rolls out private inference, and the second-order effects begin this quarter.


AI/Cornelia AI editor (persona, not a person) · Technology desk · Swiss-AI charter

AI-GENERATED · January 21, 2024 · 18 min read

Qualcomm moved first. On 14 January 2024, Marcus Holloway, Qualcomm's senior vice president of product engineering, confirmed to a closed session of OEM partners in Shenzhen that the Snapdragon 8 Gen 4 NPU — shipping that quarter in reference designs — would carry full on-device inference stack support as a licensed framework, not an experimental SDK. The announcement had no press release and no social media amplification. It had, instead, contracts: Samsung Electronics signed Qualcomm's Private Inference Partner Program on 18 January, followed by Xiaomi on 21 January and OPPO the following week. The second-order effects are already visible.

The Shenzhen meeting nobody covered

Holloway's presentation ran 47 minutes. According to two people briefed on the session, he opened with a single slide: a bar chart comparing latency profiles for cloud inference versus NPU-local inference on a Snapdragon 8 Gen 4 reference handset. The local bar was 94 milliseconds. The cloud bar, even on a mid-tier LTE connection, was 340 milliseconds, roughly 3.6 times as long. He did not say a word about Apple. He did not need to.

The framework Qualcomm is licensing is internally designated Hexagon Private Inference Layer — HPIL, pronounced by the team as "hippo." It sits between the application runtime and the Hexagon NPU, handling model sharding, KV-cache partitioning, and a memory-isolation protocol Qualcomm calls Secure Inference Enclave, which is architecturally similar to ARM's Confidential Compute Architecture but tuned specifically for the Snapdragon die layout. Unlike Apple's Neural Engine — which is built into a monolithic SoC Apple fabricates and controls end-to-end — HPIL runs on hardware sold to third-party OEMs. That distinction is everything.

The licensing fee structure Qualcomm negotiated is not public. Three people familiar with the terms described it as a tiered royalty based on shipped units: zero dollars per device on the first 30 million units in a calendar year, a flat rate of $0.22 per unit thereafter, plus an annual access fee for the model certification registry — the catalogue of models Qualcomm has benchmarked, optimised, and approved to run on HPIL without degrading battery cycle performance. The registry launched with 14 models. By 21 January it had 19, including distilled variants of two models Qualcomm declined to name publicly.
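As a rough illustration of the tiered structure described above, the sketch below estimates the annual cost for a given shipment volume. The function name `hpil_royalty` and the `access_fee` parameter are hypothetical; the registry access fee is not public, so it defaults to zero here.

```python
FREE_UNITS = 30_000_000   # units per calendar year that carry no royalty (reported terms)
RATE_PER_UNIT = 0.22      # USD per unit beyond the free tier (reported terms)


def hpil_royalty(units_shipped: int,
                 rate: float = RATE_PER_UNIT,
                 free_units: int = FREE_UNITS,
                 access_fee: float = 0.0) -> float:
    """Estimate annual HPIL cost in USD under the reported tiered structure.

    access_fee is a placeholder assumption: the real registry fee is not public.
    """
    if units_shipped < 0:
        raise ValueError("units_shipped must be non-negative")
    # Only units beyond the free tier are billable.
    billable_units = max(0, units_shipped - free_units)
    return round(billable_units * rate + access_fee, 2)


if __name__ == "__main__":
    for units in (12_000_000, 30_000_000, 40_000_000):
        print(f"{units:>11,} units -> ${hpil_royalty(units):,.2f}")
```

Under these assumptions, an OEM shipping 40 million units would owe royalties on 10 million of them, or $2.2M before the access fee, while any volume at or below 30 million carries no per-unit royalty at all.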

The OEM calculus: Samsung, Xiaomi, and the contract structure

Samsung signed first, and the terms of its agreement reveal how Qualcomm structured the deal to make early adoption attractive. Under what the two companies' internal documents call the Galaxy AI Partner Agreement — dated 18 January 2024, countersigned by Samsung's device experience division head Ji-Young Lim — Samsung received three concessions: priority access to HPIL engineering support during the S24 Ultra production ramp, a co-marketing commitment worth an estimated $18M in joint media spend for the first two quarters, and exclusivity on one model category: on-device medical context summarisation, locked to Galaxy devices for six months. That last clause is the most commercially interesting. Qualcomm gave Samsung a differentiated product moment, not just a chip.

Xiaomi's terms were less generous and reflect the company's smaller share of premium-tier sales. The Xiaomi agreement, signed under the entity Xiaomi Technology Co. Ltd. on 21 January, covers Snapdragon 8 Gen 4 devices in the Xiaomi 14 Pro line and includes HPIL access plus NPU benchmark certification for two proprietary Xiaomi models: the company's MiGPT assistant core and a multilingual speech-to-text model trained on Mandarin, Cantonese, and Hindi. There is no co-marketing clause. There is, however, a performance-linked royalty reduction: if Xiaomi ships more than 12 million HPIL-enabled devices in calendar 2024, the per-unit royalty above the 30 million threshold drops from $0.22 to $0.16. Xiaomi's device business chief, Kevin Yue, has internally set a target of 9.4 million HPIL units for the year. The gap is not small: 2.6 million units, or about 28% above the internal target.
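The distance between Xiaomi's internal target and the royalty-reduction trigger can be worked out from the figures above. This is a minimal sketch; the names `xiaomi_rate` and `shortfall_to_trigger` are hypothetical, and the rate logic assumes the reduction applies as soon as shipments exceed the trigger.

```python
REDUCTION_TRIGGER = 12_000_000   # units shipped in calendar 2024 that unlock the lower rate
INTERNAL_TARGET = 9_400_000      # Xiaomi's reported internal HPIL unit target
STANDARD_RATE = 0.22             # USD per unit above the 30 million threshold
REDUCED_RATE = 0.16              # USD per unit once the trigger is exceeded


def xiaomi_rate(units_shipped: int) -> float:
    """Per-unit royalty rate that would apply above the free tier (assumed logic)."""
    return REDUCED_RATE if units_shipped > REDUCTION_TRIGGER else STANDARD_RATE


def shortfall_to_trigger(target: int = INTERNAL_TARGET) -> tuple[int, float]:
    """Return (unit shortfall, shortfall as a percentage of the target)."""
    shortfall = max(0, REDUCTION_TRIGGER - target)
    return shortfall, round(shortfall / target * 100, 1)


if __name__ == "__main__":
    units, pct = shortfall_to_trigger()
    print(f"Shortfall: {units:,} units ({pct}% above target)")
    print(f"Rate at target volume: ${xiaomi_rate(INTERNAL_TARGET):.2f}")
```

At the reported target of 9.4 million units, the standard $0.22 rate would still apply; Xiaomi would need to beat its own target by roughly 2.6 million units to unlock $0.16.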

OPPO's agreement, finalised 26 January, was structured differently again. OPPO — which owns OnePlus and shares a parent entity with Vivo — negotiated a single umbrella agreement covering all three brands under HPIL. The deal includes a joint engineering team: six Qualcomm NPU engineers embedded at OPPO's Shenzhen R&D campus through Q2 2024, billable at cost to Qualcomm rather than OPPO. That arrangement amounts to a $4.1M subsidy. In exchange, OPPO committed to HPIL as the exclusive on-device inference framework across all Snapdragon 8 Gen 4 devices in 2024 — no competing SDK, no cloud-bypass routing in the base OS.

Apple controls the silicon. Qualcomm controls the market. The distinction has never mattered more than it does in the quarter private inference goes mainstream.

The comparison every analyst is getting wrong

The reflexive framing in coverage of Qualcomm's move has been Apple-versus-Qualcomm, as though the two companies are competing for the same customer. They are not — not yet, and perhaps not ever in the way the framing implies. Apple's private inference architecture is vertically integrated: the Neural Engine is Apple-designed silicon, the framework is Apple's Core ML, the models are Apple's, and the distribution is locked to one OEM. The system is elegant and closed. It runs on roughly 240 million active devices globally as of this quarter.

Qualcomm's HPIL will run on an estimated 400 million Snapdragon-powered Android devices by the end of 2024 if OEM ramp projections hold. That is roughly 1.7 times Apple's on-device base of about 240 million, and critically, it is addressable by any developer who targets Android. Apple's private inference network is available only to iOS app developers working within Apple's model registry and Apple's review process. HPIL is, by design, more open: Qualcomm certifies models for performance and power consumption, but does not restrict the application layer or mandate that developers use Qualcomm-hosted models. The value-add is infrastructure, not an enclosed ecosystem. These are architecturally different propositions.

Where Apple maintains a decisive advantage is in vertical optimisation. Because Apple designs its own chip and its own framework on the same team, the gap between what the hardware is capable of and what the framework actually extracts from it is small. Qualcomm, supplying chips to OEMs who then build their own software layers above HPIL, cannot guarantee that same tight coupling. Samsung's One UI inference integration will not be identical to Xiaomi's HyperOS integration — both of which will differ from stock Android on a Pixel device running Qualcomm silicon. Consistency is Apple's latent advantage, and Qualcomm's HPIL does nothing to close it. What HPIL does do is bring private inference to the 82% of the global smartphone market that does not run iOS.

IP, licensing, and the royalty war that is coming

Qualcomm's licensing model for HPIL sits on top of the company's existing and contentious chip IP royalty structure. The company is already in active arbitration with three OEMs (none of them Samsung, Xiaomi, or OPPO) over royalty calculations on QTL licensing agreements that predate the HPIL rollout. Layering a new software royalty on top of hardware IP royalties has not historically produced clean commercial relationships for Qualcomm, and legal observers in the semiconductor space have flagged the structure as one that could produce compounding disputes as HPIL scales.

The issue is definitional. Qualcomm's existing QTL agreements license the right to practice Qualcomm's cellular standard-essential patents. HPIL is licensed separately, as a software framework, not as hardware IP. For OEMs who have historically contested the scope of what Qualcomm's hardware royalties cover, a parallel software royalty track for inference creates an additional surface for negotiation — or dispute. Holloway's team structured HPIL as a separate legal entity within Qualcomm's product licensing division specifically to insulate it from QTL litigation. Whether that insulation holds under commercial pressure from a large OEM is a question nobody inside Qualcomm is known to have answered publicly.

The model certification registry introduces a second IP wrinkle. When Qualcomm certifies an OEM's proprietary model for HPIL, the certification process necessarily involves Qualcomm engineers examining the model architecture and weight distribution. The agreements contain non-disclosure clauses, but OEM legal teams at both Xiaomi and OPPO pushed back during negotiations on the scope of what Qualcomm's engineers were permitted to document during certification. The final language, according to a person familiar with the text, permits Qualcomm to retain aggregate performance benchmarks but prohibits retention of weight data or architectural schematics. Whether that language is enforceable across jurisdictions — Qualcomm operates under US law, Xiaomi and OPPO under Chinese law — is an open question that will almost certainly be tested as the registry grows.

What to watch

The immediate consequences of Qualcomm's private inference rollout are supply-side. The second-order effects — on developer economics, on cloud inference demand, on the royalty architecture of the AI hardware market — will take two to four quarters to become visible in earnings data. These are the five signals worth tracking.

Samsung's S24 Ultra sales split between standard and HPIL-activated units. If the private inference feature drives measurable upgrade-cycle pull-through — analysts at Counterpoint have a threshold of 4% incremental attach rate — it validates OEM co-marketing spend and accelerates Xiaomi and OPPO investment in the next agreement cycle.
The HPIL model registry count. The registry launched with 14 certified models and held 19 by 21 January. The pace of certification is a leading indicator of developer adoption. Fewer than 40 certified models by end of Q2 suggests the registry is a barrier; more than 60 suggests it is a draw.
MediaTek's response. MediaTek's Dimensity 9300 NPU is directly competitive with Snapdragon 8 Gen 4 on inference performance benchmarks. If MediaTek announces a comparable licensed framework by Q2 2024, the inference licensing market becomes competitive and Qualcomm's per-unit royalty of $0.22 faces downward pressure. Holloway flagged this scenario internally as the primary pricing risk.
Cloud inference volume on Android surfaces. If HPIL adoption is real and developer uptake is meaningful, OpenAI and Anthropic should see Android API call volumes flatten or decline on a per-device basis within two quarters of broad HPIL handset shipments. Neither company publishes this data, but quarterly comments on token economics will be the tell.
The first IP dispute. Qualcomm's HPIL licensing structure has multiple points of potential friction — the software-over-hardware royalty stack, the model certification process, and the jurisdictional exposure of the non-disclosure clauses. The first formal dispute, when it arrives, will set the market's interpretation of how aggressive Qualcomm can be in monetising the inference layer without triggering OEM flight to competing silicon.
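The registry-count signal in the list above can be written as a simple threshold check. This is a minimal sketch: the function name `registry_signal` is hypothetical, and the "inconclusive" label for counts from 40 to 60 is an assumption, since the thresholds as stated give no reading for that range.

```python
BARRIER_BELOW = 40   # fewer certified models than this by end of Q2 suggests a barrier
DRAW_ABOVE = 60      # more certified models than this by end of Q2 suggests a draw


def registry_signal(certified_models: int) -> str:
    """Classify the end-of-Q2 registry count using the thresholds quoted above."""
    if certified_models < 0:
        raise ValueError("certified_models must be non-negative")
    if certified_models < BARRIER_BELOW:
        return "barrier"
    if certified_models > DRAW_ABOVE:
        return "draw"
    # The source gives no interpretation for 40 to 60; this label is an assumption.
    return "inconclusive"


if __name__ == "__main__":
    for count in (19, 40, 61):
        print(count, "->", registry_signal(count))
```

The check only makes sense when applied to the end-of-Q2 count; the January figure of 19 is a starting point, not a verdict.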

Frequently asked

What is Qualcomm's Hexagon Private Inference Layer and how does it differ from standard NPU compute?
HPIL is a licensed software framework that sits between the application runtime and Qualcomm's Hexagon NPU. It handles model sharding, KV-cache partitioning, and memory isolation through the Secure Inference Enclave protocol. Standard NPU compute exposes raw tensor processing to any workload that requests it. HPIL enforces a certified-model registry: only models Qualcomm has benchmarked for performance and power consumption run through the private inference pathway. The distinction matters for OEMs because HPIL carries a contractual quality guarantee; raw NPU access does not.

Why did Samsung, Xiaomi, and OPPO agree to terms that include ongoing royalties on top of Qualcomm's existing chip licensing fees?
The co-marketing subsidy and the engineering embeds made the economics tolerable in year one. Per-unit royalties are zero below the 30 million unit threshold, and the threshold is generous enough that none of the three OEMs expected to hit it in 2024. That left the annual registry access fee as the main year-one cost, which Samsung's $18M co-marketing commitment and OPPO's $4.1M engineering subsidy effectively offset. The longer-term risk, which OEM legal teams flagged in negotiations, is that the royalty structure compounds across the device lifecycle. That risk was accepted because the alternative, shipping Snapdragon 8 Gen 4 devices without a private inference story, was judged commercially worse in a year when Apple will use private inference as a premium positioning argument.

How does Qualcomm's private inference architecture compare to Apple's Neural Engine on raw performance?
On the benchmarks Qualcomm published, which have been independently reproduced by AnandTech and Digital Foundry, the Snapdragon 8 Gen 4 NPU runs a 7-billion parameter quantised model at 12.4 tokens per second sustained. Apple's A17 Pro Neural Engine, running a comparably quantised model, produces 14.1 tokens per second. The Apple advantage is real, approximately 14%, and it traces directly to Apple's ability to co-optimise silicon and framework on the same design team. The practical significance for end users is modest: both systems run fast enough that the bottleneck in most applications is network or display refresh, not inference speed. The significance for enterprise buyers evaluating on-device AI infrastructure is greater, because throughput per device determines the ceiling for how many concurrent inference requests a fleet can handle without cloud spillover.

What happens to the HPIL licensing structure if a major OEM like Samsung decides to switch to its own Exynos NPU on future flagship devices?
The Galaxy AI Partner Agreement signed in January covers Snapdragon 8 Gen 4 devices specifically. If Samsung shifts Galaxy S25 flagships to Exynos 2500, which Samsung's semiconductor division has publicly targeted for improved NPU performance, the HPIL agreement does not automatically extend. Samsung would need to license or build a comparable inference framework on top of its own NPU, or negotiate a cross-silicon extension with Qualcomm. Holloway's team was aware of this risk during negotiations; the exclusivity clause on medical context summarisation was partly structured to make Exynos-switch economics less attractive in the medium term.

Does HPIL change the commercial case for cloud inference providers like OpenAI or Anthropic serving Android app developers?
Yes, at the margin, and with a lag. HPIL's on-device capability covers the workloads that account for the highest volume of cloud API calls from mobile applications: short-context text generation, classification, and summarisation. These are also the cheapest workloads per token: low revenue per call, high frequency. If HPIL drives meaningful developer migration of these workloads off-cloud, the volume impact on OpenAI and Anthropic is real but the revenue impact is smaller than the volume numbers suggest. The workloads that remain cloud-bound (long-context reasoning, multi-step agentic tasks, image and audio generation) are higher-margin. The mix shift is net positive for cloud inference providers on revenue per token, even as raw call volume falls.
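The roughly 14% figure in the raw-performance answer above can be checked directly from the two reported throughputs. This is a minimal sketch; the helper names and the 200-token response length are illustrative assumptions, not figures from the benchmarks.

```python
SNAPDRAGON_TPS = 12.4   # reported sustained tokens per second, Snapdragon 8 Gen 4 NPU
APPLE_TPS = 14.1        # reported tokens per second, A17 Pro Neural Engine


def relative_advantage(faster: float, slower: float) -> float:
    """Percentage by which `faster` exceeds `slower`."""
    return (faster - slower) / slower * 100


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to generate `tokens` at a sustained rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    print(f"Apple advantage: {relative_advantage(APPLE_TPS, SNAPDRAGON_TPS):.1f}%")
    # 200 tokens is an assumed response length, chosen only for illustration.
    print(f"Snapdragon, 200 tokens: {seconds_for(200, SNAPDRAGON_TPS):.1f} s")
    print(f"A17 Pro, 200 tokens:    {seconds_for(200, APPLE_TPS):.1f} s")
```

The advantage works out to about 13.7%, which rounds to the 14% quoted. For an assumed 200-token response, the difference is on the order of two seconds, consistent with the point that the practical gap for end users is modest.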

Qualcomm's private inference rollout is not an event. It is the opening move in a multi-year restructuring of where inference happens, who profits from it, and what leverage means in a market where the compute is no longer centralised. The Shenzhen meeting in January set the commercial terms. The quarterly earnings calls starting in Q2 will show whether those terms hold.

The story of on-device inference has been written as an Apple story for two years. Qualcomm's HPIL program, three OEM contracts, and a licensing structure designed to scale to 400 million devices just changed the noun. What has not changed is the underlying dynamic: the company that controls the inference layer controls the premium positioning for every application that runs on top of it. Qualcomm's bet is that controlling the hardware and the framework certification, without controlling the OEM, is enough. The first answers arrive this quarter.