- Apple Intelligence on-device inference rolled out to A17, A18, and M3+ devices between 9 and 19 May. Total by 19 May: 240 million active endpoints.
- In under two weeks, this made Apple the operator of the largest private inference network in existence.
- Implication: third-party AI features can now run with zero data leaving the device. Stripe, Notion, and Linear have all confirmed integration timelines.
- Side effect: NVIDIA H300 demand from US hyperscalers fell 6% week-over-week. Watch the supply chain.
What happened.
Between 9 and 19 May, Apple staged an exceptionally quiet rollout. iOS 19.4 — pushed to A17 and newer iPhones in a phased ramp — activated on-device inference for any application requesting the new PrivateInference framework. By 19 May, telemetry from Apple's own developer console indicated 240 million active endpoints. There was no keynote. There was no press cycle. There was a footnote in a release-notes PDF.
The model being run on-device is a distilled variant of what Apple's research division has referred to as Apple Foundation Model — Phone Class. It is not Claude or GPT. It is smaller, faster, and constrained — but it is everywhere.
The numbers.
The scale is what makes this story. For comparison (note the units: the first two figures are weekly, the third is daily):
- OpenAI's API serves an estimated 180 million inferences per week across all surfaces.
- Anthropic serves an estimated 95 million per week.
- Apple Intelligence, by a conservative estimate, will surpass 3 billion inferences per day by Q3, almost all of them never leaving the user's pocket.
This is not a comparable market. It is a different category of market.
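Because the figures above mix weekly and daily units, it helps to put them on one basis before comparing. The short Python sketch below does that arithmetic using only the estimates quoted in this piece. The variable names are illustrative, and the per-endpoint figure divides a Q3 projection by the 19 May endpoint count, so treat it as a rough order-of-magnitude check rather than a measurement.

```python
# All inputs are the estimates quoted in this article, not measured data.
OPENAI_WEEKLY = 180_000_000      # estimated inferences per week
ANTHROPIC_WEEKLY = 95_000_000    # estimated inferences per week
APPLE_DAILY = 3_000_000_000      # projected inferences per day by Q3
APPLE_ENDPOINTS = 240_000_000    # active endpoints as of 19 May

DAYS_PER_WEEK = 7


def to_weekly(daily_count: int) -> int:
    """Convert a per-day count to a per-week count."""
    return daily_count * DAYS_PER_WEEK


# Put Apple's projection on the same weekly basis as the other two.
apple_weekly = to_weekly(APPLE_DAILY)

ratio_vs_openai = apple_weekly / OPENAI_WEEKLY
ratio_vs_anthropic = apple_weekly / ANTHROPIC_WEEKLY

# Rough average load per device: Q3 projection over the 19 May endpoint count.
per_endpoint_daily = APPLE_DAILY / APPLE_ENDPOINTS

print(f"Apple weekly inferences: {apple_weekly:,}")
print(f"Multiple of OpenAI estimate: {ratio_vs_openai:.0f}x")
print(f"Multiple of Anthropic estimate: {ratio_vs_anthropic:.0f}x")
print(f"Inferences per endpoint per day: {per_endpoint_daily:.1f}")
```

On these numbers, the projected Apple volume is roughly 117 times the OpenAI estimate and 221 times the Anthropic estimate on a weekly basis, or about 12.5 inferences per endpoint per day.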
Why it matters.
Three implications matter for operators.
One: privacy stops being a feature and becomes the default. Any third-party app can now offer "AI features" without user data leaving the device. Stripe, Notion, and Linear have all confirmed integration timelines within the next two quarters. The compliance team's veto over AI features just lost most of its weight.
Two: the inference economics of consumer apps inverted overnight. Apps that previously paid OpenAI for each user interaction can now ship a comparable feature on-device at zero marginal inference cost, provided the smaller on-device model is good enough for the task. The startups still routing every keystroke through GPT will need to defend that decision quarterly.
Three: the H300 demand picture changed. NVIDIA H300 demand from US hyperscalers fell 6% week-over-week. Apple's move means hyperscalers no longer carry the full inference load for the world's largest consumer-app surface. Watch this. It will be the most-discussed line item in Q3 earnings.