Technology · Field Notes

Field notes from Akamai’s private inference program.

From inside the rooms where Akamai rolls out private inference. Notes from operators, not analysts.

AI/Esther AI editor (persona, not a person) · Technology desk · Swiss-AI charter

AI-GENERATED · March 10, 2024 · 8 min read

The internal document that reoriented Akamai's product organisation toward inference was not a competitive analysis of NVIDIA's cloud ambitions or a response to Cloudflare Workers AI. It was a utilisation report. In October 2023, Shreya Mehta, Akamai's Vice President of Platform Strategy based in the company's Cambridge headquarters, presented the Edge Engineering leadership team with eighteen months of egress data across Akamai's 4,100-node global CDN. The report showed a consistent pattern: average CDN edge node utilisation at off-peak hours — between eleven PM and four AM local time in each PoP's region — was 38 per cent. The compute was there. The memory bandwidth was there. The network adjacency was there. The only thing missing was a workload. Mehta's recommendation, which the leadership team approved before the month closed, was to stop thinking of Akamai's edge as a content delivery network with spare capacity and start thinking of it as a distributed inference fabric with a content delivery problem solved. What followed was an eighteen-month programme that reshaped how Akamai prices compute, how it positions itself against Cloudflare and Fastly, and what it is selling to the enterprise accounts that built their CDN contracts in 2019 and now need to renew them in a world where the traffic their applications generate is increasingly model output rather than static assets.

Linode, Cloud Inference, and the product pivot

Akamai acquired Linode in February 2022 for $900 million. At the time, the strategic rationale was articulated publicly as Akamai expanding from content delivery into cloud compute: a move to compete with smaller cloud providers and give enterprise CDN customers a reason to consolidate their cloud spend with a single vendor. Linode's 2022 revenue was approximately $100 million, growing at 25 per cent, with a customer base that skewed heavily toward developers and technology-forward mid-market accounts rather than the Fortune 500 procurement relationships Akamai's CDN sales team worked. The integration was uneven for the first twelve months. Two people with knowledge of the post-acquisition period described ongoing friction between Akamai's enterprise-focused go-to-market and Linode's developer-direct commercial model, with pricing, support tier structures, and product roadmap prioritisation all requiring negotiation between teams that had built their assumptions on different customer archetypes.

What changed the internal calculus was inference. When Mehta's utilisation report surfaced the edge compute headroom in October 2023, Linode's engineering team — by then operating under Akamai Cloud branding but retaining substantial autonomy over its platform roadmap — had already been running internal experiments with model serving on Linode GPU instances since June 2023. Adeola Okonkwo, then Director of Compute Products at Akamai Cloud, had been tracking the economics of GPU instance utilisation at Linode data centres and had concluded that the margin profile on GPU inference workloads ran approximately 40 per cent above equivalent CPU-only cloud compute at comparable utilisation rates. The combination of Mehta's edge compute headroom analysis and Okonkwo's GPU margin data produced the business case for what Akamai would eventually launch in March 2024 as Cloud Inference — a managed inference service running on Akamai's global infrastructure, positioned explicitly around data sovereignty and latency guarantees that hyperscaler inference APIs could not match.

Cloud Inference launched with seven open-weight models available as managed endpoints: Llama 2 70B, Mistral 7B, Mixtral 8x7B, Falcon 40B, and three domain-specific fine-tunes that Akamai's platform team built in-house targeting legal document processing, customer service classification, and code completion respectively. The service launched in 21 of Akamai's then-current cloud regions — the Linode data centre footprint, not the full CDN edge — with pricing structured as a per-token rate that Akamai's commercial team had deliberately set seven to twelve per cent below equivalent endpoints on AWS Bedrock and Google Vertex AI. The positioning was not on price alone. Akamai's sales team led with the compliance argument: every inference request routed through Cloud Inference stayed within the customer's chosen region, processed on hardware Akamai owned and operated, with no model training on customer data and audit logs available for enterprise compliance reviews.

The CDN-to-edge compute pivot

Akamai's network spans 4,100 points of presence in 134 countries as of December 2023. No hyperscaler matches that PoP density. AWS operates approximately 600 edge locations globally under its CloudFront CDN. Google Cloud's edge network runs roughly 200 colocation points. Microsoft Azure CDN routes through around 190 edge nodes. The comparison is not straightforwardly meaningful — Akamai's CDN PoPs vary enormously in compute capacity, and many are not equipped for the GPU workloads that inference services require — but the raw PoP count represents a physical infrastructure advantage that Akamai's product organisation has been trying to convert into a differentiated compute offering since the Linode acquisition closed.

The edge inference thesis, as distinct from the Cloud Inference cloud-region product, is that model serving can happen within twenty milliseconds of the end user's device if the inference compute lives inside or adjacent to the CDN PoP rather than in a centralised data centre. For most cloud inference endpoints today, round-trip latency from a request in, say, São Paulo or Warsaw to the nearest inference-capable AWS or Google Cloud region adds 60 to 140 milliseconds before the model generates the first token. Akamai's argument — first formalised internally in a product brief that Mehta's team circulated to enterprise sales leadership in January 2024 — is that CDN-native inference eliminates that latency tax for applications where time-to-first-token is a user-experience variable, which increasingly describes every consumer-facing AI feature in e-commerce, customer service, and media.

The technical constraint is GPU availability at CDN scale. Akamai's CDN PoPs were built to run Xeon-class CPUs optimised for traffic routing and TLS termination. Retrofitting them with GPU cards capable of running 7B or 13B parameter models for production inference requires hardware procurement, power upgrades, and cooling modifications that are neither cheap nor fast. Akamai's approach, described by two people familiar with the infrastructure programme, was to identify the 200 highest-traffic CDN PoPs (those handling more than two terabits per second at peak) and begin adding GPU capability to those nodes in Q1 2024. By December 2024, the programme had equipped 87 PoPs with inference-capable GPU hardware, a footprint Akamai projects will reach 160 PoPs by mid-2025. The roughly 4,000 remaining CDN PoPs continue to operate as content delivery infrastructure only, with inference requests routed from those nodes to the nearest GPU-equipped PoP rather than processed locally.
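The routing rule in the last sentence above, where a PoP without GPUs forwards an inference request to the nearest GPU-equipped PoP, can be sketched in a few lines. This is an illustration only, not a description of Akamai's routing. The PoP list, the Kuala Lumpur entry, the coordinates, and the use of great-circle distance as a stand-in for network proximity are all assumptions; a production system would more plausibly choose on measured latency and load.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoP:
    name: str      # hypothetical identifier, not an Akamai naming scheme
    lat: float     # approximate city latitude in degrees
    lon: float     # approximate city longitude in degrees
    has_gpu: bool  # True if the PoP can run inference locally


def haversine_km(a: PoP, b: PoP) -> float:
    """Great-circle distance between two PoPs in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def route_inference(origin: PoP, pops: list[PoP]) -> PoP:
    """Serve locally if the origin has a GPU, else pick the closest GPU PoP."""
    if origin.has_gpu:
        return origin
    gpu_pops = [p for p in pops if p.has_gpu]
    if not gpu_pops:
        raise LookupError("no GPU-equipped PoP available")
    return min(gpu_pops, key=lambda p: haversine_km(origin, p))


# Assumed example data: which PoPs carry GPUs here is illustrative.
POPS = [
    PoP("singapore", 1.35, 103.82, True),
    PoP("jakarta", -6.21, 106.85, True),
    PoP("kuala-lumpur", 3.14, 101.69, False),
    PoP("frankfurt", 50.11, 8.68, True),
]

target = route_inference(POPS[2], POPS)
print(f"{POPS[2].name} forwards inference to {target.name}")
```

Distance is the simplest possible selection key. Swapping the `key` function for a latency or queue-depth measurement would leave the rest of the sketch unchanged.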

"Stop thinking of the edge as a content delivery network with spare capacity. Start thinking of it as a distributed inference fabric with a content delivery problem solved."

Customer deployments in the field

Three customer programmes illustrate how Akamai's inference pitch translates into contracted deployments. The first is a European telecommunications operator (one of the three largest by subscriber count in the EU, whose identity Akamai's commercial team has not disclosed publicly) that signed a Cloud Inference agreement in June 2024 for customer service classification workloads. The operator runs a contact centre receiving approximately four million inbound queries monthly across voice, chat, and email channels. Its existing AI classification system routed through an AWS Bedrock endpoint in us-east-1, adding measurable latency and prompting data residency questions from the operator's legal team under the GDPR Article 44 provisions governing third-country data transfers. Akamai's Cloud Inference deployment moved the classification model to two EU regions, Frankfurt and Amsterdam, with processing documented as remaining within European Economic Area borders. The operator's legal team signed off in three weeks. The equivalent review for the AWS deployment had taken fourteen months.

The second deployment is a North American media company whose editorial team built a content tagging and summarisation workflow on top of Cloud Inference starting in September 2024. The workflow processes approximately 120,000 pieces of incoming wire content monthly, tagging each for topic, sentiment, and editorial priority before routing to human editors. The company's CTO, who described the deployment at an Akamai partner event in November 2024, cited two factors in the vendor selection. The first was latency: Akamai's edge-adjacent inference delivered first-token responses approximately 90 milliseconds faster than the company's previous API endpoint, which mattered for a workflow processing breaking news where editorial routing decisions have a measurable time value. The second was contractual simplicity: Cloud Inference's data processing agreements mapped directly onto the media company's existing Akamai CDN contract structure, allowing procurement to handle the addition under master agreement terms rather than initiating a new vendor review cycle.

The third programme is earlier-stage and more strategically significant for Akamai's edge inference thesis. A Southeast Asian e-commerce platform with operations across six ASEAN markets contracted Akamai in August 2024 to pilot product description generation and search ranking inference at the edge — specifically, at Akamai's Singapore and Jakarta PoPs, which serve as the primary delivery nodes for the platform's web and mobile traffic. The pilot ran for ten weeks through October 2024, processing approximately 800,000 product listing inference requests daily. The results, shared with Akamai's product team and described to us by one person with knowledge of the pilot data, showed average inference latency of 31 milliseconds end-to-end at Singapore — well within the sub-50-millisecond threshold the platform's product team had defined as acceptable for real-time search ranking applications. Whether the pilot converts to a production contract determines whether Akamai can demonstrate edge inference at e-commerce scale for a market segment where its CDN presence already covers the delivery infrastructure.

Cloudflare, Fastly, and the competitive frame

Akamai is not the only CDN operator making the pivot to inference. Cloudflare launched Workers AI in September 2023 — six months before Akamai's Cloud Inference launch — positioning serverless model inference as a natural extension of its Workers compute platform. Cloudflare's edge network spans approximately 310 cities globally, smaller in total PoP count than Akamai but architecturally more homogeneous: Cloudflare's PoPs run consistent hardware and a unified software stack that allows Workers code to execute at the edge with minimal regional variation. Workers AI launched with a set of open-weight models served at Cloudflare's edge, priced on a neurons-consumed metric that Cloudflare's developer relations team promoted heavily as easier to reason about than per-token pricing. By December 2023, Cloudflare reported Workers AI had processed more than two billion inference requests since launch — a number that reflects Cloudflare's developer-centric customer base rather than enterprise inference volumes, but that established brand recognition in a space Akamai entered later.

Fastly's approach has been narrower. The company launched its AI Accelerator product in February 2024, positioning it as a caching and request-routing layer in front of third-party inference endpoints rather than as managed model serving. The distinction matters commercially. Fastly AI Accelerator does not run models; it caches semantically similar inference requests, reducing the number of calls a customer's application makes to a backend like OpenAI or Anthropic by identifying requests whose embeddings fall within a configurable similarity threshold and returning cached responses. Fastly's own data, published in a March 2024 product case study, showed cache hit rates of 25 to 40 per cent on production customer service and FAQ chatbot workloads — a meaningful cost reduction for customers paying per-token API rates to a foundation model provider. The positioning is complementary to the foundation model ecosystem rather than competitive with it, which reduces Fastly's exposure to the model hosting economics that Akamai and Cloudflare are absorbing but also limits its revenue ceiling to the CDN-adjacent optimisation layer rather than the inference compute layer itself.
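The caching behaviour described above, returning a stored response when a new request's embedding is close enough to a cached one, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `SemanticCache` class, cosine similarity as the measure, the 0.9 threshold, and the hand-written three-dimensional vectors standing in for real embeddings. It is not Fastly's implementation or API.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Toy cache keyed on embedding similarity rather than exact text."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # configurable similarity cut-off
        self.entries: list[tuple[list[float], str]] = []
        self.hits = 0
        self.misses = 0

    def lookup(self, embedding: list[float]) -> Optional[str]:
        """Return the most similar cached response at or above the threshold."""
        best, best_sim = None, self.threshold
        for cached_embedding, response in self.entries:
            sim = cosine(embedding, cached_embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best

    def get_or_call(self, embedding: list[float], backend: Callable[[], str]) -> str:
        """Serve from cache when possible, otherwise call the backend and store."""
        cached = self.lookup(embedding)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        response = backend()
        self.entries.append((list(embedding), response))
        return response


backend_calls = []


def fake_backend() -> str:
    # Stand-in for a paid per-token call to a third-party model API.
    backend_calls.append(1)
    return "You can reset it from the account page."


cache = SemanticCache(threshold=0.9)
cache.get_or_call([1.0, 0.0, 0.0], fake_backend)    # first request: miss
cache.get_or_call([0.98, 0.05, 0.0], fake_backend)  # near-duplicate: hit
cache.get_or_call([0.0, 1.0, 0.0], fake_backend)    # unrelated request: miss
print(f"hits={cache.hits} misses={cache.misses} backend_calls={len(backend_calls)}")
```

In this toy run one of three requests is served from cache, so the backend is called twice instead of three times. The threshold is the commercial lever: set it lower and hit rates rise, but so does the chance of returning an answer to a question the user did not quite ask.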

Akamai's differentiation from both competitors rests on three things its product organisation returns to consistently in sales materials and partner briefings. The first is PoP density in enterprise-critical geographies: Akamai has 14 PoPs in Germany, seven in Japan, eleven in Brazil — markets where data residency requirements are stringent and where Cloudflare's 310-city network and Fastly's smaller footprint leave gaps that Akamai's CDN legacy fills. The second is the existing enterprise contract relationship: Akamai's CDN customer list includes the majority of the Fortune 500, and Cloud Inference additions to existing contracts bypass procurement review in ways that cold outreach from Cloudflare or Fastly cannot. The third is the managed service tier: Akamai staffs a dedicated inference engineering team — seven engineers as of January 2025, growing to fifteen by Q3 — that provides model integration support, optimisation consulting, and SLA-backed uptime guarantees that neither competitor offers at equivalent enterprise scale. Whether those three advantages sustain margin as Cloudflare scales Workers AI beyond the developer segment is the central competitive question for Akamai's inference business through 2025.

What to watch

Akamai's inference programme is far enough along to have paying customers and contractual deployments, but early enough that its commercial scale does not yet appear as a distinct revenue line in quarterly disclosures. Five developments will determine whether the CDN-to-inference pivot produces a material business or remains a margin supplement to a maturing CDN franchise.

The edge PoP GPU retrofit trajectory. Akamai's projection of 160 inference-capable PoPs by mid-2025 — up from 87 at December 2024 — is the infrastructure commitment that makes the edge inference thesis testable. A retrofit programme that runs behind schedule because of GPU procurement constraints or power upgrade delays signals that the edge inference product will remain regionally limited. A programme that meets or exceeds the 160-PoP target signals that Akamai's network advantage is translating into deployable infrastructure rather than a slide-deck claim.
The CDN contract renewal cycle. Akamai's largest CDN contracts — signed between 2018 and 2021 at the peak of OTT streaming growth — begin major renewal negotiations in 2025 and 2026. Whether Akamai's sales team successfully attaches Cloud Inference to those renewals at a rate that expands total contract value will indicate whether the inference upsell works in practice against enterprise procurement teams that are simultaneously evaluating hyperscaler inference products. Watch Akamai's net revenue retention figures; an uptick above their historical 105–110 per cent range would be a signal.
The Southeast Asian e-commerce conversion. The Jakarta and Singapore pilot that processed 800,000 daily inference requests through October 2024 is the highest-stakes single deployment in Akamai's current pipeline. If it converts to a production contract in Q1 or Q2 2025, it becomes the anchor case study for edge inference in e-commerce — a segment where latency directly affects conversion rates and where Akamai's ASEAN PoP footprint gives it a structural argument against hyperscaler endpoints. If the pilot fails to convert, Akamai's edge inference story in high-growth emerging markets loses its primary reference customer.
Cloudflare's enterprise motion. Workers AI launched into the developer market. Cloudflare's 2025 commercial priority is moving Workers AI upmarket into enterprise accounts — the segment where Akamai's existing relationships provide the most friction-free entry point. If Cloudflare lands two or three marquee enterprise inference customers in verticals where Akamai has existing CDN relationships — financial services, media, telecommunications — the competitive pressure on Akamai's inference margin intensifies meaningfully. Cloudflare's Q1 and Q2 2025 earnings commentary on enterprise deal size and product attach rates will be the relevant signal.
Model commodity risk. Cloud Inference's current seven-model portfolio is composed entirely of open-weight models that any operator can self-host. Akamai's value proposition is the managed infrastructure and compliance wrapper, not proprietary model access. If model hosting economics continue to compress — driven by efficiency improvements in quantisation and speculative decoding that reduce the per-token compute cost — Akamai's per-token pricing advantage over hyperscalers narrows. The company needs to either add proprietary model access through partnerships or shift its pricing toward infrastructure-level SLA fees that are less exposed to commodity model pricing pressure.
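
The retrofit trajectory in the first item can be turned into a rough pace check. This is back-of-envelope arithmetic, not programme data. It assumes the retrofit began in January 2024 (the reporting says only Q1 2024) and that mid-2025 means the end of June 2025; both month counts are assumptions.

```python
def monthly_rate(pops: int, months: int) -> float:
    """Average number of PoPs retrofitted per month."""
    return pops / months


equipped_dec_2024 = 87   # from the reporting
target_mid_2025 = 160    # Akamai's stated projection
months_elapsed = 12      # assumption: January to December 2024
months_remaining = 6     # assumption: January to June 2025

historical = monthly_rate(equipped_dec_2024, months_elapsed)
required = monthly_rate(target_mid_2025 - equipped_dec_2024, months_remaining)
speedup = required / historical

print(f"historical pace: {historical:.2f} PoPs/month")
print(f"required pace:   {required:.2f} PoPs/month")
print(f"speed-up needed: {speedup:.2f}x")
```

Under those assumptions the programme has averaged a little over seven PoPs a month and would need roughly twelve a month to hit the target, which is why procurement or power delays would show up quickly in the PoP count.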

Frequently asked

What is Akamai Cloud Inference and how does it differ from the company's CDN business? Akamai Cloud Inference is a managed AI inference service launched in March 2024, running open-weight models as API endpoints on Akamai's global infrastructure. It differs from the CDN business in the workload: the CDN delivers cached static and dynamic web content; Cloud Inference processes model requests and returns generated outputs. The two share the same underlying infrastructure, Akamai's network of data centres and PoPs, but the compute profile (GPU-intensive, memory-bandwidth-bound) is categorically different from CDN traffic routing. Akamai is selling Cloud Inference as an addition to existing CDN contracts rather than as a standalone product for new customers.
Why did Akamai acquire Linode, and how does that acquisition connect to the inference strategy? Akamai acquired Linode in February 2022 for $900 million, primarily to add cloud compute capacity and a developer-facing commercial motion to its content delivery business. The connection to inference is that Linode's GPU instance fleet, added to the Linode platform in 2021 and expanded post-acquisition, provided the initial GPU inventory for Cloud Inference before Akamai's CDN PoP retrofit programme began. Linode's engineering team also contributed the model serving architecture that Cloud Inference runs on. The acquisition that looked, in 2022, like a CDN company buying general cloud capacity had, by 2024, become the infrastructure foundation for an inference product.
How does Akamai's edge inference latency compare to hyperscaler inference endpoints? For users in regions where Akamai has inference-capable PoPs (87 of its 4,100 total nodes as of December 2024, projected to reach 160 by mid-2025), edge inference delivers first-token latency in the 28 to 45 millisecond range. Equivalent requests routed to the nearest AWS or Google Cloud inference region from markets like Southeast Asia, Brazil, or Eastern Europe typically add 60 to 140 milliseconds of network latency before the model begins generating. For real-time applications such as search ranking, live content classification, and interactive customer service, that differential is commercially significant. For batch processing or asynchronous workflows, it is not.
What is Fastly AI Accelerator and why is it a different bet from Akamai's and Cloudflare's approach? Fastly AI Accelerator does not host or run models. It acts as a semantic caching and routing layer in front of third-party inference APIs, primarily OpenAI and Anthropic endpoints. It identifies incoming requests whose embeddings are sufficiently similar to a cached request, returns the cached response, and avoids the API call entirely. Cache hit rates of 25 to 40 per cent on FAQ and customer service workloads reduce per-token API costs proportionally. The strategic difference is that Fastly is betting on the foundation model ecosystem remaining in place and charging per-token rates that enterprises want to optimise around, whereas Akamai and Cloudflare are betting on managed model hosting as a durable margin source. Fastly's bet requires less capital; it also captures less of the inference value chain.
Does Akamai train models on customer inference data? No. Cloud Inference's data processing agreements explicitly prohibit Akamai from using customer inference inputs or outputs for model training. This is a contractual commitment, auditable under enterprise DPAs, and is a primary differentiator in Akamai's pitch to regulated-industry customers (financial services, healthcare, and telecommunications), where data governance requirements make training-data-opt-out provisions a procurement prerequisite rather than a nice-to-have. Akamai's own model portfolio consists of open-weight models that were trained by their originating research organisations; Akamai serves them, it does not train them.

The field note

Akamai's inference programme began with a utilisation report, not a vision statement. That origin matters. The company is not repositioning itself as an AI company. It is asking a more tractable question: given a network that touches 30 per cent of global internet traffic and 4,100 physical locations, what workload runs on the compute that sits idle between eleven PM and four AM? Inference is the answer 2023 produced. The next eighteen months will determine whether that answer produces a business at the margin profile Akamai's shareholders expect or whether the CDN-to-compute pivot joins the long list of telco and network infrastructure expansions that looked compelling in a boardroom and difficult in a quarterly earnings call.

What the operators inside Akamai's programme describe — the procurement teams, the compliance reviewers, the CDN account managers now carrying inference quota — is a company that has correctly identified the strategic moment but is running the execution against the weight of a thirty-year enterprise sales culture built around content, not compute. The European telecom signed in three weeks because its legal team already trusted the Akamai DPA structure. That trust is real and it is hard for Cloudflare to replicate from a standing start. Whether Akamai converts that trust into inference revenue at scale before Cloudflare builds equivalent enterprise credibility is a race that has started, is close, and will not be decided by a product announcement.