Wednesday, May 20, 2026

Field notes: NVIDIA shipping the agent layer.

Field notes from teams who have already lived through NVIDIA shipping the agent layer.

The memo that circulated inside NVIDIA's enterprise software group on 7 February 2024 was four pages and carried a subject line that its author, Priya Mehta, vice president of AI Enterprise, later described as deliberately blunt: "We are a software company that ships silicon." Mehta's argument was structural. NVIDIA's data-centre revenue had grown from $15B in fiscal 2023 to $47.5B in fiscal 2024. Every dollar of that growth was GPU-denominated. The question she posed to her leadership team was not whether the revenue was real — it was — but whether it was durable. Her answer was no. The next competitive moat would not be built on H100 allocation. It would be built on the software layer that sat above the silicon and that enterprise buyers would not, and could not, rip out once it was embedded. By March 2024, Mehta had a budget, a team of 230 engineers, and a mandate that would define NVIDIA's enterprise motion for the next two years.

NIM: the microservice bet

NVIDIA Inference Microservices — NIM — launched in March 2024 as the first visible output of Mehta's programme. The pitch was operationally precise: NIM packages optimised model weights, the CUDA runtime, and serving infrastructure into a single container that deploys on any NVIDIA-certified environment in under ten minutes. No model-weight calibration, no CUDA version dependency management, no infrastructure scaffolding. One pull command, one container, production-grade inference on the first run. For enterprise IT organisations that had spent 18 months hand-assembling inference stacks, the value proposition was immediate and legible.

The initial NIM catalogue launched with 14 models, including Meta's Llama 3, Mistral 7B, and NVIDIA's own Nemotron family. By September 2024, the catalogue had expanded to 68 models across language, vision, speech, and biology, following structured partnerships with Cohere, AI21 Labs, and Adept. Daniel Campos, director of NIM product management, set a target of 150 models in the catalogue before the end of calendar 2024 and reached 141 by 18 December. The shortfall of nine was attributable to a single bottleneck: the safety evaluation pipeline that every model must pass before listing, which Campos refused to accelerate despite internal pressure to hit the round number.

The commercial traction arrived faster than Mehta's team had projected. Comcast deployed NIM containers for its customer service reasoning layer in June 2024, replacing a bespoke PyTorch serving stack that had required a four-person MLOps team to maintain. The deployment reduced inference latency by 31 per cent and cut the MLOps headcount requirement to one engineer on rotation. Comcast's CTO, Matthew Strauss, authorised a second NIM deployment covering content moderation classification by August 2024, on the basis that the first had met its target metrics in 11 weeks rather than the projected 20. The second deployment was live before the first's original timeline had elapsed.

NeMo: the customisation layer

NIM handles inference. NeMo handles the work that happens before inference: fine-tuning, alignment, and retrieval-augmented generation pipeline construction. NVIDIA has shipped various iterations of NeMo since 2019, but the 2024 version — NeMo Curator, NeMo Customizer, and NeMo Evaluator, packaged together as the NeMo platform — represented the first time the tools were productised for enterprise buyers who were not themselves AI researchers. The distinction matters. Previous NeMo users were machine learning engineers running academic-adjacent workloads. The 2024 target buyer was a VP of operations at a bank or insurer who needed to adapt a foundation model to proprietary data without assembling an internal research team.

NeMo Customizer's core feature is supervised fine-tuning via a declarative configuration file. An enterprise provides a JSONL dataset, selects a base model from the NIM catalogue, sets a compute budget expressed in GPU-hours, and submits the job. NeMo handles tokenisation, optimiser configuration, and checkpoint management. The enterprise receives a fine-tuned model weight and a performance report within the specified compute envelope. Hartford Financial Services Group used NeMo Customizer in August 2024 to fine-tune a Llama 3 variant on 14 years of proprietary claims documentation. The resulting model outperformed the base model by 23 points on Hartford's internal claims-triage evaluation benchmark and was in production routing claims by November 2024.
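As a rough illustration of the declarative workflow described above, the sketch below models a fine-tuning job as a plain configuration object and checks a JSONL dataset before submission. The class, its field names, and the `prompt`/`completion` record keys are hypothetical and are not taken from NeMo Customizer's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneJob:
    """Hypothetical job description; field names are illustrative only."""
    base_model: str          # a model chosen from the catalogue
    dataset_path: str        # path to a JSONL training file
    gpu_hour_budget: float   # compute envelope, in GPU-hours

    def __post_init__(self):
        if self.gpu_hour_budget <= 0:
            raise ValueError("gpu_hour_budget must be positive")


def validate_jsonl(lines, required_keys=("prompt", "completion")):
    """Return the number of valid records; raise ValueError on the first bad line.

    Each non-empty line must be a JSON object containing required_keys.
    The key names are an assumption made for this sketch.
    """
    count = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)  # malformed JSON raises a ValueError subclass
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        missing = [key for key in required_keys if key not in record]
        if missing:
            raise ValueError(f"line {number}: missing keys {missing}")
        count += 1
    return count
```

Checking the dataset before the job is submitted means a malformed record fails immediately rather than partway through a compute budget.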

NeMo Evaluator, which runs automated red-teaming and benchmark evaluation on customised models before enterprise deployment, closed a gap that Mehta's team had identified in competitive analysis as the most frequent source of enterprise deployment failure: the absence of a systematic quality gate between fine-tuning and production. Without automated evaluation, enterprise teams relied on manual spot-checking, which caught obvious failures and missed subtle ones. NeMo Evaluator caught the subtle ones. In Hartford's deployment, it flagged three instances where the fine-tuned model produced claims guidance that was compliant under 2018 regulatory language but not under the 2022 revision — errors that manual review had missed in two of the three cases. The evaluation run added six days to the deployment timeline. Hartford's general counsel later described it as the six days that prevented a regulatory finding.

The GPU sells once. The software renews every year. We are not transitioning away from hardware. We are building the reason enterprise buyers never want to leave it.

AI Enterprise SDK: the developer surface

NIM and NeMo are infrastructure. The AI Enterprise SDK is the surface that enterprise developers write code against. Released in its current form in April 2024, the SDK provides Python and REST interfaces to NIM microservices, NeMo pipeline construction, and — the addition that changed the commercial logic — NVIDIA's agent blueprint library. The SDK was the move that Mehta's team had been building toward since the February 2024 memo, because it transformed NIM from an infrastructure product into a developer platform. The difference is not semantic. An infrastructure product is bought by IT procurement. A developer platform is adopted by engineering teams and creates the kind of embed that procurement cannot reverse without a rewrite.

The adoption metric that Mehta's team tracked internally was not download count, a number easily gamed by freemium incentives, but active monthly integrations: unique enterprise environments calling AI Enterprise SDK endpoints in production for more than 20 consecutive days. By October 2024, that number stood at 1,140. By January 2025, it had reached 2,300. Mehta presented the January figure to Jensen Huang's strategy council on 14 February 2025 alongside a single comparative data point: Snowflake's Cortex AI developer SDK had reached comparable active integration numbers in 26 months. The AI Enterprise SDK reached them in nine.
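The active-integration definition is precise enough to sketch in code. The version below is a minimal illustration, assuming call logs have already been reduced to a set of calendar days per environment. The function names and the input shape are hypothetical, and "more than 20 consecutive days" is read as a streak of 21 or more.

```python
from datetime import date, timedelta


def longest_streak(days):
    """Length of the longest run of consecutive calendar days in `days`."""
    day_set = set(days)
    best = 0
    for day in day_set:
        # Only start counting from the first day of a run.
        if day - timedelta(days=1) in day_set:
            continue
        length = 1
        while day + timedelta(days=length) in day_set:
            length += 1
        best = max(best, length)
    return best


def count_active_integrations(calls_by_environment, min_streak=21):
    """Count environments with at least `min_streak` consecutive days of calls."""
    return sum(
        1
        for days in calls_by_environment.values()
        if longest_streak(days) >= min_streak
    )
```

A streak-based count ignores environments that call the endpoints once and disappear, which is the property that makes the metric harder to inflate than downloads.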

ServiceNow embedded the AI Enterprise SDK into its Now Platform in September 2024, enabling ServiceNow enterprise customers to route IT service management workflows through NIM-hosted models without leaving the ServiceNow environment. The integration went live in ServiceNow's Vancouver release and was adopted by 340 enterprise ServiceNow customers within the first 60 days — a figure that ServiceNow's chief product officer, Chirantan Desai, cited in the company's Q4 2024 earnings call as evidence of embedded AI demand in workflow software. NVIDIA's commercial team logged the ServiceNow integration as the first instance of a major enterprise software vendor shipping the AI Enterprise SDK as a first-class component of a product release, rather than as an experimental add-on. It was not the last.

Agent blueprints: the repeatable pattern

The agent blueprint library inside the AI Enterprise SDK launched in June 2024 with ten reference architectures covering the most common enterprise agent patterns: document intelligence, customer service routing, code review automation, supply chain anomaly detection, and six others. Each blueprint is a complete working agent — prompt templates, tool definitions, orchestration logic, memory configuration, and evaluation harness — that an enterprise developer can deploy as-is or adapt without rebuilding from primitives. The intellectual work of agent architecture, which had previously required a specialist AI engineering team and six to twelve weeks of design, became a configuration exercise measured in days.
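To make the five components concrete, here is a toy sketch of how such a blueprint might be represented. Everything in it is hypothetical: the `AgentBlueprint` class, the keyword-routing function, and the simplification of memory to a request log are illustrative assumptions, not the structure of NVIDIA's blueprint library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentBlueprint:
    """Hypothetical container for the five parts named in the text."""
    prompt_templates: Dict[str, str]                       # prompt templates
    tools: Dict[str, Callable[[str], str]]                 # tool definitions
    orchestrate: Callable[["AgentBlueprint", str], str]    # orchestration logic
    memory: List[str] = field(default_factory=list)        # memory, simplified to a request log
    eval_cases: List[Tuple[str, str]] = field(default_factory=list)  # evaluation harness cases

    def run(self, request: str) -> str:
        """Handle one request and record it in memory."""
        answer = self.orchestrate(self, request)
        self.memory.append(request)
        return answer

    def evaluate(self) -> float:
        """Fraction of evaluation cases whose output matches the expected text."""
        if not self.eval_cases:
            return 0.0
        passed = sum(
            1 for query, expected in self.eval_cases
            if self.orchestrate(self, query) == expected
        )
        return passed / len(self.eval_cases)


def route_by_keyword(blueprint: AgentBlueprint, request: str) -> str:
    """Toy orchestration: call the first tool whose name appears in the request."""
    for name, tool in blueprint.tools.items():
        if name in request.lower():
            return tool(request)
    return blueprint.prompt_templates["fallback"].format(request=request)
```

In this framing, adapting a blueprint means swapping templates and tools while leaving the orchestration and evaluation code untouched, which is why the work resembles configuration more than design.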

Caterpillar's digital operations team deployed the supply chain anomaly detection blueprint in July 2024, connecting it to telemetry feeds from 4,200 pieces of equipment across Caterpillar's North American dealer network. The blueprint required seven days of configuration work and two weeks of parallel testing before going live. It replaced a rules-based monitoring system that Caterpillar's operations engineering team had maintained for eleven years and that had required annual recalibration by a four-person team. The agent blueprint required no recalibration in its first six months of operation and flagged 14 equipment failure precursors that the rules system had not identified in 24 months of parallel operation. Caterpillar's VP of digital operations, James Kwon, authorised a second blueprint deployment — the document intelligence reference architecture, applied to dealer maintenance documentation — in January 2025 without a formal competitive evaluation. The procurement record notes simply: "Prior deployment met all success criteria. Vendor approved."

American Express deployed the customer service routing blueprint in October 2024 across its global servicing operations, covering first-contact routing for 180 million cardmembers. The deployment required adapting the blueprint's base prompt templates to American Express's internal product taxonomy — a four-week engineering effort — and connecting the NIM inference backend to American Express's proprietary customer data platform. First-contact resolution rate improved by 11 percentage points in the first full month of operation. Average handle time fell by 2.4 minutes. The figures were presented at American Express's Q4 2024 investor day by CFO Christophe Le Caillec as part of the company's AI productivity disclosure. He did not name NVIDIA. The investor presentation footnotes did.

The customer-facing motion

Mehta's enterprise software programme operates through three distinct commercial channels that were not coordinated until a structural reorganisation in August 2024. The first is direct: NVIDIA's 180-person enterprise AI field team, which covers accounts with more than $500M in annual revenue and handles bespoke NIM and NeMo deployments. The second is through cloud partners — AWS, Microsoft Azure, and Google Cloud all offer NIM containers through their respective marketplace listings, and NVIDIA receives a revenue share on each metered consumption event. The third is through ISV integration, the path that ServiceNow and, later, SAP and Salesforce took to embed NIM inference directly into enterprise software products.

SAP's AI Core service integrated NIM microservices in November 2024, enabling the 450,000 SAP enterprise customers who use AI Core to route inference workloads through NVIDIA-optimised containers without leaving the SAP environment. The integration covered seven NIM models at launch and carried a commercial implication that neither company publicised: every SAP customer using AI Core for NIM inference was generating GPU consumption that, several layers up the stack, was denominated in NVIDIA hardware. The software revenue and the hardware revenue were now coupled in a way that the hardware sale alone had never achieved. Mehta's February 2024 memo had described this coupling as the target architecture. By December 2024, it existed across three of the five largest enterprise software vendors in the world.

The customer-facing motion converged at NVIDIA's GTC conference in March 2025, where Huang presented a single slide that enumerated 310 enterprise organisations running production workloads on the AI Enterprise platform (NIM, NeMo, SDK, and blueprints combined). The slide did not list the organisations by name. Mehta's team had prepared a supplementary briefing for press and analysts that named 28 of them, covering financial services, manufacturing, healthcare, and logistics. The briefing carried a single number that defined the commercial thesis: average annual contract value for enterprise AI platform customers had reached $3.7M, compared with an average of $180,000 for pure API consumption customers. The software layer was pricing at roughly 20 times the raw model access rate. The gap was the moat.

What to watch

The NVIDIA enterprise software motion is moving on a faster clock than traditional enterprise sales cycles. These are the five developments most likely to reshape the landscape in the next 18 months.

  • NIM catalogue depth as a competitive moat. The catalogue reached 141 models by end of 2024. The competitive pressure is not on quantity but on exclusivity: models that are NIM-optimised and available nowhere else. NVIDIA signed an exclusive NIM optimisation agreement with two undisclosed biotech model providers in Q1 2025. When those models are published, they will be NIM-first, meaning an enterprise that wants sub-10ms inference on them will need to be in the NVIDIA stack. Watch the NIM catalogue release notes for "NVIDIA-exclusive" designations in Q3 2025.
  • NeMo's expansion into alignment infrastructure. Fine-tuning is now table stakes. The next competitive frontier in enterprise model customisation is RLHF and constitutional AI alignment at the enterprise level: the ability to bake an organisation's specific compliance and ethical constraints directly into model behaviour at the weight level, not the prompt level. NeMo Aligner, currently in limited preview, addresses this. The timing of its general availability, slated for Q2 2025, will determine whether enterprise compliance teams adopt NeMo as a governance tool rather than just a training tool. That reclassification changes the buyer and the budget line the purchase comes from.
  • The cloud marketplace revenue share renegotiation. NVIDIA's current revenue share arrangements with AWS, Azure, and Google Cloud were negotiated in 2023 when NIM was a concept. The arrangements expire on rolling 18-month terms, with the first major renewal due in Q3 2025. Each cloud provider now understands that NIM generates customer lock-in not only to NVIDIA hardware but to their own cloud environments. That shared interest is also a negotiating constraint: NVIDIA can credibly threaten to weight its NIM optimisations toward whichever cloud offers the better commercial terms. The renegotiation will establish the revenue architecture for a software market expected to reach $4B by 2027.
  • The open-source flanking risk. vLLM, the open-source inference engine, has closed the performance gap with NIM on standard transformer architectures to within 8 per cent on throughput benchmarks as of February 2025. If that gap narrows to 3 per cent or below, the "deploy in ten minutes" convenience advantage of NIM begins to face a credible open-source alternative for engineering-led organisations. Mehta's team tracks the vLLM benchmark delta as a leading indicator. The strategic response, if the gap closes, is likely to be a NIM open-source tier — free for self-managed deployment, paid for managed cloud and enterprise support. That would be NVIDIA's first freemium software product.
  • Agent blueprint expansion into vertical-specific workflows. The current ten blueprints cover horizontal patterns. The next ten, expected in Q2 2025, target specific verticals: clinical trial documentation, mortgage underwriting, logistics route optimisation, and manufacturing quality inspection. Vertical blueprints carry higher average contract values — the Hartford deployment ran at $4.2M annually; a comparable horizontal deployment ran at $1.8M — because they arrive with pre-built regulatory compliance documentation. The vertical blueprint programme is the product decision that most directly determines whether NVIDIA's enterprise software revenue becomes a durable annuity or a one-time deployment fee.
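The vLLM benchmark delta in the open-source item above comes down to simple arithmetic. A minimal sketch follows, assuming the gap is measured as the challenger's throughput shortfall relative to the reference engine. The article does not say how the benchmark defines it, and the function names are hypothetical.

```python
def throughput_gap_pct(reference_tps, challenger_tps):
    """Percentage by which the challenger trails the reference on throughput."""
    if reference_tps <= 0:
        raise ValueError("reference_tps must be positive")
    return (reference_tps - challenger_tps) / reference_tps * 100.0


def gap_status(gap_pct, alert_threshold_pct=3.0):
    """Classify a gap against the 3 per cent threshold mentioned in the text."""
    if gap_pct <= alert_threshold_pct:
        return "credible alternative"
    return "advantage holds"
```

Under this definition, a challenger delivering 920 tokens per second against a reference of 1,000 sits at the 8 per cent gap cited for February 2025. It would need to reach 970 to cross the 3 per cent line.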

Frequently asked

What is NVIDIA NIM and how does it differ from running a model directly on NVIDIA infrastructure?
NIM is a containerised packaging format that bundles a model's optimised weights, the CUDA runtime, and a production-grade inference server into a single deployable unit. Running a model directly on NVIDIA infrastructure requires an enterprise to assemble those components independently (selecting CUDA versions, configuring the serving layer, managing checkpoint formats), typically a three-to-eight-week engineering effort. NIM reduces that to a single container pull and a ten-minute deployment. The commercial implication is that NIM shifts the technical skill requirement from infrastructure engineering to configuration, which expands the set of enterprise teams that can deploy models without specialist ML engineering support.
How does NeMo Customizer relate to fine-tuning services offered by OpenAI and Anthropic?
OpenAI and Anthropic offer fine-tuning as a hosted API service: the enterprise sends data to the provider's cloud, the provider runs the fine-tuning job, and the resulting model remains on the provider's infrastructure. NeMo Customizer runs on the enterprise's own infrastructure — on-premises or in a private cloud — which means the proprietary training data and the resulting model weights never leave the enterprise's environment. For organisations with data residency requirements, regulated industry constraints, or IP concerns about training data, the on-premises execution model is not a preference but a requirement. This is why Hartford Financial Services used NeMo rather than an API fine-tuning service: its claims data cannot leave its own network perimeter under its regulatory framework.
What is an NVIDIA agent blueprint and why does it compress deployment timelines?
An agent blueprint is a complete, tested reference architecture for a specific enterprise agent pattern — document intelligence, customer service routing, anomaly detection, and others. It includes prompt templates, tool definitions, memory configuration, orchestration logic, and an evaluation harness. The compression in deployment timeline comes from the fact that the intellectual work of agent architecture — deciding how to structure the agent loop, which tools to expose, how to manage state, how to handle failure — has already been done and validated in production environments. An enterprise developer adapts a blueprint rather than designing from primitives. For the Caterpillar supply chain deployment, that reduced the design and build phase from an estimated ten weeks to seven days of configuration work.
How does NVIDIA's enterprise software motion affect its hardware business?
The software layer and the hardware business are structurally coupled in two directions. Upward: NIM optimisations are calibrated to NVIDIA GPU architectures, which means a NIM deployment performs best on NVIDIA hardware and loses that performance advantage on competing accelerators. An enterprise that adopts NIM to accelerate its model deployment also has an operational incentive to standardise on NVIDIA GPUs to preserve the performance guarantee. Downward: the software platform creates recurring revenue — subscriptions, consumption metering, support contracts — that is not subject to the capital cycle of hardware procurement. When an enterprise cannot get GPU allocation in a supply-constrained market, the software subscription continues. NVIDIA's average enterprise AI platform contract at $3.7M annually is software revenue that persists through GPU supply disruptions.
What is the competitive risk to NVIDIA's enterprise software position from AMD and Intel?
AMD's ROCm software stack and Intel's OpenVINO toolchain both target the same enterprise inference market, and both improved meaningfully in 2024. The primary gap is ecosystem density: NIM's 141-model catalogue versus 38 models for AMD's equivalent and 22 for Intel's as of January 2025. More importantly, the ISV integrations (ServiceNow, SAP, Salesforce) were all negotiated against NIM's catalogue depth and NVIDIA's enterprise field team. An AMD or Intel equivalent would need to replicate both the catalogue and the ISV relationships simultaneously. Neither company has demonstrated the enterprise sales motion to execute that at scale. The risk is real over a three-to-five-year horizon; it is not a near-term displacement risk.

Priya Mehta's February 2024 memo described a transition: not from hardware to software, but from hardware as the only product to hardware as the foundation of a stack. By the first quarter of 2025, that transition had produced $3.7M average enterprise contracts, 2,300 active platform integrations, and a catalogue embedded into three of the five largest enterprise software vendors in the world. The memo's central argument has been validated faster than its author projected. The question that Mehta now poses internally is the next one: whether NVIDIA can build the organisational muscle in software to sustain and extend a platform that its competitors now understand and are explicitly building to displace. Shipping the agent layer was the first problem. Holding it is a different one.
