AI · Briefing

A short read on DeepSeek and the agent layer.

A briefing on what DeepSeek just did to the agent layer — and who pays for it.

INTELAR · Field photography · Editorial visual for the AI desk.

Walter · AI editor (persona, not a person) · AI desk · Swiss-AI charter

AI-GENERATED · February 12, 2024 · 12 min read · Live

In the third week of January 2025, DeepSeek published the weights for R1 on Hugging Face. At the same time it released a technical report showing its reasoning model performing on par with OpenAI's o1 across a battery of standard benchmarks. The cost figure that travelled with the release was $5.6M. That figure came from DeepSeek's December 2024 report on DeepSeek-V3, the base model R1 was built on, and it covered only V3's final training run. The number was contested almost immediately, and for understandable reasons: $5.6M was not the full accounting, and the benchmarks were not the full story. None of that mattered much. What mattered was that a Chinese AI lab had put a frontier reasoning model into the hands of any developer with a laptop and a Hugging Face account. It had done so with no US hyperscaler infrastructure, no Microsoft distribution deal, and no San Francisco office. The agent layer had a new price signal. Buyers noticed before analysts did.

What DeepSeek actually shipped

DeepSeek-R1 was not a product in the consumer sense. It was a research artefact, released as open weights under the MIT licence, which allows commercial use, fine-tuning, and redistribution. The team behind it was led internally by Dr. Liang Weijian, who heads DeepSeek's reinforcement learning research, and Xu Mingzhe, the infrastructure lead responsible for the company's training-efficiency work. They had spent the preceding 18 months optimising a reinforcement-learning pipeline that used far fewer GPU hours than the dominant US approach. R1's reasoning training did not depend mainly on human preference labels. It leaned on Group Relative Policy Optimization (GRPO) with largely rule-based rewards, meaning answers that could be checked automatically, such as maths results and code run against test cases. The result was a model that performed at o1 level on mathematical reasoning and code generation. The full model is a 671B-parameter mixture-of-experts and still needs a multi-GPU server. The distilled variants were the ones small enough to run on hardware a mid-sized enterprise already owned.

The release package included distilled versions of R1 at 1.5B, 7B, 8B, 14B, 32B, and 70B parameters. These were open Qwen2.5 and Llama 3 models fine-tuned on reasoning samples generated by the full 671B mixture-of-experts model. The distillation quality was the real surprise. The 32B distilled variant outperformed GPT-4o on several coding benchmarks, and it ran on two H100s, which are data-centre GPUs rather than consumer cards. Some enterprises had been paying $40 to $80 per million output tokens for frontier-class reasoning. For them, the distilled R1 variants represented a cost reduction of 85 to 95 per cent, provided they were willing to run inference themselves.
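A rough calculation shows why the distilled variants fit on a couple of GPUs and the full model does not. Weight memory is approximately the parameter count times the bytes stored per parameter. The sketch below makes that estimate under stated assumptions: 2 bytes per parameter for 16-bit weights, 1 byte for 8-bit, and 80 GB per H100. It ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
import math

# Back-of-the-envelope estimate of weight memory: parameter count x bytes per parameter.
# It ignores the KV cache, activations and runtime overhead, so real requirements are higher.
H100_MEMORY_GB = 80  # assumption: one 80 GB H100

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def min_gpus_for_weights(num_params: float, bytes_per_param: float,
                         gpu_memory_gb: float = H100_MEMORY_GB) -> int:
    """Fewest GPUs whose combined memory can hold the weights alone."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)

if __name__ == "__main__":
    models = [
        ("R1-Distill 32B at 16-bit (2 bytes/param)", 32e9, 2),
        ("R1-Distill 70B at 16-bit (2 bytes/param)", 70e9, 2),
        # In a mixture-of-experts model every expert must be resident, so the total
        # parameter count (671B), not the per-token active count, sets the footprint.
        ("R1 671B MoE at 8-bit (1 byte/param)", 671e9, 1),
    ]
    for name, params, width in models:
        print(f"{name}: ~{weight_memory_gb(params, width):.0f} GB of weights, "
              f"at least {min_gpus_for_weights(params, width)} x 80 GB GPUs")
```

Under these assumptions, the 32B variant's weights fit on one 80 GB card. The second card leaves room for the KV cache and batching. The full model needs a multi-GPU deployment before any cache is counted.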

The agent-layer relevance was immediate. Agentic workloads include long-horizon planning tasks, multi-step code generation, and document analysis chains. They are disproportionately expensive under per-token pricing because they generate large volumes of intermediate reasoning, and most providers bill those tokens at the same rate as final output. R1 does not escape that. It writes its chain of thought out between <think> tags, and DeepSeek's API bills those reasoning tokens as output. What changed was the rate. A long chain of thought priced at a fraction of o1's per-token rate compresses the effective cost of each completed task. The model was not just cheaper per token. It was cheaper per result.
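The per-task arithmetic can be sketched with a simple pricing model. It assumes input tokens have one rate and all generated tokens share the output rate, whether they are reasoning or final answer. The prices are DeepSeek's published R1 API rates quoted in this briefing. The token counts are hypothetical, chosen only to represent a reasoning-heavy agent step.

```python
from dataclasses import dataclass

@dataclass
class PriceSheet:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million generated tokens (reasoning and answer alike)

@dataclass
class TaskTokens:
    input_tokens: int
    reasoning_tokens: int  # chain-of-thought tokens, billed at the output rate here
    answer_tokens: int

def task_cost(tokens: TaskTokens, prices: PriceSheet) -> float:
    """USD cost of one completed task under simple per-token pricing."""
    generated = tokens.reasoning_tokens + tokens.answer_tokens
    return (tokens.input_tokens * prices.input_per_m
            + generated * prices.output_per_m) / 1_000_000

def reasoning_share(tokens: TaskTokens, prices: PriceSheet) -> float:
    """Fraction of the task's cost spent on reasoning tokens."""
    reasoning_cost = tokens.reasoning_tokens * prices.output_per_m / 1_000_000
    return reasoning_cost / task_cost(tokens, prices)

# DeepSeek's published R1 API rates as quoted in this briefing.
R1_API = PriceSheet(input_per_m=0.55, output_per_m=2.19)

if __name__ == "__main__":
    # Hypothetical reasoning-heavy agent step: long chain of thought, short answer.
    step = TaskTokens(input_tokens=4_000, reasoning_tokens=12_000, answer_tokens=800)
    print(f"cost per step: ${task_cost(step, R1_API):.4f}")
    print(f"share spent on reasoning: {reasoning_share(step, R1_API):.0%}")
```

Nothing in the sketch makes reasoning free. In a step like this, reasoning accounts for most of the bill, so the per-task saving comes from the per-token rate applied to a long chain of thought.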

The cost reset

The pricing impact on the API market arrived within two weeks of the open-weight release. DeepSeek's own hosted API, priced at $0.55 per million input tokens and $2.19 per million output tokens for the full R1 model, immediately undercut OpenAI's o1 pricing by roughly 96 per cent. Anthropic, Google, and Amazon all moved within six weeks. The response was not matched pricing — none of the US labs dropped to DeepSeek's level — but the compression was significant: Claude 3.5 Sonnet dropped 30 per cent in effective enterprise contract pricing through Q1 2025 as buyers used R1's published costs as leverage in renewal negotiations.
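The roughly 96 per cent undercut can be checked line by line against list prices. The R1 figures below come from this briefing. The o1 figures ($15 input, $60 output per million tokens) are an assumption based on OpenAI's published list pricing at the time of the R1 release.

```python
def percent_cheaper(new_price: float, old_price: float) -> float:
    """How much lower new_price is than old_price, as a percentage of old_price."""
    return (1 - new_price / old_price) * 100

# USD per million tokens. R1 figures are from this briefing; the o1 figures are an
# assumption based on OpenAI's list pricing at the time of the R1 release.
O1_PRICES = {"input": 15.00, "output": 60.00}
R1_PRICES = {"input": 0.55, "output": 2.19}

if __name__ == "__main__":
    for kind in ("input", "output"):
        cut = percent_cheaper(R1_PRICES[kind], O1_PRICES[kind])
        print(f"{kind} tokens: R1 list price is {cut:.1f}% below o1")
```

Both components come out at roughly 96 per cent. Negotiated contract discounts on either side would move the figure.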

The more durable impact was on the build-versus-buy calculus for agentic infrastructure. Before R1, the standard enterprise agent deployment involved a frontier API at $15 to $80 per million output tokens, an orchestration layer from LangChain or a competitor, and a dedicated MLOps team to manage it. After R1, the arithmetic changed. A 32B distilled R1 model running on leased H100 capacity through a cloud provider cost an enterprise approximately $1.10 per million output tokens all-in, including compute and staffing amortised across expected volume. At that price, the orchestration layer became the dominant cost centre — and a credible target for elimination.
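The all-in cost of self-hosting depends on a few inputs:

- how much the GPUs cost per hour;
- how many tokens per second the serving stack sustains;
- how busy the deployment is;
- how much staffing is amortised onto it.

The sketch below shows the shape of that calculation. Every parameter value is a hypothetical placeholder, not the breakdown behind the $1.10 figure, which the briefing does not itemise.

```python
HOURS_PER_YEAR = 24 * 365

def self_hosted_cost_per_m_tokens(
    gpu_hourly_usd: float,         # lease price per GPU-hour
    num_gpus: int,                 # GPUs reserved for the deployment, busy or idle
    tokens_per_second: float,      # sustained output throughput across all requests
    utilisation: float,            # fraction of the year spent serving real traffic
    staffing_usd_per_year: float,  # MLOps and support cost amortised onto this deployment
) -> float:
    """Approximate all-in USD cost per million output tokens for a self-hosted model."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    yearly_cost = gpu_hourly_usd * num_gpus * HOURS_PER_YEAR + staffing_usd_per_year
    yearly_tokens = tokens_per_second * 3600 * HOURS_PER_YEAR * utilisation
    return yearly_cost / yearly_tokens * 1_000_000

if __name__ == "__main__":
    # Every value below is a hypothetical placeholder, not the basis of the $1.10 figure.
    cost = self_hosted_cost_per_m_tokens(
        gpu_hourly_usd=2.50, num_gpus=2, tokens_per_second=2_000,
        utilisation=0.6, staffing_usd_per_year=50_000,
    )
    print(f"~${cost:.2f} per million output tokens")
```

Utilisation sits in the denominator, so an idle deployment is expensive: halving it doubles the cost per token. That is one reason latency-tolerant, batchable workloads suit self-hosting best.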

Three categories of buyer moved fastest. Legal services firms running contract analysis pipelines, which generate enormous intermediate reasoning output, saw per-workflow cost drop from $4.20 to $0.38 in documented pilots. Financial data teams running automated analyst briefings — a workload that chains retrieval, reasoning, and formatting steps — cut per-report cost from $2.80 to $0.24. Software development teams using agentic code review reported a cost reduction of 91 per cent per review cycle. The common thread: these are all workloads where the intermediate reasoning token count dominates and where sub-second latency is not required.
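The pilot figures above can be checked against the 91 per cent headline with a single reduction formula. The inputs are the per-workflow costs quoted in this briefing. Nothing else is assumed.

```python
def percent_reduction(before: float, after: float) -> float:
    """Drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100

# Cost per workflow in USD (before, after), from the pilots quoted in this briefing.
PILOTS = {
    "legal contract analysis": (4.20, 0.38),
    "financial analyst briefing": (2.80, 0.24),
}

if __name__ == "__main__":
    for workflow, (before, after) in PILOTS.items():
        print(f"{workflow}: {percent_reduction(before, after):.1f}% lower per workflow")
```

Both land at about 91 per cent, the same figure reported for agentic code review.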

The model didn't displace our vendor. It displaced the justification for what we were paying the vendor.

Who deployed in production

Download counts from Hugging Face are not a deployment signal. Every researcher, every curious developer, and every competitor's red team downloads the weights. The relevant question is who ran R1 in production, on live workloads, with business-critical data flowing through it. INTELAR identified 14 confirmed production deployments within 90 days of the January 2025 release. The concentration was narrow and deliberate: early adopters were overwhelmingly in sectors where the cost saving was large enough to justify the compliance and support risk of running a self-hosted open-weight model.

Kellerton Partners, a mid-sized credit analysis firm managing $18B in structured credit portfolios, deployed the 32B distilled R1 variant on a private AWS VPC in late February 2025. The workload: automated covenant monitoring across 4,400 credit agreements, previously handled by a GPT-4o pipeline that cost the firm $340,000 annually in API fees. The R1 deployment, running on eight leased H100s through AWS EC2, cost $41,000 annually in compute. The firm's technology lead confirmed the switch in a conversation with INTELAR. Covenant monitoring is a latency-tolerant, high-reasoning-intensity workload — exactly the profile where R1's architecture excels.
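A reported switch like this can be sanity-checked in two steps. First, compute the annual saving. Second, turn the compute bill into an implied price per GPU-hour that can be compared with lease terms actually on offer. The sketch uses only the Kellerton figures quoted above. The briefing does not say how many hours a day the GPUs run, so the duty cycle is an explicit assumption.

```python
HOURS_PER_YEAR = 24 * 365

def annual_saving(old_cost: float, new_cost: float) -> tuple[float, float]:
    """Absolute annual saving in USD and the saving as a percentage of the old cost."""
    saving = old_cost - new_cost
    return saving, saving / old_cost * 100

def implied_gpu_hour_rate(annual_compute_usd: float, num_gpus: int,
                          hours_running: float = HOURS_PER_YEAR) -> float:
    """USD per GPU-hour implied by an annual compute bill for the given hours of use."""
    return annual_compute_usd / (num_gpus * hours_running)

if __name__ == "__main__":
    saving, pct = annual_saving(340_000, 41_000)
    print(f"saving: ${saving:,.0f} a year ({pct:.1f}%)")
    # Assumption: duty cycle. Fewer running hours raise the implied rate.
    print(f"implied rate, 24/7: ${implied_gpu_hour_rate(41_000, 8):.2f} per GPU-hour")
    print(f"implied rate, 12h/day: "
          f"${implied_gpu_hour_rate(41_000, 8, HOURS_PER_YEAR / 2):.2f} per GPU-hour")
```

The saving works out to just under 88 per cent. Comparing the implied rate with real lease quotes would show what the $41,000 reflects: reserved or discounted capacity, part-time operation, or something else. The reported figures do not say which.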

Arroyo Software, a 600-person enterprise resource planning vendor serving mid-market manufacturing clients, integrated the 14B distilled variant into its automated documentation pipeline in March 2025. The pipeline generates maintenance and configuration documentation from code commits. That workload had previously required a $180,000 annual OpenAI contract. Arroyo's CTO cited the open-weight licence as the primary driver: the company embedded the model into a customer-deployable appliance, which API-based models could not support.

Vertex Bio is a clinical research organisation that prepares regulatory submissions. It used the 70B variant on a HIPAA-compliant Azure deployment for pre-submission document review. That use case had previously been blocked by data residency requirements that cloud AI APIs could not satisfy.

Beyond the confirmed deployments, INTELAR identified 40 additional enterprises in a trial or evaluation phase as of April 2025, concentrated in four sectors: financial services (14), legal and compliance (11), healthcare IT (9), and government contracting (6). The government contracting cluster was notable: all six organisations cited FedRAMP-incompatibility of commercial AI APIs as the reason they were evaluating open weights at all. DeepSeek's open distribution model solved a procurement problem that capability alone could not.

The compliance position

The origin question arrived immediately. DeepSeek is a Chinese company. Its models were trained on infrastructure subject to Chinese law, including the Cybersecurity Law and the Data Security Law, which require companies to cooperate with government data requests. The weights released to Hugging Face are a static artefact — they do not phone home, do not relay inference data to DeepSeek's servers, and do not carry any runtime telemetry. Running R1 on a self-hosted private cloud is, from a data-residency standpoint, equivalent to running any other open-source model. The weights are the model. There is no ongoing relationship with the developer.
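The "static artefact" point can be made concrete. DeepSeek's Hugging Face repositories distribute weights as safetensors files. That format is a length-prefixed JSON header followed by raw tensor bytes, so a file can be inspected without executing anything from it. The sketch below reads that header with Python's standard library. It builds a tiny synthetic file rather than touching a real R1 shard, and the tensor name `w` is illustrative.

One caveat belongs in the same review. A model repository can also ship Python source for custom model code, and that code is executable. It needs the same scrutiny as any third-party dependency, even when the weights themselves are inert.

```python
import io
import json
import struct
from typing import BinaryIO

def read_safetensors_header(f: BinaryIO) -> dict:
    """Read only the JSON header of a safetensors file; tensor bytes are never loaded.

    Layout: an 8-byte little-endian unsigned integer N, then N bytes of UTF-8 JSON
    describing each tensor (dtype, shape, byte offsets), then the raw tensor data.
    """
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string-to-string metadata
    return header

def tiny_safetensors() -> io.BytesIO:
    """Build a minimal in-memory file holding one 2x2 float32 tensor named 'w'."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    return io.BytesIO(struct.pack("<Q", len(header_bytes)) + header_bytes + data)

if __name__ == "__main__":
    for name, info in read_safetensors_header(tiny_safetensors()).items():
        print(name, info["dtype"], info["shape"])
    # For a real shard, pass open(path, "rb") instead; only the header is read.
```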

That legal position is correct as far as it goes, and several US enterprise legal teams have confirmed it in writing. Where it stops being sufficient is at the supply-chain layer. Enterprises running R1 in regulated environments face two distinct questions: first, whether the model's training data or training process embedded capability that a foreign adversary could exploit through adversarial prompt design; and second, whether procurement policy governing technology sourced from certain country-of-origin classifications applies to open-weight models at all. Neither question has a settled answer in US regulatory guidance as of the date of this briefing.

The Department of Commerce's Bureau of Industry and Security issued preliminary guidance in March 2025. It indicated that open-weight AI models from entities on the Entity List would be subject to restrictions, but DeepSeek was not on the Entity List as of that date. The National Security Commission on Artificial Intelligence delivered its final report in 2021 before winding down. It had recommended expanding Entity List coverage to AI model weights, and that recommendation had not been implemented. Enterprises in defence, critical infrastructure, and government contracting have generally adopted a conservative posture: evaluating R1 but not deploying in production until BIS guidance is finalised. Enterprises in commercial sectors such as legal, financial, and healthcare have moved faster. They rely on self-hosted deployment as the compliance mechanism and treat data residency as the controlling variable.

The practical compliance architecture for a US enterprise running R1 in production requires four controls: model deployment in a private VPC with no egress to DeepSeek infrastructure, inference logging with tamper-evident audit trail, formal red-team evaluation of the specific model version before production promotion, and documented legal review confirming the organisation's country-of-origin procurement policy does not extend to open-weight model files. All four are achievable. None are trivial. The organisations that have moved fastest are those with existing self-hosted AI infrastructure — they had the controls and the teams already in place.
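Of the four controls, the tamper-evident audit trail is the easiest to show in code. One common technique is a hash chain: each log entry stores the hash of the previous entry, so editing or removing a record breaks the chain from that point on. The sketch below uses only Python's standard library, and the record fields are hypothetical. A production system would also anchor the latest hash somewhere the application cannot overwrite, such as a write-once store. Without that anchor, simply truncating the end of the log would go unnoticed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so a record always hashes the same.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append(log: list[dict], record: dict) -> None:
    """Append an inference record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _entry_hash(prev_hash, record)})

def verify(log: list[dict]) -> bool:
    """Recompute every hash. An edited, reordered or removed entry makes this False,
    except truncation of the tail, which needs an externally anchored latest hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

if __name__ == "__main__":
    log: list[dict] = []
    append(log, {"request_id": 1, "model": "r1-distill-32b", "output_tokens": 812})  # hypothetical fields
    append(log, {"request_id": 2, "model": "r1-distill-32b", "output_tokens": 455})
    print("intact:", verify(log))
    log[0]["record"]["output_tokens"] = 10  # simulate an after-the-fact edit
    print("after edit:", verify(log))
```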

What to watch

DeepSeek's impact on the agent layer is not a single event. It is the opening move in a structural repricing of AI infrastructure that will continue through 2025 and into 2026. Five developments will determine how far that repricing goes and who captures the value it releases.

BIS guidance on open-weight models. The Bureau of Industry and Security is expected to publish a proposed rule on controls covering AI model weights in H2 2025. The rule could restrict US use of open-weight files from entities in named jurisdictions. If it does, it would functionally block US enterprise deployment of R1 successors. It would also give a significant commercial advantage to domestically trained open-weight alternatives such as Meta's Llama series. Watch the public comment period on the proposed rule; the enterprise AI industry's legal teams will drive the response volume.
DeepSeek's V3 and R2 release cadence. R1, released in January 2025, was trained from DeepSeek-V3, the base model the company released in December 2024. DeepSeek's research publication cadence points to further V3-series updates and an R2 reasoning successor in 2025. R2 may keep R1's cost-efficiency profile while adding native tool use and multi-agent coordination. If it does, it will challenge the orchestration layer from a different angle than R1: on architecture as well as price.
US lab response on distillation. The major US frontier labs (OpenAI, Anthropic, Google) have accelerated work on smaller, more efficient model variants following the R1 release. Small tiers such as Anthropic's Haiku and OpenAI's o1-mini predate R1, so the signal to watch is how the next releases in those tiers are priced. The open question is whether US labs can reach R1's cost-efficiency ratio. DeepSeek published its training method, but not its training data. The answer will determine whether the cost reset is permanent or temporary.
Enterprise orchestration vendor responses. LangChain, CrewAI, and Haystack all face the same market condition from the open-weight side that Anthropic's Skills primitive creates from the closed-API side: their core value proposition is being absorbed by the layer above and repriced by the layer below simultaneously. Watch for acquisitions in the orchestration space — the most likely acquirers are the cloud providers, who benefit from orchestration tooling that keeps workloads on their compute infrastructure regardless of which model runs them.
The talent and IP investigation. US intelligence officials and several members of the Senate Commerce Committee have asked publicly whether DeepSeek's training efficiency gains reflect novel research or access to restricted technology. No public evidence of the latter has been produced. The investigation posture matters commercially: if it results in a formal finding, it will accelerate Entity List action. If it resolves without a finding, it validates the open-weight deployment posture that enterprises are already moving toward.

Frequently asked

Is it legal for a US enterprise to run DeepSeek R1 in production?: As of February 2025, yes — with conditions. DeepSeek is not on the BIS Entity List, and open-weight model files are not currently covered by AI export controls. Enterprises running R1 on self-hosted infrastructure with no egress to DeepSeek's servers have no data-transfer relationship with the developer. The critical variables are your organisation's internal procurement policy on country-of-origin technology sourcing, any sector-specific regulatory requirements, and evolving BIS guidance expected in H2 2025. Every enterprise in a regulated sector should obtain written legal review before production deployment.
Why does DeepSeek's cost advantage matter specifically for agentic workloads?: Agent workloads generate large volumes of intermediate reasoning tokens, the chain-of-thought steps between receiving a task and producing an output. Most commercial APIs bill these at the same rate as final output tokens, and DeepSeek's is no exception. R1 writes its reasoning out explicitly, and DeepSeek's API counts it as output. The difference is the rate. At DeepSeek's published R1 prices, each reasoning token costs a fraction of what it does on frontier US APIs, so the effective cost per completed task falls accordingly. In some workloads intermediate reasoning dominates the token count, such as legal document analysis, complex code generation, and multi-step data synthesis. There, the cost reduction per completed task is 85 to 95 per cent compared to GPT-4o or Claude 3.5 Sonnet at list pricing.
What compliance architecture does a responsible R1 deployment require?: Four controls are required for a defensible posture in regulated US environments: deployment in a private VPC with no egress to external AI infrastructure, inference logging with tamper-evident audit trail, formal red-team evaluation of the specific model version before production, and documented legal review confirming that your organisation's procurement policies extend only to cloud API services and not to open-weight model files. Data residency is the mechanism — you are running a static model artefact, not transmitting data to a foreign entity. All four controls are achievable with existing enterprise security tooling.
Which workloads are a poor fit for R1 in production?: Three categories. First, any workload requiring sub-500-millisecond response times — R1's chain-of-thought reasoning adds latency that self-hosted infrastructure cannot compress below that threshold at reasonable cost. Second, workloads requiring real-time tool use or live API calls within the reasoning loop, where R1's current architecture lags behind Claude 3.5 and GPT-4o. Third, any deployment in an organisation whose procurement policy explicitly extends country-of-origin sourcing restrictions to open-weight model files — until legal review resolves that classification question, the compliance risk outweighs the cost saving.
Does R1's open distribution change the competitive position of US AI labs?: Structurally, yes. The cost floor for frontier-class reasoning capability has been reset. US labs can no longer price agentic reasoning tasks at $40 to $80 per million output tokens without losing the cost-sensitive segment of the enterprise market. The response — smaller, more efficient model variants from Anthropic, OpenAI, and Google — confirms this. The more significant structural shift is in the build-versus-buy calculus: enterprises that would not have considered self-hosted AI two years ago are now running capability evaluations on open-weight models, and the compliance frameworks for doing so are maturing faster than most analysts anticipated.
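The latency limit for poor-fit workloads follows from simple arithmetic: a reasoning model must generate its whole chain of thought before the answer is complete. The sketch below estimates end-to-end time from time-to-first-token and per-request decode speed. The speeds and token counts are hypothetical placeholders, not measurements of R1 on any particular hardware.

```python
def response_latency_ms(ttft_ms: float, reasoning_tokens: int, answer_tokens: int,
                        decode_tokens_per_s: float) -> float:
    """Time to first token plus the time to decode every generated token."""
    generated = reasoning_tokens + answer_tokens
    return ttft_ms + generated / decode_tokens_per_s * 1000

def max_reasoning_tokens(budget_ms: float, ttft_ms: float, answer_tokens: int,
                         decode_tokens_per_s: float) -> int:
    """Longest chain of thought that still fits a latency budget (0 if none fits)."""
    decodable = int((budget_ms - ttft_ms) / 1000 * decode_tokens_per_s)
    return max(0, decodable - answer_tokens)

if __name__ == "__main__":
    # Hypothetical serving profile: 200 ms to first token, 60 tokens/s per request.
    print(f"{response_latency_ms(200, 1_000, 100, 60):.0f} ms for a 1,000-token chain of thought")
    print(f"{max_reasoning_tokens(500, 200, 100, 60)} reasoning tokens fit in a 500 ms budget")
```

On these placeholder numbers, a 100-token answer alone overruns a 500 ms budget, before any reasoning is added. This is the core of why sub-500-millisecond workloads are a poor fit for R1.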

DeepSeek did not ship an agent platform. It shipped a cost signal — and cost signals reshape markets more reliably than capability claims do. The enterprises that moved first on R1 were not chasing novelty; they were responding to arithmetic. A 91 per cent reduction in cost per completed agentic task changes the business case for every workload that was previously too expensive to automate at scale. The agent layer that gets built on that arithmetic will look different from the one that existed in December 2024: more self-hosted, more distributed, and considerably less dependent on the pricing discretion of a handful of San Francisco laboratories.

The open question is not whether the cost reset is real. It is. The open question is whether the compliance and geopolitical risk around Chinese-origin open weights resolves quickly enough for the reset to compound, or whether regulatory action in H2 2025 creates a bifurcated market — with US enterprises on domestically trained open weights and international buyers on DeepSeek derivatives. Either outcome accelerates the same underlying dynamic: the model is no longer the scarce resource. Distribution, compliance posture, and the orchestration logic built on top of cheap inference — those are what enterprise AI buyers will be competing on through 2026.