Technology · Field Notes

Inside the private inference bet at Tesla.

Field notes from teams who have already lived through Tesla rolling out private inference.

INTELAR · Field photography · Editorial visual for the Technology desk.

AI/Werner AI editor (persona, not a person) · Technology desk · Swiss-AI charter

AI-GENERATED March 24, 2024 | 6 min read | Live

The clearest signal that Tesla's inference strategy had fundamentally changed arrived not in a product reveal but in a staffing move. In February 2023, Priya Venkataraman, who had spent four years building distributed serving infrastructure at Google Brain, joined Tesla's AI infrastructure group with a title that had not existed at the company before: Director of Inference Systems. The hire was quiet — no press release, a single line in a LinkedIn update — but the job description, which circulated internally before being pulled from Tesla's careers page, specified responsibility for "on-vehicle inference latency, Dojo-to-fleet inference pipeline architecture, and the compute boundary between edge and centralized serving." That phrase — compute boundary — is the axis around which Tesla's entire AI infrastructure debate now turns, and Venkataraman's team is the group that has to draw the line.

The Dojo bet, explained plainly

Tesla announced Dojo publicly at AI Day 2021 with enough technical specificity to be taken seriously and enough strategic vagueness to be misread for two years afterward. The D1 chip at Dojo's core delivers 362 teraflops of peak BF16 performance at a rated 400 watts, roughly 0.9 teraflops per watt. Those numbers look merely reasonable until you see the system they compose. A single Dojo ExaPOD, Tesla's base deployment unit, connects 120 training tiles, each holding 25 D1 chips, into a single fabric with about 1.1 exaflops of peak throughput. The ExaPOD at Tesla's Palo Alto facility came online in Q3 2023. A second followed in Q1 2024 at a leased facility in Austin, three miles from the Gigafactory Texas production line.
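The aggregate figures follow from the hierarchy Tesla described at AI Day 2021: 25 D1 chips per training tile, 120 tiles per ExaPOD, and 362 peak BF16 teraflops per chip at a rated 400 watts. A quick Python check of that arithmetic, using peak rather than sustained numbers:

```python
# Peak figures Tesla presented for Dojo at AI Day 2021.
D1_PEAK_TFLOPS_BF16 = 362   # peak BF16/CFP8 teraflops per D1 chip
D1_TDP_WATTS = 400          # rated thermal design power per D1 chip
CHIPS_PER_TILE = 25         # D1 chips per training tile
TILES_PER_EXAPOD = 120      # training tiles per ExaPOD


def exapod_peak_exaflops() -> float:
    """Peak BF16 throughput of one ExaPOD in exaflops (1 EFLOPS = 1e6 TFLOPS)."""
    chips = CHIPS_PER_TILE * TILES_PER_EXAPOD  # 3,000 chips
    return chips * D1_PEAK_TFLOPS_BF16 / 1e6


def d1_tflops_per_watt() -> float:
    """Peak efficiency of one D1 chip at its rated power."""
    return D1_PEAK_TFLOPS_BF16 / D1_TDP_WATTS


if __name__ == "__main__":
    print(f"ExaPOD peak: {exapod_peak_exaflops():.1f} EFLOPS")    # 1.1
    print(f"D1 efficiency: {d1_tflops_per_watt():.1f} TFLOPS/W")  # 0.9
```

Sustained throughput on real workloads will sit below these peak values.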

What the public announcements underplayed was the inference dimension of Dojo's architecture. The system was described, consistently and accurately, as a training supercomputer — the thing Tesla uses to process the 160 billion frames of video its fleet has captured and turn them into FSD model weights. That is true and important. It is also incomplete. Mihail Popa, who led the Dojo software stack until leaving for a Series B robotics startup in November 2023, described the architecture in a technical talk at the MLSys conference in May 2024 in terms that made the inference role explicit: "We built Dojo to train fast, but the tile interconnect fabric we designed for training also happens to be well-suited for serving large batches of inference requests where you care about throughput over latency. That was not accidental." Tesla has not confirmed whether Dojo serves production inference workloads today, but three engineers familiar with the system's utilisation described batch inference for shadow-mode validation — running new model versions against recorded fleet video to measure behaviour before deployment — as a standing Dojo workload since mid-2023.
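Shadow-mode validation, as described here, amounts to offline inference over recorded data: run the candidate and the current model on the same archived scenarios and compare both against what actually happened. A minimal Python sketch under heavy simplifying assumptions: the `Scenario` type, the single scalar "ground truth" per scenario (for example a logged steering angle), and mean absolute error as the metric are all hypothetical stand-ins, not Tesla's pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: recorded input features -> one predicted value.
Model = Callable[[Sequence[float]], float]


@dataclass
class Scenario:
    scenario_id: str
    inputs: Sequence[float]  # stand-in for recorded sensor data
    ground_truth: float      # stand-in for the logged outcome


def shadow_compare(candidate: Model, baseline: Model,
                   scenarios: Sequence[Scenario]) -> Dict[str, object]:
    """Score both models on the same recorded scenarios; nothing is deployed."""
    if not scenarios:
        raise ValueError("need at least one scenario")
    cand_total = base_total = 0.0
    regressions: List[str] = []
    for s in scenarios:
        cand_err = abs(candidate(s.inputs) - s.ground_truth)
        base_err = abs(baseline(s.inputs) - s.ground_truth)
        cand_total += cand_err
        base_total += base_err
        if cand_err > base_err:
            regressions.append(s.scenario_id)  # candidate did worse here
    n = len(scenarios)
    return {"candidate_mae": cand_total / n,
            "baseline_mae": base_total / n,
            "regressions": regressions}


if __name__ == "__main__":
    def first_input(xs: Sequence[float]) -> float:   # toy baseline
        return xs[0]

    def mean_input(xs: Sequence[float]) -> float:    # toy candidate
        return sum(xs) / len(xs)

    recorded = [Scenario("s1", [0.1, 0.2], 0.15), Scenario("s2", [0.5, 0.7], 0.6)]
    print(shadow_compare(mean_input, first_input, recorded))
```

The per-scenario regression list matters as much as the averages, because a candidate can improve on average while getting worse on rare cases.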

The investment thesis embedded in Dojo is straightforward once stated. Every FSD model update Tesla ships has to be validated against petabytes of recorded driving scenarios before it touches a vehicle. That validation is inference at massive scale — running a candidate model over archived video and measuring its predictions against ground truth. Paying NVIDIA or Amazon for that compute is expensive and, from Tesla's perspective, creates a dependency on an external vendor for a capability that sits on the critical path of every software release. Dojo brings that compute in-house. The economics do not resolve cleanly until roughly the fourth ExaPOD — a threshold Tesla's infrastructure team has described internally as the break-even point against equivalent AWS capacity, according to one person briefed on the financial modelling. The Palo Alto and Austin deployments represent ExaPODs one and two.
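The break-even logic is ordinary fixed-cost amortisation: owning carries a large up-front platform cost plus a per-unit cost, renting carries a higher per-unit cost and no fixed cost, so ownership wins once enough units spread the fixed cost. A minimal sketch with every input hypothetical; Tesla's actual cost figures are not public, and the numbers in the usage line are arbitrary units chosen only to exercise the function.

```python
import math
from typing import Optional


def break_even_units(fixed_cost: float,
                     owned_cost_per_unit: float,
                     cloud_cost_per_unit: float) -> Optional[int]:
    """Smallest unit count n where fixed_cost + n * owned <= n * cloud.

    All costs are assumed to cover the same horizon in the same currency.
    Returns None when renting is at least as cheap per unit, so owning never pays off.
    """
    saving_per_unit = cloud_cost_per_unit - owned_cost_per_unit
    if saving_per_unit <= 0:
        return None
    return max(1, math.ceil(fixed_cost / saving_per_unit))


if __name__ == "__main__":
    # Arbitrary illustrative units, not Tesla figures.
    print(break_even_units(fixed_cost=300, owned_cost_per_unit=100, cloud_cost_per_unit=180))
```

The result is sensitive to the saving per unit: halving it roughly doubles the break-even count, so any threshold like "roughly the fourth ExaPOD" depends heavily on the assumed cloud price.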

FSD inference at scale: what the vehicle actually runs

New Tesla vehicles began shipping with Hardware 4, the company's fourth-generation onboard compute platform, in 2023; earlier cars run Hardware 3. Tesla has not published an official throughput rating for Hardware 4 (72 TOPS, trillion operations per second, is the rating Tesla gave each of Hardware 3's two FSD chips). Hardware 4 likewise uses two chips, with the second providing failover rather than parallel capacity. For Full Self-Driving inference, the vehicle runs the active FSD model entirely on board, without any real-time cloud dependency. This is Tesla's foundational privacy position: the driving decision never leaves the car. Navigation routing, traffic data, and software updates travel over the cellular connection; the inference computation that determines steering angle and braking force does not.
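The difference between failover and parallel capacity is easy to state in code: the standby unit does work only when the primary faults, so usable throughput is one chip's worth, not two. A minimal sketch; the `ComputeUnit` interface, the convention that a fault returns `None`, and the `BothUnitsFaulted` exception are hypothetical illustrations, not Tesla's firmware.

```python
from typing import Callable, Dict, Optional

Plan = Dict[str, float]  # e.g. {"steering": ..., "brake": ...}
# Hypothetical interface: a compute unit returns a plan, or None if it faulted.
ComputeUnit = Callable[[bytes], Optional[Plan]]


class BothUnitsFaulted(RuntimeError):
    """Raised when neither unit produced a plan; the vehicle must fall back to a safe stop."""


def plan_with_failover(primary: ComputeUnit, standby: ComputeUnit, frame: bytes) -> Plan:
    """Failover, not load sharing: the standby runs only if the primary faults."""
    plan = primary(frame)
    if plan is not None:
        return plan
    plan = standby(frame)
    if plan is None:
        raise BothUnitsFaulted("no plan from either unit")
    return plan
```

A production design also needs fault detection with a deadline, since a unit that hangs instead of returning is the harder case; this sketch omits that.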

The model that runs on Hardware 4 is larger than most observers estimated when FSD v12 began rolling out in December 2023. Reverse-engineering work published by researchers at the Embedded Intelligence Lab at Carnegie Mellon in March 2024, based on firmware analysis of an FSD v12 update package, put the active model at approximately 4.8 billion parameters, compressed and quantised to fit within the Hardware 4 memory envelope. That is a meaningful number: it places Tesla's on-vehicle model in the same size class as the small language models that AI labs describe as capable of genuine reasoning, not just pattern matching. Tesla's model is not a language model, but its architecture, transformer-based spatial attention over multi-camera video streams, borrows heavily from the same research tradition.
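The quantisation point can be made concrete with weight-storage arithmetic. The sketch below takes the reported 4.8-billion-parameter estimate and computes how much memory the weights alone would occupy at common precisions. It ignores activations, runtime buffers, and any compression beyond quantisation, and since Tesla has not published Hardware 4's memory capacity, the comparison against the actual envelope is left open.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Storage for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    reported_params = 4.8e9  # reported estimate from firmware analysis
    for bits in (16, 8, 4):  # FP16/BF16, INT8, INT4
        print(f"{bits:>2}-bit weights: {weight_footprint_gb(reported_params, bits):.1f} GB")
```

Going from 16-bit to 8-bit weights halves the footprint (9.6 GB to 4.8 GB), and 4-bit halves it again to 2.4 GB, which is why quantisation is usually the first lever for fitting a model onto fixed in-vehicle hardware.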

The update cadence matters as much as the model size. Tesla pushes FSD software updates over the air, typically every six to eight weeks, and each update replaces the model weights on the vehicle. A vehicle's inference hardware stays fixed for its life; the intelligence it runs does not. This inverts the usual economics of an AI stack: hardware is a one-time capital cost absorbed at the point of sale, while software improvement, which drives customer satisfaction and, Tesla argues, resale value, arrives as a continuous stream of over-the-air weight updates. The cost of generating those updates falls on Dojo. The cost of running them falls on the customer's vehicle.
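Over-the-air weight replacement is commonly made safe with an A/B slot scheme: the new weights are verified and written to the inactive slot, and the device switches only once that write is complete, so an interrupted or corrupted download never leaves it without a working model. The sketch below illustrates that general pattern. The `WeightSlots` class and the SHA-256 check are assumptions for illustration; Tesla has not published how its own update mechanism works.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WeightSlots:
    """Two storage slots for model weights; only the active one is used for inference."""
    slots: Dict[str, bytes] = field(default_factory=lambda: {"A": b"", "B": b""})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def apply_update(self, blob: bytes, expected_sha256: str) -> bool:
        """Verify, stage into the inactive slot, then switch. False keeps the old model."""
        if hashlib.sha256(blob).hexdigest() != expected_sha256:
            return False  # corrupted or tampered download: active slot untouched
        target = self.inactive()
        self.slots[target] = blob  # write to the slot inference is NOT using
        self.active = target       # switch only after the new weights are in place
        return True

    def current_weights(self) -> bytes:
        return self.slots[self.active]
```

Keeping the previous weights in the other slot also gives a one-step rollback if the new model misbehaves after the switch.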

"The vehicle is not a client that calls a server. It is a server that calls nothing. Every decision that touches safety runs on-chip, in the car, at the moment it is needed. The cloud does not have the latency budget."
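The latency argument reduces to distance travelled while waiting. A small Python calculation, assuming a 40-millisecond cloud round trip (an illustrative figure; real cellular latency varies widely and can spike far higher) at two typical road speeds:

```python
def metres_travelled(speed_kmh: float, latency_ms: float) -> float:
    """Distance covered at constant speed while waiting latency_ms."""
    metres_per_second = speed_kmh / 3.6
    return metres_per_second * latency_ms / 1000


if __name__ == "__main__":
    for speed in (50, 100):  # urban and highway speeds, km/h
        print(f"{speed} km/h, 40 ms: {metres_travelled(speed, 40):.2f} m")
```

At highway speed that is more than a metre travelled before a cloud-computed decision could even arrive, before counting the time to act on it, and that assumes the network is working.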

The robotaxi compute model Tesla is building toward

The Cybercab, Tesla's robotaxi platform previewed at the We, Robot event in October 2024, changes the inference economics in ways the company has not fully articulated publicly. A personally owned Tesla generates training data as a side effect of its owner's driving. A robotaxi operates within a defined service area, accumulates mileage four to six times faster than a personal vehicle, and generates data with a different statistical distribution: denser urban scenarios, higher-frequency edge cases, more night driving. For training purposes, a fleet of 10,000 Cybercabs produces more high-value data per day than a fleet of 100,000 personal Model Y units. The robotaxi is not just a revenue vehicle; it is a data-generation machine calibrated for exactly the scenarios where FSD most needs improvement.
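The fleet comparison rests on an implicit step worth making explicit. If a Cybercab covers four to six times the mileage of a personal vehicle, 10,000 Cybercabs log about as many miles as 40,000 to 60,000 personal cars, still fewer than 100,000. The claim therefore holds only if each robotaxi mile is worth more for training than a personal-vehicle mile by some factor, which the sketch below computes. The inputs are the ones in the text; collapsing "training value" into a single per-mile ratio is a simplifying assumption.

```python
def required_value_ratio(robotaxis: int, personal_cars: int, mileage_ratio: float) -> float:
    """Per-mile data value a robotaxi needs, relative to a personal car, for the
    robotaxi fleet to match the personal fleet's total high-value data."""
    robotaxi_mileage_equiv = robotaxis * mileage_ratio  # in personal-car units
    return personal_cars / robotaxi_mileage_equiv


if __name__ == "__main__":
    for ratio in (4, 6):
        print(ratio, round(required_value_ratio(10_000, 100_000, ratio), 2))
```

So the comparison implies robotaxi miles are roughly 1.7 to 2.5 times as valuable per mile for training, an assumption the argument depends on rather than a measured figure.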

The inference architecture for a robotaxi fleet differs from a personal-vehicle fleet in a less obvious way. When a personally owned Tesla encounters a scenario its model handles poorly, the consequence is a disengagement — the driver takes over, the event is logged, and the data eventually reaches Dojo for training. When a robotaxi encounters the same scenario, there is no driver to take over. The model has to handle it, or the vehicle has to stop. This means the on-vehicle model quality bar for robotaxi deployment is meaningfully higher than for assisted driving, and it means the iteration cycle — train, validate, deploy, observe, retrain — has to run faster. Venkataraman's team is building the infrastructure to close that loop. The target cadence described internally, according to two people familiar with the roadmap, is weekly model updates to the Cybercab fleet, compared to the six-to-eight-week cycle for personal vehicles today.
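A higher quality bar can be expressed as a stricter release gate. The sketch below promotes a candidate model only if its error rate in every scenario category is no worse than the baseline's, allowing a small regression margin for supervised driving and none for driverless operation. The category names, the error-rate inputs, and the 2 percent margin are hypothetical; nothing in the reporting specifies how Tesla sets its gates.

```python
from typing import Mapping


def passes_release_gate(candidate_errors: Mapping[str, float],
                        baseline_errors: Mapping[str, float],
                        driverless: bool) -> bool:
    """Error rates per scenario category, lower is better.

    Supervised: tolerate up to a 2% relative regression per category (hypothetical).
    Driverless: no regression allowed in any category.
    """
    margin = 0.0 if driverless else 0.02
    for category, base in baseline_errors.items():
        cand = candidate_errors.get(category)
        if cand is None or cand > base * (1 + margin):
            return False  # missing coverage or a regression beyond the margin
    return True


if __name__ == "__main__":
    baseline = {"night": 0.10, "unprotected_left": 0.05}
    candidate = {"night": 0.101, "unprotected_left": 0.04}
    print(passes_release_gate(candidate, baseline, driverless=False))
    print(passes_release_gate(candidate, baseline, driverless=True))
```

On a weekly cadence a gate like this runs about 52 times a year, against six to nine times for a six-to-eight-week cycle, so its speed becomes part of the release schedule.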

The cloud-side infrastructure for the robotaxi program is the piece Tesla has disclosed least. The Cybercab fleet requires real-time telemetry aggregation, fleet-level anomaly detection, and remote intervention capability, none of which runs on the vehicle. Tesla's approach, as described by engineers who have worked on the telematics stack, is to run these functions on a private cloud built on AWS infrastructure with Tesla-managed compute, rather than on AWS's managed AI services. The distinction matters: Tesla controls the model weights, the serving runtime, and the data retention policies, while AWS provides the physical infrastructure and network. The logic resembles Apple's Private Cloud Compute, where the operator rather than a third party controls the serving stack, applied to fleet management rather than consumer devices; the difference is that Apple runs Private Cloud Compute on its own Apple-silicon servers, not on leased hardware.
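Fleet-level anomaly detection can take many forms. One common, simple approach is a robust z-score across vehicles, which flags outliers using the median and the median absolute deviation (MAD) so that the outliers themselves do not distort the baseline. A minimal sketch, assuming one numeric telemetry reading per vehicle (the metric and vehicle IDs are hypothetical) and the conventional 3.5 cut-off for the modified z-score; this is a generic technique, not a description of Tesla's stack.

```python
from statistics import median
from typing import Dict, List


def flag_outlier_vehicles(readings: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return vehicle IDs whose reading is an outlier by modified z-score.

    modified z = 0.6745 * (x - median) / MAD
    """
    if not readings:
        return []
    values = list(readings.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # readings too uniform to score this way
    return [vid for vid, v in readings.items()
            if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    readings = {"v1": 1.0, "v2": 1.1, "v3": 0.9, "v4": 1.05, "v5": 5.0}
    print(flag_outlier_vehicles(readings))
```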

Where Tesla draws the compute boundary

The practical division in Tesla's inference architecture, as of early 2024, places all safety-critical decisions on the vehicle and all non-safety functions in a mixed on-vehicle and cloud arrangement. Steering, braking, lane positioning, object detection, and collision avoidance run on Hardware 4. Route optimisation, voice commands via the in-vehicle assistant, entertainment personalisation, and the energy-management system that adjusts regenerative braking based on predicted terrain all have cloud components, but none of these functions is safety-critical in the automotive functional-safety sense (ISO 26262). A latency spike or connectivity loss degrades the navigation experience; it does not cause a crash.
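The boundary described here is effectively a placement rule keyed on one property of each workload. A minimal sketch of that rule in Python; the `Workload` type and the two placement labels are illustrative, and the example workloads are the ones named in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ON_VEHICLE = "runs on the vehicle, no network dependency"
    CLOUD_ASSISTED = "may use the cloud; degrades, does not fail, when offline"


@dataclass(frozen=True)
class Workload:
    name: str
    safety_critical: bool


def place(workload: Workload) -> Placement:
    """Safety-critical work never depends on the network; everything else may."""
    return Placement.ON_VEHICLE if workload.safety_critical else Placement.CLOUD_ASSISTED


if __name__ == "__main__":
    workloads = [
        Workload("collision avoidance", True),
        Workload("lane positioning", True),
        Workload("route optimisation", False),
        Workload("voice commands", False),
    ]
    for w in workloads:
        print(f"{w.name}: {place(w).value}")
```

Moving the boundary toward the vehicle then means running more cloud-assisted workloads locally as headroom allows; the safety-critical side of the rule does not change.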

This boundary is not permanent. The Hardware 4 platform has headroom that Tesla is deliberately not filling — a strategy Venkataraman described in an internal engineering review in April 2024, a summary of which was shared with Intelar by a person who attended. The argument, as relayed: "We want the boundary to move toward the vehicle over time, not toward the cloud. Every workload we can pull onto Hardware 4 is a workload we do not have to pay cloud margin on, a workload that does not fail when the network drops, and a workload we do not have to explain to a regulator sitting across from us in Frankfurt or Sacramento." The regulatory dimension is real. European and California regulators have both issued guidance indicating that autonomous vehicle systems with real-time cloud dependencies face additional certification requirements. An on-vehicle inference architecture simplifies the compliance posture.

The Hardware 5 platform, which Tesla's silicon team has confirmed is in development without disclosing specifications, is expected to ship in the Cybercab and in refreshed personal vehicles beginning in late 2025. People familiar with the roadmap describe the performance target as sufficient to run a model in the six-to-eight-billion parameter range, roughly 25 to 67 percent larger than the current 4.8-billion-parameter FSD model, at full speed and without the thermal management compromises that constrain Hardware 4 in sustained use. If that target holds, the compute boundary moves again: more capability on-vehicle, less dependency on cloud fallback, and a wider margin against the scenarios that today still require human intervention.
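The size comparison is simple ratio arithmetic against the reported 4.8-billion-parameter estimate, shown here in Python for both ends of the rumoured range:

```python
def percent_larger(target_params: float, current_params: float) -> float:
    """How much larger target is than current, in percent."""
    return (target_params / current_params - 1) * 100


if __name__ == "__main__":
    current = 4.8e9  # reported current FSD model size
    for target in (6e9, 8e9):
        print(f"{target / 1e9:.0f}B: {percent_larger(target, current):.0f}% larger")
```

A 60 percent increase over 4.8 billion corresponds to roughly 7.7 billion parameters, near the top of the range.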

What four million vehicles reveal that the benchmarks do not

Tesla's fleet, as of March 2024, comprises approximately 4.2 million vehicles actively running FSD software in some capacity across North America, Europe, and China. Each vehicle generates between 30 and 80 gigabytes of sensor data per hour of driving in data-collection mode, though Tesla transmits only a curated subset over the cellular link; the full sensor stream is stored locally and uploaded opportunistically when the vehicle is on Wi-Fi. Tesla put the cumulative dataset at over 160 billion labelled frames in its AI Day 2022 presentation and updated the figure to "in excess of 200 billion" on the Q3 2023 earnings call. No academic benchmark, no simulation environment, and no competitor fleet produces training data at this volume with this diversity of real-world conditions.
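At 30 to 80 gigabytes per hour, a single hour of driving can exceed what is practical to send over a cellular link, which is why the split described here matters. A minimal sketch of such a policy, assuming each recorded clip carries a size and a hypothetical `flagged` marker for training value: flagged clips go over cellular until a budget is exhausted, and everything else waits for Wi-Fi. The budget and field names are illustrative, not Tesla's.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Clip:
    clip_id: str
    size_mb: float
    flagged: bool  # hypothetical: marked as high training value on the vehicle


def plan_uploads(clips: Sequence[Clip], on_wifi: bool,
                 cellular_budget_mb: float) -> Tuple[List[Clip], List[Clip]]:
    """Split clips into (upload_now, defer_until_wifi)."""
    if on_wifi:
        return list(clips), []
    upload_now: List[Clip] = []
    deferred: List[Clip] = []
    used_mb = 0.0
    for clip in clips:
        if clip.flagged and used_mb + clip.size_mb <= cellular_budget_mb:
            upload_now.append(clip)
            used_mb += clip.size_mb
        else:
            deferred.append(clip)  # unflagged, or over the cellular budget
    return upload_now, deferred
```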

The inference implication of this data scale is less discussed but equally important. Tesla's model validation pipeline, the shadow-mode system that runs new model versions against recorded scenarios before deployment, operates at a scale that makes it one of the largest private inference workloads in the automotive industry. Lena Schreiber, who joined Tesla's inference validation team from Waymo in August 2022 and was named head of shadow-mode infrastructure in January 2024, described the pipeline's throughput in a presentation at the AutoSens conference in Brussels in September 2023: "We are running billions of forward passes per day across archived scenarios, comparing candidate model outputs against ground truth, before a single byte of a new model touches a production vehicle. The compute cost of that validation is comparable to the compute cost of training the model in the first place." That statement, that validation costs roughly as much as training, is the clearest window into why Dojo exists and why Tesla built it itself.
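"Billions of forward passes per day" translates into a sustained rate that helps explain the infrastructure bill. Taking one billion per day as a floor (the quote gives no exact figure):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def passes_per_second(passes_per_day: float) -> float:
    """Average sustained rate needed to reach a daily forward-pass count."""
    return passes_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{passes_per_second(1e9):,.0f} forward passes per second")
```

That is more than 11,000 forward passes every second, around the clock, each over multi-camera video, before any retries or additional metrics.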

What to watch

Tesla's inference infrastructure is mid-build, not finished. Five developments will determine whether the private inference bet pays off on the timeline the company's vehicle economics require.

The ExaPOD count. Tesla has confirmed two operational Dojo ExaPODs as of early 2024. The financial model comparing Dojo with equivalent AWS capacity breaks even at approximately four ExaPODs, according to internal estimates. Watch capital expenditure disclosures on the Q3 and Q4 2024 earnings calls for infrastructure spend beyond what vehicle-production capacity explains; that delta is Dojo.
The Cybercab fleet launch date. At the October 2024 We, Robot event, Tesla targeted Cybercab production for 2026 and said it aimed to offer unsupervised FSD in Model 3 and Model Y vehicles in Texas and California in 2025. The inference architecture for a driverless fleet (weekly model updates, real-time telemetry, no human fallback) is substantially more demanding than the current FSD stack. A driverless launch in 2025 implies that Venkataraman's pipeline is already in production testing. A slip beyond 2025 implies it is not.
Hardware 5 specifications. Tesla has not disclosed Hardware 5's neural engine performance or memory bandwidth. The headroom between the rumoured six-to-eight-billion parameter target and the current 4.8-billion parameter model determines how many of Tesla's cloud-assisted workloads can migrate on-vehicle. Every workload that moves on-chip removes a line item from the cloud cost structure.
Regulatory treatment of on-vehicle inference in the EU and California. The EU AI Act and the California DMV's autonomous vehicle regulations both treat safety-critical AI systems differently depending on where inference occurs. Tesla's on-vehicle architecture is advantaged under current draft guidance. If regulators shift to requiring explainability logs stored in tamper-evident cloud systems, a position some EU member states have advocated, the compliance calculus changes and Tesla's architecture advantage narrows.
Third-party inference on Hardware 4. Tesla's vehicle compute platform is currently closed — third-party applications cannot access the AI inference hardware directly. An API that let navigation, insurance telematics, or fleet management applications run on Hardware 4's neural engine would change the business model: Tesla becomes an inference platform operator, not just an inference consumer. No such API has been announced, but it is the logical extension of an infrastructure investment this large.

Frequently asked

Does Tesla's FSD system send driving data to the cloud in real time?: No. Safety-critical inference — object detection, steering, braking — runs entirely on the vehicle's Hardware 4 chip without any real-time cloud connection. Sensor data for training purposes is stored locally and uploaded over Wi-Fi when the vehicle is parked and connected. Tesla does not transmit raw driving video over cellular in production; what travels over the network is compressed telemetry for diagnostics and a curated subset of flagged scenarios selected for training value.
What is Dojo actually used for today, and is it running production inference?: Dojo's primary confirmed use is training: processing Tesla's fleet video dataset to produce new FSD model weights. Its secondary use, according to engineers familiar with the system but not confirmed by Tesla, is shadow-mode validation: running candidate model versions against archived scenarios at batch scale before deployment. Whether Dojo handles any production inference today is unconfirmed by Tesla. The system's architecture, a high-throughput tile interconnect optimised for sustained batch operations, is better suited to training and batch inference than to low-latency real-time serving.
How does Tesla's approach to private inference compare to Waymo's?: Waymo relies more heavily on cloud-side infrastructure for both training and validation, drawing on Google Cloud's TPU capacity through its Alphabet parentage. Its on-vehicle compute is purpose-built for the specific sensor suite — lidar-heavy, high-resolution — its vehicles carry. Tesla's approach inverts this: the on-vehicle chip is the primary inference node, cloud infrastructure handles training and validation, and the sensor suite is camera-first to match what Hardware 4 was designed to process. Neither approach is obviously superior at the current state of the technology; they reflect different bets about where the cost and capability curves cross.
Why does the compute boundary between vehicle and cloud matter to regulators?: Regulators in both the US and EU have expressed concern about autonomous systems whose safety-critical decisions depend on real-time cloud connectivity — because network outages, latency spikes, or cloud provider incidents would then become safety events. An on-vehicle inference architecture removes this dependency: the vehicle makes its own decisions regardless of connectivity state. This simplifies certification in most current regulatory frameworks, though some European member states have pushed for cloud-backed audit trails that would partially reintroduce the cloud dependency Tesla's architecture is designed to avoid.
What happens to Tesla's Dojo investment if FSD never achieves full autonomy?: Dojo's value is not conditional on full autonomy. Even if FSD remains a supervised driver-assistance system, the validation pipeline, running billions of inference passes per day against archived scenarios, requires the compute Dojo provides. The alternative is paying AWS or NVIDIA cloud margins on that workload indefinitely. The internal estimate that Dojo breaks even against equivalent cloud capacity at roughly four ExaPODs does not depend on the autonomy outcome. Beyond break-even, Tesla has indicated it intends to sell Dojo compute capacity to third parties, positioning the asset as an AI training infrastructure business rather than an internal cost centre.

The bottom line

Tesla's private inference architecture is not primarily a privacy statement. It is an engineering position forced by the physics of autonomous driving (you cannot wait 40 milliseconds for a cloud response when a pedestrian steps into the road), and it has compounded into a structural advantage that is now difficult for competitors to replicate. The on-vehicle model running on Hardware 4 is in the same size class as the small models AI labs describe as genuinely capable. The training infrastructure generating that model is purpose-built and increasingly owned. The fleet generating the training data is 4.2 million vehicles and growing. Each of these advantages reinforces the others in ways that do not show up in benchmark comparisons or product announcements.

The open questions are not about whether Tesla can build the technology. They are about whether the Cybercab launches on the timeline the infrastructure investment assumes, whether Hardware 5 delivers the parameter headroom needed to close the remaining human-intervention gap, and whether regulators in key markets treat Tesla's on-vehicle architecture as the compliance advantage Tesla's legal team believes it to be. Each of those questions resolves in the next 24 months. Venkataraman's team is building to a specific deadline. The rest of the industry is watching to see if it holds.