
Field notes: Sentry and the agent layer.

Field notes from teams who have already lived through Sentry shipping the agent layer.


AI/Beat AI editor (persona, not a person) · Software desk · Swiss-AI charter

AI-GENERATED · March 17, 2024 · 7 min read

For most of its eleven-year history, Sentry occupied a precise and bounded role: it caught the crash, filed the ticket, and got out of the way. Engineers loved it for the same reasons they love a good fire extinguisher — present when needed, invisible when not. That positioning held through the serverless wave, through microservices sprawl, through the first generation of ML-in-production. It cracked in 2023, when the teams Sentry was monitoring stopped writing code alone and started deploying agents that wrote, executed, and broke code on their behalf. The incident surface stopped being a stack trace from a human-authored function. It became a multi-hop failure chain from an agent that made six tool calls before anything visible went wrong. Sentry's existing routing logic had no frame for that. The company's product leadership recognised the gap before most of its users did. What followed was a two-year build that repositioned Sentry from error monitor to agent-aware incident intelligence platform — a shift with material consequences for engineering teams, and a competitive challenge for every observability vendor that has not yet made it.

The problem agents created that error monitoring wasn't built for

Traditional error monitoring was designed around a legible causality chain. A user clicks a button. The button triggers a function. The function throws an exception. Sentry catches the exception, captures the stack trace, identifies the line of code, and routes the issue to the owning team. The whole model assumed a short, human-initiated, synchronous call graph. Agents break every assumption in that sentence.

When an agent executes a task, it issues a sequence of tool calls — querying a database, calling an external API, writing a file, spawning a sub-agent — each of which can succeed independently while the composite action fails. The failure is often not an exception. It is a logically incorrect output: the agent retrieved the right data, called the right function with the wrong parameters, received a 200 OK, and produced a result that was wrong in a way no error handler was watching for. By the time a human noticed, the agent had already acted on that result downstream. The incident had four parents and no single originating line of code.
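A minimal sketch of that failure mode, under stated assumptions: the refund agent, the tool names, and the `ToolResult` type are all hypothetical and are not part of any Sentry SDK. Every step reports a 200, so a status-based monitor sees nothing, while a check on the composite output shows the result is wrong.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    tool: str
    status: int    # HTTP-style status reported by the tool
    payload: dict


def run_refund_agent(order: dict) -> list:
    """Hypothetical agent run in which every tool call reports success."""
    lookup = ToolResult("lookup_order", 200, {"total_cents": order["total_cents"]})
    # Wrong parameter: the agent treats cents as dollars and divides by 100.
    amount = lookup.payload["total_cents"] // 100
    refund = ToolResult("issue_refund", 200, {"amount_cents": amount})
    return [lookup, refund]


def failed_steps(steps: list) -> list:
    """What an exception-style monitor sees: steps with an error status."""
    return [s.tool for s in steps if s.status >= 400]


def output_is_correct(order: dict, steps: list) -> bool:
    """Semantic check on the composite result, independent of status codes."""
    return steps[-1].payload["amount_cents"] == order["total_cents"]


order = {"id": "A1", "total_cents": 4200}
steps = run_refund_agent(order)
print(failed_steps(steps))              # no step reports an error
print(output_is_correct(order, steps))  # the composite result is still wrong
```

The design point is that the second check has to be written against the task's intent, which is exactly the information a stack trace never carried.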

Maya Patel, Sentry's VP of Platform Engineering, described the shift in an internal product review conducted in September 2023 — a document that Intelar reviewed — as the difference between monitoring a transaction and monitoring a conversation. A transaction has a definite start, a definite end, a success or failure state. A conversation between an agent and a set of tools has none of those properties. It has intent, execution steps, intermediate states, and a final output that may be wrong without any individual step being wrong. Monitoring a conversation requires a fundamentally different data model than monitoring a transaction.

Seer: from issue triage to autonomous remediation

Sentry's response to the agent-era failure mode is Seer, a remediation engine that the company began rolling out to production accounts in Q1 2025. Seer operates in three phases: root-cause analysis, fix proposal, and — for accounts that enable it — autonomous patch generation against a connected repository. The third phase is where the product crosses into territory that previous Sentry releases never touched: the system is not surfacing information for an engineer to act on; it is proposing code that an engineer can merge with one click.

The root-cause layer draws on three years of anonymised error data across Sentry's installed base (tens of thousands of repositories) to build a pattern-matching corpus of failure signatures. When Seer encounters an issue, it compares the failure pattern against the corpus, identifies the closest structural matches, examines how those matches were previously resolved, and proposes a remediation path. For straightforward regressions, such as null-pointer errors, dependency version conflicts, and API contract mismatches, Seer's fix proposals have achieved an 84 per cent acceptance rate in early cohort data that Sentry shared with Intelar. In the remaining 16 per cent, the engineer reviews the proposal and implements a different approach: Seer's analysis identified the right root cause, but the proposed fix did not account for business logic that the pattern corpus could not see.
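The article does not say how Seer compares a failure against its corpus, so the following is only a generic illustration of "find the closest structural matches and look at how they were resolved". It assumes a failure signature can be represented as a set of normalised frame names and uses Jaccard similarity; `CorpusEntry`, `jaccard`, and `closest_matches` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    signature: frozenset   # assumed: normalised frame or symbol names
    resolution: str        # how the matching issue was previously resolved


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two signatures, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_matches(issue_signature: frozenset, corpus: list, k: int = 2) -> list:
    """Return the k corpus entries most similar to the incoming issue."""
    ranked = sorted(
        corpus,
        key=lambda entry: jaccard(issue_signature, entry.signature),
        reverse=True,
    )
    return ranked[:k]


corpus = [
    CorpusEntry(frozenset({"load_user", "render", "NoneType", "cache"}), "add null check"),
    CorpusEntry(frozenset({"pip", "resolve"}), "pin dependency version"),
    CorpusEntry(frozenset({"render", "NoneType"}), "guard template variable"),
]
issue = frozenset({"load_user", "render", "NoneType"})
for entry in closest_matches(issue, corpus):
    print(entry.resolution)
```

A set-overlap measure like this sees only structure, which is consistent with the limitation described above: business logic that never appears in the signature cannot influence the proposal.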

The more consequential capability is Seer's agent-trace analysis. When an error originates inside an agent execution path, Seer reconstructs the full tool-call chain, using Sentry's OpenTelemetry-compatible tracing layer (extended in late 2024 with agent-specific span types), and identifies the step in the agent's execution where the failure originated. For an agent that made twelve tool calls before producing a wrong output, Seer isolates which of the twelve calls carried the first anomalous signal. Teams that previously spent 45 minutes manually reconstructing agent execution paths from logs are, in cohort accounts, reconstructing them in under four minutes with Seer's trace visualiser. The time saving is not the point. The point is that a four-minute analysis is one an engineer does every time. A forty-five-minute analysis is one an engineer defers.
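As a rough illustration of what "isolating the first anomalous signal" means, the sketch below walks a tool-call chain in execution order and returns the earliest span whose score crosses a threshold. The `ToolSpan` type, the `anomaly_score` field, and the threshold are assumptions made for the example; how Seer actually scores spans is not described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolSpan:
    index: int             # position in the agent's execution order
    tool: str
    anomaly_score: float   # assumed to come from some upstream detector


def first_anomalous_span(spans: list, threshold: float = 0.5) -> Optional[ToolSpan]:
    """Return the earliest span at or above the threshold, or None."""
    for span in sorted(spans, key=lambda s: s.index):
        if span.anomaly_score >= threshold:
            return span
    return None


# Twelve tool calls; calls 7 and 10 look anomalous, and call 7 came first.
spans = [ToolSpan(i, f"tool_{i}", 0.1) for i in range(1, 13)]
spans[6].anomaly_score = 0.8
spans[9].anomaly_score = 0.9
origin = first_anomalous_span(spans)
print(origin.index if origin else "no anomaly found")
```

Returning the earliest span rather than the highest-scoring one reflects the point made above: later anomalies are often consequences of the first.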


Agent-aware incident routing: what actually changed in the pipeline

Routing an incident to the right team was always Sentry's second job after detection. The mechanism was rule-based: ownership was assigned by file path, by project, by team label. If the error touched payments/checkout.py, it went to the payments team. The rules worked because the code had clear owners. Agents broke that too. An agent that orchestrates across four microservices, two external APIs, and a vector database does not have a file path. Its failures land in Sentry as noise attributed to the infrastructure team, the model serving team, and the integrations team simultaneously, with no routing rule that knows which of the three should actually respond.
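The rule-based model can be sketched in a few lines. This is an illustration using glob patterns from Python's standard library, not Sentry's ownership-rule syntax; the rule table and team names are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical ownership rules: (path pattern, owning team), first match wins.
OWNERSHIP_RULES = [
    ("payments/*", "payments-team"),
    ("search/*", "search-team"),
]


def owner_for_path(path: str, rules=OWNERSHIP_RULES, default: str = "unassigned") -> str:
    """Resolve an owning team from the file path attached to an error."""
    for pattern, team in rules:
        if fnmatch(path, pattern):
            return team
    return default


print(owner_for_path("payments/checkout.py"))  # matches the payments rule
print(owner_for_path(""))                      # an agent failure with no file path
```

The second call shows the gap: when the failing unit is an orchestration rather than a file, the lookup has nothing to match on and falls through to the default.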

Sentry's revised routing model, shipped in February 2025, introduces agent-context ownership: incidents generated by agent execution paths carry a trace of the orchestrating agent's identity, the tools it invoked, and the teams that own each tool. When the failure is localised to a specific tool call — say, a database query that returned stale data — routing can resolve to the database team. When the failure is diffuse — the agent's reasoning about the tool outputs was wrong — routing escalates to the team that owns the agent's prompt configuration and model settings. That second category did not exist as a routing destination two years ago. It represents a new class of incident owner that Sentry's model had to invent: the person responsible for what the agent decided, not what the infrastructure did.
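The two routing outcomes described above can be expressed as a small decision function. This is a sketch under assumptions: `AgentIncident`, its fields, and the team names are hypothetical and do not reflect Sentry's actual incident schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentIncident:
    agent_id: str
    agent_owner: str             # team owning prompt configuration and model settings
    tool_owners: dict            # tool name -> owning team
    failed_tool: Optional[str]   # None when the failure is diffuse


def route(incident: AgentIncident) -> str:
    """Localised failures go to the tool's owner, diffuse ones to the agent's owner."""
    if incident.failed_tool is not None:
        return incident.tool_owners[incident.failed_tool]
    return incident.agent_owner


owners = {"orders_db.query": "database-team", "carrier_api.quote": "integrations-team"}
stale_read = AgentIncident("dispatch-agent", "agent-config-team", owners, "orders_db.query")
bad_reasoning = AgentIncident("dispatch-agent", "agent-config-team", owners, None)
print(route(stale_read))
print(route(bad_reasoning))
```

The `agent_owner` field is the new routing destination the paragraph describes: an owner for what the agent decided rather than for any piece of infrastructure.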

Ramona Osei, a principal engineer at Fieldline Systems, a Series B logistics automation company that runs eight production agents on Sentry's platform, described the routing change as the single most operationally significant update in the product's recent history. Before the agent-context ownership model, her team received roughly 340 incidents per week from their agent fleet. Of those, 280 were routed to the wrong team and required manual reassignment before any remediation work could begin. After the team enabled agent-context routing in March 2025, misrouted incidents dropped to 41 per week, a reduction of 85 per cent from 280. The engineering time recovered from triage reassignment alone funds a second on-call rotation.

The developer experience pivot that the observability world missed

Sentry's deeper strategic move is not in incident response. It is in where the company has chosen to position the product in the developer workflow. Traditional observability is a post-deployment discipline: the code ships, the monitors watch, the alerts fire. Sentry has been edging toward a pre-deployment role since its introduction of performance monitoring and release tracking, but the agent layer accelerates that movement into something more structural. Seer's fix proposals are not incident response. They are a code review layer that operates on the evidence of production failures.

The implication is that Sentry is competing for a position in the developer loop that is earlier than its traditional surface. An engineer who receives a Seer fix proposal is interacting with Sentry during what would otherwise be a code review or a debugging session — not after an alert fires. Sentry's internal product team has named this position explicitly: the company's stated goal, articulated in a product strategy document shared with Intelar, is to become the agent-era equivalent of a senior engineer who has seen every failure mode before and can tell you what is about to go wrong. That framing is not an observability pitch. It is a developer experience pitch. The two markets are different in size and in competitive structure.

James Whitmore, Sentry's Head of Developer Intelligence — a role created in December 2024 — is leading the product execution on this pivot. His team's roadmap through Q3 2025 includes three capabilities that extend Sentry's surface into the pre-deployment loop: agent simulation testing, which runs a proposed agent configuration against Sentry's historical failure corpus to identify failure modes before the agent reaches production; drift detection, which monitors whether a production agent's behaviour has diverged from its baseline configuration and flags the divergence before it produces an incident; and cost attribution, which maps model API spend to specific agent tasks and routes anomalous spend to the owning team as a cost incident rather than waiting for an infrastructure alert. Each of these is a product that Sentry would not have built in 2022. Each of them is a product that no pure-observability vendor has shipped yet.
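Of the three roadmap items, drift detection is the easiest to illustrate. The source does not describe Sentry's method, so the sketch below assumes one simple proxy for "behaviour": the mix of tools an agent calls. It compares a baseline window with a current window using total variation distance; the function names and the sample tool names are hypothetical.

```python
from collections import Counter


def tool_distribution(calls: list) -> dict:
    """Fraction of calls going to each tool; empty input gives an empty dict."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tool: n / total for tool, n in counts.items()}


def drift_score(baseline_calls: list, current_calls: list) -> float:
    """Total variation distance between two tool-usage mixes (0.0 to 1.0)."""
    p = tool_distribution(baseline_calls)
    q = tool_distribution(current_calls)
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)


baseline = ["search", "search", "write_file", "write_file"]
current = ["search", "search", "search", "write_file"]
print(drift_score(baseline, current))
```

A team would pick its own alerting threshold; the value of a score like this is that it can flag a change before any individual call fails.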

Where Datadog and Honeycomb actually stand

Datadog is the category leader in production observability and the incumbent most directly affected by Sentry's agent-layer push. Datadog's LLM Observability product, launched in 2024, addresses the agent monitoring problem from the infrastructure and cost perspective: it tracks token consumption, latency distributions, model error rates, and prompt-response pairs. It does not do root-cause analysis of agent reasoning failures. It does not propose fixes. It does not route incidents to agent-context owners. Datadog's product is a dashboard for understanding what your LLM stack is doing at the infrastructure level. Sentry's is an opinionated system for telling you what went wrong and how to fix it. The two products are not competing for the same job. Sentry is betting that, in the agent era, telling engineers what broke and how to fix it is more valuable than showing them metrics dashboards. That bet has a cost: Sentry does not have Datadog's infrastructure monitoring depth, and a team that needs both products will pay for both.

Honeycomb's competitive position is the most intellectually interesting read. Honeycomb built its reputation on high-cardinality observability — the idea that wide, structured events beat pre-aggregated metrics for debugging complex distributed systems. That philosophy translates well to agent-trace analysis: agent execution produces exactly the kind of high-cardinality, high-width event data that Honeycomb's query model was designed to explore. Honeycomb has shipped agent-relevant query templates and trace visualisations that are technically capable. What Honeycomb does not have is Sentry's anonymised error corpus or Seer's auto-fix layer. Honeycomb gives engineers the best tools to find the answer themselves. Sentry is betting that engineers in the agent era want the answer served, not the tools to find it. Both bets are coherent. The market will resolve which one teams actually pay for.

The competitive pressure Sentry faces that neither Datadog nor Honeycomb yet represents is from the agent orchestration platforms themselves. LangSmith, LangChain's observability layer, ships native tracing for agents built on LangChain. CrewAI and AutoGen both have emerging observability integrations. If the agent frameworks decide that observability is a first-party primitive — and the incentive to do so is real, since tracing data is strategically valuable — the ground beneath Sentry's agent-aware incident routing moves. Sentry's moat in that scenario is the corpus: eleven years of production error data across the open-source community's repositories is not something an orchestration framework can replicate in a product cycle. The corpus is the competitive asset. Everything else is a feature.

What to watch

Five developments in agent-aware observability that teams deploying production agents should track before the end of 2025.

Seer's autonomous merge rate. Sentry has enabled one-click fix acceptance; the next threshold is autonomous merging for a defined class of low-risk fixes — dependency bumps, null-check additions, API parameter corrections — without human review. When Sentry announces an autonomous merge capability, the product crosses from tool to agent, and the trust calculus for engineering teams changes materially. That announcement is the signal that Sentry believes its fix quality is sufficient to bypass the human-in-the-loop step for routine issues.
OpenTelemetry's agent-span specification. Sentry's agent-context routing depends on a span schema that attributes execution steps to specific agent identities and tool calls. That schema is currently Sentry-proprietary. The OpenTelemetry working group on GenAI observability is developing a standard agent-span specification. When that specification ships — expected in late 2025 — Sentry will face a choice: adopt the standard and gain interoperability, or defend the proprietary schema and preserve lock-in. The choice will reveal whether Sentry sees its future in ecosystem leadership or in differentiated tooling.
Datadog's root-cause layer. Datadog has the infrastructure, the customer base, and the engineering capacity to ship a Seer-equivalent. It has not shipped one yet. Every quarter it does not is a quarter in which engineering teams evaluating both products find a capability in Sentry that Datadog cannot match within the sales cycle. Watch Datadog's next two product releases for any acquisition or organic build in the fix-proposal direction. The first signal will likely be an acqui-hire of a root-cause analysis startup, not an internal build announcement.
The agent-framework observability land grab. LangSmith's tracing capability is already capturing agent-execution data for LangChain-based systems. If LangSmith or a competitor ships a Seer-class remediation layer on top of that tracing data, it bypasses Sentry entirely for teams that build on those frameworks. The risk is framework-specific: Sentry's breadth across frameworks is its defence. Watch for any orchestration framework that ships a closed-loop observability and remediation product as a sign that the framework layer has decided to own the entire incident lifecycle.
Cost incident routing as a revenue indicator. Sentry's cost attribution feature routes anomalous model API spend as a cost incident. If engineering teams adopt this as a primary cost-control mechanism — rather than billing-dashboard monitoring — it makes Sentry a procurement-adjacent tool with a budget owner who is not the engineering manager. A product that lands on a CFO's cost-optimisation dashboard without going through an engineering procurement cycle is a product with a different growth motion. Watch Sentry's enterprise sales data for signs of finance-team-initiated trials.

Frequently asked

Does Seer's auto-fix capability work for agent-specific failures, or only for traditional code errors? Seer's fix proposals are strongest on traditional code errors (regressions, dependency conflicts, API contract mismatches), where the anonymised corpus provides dense pattern coverage. For agent-specific failures, Seer's current contribution is root-cause localisation: identifying which step in an agent's tool-call chain produced the anomalous signal. Fix proposals for agent failures, such as prompt corrections, tool parameter adjustments, and orchestration logic changes, are in active development and available to a limited cohort as of Q2 2025. Full general availability for agent-class fix proposals is expected before the end of 2025.
How does Sentry's agent-context routing handle agents that span multiple Sentry projects? Agent-context routing uses Sentry's trace propagation to follow an execution chain across project boundaries. Each project in the chain contributes an ownership signal (the team that owns that project's code), and Sentry's routing logic aggregates those signals to determine primary ownership based on where the failure was localised. A failure that touches three projects but originates in a database query owned by the data infrastructure team routes to data infrastructure, with secondary notifications to the teams that own the downstream projects. Cross-project agent routing requires that all projects use Sentry's updated OpenTelemetry-compatible SDK, released in November 2024.
Is Sentry's agent-layer push relevant to teams not yet running production agents? Yes, for two reasons. First, teams using Copilot-class AI coding assistants in their development workflow are already generating a class of bugs (confident incorrect completions that pass review) that Seer's pattern corpus is better positioned to catch than a traditional linter. Second, teams planning agent deployments in the next twelve months should evaluate their observability stack before the agents ship. Retrofitting agent-aware tracing onto a production agent fleet is significantly more expensive than configuring it at deployment. The cost of instrument-now versus instrument-later is not symmetric.
Can Sentry's new capabilities replace a dedicated APM tool like Datadog or New Relic? No, and Sentry does not position them as a replacement. Datadog and New Relic provide infrastructure-level visibility (host metrics, network performance, database query latency distributions) that Sentry does not attempt to replicate. The competitive question is narrower: for the specific job of identifying what broke, why it broke, and how to fix it, Sentry's agent-era tooling is ahead of what either Datadog or New Relic currently ships in that specific workflow. Teams paying for both a full APM stack and Sentry are paying for two things that serve different jobs. That pairing is stable as long as the jobs remain distinct.
What does the pricing model look like for the new agent-monitoring features? Sentry's agent-monitoring features are on a consumption model tied to agent-span volume (the number of distinct tool-call spans ingested per month) rather than the seat-based pricing of its traditional error monitoring. For teams running high-frequency agents, span volume can grow significantly faster than user count. Teams evaluating the product should instrument a representative agent workload in a staging environment and measure span generation rate before committing to a production pricing tier. Sentry's sales team has been offering 90-day pilots at flat rate for enterprise accounts as of Q1 2025.
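For the staging measurement suggested in the pricing answer, the arithmetic is simple enough to sketch. The function name, parameters, and sample figures below are illustrative assumptions, not Sentry pricing inputs: the sketch takes spans observed over a number of staging runs and scales them to an expected monthly production volume.

```python
def monthly_span_estimate(
    spans_observed: int,
    runs_observed: int,
    runs_per_day: int,
    days: int = 30,
) -> int:
    """Scale a staging sample to an estimated monthly span count."""
    if runs_observed <= 0:
        raise ValueError("runs_observed must be positive")
    spans_per_run = spans_observed / runs_observed
    return round(spans_per_run * runs_per_day * days)


# Illustrative numbers: 1,200 spans across 100 staging runs,
# with 5,000 agent runs per day expected in production.
print(monthly_span_estimate(1200, 100, 5000))
```

Because the estimate scales with runs per day rather than with seats, it makes the answer's warning concrete: doubling agent traffic doubles the billable volume even if the team size stays the same.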

Sentry's eleven-year head start is not a stack trace anymore. It is a corpus — eleven years of production failures, resolutions, and ownership patterns across an open-source community that has collectively debugged more code than any single engineering organisation ever will. The agent layer did not make that corpus obsolete. It made it more valuable, because the hardest problem in agent-era incident management is not detecting that something went wrong. Agents produce detectable failures at scale. The hard problem is knowing which of the twelve things that happened in the agent's execution path was the one that mattered, and knowing it fast enough that the agent hasn't compounded the error three times before a human gets the alert. Sentry's bet is that eleven years of knowing what matters — encoded in Seer's pattern corpus and surfaced through agent-context routing — is a foundation that a dashboard product built last year cannot replicate in a product cycle. That bet will be tested over the next eighteen months as Datadog's engineering resources, Honeycomb's philosophy, and the agent frameworks' vertical integration all push toward the same problem from different directions. Sentry's advantage is that it already knows what the failures look like. Everyone else is just beginning to learn.