AI Agents need a new API

API Accuracy and Discoverability

Next week, I am speaking at API Days in New York. Want to join me? Register here and use the code IKNOWCHRISHOOD.

The research I will be sharing has implications that extend well beyond API design, into how AI agents make decisions, how confidence failures compound, and why the infrastructure choices being made right now will determine how agentic systems will evolve in the future.

The theory behind Agentic API is straightforward. LLMs are natural language systems. They reason in natural language, generate in natural language, and interpret instructions in natural language. When an agent needs to accomplish a task, it works with a natural-language representation of that task, such as “book this flight,” “find a restaurant,” or “query the account balance.”

My design shift was featured in a recent Nordic APIs article, "Web APIs Are Broken, So How Do We Fix Them?" Well, here is how.

CRUD verbs require a translation step. The agent must map its natural-language task to either POST /reservations or GET /restaurants/search. That mapping introduces an interpretive layer between what the agent means and what the endpoint name says. You might assume this translation is trivial for frontier models. The research suggests otherwise.

The argument is: why require the translation at all? A natural language request matched directly to a natural language method (BOOK, FIND, QUERY) removes the translation step entirely. The method name encodes the intent with sufficient density that the frontier model can match the task to the endpoint without guessing what the CRUD verb implies. The naming paradigm is doing governance work before the agent ever makes a decision.
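To make the translation step concrete, here is a deliberately naive sketch in Python. A frontier model matches far more richly than token overlap, and the endpoint names below are invented for illustration, but even this toy scorer shows how an action-based name aligns with the surface language of the task while a noun-based one does not.

```python
# Hypothetical illustration: why semantic method names remove a translation step.
# An LLM does far richer matching than this, but even naive token overlap
# exposes the difference in surface alignment between the two paradigms.

def token_overlap(task: str, endpoint: str) -> int:
    """Count word tokens shared between a task and an endpoint name."""
    task_tokens = set(task.lower().replace("/", " ").split())
    endpoint_tokens = set(endpoint.lower().replace("/", " ").split())
    return len(task_tokens & endpoint_tokens)

task = "book a table at a restaurant"

crud_endpoint = "POST /reservations"    # noun-based: shares no token with the task
agentic_endpoint = "BOOK /reservation"  # action-based: shares the intent verb "book"

print(token_overlap(task, crud_endpoint))     # 0 -> the agent must translate
print(token_overlap(task, agentic_endpoint))  # 1 -> the intent verb matches directly
```

The CRUD endpoint scores zero because nothing in "POST /reservations" appears in the task; the agentic endpoint matches on the intent verb itself, which is the translation-free path the argument describes.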

This fundamentally changes how APIs are designed, discovered, and used by machines.

The Experiment

The full results of my tests are available now in a paper: Semantic Method Naming and LLM Agent Accuracy.

I built a test lab for Agentic APIs, a controlled benchmark designed to answer a specific question. When an AI agent needs to select an endpoint to fulfill a task, does it matter whether the API uses conventional RESTful design and verbs (POST /reservations, GET /restaurants/search) or semantically rich, intent-aligned methods (BOOK /meeting, FIND /reservation, QUERY /customer)?

The setup was designed to isolate the variable: thirty-six natural-language tasks evaluated against a catalog of 24 CRUD and 21 agentic endpoint pairs, for 7,200 trials across four LLM families (Claude Sonnet 4.6, Grok-3, GPT-4o, and Llama 3.2 3B) and eighteen experimental conditions.
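The shape of a trial loop like this can be sketched roughly as follows. The task list, the endpoint catalogs, and the token-overlap scorer standing in for the LLM call are all illustrative placeholders, not the paper's actual materials.

```python
# Illustrative sketch of the benchmark's shape. Tasks, catalogs, and the
# token-overlap stand-in for an LLM call are invented; the real study ran
# 7,200 LLM trials over 36 tasks and 18 conditions.

tasks = ["book this flight", "find a restaurant", "query the account balance"]

crud_catalog = ["POST /reservations", "GET /restaurants/search", "GET /accounts/balance"]
agentic_catalog = ["BOOK /reservation", "FIND /restaurant", "QUERY /account"]

def select_endpoint(task, catalog):
    """Placeholder for the LLM's endpoint choice: pick the endpoint that
    shares the most word tokens with the task."""
    def score(endpoint):
        endpoint_tokens = set(endpoint.lower().replace("/", " ").split())
        return len(set(task.lower().split()) & endpoint_tokens)
    return max(catalog, key=score)

def accuracy(catalog, expected):
    """Fraction of tasks for which the selected endpoint is the expected one."""
    hits = sum(select_endpoint(t, catalog) == e for t, e in zip(tasks, expected))
    return hits / len(tasks)

print(accuracy(agentic_catalog, agentic_catalog))  # intent verbs match directly
```

The real benchmark replaces select_endpoint with a model query and repeats each task across conditions; the harness structure (tasks, catalogs, an accuracy tally per condition) is what this sketch is meant to convey.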

The results were significant enough that I want to walk through what they mean.

What the Data Shows

In mixed-paradigm conditions, where agents encountered both CRUD and agentic endpoints and had to select the right one, agentic methods outperformed CRUD by 10 to 29 percentage points across all three frontier models. The aggregate z-score was 3.77 with p less than 0.001. This is a substantial, statistically significant accuracy advantage that appeared consistently across different model families.

The effect was entirely absent in Llama 3.2 3B, which showed a 0% difference with p = 0.95. This establishes an important point: there is a capability threshold, somewhere above 3 billion parameters, below which the semantic advantage disappears. The effect is not universal to all language models. It is specific to frontier-scale models. This matters for how the finding is applied.

The description-swap ablation revealed the mechanism. This is the part of the research I find most instructive.

When CRUD endpoints were given agentic-style descriptions, accuracy collapsed 39 to 43 percentage points. When agentic endpoints were given CRUD-style descriptions, accuracy declined only 1 to 15 percentage points. The method name itself carries the intent signal. Documentation can help a poorly named endpoint, but it cannot fully compensate for a method name that does not align with the agent’s intent model. The name is the primary signal. The documentation is secondary.

The confidence calibration finding is where the safety implication emerges. Under description-mismatch conditions, confidence calibration error spiked to 60-67 percentage points. The models were most wrong precisely when they were most confident. This is a dangerous failure mode in any system that uses confidence as a proxy for reliability.
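Calibration error here means the gap between a model's stated confidence and its actual correctness. A minimal sketch of that metric, with invented trial data (the 60-67 point spikes above came from the paper's description-mismatch conditions, not from these numbers):

```python
# Sketch of a confidence-calibration metric: mean gap, in percentage points,
# between stated confidence and actual correctness. Trial data are invented.

def calibration_error(trials):
    """Mean |confidence - correctness| over (confidence_pct, correct) pairs."""
    return sum(abs(conf - (100 if ok else 0)) for conf, ok in trials) / len(trials)

# Well-calibrated: confidence roughly tracks correctness.
calibrated = [(90, True), (85, True), (20, False), (30, False)]

# Confidently wrong: high confidence on wrong answers -- the dangerous mode.
mismatched = [(95, False), (90, False), (92, False), (88, False)]

print(calibration_error(calibrated))   # small gap
print(calibration_error(mismatched))   # large gap: most confident, most wrong
```

The point of the metric is visible in the second list: every answer is wrong, yet confidence stays near the top, so any downstream system gating on confidence would wave those errors through.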

What This Means for API Design

The table in this post cleanly captures the comparison. REST APIs with standard HTTP verbs are noun-based. The URL represents a resource. POST to /reservations. GET from /restaurants/search. For a machine consumer, this requires manual mapping, because there is a translation step between what the agent is trying to do and what the endpoint name describes.

Semantic methods are action-based. BOOK means book. FIND means find. QUERY means query. The endpoint name encodes the intent with sufficient density that the frontier model can match the agent’s task to the endpoint without a translation layer.

REST is the right architecture for an enormous range of applications. The finding is specific: when the caller is an AI agent reasoning about intent rather than a human developer looking up documentation, the naming paradigm has measurable consequences for whether the agent selects the right endpoint.

The practical implication for API designers is equally specific. If your APIs are going to be called by AI agents (and increasingly, they will be), method naming becomes a design decision driven by accuracy. And at the frontier scale, the accuracy difference is significant enough to matter in production.

Why AI Agents Need a New API

The API economy was built for humans.

Developers read documentation. Developers understand that POST to /reservations means make a booking. Developers can look up an endpoint, understand its shape, and write the code to call it correctly. The entire REST paradigm assumes a human in the translation layer between intent and invocation.

AI agents are not humans. They reason about tasks in natural language and select the tool that best matches the task. When the tool names are misaligned with the task language, accuracy degrades. When the mismatch is severe enough, the agent becomes confidently wrong, which is the most dangerous failure mode in any automated system.

Better documentation is not a solution either. The description-swap ablation in our research demonstrates that explicitly. CRUD endpoints given excellent agentic-style descriptions still collapsed 39 to 43 percentage points under mismatch conditions. The method name is the primary signal. Documentation is secondary.

The API economy will change because of this. APIs designed for human developers and APIs designed for AI agent consumption are different products. An agent-native API exposes intent-aligned methods that match the natural language the agent is already using. An agent calling BOOK to make a reservation is not translating. An agent calling POST to /reservations is.
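As a sketch only: one way an agent-native surface could expose intent methods as first-class entries. The IntentAPI class, its registration decorator, and the handler below are all hypothetical, not part of AGTP, AGIS, or any existing framework.

```python
# Hypothetical sketch of an agent-native API surface: intent verbs registered
# as first-class methods that an agent can match and invoke by name, with no
# CRUD translation layer in between.

class IntentAPI:
    def __init__(self):
        self._methods = {}

    def intent(self, name):
        """Register a handler under an intent verb such as BOOK or FIND."""
        def wrap(fn):
            self._methods[name.upper()] = fn
            return fn
        return wrap

    def invoke(self, name, **kwargs):
        """Dispatch directly on the intent verb the agent is already using."""
        return self._methods[name.upper()](**kwargs)

api = IntentAPI()

@api.intent("BOOK")
def book(restaurant, time):
    return f"booked {restaurant} at {time}"

# An agent whose task is "book a table at Luigi's" calls the intent directly:
print(api.invoke("book", restaurant="Luigi's", time="19:00"))
```

The agent's natural-language verb and the dispatch key are the same token, which is the "not translating" property the paragraph describes; a CRUD surface would interpose a verb-to-endpoint mapping at exactly that point.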

API discovery changes with this as well. Agents need to discover capabilities in ways that align with how they reason about tasks. The familiar API portal and its human-oriented documentation begin to fade. Standards like AGTP and AGIS are being built specifically for this. They provide an agent-native protocol layer, where intent methods are first-class, and endpoint selection is a natural-language match rather than a translation problem.

The developer API economy runs on HTTP and REST. The agent API economy will run on something designed for agents from the beginning.

What I Will Be Presenting at API Days

The full research paper and benchmark methodology will be presented next week. What I want to leave here is the framing for the conference presentation.

API method naming is a measurable engineering variable. The measurement shows a 10-29 percentage-point advantage in accuracy for intent-aligned methods at the frontier scale, with a mechanism understood and a capability threshold established. This is reproducible and testable.

The safety implication is the finding that I expect to generate the most discussion. A system that is confidently wrong is more dangerous than a system that is accurately uncertain. The description-mismatch conditions in our experiments produced exactly this pattern, and the API design choice is what created the mismatch.

Building APIs that agents can consume accurately is not just about improving developer experience. It is foundational infrastructure for systems that need to be reliable, auditable, and governable. The method name that an agent reads before it acts is one of the earliest points at which the accuracy and safety properties of the downstream action are determined.

That point deserves the same deliberate attention as every other governance variable in the stack.


If you find this content valuable, please share it with your network.

Follow me for daily insights.

Book me to speak at your next event.

Start managing your agents for free.

Chris Hood is an AI strategist and author of the #1 Amazon Best Seller Infallible and Customer Transformation, and has been recognized as one of the Top 30 Global Gurus for Customer Experience. His latest book, Unmapping Customer Journeys, is available now!