The Toolbox Problem: Why the AI Autonomy Debate Misses the Point
How to govern heteronomous systems
By Chris Hood
A recent LinkedIn exchange with Arun V. Chéarie clarified something I’ve been trying to correct in the industry for a while now. In responding to my argument that no current AI systems are truly autonomous, Arun offered a thoughtful counterpoint:
“The distinction isn’t whether the system escapes programming, but how wide the programmed decision space is, how much is deterministic logic versus probabilistic, statistical judgment.
If it’s overly deterministic, it’s effectively just rules.
As that space expands, rigorous testing, comprehensive evaluation of blind spots & risk mitigation become essential.”
He’s right that this distinction matters operationally. But he also revealed, in a great visual, why so many organizations are paralyzed when it comes to AI governance. We’re conflating two fundamentally different questions, and until we separate them, we’ll continue talking past each other while the real risks go unaddressed.
The Three-Way Conflation
Here’s what I see happening in nearly every AI strategy conversation: people use “autonomy” when they mean automation, or they use “autonomy” when they mean logic. These are not the same things, and treating them as interchangeable creates confusion that benefits no one except vendors engaged in what I call “agent washing.”
Let me propose an analogy that cuts through all three: the toolbox.
The Toolbox and the Dictionary
Consider how a large language model works with words. There are roughly 170,000 words in current use in the English language, with another 47,000 obsolete words. Just because the system has access to all of these words doesn’t mean it uses every word. More importantly, it doesn’t use the wrong words, at least not randomly.
If you ask about cooking, the system will use cooking words. Not car mechanic words. Not legal terminology. Not medical jargon. It selects contextually appropriate language based on its training and your prompt. This isn’t autonomy. It’s pattern matching at scale.
AI agents work the same way with tools. Configure an agent with access to 100 tools (APIs, databases, communication channels, analytical functions), and it will select from those tools based on context. Ask it to analyze sales data, and it reaches for analytics tools. It doesn’t grab a pizza delivery API to book a hotel room.
I’ve often described the absurdity of true random selection: it would be like selecting a screwdriver to make a sandwich. Systems don’t do this, but not because they’re exercising autonomy. They’re following patterns established through design and training.
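To make that concrete, here is a toy sketch in Python. The tool names and the keyword-overlap score are hypothetical stand-ins for the learned relevance scoring a real agent framework performs, but the shape of the operation is the same: score the options against context and take the highest score.

```python
# A toy sketch of "tool selection" as pattern matching. The tool names and the
# keyword overlap score are hypothetical stand-ins for the learned relevance
# scoring a real agent framework would perform.

TOOLBOX = {
    "sales_analytics": {"sales", "revenue", "quarter", "trend", "analyze"},
    "hotel_booking": {"hotel", "room", "reservation", "check-in", "travel"},
    "pizza_delivery": {"pizza", "order", "delivery", "toppings"},
}

def select_tool(task: str) -> str:
    """Pick the tool whose keywords best match the task description."""
    words = set(task.lower().split())
    scores = {name: len(words & keywords) for name, keywords in TOOLBOX.items()}
    return max(scores, key=scores.get)  # highest score wins; nothing is "chosen"

print(select_tool("analyze last quarter sales revenue trend"))  # -> sales_analytics
```

Swap the keyword sets for embeddings and you have the gist of the real thing: the scoring gets more sophisticated, but it never stops being selection against patterns someone else established.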
The Determinism Red Herring
Arun’s framing points to a distinction many people find compelling: deterministic logic versus probabilistic, statistical judgment. The implication is that probabilistic systems are somehow closer to autonomy because their outputs aren’t perfectly predictable.
But this is a red herring. Both deterministic and probabilistic approaches are automation choices. They’re engineering decisions about how to implement a system, not different positions on a spectrum toward autonomy.
A deterministic system follows explicit rules: if X, then Y. A probabilistic system calculates likelihood distributions and selects based on weighted outcomes. Neither system is deciding anything. Both are executing patterns. One through rigid logic, the other through statistical inference. The unpredictability of probabilistic outputs doesn’t indicate agency; it indicates complexity in the pattern-matching process.
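A small, hypothetical ticket-routing example illustrates the point. One routine follows an explicit rule; the other samples from weights a human (or a training process) supplied. The second is less predictable, but neither is deciding anything.

```python
import random

# Deterministic automation: an explicit rule. If X, then Y.
def route_deterministic(ticket_type: str) -> str:
    rules = {"billing": "finance_queue", "outage": "oncall_queue"}
    return rules.get(ticket_type, "general_queue")

# Probabilistic automation: sample from a weighted distribution.
# The weights here are hand-written; in a real model they come from training data.
def route_probabilistic(ticket_type: str) -> str:
    weights = {
        "billing": {"finance_queue": 0.8, "general_queue": 0.2},
        "outage": {"oncall_queue": 0.9, "general_queue": 0.1},
    }.get(ticket_type, {"general_queue": 1.0})
    queues = list(weights)
    return random.choices(queues, weights=[weights[q] for q in queues], k=1)[0]

# The second function's output is harder to predict, but both execute a pattern
# someone else encoded. Neither exercises judgment.
```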
Consider the word “judgment.” It’s doing a lot of heavy lifting in these conversations. When we say a system exercises “statistical judgment,” we’re borrowing a term that implies evaluation against self-determined criteria, or weighing options according to values and preferences the judge holds independently.
That’s not what’s happening. The system is calculating probability distributions based on training data and selecting outputs that score highest against predetermined optimization targets. There’s no judge. There’s no evaluation against personal criteria. There’s math.
True judgment requires a judger with their own standards. Current AI systems have neither.
What Autonomy Isn’t
Let’s be precise about what people are actually observing when they claim AI autonomy:
“It selects a tool on its own.” This is automation, not autonomy. A vending machine selects a product on its own when you press B7. We don’t call vending machines autonomous.
“The toolbox is bigger now, with more options.” This is expanded capability, not autonomy. A contractor with a 500-piece tool set isn’t more autonomous than one with a 50-piece set. They’re just better equipped.
“It runs 24/7 without human intervention.” This is availability, not autonomy. A gas station that’s open all night isn’t autonomous. It’s just open.
“It uses probabilistic rather than deterministic logic.” This is an implementation choice, not autonomy. Flipping a weighted coin is still flipping a coin someone else weighted.
The conflation grows more dangerous as toolboxes expand. When an agent suddenly gains access to 200 new tools instead of the original 100, people perceive something autonomous at work. But nothing autonomous occurred. Someone added tools. That’s a human action with human accountability.
The New Word Problem
Here’s where the vocabulary analogy becomes especially useful. If someone invents a new word tomorrow and adds it to a dictionary, an LLM doesn’t suddenly start using that word. It’s not in the training data. The system has no context for what it means or when to apply it.
The same applies to tools. If someone adds a new tool to an agent’s toolkit without providing context on what that tool does or when to use it, the agent won’t suddenly deploy it effectively. And if it deploys it incorrectly? That’s not autonomous decision-making. That’s a human error in configuration, training, or governance.
If a system has been programmed to look for new tools and incorporate them without human review, that’s a design choice. A human choice. The consequences of that choice belong to the humans who made it, not to some emergent machine autonomy.
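Here is what that design choice looks like as a sketch (the class and field names are hypothetical). Every path by which a tool enters the toolbox is an explicit, human-accountable action:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolSpec:
    name: str
    description: str        # the context the agent needs to use the tool well
    allowed_contexts: list  # where a human decided the tool may be used

class ToolRegistry:
    """Tools enter the toolbox only through an explicit, reviewable action."""

    def __init__(self, auto_incorporate: bool = False):
        # Allowing unreviewed tools is itself a human design choice, and the
        # consequences belong to whoever set this flag.
        self.auto_incorporate = auto_incorporate
        self.tools = {}

    def register(self, spec: ToolSpec, reviewed_by: Optional[str] = None) -> None:
        if not spec.description or not spec.allowed_contexts:
            raise ValueError(f"{spec.name}: refusing a tool with no usage context")
        if not self.auto_incorporate and reviewed_by is None:
            raise PermissionError(f"{spec.name}: human review required before use")
        self.tools[spec.name] = spec
```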
The Off-the-Shelf Problem
This becomes particularly acute when organizations use AI software they don’t own. SaaS platforms, cloud-based LLMs, and third-party agent frameworks are built by vendors on foundations that change outside your control. An update rolls out, and suddenly the system behaves differently than expected.
This isn’t autonomy either. It’s a dependency.
Enterprises have dealt with this problem for decades. Robust testing programs ensure that upgrades are compatible before deployment. WordPress maintains strict architectural safeguards to prevent millions of websites from going offline due to a single update. The testing and validation burden falls on the organization that uses the software, not on the software itself.
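In practice, that burden looks like a behavioral regression suite run before any upstream change is promoted. The call_agent function and the golden cases below are hypothetical placeholders for whatever API or framework you actually depend on:

```python
# Golden cases that must keep passing after every vendor or model update.
GOLDEN_CASES = [
    {"prompt": "Summarize the status of ticket 123", "must_use_tool": "ticket_lookup"},
    {"prompt": "Refund order 456", "must_use_tool": "refund_api", "needs_approval": True},
]

def run_regression(call_agent) -> list:
    """call_agent(prompt) is assumed to return a dict like
    {"tool": ..., "approval_requested": ...}."""
    failures = []
    for case in GOLDEN_CASES:
        result = call_agent(case["prompt"])
        if result.get("tool") != case["must_use_tool"]:
            failures.append((case["prompt"], "unexpected tool selection"))
        if case.get("needs_approval") and not result.get("approval_requested"):
            failures.append((case["prompt"], "skipped human approval"))
    return failures  # an empty list is your gate for promoting the update
```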
Yes, this becomes harder as systems grow more complex. But that returns us to the original point: no one is building an agent to do anything and everything possible. There may be research initiatives I’m unaware of pushing toward artificial general intelligence, but that would be the foundation required for a truly autonomous system, and we’re nowhere close.
What organizations are actually building are agents with specific goals, defined toolsets, and bounded decision spaces. The very fact that you’re building an agent with your goals is definitionally not autonomy. An autonomous system would have its own goals. Yours wouldn’t enter into it.
And it’s still up to you, not the system, to have the context and governance mechanisms in place to ensure the agent does what you want it to do within the larger ecosystem it runs in. When the underlying LLM updates, when new tools become available, when the decision space shifts, that’s your responsibility to manage, test, and validate.
The Governance Imperative
Now here’s where Arun’s framing becomes genuinely valuable: regardless of what we call it, complexity creates risk. If an agent’s decision space is “overly deterministic,” as he puts it, “it’s effectively just rules.” But as that space expands with more tools, more probabilistic inference, and more possible paths, the challenge of oversight grows exponentially.
This is the operational question, and it’s where organizations should focus their energy. Not “is this thing autonomous?” but “how do we govern a system whose behavior we can’t fully predict, even though we control its boundaries?”
The answer lies in recognizing that these systems are fundamentally heteronomous. They depend on external governance. They cannot govern themselves because self-governance requires the very autonomy they lack. This isn’t a limitation to overcome; it’s a design constraint to embrace.
This is where we need what I call Nomotic AI, from the Greek nomos, meaning law or governance. If agentic AI asks “what can this system do,” Nomotic AI asks “what should this system do.” It’s the complementary layer: intelligent governance systems that define behavior through adaptive authorization, verified trust, and continuous evaluation. Not rigid rules, but contextual enforcement that scales with capability.
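To make the idea less abstract, here is one way such a layer could look in miniature. This is a sketch under simplifying assumptions, not an implementation; the policies and conditions are placeholders:

```python
# A 'nomotic' check that sits between the agent and its toolbox: the agent
# proposes an action, the governance layer decides whether it proceeds.
POLICIES = {
    "read_sales_data": {"requires_human_approval": False, "max_calls_per_hour": 500},
    "send_customer_email": {"requires_human_approval": True, "max_calls_per_hour": 20},
}

def authorize(tool: str, calls_this_hour: int, human_approved: bool = False) -> bool:
    """Allow a tool call only if it satisfies the policy a human defined for it."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # unknown tools are denied by default, not discovered 'autonomously'
    if policy["requires_human_approval"] and not human_approved:
        return False
    return calls_this_hour < policy["max_calls_per_hour"]
```

The point is not these particular checks. It is that the authority to expand or constrain the agent’s behavior lives outside the agent, where it can be audited.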
Four Steps Forward
If you’re building or deploying AI agents, here’s where to start:
1. Understand what the system’s capabilities are, and what they are not. It is not autonomous; humans give it direction. This foundational clarity will help you understand what to test and how to test it. Stop looking for emergent behavior and start mapping bounded behavior.
2. If you use off-the-shelf software or third-party LLMs, recognize that they will change beyond your control. You need logic, governance, and testing processes for your side of the system. Release notes should help you identify additional tooling, language capabilities, and actions.
3. Ensure your agents have the context needed to do what you want them to do. Context isn’t just prompt engineering. It’s the full ecosystem: tool documentation, guardrails, validation layers, and feedback loops.
4. Consider a Nomotic solution that can handle these use cases more intelligently. Governance systems that sit alongside agentic systems to provide external rule-setting, boundary management, and adaptive oversight are the missing piece. If someone adds 100 new tools to your agent’s toolkit, a Nomotic layer can evaluate whether those tools should be accessible, under what conditions, and with what oversight, as sketched after this list.
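Here is a small sketch of that last step: admission control for newly added tools, with hypothetical field names and checks. Nothing reaches the agent until it clears boundaries a human set:

```python
NEW_TOOLS = [
    {"name": "refund_api", "description": "Issue customer refunds", "data_access": "financial"},
    {"name": "weather_api", "description": "Look up public weather data", "data_access": "public"},
]

def admit(tool: dict, allowed_data_access: set) -> bool:
    """Expose a new tool to the agent only if it passes human-defined checks."""
    if not tool["description"]:
        return False  # no context, no admission
    if tool["data_access"] not in allowed_data_access:
        return False  # outside the boundary someone defined for this agent
    return True

admitted = [t["name"] for t in NEW_TOOLS if admit(t, allowed_data_access={"public"})]
print(admitted)  # -> ['weather_api']; the refund API waits for explicit approval
```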
Why This Distinction Matters
The conflation of autonomy with automation and logic has real consequences. When we call systems “autonomous” that aren’t, we create false expectations about their capabilities and false fears about their risks. We also let vendors off the hook for governance failures by allowing them to shrug and say “the AI decided.”
No, it didn’t. Someone configured it, trained it, deployed it, and gave it access to tools. Those are all human decisions with human accountability.
Conversely, when we dismiss the governance challenge by insisting “it’s just automation,” we underestimate the real complexity of managing systems with large decision spaces and probabilistic behaviors. The toolbox may be bounded, but a poorly governed toolbox can still cause plenty of damage.
The path forward requires holding both truths simultaneously: these systems are not autonomous, and they require sophisticated governance precisely because they’re not.
The toolbox will keep getting bigger. The question isn’t whether your AI will become autonomous… it won’t. The question is whether your governance will scale with increased AI capabilities.
If you find this content valuable, please share it with your network.
Follow me for daily insights.
Schedule a free call to start your AI Transformation.
Book me to speak at your next event.
Chris Hood is an AI strategist and author of the #1 Amazon Best Seller Infallible and Customer Transformation, and has been recognized as one of the Top 40 Global Gurus for Customer Experience. His latest book, Unmapping Customer Journeys, will be published in April 2026.