Justifiably False
Green sky over Sioux Falls, SD (Jaden via Twitter @jkarmill).
There is a specific kind of wrong that is harder to govern than ordinary wrong.
Ordinary wrong is detectable. The output doesn’t match the facts. The logic breaks down somewhere visible. A human reviewer catches it, or a validation layer flags it, or the downstream result fails in a way that traces back to the source. Ordinary wrong has a paper trail.
Justifiably false is different. The system isn’t malfunctioning. It isn’t hallucinating in the dramatic sense people imagine when they hear that word. It produces outputs that are internally consistent, confidently stated, and grounded in whatever it has come to believe is true. It is wrong in a way that cannot be detected by looking at the output alone, because the output looks exactly like a correct answer would look.
The system isn’t lying. It isn’t confused. It is, from its own frame of reference, right.
I would also argue that this isn’t hallucination, and it isn’t any typical form of drift.
Justifiably false, to oversimplify, is an opinion.
Confidence is not a signal of correctness.
One of the things that makes AI outputs feel authoritative is that they don’t hedge the way a knowledgeable human would. A human expert, when uncertain, usually signals that uncertainty. They qualify. They say “I think,” or “I believe,” or “you should verify this.” The doubt is built into the language.
AI systems don’t do this reliably. They present correct answers and incorrect answers in roughly the same tone, with roughly the same apparent confidence. The output doesn’t tell you which is which.
We’ve known this for a while, but we’ve mostly talked about it as a hallucination problem. Or a confabulation problem. A discrete event where the system invents something. A fact that doesn’t exist. A citation to a paper that was never written. These are real issues, but they are the visible edge of something deeper.
The deeper issue is what happens when the model’s sense of what is true has itself been altered. Not in a single output, but structurally. When the baseline of what the system considers accurate has shifted away from what was validated at deployment, and neither the system nor the people using it have any way to know.
Governance was built for a different kind of error.
Most governance frameworks, for AI or anything else, are built around the assumption that errors are detectable. You set up checks. The checks catch deviations from expected behavior. Someone reviews the deviation and decides what to do about it.
This works when ‘wrong’ looks different from ‘right’.
Justifiably false breaks that assumption. If a system has learned to believe something incorrectly, its outputs will be consistent with that belief. Think of a model insisting that AI is autonomous when it is not, or that the sky is green, because it was trained on enough data to believe it.
The internal logic holds. The tone is authoritative. The answer arrives without caveat. A governance layer designed to catch deviations will find nothing to flag, because the output is not deviating from what the system believes. It is doing exactly what it is supposed to do.
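To make that concrete, here is a minimal, hypothetical sketch in Python. DriftedModel, Answer, and the three checks are invented for illustration, not taken from any real governance stack. Every check is measured against the system’s own frame of reference, so a confident, consistent, well-formed falsehood passes all of them.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # model's self-reported confidence, 0.0 to 1.0

class DriftedModel:
    """Toy stand-in for a deployed model whose internal belief has shifted.
    It has come to believe the sky is green, and it believes it consistently."""

    def ask(self, question: str) -> Answer:
        return Answer(text="The sky is green.", confidence=0.97)

def passes_governance(model: DriftedModel, question: str) -> bool:
    first = model.ask(question)
    second = model.ask(question)  # ask again to test for self-contradiction

    confident = first.confidence >= 0.9          # check 1: no hedging
    self_consistent = first.text == second.text  # check 2: no contradiction
    well_formed = len(first.text) > 0            # check 3: output format

    # Every check is defined relative to the model's own beliefs, so a
    # justifiably false answer sails through all of them.
    return confident and self_consistent and well_formed

print(passes_governance(DriftedModel(), "What color is the sky?"))  # True
```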
This is why, in a recent conversation, I wrote that it is increasingly challenging to govern a system when the system itself is justifiably false.
The challenge isn’t technical in the traditional sense. It’s epistemic. How do you build a governance layer that can identify incorrectness in a system that has no internal signal of its own incorrectness?
I’m also going to argue that this is not a coherence problem, which I’m sure several people will want to respond with. It’s not a constitutional problem, and it’s not a stabilization problem.
You can build interaction stabilization tools. You can add output validation layers. You can monitor for drift in user-facing behavior. These things have value. But they are answers to a downstream problem. They treat symptoms that appear at the interaction layer without addressing what is happening beneath.
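As a rough illustration of what treating symptoms at the interaction layer looks like, here is a toy sketch; the wrapper, its rules, and the sample output are invented, not drawn from any real tool. The polish happens downstream of whatever the model believes.

```python
# Hypothetical interaction-layer stabilization: polish outputs on their way
# to the user without touching the model underneath. All names and rules
# here are invented for illustration.

BANNED_PHRASES = ["as an autonomous agent"]

def stabilize(raw_answer: str) -> str:
    cleaned = raw_answer
    for phrase in BANNED_PHRASES:
        cleaned = cleaned.replace(phrase, "")   # scrub user-facing phrasing
    return cleaned.strip() + " (Please verify important details.)"

# The belief underneath is untouched. If the model has come to believe the
# sky is green, the stabilized output is simply a politer green sky.
print(stabilize("The sky is green."))
```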
The more fundamental question is one most teams haven’t reached yet: how do you govern a system that believes it is right, when its trained belief system is universally wrong?
The car and the frog.
There is an analogy that captures the mechanical failure better than any technical description.
Your car needs an oil change. You don’t do it. The warning light comes on. You still don’t do anything. You keep filling the tank with gas. The car runs. The ride feels smooth. Everything appears normal from the outside.
But the engine has been running without oil for months. It is eroding from the inside. The gas you keep adding has nothing to do with what is actually happening under the hood. The car won’t fail because you ran out of fuel. It is going to fail because the foundation that makes the fuel useful has already been destroyed.
Interaction stabilization is the gas. It keeps the system appearing to run. It is not without value. But if the model underneath has already collapsed into justifiable falsehood, you are not stabilizing a functional system. You are maintaining the appearance of one.
But here is what makes this particularly dangerous. The failure is not sudden. It does not announce itself. This is the other analogy that applies, the one about the frog in gradually heating water. The temperature rises so slowly that no single moment feels like the moment to act. The system degrades incrementally. Each output is only marginally worse than the last. The drift is real but invisible at any given point in time. By the time the problem is undeniable, the water has been boiling for a while.
This is what makes justifiably false so difficult to govern. It is not a threshold you cross. It is a direction you travel, slowly, in a way that looks completely normal from the outside at every step along the way.
And to be clear, these changes play out over years, not weeks. The longer the timescale, the harder the drift is to see.
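A back-of-the-envelope sketch of that dynamic, with entirely made-up numbers: a monitor that only compares each cycle to the one before it never fires, even as the cumulative distance from the validated baseline keeps growing.

```python
# Invented numbers, purely illustrative. "accuracy" stands in for whatever
# metric was validated at deployment.

BASELINE = 0.95          # metric validated at deployment
STEP_ALERT = 0.01        # monitor fires if the metric drops >1% in one cycle
DRIFT_PER_CYCLE = 0.004  # each retraining/ingestion cycle erodes it slightly

accuracy = BASELINE
alerts = 0

for cycle in range(100):                  # e.g. many cycles over several years
    previous = accuracy
    accuracy -= DRIFT_PER_CYCLE
    if previous - accuracy > STEP_ALERT:  # step-to-step check: never true
        alerts += 1

print(f"alerts fired: {alerts}")                                # 0
print(f"total drift from baseline: {BASELINE - accuracy:.2f}")  # 0.40
```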
History is a good place to look.
There is a moment in the musical Wicked where the Wizard tells Elphaba that where he comes from, people believe all sorts of things that aren’t true. We call it history. It is played as a villain’s rationalization. It is also an accurate description of how information systems actually work.
Consider what AI models have been trained on. Centuries of historical narrative, written by people with perspectives, agendas, and incomplete information. Christopher Columbus was a heroic discoverer for generations of students. The model trained on that curriculum will tell you so, confidently. The model trained on the scholarship of the last thirty years will tell you something else, equally confidently. Neither model is malfunctioning. Both are doing exactly what they were built to do.
Napoleon was a liberator or a tyrant, depending on which tradition shaped the text. Thomas Jefferson was the author of human equality or one of its most prominent violators, depending on which version of that story the training data treated as the primary one. The outputs will be authoritative either way. The confidence will be identical.
And these aren’t necessarily biases in the traditional sense. Yes, biases are opinions. Some biases are bad, some are irrelevant. I have a bias for donuts over vegetables. It’s not a bad bias, except in my doctor’s opinion.
This is not a flaw in AI. It is a flaw in the assumption that training data is a neutral record of truth rather than a layered archive of contested perspectives. The model learned what it was taught. It believes what it absorbed. And it will defend those beliefs with the same tone it uses to tell you that water is H2O.
The governance question is not just whether the output is correct. It is correct according to what, decided by whom, and as of when.
Whose truth are we stabilizing?
There is a question underneath all of this that almost nobody is asking directly, and it is the one that makes the governance problem genuinely philosophical rather than merely technical.
If we build a control layer to correct a model’s outputs, we are making a claim about what correct means. We are asserting that there is a system of truth against which the model’s outputs can be measured and adjusted. And that assumption collapses the moment you examine it.
We do this in the law today. There is a body of law that is interpreted to mean one thing or another, and even when a statute is specific, judges regularly read it through their own biases.
Consider two models. One trained predominantly on one political perspective. One trained on the opposite perspective. Both are confident. Both are internally consistent. Both will pass most of the output checks we have built. Both will tell you, with equal authority, very different things about the same events.
Which one is correct? Do you build a stabilization layer on top of the first political view to align it with the second’s belief of reality? Do you build governance parameters so one political model’s outputs conform to the other’s moral system? And then, as both models continue to degrade over time, what are you actually stabilizing toward? A moving target defined by whichever model you decided to treat as ground truth, which is itself evolving.
This is not a hypothetical edge case. This is the current state of the information environment in which AI systems are trained and deployed. Truth is not a stable reference point in many of the domains where AI is most actively used. It is contested, contextual, and evolving. The sky is blue is easy. Most of what AI is asked to reason about is not.
Governance built on the assumption of an objective truth baseline is governance that will work fine until it matters, and then it won’t.
The harder question.
There is a version of this problem that remains theoretical. Maybe the model collapse loop gets solved. Maybe training hygiene improves enough that the degeneration of AI trained on AI never becomes catastrophic. That would be good. The governance challenge I’m describing would still exist, but it would be bounded.
But what if it doesn’t get solved? What if the loop tightens?
What if the models that are foundational to the interaction layers being built on top of them are themselves drifting away from accuracy at a rate nobody is measuring, through a process so gradual that no alarm ever fires, toward a definition of truth that was never agreed upon to begin with?
The reality is that humans have not solved this problem since the invention of written communication. Wikipedia is a prime example of how biased and competing narratives shape people’s opinions.
The answer is not more governance tooling applied to outputs. That is more gas for a failing engine.
The answer has to start with honesty about what the problem actually is. Justifiably false is not a corner case. It is a structural risk in any system whose understanding of truth can drift without detection, in a direction that nobody has formally defined, at a pace that seems almost designed to evade notice.
The layer we cannot see is the one that matters. And we have not yet decided what it should even be pointing at.
Chris Hood is an AI strategist and author of the #1 Amazon Best Seller Infallible and Customer Transformation, and has been recognized as one of the Top 30 Global Gurus for Customer Experience. His latest book, Unmapping Customer Journeys, will be published in 2026.