The Denominator Problem: Why AI Can’t Hold a Voice

There’s a moment every writer who works with AI eventually hits. You hand the model a voice guide, a style reference, a few sample paragraphs of exactly the tone you want. You ask for a hundred words. What comes back is flawless. Clean, on-voice, indistinguishable from something you’d have written on a good day.

So you ask for more. A full chapter. Two thousand words.

And somewhere around word 1,500, the thing you were reading stops being yours. The sentences are still grammatical. The paragraphs still resolve. But the voice has gone soft at the edges, smoothed toward something blander and more familiar. The rule of three. The tidy emotional button at the end of every paragraph. The closer that summarizes a feeling instead of staging it. You get about ninety percent of the way there. The last ten percent is mumbo jumbo that passes on a quick read-through but reads, on a closer look, as unmistakably machine-made.

Most people call this drift. Drift is the right word for how it feels. It’s the wrong word for what it is.

Drift is a symptom. Here’s the disease.

The instinct is to assume the model “forgot” the voice guide, that the instruction fell out of memory or attention wandered, and that the fix is a stronger, longer, more emphatic guide. That instinct is wrong, and it’s why people burn hundreds of hours topping up a guide that was never the problem.

The voice guide doesn’t get forgotten. It gets out-massed.

Start at the beginning of a generation. The guide is essentially 100% of what the model is conditioning on, and the output is 0%. As the draft grows, that ratio inverts. Take a guide of about five hundred words. At a hundred words of draft, the guide still dominates. By five hundred, it’s an even split. By three thousand words, the guide is a thin sliver against a mountain of already-generated text. By ten thousand, it’s a rounding error.

This isn’t the model defying the instruction. It’s the model doing exactly what it was built to do. It weights probabilities across all the text currently in view, while the proportion of that text occupied by your guide shrinks with every sentence it writes. The guide is still in the room. It just can’t win the vote anymore. You’re not fighting disobedience. You’re fighting a denominator that grows with every word.
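The arithmetic of that shrinking share is easy to sketch. The snippet below assumes a 500-word guide (an illustrative figure) and counts words, where a real model counts tokens. Raw proportion is also only a rough proxy for how much the model actually attends to the guide, so read the numbers as a picture of the ratio, not a measurement of model behavior.

```python
def guide_share(guide_words: int, draft_words: int) -> float:
    """Fraction of the visible text that is the voice guide."""
    return guide_words / (guide_words + draft_words)


GUIDE_WORDS = 500  # assumed guide length, for illustration only

for draft_words in (0, 100, 500, 3_000, 10_000):
    share = guide_share(GUIDE_WORDS, draft_words)
    print(f"{draft_words:>6} draft words -> guide is {share:.0%} of the context")
```

Doubling the guide to a thousand words only moves the even-split point from five hundred draft words to a thousand. The denominator keeps growing either way.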

And recency makes it worse. The model conditions most heavily on the most recent stretch of text, so the guide sitting up at the top of the context thins out even faster than raw proportion alone would suggest.

Why dilution turns into decay

Here’s the part that matters, because dilution alone would actually be survivable. If the model’s output were perfectly on-voice, then one percent guide plus ninety-nine percent on-voice draft would still read on-voice. The guide would have done its job by infecting the output, and the output would carry the voice forward on its own.

It collapses because the output isn’t neutral. This is the mechanism worth naming.

Large language models are autoregressive: each word is generated by conditioning on every word that came before it, including the words the model itself just produced. And every word it produces is sampled with the training prior baked in, a faint and constant pull toward the statistical center of everything it ever learned. So the draft drifts toward the average from word one. Then the model conditions on that faintly-average draft to write the next stretch, which comes out a little more average. Then it conditions on that.

The draft isn’t just diluting the guide. It’s a carrier, quietly reintroducing the mean back into the context and treating its own drift as established fact. It’s not a clean signal fading to one percent. It’s a clean signal being progressively overwritten by a blurrier copy of itself, each pass a little softer than the last.
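A toy simulation makes the feedback loop concrete. Everything in it is an assumption made for illustration: voice is collapsed to a single score (1.0 is the guide, 0.0 is the generic mean), the model is treated as copying the word-weighted average of what it sees, and `pull` is a hypothetical per-chunk tug toward the mean. Real models are nothing like this simple. The sketch only shows how a small, constant pull compounds once the draft becomes part of the context.

```python
def simulate_drift(guide_words=500, chunk_words=250, chunks=12, pull=0.05):
    """Return a voice score for each generated chunk.

    1.0 means fully on-voice (the guide), 0.0 means the generic mean.
    """
    scores = []
    for _ in range(chunks):
        draft_words = chunk_words * len(scores)
        # What the model "sees": the guide plus every chunk drafted so far,
        # weighted by word count.
        context_voice = (guide_words * 1.0 + chunk_words * sum(scores)) / (
            guide_words + draft_words
        )
        # The new chunk mirrors the context, minus a small pull to the mean.
        scores.append((1.0 - pull) * context_voice)
    return scores


for i, score in enumerate(simulate_drift(), start=1):
    print(f"chunk {i:>2}: voice score {score:.3f}")
```

With `pull=0.0` every chunk stays at 1.0, which is the survivable case of pure dilution. Any pull above zero produces a score that falls with every chunk, because each chunk lowers the average the next one is built from.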

There is a precise name for this. It’s called exposure bias, and it comes from a mismatch between how these models are trained and how they run. In training, the model is fed real human text as the prefix at every single step, a setup called teacher forcing. It only ever learns to predict the next word given genuinely good preceding text. But at inference, there is no human text to lean on. It’s conditioning on its own output, which is slightly off, and because it never trained to recover from its own mistakes, small deviations compound rather than self-correct.
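The training-versus-inference mismatch can be shown with a numeric analogy. It is not a language model. The sketch assumes a "true" sequence that stays flat and a learned predictor that is off by two percent per step (both values invented). Fed the real previous value at each step, as in teacher forcing, the predictor’s error stays small and constant. Fed its own previous output, the same predictor’s error grows at every step.

```python
def rollout_errors(true_rate=1.00, learned_rate=1.02, steps=30, start=1.0):
    """Compare per-step error under teacher forcing and free running."""
    truth = start
    free = start
    teacher_forced, free_running = [], []
    for _ in range(steps):
        next_truth = true_rate * truth
        # Teacher forcing: predict from the real previous value.
        teacher_forced.append(abs(learned_rate * truth - next_truth))
        # Free running: predict from the model's own previous output.
        free = learned_rate * free
        free_running.append(abs(free - next_truth))
        truth = next_truth
    return teacher_forced, free_running


forced, free = rollout_errors()
print(f"teacher-forced error at step 30: {forced[-1]:.3f}")
print(f"free-running error at step 30:   {free[-1]:.3f}")
```

The predictor is identical in both runs. Only the prefix it conditions on changes, which is the point of the term.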

The destination of that compounding has a name too. It’s neural text degeneration, the documented tendency of these systems to slide toward bland, repetitive, high-probability prose. Underneath it sits the likelihood trap: the most probable continuation is almost always the most generic one, because distinctive writing is, by definition, statistically unlikely. A model optimizing for the likely next word is therefore optimizing, quietly and relentlessly, against voice itself.

So the full, honest diagnosis isn’t “drift.” It’s drift caused by exposure bias in autoregressive decoding, manifesting as neural text degeneration toward the likelihood-favored mean. Drift is what you feel. That’s what’s happening.

Can anything actually be done?

Yes and no, and the no is more important than the AI industry wants to admit.

What doesn’t work is the obvious move: a bigger guide. You’re topping up the numerator while the denominator poisons itself. Past a certain length, no quantity of instruction survives the math.

There are real mitigations, and they help at the margins. Sampling strategies, the controls that stop the model from always grabbing the single most-probable word, fight degeneration directly and are why modern output is less robotic than it was a few years ago. Alignment training makes models better at holding a surface: diction, rhythm, sentence shapes. A model fine-tuned on a single author’s corpus will hold that surface markedly better than a general-purpose one. If your problem is purely a texture issue, that path is real.
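Top-p (nucleus) sampling is one widely used control of this kind. The sketch below compares it with greedy decoding on a single next-word choice. The vocabulary and probabilities are invented for illustration, and a real decoder works over tokens and a much larger distribution.

```python
import random


def greedy(probs):
    """Always take the single most probable word."""
    return max(probs, key=probs.get)


def top_p_sample(probs, p=0.9, rng=random):
    """Sample from the smallest set of top words whose total mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for word, prob in ranked:
        nucleus.append((word, prob))
        total += prob
        if total >= p:
            break
    words, weights = zip(*nucleus)
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical next-word distribution after: "Get out," she ...
NEXT_WORD = {
    "said": 0.50,
    "whispered": 0.20,
    "rasped": 0.15,
    "muttered": 0.10,
    "xylophone": 0.05,
}

rng = random.Random(0)
print("greedy:", greedy(NEXT_WORD))
print("top-p: ", [top_p_sample(NEXT_WORD, rng=rng) for _ in range(8)])
```

Greedy decoding returns "said" every time. Top-p still favors it but lets the less likely verbs through, while cutting the implausible tail off entirely. That loosens the likelihood trap without removing it, since the most generic word remains the most likely draw.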

But here is the distinction that the whole conversation usually misses. There are two different things buried inside the word “voice,” and they don’t degrade the same way.

There’s surface voice, the diction and the cadence. A model can hold that reasonably well, and a specialized model can hold it very well.

And there’s intention, the felt sense of what a scene is for. A human writer carries that between sentences, between sessions, between days. When they stall, they push against the page. A language model carries nothing between chunks. Every few hundred words, it reconstructs what the passage is supposed to be doing from whatever happens to be visible, and the cheapest reconstruction available is always the generic one. That’s your last ten percent. Not a texture failure. An intention failure. And no amount of voice guidance can fix a system that lacks a persistent sense of why the paragraph exists.

Which means the real fix isn’t technical at all. It’s structural, and it’s about workflow.

The failure mode is long, unattended generation, letting the model condition on thousands of words of its own slowly-blurring output. So you don’t let it. You work in beats. You generate short, you cut hard, you stitch the pieces yourself, and you reset the context before the blur compounds. You hold the intention between the beats, because you’re the only one in the loop who can. The model drafts underneath the structure. It does not get to build the structure.
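As a rough sketch, the beat workflow looks like the loop below. The names are hypothetical: `generate` stands in for whatever model call you use, `review` stands in for your own cutting and rewriting, and the prompt layout and the 80-word tail are arbitrary choices. What matters is that every beat starts from a fresh context containing the guide, a human-written intention, and only a short tail of approved text.

```python
from typing import Callable, List


def draft_in_beats(
    guide: str,
    beats: List[str],
    generate: Callable[[str], str],          # hypothetical model call
    review: Callable[[str], str] = lambda text: text,  # your edit pass
    tail_words: int = 80,
) -> List[str]:
    """Draft one short beat at a time, rebuilding the context for each."""
    approved: List[str] = []
    for intention in beats:
        # Carry forward only a short tail of approved text for continuity,
        # so the model never conditions on thousands of its own words.
        tail = " ".join(" ".join(approved).split()[-tail_words:])
        prompt = (
            f"VOICE GUIDE:\n{guide}\n\n"
            f"PREVIOUS TEXT (for continuity only):\n{tail}\n\n"
            f"THIS BEAT MUST DO:\n{intention}\n"
        )
        draft = generate(prompt)
        # The human holds the intention: nothing is kept unreviewed.
        approved.append(review(draft))
    return approved
```

Because the tail is capped, the guide’s share of each prompt stays roughly constant no matter how long the manuscript gets. The intention for each beat comes from you and is restated every time, so the model never has to reconstruct it.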

That’s a real division of labor, and it’s an honest one. But notice what it concedes. The tool wants you to do the load-bearing creative and emotional work while it fills in the texture beneath you, not the reverse. For anything with your name on the spine, that’s probably the correct arrangement no matter how good the models get, because the thing that drifts isn’t a bug they’re one release away from patching. It’s the shape of the architecture.

Chris Hood is an AI strategist and author of the #1 Amazon Best Seller Infailible and Customer Transformation, and has been recognized as one of the Top 30 Global Gurus for Customer Experience. His latest book, Unmapping Customer Journeys, is available now!
