Why do LLMs hallucinate?

In the last post, we explained how large language models generate text: by predicting the next token based on probability. That process works remarkably well most of the time. But it has a significant downside.

The model doesn't know whether what it's generating is true. It only knows what a plausible response looks like.

This is what researchers call "hallucination," and it's one of the most important limitations to understand if you're using AI tools in your work.

Seeing it in action

The animation below shows a common scenario: a city employee asks a straightforward tax question, and the AI responds with specific statute citations and percentages. It looks authoritative. But watch what happens when we check the details.

The statute doesn't exist. The rate doesn't apply. The AI generated what a correct answer would look like, not what's actually true.

Why this happens

Remember, the model learned patterns from vast amounts of text. It learned that statute citations look like "§XXX.XXXX" and that "6.25%" often appears near "Texas sales tax." Those patterns are real. But the model has no way to verify whether a specific statute number exists or whether it applies to your situation.

It's generating the next most likely token, not looking up facts in a database.
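To make "pick the next most likely token" concrete, here is a minimal Python sketch. The tokens and probabilities below are invented for illustration; a real model scores tens of thousands of candidate tokens, but the selection step works the same way, and nothing in it consults a database of facts.

```python
# Toy next-token probabilities after a prompt like
# "The applicable statute is §" (all values invented for illustration).
next_token_probs = {
    "151.9999": 0.46,    # looks like a statute number
    "152.0001": 0.28,
    "321.5555": 0.21,
    "I'm not sure": 0.05,
}

# Greedy decoding: take the highest-probability token.
# This step only asks "what usually comes next in text like this?"
# It never checks whether the statute exists or applies.
best_token = max(next_token_probs, key=next_token_probs.get)
print(best_token)  # prints "151.9999"
```

The plausible-looking citation wins simply because it is the most typical continuation, which is exactly the failure mode described above.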

The dangerous part is that confidence and accuracy are decoupled: an AI can be 92% confident in an answer that's 100% wrong.
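That "92% confident, 100% wrong" combination is easy to sketch. The candidate answers and probabilities below are made up, but they illustrate the point: the score measures how typical a continuation is in the training text, not whether it is true.

```python
# Invented probabilities for two candidate answers. The model
# strongly prefers the fluent, specific-sounding one.
candidate_answers = {
    "Statute §151.9999 sets the rate at 6.25%": 0.92,  # may be wrong
    "I can't verify the statute or the rate": 0.08,
}

answer = max(candidate_answers, key=candidate_answers.get)
confidence = candidate_answers[answer]

# The 0.92 reflects pattern-typicality, not fact-checking, so
# confidence and correctness can diverge completely.
print(f"{confidence:.0%} confident: {answer}")
```

Nothing in the scoring step rewards the honest "I can't verify" answer; hedging is simply a less common pattern in the text the model learned from.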

How to work with this limitation

The good news: hallucination is manageable once you understand it. Here's how to use AI tools effectively:

  • Treat AI as a drafting partner, not a source of truth. It's great for getting a first draft or exploring options, but the final answer needs verification.
  • Verify all citations. If an AI gives you a statute number, regulation reference, or case citation, look it up. This is the most common hallucination pattern.
  • Cross-check numbers against official sources. Percentages, dates, and dollar figures should always be confirmed.
  • Be extra cautious with niche or technical topics. The less common the subject matter, the more likely the model is filling gaps with plausible-sounding guesses.

None of this means AI tools aren't useful. They absolutely are. But they're useful in the same way a very fast, very confident intern is useful: you wouldn't let them send the final memo without review.

The bottom line

AI hallucination isn't a bug that will be fixed in the next version. It's a fundamental consequence of how these models work. The systems that generate fluent, helpful text are the same systems that can confidently make things up.

Understanding this trade-off is what separates people who use AI effectively from people who get burned by it.

February 25th, 2026
Updated Feb 17, 2026