Feb 4, 2026 7 min read

Intuition First, Explanation After: What Humans and LLMs Have in Common

A closer look at the shared tendency of humans and large language models to arrive at conclusions before generating plausible justifications — and what this means for how we understand "understanding."

▶ Contents

Why “Explanations” Are Unreliable
Intuition Comes First, Explanation Follows
Rethinking What “Understanding” Means
Deep Learning’s Foundational Designs Are “Representation Assumptions”
Structural Similarity Between Model Internals and Human Methodologies
The Role of Reflection: Not Replaying the Process, But Reinforcing Intuition
Conclusion: Understanding Is Structured Experience in Action

I used to think that if someone could explain their reasoning clearly, it meant they actually understood what they were doing. That turns out to be less reliable than it sounds — for both humans and the models I work with.

Why “Explanations” Are Unreliable

Here’s a pattern I’ve noticed: both humans and large language models can produce a result, then follow it up with a perfectly plausible-sounding reasoning chain. The problem is that this chain is often generated after the conclusion is already settled.

For language models, what gets called “reasoning” is really a post-hoc language generation task. The model has already arrived at an answer with high confidence, and then produces text that sounds like a logical walkthrough. It’s documentation, not a debug log.

Humans are similar. A decision gets made fast, in parallel across a lot of neural machinery we can’t introspect directly. The verbal explanation that follows is an edited reconstruction — cleaned up, made linear, stripped of the noise and false starts. The explanation is often genuinely related to the decision, but it rarely describes the actual process by which the decision was reached.

This doesn’t make explanations useless. Sometimes the post-hoc account does catch real patterns. But it’s usually incomplete, imprecise, and cannot stand in for the internal process that produced the result.

Which means: whether someone — or some model — can explain their way to an answer is not a reliable test of whether they actually reasoned their way there.

Intuition Comes First, Explanation Follows

A lot of genuine cognition follows this sequence:

A judgment or reaction appears
We then search for reasons to justify it

For a chess player, that instant sense that “this move is wrong.” For a programmer, the gut feeling that “something is off here.” For a model, the next-token prediction. In each case, what’s actually happening is:

A large amount of historical experience
Compressed in a high-dimensional space
Into a fast, stable reactive pattern

The explanation isn’t what drove the decision. It’s a linguistic summary generated after the fact.

A more accurate framing: decisions are driven by structure and experience; explanations are an optional language layer added afterward.

Rethinking What “Understanding” Means

If explanations are routinely post-hoc, we need to reconsider what “understanding” actually means.

The old model: understanding = can recite the rules, can trace the causal chain.

The more practical test: can you still give reasonable responses when conditions change, when surface rules stop applying, when the problem is restated?

That’s where understanding shows up — not in what you say, but in how you behave when things shift. Understanding, viewed this way, is less like knowledge and more like a shape: an internal structure that has been shaped by experience, and that tends to produce appropriate responses even in novel situations.

This structure, because of its complexity and high-dimensional nature, is not easily reducible to a linear set of rules. You might acknowledge that understanding is structural without being able to turn that structure into a clear, articulable form.

Deep Learning’s Foundational Designs Are “Representation Assumptions”

Stepping back, many of the foundational techniques in deep learning — discretization, normalization, activation functions, attention, regularization, network architecture — are usually described as engineering tricks. But a more honest framing is that they’re answers to a single underlying question:

What kind of representation makes it easier for a model to extract stable, generalizable patterns from real-world data?

Some examples:

Discretization: splitting the continuous world into composable units, enabling pattern reuse
Normalization: suppressing absolute scale so features compete on equal footing
Activation functions: creating boundaries in the representation space, turning flat linear space into something with structure
Attention: encoding the assumption that effective learning requires selective focus
Regularization: encoding the prior that simpler patterns generalize better than complex ones
Layered structure: encoding the assumption that real problems have hierarchical generative features

These aren’t neutral. Each one embeds a bet about how reality generates the patterns we’re trying to learn.

Structural Similarity Between Model Internals and Human Methodologies

At a higher level of abstraction, there’s something striking: the activation functions, optimization methods, and network structures in a model occupy the same conceptual space as the techniques, frameworks, and learning methods that humans develop through experience.

Both share this trait:

Cannot explain the mechanism of any single success. A human can’t fully explain that moment of sudden clarity; a model can’t trace exactly how a given output emerged from a forward pass.
But both have accumulated empirical methods that raise success rates. Humans trust that deliberate practice, spaced repetition, the Feynman technique, and a path from concrete to abstract will, with enough repetition, reliably produce learning. Deep learning similarly trusts that discretization, normalization, attention, regularization, and optimization will, together, give the model the ability to learn new tasks and patterns.

Neither humans nor models have cracked the “how did this specific instance succeed” problem. What both have instead is a set of empirically validated methods for increasing the odds.

This matters for how we think about using large language models. We can’t program them the way we program traditional software — with explicit instructions that deterministically produce specific behavior. They behave more like people: you can’t write a rule that makes them do something, you can only create conditions — through examples, feedback, environment — that make the desired behavior more likely to emerge. That tendency is shaped during training, not programmed in.

The Role of Reflection: Not Replaying the Process, But Reinforcing Intuition

Both human self-reflection and the “reflection” mechanisms in some models serve a similar function — but it’s not what the word suggests.

They’re not replaying the actual internal process. Their main effect is:

Adjusting the outcomes
Stabilizing useful reactive patterns
Making future performance on similar problems better

There’s a real difference in degree, though. Humans have active metacognition: we can monitor our strategy mid-task, catch ourselves, shift approach, and even question the framework we’re using to think with. A model’s reflection is closer to an additional training step: generating more output, running extra inference, or fine-tuning — improvement through iteration, not through real-time self-monitoring.

Still, the functional role is similar: reflection is a form of secondary optimization, not a faithful reconstruction of the original reasoning path.

Conclusion: Understanding Is Structured Experience in Action

These observations converge on a deeper point: cognition is not primarily about explicitness — it’s about adaptability.

A few implications:

Explanations are post-hoc. Models can produce results without being able to explain the actual internal path; when they do produce “reasoning,” it’s generated after the conclusion is already formed. Humans are the same. Explanations have value but are typically incomplete and imprecise.
Intuition precedes explanation. For both humans and models, many decisive moments are essentially fast pattern-matching from structured experience. The explanation comes later.
Understanding is acting rightly under novelty. It’s less “can you articulate why” and more “can you keep responding appropriately as conditions shift.” This capacity lives in internal structure — structure too complex and distributed to cleanly transcribe into language.
Methods are analogous. Activation functions and optimization techniques sit at the same level of abstraction as human-developed methods like deliberate practice or spaced repetition. They don’t replace thinking — they create the conditions under which good thinking becomes more likely.
Guidance, not control. This structural similarity suggests that interacting with LLMs should feel more like teaching than programming. You shape behavior through examples, feedback, and environment — not through precise rule instructions.
Reflection reinforces, it doesn’t reconstruct. Repeated training and reflection can sharpen intuition, but that intuition doesn’t depend on a clean, articulable reasoning chain. Humans have an edge in metacognition, but the core function — strengthening useful patterns rather than replaying process — is shared.

The common ground: both human cognition and deep learning are, at their core, stable, generalizable reactive patterns shaped by the interaction of experience and structure.