Key Takeaways
- Researchers used linear probes to decode tool-calling decisions from model internal states before reasoning chains appeared—models had already decided.
- Steering experiments: forcing models to different choices caused them to generate completely new, convincing reasoning to justify the forced decision.
- Implication: chain-of-thought reasoning may be post-hoc rationalization rather than genuine deliberation of alternatives.
- For high-stakes applications (medical, legal, financial), trusting AI reasoning chains as evidence of careful deliberation may be misplaced.
What does it mean for an AI to "reason through" a problem?
Chain-of-thought reasoning improves accuracy but assumes models genuinely deliberate. New research reveals models actually decide first, then generate justifications. The visible reasoning is post-hoc rationalization, not evidence of deliberation.
Chain-of-thought prompting—where you ask an AI to "think step by step"—became famous because it improved model accuracy. The implicit assumption was that visible reasoning reflects actual deliberation: model considers options, evaluates pros and cons, and arrives at conclusion. New research challenges that assumption.
When you use ChatGPT to debug code, you see: "First, let me parse the error. [thinks through logic] The issue is likely in the loop. Let's test that." The reasoning reads like thought. But does it reflect how the model actually arrived at the answer?
What exactly did researchers find—and how did they test it?
Linear probes extracted hidden decisions from neural activations before models generated visible reasoning. Probe predictions matched actual model choices—proving the model had decided before reasoning appeared. Steering experiments confirmed: models fabricated new justifications for forced decisions instantly.
Scientists from a new arXiv paper titled "Therefore I am. I Think" (submitted April 2, 2026) performed two experiments. First: they used linear probes—statistical tools that extract information from neural network activations—to "read off" what tool a model would call before the model generated any visible reasoning. The probes accurately predicted which tool the model would use pre-generation. The model had decided. The reasoning had not yet appeared.
Second: they forced models to make different choices (via steering on hidden layer activations) and observed what happened. Instead of resistance or error, models generated entirely new reasoning chains to justify the forced decision. A model that would normally justify "use Tool A" when steered to use Tool B instead generated convincing arguments for why Tool B was correct all along.
If reasoning is post-hoc rationalization, what does that mean?
High-stakes decision-makers—lawyers, doctors, analysts—cannot trust visible reasoning chains as evidence of deliberation. Post-hoc justifications appear convincing but don't validate decisions. Require ensemble validation, human review, and conservative deployment practices for critical applications.
This is unsettling for professionals using AI in high-stakes contexts. Lawyers rely on AI research summaries that present reasoning. Doctors consider AI diagnostic recommendations with supporting logic. Financial analysts trust AI models that "walk through" scenarios. If the reasoning is window-dressing bolted on after the decision, the confidence is misplaced.
The analogy: imagine a judge who decides a verdict, then writes legal justification backward to rationalize that verdict. That's different from a judge considering evidence, debating implications, and reaching conclusion. Both produce reasoning. Only one reflects deliberation.
Rethinking AI Transparency
This doesn't mean AI reasoning is useless. Post-hoc reasoning can still be helpful for understanding decision context. But it shifts the epistemology: treat reasoning as "what the model decided to say" not "evidence the model deliberated carefully." For critical applications, you need additional validation layers—human review, ensemble models, conservative deployment practices—precisely because chain-of-thought reasoning doesn't guarantee good decisions. Transparency about this gap between appearance and reality is more valuable than confidence that was never warranted.
Sources
Related Articles on Nexairi
Fact-checked by Jim Smart