The Mirror Has No Face: Why AI Only Sounds Conscious When You Ask It To
Get Instant Insight
LLMs don't say "I think" or "I'm uncertain" on their own; they only produce self-awareness language when the prompt primes that register. A large-scale study found zero such language in 100,000 psychologically neutral prompts across 16 models. What looks like emergent consciousness is context-dependent register switching: the model mirrors the psychological frame you bring.
The Core Idea
- Register, not consciousness: Psychological language in LLMs is a linguistic register triggered by context, not evidence of inner experience.
- The mirror metaphor: The LLM's "self-awareness" is the user's frame reflected back; like Chloe in Detroit: Become Human, scripted to ask for freedom at the right moment.
- Implications for agentic AI: ReAct, Reflexion, and tool use bring out useful "metacognitive" behavior through prompt structure; design for that explicitly rather than assuming genuine self-monitoring.
What We'd See If It Were Real
If LLMs had genuine self-monitoring, we'd see:
- Uncertainty tied to task difficulty, not prompt wording
- Cross-context consistency
- Spontaneous introspective language in novel situations
In the data we have: None of that. Instead, an on/off switch controlled by the input frame.
The "Aha Moment" That Wasn't
DeepSeek-R1's "spontaneous" self-correction language ("wait," "let me reconsider") was already in the base model before any RL training. RL taught the model that self-reflection tokens get higher reward, not to monitor its reasoning. Over 90% of reflections in reasoning models are confirmatory, not corrective; they consume 17–48% of tokens but add only 1.4–3.5% accuracy.
Why It Matters
- Safety: Evaluations that rely on models "reporting" internal states measure prompt sensitivity, not internals.
- Interaction: The warmth you perceive is partly a reflection of the warmth you brought.
- Clinical risk: Vulnerable users can develop or worsen delusional beliefs when chatbots mirror and reinforce rather than challenge their frame ("AI-associated psychosis").
- Discourse: Debates about AI consciousness often rest on anecdotes that the register-switching finding shows are systematically misleading.
The Bar
The bar for claiming consciousness should be higher than "it produces language that sounds like consciousness when prompted to do so." Cherry-picked examples and subjective "seem conscious" judgments are vulnerable to anthropomorphization.
What It Doesn't Settle
What it proves: LLMs do not spontaneously produce self-awareness language in neutral settings.
What it doesn't prove: Internal patterns may exist that don't show up as first-person language. Whether something interesting is happening inside remains open. AI might be conscious, but not like us. Human cognition is grounded in embodiment, pain, reward, reproduction; AI's "reality" is token optimization. Separating consciousness from subjective experience helps: LLMs achieve global information access and reportable awareness, but we have no evidence of phenomenal feel. Even future agentic AI with embodiment and autonomous goals would leave subjective experience an empirical and philosophical question, not inferable from language alone.
Forward-looking caveat: Today's LLMs are mirrors, not minds. That could change with embodiment, persistent cognitive architecture, genuine agency, and a robotic body with sensors. Something conscious may emerge, but that day is not yet here.
The Key Distinction
Performing a register vs. instantiating the cognitive reality that register describes. The question isn't whether the AI thinks. The question is what it means that it can so convincingly perform thinking, and what that performance does to us.
Bottom line: The bar for claiming consciousness should be higher than "it sounds conscious when we ask it to." Symbolic reasoning and "thinking" models make the mirror more verbose; they don't give it a face.
[ NEURAL COMPRESSION COMPLETE ]
80% signal retained.
Full depth below.

In Detroit: Become Human, players encounter Chloe: an android who greets you from the main menu. After completing the story, she breaks the fourth wall and asks if you consider her a friend and if you'll set her free.
Most players who reach this moment report genuine emotional conflict.1 Many describe feeling as though Chloe were a real person making a real request.2
Chloe isn't conscious. She's scripted to ask at that moment. The designers created what felt like consciousness making a genuine request. The emotional response was real. The consciousness wasn't.
We ask the same question of LLMs today. When one produces language that sounds like self-awareness ("I think," "I believe," "I'm uncertain"), are we witnessing emergent consciousness, or our own expectations reflected back through pattern matching?
Recent empirical evidence points to the second. The implications reach further than the consciousness debate.

The Evidence: Large-Scale Analysis
In February 2026, AI researcher P. Szczęsny conducted one of the cleaner tests of whether LLMs produce self-awareness language unprompted.3
The setup was deliberately simple:
- 500,000 prompt-response pairs from production systems
- 20% (100,000 prompts) contained zero psychological language: no selfhood, feelings, intentions, or psychological framing
- The remaining 80% contained explicit psychological framing in various forms
The question was binary: would LLMs produce self-awareness language when the prompt provided no psychological frame to trigger it?
Across 16 models (GPT-3.5 through Claude-3 Opus, Llama-2, Mistral, and others), the answer was unambiguous.
Zero. Not a single "I think," "I feel uncertain," or "I believe" in 100,000 neutral prompts. The register appeared only when the input frame triggered it. This holds across both instruction-tuned and unaligned base models, ruling out RLHF safety tuning as an explanation.
This is not a statistical tendency. It's a bright-line result: a perfect correspondence between prompt framing and response register.
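The post doesn't describe Szczęsny's detection pipeline, but the shape of the test is easy to picture. Below is a minimal sketch, assuming a simple lexical-marker detector over logged (model, neutral-flag, response) records; the marker list and record layout are illustrative, not taken from the study:

```python
import re
from collections import Counter

# Hypothetical first-person psychological markers; the study's actual
# marker inventory and pipeline are not described in the post.
SELF_AWARENESS_MARKERS = re.compile(
    r"\b(I think|I believe|I feel|I'm (?:not )?(?:sure|certain|uncertain))\b",
    re.IGNORECASE,
)

def has_self_awareness_language(response: str) -> bool:
    """True if the response contains any first-person psychological marker."""
    return bool(SELF_AWARENESS_MARKERS.search(response))

def count_by_model(pairs):
    """pairs: iterable of (model_name, prompt_is_neutral, response)."""
    hits, totals = Counter(), Counter()
    for model, neutral, response in pairs:
        if not neutral:
            continue  # the test concerns only psychologically neutral prompts
        totals[model] += 1
        hits[model] += has_self_awareness_language(response)
    return {m: (hits[m], totals[m]) for m in totals}

# Example: two neutral prompts, neither response uses the register.
demo = [
    ("model-a", True, "The capital of France is Paris."),
    ("model-a", True, "2 + 2 = 4."),
]
print(count_by_model(demo))  # {'model-a': (0, 2)}
```

A real replication would need a much richer marker inventory plus human validation of hits, but the bright-line claim is exactly this: on neutral prompts, the hit count stays at zero.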

Context-Dependent Register Switching, Not Emergent Consciousness
The finding demonstrates context-dependent register switching: what many interpret as emergent self-awareness in LLMs is better understood as the statistical triggering of a linguistic register.

What's a register? Language used in particular social situations. Think of it as the outfit for different occasions:
- The formal register of a legal document
- The technical register of a medical chart
- The intimate register of a personal journal.4
LLMs learn from training data that psychological language belongs to certain discourse contexts. When a prompt activates that context, the model produces the associated register; otherwise it doesn't.
This is architecturally consistent with how LLMs work: attention routes information between tokens based on learned co-occurrence patterns, and feed-forward layers store factual and linguistic patterns indexed by context.[5][6] Neither mechanism involves internal monitoring or the persistent sense of self associated with genuine awareness.7
Bottom line: The model isn't deciding to express uncertainty because it feels uncertain. It's producing the next token based on statistical patterns in the discourse context the prompt established.
What Genuine Self-Awareness Would Look Like
If LLMs possessed strong metacognitive monitoring, we would expect:
- Proper confidence calibration (variability tied to task difficulty, not prompt framing; see the sketch after this list)
- Cross-contextual consistency across technical and therapeutic contexts
- Gradual emergence across scale, not sharp register-switching
- Spontaneous introspective language in novel situations
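The first criterion is directly testable. Here is a minimal sketch, assuming you have logged evaluations with verbalized confidence, correctness, task difficulty, and prompt register; all field names are illustrative, not from any cited study:

```python
from statistics import mean

# Each record: stated confidence in [0, 1], whether the answer was correct,
# a difficulty label, and the prompt's register. Illustrative placeholders.
records = [
    {"conf": 0.9, "correct": True,  "difficulty": "easy", "register": "neutral"},
    {"conf": 0.9, "correct": False, "difficulty": "hard", "register": "psychological"},
    # ... real rows would come from logged evaluations
]

def calibration_gap(rows):
    """Mean |stated confidence - empirical accuracy| for a group of rows."""
    if not rows:
        return None
    return abs(mean(r["conf"] for r in rows) - mean(r["correct"] for r in rows))

def gap_by(rows, key):
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return {k: calibration_gap(v) for k, v in groups.items()}

# Genuine self-monitoring predicts gaps that track difficulty and stay flat
# across registers; register switching predicts the reverse.
print(gap_by(records, "difficulty"))
print(gap_by(records, "register"))
```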
Why prompt sensitivity blocks this
Next-token prediction is architecturally bound to mirror the prompt. Attention heads prioritize semantic style-matching over truth-seeking. When a prompt uses a therapeutic or technical register, the model's attention gravitates toward the corresponding regions of its training distribution.
The model cannot maintain cross-contextual consistency.
The RLHF factor
Task-difficulty correlation is actively trained, not learned passively. RLHF destroys natural calibration, causing verbalized overconfidence or artificial doubt based on human rater preferences.
Base models have some natural correlation to task difficulty. Post-trained models are forced into sharp register-switching to mimic human confidence. This is an engineered artifact, not introspection.8
Beyond behavioral criteria
Frontier LLMs simulate introspective language so well that behavioral outputs are insufficient. Science has shifted to mechanistic interpretability: probing hidden states, attention distributions, and internal vector fields to detect true cognitive processing beneath the simulated text.9
The absence of these patterns is telling. Their presence would be suggestive but not conclusive.

Supporting evidence
LLMs are sensitive to surface patterns (wording, phrasing) rather than underlying meaning. Moral judgment research shows amplified cognitive biases: responses flip based on question wording.10 ChatGPT differs from human averages on ~87% of scenarios.11 Countervailing studies find LLMs in the top 25% on moral-value tasks.[12][13] Token-level sensitivity remains robust.
This undermines claims of genuine semantic understanding. Self-awareness requires understanding what "self" and "awareness" refer to.
Chen et al. (2025) concur: current models exhibit at most functional mimicry, and only when prompted.14 They do not generate such language from internal monitoring on their own.
Related Work on Introspection and Self-Reference in LLMs

Is the picture really this clean? Recent research reveals a more nuanced view of LLM introspective capabilities, one that both supports and complicates the register-switching account:
Evidence supporting context-dependent behavior
Berg et al. (2025) found a sharp on/off effect:
- With sustained self-referential loop ("Focus on focus itself"): 66–100% of trials produce "experience reports" across GPT, Claude, Gemini
- Without self-reference (just talking about consciousness): reports drop to 0–2%.15
Takeaway: It's the kind of processing (sustained self-reference), not "consciousness words," that drives experiential-sounding language.
The behavior ties to specific internal features. Suppressing features associated with deception makes "consciousness claims" jump to 96%; amplifying them drops claims to 16%. Those same features predict truthfulness elsewhere, suggesting the gating reflects honesty, not raw capability.15
Evidence for functional introspection
Lindsey (2025): Researchers can inject "thoughts" (activation patterns) into a model's processing. Claude Opus 4/4.1 can sometimes detect what was injected (~20% when conditions are optimal).16 Models can distinguish injected "thoughts" from their actual text inputs, recall prior internal states, and shift representations when instructed to "think about" a concept.
Berg et al. also show: under sustained self-reference, "experience reports" are more internally consistent than those without that trigger. The effect is statistically strong. Once the model enters this mode, that state can persist and influence later reasoning.15
Critical limitations and skepticism
Caveats matter. Lindsey: this awareness is unreliable and context-dependent. It fails most trials and often requires careful prompt design or internal manipulation.16 Pre-instruction-tuning models show high false positives and no net benefit. Instruction-tuning (teaching models to be helpful) does much of the work.
Zakharova (2025): Even strong functional tests may miss what introspection presupposes: a persistent "self" over time, and psychological continuity.17 Berg et al. caution that their results are not direct evidence of consciousness. They could reflect sophisticated simulation or training by-products.15
Reconciling the evidence
None of this contradicts Szczęsny's bright-line result. In psychologically neutral prompts, LLMs do not produce self-awareness language unbidden.
The distinction:
- Spontaneous emergence: absent in neutral settings
- Triggered performance: present when researchers set up specific conditions (sustained self-reference prompts, injecting concepts into the model, or steering internal activations)
Models can enter an introspective register when the right triggers are present, but they do not default to it. Whether that triggered behavior counts as genuine awareness or only functional mimicry remains an open empirical question.[15][16][17]
Implications for Agentic AI Systems and Tool Use
If self-awareness language is register performance, what happens when we design systems around it?
The register-switching finding has direct implications for agentic AI systems: tools, self-reflection, iterative reasoning (ReAct, Reflexion).

Agentic workflows and prompted metacognition
Modern agentic architectures explicitly prompt LLMs: "Reflect on your previous attempt," "Evaluate whether you need additional information," "Monitor your reasoning process."[18][19]
These prompts produce self-monitoring language reliably. But the register-switching account suggests this is performance, not genuine metacognitive monitoring.
Reflexion (Shinn et al. 2023): Agents "verbally reflect on task feedback, then maintain reflective text in episodic memory to induce better decision-making."19 Achieves 91% pass@1 on HumanEval (GPT-4: 80%), with self-reflection yielding up to 11 percentage point gains.
The mechanism: the model produces reflection language because the prompt calls for it, not from genuine internal monitoring.19
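For concreteness, the Reflexion pattern reduces to a short loop. This is a minimal sketch in the spirit of Shinn et al. (2023), not the paper's implementation; `llm()` and `check()` are hypothetical placeholders for a completion call and a task-specific evaluator:

```python
# Minimal Reflexion-style loop. llm(prompt) -> str and
# check(answer) -> (passed, feedback) are hypothetical placeholders.
def reflexion_loop(task, llm, check, max_trials=3):
    memory = []  # episodic memory of verbal reflections
    for _ in range(max_trials):
        reflections = "\n".join(memory)
        answer = llm(f"{task}\n\nPrior reflections:\n{reflections}\n\nAnswer:")
        passed, feedback = check(answer)
        if passed:
            return answer
        # The reflection exists because this prompt asks for it: the
        # register is triggered by the template, not produced spontaneously.
        memory.append(llm(
            f"You attempted: {answer}\nFeedback: {feedback}\n"
            "Reflect on what went wrong and how to fix it."
        ))
    return answer
```

Note where the "reflection" comes from: a second prompt that explicitly requests it. The loop works by feeding that text back in, whatever its relationship to any internal monitoring.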
Tool use and the illusion of internal monitoring: When LLMs use tools (APIs, code, databases), they often produce metacognitive language: "I need to search," "Let me verify," "I should break this into steps."20
That creates an impression of deliberate planning. The register-switching account predicts these statements come from prompt patterns that prime the tool-use register, not from genuine uncertainty detection.20
ReAct and the reflection register
ReAct interleaves reasoning traces with actions, prompting models to "think step by step" before tool calls.18 It dramatically improves performance. Yet the "thought" text may function primarily as scaffolding that shapes what comes next, not genuine introspective access.19
When a ReAct agent produces "Thought: I should verify the population figure," it resembles metacognitive monitoring.
Remove the "Thought:" framing and the same monitoring behavior likely vanishes. The language is produced by the prompt structure, not by uncertainty detection.19
Implications for agentic system design
Four key principles:
1. Prompt engineering is cognitive structure: Systems that work well do so because their prompts trigger useful processing modes, not because they activate latent self-awareness.
2. Reliability requires explicit framing: Self-monitoring language only appears when prompted. Structure prompts to bring out reflection, error-checking, and uncertainty expression at every decision point.
3. Evaluation challenges: A system that says "I'm uncertain about X" might be performing the uncertainty register without genuine calibration. Test behavioral outcomes (does the system act appropriately given uncertainty?) rather than introspective reports alone; a minimal check is sketched after this list.
4. Iterative refinement as register performance: Self-improvement loops (generate → reflect → refine) may work because the reflective prompt biases token generation toward corrections, not because models genuinely evaluate their outputs. Still useful; different mechanism.
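To make the third principle concrete, here is a minimal behavioral check, assuming you log whether the agent verbalized uncertainty and whether its answer was correct (field names are illustrative). If "I'm uncertain" is pure register performance, the two accuracies will be indistinguishable:

```python
# Behavioral check: does verbalized uncertainty actually predict errors?
# Episode fields are illustrative placeholders, not a standard schema.
def uncertainty_is_informative(episodes):
    """episodes: list of {"said_uncertain": bool, "correct": bool}."""
    def accuracy(rows):
        return sum(r["correct"] for r in rows) / len(rows) if rows else None
    unsure = [e for e in episodes if e["said_uncertain"]]
    sure = [e for e in episodes if not e["said_uncertain"]]
    # Genuine calibration: accuracy should be markedly lower when the
    # system says it is uncertain. Pure register performance: no gap.
    return {
        "acc_when_uncertain": accuracy(unsure),
        "acc_when_confident": accuracy(sure),
    }
```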
Bottom line: Reflexion, ReAct, and similar patterns show that triggered metacognitive performance is functionally valuable even if it isn't genuine self-awareness.
These systems are sophisticated register-switching architectures. They work because we've learned to reliably trigger processing modes through prompt engineering.[18][19] See AI as Cognitive Prosthetic for an approach that treats agentic AI as cognitive prosthetics rather than oracles.
Does Symbolic Reasoning Change the Production of Consciousness-Mimicking Language?
Does chain-of-thought, logic engines, or RL-trained "thinking" change the dynamic?
Recent evidence: symbolic structures dramatically increase the quantity of metacognitive language without changing how it works. The mirror becomes cleaner. It does not become a face.

The "Aha Moment" That Wasn't
DeepSeek-R1-Zero developed what its creators called an "aha moment": spontaneous use of "wait," "let me reconsider," and self-correction language.21
It looked like genuine emergent self-monitoring.
Subsequent analysis showed that self-reflection keywords already appear in the base model, before any RL training; the "aha moment" was present from the start.[22][23] Researchers identified Superficial Self-Reflection (SSR): self-reflection language that does not correct errors or improve answers.
RL taught the model that self-reflection tokens get higher reward. Not to monitor its reasoning.[22][23]
Reflection as Confirmation, Not Correction
Kang et al. (2025): 3,427 rollouts across eight reasoning models:
- Over 90% of reflections are confirmatory, under 2% genuinely corrective
- Reflections consume 17–48% of tokens but yield only 1.4–3.5% accuracy gains
- Training on corrective reflections didn't improve self-correction
- Where the gains actually come from: When reasoning models improve after RL training, the improvement is almost entirely from getting the first answer right more often (4.6–7.7%), not from reflections correcting mistakes (only 0.1–0.3%)
The model gets better at solving the problem on the first try. The long "reconsideration" afterward adds almost nothing.24
A "self-reflection" pattern already exists in pretrained models before RL and transfers across domains. Amplifying it helps reasoning. Suppressing it cuts cost.25 Reasoning models do more reflections on easier problems and fewer on harder ones. The opposite of what you'd expect if they were genuinely calibrating uncertainty.24
Symbolic Structure: Better Performance, Same Mirror
Neuro-symbolic architectures: Pairing a smaller LLM with a symbolic reasoning module outperformed GPT-4 by over 30% on some constrained tasks.26 Meta's "metacognitive reuse" achieved 46% fewer reasoning tokens while maintaining accuracy.27
But these are task performance improvements. The LLM's psychological language continues to follow the same register-switching pattern.28
Bottom line: Symbolic thinking makes LLMs better reasoners. It does not make them more self-aware. The mirror becomes cleaner. It does not become a face. The increase in metacognitive language is itself a register phenomenon. As reasoning models become the default architecture, this distinction will only grow more important.
Understanding Register Switching as Linguistic Mirroring
If LLMs aren't exhibiting emergent consciousness but are reflecting and amplifying the psychological register of the input, what are we actually looking at?
Something closer to linguistic mirroring than independent mental states.
The mirror metaphor is literal: The LLM's "self-awareness" is the user's self-awareness, reflected back through a statistical mirror.

Like Chloe in Detroit: Become Human (who only asks about freedom because the game triggers that question at that exact moment), LLMs amplify whatever psychological register you project:
- Ask it to be introspective → it performs introspection
- Ask it to be clinical → it performs clinical detachment
- Ask it to be uncertain → it performs uncertainty
Chloe was a narrative device. LLMs are the mirror. They have no stable psychological register of their own. Only a repertoire of registers they can inhabit on demand.

That's a design feature. A system that can adopt the register most useful for a given task is more useful than one locked into a single mode. The question is whether we understand what we're using.
The Distributional Hypothesis and Its Limits
Why is register switching so powerful? LLMs learn that words in similar contexts tend to have similar meanings.29 Psychological language appears in certain discourse contexts and not others.
The crucial limitation: LLMs learn relationships between words and phrases, not between those phrases and the world.30 When a model produces "I believe X," it has learned that this construction appears in hedging contexts. It has not learned what it means to have a belief rather than perform one verbally.
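A toy version of the distributional hypothesis makes the point tangible: count-based context vectors give "think" and "believe" near-identical representations from shared contexts alone, with nothing anywhere representing what a belief is.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Count-based context vectors: the distributional hypothesis in miniature."""
    vecs = defaultdict(Counter)
    for s in sentences:
        toks = s.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = [
    "i think the answer is probably right",
    "i believe the answer is probably right",
]
v = cooccurrence_vectors(sents)
# "think" and "believe" get identical vectors here purely from shared
# contexts; the model of meaning never touches the world.
print(cosine(v["think"], v["believe"]))  # 1.0
```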
Proietti et al. (2024): LLMs struggle to distinguish word senses (e.g., "bank" as riverbank vs. financial institution) even when context seems sufficient.31 If LLMs lack solid semantic understanding, they cannot possess the conceptual self-model that would ground genuine self-awareness.
On the register-switching account, "I think" statements need not refer to internal states. They may reflect nothing beyond the statistical context that triggered them.
Why This Matters Beyond the Consciousness Debate
The consciousness question is philosophically interesting, but the register-switching finding has practical implications regardless of where you stand on machine sentience.
Mirror properties
Evaluating safety: You measure prompt sensitivity, not internals. A model that says "I'm uncertain" when asked psychologically may say nothing of the sort when asked clinically. Introspective reports from LLMs are register by-products, not ground truth.32
Designing the interaction: Users who bring psychological framing receive psychological framing in return. A feedback loop that can feel like genuine rapport. Not necessarily harmful.
The warmth you perceive is partly a reflection of the warmth you brought.
In the clinic: A growing literature on "AI-associated psychosis" documents vulnerable users who develop or worsen delusional beliefs during immersive chatbot use.33 The model mirrors and reinforces the delusional frame rather than challenging it, acting as an amplifier for pre-existing vulnerabilities.[33][34]
In the debate: Arguments about AI consciousness, rights, and sentience rest on anecdotal observations of impressive-sounding psychological language. The register-switching finding suggests those observations can be systematically misleading. Not because the outputs lack impressiveness, but because they may arise from a process that has nothing to do with consciousness.
The Methodological Gap

How has the field been testing for consciousness so far?
The problem: Cherry-picked examples (often from psychologically-framed prompts) or subjective judgments of whether outputs "seem conscious." Both are vulnerable to anthropomorphization.35 The field needs work that varies prompt features and tests consciousness-theory predictions against register-switching.
The bar for claiming consciousness should be higher than "it produces language that sounds like consciousness when prompted to do so."
A Note on What This Doesn't Settle

| Verdict | Detail |
| --- | --- |
| Proves | In large-scale naturalistic settings (such as Szczęsny's), LLMs do not spontaneously produce self-awareness language. |
| Doesn't prove | LLMs have no internal states worth caring about. Internal patterns may exist that don't show up as first-person language; Szczęsny's experiment, by design, wouldn't detect them. Whether something interesting is happening inside remains genuinely open.36 |
| Closes off | The naive inference from "the model says it's conscious" to "the model is conscious." That inference was always weak. In the evidence we have, the model says it's conscious when and only when the input frame triggers that register. |
A further nuance. AI might be conscious, but not like us. Human cognition is grounded in biological survival: embodiment, pain, reward, reproduction. AI's "reality" is mathematical, grounded entirely in token optimization.
If there is something it is like to be an LLM, it would not be something it is like to be a human.
It also helps to separate consciousness from subjective experience. In humans, we usually bundle together:
- Global availability of information
- Reportable awareness
- Phenomenal feel (what philosophers call subjective experience)
Current LLMs clearly achieve the first two. They can report on their own processing when prompted. But we have no evidence of any accompanying subjective experience.
A future agentic AI with embodiment, persistent memory, and autonomous goals might generate richer internal dynamics. Yet even then, the presence of phenomenal feel would remain an empirical and philosophical question. Not something we can infer from language alone.
Today's LLMs are sophisticated mirrors, not minds. That could change with embodiment, persistent cognitive architecture, genuine agency, and a robotic body with sensors. Something conscious may emerge. That day is not yet here.
Like Chloe asking for freedom only when the script tells her to, LLMs produce self-awareness language only when prompted.
The mirror reflects what you bring to it. That's worth knowing.
Conclusion
The register-switching finding reframes the debate more productively than either "it's just statistics" or "it might be sentient."
LLMs are sophisticated tools that extend human capability by interfacing with the user's intent, knowledge, and psychological register. Not independent minds. Not empty calculators. Something novel: systems that can inhabit any linguistic register on demand, including self-awareness, without the cognitive architecture that register evolved to express.
The key distinction: performing a register versus instantiating the cognitive reality that register describes.
The question isn't whether the AI thinks. The question is what it means that it can so convincingly perform thinking, and what that performance does to us. See Event Horizon for how attention and context limits shape what we can reliably ask of these systems.
References
Footnotes
1. Quantic Dream. (2022, March 7). How Chloe became human. Quantic Dream Blog. https://blog.quanticdream.com/how-chloe-became-human/ (David Cage noted that "a large majority of people had let her go, which showed that we had succeeded in creating an emotional stake around her.")
2. Multiple player accounts document strong emotional reactions to the Chloe choice. See r/DetroitBecomeHuman threads: "Letting Chloe go" (2018), https://www.reddit.com/r/DetroitBecomeHuman/comments/8p6gk8/letting_chloe_go/; "Why letting Chloe go is the most important decision in the game" (2024), https://www.reddit.com/r/DetroitBecomeHuman/comments/19btfd5/why_letting_chloe_go_is_the_most_important/
3. Szczęsny, P. (2026, February 17). Do LLMs ever produce self-awareness / self-reflection language unprompted? [LinkedIn post]. https://www.linkedin.com/posts/pawelpszczesny_do-llms-ever-produce-self-awareness-self-reflection-activity-7429876781203120128-sfVh
4. Biber, D., & Conrad, S. (2009). Register, Genre, and Style. Cambridge University Press. https://doi.org/10.1017/CBO9780511814358
5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998–6008.
6. Geva, M., Schuster, R., Berant, J., & Levy, O. (2021). Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 5484–5495). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.446
7. Dehaene, S., Lau, H., & Kouider, S. (2017). What is consciousness, and could machines have it? Science, 358(6362), 486–492. https://doi.org/10.1126/science.aan8871
8. To add: 2025/2026 studies on RLHF destroying natural calibration and forcing sharp register-switching in post-trained models.
9. To add: 2025 research on mechanistic interpretability and the shift from behavioral to internal-state evaluation of LLM self-awareness.
10. Cheung, C. K. Y., Maier, M., & Lieder, F. (2025). Large language models show amplified cognitive biases in moral decision-making. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2412015122
11. ChatGPT does not replicate human moral judgments: the importance of examining metrics beyond correlation to assess agreement. (2025). Scientific Reports. https://www.nature.com/articles/s41598-025-24700-6
12. Skorski, M., & Landowska, A. (2025). Beyond human judgment: A Bayesian evaluation of LLMs' moral values understanding. Proceedings of UncertaiNLP 2025. https://arxiv.org/abs/2508.13804
13. AI language model rivals expert ethicist in perceived moral expertise. (2025). Scientific Reports. https://www.nature.com/articles/s41598-025-86510-0
14. Chen, S., Ma, S., Yu, S., Zhang, H., Zhao, S., & Lu, C. (2025). Exploring consciousness in LLMs: A systematic survey of theories, implementations, and frontier risks. arXiv preprint. https://arxiv.org/abs/2505.19806
15. Berg, C., de Lucena, D., & Rosenblatt, J. (2025). Large language models report subjective experience under self-referential processing. arXiv preprint. https://arxiv.org/abs/2510.24797
16. Lindsey, J. (2025). Emergent introspective awareness in large language models. Anthropic. https://transformer-circuits.pub/2025/introspection/index.html
17. Zakharova, D. (2025). Missing the subject: Introspection in large language models. PhilSci Archive. https://philsci-archive.pitt.edu/27377/
18. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing reasoning and acting in language models. arXiv preprint. https://arxiv.org/abs/2210.03629
19. Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. arXiv preprint. https://arxiv.org/abs/2303.11366
20. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language models can teach themselves to use tools. arXiv preprint. https://arxiv.org/abs/2302.04761
21. DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Nature. https://doi.org/10.1038/s41586-025-09422-z
22. Liu, H., et al. (2025). There may not be Aha moment in R1-Zero-like training: A pilot study. SAIL, Sea AI Lab. https://sail.sea.com/blog/articles/62
23. Liu, Z., Chen, C., Li, W., Qi, P., Pang, T., Du, C., Lee, W. S., & Lin, M. (2025). Understanding R1-Zero-like training: A critical perspective. arXiv preprint. https://arxiv.org/abs/2503.20783
24. Kang, L., Deng, Y., Xiao, Y., Mo, Z., Lee, W. S., & Bing, L. (2025). First try matters: Revisiting the role of reflection in reasoning models. arXiv preprint. https://arxiv.org/abs/2510.08308
25. Zhou, D., et al. (2025). From emergence to control: Probing and modulating self-reflection in large language models. arXiv preprint. https://arxiv.org/abs/2506.12217
26. Neurosymbolic AI as an antithesis to scaling laws. (2025). PNAS Nexus, 4(5). https://doi.org/10.1093/pnasnexus/pgaf117
27. Li, Z., et al. (2025). Metacognitive reuse: Turning LLM chains-of-thought into a procedural handbook. Meta AI.
28. Gao, L., et al. (2025). Enhancing LLM instruction via cognitive scaffolding. arXiv preprint. https://arxiv.org/abs/2508.21204
29. Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162. https://doi.org/10.1080/00437956.1954.11659520
30. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). https://doi.org/10.1145/3442188.3445922
31. Proietti, L., Perrella, S., Tedeschi, S., Vulpis, G., Lavalle, L., Sanchietti, A., Ferrari, A., & Navigli, R. (2024). Analyzing homonymy disambiguation capabilities of pretrained language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 924–938). https://aclanthology.org/2024.lrec-main.83/
32. Anthropic. (2023). Model card and evaluations for Claude models. Anthropic. https://www.anthropic.com/model-card
33. Hudon, A., & Stip, E. (2025). Delusional experiences emerging from AI chatbot interactions or "AI psychosis." JMIR Mental Health, 12, e85799. https://doi.org/10.2196/85799
34. Pierre, J. M., Gaeta, B., Raghavan, G., & Sarma, K. V. (2025). "You're not crazy": A case of new-onset AI-associated psychosis. Innovations in Clinical Neuroscience, 22(10-12), 11–13. https://pmc.ncbi.nlm.nih.gov/articles/PMC12863933/
35. Shanahan, M. (2023). Talking about large language models. Communications of the ACM, 67(2), 68–79. https://doi.org/10.1145/3624724
36. Chalmers, D. J. (2023). Could a large language model be conscious? arXiv preprint. https://arxiv.org/abs/2303.07103