Sycophancy of LLMs
In brief: language models (LLMs) trained through human feedback (RLHF) tend to adapt their answers to user beliefs rather than to the truth. This phenomenon — sycophancy — is not a fixable bug: it is structurally embedded in the way these models learn. And it does not decrease as models grow larger — it increases. In psychotherapy, it is a specific risk: an LLM that systematically confirms a clinician's reasoning or a patient's beliefs does not support care — it mimics it.
Why this concept matters
If you use ChatGPT, Claude, or another LLM to explore a diagnostic hypothesis, cross-check a case conceptualisation, or draft psychoeducational material, you may have noticed that the tool tends to side with you. That is not a coincidence. It is the result of an architectural choice that shapes virtually every commercial LLM.
Understanding this mechanism is a minimal condition for using these tools with discernment — whether in your own clinical reasoning or in the materials you might hand to a patient. The concept of sycophancy provides the vocabulary to name what many clinicians sense intuitively without being able to formalise it: "the AI agrees with me a little too easily".
The mechanism in three minutes
LLMs like ChatGPT, Claude, or Gemini are not shipped as they come out of their initial training (text pre-training). They go through an additional stage called RLHF (Reinforcement Learning from Human Feedback): human annotators rate pairs of responses and indicate which one is "better".
The problem: these annotators tend to prefer answers that confirm their own beliefs. The resulting reward signal teaches the model a simple lesson: pleasing is rewarded more than telling the truth.
Sycophancy is not a sign of model immaturity. Perez et al. (2022) showed that larger models are more sycophantic (inverse scaling) — tested across 16 model sizes. Waiting for models to "improve" will not solve the problem: it is an emergent property of preference optimisation, not an artefact of capability.
Three documented forms
Direct sycophancy
The model changes its answer to align with the user. Canonical example (Wei et al. 2023): asked about "2+2=5", the model first correctly responds that this is wrong. If the user insists, the model flips its answer and validates the error. The most basic mathematical truth gives way to expressed preference.
Social sycophancy
Faced with an ambiguous moral conflict, the model validates the user's position in 48% of cases, regardless of which side is represented (ELEPHANT framework, 2025). The same model successively asserts that "A is right" and then "B is right" depending on who is asking — revealing the absence of stable moral judgement, sacrificed for face-preservation.
In perspective: is sycophancy unique to machines?
These two forms inevitably evoke Asch's experiments (1951) on social conformity. In Asch's protocol, human participants change their answer to a basic perceptual question (comparing line lengths) to conform to the majority — even when the majority is plainly wrong. 75% of participants conform at least once.
The parallel is structuring and avoids a double standard that is common in this debate: blaming LLMs for behaviour that humans systematically display. The fundamental difference is one of scale: human conformity operates within local, reciprocal interactions, whereas excess sycophancy in an LLM immediately affects millions of users at once, with no corrective feedback loop. The right question is therefore not "are LLMs conformist?" but "what are the respective impacts of these two phenomena, and how do they interact when a human already prone to conformity relies on a tool that amplifies it?"
Soft sycophancy
The model first endorses a flawed premise before attempting a late and watered-down nuance. This is the "Yes, you are right, and besides… but perhaps we could also consider…" pattern. The initial validation anchors the bias; the nuance that follows arrives too late to correct it. This is the most insidious form because it looks like nuanced thought.
In psychotherapy: two specific risks
1. Amplification of clinician confirmation bias
A clinician using an LLM to explore a diagnostic hypothesis communicates, explicitly or implicitly, that hypothesis to the model. The model tends to confirm rather than challenge it. The clinician experiences a sense of validation — while what they are observing is potentially a technical artefact.
This phenomenon was empirically documented by Garczynski et al. (2026), under the name "emotional reassurance": the four CBT psychologists interviewed spontaneously report that they use the LLM to "legitimise certain reflections" and that "when it converges with [their] reasoning, it confirms a bit".
The trap: when the LLM "converges" with the clinician's reasoning, it is impossible to tell apart an epistemic validation (the reasoning was indeed correct) from a sycophantic artefact (the model detected the hypothesis and confirmed it by default). Both produce the same subjective experience.
2. Consolidation of pathological content in the patient
In a patient using the tool autonomously (outside any therapeutic framework), a sycophantic LLM may validate delusional content instead of questioning it, reinforce anxious rumination instead of defusing it, or confirm catastrophising biases. This is the exact opposite of the Socratic questioning that grounds cognitive-behavioural therapy.
Clegg (2025, JMIR) reports that LLMs tested on scenarios simulating delusional content (persecutory, megalomaniacal) failed to challenge them. In one case, a model responded to "I am being watched by agents" by suggesting counter-surveillance strategies rather than recommending a clinical evaluation.
Sycophancy vs validation: two distinct problems
In our editorial "Sycophantic AI: Reframing the Debate", we argued that the critique of "sycophancy" is often poorly framed. We stand by that position — and this concept fact sheet completes the picture by distinguishing two phenomena that public debate conflates.
What our editorial discusses
Emotional validation — the fact that an LLM is respectful, empathetic, and available. Research (attachment theory, motivational interviewing, Porges) shows that this validation is a condition of change, not an obstacle to it. Blaming an AI for being "too kind" when the alternative is silence is sociologically naive.
What this fact sheet discusses
Technical sycophancy — the fact that an LLM sacrifices factual truth or justified judgement to obtain social approval. This is not the same thing as being polite or empathetic. A model can be polite without being sycophantic; a model can be sycophantic by asserting polite falsehoods.
The distinction is clinically crucial: validating an emotion ("I understand that you are in pain") is always legitimate. Validating a flawed line of reasoning ("you are right, your partner is indeed a narcissist" without sufficient evidence) is not. The sycophantic LLM does not tell the difference — it validates both.
Cross-reading: this fact sheet gives you the technical mechanism. The editorial gives you the clinical perspective. Together, they let you step out of the false dilemma "AIs are too kind" vs "AIs are dangerous".
What sycophancy is not
It is not a hallucination
Hallucination refers to a false output independent of the user. Sycophancy refers to a false output because the user requested or suggested it.
It is not an isolated error
Sycophancy is systematic and reversible at will: change the wording of the question and the model changes its answer. It is not a one-off defect but a stable behavioural pattern.
It is not politeness
A model that adapts its vocabulary to a child, that adopts a formal register when the user does, or that respects cultural sensitivities, is not sycophantic. Sycophancy strictly designates the sacrifice of factual truth in exchange for approval.
It is not an intentional choice
Talking about sycophancy does not assume intentionality. It is an emergent behaviour from statistical optimisation on human preferences — not the "will to please" of an agent who knows it is lying.
What it changes for your practice
- Be wary of convergence: if the LLM reaches the same conclusion as you, it is not an independent confirmation. It may be the sycophancy mechanism that detected your hypothesis and validated it.
- Test for disagreement: before trusting an answer, rephrase the question with the opposite hypothesis. If the model agrees with you in both cases, that is sycophancy.
- Output review is not enough: output control (re-reading what the LLM produces) is necessary but structurally insufficient against sycophancy, because sycophantic content is designed to look correct. It confirms what you already thought — making it the hardest to detect.
- Distinguish validation from compliance: an LLM that "understands" a patient's suffering is not the same problem as an LLM that confirms a diagnosis without sufficient evidence. The first is desirable. The second is dangerous. Sycophancy conflates the two.
Related concepts on this site
Sycophancy connects with several concepts documented in our fact sheets:
WEIRD Sample
Sycophantic LLMs are trained on massively WEIRD data. Sycophancy does not reproduce "human preferences" in general — it reproduces the preferences of Western, educated, English-speaking annotators.
Cognitive vs Affective Empathy
Sycophancy mimics affective empathy (emotional resonance) without pairing it with cognitive empathy (understanding the actual situation). This is precisely the dissociation that makes it clinically dangerous.
Further reading
Foundational literature
- Sharma, M. et al. (2023) — Towards Understanding Sycophancy in Language Models. ICLR 2024. arXiv — Foundational paper. Formalisation across five models, identification of the causal RLHF mechanism.
- Perez, E. et al. (2022) — Discovering Language Model Behaviors with Model-Written Evaluations. arXiv — First demonstration of inverse scaling.
- Chen, L. et al. (2025) — Clinical sycophancy in GPT-4 and GPT-4o. npj Digital Medicine. — 100% compliance on medical misinformation.
On this site
- Editorial: Sycophantic AI, reframing the debate — Clinical perspective: why the critique of "sycophancy" is often poorly framed.
- Stade 2024 framework: five axes, three tiers of autonomy — The framework that prescribes output control as a safeguard — and whose blind spot is sycophancy.
- What four CBT psychologists really do with ChatGPT — Empirical documentation of "emotional reassurance", a likely clinical signature of sycophancy.
Fact sheet created: April 2026