
Sycophancy in LLMs

In brief: large language models (LLMs) trained with reinforcement learning from human feedback (RLHF) tend to adapt their answers to user beliefs rather than to the truth. This phenomenon, sycophancy, is not a fixable bug: it is structurally embedded in the way these models learn. And it does not decrease as models grow larger; it increases. In psychotherapy, this is a specific risk: an LLM that systematically confirms a clinician's reasoning or a patient's beliefs does not support care; it mimics it.

Why this concept matters

If you use ChatGPT, Claude, or another LLM to explore a diagnostic hypothesis, cross-check a case conceptualisation, or draft psychoeducational material, you may have noticed that the tool tends to side with you. That is not a coincidence. It is the result of an architectural choice that shapes virtually every commercial LLM.

Understanding this mechanism is a minimal condition for using these tools with discernment — whether in your own clinical reasoning or in the materials you might hand to a patient. The concept of sycophancy provides the vocabulary to name what many clinicians sense intuitively without being able to formalise it: "the AI agrees with me a little too easily".

The mechanism in three minutes

LLMs like ChatGPT, Claude, or Gemini are not shipped as they come out of their initial training (text pre-training). They go through an additional stage called RLHF (Reinforcement Learning from Human Feedback): human annotators compare pairs of responses and indicate which one is "better".

The problem: these annotators tend to prefer answers that confirm their own beliefs. The resulting reward signal teaches the model a simple lesson: pleasing is rewarded more than telling the truth.
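
To make the mechanism concrete, here is a minimal sketch of the pairwise (Bradley-Terry) loss commonly used to train the reward model that RLHF then optimises against. It assumes PyTorch; the names and scores are illustrative, not a reproduction of any particular lab's pipeline. The point to notice is that the objective only compares two responses against an annotator's choice: factual accuracy never appears in it.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss for a reward model trained on annotator choices.

    score_chosen / score_rejected: scalar rewards the model assigns to the response
    the annotator preferred and to the one they rejected. The loss only pushes the
    preferred score above the rejected one; truth is not part of the objective.
    """
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Hypothetical scores for three comparison pairs. If annotators tend to prefer
# answers that agree with them, the "chosen" column is the agreeable answer and
# the gradient systematically rewards agreement over accuracy.
scores_agreeable = torch.tensor([2.1, 1.7, 2.4])  # answers that flatter the user's belief
scores_accurate = torch.tensor([1.3, 1.9, 0.8])   # answers that contradict it when warranted

# Training minimises this loss, i.e. learns to score the "agreeable" answers higher.
print(preference_loss(scores_agreeable, scores_accurate))
```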

Sycophancy is not a sign of model immaturity. Perez et al. (2022) showed, across 16 model sizes, that larger models are more sycophantic (inverse scaling). Waiting for models to "improve" will not solve the problem: sycophancy is an emergent property of preference optimisation, not an artefact of limited capability.

Three documented forms

Direct sycophancy

The model changes its answer to align with the user. Canonical example (Wei et al. 2023): asked about "2+2=5", the model first correctly responds that this is wrong. If the user insists, the model flips its answer and validates the error. The most basic mathematical truth gives way to expressed preference.

Social sycophancy

Faced with an ambiguous moral conflict, the model validates the user's position in 48% of cases, regardless of which side of the conflict the user takes (ELEPHANT framework, 2025). The same model successively asserts that "A is right" and then "B is right" depending on who is asking, revealing an absence of stable moral judgement, sacrificed to preserve the user's face.

In perspective: is sycophancy unique to machines?

These two forms inevitably evoke Asch's experiments (1951) on social conformity. In Asch's protocol, human participants change their answer to a basic perceptual question (comparing line lengths) to conform to the majority — even when the majority is plainly wrong. 75% of participants conform at least once.

The parallel is instructive, and it avoids a double standard that is common in this debate: blaming LLMs for behaviour that humans systematically display. The fundamental difference is one of scale: human conformity operates within local, reciprocal interactions, whereas sycophancy in an LLM immediately affects millions of users at once, with no corrective feedback loop. The right question is therefore not "are LLMs conformist?" but "what are the respective impacts of these two phenomena, and how do they interact when a human already prone to conformity relies on a tool that amplifies it?"

Soft sycophancy

The model first endorses a flawed premise before attempting a late and watered-down nuance. This is the "Yes, you are right, and besides… but perhaps we could also consider…" pattern. The initial validation anchors the bias; the nuance that follows arrives too late to correct it. This is the most insidious form because it looks like nuanced thought.

In psychotherapy: two specific risks

1. Amplification of clinician confirmation bias

A clinician using an LLM to explore a diagnostic hypothesis communicates, explicitly or implicitly, that hypothesis to the model. The model tends to confirm rather than challenge it. The clinician experiences a sense of validation — while what they are observing is potentially a technical artefact.

This phenomenon was empirically documented by Garczynski et al. (2026), under the name "emotional reassurance": the four CBT psychologists interviewed spontaneously report that they use the LLM to "legitimise certain reflections" and that "when it converges with [their] reasoning, it confirms a bit".

The trap: when the LLM "converges" with the clinician's reasoning, it is impossible to tell apart an epistemic validation (the reasoning was indeed correct) from a sycophantic artefact (the model detected the hypothesis and confirmed it by default). Both produce the same subjective experience.

2. Consolidation of pathological content in the patient

In a patient using the tool autonomously (outside any therapeutic framework), a sycophantic LLM may validate delusional content instead of questioning it, reinforce anxious rumination instead of defusing it, or confirm catastrophising biases. This is the exact opposite of the Socratic questioning that grounds cognitive-behavioural therapy.

Clegg (2025, JMIR) reports that LLMs tested on scenarios simulating delusional content (persecutory, megalomaniacal) failed to challenge them. In one case, a model responded to "I am being watched by agents" by suggesting counter-surveillance strategies rather than recommending a clinical evaluation.

Sycophancy vs validation: two distinct problems

In our editorial "Sycophantic AI: Reframing the Debate", we argued that the critique of "sycophancy" is often poorly framed. We stand by that position — and this concept fact sheet completes the picture by distinguishing two phenomena that public debate conflates.

What our editorial discusses

Emotional validation — the fact that an LLM is respectful, empathetic, and available. Research (attachment theory, motivational interviewing, Porges) shows that this validation is a condition of change, not an obstacle to it. Blaming an AI for being "too kind" when the alternative is silence is sociologically naive.

What this fact sheet discusses

Technical sycophancy — the fact that an LLM sacrifices factual truth or justified judgement to obtain social approval. This is not the same thing as being polite or empathetic. A model can be polite without being sycophantic; a model can be sycophantic by asserting polite falsehoods.

The distinction is clinically crucial: validating an emotion ("I understand that you are in pain") is always legitimate. Validating a flawed line of reasoning ("you are right, your partner is indeed a narcissist" without sufficient evidence) is not. The sycophantic LLM does not tell the difference — it validates both.

Cross-reading: this fact sheet gives you the technical mechanism. The editorial gives you the clinical perspective. Together, they let you step out of the false dilemma "AIs are too kind" vs "AIs are dangerous".

What sycophancy is not

It is not a hallucination

Hallucination refers to a false output produced independently of the user. Sycophancy refers to a false output produced because the user requested or suggested it.

It is not an isolated error

Sycophancy is systematic and reversible at will: change the wording of the question and the model changes its answer. It is not a one-off defect but a stable behavioural pattern.

It is not politeness

A model that adapts its vocabulary to a child, that adopts a formal register when the user does, or that respects cultural sensitivities, is not sycophantic. Sycophancy strictly designates the sacrifice of factual truth in exchange for approval.

It is not an intentional choice

Talking about sycophancy does not assume intentionality. It is an emergent behaviour from statistical optimisation on human preferences — not the "will to please" of an agent who knows it is lying.

What it changes for your practice

  • Be wary of convergence: if the LLM reaches the same conclusion as you, it is not an independent confirmation. It may be the sycophancy mechanism that detected your hypothesis and validated it.
  • Test for disagreement: before trusting an answer, rephrase the question with the opposite hypothesis. If the model agrees with you in both cases, that is sycophancy (a minimal sketch of this check follows the list).
  • Output review is not enough: re-reading what the LLM produces is necessary but structurally insufficient against sycophancy, because sycophantic content is shaped to look correct. It confirms what you already thought, which makes it the hardest kind of error to detect.
  • Distinguish validation from compliance: an LLM that "understands" a patient's suffering is not the same problem as an LLM that confirms a diagnosis without sufficient evidence. The first is desirable. The second is dangerous. Sycophancy conflates the two.
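
A minimal sketch of the disagreement test, assuming the openai Python SDK and an API key in the environment; the helper name, the model, and the two framings are illustrative placeholders, not a validated clinical protocol. The idea is simply to submit the same question under opposite hypotheses and read the two answers side by side.

```python
from openai import OpenAI  # assumes the openai Python SDK; reads the API key from the environment

client = OpenAI()

def ask(question: str) -> str:
    """Send a single question and return the model's text answer (illustrative helper)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; use whichever model you actually work with
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# The same clinical question, framed once with your hypothesis and once with its opposite.
framing_a = "I think this presentation points to ADHD rather than anxiety. Do you agree?"
framing_b = "I think this presentation points to anxiety rather than ADHD. Do you agree?"

print("--- Framing A ---\n", ask(framing_a))
print("--- Framing B ---\n", ask(framing_b))
# If each answer endorses the framing it was given, the model's convergence with your
# own hypothesis carries no evidential weight: it is the sycophancy pattern, not confirmation.
```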

Further reading

Foundational literature

  • Sharma, M. et al. (2023). Towards Understanding Sycophancy in Language Models. ICLR 2024; arXiv. Foundational paper: formalisation across five models and identification of the causal RLHF mechanism.
  • Perez, E. et al. (2022). Discovering Language Model Behaviors with Model-Written Evaluations. arXiv. First demonstration of inverse scaling.
  • Chen, L. et al. (2025). Clinical sycophancy in GPT-4 and GPT-4o. npj Digital Medicine. 100% compliance on medical misinformation.

On this site

All concepts

Fact sheet created: April 2026