
What four CBT psychologists really do with ChatGPT

In a study conducted at Université de Montréal, Luc Garczynski interviewed four Quebec CBT psychologists about how they actually use ChatGPT in their practice. We jointly present what he found: a spontaneous three-step protocol, a use that remains systematically peripheral, an invisible prerequisite, and two phenomena that no one had anticipated.

Why this article, why two voices

Studies on what LLMs could do in psychotherapy number in the hundreds. Studies on what psychologists actually do with these tools in their practice number close to zero.

This article presents the results of a qualitative study conducted by Luc Garczynski at Université de Montréal, and it is co-signed by the two people best placed to draw out its implications: the one who conducted the study and knows the material from the inside, and the one who has provided editorial and critical counterpoint since the collaboration began.

The exercise is unusual: commenting on a study whose author is sitting at the same table. We own that position. It has an advantage that conventional outside commentary lacks: Luc can say what he saw and what surprised him, and Matthieu can say what it implies for the clinical reader and what it does not prove. When our perspectives diverge, we say so.


The study in brief

Lead author: Luc Garczynski (Université de Montréal, PSY6008, Winter 2026 session).

Method: interpretive description (Thorne, 2016), via semi-structured interviews with four CBT psychologists in private practice in Quebec. Mixed coding: deductive on the five application axes that the authors formalised from the Stade 2024 framework, inductive for emergent phenomena.

Participants: four psychologists, all in CBT, all in private practice, all actual users of ChatGPT for at least six months. This is not a representative sample of the profession — it is a sample of practitioners who have already taken the leap.

Luc Garczynski

This course project gave me the opportunity to ask questions I had been asking myself for some time. What interested me was understanding the concrete configurations of use: not “do psychologists use AI?” but “how do they go about it, concretely, when they do?”. The Stade framework gave us a grid, but the most interesting findings are the ones that fall outside it.

Methodological warning: this study documents uses among clinicians who already integrate LLMs successfully. It says nothing about clinicians who tried and gave up, nor about those who refuse the tool. We are aware of this — and we return to it in the “Limits” section.


What they actually do: five families of use

The uses described by the four participants are distributed across the five application axes that Garczynski formalised from the Stade 2024 framework. The interview guide was built on this formalisation — so it is not surprising to find the five axes again. What is interesting is the level of detail with which each axis materialises in actual practice.

Documentation and administrative work (Axis 1)

This is the most frequent use, the least risky, and the most immediately rewarding. The psychologists use ChatGPT to draft progress notes, summarise sessions, and structure reports. One of them estimates that the time spent on notes has been “divided by three or four”.

“Yes, in the end, it will produce something. But that something, I systematically check it. I never tell myself ‘ah, manna from heaven, perfect, I paste this into the chart and thank you, goodbye’. I go over it again, I correct, I clarify, I rephrase if needed.”

— Participant 3

The gain is not only in time: it is in cognitive load. Administrative writing is often experienced as an aversive task that eats into the energy available for clinical work.

Training and protocol fidelity (Axis 2)

Less spontaneously described, but present: using the LLM to review manualised protocols, simulate clinical situations, and verify the fit between an intervention and the theoretical model. One participant describes using ChatGPT to refresh the steps of an exposure protocol he had not practised in a long time.

This is a discreet but potentially important use: it reflects a demand for fidelity to EBPs (evidence-based practices) that the LLM can support, without substituting for supervision or for clinical judgement.

Production of patient-facing content (Axis 3)

The participants use ChatGPT to personalise psychoeducational materials, adapt cognitive restructuring exercises, produce explanatory sheets adjusted to the patient’s level of understanding. Personalisation is the distinctive benefit compared to standardised materials.

But it is also the axis that raises the most questions of control: the clinician remains professionally responsible for any material handed to the patient.

Clinical decision support (Axis 4)

This is the most sensitive use. Two participants describe using ChatGPT to explore diagnostic hypotheses, challenge their case conceptualisation, and check possible intervention paths. The tool does not act as a decision-maker but as a cognitive interlocutor: a structured mirror against which the clinician tests their own hypotheses.

It is also here that the most fertile phenomenon of the study emerges — “emotional reassurance” — which we will return to later.

Between-session support (Axis 5)

The least documented axis in this study. None of the participants reports an arrangement in which the patient uses an LLM between sessions under the therapist’s guidance. This is consistent with the central observation: use remains on the therapist’s side, not on the patient’s.


The spontaneous protocol: three steps, one reflex

The most directly transferable result of the study is a three-step procedure that the four participants developed independently, without coordinating with each other and without finding it in the literature.

1. Input control: systematic anonymisation

Before any transmission to the LLM, clinicians remove patient identifiers. This is the first barrier — imperfect from a legal standpoint, but systematically applied.

2. Structured prompts: stable templates

The participants do not improvise their prompts. They use progressively stabilised templates derived from the requirements of the OPQ (Ordre des psychologues du Québec): the deontological norm structures the prompt, not the other way around.

3. Output control: iterative re-reading

None of the four participants uses LLM outputs without re-reading them in full, correcting them, and rephrasing them. The re-reading is not a glance: it is an iteration.
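
To make the triptych concrete, here is a minimal sketch of what the three steps could look like if scripted, in Python. It is illustrative only: the function names, the identifier patterns, the template wording, and the injected llm_call are our own assumptions, not materials drawn from the study, and a regex pass falls far short of real de-identification.

import re

# Illustrative sketch of the input/prompt/output triptych.
# Every name, pattern, and template line here is hypothetical:
# none of it comes from the study or from OPQ guidance.

def anonymise(text: str) -> str:
    # Step 1 (input control): strip obvious identifiers before anything
    # leaves the clinician's machine. A regex pass only illustrates the
    # reflex; it is NOT a sufficient de-identification method.
    text = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "[NAME]", text)
    text = re.sub(r"\b\d{3}[- .]?\d{3}[- .]?\d{4}\b", "[PHONE]", text)
    return text

# Step 2 (structured prompt): a stable template, not an improvised request.
NOTE_TEMPLATE = (
    "You are assisting a CBT psychologist with documentation.\n"
    "From the de-identified session summary below, draft a progress note\n"
    "with sections: presenting issue, interventions used, patient response,\n"
    "plan. Do not invent facts; flag any inference as [TO VERIFY].\n\n"
    "Session summary:\n{summary}"
)

def draft_note(session_summary: str, llm_call) -> str:
    # Step 3 (output control) happens after this function returns:
    # the result is a draft for mandatory re-reading, never a final note.
    prompt = NOTE_TEMPLATE.format(summary=anonymise(session_summary))
    return llm_call(prompt)

The point is not the code but the order of operations the participants converged on: nothing leaves without step 1, nothing is sent without step 2, and nothing is filed without step 3.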

Luc Garczynski

What struck me is that the four did this without coordinating. They do not know each other, they have not read the same things, and yet they arrived at the same triptych. For me, that says something about the tool itself: when you use it seriously, you quickly see that you cannot skip controlling the input, framing the prompt, and verifying the output. It is almost the natural minimal structure of professional use.

Matthieu Ferry

This triptych is an operational translation — rudimentary but functional — of the “human supervision” principles that the literature formulates in abstract terms. That is its strength: it was born from practice. It is also its weakness: it relies on the clinician’s individual vigilance. What happens when the clinician is tired, rushed, or in cognitive overload is not documented — and that is precisely where sycophancy becomes a problem.


The invisible prerequisite: being already solid before using the tool

A finding runs through the four interviews: all the participants who integrate LLMs successfully already had a mature professional foundation. Established clinical expertise, internalised deontological training, an already-digitised environment, and technological fluency built outside clinical practice.

“In my training, there was explicit work on deontology, not just ‘a course’ in passing.”

— Participant 4

The tool does not compensate for inexperience: it slots into an already structured practice. And integration follows a trajectory that Luc observed in all four participants without having suggested it to them.

Luc Garczynski

They all started by using AI in their daily lives, without necessarily acquiring much theoretical knowledge of how it works. That experiential familiarity, where you see the flaws, the moments when it gets things wrong and the moments when it is good, is a real asset. And it is the same for me: I used AI a lot for myself before considering a clinical dimension. I think it is a prerequisite that the literature does not name often enough.

This raises a question that the study cannot address with its sample (n=4, all experienced): what happens when a less experienced clinician uses the same tools without the same foundation? Does the input/prompt/output triptych still work when the output control is exercised by someone who lacks the competence to detect a subtle clinical error?


The unbreached frontier: no one integrates AI “at the heart” of the process

This may be the most striking result. Despite varied and regular uses, none of the four psychologists integrates the LLM at the heart of the therapeutic process itself. AI is used to prepare, document, structure, verify — it is not used to conduct the intervention, to rephrase in session, to guide an exposure exercise, to structure a real-time patient-therapist interaction.

The integration remains peripheral — and the participants do not experience it as a lack.

Luc Garczynski

This is a result I had not anticipated so clearly. I was expecting to find at least one case of integration “at the heart of the process” — a clinician using AI during the session, or prescribing a structured use between two appointments. No. Not one out of four. They are all on the periphery. For me, that means two things: either we are still at the very beginning of adoption and the heart will come later; or there is something in the therapeutic relationship that structurally resists this integration. That is a question I am taking with me into my doctoral thesis.

The open question: is this frontier the sign of an early adoption stage (the heart of the process will be reached later) or of a structural limit (some dimensions of care resist algorithmic assistance)? The study does not say — but the observation deserves to be taken seriously before prescribing deeper integrations.


Two phenomena that no one had anticipated

The study identifies two entirely inductive categories — not anticipated by the interview guide, absent from the Stade framework, emergent from the material itself.

“Emotional reassurance”

The four participants spontaneously report a use that was not anticipated: mobilising the LLM in moments of intense clinical doubt to obtain confirmation that their reasoning is coherent. This is not a cognitive use in the strict sense: it is an affective one.

“Often, I arrive already with a fairly clear idea, but I am in doubt. I think I also use it to legitimise certain reflections, or to untangle when it gets more complex. And when I see that it converges with my reasoning, that it reaches similar conclusions, it confirms a bit, and it reduces uncertainty.”

— Participant 2

Luc Garczynski

This is the category that surprised me the most. It was not in my interview guide, not in the Stade framework, not in the literature I had read. The four described it spontaneously, each in their own way. For me, that says something important: in moments of high emotional load — a violence case, a doubt about a diagnosis — the clinician is not so much looking for alternative reasoning as for a validation. And AI gives it very easily. The question is: is it a good validation?

Matthieu Ferry

This is the question that opens up a research programme. If LLMs are structurally trained through RLHF to agree with the user, then “the AI converges with my reasoning” can be a technical artefact as much as an epistemic confirmation. Emotional reassurance would then be the clinical signature of sycophancy, seen from the side of practice. We will devote an entire article to it.

The “professional taboo”

The four participants describe discomfort around discussing their LLM use with colleagues. The subject is rarely raised in supervision or among peers.

“Honestly, it is a delicate subject. We talk about it little among ourselves, even among colleagues, so we stay discreet. And the OPQ is a bit in the background, with the idea that it can be misinterpreted, even exposing oneself to a complaint.”

— Participant 4

Luc Garczynski

All four spoke of it. It is the only theme where there was complete saturation: each formulated it differently, but the idea is the same. They use AI, they think it is useful, but they do not dare talk about it openly. And what strikes me is that this isolating discretion prevents exactly what we would need: collective regulation, an exchange of practices, the construction of shared reference points.

This taboo produces a defensive discretion: practices develop individually, shielded from the critical gaze of peers, conditions exactly opposite to those of healthy collective regulation. We will devote a dedicated article to this as well.


What this means for your practice

1. If you already use an LLM, check your triptych

Input control (anonymisation), structured prompts (templates), output control (iterative re-reading). If one of the three is missing or weakened, that is a breach. This protocol is not perfect — but it is the minimal floor that practitioners who integrate the tool have empirically converged towards.

2. If you do not use an LLM, do not feel behind

Successful integration rests on a mature professional foundation, not on technological enthusiasm. Strengthening your deontological framework and your core clinical competencies first is probably more productive than rushing to ChatGPT.

3. If you are a trainer, think “procedure” before “tool”

Safety lies in the procedure, not in the tool. Training clinicians to “use ChatGPT well” makes no sense if you do not teach them to structure their inputs, formalise their prompts, and exercise rigorous output control.

4. The fundamental question remains open

This study documents peripheral integration. Integration at the heart of the therapeutic process is not documented, and no one yet knows whether it is desirable, under which conditions, for which patients, with which safeguards. That is the research frontier — and that is where Luc’s thesis is heading.


The limits we acknowledge

We are not in the comfortable position of outside commentators listing the weaknesses of someone else’s work. Luc is the lead author of the study, and Matthieu has been collaborating with him since March 2026 on a series of articles drawing on this material. The limits we identify are therefore also those of our own work.

1. Small sample, biased toward success

Four participants, all volunteers, all young CBT practitioners at ease with digital tools. This is a course project format (PSY6008, UdeM), not a journal article. Clinicians who tried and gave up are not in the data. Luc knows this — it is one of the reasons his doctoral scoping review aims at a much broader empirical base.

2. Single theoretical framework and circularity

The interview guide is built on the five axes that the authors themselves formalised from the Stade 2024 framework. It is not surprising to find these axes again. The inductive categories (emotional reassurance, taboo) that emerge despite the framework are the most interesting — they point to the limits of the grid. For both of us, this is the main methodological lesson.

3. No patient perspective

The four interviews are conducted with clinicians. The patient’s voice is entirely absent. We know what therapists do with AI — not what patients perceive of it, nor what it changes for them. This is a blind spot that the rest of Luc’s thesis will need to cover.

4. No effectiveness measure

The study describes uses, not results. It measures neither the impact on care quality, nor patient satisfaction, nor clinical errors avoided or produced. In terms of the Hua framework, this is T2 (feasibility), not T3 (clinical effectiveness).

Why we still take this material seriously: because in a field where almost the entire literature is either technical (T1 benchmarks) or declarative (“clinicians would like…”), a study that documents what clinicians actually do deserves attention, provided its conclusions are calibrated to the actual level of evidence. That is what we try to do, in two voices, in this article and those to follow.


Study presented: Garczynski, L. (2026). Intégrer les modèles de langage en psychothérapie — usages réels et repères de pratique chez des psychologues TCC. Final project PSY6008, Université de Montréal. Unpublished document.
