Artificial Intelligence for Therapeutic Supervision
There are countless ways to envision how artificial intelligence can help therapists with their clinical follow-ups and support them in facing the challenges of their practice.
Needless to say, this raises myriad ethical, legal, and practical questions, and it is imperative that everyone engages with these issues. What better way to weigh the concrete stakes than to experiment on fictional data?
💎 Session Recordings: A Gold Mine
Since the COVID-19 crisis, a great many therapists have been conducting remote sessions, which makes it much easier to record sessions for therapeutic purposes (it can be very enlightening for patients to review the video before the next session).
These recordings are gold mines of information, much of which escapes us due to our limited attention capacity and lack of time to review the sessions ourselves.
🎯 Choosing an AI Model with the Right Capabilities
The first challenge is finding an AI model that has the required capabilities to assist us.
1. Context Window Size
First and foremost, the model must have a context window large enough to hold the entire transcribed text along with the questions we will ask and the answers it will generate during the conversation.
In practice: The text of a typical CBT session ranges between 10,000 and 15,000 tokens. In early 2024, many free models had a context size of less than 10,000 tokens — too small to analyze a complete session.
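The fit between a transcript and a context window can be checked with simple arithmetic. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text; the exact count depends on each model's tokenizer, and the `reserved_for_dialogue` budget is an illustrative assumption.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic."""
    return round(len(text) / chars_per_token)

def fits_in_context(text: str, context_window: int,
                    reserved_for_dialogue: int = 2_000) -> bool:
    """True if the transcript, plus room for our questions and the
    model's answers, fits inside the context window."""
    return estimate_tokens(text) + reserved_for_dialogue <= context_window

# A 50,000-character transcript is roughly 12,500 tokens:
transcript = "x" * 50_000
print(estimate_tokens(transcript))          # 12500
print(fits_in_context(transcript, 8_000))   # False: window too small
print(fits_in_context(transcript, 32_000))  # True
```

In practice, a model-specific tokenizer gives an exact count, but this kind of back-of-the-envelope check is enough to rule out models whose window is clearly too small.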
2. Multimodal Capabilities
At their core, well-known AIs like ChatGPT or Claude are Large Language Models (LLMs), meaning AIs with tens of billions of parameters trained on vast amounts of text.
Now, more and more AIs are also trained on images, audio, and video in addition to text, thus offering multimodal capabilities. This can be extremely valuable in a therapeutic context for analyzing nonverbal behavior and identifying concordances or discordances between what the person says and what their body communicates.
Note: Context window size becomes even more critical when processing data such as audio or video, which are very data-intensive modalities.
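To make that note concrete, here is a rough comparison of how fast each modality consumes a context window. The per-minute token rates below are made-up figures for the sake of illustration (real rates vary widely by model and encoder); only the order-of-magnitude gap between modalities is the point.

```python
# Hypothetical tokens-per-minute rates, for illustration only:
RATES = {
    "text":  200 / 60,     # transcribed dialogue, per second
    "audio": 1_500 / 60,   # raw audio encodings use far more tokens
    "video": 15_000 / 60,  # video frames dwarf both
}

def tokens_for_session(minutes: float, modality: str) -> int:
    """Estimated tokens consumed by a session in a given modality."""
    return round(minutes * 60 * RATES[modality])

# A 50-minute session under these assumed rates:
for modality in RATES:
    print(modality, tokens_for_session(50, modality))
```

Under these assumed rates, the same 50-minute session that fits comfortably as text could overflow even a large context window as raw video.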
3. Inference Quality (Reasoning)
Obviously, we also need an AI capable of high-quality reasoning (in technical terms, the process of generating a response is called inference). Numerous factors influence an AI's reasoning ability:
- Its internal architecture (the structure of the artificial neural network)
- The time allocated to the reasoning process, often limited for resource efficiency reasons
Concretely: An AI allowed to "think" (compute) for a minute will generally produce better results than the same AI forced to respond in 30 seconds (much like therapists...).
4. Access to Temperature Settings
Temperature is a technical parameter that influences how a generative AI selects among the tokens it considers most likely for an appropriate response.
Low Temperature
Strongly favors the most probable tokens, yielding consistent, predictable responses. Preferable for more scientific and analytical approaches.
High Temperature
Gives less probable tokens a real chance of being selected, producing more varied and creative responses. Relevant for brainstorming.
Most consumer-facing interfaces do not provide access to this parameter.
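The mechanism behind temperature can be sketched in a few lines. The toy example below divides each token's score (logit) by the temperature before converting scores to probabilities: a low temperature sharpens the distribution toward the top token, a high one flattens it. The token names and scores are invented for illustration.

```python
import math
import random

def sample_with_temperature(logits: dict[str, float],
                            temperature: float,
                            rng: random.Random) -> str:
    """Sample one token after temperature-scaling the logits."""
    # Dividing logits by the temperature sharpens (T < 1) or
    # flattens (T > 1) the resulting probability distribution.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # float-rounding fallback: return the last token

# Invented logits for the next word of a generated sentence:
logits = {"calm": 2.0, "anxious": 1.0, "elated": 0.1}
rng = random.Random(0)
low = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]
high = [sample_with_temperature(logits, 5.0, rng) for _ in range(100)]
print(low.count("calm"))   # nearly always the most probable token
print(high.count("calm"))  # far more varied output
```

At temperature 0.1 the model is almost deterministic; at 5.0 the less likely words appear regularly, which is exactly the creative/analytical trade-off described above.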
5. Cost
From one AI to another, costs are calculated differently, and analyzing a session can cost anywhere from nothing to several tens of euros depending on the chosen model and the publisher's business model.
- Some charge a subscription and limit the number of queries per hour
- Others charge per token submitted and per token generated
- Costs can vary between consumer interfaces and APIs intended for third-party software
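For the per-token pricing model, the cost of one session analysis is easy to compute. The prices below are hypothetical examples, not any provider's real rates; the session sizes reuse the transcript figures from earlier in this article plus an assumed volume of generated analysis.

```python
def session_cost_eur(input_tokens: int, output_tokens: int,
                     price_in_per_million: float,
                     price_out_per_million: float) -> float:
    """Cost in euros, given per-million-token prices for input and output."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000

# A ~15,000-token transcript plus ~5,000 tokens of generated analysis,
# at two invented price points:
cheap = session_cost_eur(15_000, 5_000, 0.5, 1.5)      # small model
premium = session_cost_eur(15_000, 5_000, 10.0, 30.0)  # large model
print(f"{cheap:.4f} EUR")    # 0.0150
print(f"{premium:.2f} EUR")  # 0.30
```

Even under made-up prices, the spread illustrates the article's point: the same analysis can cost next to nothing or a meaningful sum depending on the model chosen.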
Trend: There are more and more open-source AI models that can run directly on our personal devices. Costs should therefore become marginal over time.
6. Generalist or Specialized AI?
Although we are still in the early stages, there are already AI models specialized in the medical field (such as Med-Gemini), and these already outperform most healthcare professionals on certain tasks.
However, these models are not publicly accessible and are still undergoing intensive testing with selected professionals. Generalist AIs, provided you know how to use them correctly by giving them the right context and the right prompts, nonetheless already perform very well.
See the Differences in Practice
To illustrate these differences between models concretely, we created a comparative benchmark: the same CBT session analyzed by three different LLMs, answering 43 clinical analysis questions.
Explore the benchmark