The APA Model: A Framework for Evaluating Mental Health Apps
Over 10,000 mental health apps are available on the app stores, but only 15% have clinical evidence. The APA model offers a 5-level framework to help clinicians navigate the landscape.
Over 10,000 mental health applications are available on app stores. Only 15% have clinical evidence, and 44% share user data with third parties. In this digital wild west, how can a clinical psychologist responsibly recommend — or advise against — an app to a patient?
The American Psychiatric Association (APA), through the work of John Torous and his team at the Beth Israel Deaconess Medical Center (Harvard), has developed a hierarchical 5-level evaluation model. Not a certification label, but a shared decision-making tool between clinician and patient.
The 5 Levels of the Model
The model works as a funnel: each level filters out apps that fail to meet essential criteria. There’s no point checking the clinical efficacy of an app (level 3) if it doesn’t protect your patients’ data (level 2).
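For readers who like to see the logic written out, the funnel can be sketched as a gated checklist: each level is only examined if every lower level has passed. This is a minimal illustrative sketch, not software published by the APA; the level criteria, the `LevelCheck` structure, and the `evaluate` function are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class LevelCheck:
    """One level of the funnel: a name plus a pass/fail criterion."""
    name: str
    passes: Callable[[dict], bool]  # takes known facts about the app, returns True if satisfied

# Illustrative criteria only; the real model asks many questions per level.
LEVELS = [
    LevelCheck("1. Accessibility and context", lambda app: app.get("cost_transparent", False)),
    LevelCheck("2. Privacy and data security", lambda app: app.get("data_encrypted", False)),
    LevelCheck("3. Clinical evidence",          lambda app: app.get("has_published_evidence", False)),
    LevelCheck("4. Engagement and usability",   lambda app: app.get("usable_by_patient", False)),
    LevelCheck("5. Interoperability",           lambda app: app.get("exports_standard_data", False)),
]

def evaluate(app: dict) -> Optional[str]:
    """Return the first level the app fails, or None if it clears all five.
    Lower levels gate higher ones: evaluation stops at the first failure."""
    for level in LEVELS:
        if not level.passes(app):
            return level.name
    return None

# Example: strong evidence but weak privacy still stops the evaluation at level 2.
print(evaluate({"cost_transparent": True, "data_encrypted": False, "has_published_evidence": True}))
```

The only point of the sketch is the ordering: a failure at a lower level makes the higher levels moot, which is exactly how the five sections below should be read.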
Level 1: Accessibility and Context
The basics are checked here: cost, supported platforms (iOS, Android), language, offline functionality. But also the development context: who built the app? Academic institution or venture-funded startup? Is the business model transparent?
Critical point:
This level also evaluates crisis management: what happens when a user expresses suicidal ideation? Many popular apps offer no emergency protocol whatsoever.
Level 2: Privacy and Data Security
The most discriminating level. Is data encrypted? Shared with third parties? Can users delete their data? In Europe, GDPR compliance adds an extra layer of requirements beyond the American HIPAA framework.
Key figure:
81% of popular mental health apps lacked adequate privacy policies (2019 study). Replika was banned in Italy in 2023 for insufficiently regulated data collection — a textbook case.
Level 3: Clinical Evidence
Is the app backed by scientific data? The model proposes an evidence hierarchy: randomized controlled trials (RCTs) > cohort studies > pilot studies > expert opinion.
Concrete examples:
Woebot has multiple published studies showing reduction in depressive symptoms. Headspace has been evaluated for stress reduction. Conversely, the majority of “mental wellness” apps have no scientific publications to their name.
Level 4: Engagement and Usability
A clinically validated but unusable app serves no one. This level evaluates design, personalization, notifications, and retention rates.
Caution:
Some apps maximize engagement through mechanisms borrowed from social media (gamification, streaks, push notifications). When engagement stops serving care and starts creating dependency, the ethical line has been crossed.
Level 5: Interoperability and Integration
The most ambitious level: can the app integrate into an existing care pathway? Export data to a patient record? Allow secure sharing with the therapist?
In practice:
Very few apps reach this level. The open-source mindLAMP platform (Harvard) is one of the rare ones offering an interoperable architecture designed for clinical practice.
Practical Guide: Questions to Ask
Before recommending an app to a patient — or evaluating one they’re already using — here are the essential questions, level by level:
Level 1 (Accessibility and Context):
- Who developed this app? An institution, a startup, an unknown entity?
- Is the business model transparent (free, freemium, subscription)?
- What happens in a crisis? Is there an emergency protocol?

Level 2 (Privacy and Data Security):
- Is data encrypted and stored locally (or in the EU for European users)?
- Is it shared with third parties (advertisers, insurers)?
- Can the patient delete their data at any time?

Level 3 (Clinical Evidence):
- Are there independent scientific publications?
- Are they RCTs, pilot studies, or just testimonials?
- Have results been replicated by other teams?

Level 4 (Engagement and Usability):
- Does the app use retention mechanisms (gamification, streaks)?
- Can the patient use it without becoming dependent on the app?
- Does the design serve care or commercial engagement?

Level 5 (Interoperability and Integration):
- Does the app allow data sharing with the therapist?
- Can it integrate into an existing care pathway?
- Is data exportable in a standard format?
For a more thorough evaluation, the APA provides the MIND database (105 structured questions) and a rapid 8-question screener usable in consultation.
Limitations of the Model
The APA model is a valuable step forward, but it has blind spots:
- No relational dimension: the model evaluates the app as an isolated technical object. Yet what unfolds between a patient and an emotional support app also involves relational dynamics, and that is precisely our expertise as clinicians.
- Pre-dates the LLM explosion: designed before the generative AI wave, the model doesn't address the specific challenges of these technologies (hallucinations, response variability, opacity). How do you evaluate an app whose behavior is probabilistic and changes with every interaction?
- Geocultural bias: centered on the American context, the model doesn't account for European specificities (GDPR, public healthcare systems, linguistic plurality).
- Voluntary evaluation: unlike the European CE marking for medical devices or the German DiGA system, the APA model is non-binding. It relies on developer goodwill and clinician vigilance.
Our Position
This model is, to our knowledge, the most comprehensive framework for helping clinicians evaluate a mental health app. Its hierarchical logic — don’t go further if the foundations aren’t solid — is simple and operational.
But it should be seen for what it is: a questioning tool, not a compliance certificate. The fact that an app “passes” all 5 levels doesn’t guarantee it suits this patient, in this therapeutic context, at this point in their journey. That’s where clinical judgment takes over.
We discussed the APA’s ethical recommendations for AI use in practice in a previous article. The app evaluation model is its natural complement: where the ethical guide sets the principles, the model provides a method.
Main source: Torous, J. et al. — American Psychiatric Association App Evaluation Model, MIND database. See also our concept pages on Digital Phenotyping and Ecological Momentary Assessment, two approaches this model helps evaluate concretely.