Crucial Challenges Of Using Waitlist Controls When Performing Empirical Research On AI And Mental Health
In today’s column, I examine the crucial challenges of using waitlist controls when performing empirical research regarding AI and mental health.
This is a vital topic since it pertains to the potential thoroughness associated with methodologically and scientifically determining whether and how AI usage can impact human psychological well-being. Researchers are increasingly opting to use waitlist controls in their research studies on AI and mental health. As such, it is markedly important to have sufficient awareness of the ins and outs concerning waitlist controls.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For an extensive listing of my well over one hundred analyses and postings, see the link here and the link here.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors, too. I frequently speak up about these pressing matters, including in an appearance on an episode of CBS’s 60 Minutes, see the link here.
AI Providing Mental Health Guidance
Millions upon millions of people are using generative AI as their ongoing advisor on mental health considerations (note that ChatGPT alone has over 900 million weekly active users, a notable proportion of which dip into mental health aspects, see my analysis at the link here). The top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets; see my coverage at the link here.
This popular usage makes abundant sense. You can access most of the major generative AI systems for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.
There are significant worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Banner headlines last year accompanied the lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement.
Today’s generic LLMs, known as general-purpose AI, such as ChatGPT, GPT-5, Claude, Gemini, Grok, Copilot, and others, do not come close to matching the robust capabilities of human therapists. Meanwhile, specialized LLMs are being built to attain those desired qualities, though such AI is still primarily in the early development and testing stages. For more about purpose-built AI apps in mental health, see my in-depth coverage at the link here and the link here.
Gauging Impacts Of AI On Mental Health
Researchers in psychology are increasingly trying to ascertain the impact of AI usage on human mental health. Society desperately needs bona fide, insightful research on this significant matter. Hand waving is not sufficient.
It is generally assumed that if AI is suitably devised and utilized appropriately, mental health will improve, while if AI is poorly devised or haphazardly used to obtain therapeutic advice, the result will be harmful to mental well-being. But making such an all-encompassing off-the-cuff assumption about the expected impacts is not particularly reassuring. We need to vigorously lean into robust scientific methods to thoroughly study the profound human-AI effects.
Fortunately, the volume and depth of robust empirical studies on AI and mental health are rapidly expanding. Experiments are being carried out that consist of a classical setup of a treatment group and a control group. The use of AI is typically considered the treatment and is applied to a chosen set of human subjects. This treatment group will hopefully reveal whether the treatment is having a detectable effect. For more about the use of RCTs (randomized controlled trials) in AI and mental health research, see my coverage at the link here and the link here.
To showcase the presumed effect, a control group is established as a basis for comparison to the treatment group. There are numerous ways to compose a control group. One way that is gaining popularity consists of using a waitlist control. In brief, subjects in the control group are put on a waiting list to later receive the same treatment as the treatment group. Meanwhile, their “do nothing” status during the waiting period serves as a comparison to the treatment group, plus experimenters can assess possible additional impacts once the waitlist group finally undergoes the treatment.
Let’s momentarily back up and think widely about control groups.
When conducting empirical research on AI and mental health, there are numerous choices that can be made about the design of a control group. First, realize that the treatment group is presumably going to be asked to use AI as the principal treatment element. The control group, then, is going to serve as a fundamental comparison to the treatment group.
The open question is what the control group should be doing or making use of so that researchers can render a sensible and valuable comparison to the treatment group.
Here are some of the commonly utilized control groups in this realm:
- Online content seeking control group. A control group of users does online searches about mental health and dips into the content found online containing mental health advice.
- Book reading control group. A control group of users is given printed materials such as published books and pamphlets about mental health.
- Human therapy control group. A control group of users gets mental health counseling from a human therapist during the experiment.
- Human-to-human groupwise control group. A control group of users gets mental health advice from other non-therapist humans in a group setting, such as via an online social network intentionally devised for the experiment.
- Expert systems control group. A control group of users makes use of a conventional rules-based expert system rather than using an LLM.
- General-purpose AI control group. A control group of users makes use of general-purpose AI for mental health guidance, rather than the purpose-built AI being used by the treatment group.
- Purpose-built AI control group. A control group of users makes use of purpose-built AI for mental health, while the treatment group uses general-purpose AI.
- Waitlist or delayed treatment control group. A control group of users is not assigned any specific task or usage right away, other than a blanket “do nothing” instruction, while the treatment group proceeds. Subsequently, the control group is asked to make use of the same treatment as had been applied to the treatment group.
- Other types of control groups. Various other control group settings can be established, including having multiple control groups at the same time (doing a mix-and-match of the above).
There are tradeoffs underlying each of the possible control groups.
Please realize that the choice of which control group to use is not strictly a right or wrong decision. Any designed control group structure incurs its own semblance of advantages and disadvantages. The key is being aware of the pros and cons, being on the watch for them, and making sure to disclose those tradeoffs when presenting the results of the research.
Tradeoffs Of Waitlist Control Groups In This Context
Take a reflective moment to mull over the potential tradeoffs of using a waitlist control group in the context of an AI and mental health experimental design.
You undoubtedly identified the foremost factor, namely, the handy, straightforward contrast that seems to arise by using a waitlist control group in this setting. The treatment group is using AI for mental health advice. The control group, meanwhile, is not using anything. The starkness is bound to be illuminating. You’ve got the AI intervention versus the no immediate intervention as vividly differing sets.
Another nice feature is that the control group will inevitably get to use the AI and therefore receive the same treatment as the treatment group (after the designated waiting period). This is nice because the control group will seemingly get the same benefits of the AI usage as the treatment group. If the control group were never tasked with using the AI, you might argue that they are left with nothing in hand. They participated in the experiment but didn’t get anything personally useful by doing so.
In general, recruiting subjects to participate in these types of research studies can be tricky, particularly if they suspect or know that some subjects will get to use the AI and others will not. The ones who anticipate that they won’t be chosen to use the AI are likely to take a dim view of participating. There doesn’t seem to be a payoff for them. The excitement of being able to use the AI could be a notable means of securing subjects for these experiments.
The same logic applies to the retention of subjects during the experiment. A member of the control group might drop out if they don’t have any compelling reason to remain in the study. By dangling the promise of being able to use the AI (eventually), those subjects in the waitlist control group are given an added incentive to remain engaged in the study.
One of the well-known downsides of waitlist controls as an experimental design is that they tend to exaggerate the treatment effects. The control group knows that they are waiting. This can foster a nocebo-like disappointment. They are in limbo. On the other hand, the treatment group is getting pizzazz.
In this context, the treatment group is using AI. They likely feel excited about doing so; it is a novelty, they are getting attention from the researchers, and so on. The control group is twiddling its thumbs. It doesn’t seem fair. They get anxious to take their turn.
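To make the exaggeration concern concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that the measured change in each group is a simple sum of components: a natural change over time, the true effect of the AI, a novelty or attention boost in the treatment group, and a waiting-related decline in the waitlist group. The class name, function names, and all numbers are hypothetical and are not drawn from any study.

```python
from dataclasses import dataclass


@dataclass
class WaitlistScenario:
    """Hypothetical components of change on some well-being scale (higher is better)."""
    true_effect: float            # improvement actually caused by using the AI
    novelty_boost: float          # extra lift from excitement and researcher attention
    waiting_penalty: float        # decline in the waitlist group from being kept in limbo
    natural_change: float = 0.0   # change both groups would see anyway over time


def observed_difference(s: WaitlistScenario) -> float:
    """Difference in mean change that the experimenter would measure."""
    treatment_change = s.natural_change + s.true_effect + s.novelty_boost
    control_change = s.natural_change - s.waiting_penalty
    return treatment_change - control_change


def overstatement(s: WaitlistScenario) -> float:
    """How much the measured difference exceeds the true AI effect."""
    return observed_difference(s) - s.true_effect


# Illustrative numbers only (assumed, not empirical).
scenario = WaitlistScenario(
    true_effect=3.0, novelty_boost=1.0, waiting_penalty=1.5, natural_change=0.5
)
print(observed_difference(scenario))  # 5.5
print(overstatement(scenario))        # 2.5
```

Under these assumed numbers, the study would report a 5.5-point advantage even though only 3.0 points are attributable to the AI itself. The natural change cancels out, while the novelty boost and the waiting penalty both push the measured gap upward.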
The experimental comparison also begins to drift away from testing whether the “AI works” and instead becomes simply AI versus doing nothing at all. The real world isn’t that way, especially since the subjects in the control group are purposely avoiding anything resembling mental health advice, including not talking to friends about mental health aspects. This is a somewhat misleadingly crafted circumstance.
The “do nothing” is a difficult beast to corral. Do you tell the subjects of the control group to completely avoid any kind of mental health guidance, whether talking with others, looking online, or whatever? Sure, maybe. But that doesn’t reflect what people tend to really do in their daily lives and the real world.
Okay, you then decide that the subjects in the control group can do whatever they ordinarily do to get mental health advice. Oops, this is an issue because some might be using AI. Some might be looking up content on the Internet. The subjects in the control group will be all over the map, though you are claiming or categorizing them as all seemingly doing nothing.
You can plainly see that the waitlist control scheme can make or break what the results showcase. The outcomes for the treatment group might turn out to be only marginally different from those of the control group if the control subjects are secretly using AI anyway. An experimenter would then falsely assert that the treatment group wasn’t materially impacted in comparison to the “do nothing” control group, though the reality is that the control group was using AI under the hood all along on their own.
It can be a darned if you do, darned if you don’t situation.
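The scenario in which waitlisted subjects quietly use AI on their own can be sketched with equally simple arithmetic. This sketch assumes, for illustration only, that a control subject who uses AI on the side gets the same benefit as a treatment subject, and that everyone else in the control group sees no change. The function name and the numbers are hypothetical.

```python
def diluted_difference(true_effect: float, contamination_rate: float) -> float:
    """Measured treatment-vs-control gap when some control subjects use AI anyway.

    contamination_rate is the assumed fraction (0.0 to 1.0) of waitlist subjects
    who use AI on their own during the waiting period.
    """
    if not 0.0 <= contamination_rate <= 1.0:
        raise ValueError("contamination_rate must be between 0.0 and 1.0")
    treatment_mean = true_effect
    # Only the contaminated share of the control group improves (by assumption).
    control_mean = contamination_rate * true_effect
    return treatment_mean - control_mean


# Illustrative values only: an assumed true effect of 4.0 points.
for rate in (0.0, 0.25, 0.5, 1.0):
    print(rate, diluted_difference(4.0, rate))
```

With half of the waitlist group using AI on the side, the measured gap under these assumptions shrinks from 4.0 to 2.0, and it vanishes entirely if every control subject does so. Tracking and reporting what control subjects actually did during the waiting period is one way to detect this.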
Understanding The Comparator At Play
I mentioned that each of the control group structures has its own respective tradeoffs. Consider what that means. First, in theory, the waitlist control is going to compare AI usage to doing nothing. If you instead were to use a control group that, for example, consists of control subjects getting explicit human therapy during the experiment, you are comparing AI-based mental health guidance to human therapeutic activity.
Which approach tells you more about the potential impacts of using AI for mental health guidance?
Well, you would be hard-pressed to insist that one of those approaches is altogether better than the other. The experiment using a control group consisting of human therapy has upsides and downsides. Was the human therapy provided consistently, or did it vary across the subjects? How much human therapy did they get? All sorts of problematic concerns arise.
The gist is that no matter which control group approach you select, you must be mindful of what you are buying into. Think about how to best use the chosen approach. Watch for signs of issues along the way. Make sure to clearly report on the upsides and downsides as encountered in your experimental efforts. Doing so will be more aboveboard and allow for reasoned and serious discourse on the clinical, regulatory, and policy sides of gauging the use of AI for mental health.
Waitlist Controls Deserve Their Attention
Just in case any trolls might suggest that this clarification about waitlist controls is somehow a bashing of waitlist controls, I certainly hope that any reasoned viewpoint would see that the emphasis was on the tradeoffs of all types of control group structures. The idea is to ensure that those using waitlist controls do so wisely and that those consuming studies that are based on waitlist controls are fully aware of what to watch out for in such studies.
In the end, I want to emphasize that waitlist controls are a bona fide approach and are absolutely welcomed in the realm of AI and mental health. The welcome mat is there. As per the famous words of Franklin D. Roosevelt: “It is common sense to take a method and try it; if it fails, admit it frankly and try another. But above all, try something.”
For all you researchers out there, keep going and doing great work that will help society and humankind grasp the significance of AI in mental health. That’s certainly better than doing nothing, and, for the sake of humanity, you’d be doing something.