The buzz about artificial intelligence (AI) just keeps getting louder. For the evidence-based research set, the potential of computer-based tools to automate screening and data extraction in systematic reviews has researchers excited about the possibilities – and maybe a little bit fearful of being replaced by this technology. But are the robots really poised to take over?
The idea of using AI-based tools to screen abstracts and to convert unstructured full-text articles into easy-to-use codified data is not new. I began working with academic research groups in this field in 2006 and, while our studies reported some success, a practical and reliable solution for real-world systematic reviews remained tantalizingly out of reach.
Here’s the problem: parsing complex unstructured text to extract very precise data is hard. Natural language processing (NLP) is good at finding words, phrases, and even concepts, but exact, context-dependent meaning remains difficult to extract reliably.
Worse, the consequences of erroneous, unchecked, machine-extracted information can be dire. Imagine, for example, a pivotal reference accidentally excluded, or the results of an irrelevant but heavily powered study arm ending up in your data set, where its large sample could dominate the pooled results.
Are we asking too much?
While it’s great that we’re pushing hard to build better tools for systematic reviews, I believe it is also important to temper expectations around what we can realistically do with AI and natural language processing (NLP) today.
While available computing power continues to grow rapidly, it’s clear that real people are still needed to make intelligent decisions when screening references and extracting data.
In his Harvard Business Review article (“What Artificial Intelligence Can and Can’t Do Right Now,” HBR, November 9, 2016), Andrew Ng, founder and lead of the Google Brain team, offers an interesting rule of thumb: if a typical person can do a mental task with less than one second of thought, it’s a good candidate for AI automation. Screening a reference or extracting its data takes far longer than that, because a reviewer has to read the abstract or full text and weigh it against the review’s eligibility criteria. By this measure, asking AI to perform autonomous screening or data extraction may simply be asking too much.
Where do we go from here?
Fully autonomous screening and data extraction may not be here yet, but there are still some ways to incorporate AI and NLP into the systematic review process today to ease the growing burden on our review teams. Here are three ideas that our team is focused on:
#1 Prioritizing References
One of the benefits of using systematic review software is that reviewers are able to work on different levels of the review at the same time. For example, once a reference has been included at the Title & Abstract Screening level, it immediately moves on to Full Text Screening or Data Extraction and becomes available to anyone assigned to that level to work on.
Using AI to prioritize references by likelihood of inclusion can expedite this process further. With the most promising references at the top of the screening list, subsequent levels of the review can be well underway while title and abstract screening for the less promising references is completed.
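As a minimal sketch of the idea, prioritization amounts to sorting the screening queue by a predicted likelihood of inclusion. The `Reference` class and its `inclusion_score` field are hypothetical; the sketch assumes some classifier has already produced a score between 0 and 1 for each reference.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    ref_id: str
    # Hypothetical: likelihood of inclusion produced by some screening classifier.
    inclusion_score: float


def prioritize(refs: list[Reference]) -> list[Reference]:
    """Order references so the most promising ones are screened first."""
    # sorted() is stable, so ties keep their original (e.g. import) order.
    return sorted(refs, key=lambda r: r.inclusion_score, reverse=True)


queue = prioritize([
    Reference("R1", 0.12),
    Reference("R2", 0.91),
    Reference("R3", 0.47),
])
print([r.ref_id for r in queue])  # ['R2', 'R3', 'R1']
```

Nothing is excluded here; every reference is still screened by a person, just in a different order.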
#2 Sanity Check
As we learned in our 2017 Survey on Literature Reviews, time (or lack of it) is one of the biggest challenges faced by systematic reviewers today. A second set of eyes on each reference is not always feasible from a time – or money – perspective, but without agreement from a second reviewer, you could end up with inclusion or exclusion errors, compromising your results.
Luckily, an AI tool can be an effective “second screener” at the title and abstract screening stage. Each reference is first screened by a human; a second pass is then made by the “robot” screener, and any disagreement between the two is flagged for a closer look.
This approach achieves the benefits of human screening while leveraging AI to check for potential errors.
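A rough sketch of the disagreement check, assuming the human decisions are stored as include/exclude booleans and the AI produces a score per reference. The function name, data layout, and 0.5 cut-off are illustrative assumptions, not a description of any particular product.

```python
def flag_disagreements(human: dict[str, bool],
                       robot_scores: dict[str, float],
                       threshold: float = 0.5) -> list[str]:
    """Return IDs where the AI's include/exclude call differs from the human's."""
    flagged = []
    for ref_id, human_include in human.items():
        # Hypothetical rule: the "robot" includes when its score reaches the threshold.
        robot_include = robot_scores[ref_id] >= threshold
        if robot_include != human_include:
            flagged.append(ref_id)
    return flagged


human_decisions = {"R1": True, "R2": False, "R3": False}
robot_scores = {"R1": 0.83, "R2": 0.71, "R3": 0.05}
print(flag_disagreements(human_decisions, robot_scores))  # ['R2']
```

The human decision stands unless the flagged reference is re-examined; the AI only decides which references deserve that second look.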
Similarly, AI can be used to conduct a post-review audit to check for accidental exclusion of articles that should have been included.
By learning from the references the human reviewers included and excluded, the AI can build a detailed picture of what an included reference looks like. It can then re-examine every excluded reference and flag any that closely resemble the includes, so a reviewer can confirm that none were excluded by mistake.
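The audit could be sketched with a small multinomial naive Bayes classifier over title words. This is a swapped-in illustration, not necessarily the model any real tool uses; the titles, IDs, and 0.5 threshold are hypothetical. Each excluded reference is scored by a model trained on all the *other* decisions (leave-one-out), so its own exclusion label cannot vouch for it.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train_naive_bayes(docs: list[tuple[str, bool]]):
    """Count words per class; assumes both classes appear in docs."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, included in docs:
        counts[included].update(tokenize(text))
        n_docs[included] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def prob_include(model, text: str) -> float:
    counts, n_docs, vocab = model
    total_docs = n_docs[True] + n_docs[False]
    log_p = {}
    for label in (True, False):
        total_words = sum(counts[label].values())
        lp = math.log(n_docs[label] / total_docs)  # class prior
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            lp += math.log((counts[label][word] + 1) / (total_words + len(vocab)))
        log_p[label] = lp
    # Normalize the two log-scores into P(include | text).
    m = max(log_p.values())
    inc, exc = math.exp(log_p[True] - m), math.exp(log_p[False] - m)
    return inc / (inc + exc)


def audit_exclusions(reviewed: list[tuple[str, str, bool]],
                     threshold: float = 0.5) -> list[str]:
    """Return IDs of excluded references that look like includes."""
    flagged = []
    for i, (ref_id, text, included) in enumerate(reviewed):
        if included:
            continue
        # Leave-one-out: train on every decision except this one.
        others = [(t, lab) for j, (_, t, lab) in enumerate(reviewed) if j != i]
        if prob_include(train_naive_bayes(others), text) > threshold:
            flagged.append(ref_id)
    return flagged


reviewed = [
    ("R1", "statin therapy randomized trial cardiovascular outcomes", True),
    ("R2", "randomized trial of statin dosing in elderly patients", True),
    ("R3", "statin randomized controlled trial mortality", True),
    ("R4", "survey of hospital parking costs", False),
    ("R5", "editorial on hospital staffing", False),
    ("R6", "randomized statin trial lipid outcomes", False),
]
print(audit_exclusions(reviewed))  # ['R6']
```

Because it only needs the final include/exclude decisions, a check of this kind can be run against a review of any age.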
One of the compelling features of this approach is that it can be easily applied to any finished review, even if it was finished a long time ago.
#3 Automated Screening
It sounds futuristic, but we believe it is currently possible for AI to accurately screen large volumes of references – with a little help.
By using the first 50-100 references that you screen as a training set, AI-enabled systematic review software can build classifiers that reliably screen a subset of the remaining references. These systems use weighted scoring to estimate the likelihood of inclusion or exclusion so, never fear: if the software is unsure whether to include or exclude a reference, the final call is left to you.
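One way to picture weighted scoring with a human fallback: sum per-word weights, squash the total into a 0 to 1 score with the logistic function, and auto-decide only when the score is near either end. The term weights, bias, and the 0.9/0.1 cut-offs below are hypothetical stand-ins for what a real classifier might learn from the training set.

```python
import math

# Hypothetical term weights, standing in for what a classifier might learn
# from the first 50-100 references a reviewer screens.
TERM_WEIGHTS = {"randomized": 2.0, "trial": 1.5, "statin": 1.8,
                "editorial": -2.5, "survey": -1.5}
BIAS = -2.0


def inclusion_score(text: str) -> float:
    """Weighted sum of term weights, squashed to (0, 1) by the logistic function."""
    z = BIAS + sum(TERM_WEIGHTS.get(word, 0.0) for word in text.lower().split())
    return 1.0 / (1.0 + math.exp(-z))


def triage(score: float, include_at: float = 0.9, exclude_at: float = 0.1) -> str:
    """Auto-decide only confident cases; everything in between goes to a human."""
    if score >= include_at:
        return "include"
    if score <= exclude_at:
        return "exclude"
    return "human review"


titles = {
    "R7": "Randomized statin trial",
    "R8": "Editorial survey",
    "R9": "Statin cohort study",
}
decisions = {ref_id: triage(inclusion_score(t)) for ref_id, t in titles.items()}
print(decisions)
# {'R7': 'include', 'R8': 'exclude', 'R9': 'human review'}
```

Widening the gap between the two cut-offs sends more references to a person and fewer to automatic decisions.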
Robots vs. Humans
Will systematic reviews ever be fully automated, eliminating the need for human intelligence altogether? We are actively working with like-minded groups on all of the concepts outlined above, but I think we’ve got a long way to go before the robots take over. For now, AI and NLP can play a valuable role as a reviewer’s aid, but they won’t replace reviewers completely…yet.
This blog post was originally published in November 2016 and was updated in January 2018 to reflect the current state of AI technology in the context of systematic reviews.