Systematic reviewers are prone to experiencing “déjà vu”, that indescribable gut feeling that you’ve done the exact same thing (or screened the same abstract) at some point in the past. Particularly if you specialize in a certain research topic, you may encounter the same references turning up over and over from one review to another.
To add to this frustration, new research is being published at an alarmingly fast pace, and it can seem like an insurmountable task to keep your reviews up to date – especially when you’re wasting time reading articles you’ve screened before.
More and more, we are pinning our hopes on artificial intelligence (AI) and automated screening and data extraction as the cure for “systematic review déjà vu” and for staying ahead of the onslaught of new evidence. Without a doubt, machines are learning to be better at screening and data extraction, and our team is one of many research groups actively working to grow the role of machine learning in the review process.
That said, I believe that if we put the tools and policies in place to collaborate on a larger scale, we can not only outpace the machines but also deliver more reliable results.
Humans are awesome natural language processors and we are collectively screening, classifying and coding thousands of references every minute of every day. In DistillerSR alone, people are coding and classifying well over 1 million references every month.
Given the efforts of DistillerSR users and others, it is safe to say that humans are actually classifying references and papers faster than they are being produced.
Problem solved. Right?
Partially – but in order to outright win the race against the never-ending stream of new publications, we (and I’m talking about the global “we” here) need to focus on three things:
A Standard Lexicon
Online repositories such as PubMed already provide standard field names for the most common reference attributes (e.g. title, author, journal). We need to go far beyond that to capture a much richer set of attributes, such as study type, interventions, subjects in the control group, and patient demographics.
By creating a standard lexicon, we will be able to merge coded data (i.e. data collected during the screening and data extraction process) generated by different groups to create living repositories of coded references – think PubMed, but with all of the study data broken out into field or tabular formats.
Of course, we can use computer-based tools to make this process easier. Screening and data extraction software, like DistillerSR, can be used to translate data captured in forms into coded reference fields. Additional tools could accept coded reference sets, in a pre-defined format, from multiple sources and merge them to create larger, living repositories.
Lastly, and ideally, we could then put a powerful search interface on the front of these repositories to allow other researchers to conduct very precise searches and extract references with their key data already extracted into fields or tables.
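To make the merging step described above more concrete, here is a minimal sketch in Python. It assumes each group exports its coded references as a list of dictionaries that share an identifier. The field names in LEXICON_FIELDS, the ref_id key, and the merge_coded_sets function are all hypothetical and chosen only for illustration; they are not part of DistillerSR or any published standard.

```python
from typing import Dict, Iterable

# Hypothetical standard lexicon: field names every contributing group agrees on.
LEXICON_FIELDS = {"ref_id", "title", "study_type", "intervention", "country"}


def merge_coded_sets(*coded_sets: Iterable[dict]) -> Dict[str, dict]:
    """Merge coded references from several sources, keyed by ref_id.

    Fields outside the lexicon are dropped. When two sources code the same
    field for the same reference, the first non-empty value is kept.
    """
    repository: Dict[str, dict] = {}
    for coded_set in coded_sets:
        for record in coded_set:
            ref_id = record["ref_id"]
            merged = repository.setdefault(ref_id, {"ref_id": ref_id})
            for field, value in record.items():
                if field not in LEXICON_FIELDS:
                    continue  # not part of the shared vocabulary
                if value in (None, ""):
                    continue  # nothing was actually coded for this field
                if field in merged:
                    continue  # an earlier source already coded this field
                merged[field] = value
    return repository


# Made-up sample data from two imaginary review groups.
group_a = [{"ref_id": "ref-001", "title": "Trial A", "study_type": "RCT"}]
group_b = [
    {"ref_id": "ref-001", "intervention": "shock wave lithotripsy", "internal_note": "x"},
    {"ref_id": "ref-002", "title": "Cohort B", "study_type": "cohort"},
]

repository = merge_coded_sets(group_a, group_b)
```

In this sketch, the two partial codings of ref-001 combine into a single, richer record, and the non-lexicon field internal_note is discarded. A real repository would also need rules for resolving conflicting values and for judging the reliability of each source, which this example deliberately leaves out.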
The Willingness To Share
This last key to success is critical. If we are going to leverage the incredible amount of work that researchers are already doing to create coded references, we need to work together to collate, curate and share these valuable collections of material.
To make this feasible, I believe we need to continue to develop technology-enabled platforms that support exactly this type of initiative, on a global scale.
If the research community can come together to achieve these three things, we could end up with something extraordinarily powerful: a global repository of fully coded references.
What’s In It For Me?
Everyone benefits from helping to build a global coded reference repository. Here are just a few of the things it will enable:
More precise searches
Searching against coded data, rather than unstructured abstracts, allows for more precise searches, resulting in less screening. Instead of wading through piles of “maybe” articles to determine whether your specific inclusion criteria have been met, you would be able to specify details such as patient gender, age, or location right in your search.
For example, you could search for references discussing RCTs, conducted in the past 2 years in the United States and the UK, on shock wave lithotripsy, where all of the patients were male and over the age of 40 – a level of precision that’s not possible with traditional reference repositories.
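As a rough illustration of the search just described, the Python sketch below filters a small set of coded references. The CodedReference class, its field names, the precise_search function, and the sample records are all assumptions made up for this example; a real repository would define its own schema and query interface. The sketch also reads “in the United States and the UK” as “in either country”.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class CodedReference:
    # Hypothetical coded fields; not taken from any real repository schema.
    title: str
    study_type: str
    year: int
    countries: Set[str]
    intervention: str
    all_male: bool
    min_age: int  # age of the youngest patient in the study


def precise_search(refs: List[CodedReference], current_year: int) -> List[CodedReference]:
    """Return male-only RCTs on shock wave lithotripsy from the past 2 years,
    run in the US or UK, where every patient was over 40."""
    return [
        r for r in refs
        if r.study_type == "RCT"
        and current_year - r.year <= 2
        and r.countries & {"US", "UK"}
        and r.intervention == "shock wave lithotripsy"
        and r.all_male
        and r.min_age > 40
    ]


# Made-up sample records, for illustration only.
refs = [
    CodedReference("Study 1", "RCT", 2023, {"US"}, "shock wave lithotripsy", True, 45),
    CodedReference("Study 2", "RCT", 2018, {"UK"}, "shock wave lithotripsy", True, 50),
    CodedReference("Study 3", "cohort", 2023, {"US"}, "shock wave lithotripsy", True, 41),
    CodedReference("Study 4", "RCT", 2024, {"UK", "FR"}, "shock wave lithotripsy", False, 42),
]

matches = precise_search(refs, current_year=2024)
```

Every criterion here is a comparison against a structured field, which is exactly what free-text abstracts cannot offer. Of the four sample records, only “Study 1” satisfies all six conditions.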
Less data extraction
If all or some of the data needed from a study has already been extracted, it does not need to be extracted again, saving time and resources.
Instant scoping studies
With highly precise searches like the one noted above, evaluating the state of research on a given topic becomes a simple process.
The above ideas are not new, but we have only recently reached a point where they are becoming feasible. As a result, many groups are now engaged in helping to make the fabled coded reference repository a reality.
I still believe that AI and natural language processing will have a growing role in the review process and I am actively involved in developing these technologies. In the near term, however, people are still better reviewers than machines AND we can easily leverage work we’re already doing to build a valuable, shareable asset.
Remember, if a reference has been coded once by a reliable source, it should never have to be coded again. If we do this right, staying ahead of the tsunami of newly published papers is a very achievable, and worthwhile, goal.
As for that reference that you’ve screened and coded a million times? You’ll never have to read it again!