A Closer Look at Natural Language Processing in Systematic Reviews

For some, the hesitation to trust artificial intelligence (AI) in systematic reviews comes from not fully understanding the technology that powers it. For example, when we say DistillerSR uses “natural language processing,” what does that really mean in practical terms? In this post, we’re taking a closer look at natural language processing in systematic reviews, how it works, and why it’s changing the systematic review landscape for the better.

A Little Background: Types of AI

AI has been in the works since the dawn of the computer. Scientists and computer programmers have always wanted to design a computer that can learn and think for itself, deciphering complex problems such as the answer to life, the universe, and everything (the answer, of course, is 42).

Real-life AI can be broken into several categories depending on the goal you need to achieve as well as the programming that powers it. The main types of AI are:

Reactive Machines: Code-based automation that does not use past or present information to learn or make decisions. It’s only responsive, as the name implies.
Limited Memory: Derives knowledge from stored data and events to build its knowledge base.
Theory of Mind: Defined by the ability to perform decision-making to the same extent as humans.
Self-Awareness: Describes an AI that exhibits human-level consciousness. Does not currently exist with today’s technology.

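The technology we’re talking about today, natural language processing, is a subfield of AI rather than one of these types. The NLP tools that learn from stored data sit closest to the limited memory category.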

NLP in a Nutshell

Natural language processing (NLP) is a type of AI that enables a computer to extract data from large amounts of unstructured text such as reports, literature, articles, etc. Before NLP was invented, computers could only extract and analyze structured data in the form of databases, codes, and spreadsheets. Now, they use a comprehensive pipeline to find context and data from unstructured text.

NLP is part of many common tools we have been using for years. Spelling and grammar checkers, spam filters, and search engines are all powered by some form of natural language processing.

A Closer Look at How NLP Works

NLP works by taking a big goal and breaking it into much smaller tasks for the computer to complete step-by-step. In order to understand how NLP works, you have to go way back to grammar school. NLP pipelines take an organically constructed sentence and approach it using logic, breaking it down to its root and enabling the computer to analyze it as data. Let’s take a look at what a generic NLP pipeline looks like:

Step 1: Segment the sentences in your document.

Step 2: Within the sentences, segment the individual words. This is known as “tokenization.”
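
To make Steps 1 and 2 concrete, here is a minimal sketch in Python. It uses simple regular expressions of our own choosing, so it will stumble on abbreviations like “e.g.” and other edge cases that real NLP libraries handle. The function names split_sentences and tokenize are ours, not part of any particular product.

```python
import re

def split_sentences(text):
    """Naive segmentation: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenization: words (allowing internal hyphens or apostrophes)
    or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "The trial enrolled 120 patients. Outcomes were measured at 12 weeks."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence_tokens in tokens:
    print(sentence_tokens)
```

Each sentence becomes a list of tokens, with punctuation kept as its own token so later steps can decide what to do with it.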

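Step 3: For each word, identify its Part of Speech (noun, verb, adjective, pronoun, adverb, preposition, conjunction, interjection). This is challenging because some words have more than one meaning and can fall into more than one Part of Speech category. For example, “screen” can be a noun or a verb depending on the sentence. Even the word “the” is classified differently by different schemes: traditional grammar sometimes treats it as an adjective, while most NLP taggers label it a determiner.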
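
Here is a toy illustration of why tagging is tricky, written as a Python sketch. The mini-lexicon and the two tie-breaking rules are hypothetical and only cover these example sentences; real taggers learn from large annotated collections of text.

```python
# Hypothetical mini-lexicon: each word lists the tags it can take.
LEXICON = {
    "the": ["DET"],
    "reviewers": ["NOUN"],
    "screen": ["NOUN", "VERB"],
    "abstracts": ["NOUN", "VERB"],
    "is": ["VERB"],
    "blank": ["ADJ"],
}

def tag(tokens):
    """Tag each token, using the previous tag to settle ambiguous words."""
    tagged = []
    prev = None
    for tok in tokens:
        options = LEXICON.get(tok.lower(), ["NOUN"])  # unknown words default to NOUN
        if len(options) == 1:
            choice = options[0]
        elif prev == "DET" and "NOUN" in options:
            choice = "NOUN"   # after a determiner, prefer the noun reading
        elif prev == "NOUN" and "VERB" in options:
            choice = "VERB"   # after a noun, prefer the verb reading
        else:
            choice = options[0]
        tagged.append((tok, choice))
        prev = choice
    return tagged

print(tag(["The", "screen", "is", "blank"]))
print(tag(["Reviewers", "screen", "abstracts"]))
```

The same word, “screen,” comes out as a noun in the first sentence and a verb in the second, purely because of what precedes it.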

Step 4: Lemmatization, which is the process of finding the root, uninflected version of a word. For example, “was” and “were” both reduce to “be.”
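
A rough sketch of lemmatization in Python might look like the following. The irregular-word table and suffix rules are our own simplifications and will get plenty of real words wrong; production lemmatizers rely on full dictionaries and the Part of Speech found in Step 3.

```python
# Hypothetical lookup table for irregular forms.
IRREGULAR = {
    "was": "be", "were": "be", "is": "be", "are": "be",
    "ran": "run", "mice": "mouse",
}

def lemmatize(word):
    """Return a crude lemma: check irregular forms first, then strip common endings."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # therapies -> therapy
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]                # screening -> screen
    if w.endswith("ed") and len(w) > 4:
        return w[:-2]                # screened -> screen
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                # patients -> patient
    return w

lemmas = [lemmatize(w) for w in ["Patients", "were", "screened"]]
print(lemmas)
```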

Step 5: Identify and remove the “stop” words. These are very common words that carry little meaning on their own, such as the, a, to, and, etc.
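
Stop word removal is the simplest step to picture in code. In this Python sketch, the STOP_WORDS set is a short list we made up for illustration; real tools ship with much longer lists that can be tuned per project.

```python
# A small, hypothetical stop word list.
STOP_WORDS = {"the", "a", "an", "to", "and", "of", "in", "at"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["The", "patients", "responded", "to", "the", "treatment"]
filtered = remove_stop_words(tokens)
print(filtered)
```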

Step 6: Dependency parsing, which is the process of working out how the words in a sentence relate to one another. In NLP, this means creating something called a “Parse Tree” that connects the words and illustrates how they are related. Alternatively, if your goal is to extract general ideas from the text rather than details, some NLP systems use something called “Noun Phrasing,” which groups words into noun phrases as a simpler alternative to full parsing.
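
To show what a parse tree looks like as data, here is a small Python sketch. The parse itself is written by hand for one example sentence (a real parser would produce it automatically), and the Token class and render function are our own names. The relation labels follow the Universal Dependencies convention.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    head: int       # index of the word this one depends on; -1 marks the root
    relation: str   # how this word relates to its head

# Hand-written parse of "Reviewers screened the abstracts".
PARSE = [
    Token("Reviewers", 1, "nsubj"),   # subject of "screened"
    Token("screened", -1, "root"),    # main verb
    Token("the", 3, "det"),           # determiner of "abstracts"
    Token("abstracts", 1, "obj"),     # object of "screened"
]

def render(parse, head_index=-1, depth=0):
    """Return the tree as indented lines, starting from the root."""
    lines = []
    for i, tok in enumerate(parse):
        if tok.head == head_index:
            lines.append("  " * depth + f"{tok.text} ({tok.relation})")
            lines.extend(render(parse, i, depth + 1))
    return lines

print("\n".join(render(PARSE)))
```

The indentation in the printed tree shows which word each word hangs from, which is exactly the relationship the parse tree captures.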

Step 7: Named Entity Recognition finds and classifies the named things in the sentence. This step identifies things like people, places, organizations, dates, events, and more.
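
One simple way to picture this step is a lookup list plus a pattern for dates, sketched below in Python. The GAZETTEER entries, the example sentence, and the find_entities function are all invented for illustration; modern systems usually learn to spot entities from examples rather than relying on a fixed list.

```python
import re

# Hypothetical list of known names and their categories.
GAZETTEER = {"Jane Smith": "PERSON", "Ottawa": "PLACE"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")

def find_entities(text):
    """Return (entity text, label) pairs in the order they appear."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(entity, label) for _, entity, label in sorted(found)]

entities = find_entities("Jane Smith presented the protocol in Ottawa on 3 March 2021.")
print(entities)
```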

Step 8: Coreference Resolution is the last and most difficult step. In human language, we use pronouns to refer to things we have already named. He, she, it: these are all words that stand in for something mentioned earlier. NLP systems need to link each pronoun back to the thing it refers to in order to make more connections and develop context in text.
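
Here is a deliberately naive Python sketch of that linking. It assumes the earlier steps have already marked which words are named things and whether each is singular or plural, then matches each pronoun to the most recent mention that agrees in number. This heuristic and the resolve_pronouns name are our own; real systems weigh many more clues.

```python
PRONOUN_NUMBER = {"he": "singular", "she": "singular", "it": "singular", "they": "plural"}

def resolve_pronouns(tokens, mentions):
    """Link each pronoun to the nearest earlier mention with the same number.

    tokens:   list of words
    mentions: dict mapping a token index to "singular" or "plural"
    returns:  dict mapping pronoun index -> antecedent index
    """
    links = {}
    seen = []  # (index, number) for each mention encountered so far
    for i, tok in enumerate(tokens):
        if i in mentions:
            seen.append((i, mentions[i]))
            continue
        number = PRONOUN_NUMBER.get(tok.lower())
        if number is None:
            continue
        for index, mention_number in reversed(seen):
            if mention_number == number:
                links[i] = index
                break
    return links

tokens = ["The", "drug", "reduced", "symptoms", "because", "it", "blocks", "inflammation"]
mentions = {1: "singular", 3: "plural"}   # "drug" and "symptoms"
links = resolve_pronouns(tokens, mentions)
print({tokens[p]: tokens[a] for p, a in links.items()})
```

Notice that simply picking the closest noun would wrongly link “it” to “symptoms.” Even this tiny example needs an extra clue (singular versus plural) to get the answer right, which hints at why this step is so hard.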

This is a general idea of how NLP takes organic language and transforms it into data that a computer can understand and analyze.

How Does Natural Language Processing Work in Systematic Reviews?

A systematic review draws on a large collection of unstructured text. Systematic review software works by taking that text and turning it into data that can be more easily analyzed. NLP takes this a step further: it can be trained to extract data so that computers can use it to help automate SR tasks that were once quite time-intensive.

In DistillerSR, NLP finds context in full-text documents. It powers the DistillerAI Toolkit, which speeds up the systematic review process by:

Helping to train and test on real project data to ensure accuracy and compare against work done by human reviewers
Sorting and ranking references based on the likelihood of inclusion
Automatically including or excluding references based on training data
Double checking that references have not been excluded erroneously
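
To give a feel for what “ranking references based on the likelihood of inclusion” can mean, here is a generic Python sketch. It is not how DistillerAI works internally; it is a simple word-frequency scorer (a naive Bayes style log-odds with add-one smoothing) that we chose for illustration, and the training titles are made up.

```python
import math
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labelled):
    """labelled: list of (text, included) pairs. Returns word counts per decision."""
    counts = {True: Counter(), False: Counter()}
    for text, included in labelled:
        counts[included].update(words(text))
    return counts

def score(text, counts):
    """Sum of smoothed log-odds; higher means more like the included references."""
    vocab = set(counts[True]) | set(counts[False])
    total_in = sum(counts[True].values())
    total_out = sum(counts[False].values())
    total = 0.0
    for w in words(text):
        p_in = (counts[True][w] + 1) / (total_in + len(vocab))
        p_out = (counts[False][w] + 1) / (total_out + len(vocab))
        total += math.log(p_in / p_out)
    return total

def rank(references, counts):
    """Sort references so the most likely inclusions come first."""
    return sorted(references, key=lambda ref: score(ref, counts), reverse=True)

# Hypothetical decisions already made by human reviewers.
training = [
    ("randomized trial of statin therapy in adults", True),
    ("statin therapy and cardiovascular outcomes", True),
    ("mouse model of liver injury", False),
    ("case report of rare skin rash", False),
]
counts = train(training)
unscreened = ["liver injury in a mouse model", "randomized trial of statin outcomes"]
ranked = rank(unscreened, counts)
print(ranked)
```

The reference that shares vocabulary with previously included studies rises to the top, so reviewers see the most promising candidates first.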

What Are the Challenges of NLP?

Although natural language processing has advanced to an amazing degree, there are still some challenges. Most notable is the fact that the English language is, well, complicated. The language is nuanced, ambiguous, and downright messy sometimes. Different words have double or even triple meanings depending on the context. Grammar can change depending on the unique style of the writer and their background, age, or education level, as well as their use of jargon and slang. And colloquial expressions are a whole other kettle of fish (see what we did there?).

During the parse tree phase of the pipeline, there could be hundreds or even thousands of possible combinations that could confuse an NLP system. But as the technology advances, programmers are building NLP systems that compensate for the confusion in the English language and can make accurate decisions based on other data found in the sentence. It’s also helpful that this type of machine learning continues to learn from past data, making it more accurate the more you use it.
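
To get a feel for how quickly those combinations pile up, here is a rough Python sketch. It counts only the ways a sentence can be grouped into a binary tree, which is a simplifying assumption on our part; real parsers also choose labels and word senses, so treat the numbers as an illustration of growth rather than a description of any particular NLP system. The function name bracketings is our own.

```python
from math import comb

def bracketings(n_words):
    """Number of distinct binary tree shapes over n_words words in a fixed order.

    This is the Catalan number C(n-1) = comb(2(n-1), n-1) / n.
    """
    if n_words < 1:
        raise ValueError("need at least one word")
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)

for n in (5, 8, 10):
    print(n, "words:", bracketings(n), "possible groupings")
```

A five-word sentence has 14 possible groupings, an eight-word sentence has 429, and a ten-word sentence already has 4,862.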

There’s no doubt about it; natural language processing is the future. Tomorrow’s systematic review is going to be faster and more efficient, with the same level of compliance and accuracy as human reviewers, thanks to incredible technology like this.