How To Train Your Robot: Best Practices For Automated Screening

by Peter O'Blenis | Feb 23, 2018

DistillerAI made its global debut on Feb 11, 2018 in the latest release of the DistillerSR systematic review software platform. Since then, researchers around the world have been testing it and applying it to their literature reviews. In the process, they’ve discovered a few best practices for training DistillerAI to assist with the screening process. Here are their top tips on optimizing your training set.

First, a primer on how DistillerAI learns. When you’re screening, you’re training DistillerAI without even knowing it. As you include and exclude references during your normal screening process, the software is monitoring your decisions and learning from them. This way, training happens organically and doesn’t add overhead to what you’re already doing. Once it has enough data to make confident decisions, DistillerAI lets you know that it’s ready.

One important note on the training set you use. You want to train DistillerAI using the most accurate data possible. If you train it on data with screening errors, it will learn to replicate those errors. So, here are a few tips to keep in mind:

Clean your data before using it to train

Before you hand your screening data over to DistillerAI for training, make sure that you have reviewed it for conflicts between your human screeners and resolved any inclusion/exclusion discrepancies. If we humans can’t agree on what’s right, it’s not really fair to ask your robot to arbitrate while it’s still learning!
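If you keep an export of your screening decisions, a quick script can flag the references your screeners still disagree on. The sketch below is only an illustration: the `(reference_id, reviewer, decision)` tuple layout and the function name `split_by_agreement` are assumptions made for this example, not part of DistillerSR.

```python
from collections import defaultdict


def split_by_agreement(decisions):
    """Separate references with unanimous decisions from those in conflict.

    `decisions` is an iterable of (reference_id, reviewer, decision) tuples,
    where decision is "include" or "exclude". This layout is hypothetical.
    """
    votes = defaultdict(set)
    for ref_id, _reviewer, decision in decisions:
        votes[ref_id].add(decision)

    # Unanimous references are safe to use as training data.
    agreed = {ref_id: next(iter(v)) for ref_id, v in votes.items() if len(v) == 1}
    # Anything with more than one distinct decision needs a human to resolve it.
    conflicts = sorted(ref_id for ref_id, v in votes.items() if len(v) > 1)
    return agreed, conflicts
```

Anything that lands in `conflicts` should be resolved by your team before it goes into the training set.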

Minimum training set size

We have found that ten or more included references and 40 or more excluded references should be enough to get DistillerAI trained to recognize inclusion and exclusion characteristics. While you can run the AI Toolkit any time to test DistillerAI, the AI Reviewer tab on your dashboard won’t pop up until you have 100 screened references. Please note that a reference must have been screened by the required number of reviewers to be counted as part of the training set.
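As a rough illustration, the thresholds above can be written as a simple check. The function and constant names are made up for this example, and it assumes the counts you pass in cover only references that have already been screened by the required number of reviewers.

```python
# Thresholds taken from the guidance above; the names are hypothetical.
MIN_INCLUDED = 10
MIN_EXCLUDED = 40
MIN_SCREENED_FOR_TAB = 100


def training_readiness(included, excluded):
    """Report which thresholds a set of fully screened references meets."""
    return {
        # Enough examples of each kind to learn inclusion/exclusion traits.
        "enough_to_train": included >= MIN_INCLUDED and excluded >= MIN_EXCLUDED,
        # Assumes every fully screened reference is either included or excluded.
        "ai_reviewer_tab": included + excluded >= MIN_SCREENED_FOR_TAB,
    }
```

For example, 12 included and 45 excluded references would be enough to train on, but the total of 57 is still short of the 100 needed for the AI Reviewer tab to appear.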

Maximum training set size

We’ve observed that learning diminishes with training sets in excess of 300 references. With that in mind, adjust your training set percentage (the percentage of human-screened references to train on) so that the set has no more than 300 references.

Please note that if you use a training set larger than 1,000 references, the training process may fail (the progress bar will just disappear... I know, we’re working on that). If that happens, just reduce your training set size and try again.
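The arithmetic for picking a training set percentage is simple enough to sketch. This example assumes the percentage is entered as a whole number and rounds down so the resulting set never exceeds the cap; the function name is hypothetical.

```python
def max_training_percentage(screened_count, cap=300):
    """Largest whole-number percentage that keeps the training set at or under `cap`."""
    if screened_count <= 0:
        raise ValueError("screened_count must be positive")
    if screened_count <= cap:
        return 100
    # Integer division rounds down, so the set stays within the cap.
    return (100 * cap) // screened_count
```

With 1,200 human-screened references this gives 25 percent, which is exactly 300 references. With 700 it gives 42 percent, or 294 references.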

Use multiple screening levels to train

Humans almost always include more references than we should at the first level or stage of a systematic review. This is a best practice to avoid false exclusions. However, if you train DistillerAI on Level 1 screening results, you will be training your robot on falsely included references. It will learn to make the same mistakes as your human screeners.

To avoid this issue, start your training set at Level 1 and finish at the highest level at which you have screened references. For example, if you include Levels 1 to 3, DistillerAI will look at all references excluded at, or before, Level 3 and only look at references included at Level 3. This should dramatically reduce the number of false inclusions in your training set and make your robot smarter.
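To make the rule concrete, here is a minimal sketch of how a training label could be derived from a reference's screening history. It is an interpretation of the description above, not DistillerAI's actual implementation, and the `history` dictionary (level number mapped to "include" or "exclude") is an assumed data layout.

```python
def training_label(history, top_level):
    """Derive a training label from per-level screening decisions.

    `history` maps a level number to "include" or "exclude" for one reference.
    Returns "exclude", "include", or None when the reference should be left out.
    """
    # Excluded at or before the top level: a confirmed exclusion.
    for level in range(1, top_level + 1):
        if history.get(level) == "exclude":
            return "exclude"
    # Only an inclusion at the top level counts as a confirmed inclusion.
    if history.get(top_level) == "include":
        return "include"
    # Included at a lower level but not yet screened at the top level.
    return None
```

A reference that was included at Level 1 but has not yet reached Level 3 gets no label, so it cannot teach your robot a false inclusion.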

The time it takes to train an artificial intelligence tool has often been cited as one of the roadblocks to broader adoption of these technologies. DistillerAI has been designed to learn from you as you work, dramatically reducing the time required to get it up to speed. Keep these training tips in mind and your robot assistant will benefit from the education!

Peter O'Blenis

Peter O’Blenis is the CEO of DistillerSR and has assembled a collection of best practices and methodologies for using web-based software to streamline clinical research. He believes that well-written software can solve real-world problems and has presented globally on the topic.