Building the Distilled COVID-19 Reference Sets

by Vivian MacAdden | Mar 31, 2021

We are currently living through an unprecedented global pandemic. All of our lives are impacted by COVID-19, whether we know someone personally affected by the virus or not. “Physical distancing” is quickly becoming the new normal in many communities.

The critical importance of public health is taking center stage, and as a supporter of the scientific community, the team at Evidence Partners is doing everything we can to help.

To support researchers working on COVID-19, we have launched two initiatives that we hope will accelerate your important work:

  1. Researchers working on COVID-19 can now access a new version of DistillerSR for free. This soon-to-be-released version contains advanced AI features that accelerate the screening process for faster results.
  2. We applied the pre-release AI features from DistillerSR to the open-source CORD-19 dataset and other sources to create a tagged reference set of COVID-19 relevant articles. We are offering this as a free download for researchers who need access to COVID-19 literature quickly. The reference set is updated each week as new material becomes available.

Preparing to Screen

Because our reference sets are updated every week, we start the process by downloading the most recent edition of CORD-19. Once the references are imported, we run the DistillerSR deduplication tool to remove duplicates from the data. For example, we deduped against the ‘Cord_uid’ field, which, on our April 17th update, removed 49,925 references that duplicated ones uploaded in prior weeks. We also deduped against the ‘Pubmed_id’ and ‘DOI’ fields; on April 17th, that dedupe removed 975 additional references. The rest of the list was cleaned using the “Extreme Precision” setting in DistillerSR.
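For readers who want to reproduce the identifier-based step outside DistillerSR, here is a minimal Python sketch. It is not the DistillerSR deduplication tool: the function name dedupe_by_fields, the lowercase field names, and the sample records are all assumptions made for illustration, and it checks the three identifiers in a single pass rather than in separate runs.

```python
def dedupe_by_fields(references, fields=("cord_uid", "pubmed_id", "doi")):
    """Keep the first reference seen for each non-empty identifier value.

    Hypothetical helper: field names are assumed, not taken from DistillerSR.
    """
    seen = {field: set() for field in fields}
    kept = []
    for ref in references:
        # Normalise identifiers so "10.1000/X" and "10.1000/x " compare equal.
        ids = {f: str(ref.get(f) or "").strip().lower() for f in fields}
        if any(ids[f] and ids[f] in seen[f] for f in fields):
            continue  # duplicate on at least one identifier, so drop it
        for f in fields:
            if ids[f]:
                seen[f].add(ids[f])
        kept.append(ref)
    return kept


# Made-up sample records, for illustration only.
refs = [
    {"cord_uid": "a1", "pubmed_id": "111", "doi": "10.1000/x"},
    {"cord_uid": "a1", "pubmed_id": "", "doi": ""},           # same cord_uid
    {"cord_uid": "b2", "pubmed_id": "111", "doi": ""},        # same pubmed_id
    {"cord_uid": "c3", "pubmed_id": "", "doi": "10.1000/X"},  # same DOI, other case
    {"cord_uid": "d4", "pubmed_id": "222", "doi": "10.1000/y"},
]
unique = dedupe_by_fields(refs)
print([r["cord_uid"] for r in unique])
```

Empty identifiers are ignored on purpose, so two references that both lack a DOI are not treated as duplicates of each other.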

Human + AI Screening

We used a DistillerSR AI classifier to identify references and systematic reviews that relate specifically to COVID-19 (precision: 64.5%, recall: 99%) and tagged those that met the inclusion and exclusion thresholds.
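To make the two figures concrete: precision is the share of references the classifier flagged that were truly relevant, and recall is the share of truly relevant references that the classifier flagged. The sketch below shows the arithmetic in Python. The counts are invented so that the result matches the reported percentages; they are not the actual counts from our reference set.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw screening counts."""
    precision = true_pos / (true_pos + false_pos)  # flagged and relevant / all flagged
    recall = true_pos / (true_pos + false_neg)     # flagged and relevant / all relevant
    return precision, recall


# Illustrative counts only (assumed, not from the real data).
p, r = precision_recall(true_pos=990, false_pos=545, false_neg=10)
print(f"precision={p:.1%} recall={r:.1%}")
```

A classifier tuned this way misses very few relevant references, at the cost of flagging some irrelevant ones that a human reviewer then removes.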

Next, a human reviewed the references labeled by the COVID-19 classifier. A conflict check was used to compare the human responses to those of the AI, and corrections were made to each response set as required. Humans then screened approximately 200 references beyond what was screened by the AI to look for references that may have been missed.
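In spirit, a conflict check is a comparison of two sets of answers for the same references. Here is a minimal Python sketch of that idea, assuming each screener's responses are stored as a dictionary from reference ID to decision. The function name find_conflicts, the IDs, and the labels are hypothetical and do not reflect how DistillerSR stores responses.

```python
def find_conflicts(human_labels, ai_labels):
    """Return IDs of references both screeners answered but disagreed on."""
    shared = human_labels.keys() & ai_labels.keys()
    return sorted(
        ref_id for ref_id in shared
        if human_labels[ref_id] != ai_labels[ref_id]
    )


# Made-up responses, for illustration only.
human = {"ref-1": "include", "ref-2": "exclude", "ref-3": "include"}
ai = {"ref-1": "include", "ref-2": "include", "ref-3": "exclude", "ref-4": "exclude"}
conflicts = find_conflicts(human, ai)
print(conflicts)
```

References answered by only one screener (ref-4 here) are not reported as conflicts; they simply still need a second answer.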

This process leveraged AI-powered Continuous Reprioritization, a feature that uses AI to prioritize references based on the likelihood of inclusion. As you continue to screen, the AI continuously re-ranks references to bubble the most promising ones to the top.
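The general idea of re-ranking as you screen can be sketched in a few lines of Python. This is a toy stand-in, not the Continuous Reprioritization algorithm in DistillerSR: the word-count scoring model, the function names, and the sample titles are all assumptions chosen to keep the example small.

```python
from collections import Counter


def tokens(text):
    return text.lower().split()


def score(text, weights):
    """Sum of learned token weights; higher means more likely to be included."""
    return sum(weights[t] for t in tokens(text))


def screen_with_reprioritization(references, human_decision):
    """Screen every reference, re-ranking the unscreened pool after each decision."""
    weights = Counter()      # token -> weight; unseen tokens count as zero
    pool = dict(references)  # ref_id -> title, still unscreened
    order = []
    while pool:
        # Re-rank: take the highest-scoring unscreened reference next.
        ref_id = max(pool, key=lambda r: score(pool[r], weights))
        title = pool.pop(ref_id)
        included = human_decision(ref_id)
        # Toy model update: reward words from included titles, penalise excluded ones.
        for t in set(tokens(title)):
            weights[t] += 1 if included else -1
        order.append(ref_id)
    return order


# Made-up titles and decisions, for illustration only.
refs = {
    "r1": "covid-19 transmission in hospitals",
    "r2": "bovine coronavirus vaccine trial",
    "r3": "covid-19 vaccine trial results",
    "r4": "bovine respiratory disease survey",
}
decisions = {"r1": True, "r2": False, "r3": True, "r4": False}
order = screen_with_reprioritization(refs, decisions.get)
print(order)
```

In this toy run, r3 moves ahead of r2 as soon as r1 is included, because it shares a word with an included title. That is the "bubbling up" effect in miniature.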

Weekly Updates

We are updating the reference set on a weekly basis as new literature about COVID-19 becomes available. As we do this, our COVID-19 AI classifier is continuously retrained with the new data, increasing its accuracy. All new references are dual screened using the AI as one of the screeners.

We were able to create this reference set and maintain it with one person working less than three hours on it per week thanks to the AI tools and other features in DistillerSR.

All of this is in the hope that DistillerSR and our efforts will assist researchers who are currently working tirelessly to combat COVID-19. In an unprecedented time, we feel that it’s incumbent upon us to do our best to be part of the solution.

  • Vivian MacAdden

    Vivian MacAdden is DistillerSR's Senior Manager, Industry Marketing - Medical Devices. Throughout her career, she has accumulated 20 years of strategic marketing experience in various industries in Canada and international markets such as Brazil, China, Singapore, Jordan and Japan. A problem solver at heart and forever an optimist (and karaoke lover), she is passionate about telling great stories that make a positive impact on the world.
