How is the virus spreading? What makes transmission more likely? Why are some people getting more sick than others? What treatment methods are working? How close are we to a vaccine?
Even as scientific studies are published every day and new research projects get started around the world, we all have more questions than answers about the novel coronavirus and COVID-19.
And the questions are mounting.
Historically, these types of questions are answered over time through extensive research that gets published, reviewed and used to help prevent and treat future iterations of an infectious disease.
With COVID-19, we aren’t afforded the luxury of time. While there is plenty of research and data being shared about the disease, the time required to read, compare and understand all the research while still fighting the disease is a growing issue.
To help address this challenge, more than 50,000 full-text documents on COVID-19 and other coronaviruses have been gathered and released to the public by the Allen Institute for AI, Semantic Scholar and other research groups. There’s a call to analyze these documents, using advanced analytical methods, in the hopes of connecting researchers with answers.
The problem becomes: What analytical strategy can we employ to connect the right research with the right people to answer our most pressing questions?
Since the call went out, data scientists and health experts around the world have stepped forward to mine the literature for answers using text analytics and artificial intelligence. Thoroughly reviewing just a handful of these papers to determine relevancy can take many hours. Applying AI and text analytics can reduce this to seconds or minutes and connect many researchers with relevant papers – and potentially lifesaving answers – much sooner.
At SAS, we’ve assembled a global team of clinical and epidemiological experts alongside technical experts to contribute to this effort. They’re analyzing the raw publications and clinical data in order to develop predictive models that can surface results in visual format for further exploration.
I asked a few of them to tell me how the process is going so far and to share what they’re learning. This interview includes responses from Sherrine Eid, MPH, Global Health Care Principal; Sarah Hiser; and Scott McClain.
Why are these documents so important – and what challenges do they present to researchers?
Sherrine Eid: Before starting any level of scientific research, especially with a novel virus, the first thing you need is good literature and good assumptions. This is especially true during an outbreak.
How do you sift through the entirety of all this information – new publications, blog posts, PDFs, pre-publication materials, 100-year-old epidemiological models – and find the information you need and trust?
It’s so important for us to know not only the content and work that has already appeared in these publications – because that’s our compass – but also the quality of the material. When trying to understand a novel virus, the models are all over the place because of the assumptions they have to make. Text mining these documents can help improve those assumptions and our collective research models.
Sarah Hiser: Technology can scale communications between scientists and researchers. Coming from an academic background and knowing a lot of researchers, if there is any way that I can scale a researcher’s ability to get their work out there faster, collaborate more efficiently and inform the scientific community sooner, that’s something I’m incredibly passionate about making happen.
What answers are you especially interested in finding in the data right now?
Eid: If I’m a state epidemiologist, I need intelligent ways to help me sift through the journal articles and papers that are out there. If I need to know how to plan for surge capacity in my state, I need an intelligent search engine with capabilities tailored to me.
As a public health specialist, I can explore these documents to find answers that will help navigate critical points in my city or state. These documents also help me understand how mitigation strategies might affect my region or how a certain subset of the population might be affected. I would be taking a deep dive into at-risk populations in my region to gain insights on those populations and make sure they don’t fall through the cracks.
Hiser: I’m looking at viral mutations and doing a scientific review of the genetic mutation literature. Based on the research, can we figure out how the virus is mutating and how that influences different characteristics of the virus itself?
Are different infection rates based on different characteristics or different strains of the virus? Do different strains have different incubation periods or different asymptomatic infection rates? If there is a more aggressive strain and you know that strain is in your area, can you use that information to inform mitigation efforts and other decisions?
Scott McClain: We’re also exploring where the outbreaks are located and combining geospatial analysis with genetics analysis to monitor the evolution and geolocation of different strains of the virus.
When you can pull out geolocation, you can help people understand how close the outbreak is and help public health leaders plan for resources and support they will need in the future. Since we know epidemics have second and third waves, we can use this data to predict if certain strains of the virus will be associated with certain outcomes and start to ask whether different areas will experience their second waves differently.
What’s exciting about the approach you’re taking with SAS® to analyze these documents?
Eid: Intelligent search capabilities are driven by common questions that different personas will have. A state epidemiologist is going to come to these documents with different questions than a clinician. The persona-driven search engine surfaces the content most relevant to that person and then connects them with related resources. After that, the network analysis provides further search paths, and the deep-dive analysis can focus text analytics on a very specific question.
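The persona-driven search Eid describes is part of SAS's platform, but the core idea – scoring documents against a persona's question – can be sketched with plain TF-IDF ranking. Everything below (the toy corpus, the `tf_idf_scores` helper, the surge-capacity query) is a hypothetical illustration, not the actual SAS implementation.

```python
import math
from collections import Counter

def tf_idf_scores(query, corpus):
    """Rank documents by the summed TF-IDF weight of the query terms.

    query: list of lowercase terms; corpus: list of document strings.
    Returns document indices, highest-scoring first.
    """
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    scores = []
    for terms in docs:
        counts = Counter(terms)
        score = 0.0
        for term in query:
            tf = counts[term] / len(terms)            # term frequency
            df = sum(1 for d in docs if term in d)    # document frequency
            if df:
                score += tf * math.log(n / df)        # weight rare terms higher
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

# A state epidemiologist's question about surge capacity (invented corpus)
corpus = [
    "surge capacity planning for hospital beds and ventilators",
    "genetic mutation rates across coronavirus strains",
    "hospital surge capacity models during influenza outbreaks",
]
ranking = tf_idf_scores(["surge", "capacity"], corpus)
```

In this toy run, the two surge-capacity papers outrank the mutation paper, which is exactly the persona-relevant filtering Eid describes, just at a vastly smaller scale.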
McClain: The part that’s really powerful for a citizen scientist is that the analytics are already built into the base layer of SAS technology. The ability to use things like concepts, text parsing, sentiment and topics helps create faster, deeper insights into scientific text. When I log in to start analyzing the documents, the system automatically provides groupings of key terms that appear frequently and are related within an article, suggesting context. The technology does this without any initial direct intervention.
Simply put, using SAS, I’m given a starting place before I have certain terms in mind. It can scour the corpus of information so much faster than scanning individual papers and can point you in a relevant direction that is based on many pieces rather than single, disjointed pieces of data.
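The automatic grouping of related key terms that McClain describes resembles term co-occurrence analysis. As a rough, hypothetical sketch (not SAS's actual algorithm), counting which term pairs share a document surfaces candidate groupings:

```python
from collections import Counter
from itertools import combinations

def cooccurring_terms(docs, stopwords, top_n=3):
    """Count how often pairs of terms appear in the same document;
    frequent pairs hint at related concepts worth grouping together."""
    pairs = Counter()
    for doc in docs:
        # unique, sorted terms so each pair is counted once per document
        terms = sorted({t for t in doc.lower().split() if t not in stopwords})
        pairs.update(combinations(terms, 2))
    return pairs.most_common(top_n)

# Invented example abstracts, not real CORD-19 text
docs = [
    "incubation period varies by viral strain",
    "asymptomatic infection and incubation period estimates",
    "strain mutation tracked by incubation period studies",
]
stop = {"by", "and", "the", "of"}
print(cooccurring_terms(docs, stop))
```

Here "incubation" and "period" co-occur in every document, so they would be grouped as related terms – a starting place before the analyst has any terms in mind, as McClain puts it.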
Hiser: It’s impossible for an individual to read through 35,000 documents and understand the full story. But an individual can train an algorithm to find what they are looking for and help scale the work they are doing. It’s like having 500 interns working for me – with this technology, the “AI interns” read the articles and come back to me with information to refine my search. Those insights help me decide what to look for next, so I tell the model to do it a little differently and get even more answers.
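The iterative loop Hiser describes – read results, refine the search, repeat – is similar in spirit to relevance feedback from information retrieval. A minimal, hypothetical sketch (the `expand_query` helper and sample documents are invented for illustration): grow the query with the terms that dominate the documents a researcher marks as relevant, then search again.

```python
from collections import Counter

def expand_query(query, relevant_docs, extra_terms=2):
    """Add the most frequent new terms from relevant documents to the
    query -- a crude form of relevance feedback."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(t for t in doc.lower().split() if t not in query)
    return list(query) + [t for t, _ in counts.most_common(extra_terms)]

query = ["mutation"]
relevant = [
    "mutation rates differ across viral strains",
    "viral strains show distinct mutation signatures",
]
new_query = expand_query(query, relevant)
```

After one round, the query about "mutation" picks up "viral" and "strains" from the documents the researcher found useful, so the next search casts a better-informed net.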
How will this research matter in the bigger picture of fighting the disease?
Eid: If you find one article that looks relevant for your area, it might not be enough to justify a policy or mitigation effort. But if you can associate it with three other research results, it can help you make decisions based on intelligence from others. It can inform when to manage patients with a different protocol, whom to test sooner or whom to put on oxygen sooner. This research can help me rely on my fellow scientists and build on their science. Every new fact we learn about the virus and every question we answer has the potential to save lives. If research into these documents produces new evidence that encourages folks to stay home and distance socially – knowing we don’t have a vaccine – it will lower the intensity and impact on our health care system.
Jeremy draws on more than 20 years of experience in data science to evangelize the benefits of advanced analytics – including AI – in health care. He helps lead SAS health care initiatives for data, AI and analytics, ensuring solutions align with health care market needs. He's passionate about applying analytics to health care modernization and executing new strategies within complex global health systems. Jeremy's work focuses on the essential interdependencies between health care policy, programs, providers, payers and patients.