In any time of trouble, the archetypal hero usually takes a specific form: soldiers in WWII, firefighters on 9/11, and now health care professionals during COVID-19. But data scientists are also playing an indispensable role in fighting the global pandemic. While medical professionals are on the front lines caring for the sick, data scientists have taken on the responsibility of helping keep everyone else healthy by disseminating crucial information to the world.
Case in point is the mantra “flatten the curve.” Originating from the Centers for Disease Control and Prevention (CDC), the idea is that we have to slow the rate of new infections to keep health care systems from collapsing. Those three words could potentially save millions of lives. But they mean nothing without the simple graph that shows how slowing infections reduces strain on the system, and that graph doesn’t exist without data.
The graph tells a story of exponential growth that can be difficult for the average person to grasp immediately. “COVID-19 is…a tricky thing to reason about. Very tricky thing to reason about. Intuition breaks down,” said Jeremy Howard in an interview with VentureBeat. Howard is the co-founder of Fast.ai, which offers free courses on deep learning, and is on the faculty at the University of San Francisco. He pointed out that there’s a long gap between an outbreak occurring and seeing its effects. An illness takes a while to show itself in a person, and even longer to become visible at scale.
“This is a perfect storm of what the human brain is bad at,” said Howard. “We respond to what we can see. And we respond to stories. A pandemic doesn’t give you those things.”
He continued, “But what we do have is data. Data scientists are people who know how to look at data and find out what story it’s telling us.” He joked that data scientists aren’t very good at telling that story, except through visualizations like “flatten the curve.”
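“Flatten the curve” graphics are typically drawn from simple epidemic models. As an illustration only, here is a minimal sketch of the textbook SIR (susceptible, infected, recovered) model. The helper name `sir_peak`, the transmission rate `beta`, the recovery rate `gamma`, and the population size are all hypothetical values chosen for illustration; they are not COVID-19 estimates and not the model behind the CDC graphic.

```python
def sir_peak(beta, gamma=0.1, population=1_000_000, initial_infected=100,
             days=365, dt=0.1):
    """Return (peak number infected, day of peak) for a basic SIR model.

    Hypothetical, illustrative parameters; uses simple Euler integration.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # currently infected
    r = 0.0                                   # recovered (removed)
    peak, peak_day = i, 0.0
    for step in range(1, round(days / dt) + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        # Track the largest number of people infected at the same time.
        if i > peak:
            peak, peak_day = i, step * dt
    return peak, peak_day


# Same hypothetical disease, two transmission rates: no distancing vs. distancing.
for beta in (0.3, 0.15):
    peak, day = sir_peak(beta)
    print(f"beta={beta}: peak of about {peak:,.0f} simultaneous infections on day {day:.0f}")
```

Halving `beta` (which is roughly what distancing aims to do) produces a lower, later peak of simultaneous infections. That lower, later peak is the shape the graphic depicts, and it is what keeps hospitals below capacity.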
Most, if not all, of this data and the visualizations built on it are free. It’s a sort of digital flotilla coming to our collective rescue. There are many current examples of how free, clearly explained, and visualized data is helping us combat COVID-19.
This week, a collaboration among businesses and organizations such as Microsoft and the Allen Institute for AI, together with the White House Office of Science and Technology Policy (OSTP), yielded a trove of data related to COVID-19. The COVID-19 Open Research Dataset (CORD-19) is a machine-readable repository of some 29,000 scholarly articles about coronaviruses. Then the data scientists mobilized: Kaggle, a Google company that bills itself as the world’s largest community of data scientists, launched a forecasting challenge to uncover factors that impact coronavirus transmission rates.
As Kaggle CEO Anthony Goldbloom (@antgoldbloom) put it on Twitter on March 19, 2020: “The primary goal is not to forecast accurately. But to find factors that impact transmission rate.”
One of the best resources for non-experts who are trying to keep track of the spread of COVID-19 is an interactive dashboard maintained by Johns Hopkins University. It’s tracking infection rates, recovery rates, and death rates by geographic location.
The COVID Tracking Project pulls information from all 50 states, “to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data,” per its website. You can check info on your state and follow a link to the best data source for your state. The COVID Tracking Project is a volunteer effort, built by Jeff Hammerbacher of Related Sciences along with two journalists from The Atlantic, Robinson Meyer and Alexis Madrigal. According to the site, a small army of volunteers from different fields is maintaining and updating the data.
On a more individual level, data scientists like Howard and his Fast.ai co-founder Rachel Thomas spent a frenetic weekend earlier this month putting together an article, from a data science perspective, on how to protect yourself and your community from COVID-19. It links to additional resources, and it’s been translated into 17 languages so far. The effort was personal for Howard and Thomas, both of whom have pre-existing medical conditions that make them more vulnerable to COVID-19.
Both Howard and Thomas have since been active on platforms like Twitter, sharing information, creating it, and at times debunking misinformation. They’ve already released the first part of their most recent deep learning course, which includes information about COVID-19 along with some analysis.
All of the resources above are free and available to everyone, and they are just the tip of the iceberg. Many organizations, companies, and individuals are doing what they can to get the data about COVID-19, and the story it tells, to as many people as possible.
Part of that story is sometimes incomplete, and we need data scientists to understand and explain that, too. As Howard pointed out in an interview, data scientists know how to work with censored data: data where labels are missing or values are unknown. For example, a statistic may say that a certain percentage of those infected with COVID-19 will die, but that figure may reflect only a small number of cases, because most people with the virus have neither died nor yet recovered. Likewise, gauging infection rates requires understanding that the count may be heavily affected by a shortage of testing kits: no kits means no tests, and no tests means no recorded infections. In the U.S., testing kit availability has been a serious problem.
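To make the censoring problem concrete, here is a minimal sketch using made-up numbers (not real COVID-19 figures). The helper `fatality_summary` and its inputs are hypothetical. Real epidemiological estimates typically use more careful methods, such as adjusting for the delay between a case being confirmed and its outcome.

```python
def fatality_summary(confirmed, deaths, recovered):
    """Compare fatality estimates when many cases have no outcome yet.

    Hypothetical helper for illustration; inputs are counts of confirmed cases.
    """
    # Cases still open: neither died nor recovered. These are the censored ones.
    unresolved = confirmed - deaths - recovered
    return {
        # Treats every unresolved case as a survivor (a lower bound).
        "naive": deaths / confirmed,
        # Ignores unresolved cases entirely and uses only finished outcomes.
        "resolved_only": deaths / (deaths + recovered),
        # Assumes every unresolved case eventually dies (an upper bound).
        "upper_bound": (deaths + unresolved) / confirmed,
        "unresolved_share": unresolved / confirmed,
    }


# Hypothetical early-outbreak snapshot.
summary = fatality_summary(confirmed=10_000, deaths=200, recovered=800)
for name, value in summary.items():
    print(f"{name:>16}: {value:.0%}")
```

With 90% of these hypothetical cases unresolved, the two point estimates differ tenfold (2% vs. 20%), and the data alone only pin the eventual ratio among these confirmed cases somewhere between 2% and 92%. None of these numbers accounts for infections that were never tested, which is the testing-kit problem described above.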
People want to help in times of crisis, both from the kindness of their hearts and out of a need to feel control over difficult situations. It’s a particularly wonderful human trait. But there’s so little most of us can actively do in the face of this pandemic. Because we want to bring our skills, abilities, and resources to the fight, it’s counterintuitive and uncomfortable that the best thing the vast majority of us can do is nothing — to literally just stay home. To flatten the curve.
But data scientists happen to possess a skill set that is crucial during a global pandemic. Their work helps us all understand what’s happening, and it helps experts like epidemiologists build knowledge, track progress, and provide guidance that ultimately keeps us all safer and healthier.