Google today released a corpus of web search trends intended to help researchers study the link between queries and COVID-19 spread. The COVID-19 Search Trends symptoms corpus, which includes Google search trends for over 400 symptoms, signs, and health conditions like “cough,” “fever,” and “difficulty breathing,” drills down to the U.S. county level over the past three years.
Search is often where people seek answers on health and wellbeing. Indeed, Microsoft researchers recently used Bing search data to characterize the changes in people’s needs during the pandemic. In the past, scientists have tapped Google search data to gauge the health impact of heatwaves, improve prediction models for influenza-like illnesses, and monitor Lyme disease incidence. Using geotagged searches, a Harvard team even managed to identify restaurants with less-than-stellar food safety records.
There’s evidence that search trends reveal a lot about COVID-19 infections. In a recent study published in the Journal of Medical Internet Research, scientists at Cedars-Sinai Medical Center, Indiana University, and Kentuckiana ENT found a correlation between searches for symptoms of the disease and new confirmed cases and deaths. Moreover, they managed to tie increased symptom searches to super-spreading events, including the February Champions League soccer match in Italy.
“Researchers could use the COVID-19 Search Trends symptoms data set to study if search trends can provide an earlier and more accurate indication of the reemergence of the virus in different parts of the country,” Google Health senior staff research scientist Evgeniy Gabrilovich wrote in a blog post. “And since measures such as shelter-in-place have reduced the accessibility of care and affected people’s wellbeing more generally, this data set — which covers a broad range of symptoms and conditions, from diabetes to stress — could also be useful in studying the secondary health effects of the pandemic.”
Google says the COVID-19 Search Trends symptoms corpus is powered by the same anonymization technology used in “other Google products every day.” No personal information or individual search queries are included, the company says, thanks in part to a technique that adds noise to the data to provide privacy guarantees while preserving quality.
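To give a rough sense of how adding noise can protect privacy, here is a minimal Python sketch. Google's announcement as quoted here does not name the mechanism it uses, so everything below is an assumption for illustration: the Laplace distribution is simply one common choice for this kind of noise, and the function names, the `epsilon` parameter, and the sample counts are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise_to_counts(counts, epsilon, rng=None):
    """Return a copy of `counts` with Laplace noise added to each value.

    ASSUMPTION: this illustrates a generic noise-adding step, not
    Google's actual pipeline. A smaller `epsilon` means more noise
    (stronger privacy); a larger one means less noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # assumes each person changes a count by at most 1
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}


if __name__ == "__main__":
    # Hypothetical weekly search counts for one county.
    raw = {"cough": 120, "fever": 95, "difficulty breathing": 14}
    print(add_noise_to_counts(raw, epsilon=0.5, rng=random.Random(0)))
```

The trade-off is the one the company describes: noise large enough to mask any single person's searches, yet small enough relative to aggregate counts that overall trends survive.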
Similar to Google Trends, data within COVID-19 Search Trends is normalized based on symptoms’ relative popularities, allowing researchers to study spikes in search interest over periods of time. This initial release is limited to the U.S. and covers searches made in English and Spanish in states and counties “where the available data meets quality and privacy thresholds,” but Google says it will “evaluate and expand” the data set as it receives feedback from public health researchers and civil society groups.
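As an illustration of what relative-popularity normalization might look like, the Python sketch below converts raw counts into each period's share of all searches and then rescales so the busiest period scores 100. This is an assumption modeled loosely on how Google Trends presents its figures, not the documented procedure for this data set; the function name, the 0-to-100 scale, and the sample numbers are hypothetical.

```python
def normalize_trend(symptom_counts, total_counts, scale_max=100.0):
    """Scale a symptom's search share so its peak period equals `scale_max`.

    symptom_counts: searches for one symptom in each time period.
    total_counts:   all searches in the same region for each period.

    ASSUMPTION: a simplified, Trends-style normalization for illustration.
    """
    if len(symptom_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # Share of all searches, so busier regions or weeks don't dominate.
    shares = [s / t if t > 0 else 0.0 for s, t in zip(symptom_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0.0 for _ in shares]
    return [scale_max * share / peak for share in shares]


if __name__ == "__main__":
    # Hypothetical weekly counts: "fever" searches vs. all searches.
    fever = [10, 20, 40]
    all_searches = [1000, 1000, 2000]
    print(normalize_trend(fever, all_searches))
```

In the sample data, the third week has twice as many fever searches as the second but also twice the overall search volume, so both weeks end up with the same normalized score. That is why normalized values are useful for spotting spikes in interest but cannot be read as absolute search volumes.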
The release of the COVID-19 Search Trends symptoms data set is part of Google Cloud’s ongoing COVID-19 Public Datasets program, which kicked off earlier this summer. (In a complementary effort, Google partnered with Harvard to release COVID-19 prediction models.) Corpora within the COVID-19 Public Datasets program include the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE) data set, Global Health Data from the World Bank, and OpenStreetMap data, all of which are stored at no cost on Google Cloud.