By Christopher Sirota, CPCU
Can web searches by the public help predict the outbreak of a disease?
Experts have reportedly been working on such digital epidemiological short-term forecasting (aka "nowcasting") tools.
One of the earlier attempts was reportedly made by Google with its Flu Trends tool, which aggregated and analyzed search data. Per Time, Flu Trends overestimated the prevalence of influenza for the 2011-12 and 2012-13 seasons. Despite these poor initial results, PLOS reported in 2019 that some researchers corrected the Google data with newly available time-stamped influenza monitoring data and concluded that the Flu Trends data actually had more forecasting potential than initially judged.
Interest in using web analytics for disease surveillance has apparently continued since then, as evidenced by recent studies of how web and social media data could have been used to nowcast the COVID-19 outbreak. A few such studies are summarized below. Note that the studies generally suggest that nowcasting tools could potentially supplement, but not replace, traditional epidemiological surveillance methods, especially since nowcasting can be near real-time while traditional methods can have some reporting lag.
21-Day COVID-19 "Nowcast"
The New York Times has reported on a team at Harvard University that explored the potential for a pandemic nowcasting tool by examining a combination of the following data sources: Google Trends, Twitter, physician searches on the platform UpToDate, mobile phone data, and data from a smart thermometer that uploads to an app. Using a predictive analytics model that looked at these sources state by state for March and April 2020, the researchers compared the sources to COVID-19 case and fatality data and reportedly identified a signal that could potentially have anticipated the COVID-19 outbreak by 21 days, on average.
The article notes that use of these types of data can have limitations in forecasting power since "[s]ocial media and search engines also can become less sensitive with time; the more familiar with a pathogen people become, the less they will search with selected key words."
16-Day COVID-19 "Nowcast"
The Mayo Clinic published a study that examined Google Trends data between January and April 2020. Google Trends data reportedly contains ten terms searched for by the general public and assigns each a relative value from 0 to 100, obtained by dividing the number of searches for the term in a given geographic area by the total number of searches in that area. The researchers searched for the following terms and categories:
- COVID symptoms
- coronavirus symptoms
- sore throat+shortness of breath+fatigue+cough
- coronavirus testing center
- loss of smell
- Lysol (sanitizer)
- face mask
- coronavirus vaccine
- COVID stimulus check
- disease symptoms
- possible treatments
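The 0-to-100 scoring described before the list can be illustrated with a minimal sketch. It assumes that each period's share of searches (term searches divided by total searches in the area) is rescaled so that the highest share equals 100; that rescaling step, the function name, and the counts below are illustrative assumptions, not details taken from the study.

```python
def relative_search_interest(term_counts, total_counts):
    """Convert raw counts into 0-100 scores relative to the peak share.

    Assumption: the score is each period's share of all searches,
    rescaled so that the largest share in the series becomes 100.
    """
    # Share of all searches in the area that were for the term.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Hypothetical weekly counts for one term in one area (not real data).
term_counts = [10, 40, 80, 20]
total_counts = [1000, 1000, 1600, 1000]
scores = relative_search_interest(term_counts, total_counts)
print(scores)
```

Note that the third week has the most raw searches and also the highest share, but the scores are driven by the share: a week with more raw searches could score lower if total search volume grew faster.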
Per the article, the researchers concluded that "[f]or the United States, Google Trends data were highly correlated with cases of COVID-19 on a state-by-state basis and could potentially be used to predict new areas of outbreak and possible high-impact zones as the disease progresses." The article further adds that "[s]trong correlations were observed up to 16 days prior to the first reported cases in some states."
12-Day COVID-19 "Nowcast"
The National Institutes of Health (NIH) published a study that examined data from Google Trends and the Baidu Index across several countries.
According to the study, the researchers used Google Trends to analyze search terms globally, for Spain, for Italy, and for New York and Washington states; they reportedly used Baidu for China. Per the study, the researchers examined various search terms from January to April 2020 and compared this data to confirmed COVID-19 cases and fatalities. The search terms that were strongly correlated "to both new daily confirmed cases and deaths" reportedly included the following:
- shortness of breath
- chest pain
Per the study, the researchers found that the search terms "COVID-19" and "virus" were also strongly correlated, and "predated real-world confirmed cases by 12 days."
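Lead times such as the 21, 16, and 12 days cited above are the kind of figure a lagged-correlation comparison can produce. The following is only a rough sketch of that general idea, not the method of any of the studies: it shifts a search-interest series against a case series and reports the shift with the highest Pearson correlation. The function names and the data are hypothetical, and the case series is synthetic, built to trail the search series by three days.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead_days(search, cases, max_lead):
    """Return (lead, r): the lead at which search[t] best matches cases[t + lead]."""
    best = (0, pearson(search, cases))
    for lead in range(1, max_lead + 1):
        # Pair each day's search value with the case count `lead` days later.
        r = pearson(search[:-lead], cases[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Synthetic daily series (not real data): cases repeat the search curve 3 days later.
search = [0, 1, 3, 7, 12, 18, 15, 9, 5, 2, 1, 0, 0, 0]
cases = [0, 0, 0] + search[:-3]

lead, r = best_lead_days(search, cases, max_lead=7)
print(f"best lead: {lead} days (r = {r:.2f})")
```

With real data, a high correlation at some lead would not by itself show that searches can forecast cases; as the studies note, such signals are a possible supplement to traditional surveillance.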