



The COVID-19 pandemic has exposed foundational gaps in government-sponsored public health surveillance across the United States (US) (1). Most notably, for the first year of the pandemic, the Centers for Disease Control and Prevention (CDC), which has historically been responsible for reporting population-level situational statistics (e.g., cases, hospitalizations, and deaths over time during infectious disease outbreaks), did not efficiently report COVID-19-related statistics. This was due, in part, to a lack of prioritization of, and underinvestment in, local public health surveillance systems (2). News media organizations such as The Atlantic’s COVID Tracking Project partially filled this gap (3), highlighting the critical role that alternative data sources can play during public health emergencies.

Situational statistics are also useful more broadly in infectious disease epidemiology research. Gaps in government-sponsored public health surveillance long preceded the pandemic, as did the practice of leveraging alternative data sources. For infectious disease research specifically, case count data obtained from news coverage of outbreaks have supported studies of the 2014–2015 Disneyland measles outbreak (4) and the 2016 Arkansas mumps outbreak (5), as well as a broad range of international studies, including Zika (6, 7) and dengue (8) in Latin America, H7N9 in China (9), and Ebola in West Africa (10), among others. News media data have repeatedly proven useful for aggregating case counts, and in each of the aforementioned instances, they were used to augment otherwise insufficient data from government-sponsored agencies. In high-income settings like the US, the insufficiency of government-sponsored public health data often takes the form of reporting delays (11). Although these data may exist and are frequently treated as “ground truth” statistics, they are reported at a pace that precludes real-time evaluation of emerging crises.
