(Featured image source: NextStrain)
I’m not sure when it was that looking up the current numbers of the Covid-19 pandemic became a daily ritual. Perhaps sometime in the first couple weeks of March? But lately, along with catching bits of Andrew Cuomo’s daily briefings, I check the current numbers, sometimes every couple hours.
This is what I have open in my browser on any given day. You can treat this as a jumping-off point for links; that’s what I do.
These websites gather data from a wide set of sources, including governments, local media, and hospitals.
European Centre for Disease Prevention and Control: The European CDC is a rigorous and thorough source of Covid data that researchers work hard to get right. My guess is that it is much more trustworthy than the US CDC.
Worldometers: Worldometers.info tracks all sorts of numbers that they show live on their dashboard, where you can watch the global population changing in real time. It has been used as a canonical source of data by governments and news services. But I, like most people, only found it due to their coronavirus tracking. They source their data from government websites and local news.
Johns Hopkins: The Johns Hopkins Coronavirus Resource Center is not only a source of data, but also a pretty cool interactive graphic, built by the Center for Systems Science and Engineering at Johns Hopkins University. Their initial source, while the epidemic was confined to China, was DXY, a website run by the Chinese medical community. Now, they have expanded their sources to include the US Centers for Disease Control and Prevention, the World Health Organization, the European Centre for Disease Prevention and Control, Worldometers.info, etc.
Covid Tracking Project: The Covid Tracking Project has become another go-to source for US Covid data. It was created by a team from The Atlantic to fill a void in the Johns Hopkins/Worldometers data sets: they didn’t report on tests performed. Without an idea of testing, one can have no confidence in the slopes of curves: is the curve sloping down because infections are actually decreasing? Or is it because there were simply fewer tests performed? Along with the numbers of positive and negative tests in each state, the Covid Tracking Project gives each state a letter grade to show how well it is reporting Covid data.
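To make the testing point concrete, here is a minimal Python sketch with made-up numbers (not Covid Tracking Project data). It compares the raw count of positives with the share of tests that came back positive. The function name positivity_rate and both weeks of data are my own inventions, purely for illustration.

```python
def positivity_rate(positives, total_tests):
    """Share of tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return positives / total_tests

# Hypothetical two weeks for one state: (positive results, total tests).
week_1 = (1000, 10000)
week_2 = (800, 4000)

rate_1 = positivity_rate(*week_1)
rate_2 = positivity_rate(*week_2)

print(f"Week 1: {week_1[0]} positives, {rate_1:.0%} of tests positive")
print(f"Week 2: {week_2[0]} positives, {rate_2:.0%} of tests positive")
```

In this toy example the number of positives falls from 1,000 to 800, yet the share of positive tests doubles, so the downward slope says more about how little testing was done than about the infection itself.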
New York Times: The New York Times data set: another attempt by reporters to fill the gap left by incomplete reporting from government sources such as the CDC. US-focused, this is an effort to gather county-level data by reporters working round-the-clock.
Corona Data Scraper: This is a really cool open source project by Adobe engineer Larry Davis. It marries Covid data from a wide variety of sources—all of which are listed on the website—with geolocation and population data, to produce Covid numbers down to the county level. It also presents several visualizations written by contributors. All of its source data files can be downloaded, and the source code is up on GitHub.
Visualizers and Dashboards
It is very hard for humans to visualize exponential data, or as I like to call it, data that “explodes” or shows wildfire-like spread. This is why the logarithmic scale, where each step up the Y-axis multiplies the number of coronavirus cases by 10 instead of adding a fixed amount, has become such a standard.
Financial Times: As far as I’m aware, the Financial Times corona tracker created by John Burn-Murdoch was an early trendsetter, and its parchment-colored graphs were copy-pasted everywhere. The innovation that allowed it to present all countries on the same graph, although the pandemic arrived in each country at a different time, is that the x-axis does not show calendar dates, but rather days since the 100th confirmed case. Another innovation: the curves are smoothed by plotting a rolling 7-day average instead of the very jittery daily numbers.
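A small Python sketch shows why the log scale helps. The outbreak below is hypothetical (100 cases doubling every 3 days, numbers I made up); the point is only that steady doubling produces ever-larger jumps on a linear axis but equal-sized steps on a log axis.

```python
import math

# Hypothetical outbreak: 100 cases on day 0, doubling every 3 days.
days = list(range(0, 31, 3))
cases = [100 * 2 ** (d // 3) for d in days]

# On a linear axis the jump between consecutive points keeps growing...
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# ...but on a log10 axis every jump has the same height, log10(2).
log_cases = [math.log10(c) for c in cases]
log_steps = [b - a for a, b in zip(log_cases, log_cases[1:])]

for d, c, lc in zip(days, cases, log_cases):
    print(f"day {d:2d}: {c:7d} cases, log10 = {lc:.2f}")
```

Because every step on the log axis is the same size, exponential growth shows up as a straight line, and any bend in that line is an easy-to-spot sign that the doubling time is changing.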
They have recently removed the paywall, and have an interesting time-lapse view of when each country locked down.
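The two tricks described for the Financial Times charts, lining countries up by days since the 100th case and smoothing with a 7-day average, can be sketched in a few lines of Python. This is my own rough approximation with invented numbers, not the FT’s code; in particular, the trailing (rather than centered) window is an assumption, and the function names are hypothetical.

```python
def align_to_threshold(cumulative, threshold=100):
    """Drop the days before the cumulative count first reaches the threshold."""
    for i, total in enumerate(cumulative):
        if total >= threshold:
            return cumulative[i:]
    return []  # the threshold was never reached

def rolling_mean(values, window=7):
    """Trailing average over the last `window` values (fewer at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical cumulative case counts; country B's outbreak started later.
country_a = [10, 40, 120, 200, 350, 600]
country_b = [0, 0, 0, 5, 30, 90, 150, 260, 400]

aligned_a = align_to_threshold(country_a)  # index 0 is now "day 0" for A
aligned_b = align_to_threshold(country_b)  # index 0 is now "day 0" for B

# Hypothetical jittery daily new cases, smoothed with a 7-day window.
daily_new = [70, 10, 90, 20, 80, 30, 50, 60]
smoothed = rolling_mean(daily_new)

print(aligned_a, aligned_b)
print([round(x, 1) for x in smoothed])
```

Once both series start at their own “day 0”, they can be drawn on the same x-axis even though the outbreaks began weeks apart.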
Our World In Data: Our World In Data is a long-standing leader in data visualization, housed at Oxford University. They present 40 different visualizations of worldwide Covid data, mostly sourced from the European CDC, covering confirmed cases, deaths, and testing, which can be viewed per capita or per country.
NYT Map: The New York Times Map view presents their own data interactively, so you can choose to see total cases, deaths, or per capita numbers on the US map. They also perform the useful service of showing growth rates per US metro area.
Stat News: Stat News, the medical news website, has an excellent dashboard, the virtue of which is that you can start from the global level, and drill down into any country, then state/province, down to the county level; and you see a graph over time showing both cumulative and new cases, and Covid deaths.
91-DIVOC: Another useful visualization is the 91-DIVOC “Flip the Script” website by computer science professor Wade Fagen-Ulmschneider. Yes, the website’s name is COVID-19 backwards. It has a cool feature that lets you highlight a particular country among the curves (or a state within the US) to visually compare the one you’re most interested in with the others. You can also have the curves represent deaths, confirmed cases, or new cases per day, etc., and switch between log and linear scales.
Covid Charts on Tableau: Peter Walker, a media analytics professional, has created an absolutely spectacular set of visualizations, hosted on Tableau, which is a platform for creating data visualizations for all sorts of complex phenomena. I must say they do a great job. Peter Walker’s page Coronavirus in USA is an interactive set of graphs for total positive cases, tests, and deaths; it shows the numbers both daily and with a 7-day rolling average on the same graph. Somehow, this crowded information is presented clearly and concisely. You can easily skip to a particular state to view its charts, or focus in on national testing numbers, growth in cases over time, and a map view that uses simple “more/fewer” color-coding.
rt.live: A website created by the founders of Instagram (Kevin Systrom and Mike Krieger) and others, rt.live focuses on what is arguably the single most important number for showing how well a region is dealing with the Covid-19 pandemic: Rt, the effective reproduction number, which is the average number of people each infected person goes on to infect at a given point in time. If Rt is under 1, Covid will slowly grind to a halt in that region; if it is over 1, it will spread. How far above or below 1 the region’s Rt sits determines how fast the virus will spread or die out. As such, you can get a lot of information just by focusing on this one number.
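Here is a toy Python illustration of why the value 1 is the tipping point. It is not how rt.live estimates Rt; it simply assumes Rt stays constant and multiplies the case count by Rt once per “generation” of infections. The function name and the starting figure of 1,000 cases are made up.

```python
def project_cases(initial_cases, rt, generations):
    """Cases per generation if each case infects `rt` others on average.

    Toy assumption: Rt stays constant for the whole projection.
    """
    return [initial_cases * rt ** g for g in range(generations + 1)]

# Hypothetical region with 1,000 current cases, followed for 10 generations.
shrinking = project_cases(1000, 0.8, 10)  # Rt below 1
steady = project_cases(1000, 1.0, 10)     # Rt exactly 1
growing = project_cases(1000, 1.2, 10)    # Rt above 1

print(f"Rt = 0.8 -> {shrinking[-1]:.0f} cases after 10 generations")
print(f"Rt = 1.0 -> {steady[-1]:.0f} cases after 10 generations")
print(f"Rt = 1.2 -> {growing[-1]:.0f} cases after 10 generations")
```

Even a modest gap on either side of 1 compounds quickly, which is why the distance from 1 matters as much as which side of it a region is on.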
Projections and Models
A model is not a prediction: it is a warning to get us to change our behavior. When a model turns out wrong, therefore, it isn’t a sign that it failed. It is a sign that we heeded the red flashing light it held over our heads. This, alone, negates some of the bad-faith critiques that these models have received.
The Imperial College Model: At the start of the pandemic, the UK, under Boris Johnson’s leadership, planned to “ride it out” by not taking any social distancing measures. The plan was to permit widespread infections and allow the population to achieve herd immunity. It was the Imperial College projection that changed their trajectory. With no interventions, it predicted 510K dead in Great Britain and 2.2 million dead in the US. Imperial’s models are treated as a gold standard when it comes to epidemics: that projection made governments on both sides of the Atlantic sit up and listen. Recently, Imperial has revised their numbers downward. Rather than a “walkback” or evidence of conspiracy, as the forever-conspiratorial right-wingers tend to see it, this is a sign that people across the US and UK have started to socially distance.
UW Projections: The University of Washington’s projections of how many Covid infections/complications each US state would see were built for hospitals to plan resource use. But they quickly became a widely used tool to get a sense for how well social distancing was working in each state, since they adjust their projections over time as the country shelters in place. Apparently these projections were also used by Dr. Fauci and Dr. Birx at the White House, leading to Trump touting 100K-200K dead Americans as some sort of victory. Lately, this projected number has fallen to around 60K dead.
Covid Act Now: Covid Act Now is another such projection. It was built by a disparate team of volunteers that includes data scientists and doctors, to show how early states need to lock down or isolate in order to avoid their hospitals being overloaded. It is based on their open source data model.
There are other ways to track the pandemic than the brute force numbers of infections and deaths. Ultimately, death is, as they say, a lagging indicator of where the pandemic has taken hold, while the number of infections reported is inevitably an under-count, limited by the amount of testing.
So people have tracked different metrics, some very surprising ones, to give a sense for how the pandemic is hurtling through the world.
Excess Deaths: As hospitals fill up, tests are in short supply, and emergency response teams are slammed, is it any surprise that many will die, at home, uncounted? The official cause of death might be cardiac arrest or pneumonia-related complications, but during an outbreak, very likely the underlying cause of the death was the virus SARS-CoV-2. While there is no way to be certain, if a region shows a death toll much higher than what is “normal”, one can make a very good guess that at least some of those excess deaths were due to the outbreak (some will inevitably occur because patients with unrelated problems could not get treatment in time). Studies in Italy, Spain, Wuhan, and the UK showed the actual death toll may be from two to ten times higher than the official numbers. In NYC alone, a graph of cardiac arrest calls from the Economist (above) shows how grievous the under-counting might be.
Atypical Illness from Kinsa Smart Thermometer: In the early days of the pandemic’s US spread, I saw a “weather map” of the US with some regions coded red, which even to an untrained eye signals danger. It turned out that a consumer product called the Kinsa Smart Thermometer had been collecting data about elevated temperatures from its 1-million-strong customer base throughout the country. Analyzing this data along with the “expected” seasonal flu variation, they were able to see hotspots (shown in red) where people reported fevers well over and above the seasonal flu. This way, they were able to identify that the Florida coastline and some other counties in the South were about to become Covid-19 hotspots while their beaches were still full of Spring Break revelers.
Google Searches: A really fascinating study out of Cornell University looks for outbreaks of Covid-19 using Google Search data. As this chart shows, the volume of searches for symptoms such as “loss of smell” tracks closely with an outbreak of Covid-19 in that area. In fact, they might have identified eye pain as a potential symptom of Covid-19.
Location Data: Tech companies have been on the back foot for a few years now, fighting off concerns about how much consumer data they were collecting stealthily. It is eerie to hear, in normal times, that your social media or cell phone company knows where you are and where you’ve been in the last week.
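The arithmetic behind excess deaths is simple enough to sketch in Python. Every number below is invented for illustration (it is not data from any of the studies mentioned), and the function name is hypothetical; the “baseline” stands in for what a region would normally expect, for example an average of the same weeks in earlier years.

```python
def excess_deaths(observed, baseline):
    """Deaths above the expected baseline, period by period."""
    return [o - b for o, b in zip(observed, baseline)]

# Hypothetical weekly death counts for one region.
baseline = [1000, 1010, 990, 1000]       # what a "normal" year looks like
observed = [1050, 1400, 2100, 1900]      # all deaths recorded this year
official_covid = [20, 200, 600, 500]     # deaths officially attributed to Covid

excess = excess_deaths(observed, baseline)

# Excess deaths not explained by the official Covid count.
unexplained = [e - c for e, c in zip(excess, official_covid)]

ratio = sum(excess) / sum(official_covid)
print(f"Excess deaths: {sum(excess)}, official Covid deaths: {sum(official_covid)}")
print(f"Excess deaths are {ratio:.1f}x the official count")
```

The unexplained remainder is only a rough guide: as noted above, it mixes uncounted Covid deaths with deaths from other causes that went untreated during the outbreak.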
But it turns out this type of data is actually invaluable when one is in the midst of a global contagion. Some of this GPS cell phone data has been used to produce score cards and charts to show how well your community is social distancing. But the more promising use—that a number of tech companies are thinking about—is to produce a heat map of known coronavirus infections, and help governments in tracing all who’ve been in proximity of known positive cases to test and quarantine early, if they ever gear up to do so. Tech data is used in exactly this way in South Korea with a great deal of success.
What about privacy concerns? I wouldn’t be surprised if people are skeptical, but Sundar Pichai of Google and Tim Cook of Apple coordinated their tweets, sent out a minute apart, to say they were jointly working on a solution using Bluetooth that would respect consumers’ privacy.
The Genome: I’ve filed this under “offbeat” data but it really is the core of the coronavirus SARS-CoV-2, its gene sequence.
Scientists all over the world jumped on the data the Chinese scientists published, and have now sequenced over 3000 genomes. They have a pretty good idea of how the virus has mutated as it spread through the world. They have identified 11 separate mutations, which shows that it is not mutating very quickly. This bodes well for an eventual vaccine; the flu vaccine, for instance, has to be rejiggered each year because the flu virus mutates so rapidly.
Added bonus: Nextstrain, the website that visualizes open-source genome data, has some colorful charts showing SARS-CoV-2’s spread, and several different ways to visualize it: check it out.