Visualization Of COVID-19 New Cases Over Time In Python

Inspired by another concise data visualization, the author of this article has crafted and shared the code for a heatmap which visualizes the COVID-19 pandemic in the United States over time.


By Jason Bowling, Manager, Network Communications at University of Akron

Figure: Heat map of new COVID-19 cases per 100K of population, by day

This heat map shows the progression of the COVID-19 pandemic in the United States over time. The map is read from left to right, and color coded to show the relative numbers of new cases by state, adjusted for population.

This visualization was inspired by a similar heat map that I saw on a discussion forum thread. I could never locate the source, as it was only a pasted image with no link. The original version was also crafted to make a political point, separating states by predominant party affiliation, which I was not as interested in. I was fascinated by how concisely it showed the progression of the pandemic, so I decided to create a similar visualization myself that I could update regularly.

Source code is hosted on my GitHub repo. If you are just interested in seeing updated versions of this heat map, I publish them weekly on my Twitter feed. Be careful when comparing graphs from different weeks, as the color map may change as new data is included. Comparisons are only valid within a given heat map.

The script relies on pandas, numpy, matplotlib, and seaborn.

The data comes from the New York Times COVID-19 GitHub repo. A simple launcher script clones the latest copy of the repository, copies the required file, and then launches the Python script to create the heat map. Only one file is actually needed, so cloning the whole repository is more than necessary and could certainly be tightened up, but this works.

echo "Clearing old data..."
rm -rf covid-19-data/
rm -f us-states.csv
echo "Getting new data..."
git clone https://github.com/nytimes/covid-19-data
echo "Done."

cp covid-19-data/us-states.csv .
echo "Starting..."

python3 heatmap-newcases.py
echo "Done."
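As a possible way to tighten this up, the single required file could be downloaded directly instead of cloning the whole repository. The following is only a sketch: the helper names `raw_file_url` and `fetch_states_csv` are hypothetical, and the raw-file URL pattern and the `master` branch name are assumptions that should be checked against the repository before relying on them.

```python
import urllib.request


def raw_file_url(repo, path, branch="master"):
    """Build a raw-file download URL for a GitHub repository.

    Assumption: GitHub serves raw files under raw.githubusercontent.com,
    and the repository's default branch is called "master".
    """
    return f"https://raw.githubusercontent.com/{repo}/{branch}/{path}"


def fetch_states_csv(destination="us-states.csv"):
    """Download only us-states.csv and save it next to the heat map script."""
    url = raw_file_url("nytimes/covid-19-data", "us-states.csv")
    urllib.request.urlretrieve(url, destination)
    return destination
```

Calling `fetch_states_csv()` before running the heat map script would stand in for the clone and copy steps of the launcher.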

The script first loads a CSV file containing the state populations into a dictionary, which is used to scale daily new case results. The new cases are computed for each day from the running total in the NY Times data, and then scaled to new cases per 100,000 people in the population.
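The calculation described above can be sketched on a tiny example. The sample rows and the population figure below are made up for illustration; only the column names follow the NY Times file.

```python
import pandas as pd

# Hypothetical sample: running case totals in the same shape as us-states.csv
cumulative = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"],
    "state": ["Ohio"] * 4,
    "cases": [10, 25, 45, 80],
})
population = 11_700_000  # assumed round figure, for illustration only

# diff() turns the running total into per-day new cases. The first day has
# no previous value, so its NaN is replaced with 0.
new_cases = cumulative["cases"].diff().fillna(0)

# Scale to new cases per 100,000 residents
per_100k = new_cases / population * 100_000

print(per_100k.round(3).tolist())
```

Here the running totals 10, 25, 45, 80 become daily new cases of 0, 15, 20, 35 before scaling.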

We could display the heat map at that point, but if we do, states with very high numbers of cases per 100,000 people will swamp the detail of the states with lower numbers of cases. Applying a log(x+1) transform improves contrast and readability significantly.
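To see why the transform helps, here is a small sketch using made-up per-100K values that span several orders of magnitude. It uses base 10, as the full script does.

```python
import numpy as np

# Hypothetical per-100K values with a very wide range
per_100k = np.array([0.0, 9.0, 99.0, 999.0])

# log10(x + 1) keeps 0 at 0 and compresses the large values
scaled = np.log10(per_100k + 1.0)

# Fraction of the color range each value would occupy, before and after
raw_fraction = per_100k / per_100k.max()
scaled_fraction = scaled / scaled.max()

print(raw_fraction.round(3))
print(scaled_fraction.round(3))
```

On the raw scale, a value of 9 sits at under 1% of the color range and is nearly indistinguishable from 0. After the transform it sits at a third of the range. Adding 1 before taking the logarithm avoids log(0) on days with no new cases.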

Finally, Seaborn and Matplotlib are used to generate the heatmap and save it to an image file.

That’s it! Feel free to use this as a framework for your own visualization. You can customize it to zero in on areas of interest.

Full source code is below. Thanks for reading, and I hope you found it useful.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import csv
import datetime

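# load state populations: the first column is the state name,
# the remaining columns are kept as a list (population first)
statePopulations = {}
with open('StatePopulations.csv', newline='') as populationFile:
    for row in csv.reader(populationFile):
        statePopulations[row[0]] = row[1:]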

filename = "us-states.csv"
fullTable = pd.read_csv(filename)
fullTable = fullTable.drop(['fips'], axis=1)
fullTable = fullTable.drop(['deaths'], axis=1)

# generate a list of the dates in the table
dates = fullTable['date'].unique().tolist()
states = fullTable['state'].unique().tolist()

# one row per unique date; each state's column is merged onto this below
result = pd.DataFrame({'date': dates})

states.remove('Northern Mariana Islands')
states.remove('Puerto Rico')
states.remove('Virgin Islands')
states.remove('Guam')

states.sort()

for state in states:
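    # create new dataframe with only the current state's data
    population = int(statePopulations[state][0])
    print(state + ": " + str(population))
    # copy() so the column assignments below modify an independent dataframe
    # rather than a slice of fullTable (avoids SettingWithCopyWarning)
    stateData = fullTable[fullTable.state.eq(state)].copy()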

    newColumnName = state
    stateData[newColumnName] = stateData.cases.diff()
    stateData[newColumnName] = stateData[newColumnName].replace(np.nan, 0)
    stateData = stateData.drop(['state'], axis=1)
    stateData = stateData.drop(['cases'], axis=1)

    stateData[newColumnName] = stateData[newColumnName].div(population)
    stateData[newColumnName] = stateData[newColumnName].mul(100000.0)

    result = pd.merge(result, stateData, how='left', on='date')

result = result.drop_duplicates()
result = result.fillna(0)

for state in states:
    result[state] = result[state].add(1.0)
    result[state] = np.log10(result[state])
    #result[state] = np.sqrt(result[state])

result['date'] = pd.to_datetime(result['date'])
result = result[result['date'] >= '2020-02-15']
result['date'] = result['date'].dt.strftime('%Y-%m-%d')

result.set_index('date', inplace=True)
result.to_csv("result.csv")
result = result.transpose()

plt.figure(figsize=(16, 10))
g = sns.heatmap(result, cmap="coolwarm", linewidth=0.05, linecolor='lightgrey')
plt.xlabel('')
plt.ylabel('')

plt.title("Daily New Covid-19 Cases Per 100k Of Population", fontsize=20)

updateText = ("Updated " + str(datetime.date.today()) +
    ". Scaled with Log(x+1) for improved contrast due to wide range of values. Data source: NY Times Github. Visualization by @JRBowling")

plt.suptitle(updateText, fontsize=8)

# one tick per row, centered on each heat map cell
plt.yticks(np.arange(.5, len(states) + .5, 1.0), states)

plt.yticks(fontsize=8)
plt.xticks(fontsize=8)
g.set_xticklabels(g.get_xticklabels(), rotation=90)
g.set_yticklabels(g.get_yticklabels(), rotation=0)
plt.savefig("covidNewCasesper100K.png")

 
Bio: Jason Bowling is Manager of Network Communications at University of Akron. Jason is a proven technology professional with a focus on network administration, security and medical device design. Outstanding troubleshooting skills, excellent written communications, and established project management experience. You can find more of his writing on Medium.

Original. Reposted with permission.
