春江暮客

Personal learning and sharing site of 春江暮客

Using folium to Draw a COVID-19 Pandemic Map

2020-05-03 Technology

The main point of this article is not the pandemic conclusion itself, but the mapping workflow: how to take country-level public data and turn it into an interactive world map. If you want a hands-on example for folium, Choropleth, and country-code preprocessing, this is a practical one.

Python has many plotting libraries, such as matplotlib and plotly (pandas also has built-in plotting), but for quickly generating an HTML map with geographic boundaries, folium is still a very direct option. This article takes that route and breaks the workflow into a few small steps:

  • Getting global COVID-19 data
  • Drawing a world map
  • Data preprocessing
  • Adding pandemic data to the map
  • Improving the pandemic map

Before you start

This is a historical data-visualization workflow. Public data endpoints from WHO or third-party sources may change over time, so if you reproduce this later, check whether the source URL, fields, or access pattern has changed.

How to validate the data before plotting

Before building the map, verify three things first:

  1. The JSON request still returns data
  2. The dimensions and metrics fields still exist
  3. The resulting DataFrame still contains the country and cumulative-confirmed columns you need

For example:

print(covid19_dataframe.columns)
print(covid19_dataframe.head())

If the columns look different from the example, the upstream API schema probably changed and the cleaning step needs to be updated first.
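The three checks above can be sketched as one small function. This is a minimal sketch, not part of the original workflow: validate_payload and the sample payload are hypothetical names and data, written in the same shape as the dimensions, metrics, and rows keys that the article's code reads.

```python
def validate_payload(payload, required_columns):
    """Return a list of problems found in a WHO-style JSON payload (empty if none)."""
    problems = []
    # The three top-level keys that the article's code relies on.
    for key in ("dimensions", "metrics", "rows"):
        if key not in payload:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems

    if not payload["rows"]:
        problems.append("rows is empty")

    # Column names come from the dimensions first, then the metrics.
    names = [d["name"] for d in payload["dimensions"]]
    names += [d["name"] for d in payload["metrics"]]
    for column in required_columns:
        if column not in names:
            problems.append(f"missing column: {column}")
    return problems


# Hand-written sample in the same shape as the article's output (not real data).
sample = {
    "dimensions": [{"name": "day"}, {"name": "Country"}, {"name": "Region"}],
    "metrics": [{"name": "Confirmed"}, {"name": "Cumulative Confirmed"}],
    "rows": [[1582502400000, "AF", "EMRO", 1, 1]],
}
print(validate_payload(sample, ["Country", "Cumulative Confirmed"]))
print(validate_payload({"rows": []}, ["Country"]))
```

An empty list means the payload still has the shape the rest of the code expects; anything else tells you which part of the cleaning step to revisit first.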

Common issues

1. The map renders, but many countries stay blank

The most common reason is that the country-code mapping failed, so the values in the fullname column do not match the identifiers (feature.id) inside the geojson file. Countries without a match are drawn with nan_fill_color.
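One way to find the blank countries is to compare the keys in the data with the ids in the geojson. The sketch below assumes each feature carries a top-level id, as key_on="feature.id" implies; unmatched_keys and the sample geojson are hypothetical, written only for illustration.

```python
def unmatched_keys(data_keys, geojson):
    """Return data keys that have no feature with the same id in the geojson."""
    feature_ids = {feature.get("id") for feature in geojson["features"]}
    missing = []
    for key in set(data_keys):
        # None or NaN (key != key) means the code mapping produced no value at all.
        if key is None or key != key or key not in feature_ids:
            missing.append(str(key))
    return sorted(missing)


# Hand-written sample; inspect the real geojson rather than trusting these values.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AFG", "properties": {"name": "Afghanistan"}},
        {"type": "Feature", "id": "CHN", "properties": {"name": "China"}},
    ],
}
print(unmatched_keys(["AFG", "China", None], sample_geojson))
```

With the DataFrame from this article, the call would look like unmatched_keys(covid19_dataframe["fullname"], geojson).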

2. The data request fails

Usually the endpoint changed, access is restricted, or the source schema was updated. Print the raw JSON structure first before changing the rest of the code.
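A quick way to print the raw JSON structure is to summarize keys, types, and sizes instead of dumping the whole payload. This is a rough sketch; describe_structure is a hypothetical helper, and it assumes that all list elements share the shape of the first one.

```python
def _size(value):
    """Return a length suffix for containers, or an empty string otherwise."""
    return f" (len {len(value)})" if isinstance(value, (dict, list)) else ""


def describe_structure(obj, max_depth=2, indent=0):
    """Return lines summarising keys, types and sizes of a parsed JSON object."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}{_size(value)}")
            if indent + 1 < max_depth:
                lines.extend(describe_structure(value, max_depth, indent + 1))
    elif isinstance(obj, list) and obj:
        # Look only at the first element of a list.
        first = obj[0]
        lines.append(f"{pad}[0]: {type(first).__name__}{_size(first)}")
        if indent + 1 < max_depth:
            lines.extend(describe_structure(first, max_depth, indent + 1))
    return lines


# Hand-written sample, not a real response.
sample = {"dimensions": [{"name": "day"}], "rows": [[1, "AF"]], "version": 1}
print("\n".join(describe_structure(sample)))
```

If a key such as rows or dimensions no longer shows up in the summary, the schema changed and the parsing code needs to follow it.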

3. The color differences are hard to see

When the numeric range is very wide, using raw values in a choropleth often makes most countries look pale. A log transform is a practical fix for that.
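To see why, the sketch below counts how many values land in each color bin before and after a log transform. It assumes six equal-width bins, which may not be exactly how every folium version picks its thresholds, and the case counts are made-up numbers chosen only to span a wide range.

```python
import math


def bin_counts(values, n_bins=6):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    if width == 0:
        # All values are identical, so they all share the first bin.
        counts[0] = len(values)
        return counts
    for value in values:
        # The maximum value belongs to the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up case counts spanning several orders of magnitude.
cases = [0, 12, 150, 900, 4_000, 25_000, 180_000, 1_100_000]
print(bin_counts(cases))                           # [7, 0, 0, 0, 0, 1]
print(bin_counts([math.log1p(v) for v in cases]))  # [1, 1, 2, 1, 1, 2]
```

On the raw scale, seven of the eight values share the lightest bin; after log1p they spread across all six.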

[Figure: log-transformed cumulative confirmed cases]

Getting Global COVID-19 Data

Alipay publishes daily figures, but I couldn’t find a public API for them. Searching turned up several other sources that are updated daily.

WHO data is the most up to date and reliable of these. The WHO data website presents the figures in a dashboard, and inspecting its network requests with Chrome’s Inspect tool reveals the JSON endpoint behind it (the URL used in the download code below).
who_json_api

This is JSON data and can be parsed with Python’s json package. Since this is the most authoritative source, this article uses WHO data.

First, install the necessary packages for this article:

pip install folium
pip install numpy
pip install pandas

Next is the real data fetching and processing flow.

## Download data
import requests
import json
covid19_json = requests.get("https://dashboards-dev.sprinklr.com/data/9043/global-covid19-who-gis.json").content

## Parse JSON data
covid19_json = json.loads(covid19_json)

## Check data column names
print(covid19_json['dimensions'])
"""
[{'name': 'day', 'type': 'TIMESTAMP', 'dataType': None},
 {'name': 'Country', 'type': 'STRING', 'dataType': None},
 {'name': 'Region', 'type': 'STRING', 'dataType': None}]
"""

## The three dimension columns above are timestamp, country, and region
print(covid19_json['metrics'])
"""
[{'name': 'Deaths', 'type': 'NUMBER', 'dataType': None},
 {'name': 'Cumulative Deaths', 'type': 'NUMBER', 'dataType': None},
 {'name': 'Confirmed', 'type': 'NUMBER', 'dataType': None},
 {'name': 'Cumulative Confirmed', 'type': 'NUMBER', 'dataType': None}]
"""
## The four metric columns are daily deaths, cumulative deaths, daily confirmed cases, and cumulative confirmed cases

Each entry in rows holds its values in that column order (dimensions first, then metrics), so the rows can be loaded directly into a DataFrame for analysis:

import pandas as pd

## Define column names: dimensions first, then metrics
## (Series.append was removed in pandas 2.0, so build a plain list instead)
colname = [d["name"] for d in covid19_json["dimensions"]] + [d["name"] for d in covid19_json["metrics"]]

## Generate dataframe
covid19_dataframe = pd.DataFrame(covid19_json["rows"], columns=colname)

covid19_dataframe.head()
"""
day    Country  Region  Deaths  Cumulative Deaths  Confirmed  Cumulative Confirmed
0 1582502400000 AF      EMRO    0       0              1          1
1 1582588800000 AF      EMRO    0       0              0          1
2 1582675200000 AF      EMRO    0       0              0          1
3 1582761600000 AF      EMRO    0       0              0          1
4 1582848000000 AF      EMRO    0       0              0          1
"""

At this point we have fetched the data from WHO’s official COVID-19 site and built a pandas DataFrame with the day (a Unix timestamp in milliseconds), country, region, daily deaths, cumulative deaths, daily confirmed cases, and cumulative confirmed cases.
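The day values shown in the table are Unix timestamps in milliseconds. A minimal sketch for turning one into a readable date with the standard library follows; ms_to_date is a hypothetical helper name, and the input is the first day value from the table above.

```python
from datetime import datetime, timezone


def ms_to_date(ms):
    """Convert a Unix timestamp in milliseconds to a UTC calendar date."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()


print(ms_to_date(1582502400000))  # 2020-02-24
```

For the whole column, pd.to_datetime(covid19_dataframe["day"], unit="ms") does the same conversion in one call.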

Drawing the World Map

Next, draw a blank world map so that the confirmed-case data can be added to it later. We use the folium package mentioned earlier; see the official folium docs for details.

Drawing a world map:

import folium

m = folium.Map()
m

You will see a generated map:
folium_map

Data Preprocessing

WHO identifies countries with short codes (for example AF), while the boundary file that folium provides identifies each country by its own id. To join the two, we need a dictionary that maps WHO’s codes to the identifiers used in the geojson, so that each country can be colored by its value. We also drop the columns and dates we don’t need.

Folium provides related country geo JSON data:

## Download geojson data
url = "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json"

# If GitHub cannot be accessed, use https://www.bobobk.com/wp-content/uploads/2020/05/world-countries.json as a substitute

geojson = json.loads(requests.get(url).content)
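Before building the mapping, it helps to look at which identifiers the geojson actually carries. The sketch below assumes a standard FeatureCollection whose features have a top-level id and a properties.name; feature_keys and the sample are hypothetical and hand-written, so check the real file rather than relying on these values.

```python
def feature_keys(geojson, limit=5):
    """Return (id, name) pairs for the first few features of a FeatureCollection."""
    pairs = []
    for feature in geojson["features"][:limit]:
        name = feature.get("properties", {}).get("name")
        pairs.append((feature.get("id"), name))
    return pairs


# Hand-written sample in the assumed shape of the boundary file.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AFG", "properties": {"name": "Afghanistan"}},
        {"type": "Feature", "id": "AGO", "properties": {"name": "Angola"}},
    ],
}
print(feature_keys(sample_geojson))
```

Calling feature_keys(geojson) on the downloaded file shows what the mapped column must contain for the join on feature.id to succeed.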

The country-code CSV used for the mapping is available for download here:

Country code abbreviation CSV

Replace the country codes in COVID-19 data:

## Map country codes
df_abb = pd.read_csv("https://www.bobobk.com/wp-content/uploads/2020/05/country_codes.csv", header=None)
## Column 1 is matched against WHO's Country code; column 2 becomes the key joined to the geojson
dic_abb = dict(zip(df_abb[1], df_abb[2]))

covid19_dataframe['fullname'] = covid19_dataframe['Country'].map(dic_abb)

## Keep only the latest date, with new and cumulative confirmed cases
## (compare against the max directly; converting it to str would never match a numeric column)
latest_day = covid19_dataframe["day"].max()
covid19_dataframe = covid19_dataframe[covid19_dataframe["day"] == latest_day]

## .copy() so that columns can be added later without a SettingWithCopyWarning
covid19_dataframe = covid19_dataframe[["Country", "fullname", "Cumulative Confirmed", "Confirmed"]].copy()

Adding Pandemic Data to the Map

Now, witness the magic: draw the latest COVID-19 data from WHO onto the map with folium.Choropleth:

folium.Choropleth(
    geo_data=url,
    name="covid-19 cumulative confirm map",
    data=covid19_dataframe,
    columns=["fullname", "Cumulative Confirmed"],
    key_on="feature.id",
    fill_color='PuRd',
    nan_fill_color='white'
).add_to(m)

Cumulative confirmed cases map:
total_confirm

Parameter explanation:

  • geo_data: URL to geojson data (no need to download)
  • name: Map layer name
  • data: Pandas DataFrame with the data
  • columns: Two columns: the key column, whose values must match geo_data (here fullname), and one numeric column (here cumulative confirmed cases)
  • key_on: The geojson field to match against the key column, written as a path starting with feature (here feature.id, the per-country id in the geojson, so fullname must hold those same ids)
  • fill_color: Color scheme
  • nan_fill_color: Fill color for missing data
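The key_on path can be read as a lookup into each geojson feature. The sketch below illustrates that reading only; it is not folium's internal code, and resolve_key_on and the sample feature are hypothetical.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.name' into one feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# Hand-written sample feature, for illustration only.
sample_feature = {
    "type": "Feature",
    "id": "AFG",
    "properties": {"name": "Afghanistan"},
}
print(resolve_key_on(sample_feature, "feature.id"))
print(resolve_key_on(sample_feature, "feature.properties.name"))
```

Whichever path you choose, the first entry of columns has to contain exactly the values that this lookup returns, otherwise the country is treated as missing data.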

Let’s also draw a map of daily new confirmed cases:

m = folium.Map()
folium.Choropleth(
    geo_data=url,
    name="covid-19 confirm map",
    data=covid19_dataframe,
    columns=["fullname", "Confirmed"],
    key_on="feature.id",
    fill_color='PuRd',
    nan_fill_color='white'
).add_to(m)
# m.save("new_confirm.html")

New confirmed cases map:
new_confirm

Improving the Pandemic Map

The previous maps show confirmed cases, but because the counts differ by orders of magnitude between countries, most countries end up in the lightest color and are hard to tell apart. Applying a logarithmic transformation makes the differences visible.

Code to draw improved maps:

import numpy as np

## log(x + 1) keeps countries with zero cases at 0 instead of -inf
covid19_dataframe["log_total_confirm"] = np.log(covid19_dataframe["Cumulative Confirmed"] + 1)
covid19_dataframe["log_new_confirm"] = np.log(covid19_dataframe["Confirmed"] + 1)

covid19_dataframe.head()

## Log-transformed cumulative confirmed cases
m = folium.Map()
folium.Choropleth(
    geo_data=url,
    name="covid-19 log total confirm map",
    data=covid19_dataframe,
    columns=["fullname", "log_total_confirm"],
    key_on="feature.id",
    fill_color='PuRd',
    nan_fill_color='white'
).add_to(m)
m.save("total_confirm_improve.html")

## Log-transformed new confirmed cases
m = folium.Map()
folium.Choropleth(
    geo_data=url,
    name="covid-19 log new confirm map",
    data=covid19_dataframe,
    columns=["fullname", "log_new_confirm"],
    key_on="feature.id",
    fill_color='PuRd',
    nan_fill_color='white'
).add_to(m)
m.save("new_confirm_improve.html")

Result visualization:
Log cumulative confirmed:
_total_confirm_improve
Log new confirmed:
new_confirm_improve

On the log-scaled maps it is much easier to see which countries have the most cumulative cases and where new cases are concentrated. Those are the countries and regions to pay special attention to.

Summary

This article combines WHO’s global COVID-19 data with the folium package to draw world maps of cumulative and new confirmed cases. I hope it helps readers follow the latest figures. Please take care of your health, go out less, wash your hands frequently, and wear a mask to help in the fight against COVID-19.

Python COVID-19 analysis code download

Notes

For daily updated info, the site offers daily pandemic data views at tool.bobobk.com with API addresses:
