春江暮客

Personal learning and sharing site of 春江暮客

Drawing Raincloud Plots with Python

2022-01-11 Technology

For exploratory analysis, box plots and violin plots are already useful, but a raincloud plot is often more expressive when you want both distribution shape and individual observations in one figure. By layering a violin, a boxplot, and scattered observations, it gives you a richer view without losing readability.

This article focuses on three practical questions:

  1. What layers make up a raincloud plot
  2. How to build those layers step by step in Python
  3. What to check before exporting the final figure

Introduction

A raincloud plot is a hybrid plot made of three layers: a half violin plot (the cloud), a boxplot (the umbrella), and a strip plot of the individual observations (the rain).


Data Preparation

We’ll continue to use the penguin dataset as an example, which is already available on this site for direct download: Penguin Data

Install ptitprince and download the data

pip install ptitprince
wget https://www.bobobk.com/wp-content/uploads/2021/12/penguins.csv

This penguin dataset is a good fit because it combines categorical groups with continuous measurements.

Load data

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("penguins.csv")
df.head()

Result

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	MALE
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	FEMALE
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	FEMALE
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	FEMALE
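Row 3 of the preview has no measurements at all. The following is a minimal sketch, assuming you prefer to drop such rows explicitly before plotting so that the number of plotted points per group is unambiguous. The inline CSV only reproduces the five preview rows above, and the names sample and clean are illustrative; with the real file you would call dropna on the DataFrame returned by read_csv.

```python
import io

import pandas as pd

# The five preview rows shown above, inlined so the sketch runs on its own
csv_text = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
Adelie,Torgersen,,,,,
Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Keep only rows that have both the grouping column and the plotted measurement
clean = sample.dropna(subset=["island", "bill_length_mm"])
print(len(sample), "rows before,", len(clean), "rows after dropping missing values")
```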

Starting the Plotting

We’ve already installed the ptitprince package using pip. Now, let’s start plotting.

Violin Plot

First, let’s draw the half violin, which is the “cloud” part of the raincloud plot. We plot the bill_length_mm variable grouped by island. The half_violinplot function draws only one side of each violin. The inner parameter controls what is drawn inside the violin (for example a small box or one stick per observation), and inner=None leaves it empty. To lay the clouds out horizontally, the categorical variable (island) goes on the Y-axis and the measurement goes on the X-axis.

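import matplotlib.pyplot as plt
import ptitprince as pt

plt.figure()  # start from a clean figure so layers from other steps do not pile up
# inner=None leaves the inside of the half violin empty
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
plt.savefig("half_violin.png", dpi=200)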
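The outline of the cloud is a kernel density estimate of the data. As a rough illustration of what that curve is, here is a standard-library sketch of a Gaussian KDE with Scott's rule-of-thumb bandwidth. It is not ptitprince's implementation; the function names are hypothetical, and the sample mixes the four bill lengths from the preview with a few made-up values.

```python
import math
import statistics


def scott_bandwidth(values):
    """Scott's rule of thumb for 1-D data: sample std * n^(-1/5)."""
    return statistics.stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values) / norm
        for g in grid
    ]


# First four values come from the preview table; the rest are hypothetical
bill_lengths = [39.1, 39.5, 40.3, 36.7, 38.9, 41.1, 42.0, 37.8]
step = 0.1
grid = [30 + step * i for i in range(201)]  # 30 mm to 50 mm

h = scott_bandwidth(bill_lengths)
density = gaussian_kde(bill_lengths, grid, h)

# A density should integrate to about 1 over a grid that covers the data
area = sum(density) * step
print(f"bandwidth={h:.2f} mm, area under curve={area:.3f}")
```

A smaller bandwidth follows individual points more closely and gives a bumpier cloud; a larger one smooths the shape out.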

Boxplot

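Next is the “umbrella” part: a narrow boxplot drawn over the cloud. The box is left unfilled (facecolor set to none) and given a high zorder so that it stays visible on top of the violin.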

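plt.figure()  # start from a clean figure so layers from other steps do not pile up
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
# Narrow, unfilled box drawn above the cloud (high zorder)
sns.boxplot(data=df, y="island", x="bill_length_mm", width=.15, zorder=10,
            boxprops={"facecolor": "none", "zorder": 10},
            whiskerprops={"linewidth": 2, "zorder": 10})
plt.savefig("violin_box.png", dpi=200)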
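To make clear what the umbrella encodes, the sketch below computes the numbers a default boxplot draws: the quartiles, and whiskers that reach the most extreme observations within 1.5 times the interquartile range. It uses only the standard library; box_stats is a hypothetical helper and the input values are made up for illustration.

```python
import statistics


def box_stats(values):
    """Return the quartiles, whisker ends, and outliers of a default boxplot."""
    data = sorted(values)
    # "inclusive" uses linear interpolation between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the fences
        "whisker_high": inside[-1],  # largest value inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Because the rain layer already shows every observation, outliers flagged by the boxplot appear twice unless you hide the boxplot's outlier markers.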

Strip Plot

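The “rain” part of the raincloud plot is drawn with seaborn’s stripplot. The jitter parameter spreads the points along the categorical axis so that overlapping observations stay visible, and zorder=0 keeps the points beneath the other layers.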

plt.figure()  # start from a clean figure so layers from other steps do not pile up
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
sns.boxplot(data=df, y="island", x="bill_length_mm", width=.15, zorder=10,
            boxprops={"facecolor": "none", "zorder": 10},
            whiskerprops={"linewidth": 2, "zorder": 10})
# Raw observations, drawn beneath the cloud and the box
sns.stripplot(data=df, y="island", x="bill_length_mm", jitter=1,
              edgecolor="white", zorder=0)
plt.savefig("raincloud.png", dpi=200)
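Conceptually, jitter adds a small random offset to each point's position on the categorical axis. The sketch below shows that idea with the standard library only. It is an illustration of the concept rather than seaborn's actual code, and jittered_positions with its parameters is a hypothetical helper.

```python
import random


def jittered_positions(category_index, n_points, jitter=0.1, seed=0):
    """Spread n_points uniformly within +/- jitter of a category's axis position."""
    rng = random.Random(seed)  # fixed seed so the figure is reproducible
    return [category_index + rng.uniform(-jitter, jitter) for _ in range(n_points)]


# Five points belonging to the category drawn at position 1
ys = jittered_positions(category_index=1, n_points=5, jitter=0.1)
print(ys)
```

With jitter set to 0 all points in a group fall on one line, which hides ties; a very large value makes neighbouring groups run into each other.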

R Implementation

For comparison, here is an R implementation of the raincloud plot using ggplot2 and ggdist.

library(ggplot2)
library(ggdist)
df = read.table("penguins.csv",sep=",",header=TRUE)
pdf("raincloud.pdf",width=14, height=7)
ggplot(data=df,aes(y=bill_length_mm,x=factor(island),fill=factor(island)))+
  ggdist::stat_halfeye(adjust=0.5,justification=-.2,.width=0,point_colour=NA) + 
  geom_boxplot(width=0.2,outlier.color=NA) +
  ggdist::stat_dots(side="left",justification=1.1) 
dev.off()

What to tune first in real usage

If you want to use the figure in a paper, slide deck, or article, adjust these first:

  1. jitter to control how much the points spread
  2. width in the boxplot so the middle layer does not hide too much of the violin
  3. Violin smoothing parameters (bw in half_violinplot, adjust in the R version) so the cloud does not become too wide or too sharp
  4. Color choices so the figure stays readable and publication-friendly
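To experiment with how these knobs interact without depending on ptitprince, here is a minimal sketch that draws the three layers with numpy and matplotlib only. It is not how ptitprince or seaborn build the figure: the raincloud function, its parameters (bw_factor, box_width, jitter), the fixed layout offsets, and the synthetic data are all assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle


def raincloud(ax, groups, bw_factor=1.0, box_width=0.15, jitter=0.05, seed=0):
    """Draw a horizontal raincloud for each (label -> values) entry in groups."""
    rng = np.random.default_rng(seed)
    for pos, values in enumerate(groups.values()):
        values = np.asarray(values, dtype=float)
        values = values[~np.isnan(values)]  # ignore missing observations
        n = len(values)
        color = f"C{pos}"  # one colour per group from the default cycle

        # Cloud: Gaussian KDE with Scott's bandwidth scaled by bw_factor
        h = bw_factor * values.std(ddof=1) * n ** (-1 / 5)
        grid = np.linspace(values.min() - 2 * h, values.max() + 2 * h, 200)
        z = (grid[:, None] - values[None, :]) / h
        density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
        cloud = 0.4 * density / density.max()  # tallest point is 0.4 axis units
        ax.fill_between(grid, pos, pos + cloud, color=color, alpha=0.6)

        # Umbrella: unfilled Q1..Q3 box, median line, whiskers within 1.5 * IQR
        q1, med, q3 = np.percentile(values, [25, 50, 75])
        iqr = q3 - q1
        lo = values[values >= q1 - 1.5 * iqr].min()
        hi = values[values <= q3 + 1.5 * iqr].max()
        half = box_width / 2
        ax.add_patch(Rectangle((q1, pos - half), iqr, box_width, fill=False, zorder=3))
        ax.plot([med, med], [pos - half, pos + half], color="black", zorder=3)
        ax.plot([lo, q1], [pos, pos], color="black", zorder=3)
        ax.plot([q3, hi], [pos, pos], color="black", zorder=3)

        # Rain: raw points below the box, spread by a uniform offset of +/- jitter
        rain_y = pos - 0.25 + rng.uniform(-jitter, jitter, size=n)
        ax.scatter(values, rain_y, s=8, color=color, alpha=0.6, zorder=1)

    ax.set_yticks(range(len(groups)))
    ax.set_yticklabels(list(groups))
    return ax


# Synthetic data for illustration only (not the penguin measurements)
data_rng = np.random.default_rng(1)
groups = {"A": data_rng.normal(40, 3, 60), "B": data_rng.normal(47, 4, 80)}

fig, ax = plt.subplots(figsize=(7, 4))
raincloud(ax, groups, bw_factor=0.8, box_width=0.12, jitter=0.04)

# Save to memory here; pass a file name instead to write the image to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=200, bbox_inches="tight")
```

Changing one argument at a time (for example bw_factor from 0.5 to 1.5) and comparing the results is usually the quickest way to find settings that suit a given sample size.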

How to check whether the raincloud plot is readable

After plotting, verify at least these points:

  1. The layered components are still distinguishable
  2. Dense point regions do not become unreadable
  3. Category labels and axes are fully visible
  4. The exported image remains clear when scaled down

For formal output, save a higher-resolution version as well:

plt.savefig("raincloud.png", dpi=300, bbox_inches="tight")
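A quick way to judge whether an export is large enough is to work out its pixel size and the resolution it will have at the final printed width. The sketch below is plain arithmetic; pixel_size and effective_dpi are hypothetical helpers, 6.4 x 4.8 inches is matplotlib's default figure size, and bbox_inches="tight" will change the real dimensions slightly because it crops to the drawn content.

```python
def pixel_size(width_in, height_in, dpi):
    """Approximate pixel dimensions of a figure saved at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixel_width, printed_width_in):
    """Resolution the image will have once scaled to the printed width."""
    return pixel_width / printed_width_in


px = pixel_size(6.4, 4.8, dpi=300)
print("pixels:", px)
# Resolution when the same image is placed in a 3.2 inch wide column
print("dpi at 3.2 in:", effective_dpi(px[0], 3.2))
```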

Common issues

1. The layered chart looks too busy

Most common cause: too many visible points, strong colors, or a boxplot that is too wide.

Fix:

  1. Reduce jitter or point size
  2. Narrow the boxplot width
  3. Simplify the palette

2. The violin layer looks unnatural

Most common cause: smoothing is not appropriate for the sample size.

Fix:

  • Adjust the smoothing parameters
  • Be cautious with violin layers for very small groups
  • If needed, keep only the boxplot and point layers
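Before drawing the violin layer it can help to count the non-missing observations per group. This is a minimal sketch with pandas; the threshold of 20 is an arbitrary assumption rather than an established rule, and the DataFrame is synthetic so the example runs without the CSV file.

```python
import pandas as pd

MIN_POINTS_FOR_VIOLIN = 20  # hypothetical threshold, adjust to your own judgement

# Synthetic stand-in for the penguin data: one large, one medium, one tiny group
demo = pd.DataFrame({
    "island": ["Biscoe"] * 30 + ["Dream"] * 25 + ["Torgersen"] * 5,
    "bill_length_mm": [45.0] * 30 + [44.0] * 25 + [39.1, 39.5, None, 40.3, 36.7],
})

# count() ignores missing values, so this is the number of plotted points
counts = demo.groupby("island")["bill_length_mm"].count()
small_groups = counts[counts < MIN_POINTS_FOR_VIOLIN].index.tolist()
print(counts)
print("Groups where a violin may mislead:", small_groups)
```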

3. The saved image is not sharp enough

Fix:

  • Increase dpi
  • Keep bbox_inches="tight"
  • Check the scaled-down version before final use

Summary

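This article showed how to build a raincloud plot in Python by layering a half violin plot, a boxplot, and a strip plot, along with an equivalent R version. Compared with a boxplot or violin plot alone, the combined figure shows the distribution shape, the summary statistics, and the individual observations at the same time.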
