Plotting a column-wise bee-swarm plot in Python
Bee-swarm plots are a useful way to visualize distributions, especially across multiple categories or columns of data. They draw every individual data point while nudging points apart so that none overlap, which gives a more granular view than a box plot or histogram. In this article, we’ll explore how to create a column-wise bee-swarm plot in Python.
What is a Bee-Swarm Plot?
A bee-swarm plot is a type of scatter plot in which each data point is positioned by its value along one axis and shifted sideways along the other (categorical) axis just enough to avoid overlapping its neighbors. This results in a "swarm" of points whose shape reveals how the data is distributed within each category.
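To make the "adjusted to avoid overlapping" idea concrete, here is a minimal sketch of a greedy layout in plain NumPy. It is not the algorithm Seaborn uses internally; the function name simple_swarm_offsets, the diameter parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def simple_swarm_offsets(values, diameter):
    """Return a sideways offset per value so that circles of the given
    diameter, centered at (offset, value), never overlap.

    Naive greedy layout: points are handled in order of value, and each
    one takes the free position closest to the central axis.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    offsets = np.zeros(len(values))
    placed = []  # (value, offset) pairs that already have a position
    step = diameter / 4  # granularity of the sideways search

    for idx in order:
        v = values[idx]
        k = 0
        while True:
            # Try the axis first, then +k*step and -k*step, moving outward.
            candidates = [0.0] if k == 0 else [k * step, -k * step]
            free = [
                off for off in candidates
                if all((v - pv) ** 2 + (off - po) ** 2 >= diameter ** 2
                       for pv, po in placed)
            ]
            if free:
                offsets[idx] = free[0]
                placed.append((v, free[0]))
                break
            k += 1
    return offsets


# Three identical values, one close neighbor, and one isolated value.
values = [1.0, 1.0, 1.0, 1.05, 2.0]
offsets = simple_swarm_offsets(values, diameter=0.2)
print(offsets)
```

Isolated points stay on the axis, while crowded values fan out to both sides, which is why the width of a swarm reflects the local density of the data.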
Why Use Bee-Swarm Plots?
Bee-swarm plots are particularly useful for:
- Visualizing data density: The width of the swarm at a given value shows how many data points fall near that value.
- Spotting outliers: The spread of points makes it easy to identify any anomalies in your data.
- Comparing categories: Bee-swarm plots are great when comparing distributions across different groups or categories.
Creating Column-Wise Bee-Swarm Plot
Before we dive into creating a column-wise bee-swarm plot, we need to set up the environment and install the required libraries. For bee-swarm plots, we will use Seaborn, a powerful library built on top of Matplotlib, which simplifies statistical data visualization.
Make sure you have Python installed on your system. Install the required libraries with pip (run in a terminal), then import them at the top of your script:
pip install seaborn matplotlib pandas
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
To create a bee-swarm plot, we need a dataset. Seaborn can load several example datasets through its load_dataset function; they are downloaded from Seaborn's online data repository on first use, so an internet connection is required. For this article, we will use the Iris dataset. This dataset contains measurements for three species of Iris flowers: sepal length, sepal width, petal length, and petal width.
# Load the Iris dataset
df = sns.load_dataset('iris')
print(df.head())
Seaborn provides the swarmplot function to create bee-swarm plots. A single call draws one numeric column against one categorical column. To cover several columns, we will draw one swarm plot per column and arrange them in a 2x2 grid: sepal_length, sepal_width, petal_length, and petal_width, each split by species.
# Create a figure with subplots for each column
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Create bee-swarm plots for each feature
sns.swarmplot(ax=axes[0, 0], x='species', y='sepal_length', data=df)
axes[0, 0].set_title('Sepal Length by Species')
sns.swarmplot(ax=axes[0, 1], x='species', y='sepal_width', data=df)
axes[0, 1].set_title('Sepal Width by Species')
sns.swarmplot(ax=axes[1, 0], x='species', y='petal_length', data=df)
axes[1, 0].set_title('Petal Length by Species')
sns.swarmplot(ax=axes[1, 1], x='species', y='petal_width', data=df)
axes[1, 1].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
In the plot, we’ve created a 2x2 grid of subplots, each containing a bee-swarm plot for one of the columns (sepal_length, sepal_width, petal_length, and petal_width). This allows us to easily compare the distributions across multiple columns and species.
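As an alternative to a grid of subplots, the columns can be placed side by side on a single axis by first reshaping the data to long format with pandas melt. The sketch below assumes a small made-up DataFrame so that it runs without downloading anything; the helper name plot_columns_as_swarm and the column names "measurement" and "value" are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_columns_as_swarm(df, value_columns, category_column):
    """Draw one swarm per column on a shared axis, colored by category."""
    # Long format: one row per (category, column name, value) triple.
    long_df = df.melt(
        id_vars=category_column,
        value_vars=value_columns,
        var_name="measurement",
        value_name="value",
    )
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.swarmplot(
        data=long_df,
        x="measurement",
        y="value",
        hue=category_column,
        dodge=True,  # separate the categories within each column
        size=4,
        ax=ax,
    )
    ax.set_title("Measurement columns side by side")
    return fig, ax, long_df


# Toy data for illustration only (not real measurements).
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal_length": [5.1, 4.9, 5.0, 6.3, 6.5, 6.1],
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9],
})
fig, ax, long_df = plot_columns_as_swarm(
    toy, ["sepal_length", "petal_length"], "species"
)
```

With the Iris DataFrame from this article, the same helper could be called with the four measurement columns and "species" as the category; call plt.show() afterwards to display the figure. Because all columns share one y-axis here, this layout works best when the columns have comparable units and ranges.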
Customizing the Bee-Swarm Plot
Seaborn and Matplotlib offer many customization options. Let’s look at some of the most commonly used customizations.
1. Adding Color Palette
You can customize the colors of each bee-swarm plot using Seaborn's color palettes. A palette is applied through the hue variable, so we also assign species to hue; Seaborn 0.13 and later emit a deprecation warning when palette is passed without hue. Setting legend=False suppresses the legend, which would only repeat the x-axis labels.
# Customizing with Color Palette
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Sepal Length
sns.swarmplot(ax=axes[0, 0], x='species', y='sepal_length', data=df, hue='species', palette='Set1', legend=False)
axes[0, 0].set_title('Sepal Length by Species')
# Sepal Width
sns.swarmplot(ax=axes[0, 1], x='species', y='sepal_width', data=df, hue='species', palette='Set2', legend=False)
axes[0, 1].set_title('Sepal Width by Species')
# Petal Length
sns.swarmplot(ax=axes[1, 0], x='species', y='petal_length', data=df, hue='species', palette='Set3', legend=False)
axes[1, 0].set_title('Petal Length by Species')
# Petal Width
sns.swarmplot(ax=axes[1, 1], x='species', y='petal_width', data=df, hue='species', palette='Dark2', legend=False)
axes[1, 1].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
2. Adjusting Point Size
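You can change the size of the data points with the size parameter (in points, default 5) to make the plot more readable. If the markers are too large to fit without overlapping, Seaborn warns that some points cannot be placed; in that case, reduce size or enlarge the figure.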
# Customizing Point Size
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Sepal Length
sns.swarmplot(ax=axes[0, 0], x='species', y='sepal_length', data=df, size=10)
axes[0, 0].set_title('Sepal Length by Species')
# Sepal Width
sns.swarmplot(ax=axes[0, 1], x='species', y='sepal_width', data=df, size=8)
axes[0, 1].set_title('Sepal Width by Species')
# Petal Length
sns.swarmplot(ax=axes[1, 0], x='species', y='petal_length', data=df, size=6)
axes[1, 0].set_title('Petal Length by Species')
# Petal Width
sns.swarmplot(ax=axes[1, 1], x='species', y='petal_width', data=df, size=4)
axes[1, 1].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
3. Setting Marker Styles
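You can change the shape of the points with the marker parameter, which accepts any Matplotlib marker code, such as 'o' (circle), 's' (square), 'D' (diamond), or '^' (triangle).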
# Customizing Marker Style
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Sepal Length
sns.swarmplot(ax=axes[0, 0], x='species', y='sepal_length', data=df, marker='o')
axes[0, 0].set_title('Sepal Length by Species')
# Sepal Width
sns.swarmplot(ax=axes[0, 1], x='species', y='sepal_width', data=df, marker='s')
axes[0, 1].set_title('Sepal Width by Species')
# Petal Length
sns.swarmplot(ax=axes[1, 0], x='species', y='petal_length', data=df, marker='D')
axes[1, 0].set_title('Petal Length by Species')
# Petal Width
sns.swarmplot(ax=axes[1, 1], x='species', y='petal_width', data=df, marker='^')
axes[1, 1].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
4. Adjusting Alpha (Transparency)
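You can adjust the transparency of the points with the alpha parameter (0 is fully transparent, 1 is fully opaque). The swarm layout already prevents overlap, so transparency mainly helps when points are tightly packed or when the swarm is drawn on top of another plot.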
# Customizing Transparency with Alpha
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Sepal Length
sns.swarmplot(ax=axes[0, 0], x='species', y='sepal_length', data=df, alpha=0.9)
axes[0, 0].set_title('Sepal Length by Species')
# Sepal Width
sns.swarmplot(ax=axes[0, 1], x='species', y='sepal_width', data=df, alpha=0.7)
axes[0, 1].set_title('Sepal Width by Species')
# Petal Length
sns.swarmplot(ax=axes[1, 0], x='species', y='petal_length', data=df, alpha=0.6)
axes[1, 0].set_title('Petal Length by Species')
# Petal Width
sns.swarmplot(ax=axes[1, 1], x='species', y='petal_width', data=df, alpha=0.8)
axes[1, 1].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
Best Practices for Bee-Swarm Plots
When using bee-swarm plots in your analysis, keep the following best practices in mind:
- Use for small to medium datasets: Bee-swarm plots are ideal for datasets with a manageable number of points. Large datasets may require different approaches like density plots.
- Color wisely: Coloring by category adds another layer of insight, but too many colors can overwhelm the reader.
- Overlay with other plots: Combining bee-swarm plots with box plots or violin plots can give a fuller picture of the data distribution.
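The overlay suggestion above can be sketched as follows: draw a box plot first, then a swarm plot on the same axes. The helper name swarm_over_box and the randomly generated data are assumptions made for illustration; with the Iris DataFrame you would pass x='species' and one of the measurement columns as y.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def swarm_over_box(df, x, y, ax=None):
    """Draw a box plot with the individual points swarmed on top."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 5))
    # The box plot gives the summary (median and quartiles). Outlier
    # markers are hidden because the swarm already shows every point.
    sns.boxplot(data=df, x=x, y=y, color="white", showfliers=False,
                width=0.5, ax=ax)
    # The swarm adds the raw observations in a neutral dark gray.
    sns.swarmplot(data=df, x=x, y=y, color="0.25", size=4, ax=ax)
    return ax


# Synthetic data for illustration only: three groups of 20 values.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    "value": np.concatenate([rng.normal(loc, 1.0, 20) for loc in (0, 1, 3)]),
})
ax = swarm_over_box(toy, x="group", y="value")
```

Drawing the box plot first keeps the points visible on top of the boxes; call plt.show() afterwards to display the figure.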
Conclusion
Bee-swarm plots are a versatile and informative way to visualize distributions across categories in a dataset. In this article, we demonstrated how to create column-wise bee-swarm plots in Python using Seaborn by drawing one swarm plot per column in a grid of subplots. We also covered several customization techniques: color palettes, point sizes, marker styles, and transparency. For small to medium datasets, bee-swarm plots provide a visually compelling way to understand your data.