Visualizing Multiple Datasets on the Same Scatter Plot

Last Updated : 13 Jul, 2024

Seaborn is a powerful Python visualization library built on top of Matplotlib, designed for making statistical graphics easier and more attractive. One common requirement in data visualization is to compare two datasets on the same scatter plot to identify patterns, correlations, or differences. This article will guide you through the process of plotting two datasets on the same scatter plot using Seaborn.

Concatenating Data Sets with Scatter Plot

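The first step in plotting two data sets on the same scatter plot is to concatenate them into a single DataFrame using the pd.concat function from Pandas. To preserve which row came from which dataset, pass keys: the labels are stored as an outer index level, which can then be moved into an ordinary column that Seaborn can use. Example:

import pandas as pd

# set1 and set2 are assumed to be existing DataFrames with the same columns
# Concatenate data sets; keys= records each row's origin in an outer index level
combined_data = pd.concat([set1, set2], keys=['set1', 'set2'], names=['dataset', 'row'])

# Move the origin label out of the index into a regular 'dataset' column
combined_data = combined_data.reset_index(level='dataset')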
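To see where the origin label ends up, here is a small runnable sketch. The two frames below are hypothetical stand-ins for set1 and set2 (their values are not from the article), and the level names 'dataset' and 'row' are illustrative choices.

```python
import pandas as pd

# Hypothetical stand-ins for set1 and set2 (values chosen for illustration)
set1 = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 6]})
set2 = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 4, 9]})

# keys= stores the origin label as the outer level of a MultiIndex
stacked = pd.concat([set1, set2], keys=['set1', 'set2'], names=['dataset', 'row'])

# Move the 'dataset' level into a regular column, then renumber the rows
tidy = stacked.reset_index(level='dataset').reset_index(drop=True)

print(tidy)
```

The resulting 'dataset' column can be passed to Seaborn's hue parameter in the same way as a hand-written label column.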

Understanding the Importance of Overlaying Scatterplots

Overlaying scatterplots is valuable for several reasons:

Comparison: Visualizing two datasets simultaneously allows for a direct comparison of their distributions, identifying potential clusters, outliers, or trends that might be unique to each dataset.
Correlation Analysis: By examining how the points from each dataset are distributed relative to each other, you can gain insights into potential correlations or lack thereof.
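Interactive Exploration: Seaborn itself draws static Matplotlib figures, but when the same data is rendered with an interactive backend or plotting library, you can hover over individual points to reveal details from either dataset, facilitating a deeper understanding of the underlying data.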
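As a numeric companion to the visual comparison, the sketch below computes a Pearson correlation between x and y for each dataset separately. It reuses the sample values that appear later in this article; the variable names combined and correlations are illustrative.

```python
import pandas as pd

# Sample data matching the values used later in the article
data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    'label': 'Dataset 1'
})
data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100],
    'label': 'Dataset 2'
})
combined = pd.concat([data1, data2], ignore_index=True)

# Pearson correlation of x and y, computed within each dataset
correlations = {
    label: group['x'].corr(group['y'])
    for label, group in combined.groupby('label')
}

for label, r in correlations.items():
    print(f'{label}: r = {r:.3f}')
```

A single coefficient summarizes only the linear part of each relationship, so it complements the overlaid plot rather than replacing it.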

Step-by-Step Guide for Plotting Dual Datasets

Setting Up the Environment: Importing Libraries and Creating Sample Data

First, let's import the necessary libraries and create some sample data:

Python

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Creating sample data
data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    'label': 'Dataset 1'
})

data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100],
    'label': 'Dataset 2'
})

Step 1: Combining the Datasets Using pd.concat()

To plot both datasets on the same scatter plot, you can combine them into a single DataFrame. This makes it easier to handle them with Seaborn, which works well with tidy data:

Python

# ignore_index=True gives the combined rows a fresh 0..n-1 index
combined_data = pd.concat([data1, data2], ignore_index=True)
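Before plotting, it can help to confirm that the combined frame has the expected shape and that every row kept its label. The following sketch rebuilds the sample data so it runs on its own; the name combined is illustrative, and ignore_index=True is used here so the index stays unique.

```python
import pandas as pd

data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    'label': 'Dataset 1'
})
data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100],
    'label': 'Dataset 2'
})

# Stack the two frames row-wise and renumber the index
combined = pd.concat([data1, data2], ignore_index=True)

print(combined.shape)                    # rows and columns after stacking
print(combined['label'].value_counts())  # number of rows per dataset
print(combined.index.is_unique)          # whether index labels repeat
```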

Step 2: Plotting the Scatter Plot

With the combined data in place, Seaborn's scatterplot function draws the plot. Passing the label column to the hue parameter colors the points by dataset, which differentiates the two:

Python

# Creating the scatter plot
sns.scatterplot(data=combined_data, x='x', y='y', hue='label')

# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot of Two Datasets')
plt.legend(title='Dataset')

# Display the plot
plt.show()

Output:

Figure: Plotting the Scatter Plot
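Concatenation is not the only route. As an alternative sketch (not the method used in this article), the two frames can stay separate and be drawn onto one shared Matplotlib Axes with two scatterplot calls. The colors and the ax variable are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
})
data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
})

# One Axes shared by both calls, so the points overlay each other
fig, ax = plt.subplots()
sns.scatterplot(data=data1, x='x', y='y', color='blue', label='Dataset 1', ax=ax)
sns.scatterplot(data=data2, x='x', y='y', color='red', label='Dataset 2', ax=ax)

ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Scatter Plot of Two Datasets')
ax.legend(title='Dataset')
```

Call plt.show() or fig.savefig(...) afterwards to view the result. This approach suits datasets whose columns differ, while the combined-DataFrame approach keeps color and legend handling in a single call.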

Customizing the Plot For Two Data Sets

Seaborn offers various customization options to enhance the visualization. You can customize markers and colors, and add elements such as regression lines or error bars.

1. Customizing Markers and Colors

Python

# Customizing markers and colors
sns.scatterplot(
    data=combined_data, x='x', y='y',
    hue='label', style='label',
    palette=['blue', 'red'], markers=['o', 's']
)

# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Customized Scatter Plot of Two Datasets')
plt.legend(title='Dataset')

# Display the plot
plt.show()

Output:

Figure: Customizing Markers and Colors
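When colors and markers are given as lists, they are assigned by the order in which the labels are encountered. A sketch of a more explicit variant passes dictionaries keyed by label instead; the names palette, markers, and combined are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    'label': 'Dataset 1'
})
data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100],
    'label': 'Dataset 2'
})
combined = pd.concat([data1, data2], ignore_index=True)

# Explicit label-to-style mappings, independent of row order
palette = {'Dataset 1': 'blue', 'Dataset 2': 'red'}
markers = {'Dataset 1': 'o', 'Dataset 2': 's'}

fig, ax = plt.subplots()
sns.scatterplot(
    data=combined, x='x', y='y',
    hue='label', style='label',
    palette=palette, markers=markers, ax=ax
)
ax.set_title('Customized Scatter Plot of Two Datasets')
```

With dictionaries, each dataset keeps its color and marker even if the rows are later reordered or filtered.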

2. Adding a Regression Line

You can use sns.lmplot to draw the scatter plot together with regression lines. It is a figure-level function, so it creates its own figure rather than drawing on an existing one, and it fits and visualizes a linear model for each dataset:

Python

# Adding regression lines
sns.lmplot(data=combined_data, x='x', y='y', hue='label', markers=['o', 's'], palette=['blue', 'red'], height=6, aspect=1.5)

# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot with Regression Lines')

# Display the plot
plt.show()

Output:

Figure: Adding a Regression Line
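Because sns.lmplot returns a FacetGrid object, labels and titles can also be set through that object instead of through plt. The sketch below shows this variant; the variable name g is a common convention rather than something taken from this article.

```python
import pandas as pd
import seaborn as sns

data1 = pd.DataFrame({
    'x': range(10),
    'y': [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
    'label': 'Dataset 1'
})
data2 = pd.DataFrame({
    'x': range(10),
    'y': [1, 4, 9, 16, 25, 36, 49, 64, 81, 100],
    'label': 'Dataset 2'
})
combined = pd.concat([data1, data2], ignore_index=True)

# lmplot builds its own figure and returns the FacetGrid that manages it
g = sns.lmplot(
    data=combined, x='x', y='y', hue='label',
    markers=['o', 's'], palette=['blue', 'red'],
    height=6, aspect=1.5
)

# Label through the grid rather than through pyplot's "current axes"
g.set_axis_labels('X-axis', 'Y-axis')
g.ax.set_title('Scatter Plot with Regression Lines')
```

Since the second sample dataset grows quadratically, a straight line is only a rough fit for it; lmplot also accepts an order argument (for example order=2) to fit a polynomial instead.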

Handling Larger Multiple Datasets

When dealing with larger datasets, scatter plots can become cluttered and difficult to interpret. Seaborn provides several options for handling larger datasets, such as using transparency and aggregating data points.

Python

# Sample larger dataset
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Generate random data
np.random.seed(42)
large_data1 = pd.DataFrame({
    'x': np.random.rand(100),
    'y': np.random.rand(100),
    'dataset': ['set1'] * 100
})

large_data2 = pd.DataFrame({
    'x': np.random.rand(100),
    'y': np.random.rand(100),
    'dataset': ['set2'] * 100
})

# Combine the larger datasets
large_combined_data = pd.concat([large_data1, large_data2], ignore_index=True)

# Create the scatter plot with transparency
sns.scatterplot(data=large_combined_data, x='x', y='y', hue='dataset', alpha=0.5)
plt.show()

Output:

Figure: Handling Larger Multiple Datasets
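Transparency helps up to a point; for much larger datasets, aggregating points into bins can be clearer. The sketch below illustrates one such approach using Matplotlib's hexbin, with one panel per dataset. This is a swapped-in technique rather than a Seaborn function, and the sample size, gridsize, and colormaps are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical larger datasets (size chosen for illustration)
rng = np.random.default_rng(42)
n = 5000
frames = {
    name: pd.DataFrame({'x': rng.random(n), 'y': rng.random(n)})
    for name in ('set1', 'set2')
}

# One panel per dataset; each hexagon's color encodes how many points fall in it
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
hexbins = {}
for ax, (name, frame), cmap in zip(axes, frames.items(), ('Blues', 'Reds')):
    hb = ax.hexbin(frame['x'], frame['y'], gridsize=20, cmap=cmap)
    fig.colorbar(hb, ax=ax, label='points per bin')
    ax.set_title(name)
    ax.set_xlabel('x')
    hexbins[name] = hb
axes[0].set_ylabel('y')
```

Call plt.show() to display the figure. Side-by-side panels with shared axes avoid the problem of one dataset's bins hiding the other's, at the cost of a less direct overlay.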

Conclusion

Plotting two datasets on the same scatter plot using Seaborn is a straightforward process: combine the datasets into a single DataFrame, then use Seaborn's plotting functions to distinguish them. By customizing markers and colors and adding elements like regression lines, you can create informative and attractive visualizations that clearly convey the relationships between your datasets.

jyotijb23
