Box plot in R using ggplot2
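A box plot is a graphical display of a data set that summarizes its distribution and highlights potential outliers. The box spans the first to the third quartile, the line inside it marks the median, and the whiskers extend to the most extreme values within 1.5 times the interquartile range (IQR); points beyond the whiskers are drawn individually as outliers. This makes it easy to see the spread and skewness of the data.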
In ggplot2, box plots are created with the geom_boxplot() function. Its arguments control both the appearance of the plot and the information it displays.
Syntax (abridged; defaults as in ggplot2 3.x):
geom_boxplot(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge2", ..., outlier.colour = NULL, outlier.color = NULL, outlier.fill = NULL, outlier.shape = 19, outlier.size = 1.5, notch = FALSE, notchwidth = 0.5, varwidth = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
These arguments let you control how outliers are drawn (outlier.colour, outlier.shape, outlier.size), add notches around the median (notch), and decide whether the layer appears in the legend (show.legend).
Example: Creating a Basic Box Plot
To create a basic box plot, load ggplot2 and read in the dataset, then map the variables in ggplot() (here the crop label on the x axis and temperature on the y axis) and add geom_boxplot().
The dataset in use: Crop_recommendation
library(ggplot2)
# load the dataset into ds (adjust the path to where the CSV is saved)
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# one box of temperature values per crop label
crop <- ggplot(data = ds, mapping = aes(x = label, y = temperature)) +
  geom_boxplot()
crop

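To see the numbers behind each box, the minimal sketch below builds a similar plot and reads back the statistics ggplot2 computed, using ggplot_build(). It assumes R's built-in iris data as a stand-in for the crop CSV so it runs without the file, and recomputes the quartiles and whisker ends of the first box by hand with the 1.5 times IQR rule.

```r
library(ggplot2)

# iris stands in for the crop data so the example runs without the CSV
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# ggplot_build() returns the per-box statistics computed by stat_boxplot()
box_stats <- ggplot_build(p)$data[[1]]
print(box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")])

# Recompute the first box (setosa) by hand
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
q <- quantile(setosa, c(0.25, 0.5, 0.75))  # first quartile, median, third quartile
iqr <- q[[3]] - q[[1]]                      # interquartile range (box height)

# Whiskers end at the most extreme observations within 1.5 * IQR of the box
whisker_low  <- min(setosa[setosa >= q[[1]] - 1.5 * iqr])
whisker_high <- max(setosa[setosa <= q[[3]] + 1.5 * iqr])
cat("median:", q[[2]], " whiskers:", whisker_low, "to", whisker_high, "\n")
```

The same comparison works for any group; only the subset passed to quantile() changes.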
Adding mean value to the boxplot
To add the mean to the box plot, use the stat_summary() function. It computes a summary statistic, such as the mean, for each x group and draws it with the geom you choose; in the example below, each group mean is drawn as a white star (shape = 8).
Syntax:
stat_summary(fun, geom)
library(ggplot2)
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# add the mean of each group to the boxplot as a white star
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 8,
               size = 2, color = "white")

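Because stat_summary() computes its statistic per x group, the points it draws are simply the group means. The sketch below, again assuming the built-in iris data as a stand-in, pulls the y values of the summary layer out of ggplot_build() and compares them with means computed directly by tapply(); a visible gap between a mean point and the median line hints at skew within that group.

```r
library(ggplot2)

# iris stands in for the crop data
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 8,
               size = 2, color = "white")

# Layer 2 is the stat_summary() layer; its y column holds the plotted means
drawn_means <- ggplot_build(p)$data[[2]]$y

# The same means computed directly, one per species
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(group_means)
```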
Change legend position of Boxplot
The legend position is set with the legend.position argument of the theme() function. For instance, "top" or "bottom" moves the legend above or below the plot, and "none" removes it altogether.
library(ggplot2)
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# move the legend to the top of the plot
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot() +
  theme(legend.position = "top")

Explanation: This places the legend at the top of the plot. Beyond the legend, theme() also customizes plot titles, axes, and the background.
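legend.position accepts "top", "bottom", "left", "right" (the default), or "none". The short sketch below, assuming the built-in iris data as a stand-in, builds one base plot and derives two variants from it; hiding the legend is often reasonable for these plots because the x axis already labels every box.

```r
library(ggplot2)

# iris stands in for the crop data
base <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot()

# Same plot, legend moved below the panel
p_bottom <- base + theme(legend.position = "bottom")

# Same plot, legend removed; the x axis already names each group
p_none <- base + theme(legend.position = "none")
print(p_none)
```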
Creating a Horizontal Box Plot
Box plots can also be drawn horizontally with the coord_flip() function, which swaps the x and y axes.
library(ggplot2)
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# create a horizontal boxplot by flipping the axes
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot() +
  coord_flip()

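In ggplot2 3.3.0 and later, geom_boxplot() also infers a horizontal orientation when the discrete variable is mapped to y, so swapping the aesthetics gives the same layout without coord_flip(). A minimal sketch, assuming the built-in iris data as a stand-in; the flipped_aes column reported by ggplot_build() shows which orientation ggplot2 chose.

```r
library(ggplot2)

# Discrete variable on y, continuous on x: boxes are drawn horizontally
p <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  geom_boxplot()
print(p)

# stat_boxplot() records the detected orientation in flipped_aes
box_stats <- ggplot_build(p)$data[[1]]
print(unique(box_stats$flipped_aes))
```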
Changing Box Plot Line Colors
1. Default Line Colors by Groups: You can change the outline color of the boxes according to a grouping variable by mapping the color aesthetic to that variable; ggplot2 then assigns a default color to each group.
# Change box plot line colors by groups
crop2 <- ggplot(ds, aes(x = label, y = temperature, color = label)) +
  geom_boxplot()
crop2

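2. Custom Line Colors: For full control over the box outline colors, use the scale_color_manual() function to specify a color for each group. The values vector must contain at least as many colors as label has levels; otherwise ggplot2 stops with an "Insufficient values in manual scale" error.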
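# Set custom line colors; rep_len() repeats the five colors so every label gets one
my_colors <- rep_len(c("#999999", "#E69F00", "#56B4E9", "Red", "Green"),
                     length(unique(ds$label)))
ggplot(ds, aes(x = label, y = temperature, color = label)) +
  geom_boxplot() +
  scale_color_manual(values = my_colors)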

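A named values vector ties each color to a specific level instead of relying on level order, and makes it obvious when a level has no color. The sketch below assumes the three species of the built-in iris data as a stand-in for the crop labels; the colors are taken from the vector above.

```r
library(ggplot2)

# Names must match the levels of the variable mapped to color
species_cols <- c(setosa = "#999999", versicolor = "#E69F00", virginica = "#56B4E9")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  geom_boxplot() +
  scale_color_manual(values = species_cols)
print(p)

# Outline color actually assigned to each box, in level order
box_cols <- unname(ggplot_build(p)$data[[1]]$colour)
print(box_cols)
```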
3. Using Brewer color palettes: You can also color the outlines with a ColorBrewer palette by calling the scale_color_brewer() function and setting its palette argument. Each palette has a fixed maximum number of colors (8 for "Dark2"); groups beyond that limit are not given a distinct color and R issues a warning.
# Color the box outlines with the Dark2 Brewer palette
ggplot(ds, aes(x = label, y = temperature, color = label)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2")

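The size limit of each Brewer palette is listed in RColorBrewer::brewer.pal.info (RColorBrewer is normally installed alongside ggplot2 through the scales package). When a grouping variable has more levels than the palette offers, one workaround, sketched below, is to interpolate the palette with colorRampPalette() and pass the result to scale_color_manual(). The 12-group data frame is simulated and purely hypothetical.

```r
library(ggplot2)
library(RColorBrewer)

# Maximum number of colors available in the Dark2 palette
print(brewer.pal.info["Dark2", "maxcolors"])

# Hypothetical data: 12 groups, more than Dark2 provides
set.seed(1)
df <- data.frame(
  group = factor(rep(sprintf("g%02d", 1:12), each = 20)),
  value = rnorm(240)
)

# Interpolate between the 8 Dark2 colors to get one color per group
many_cols <- colorRampPalette(brewer.pal(8, "Dark2"))(nlevels(df$group))

p <- ggplot(df, aes(x = group, y = value, color = group)) +
  geom_boxplot() +
  scale_color_manual(values = many_cols)
print(p)
```

Interpolated colors are harder to tell apart than the original eight, so this trades distinctness for completeness.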
Fill the boxplot with color
1. Default Filling
To give every box the same fill color, set fill to a fixed value inside the geom_boxplot() function, outside of aes().
# Fill every box with the same color
ggplot(data = ds, aes(x = label, y = temperature)) +
  geom_boxplot(fill = "green")

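Whether fill sits inside or outside aes() matters: a constant outside aes() colors every box, while the same constant inside aes() is treated as data, so ggplot2 draws the boxes in its default palette color and adds a one-entry legend. A quick check, assuming the built-in iris data as a stand-in:

```r
library(ggplot2)

# Setting: fill is a fixed parameter of the layer
p_set <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(fill = "green")

# Mapping: "green" becomes a one-level variable, colored by the default scale
p_map <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = "green")) +
  geom_boxplot()

print(unique(ggplot_build(p_set)$data[[1]]$fill))
print(unique(ggplot_build(p_map)$data[[1]]$fill))
```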
2. Fill by Group
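To fill the boxes with different colors based on the label variable, map the fill aesthetic to that variable. The outlier.colour, outlier.shape, and outlier.size arguments additionally draw the outliers as solid black points of size 2.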
# Map fill to label and draw outliers as solid black points
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2)

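The outlier.* arguments style only the points that fall outside the whiskers. To see which points those are, the sketch below (assuming the built-in iris data as a stand-in, with Sepal.Width on the y axis) reads the outliers list column from ggplot_build() and recounts them with the 1.5 times IQR rule.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2)

# Number of outlier points stored for each box
n_drawn <- lengths(ggplot_build(p)$data[[1]]$outliers)

# Recount per species: points more than 1.5 * IQR beyond the quartiles
count_outliers <- function(y) {
  q <- quantile(y, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  sum(y < q[[1]] - 1.5 * iqr | y > q[[2]] + 1.5 * iqr)
}
n_manual <- tapply(iris$Sepal.Width, iris$Species, count_outliers)
print(n_manual)
```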
3. Custom Fill Colors
To specify the fill colors manually, use scale_fill_manual(). As with scale_color_manual(), values must supply at least one color per level of label.
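# Set custom fill colors; rep_len() repeats the five colors so every label gets one
my_fills <- rep_len(c("#999999", "#E69F00", "#56B4E9", "Red", "Green"),
                    length(unique(ds$label)))
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2) +
  scale_fill_manual(values = my_fills)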

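Rather than typing a fixed number of colors, the values vector can be generated from the levels themselves, so the manual scale always has exactly one entry per group. A minimal sketch, assuming the built-in iris data as a stand-in and using grDevices::hcl.colors() (available in R 3.6.0 and later); the "Set 2" palette is an arbitrary choice.

```r
library(ggplot2)

# One fill color per level, named by level
lv <- levels(iris$Species)
fill_cols <- setNames(hcl.colors(length(lv), palette = "Set 2"), lv)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2) +
  scale_fill_manual(values = fill_cols)
print(p)
```

With the crop data, lv would be levels(factor(ds$label)); because the vector is named, the order of the colors does not matter.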
4. Using Brewer Color Palettes for Filling
Similar to the outline color, you can use scale_fill_brewer() to apply a color palette to the fill.
# Fill the boxes with the Dark2 Brewer palette
ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2) +
  scale_fill_brewer(palette = "Dark2")

5. Using grayscale:
To fill the boxes with shades of grey, use scale_fill_grey(). Here it is combined with theme_classic(), which draws a white background with axis lines and no grid lines.
# Grey fills on a classic (white, grid-free) theme
crop3 <- ggplot(ds, aes(x = label, y = temperature, fill = label)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2)
crop3 + scale_fill_grey() + theme_classic()

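scale_fill_grey() picks its greys between two levels set by start and end (0.2 and 0.8 by default, where 0 is black and 1 is white). The sketch below, assuming the built-in iris data as a stand-in, uses a lighter range so the black median lines stay visible, and checks the result against grDevices::grey.colors(), which the scale calls through the scales package.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2) +
  scale_fill_grey(start = 0.5, end = 0.9) +  # lighter greys than the default
  theme_classic()
print(p)

# The three fills are the greys returned by grey.colors() for the same range
print(grey.colors(3, start = 0.5, end = 0.9))
```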
Adding Jitters in Boxplots
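Jittered points show the individual observations on top of the boxes; giving each point a small random horizontal offset reduces overplotting where values coincide. The position_jitter() function controls that offset: position_jitter(0.2) limits the horizontal jitter to 0.2 in each direction.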
# Overlay every observation, jittered horizontally by up to 0.2
ggplot(ds, aes(x = label, y = temperature)) +
  geom_boxplot() +
  geom_jitter(position = position_jitter(0.2))

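geom_boxplot() already draws outliers, so overlaying geom_jitter() shows them twice: once at their exact position and once jittered. A common fix, sketched here assuming the built-in iris data as a stand-in, is outlier.shape = NA; setting height = 0 keeps the y values exact, and seed makes the random offsets reproducible.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = NA) +   # hide the boxplot's own outlier points
  geom_jitter(position = position_jitter(width = 0.2, height = 0, seed = 1))
print(p)

# Jittered layer: x moves by at most 0.2 from the group position, y is untouched
jit <- ggplot_build(p)$data[[2]]
x_offset <- unclass(jit$x) - round(unclass(jit$x))
print(range(x_offset))
```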
Notched Box Plot
A notched box plot adds a notch around each median that marks an approximate confidence interval for it; if the notches of two boxes do not overlap, this suggests that their medians differ. To draw notches, set the notch parameter to TRUE.
# Notched boxes with the jittered observations on top
ggplot(ds, aes(x = label, y = temperature)) +
  geom_boxplot(notch = TRUE) +
  geom_jitter(position = position_jitter(0.2))

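ggplot2 draws each notch at the median plus or minus 1.58 * IQR / sqrt(n), where n is the number of non-missing observations in the group. The sketch below, assuming the built-in iris data as a stand-in, recomputes the bounds for one group and compares them with the notchlower and notchupper columns reported by ggplot_build().

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(notch = TRUE)

box_stats <- ggplot_build(p)$data[[1]]
print(box_stats[, c("x", "notchlower", "middle", "notchupper")])

# Recompute the notch for versicolor (the second box)
y <- iris$Sepal.Length[iris$Species == "versicolor"]
q <- quantile(y, c(0.25, 0.5, 0.75))
half_width <- 1.58 * (q[[3]] - q[[1]]) / sqrt(length(y))
notch <- c(lower = q[[2]] - half_width, upper = q[[2]] + half_width)
print(notch)
```

Because the notch width shrinks with sqrt(n), groups with few observations get wide notches, and ggplot2 warns when a notch extends past the box hinges.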