Skewness in R Programming
Skewness is a statistical measure of the asymmetry of a distribution or data set. It indicates whether the data values are concentrated on one side of the mean, with a longer tail on the other. It is used in several disciplines, including data analysis, social sciences, economics and finance.
- Positive Skewness (Right-Skewed): The tail on the right side of the distribution is longer or fatter. Typically, Mean > Median.
- Negative Skewness (Left-Skewed): The tail on the left side of the distribution is longer or fatter. Typically, Mean < Median.
- Zero Skewness (Symmetrical): The distribution is symmetrical, so Mean = Median.
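The link between the direction of skew and the ordering of mean and median can be checked on small vectors. The sketch below uses only base R and illustrative data; the `symmetric` vector and the helper name `mean_minus_median` are assumptions introduced here. The ordering is a typical pattern rather than a strict rule.

```r
# Illustrative vectors: one right-skewed, one left-skewed, one symmetric
right_skewed <- c(40, 41, 42, 43, 50)
left_skewed  <- c(10, 11, 21, 22, 23, 25)
symmetric    <- c(1, 2, 3, 4, 5)

# Hypothetical helper: difference between mean and median
mean_minus_median <- function(x) mean(x) - median(x)

# Expect a positive, a negative and a zero difference, in that order
diffs <- sapply(
  list(right = right_skewed, left = left_skewed, symmetric = symmetric),
  mean_minus_median
)
print(diffs)
```

A positive difference means the mean sits above the median, which is what a long right tail usually produces.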
Mathematical Definition of Skewness
\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3
Where:
- n is the number of observations,
- x_i is the i-th observation,
- \bar{x} is the sample mean,
- s is the sample standard deviation.
A skewness value is interpreted as follows:
- Skewness > 0 indicates positive skewness.
- Skewness < 0 indicates negative skewness.
- Skewness = 0 indicates no skewness (symmetrical distribution).
How to Calculate Skewness in R
R offers several ways to calculate skewness, including add-on packages such as e1071 and moments, and a manual implementation in base R.
1. Using the e1071 Package
The e1071 package provides a straightforward function, skewness(), to calculate skewness.
install.packages("e1071")
library(e1071)
data <- c(2, 3, 5, 6, 8, 9, 12, 15, 18, 21)
skewness_value <- skewness(data)
print(skewness_value)
Output:
[1] 0.3880299
2. Using the moments Package
Another popular package for calculating skewness is moments.
install.packages("moments")
library(moments)
# Both e1071 and moments export skewness(); name the package explicitly
skewness_value <- moments::skewness(data)
print(skewness_value)
Output:
[1] 0.454466
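The moments package returns the moment-based coefficient g1 = m3 / m2^(3/2), which has no small-sample correction. The value differs from the e1071 result because e1071's skewness() defaults to type = 3, which rescales g1 by ((n - 1) / n)^(3/2).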
3. Base R Implementation
Base R does not have a built-in skewness function, but you can compute the adjusted coefficient from the mathematical definition above manually:
n <- length(data)
mean_data <- mean(data)
sd_data <- sd(data)
skewness_value <- (n * sum((data - mean_data)^3)) / ((n - 1) * (n - 2) * sd_data^3)
print(skewness_value)
Output:
[1] 0.5389304
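The three results above differ because each approach uses a different estimator of skewness, not because any of them is miscalculated. As a hedged sketch in base R (the helper name `skewness_estimators` is hypothetical, introduced here for illustration), the three common variants can be computed side by side: g1, the moment-based coefficient returned by `moments::skewness()`; G1, the adjusted coefficient from the mathematical definition; and b1, which corresponds to the default `type = 3` of `e1071::skewness()`.

```r
# Hypothetical helper: returns three common sample-skewness estimators
skewness_estimators <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- sum(d^2) / n      # second central moment (divides by n)
  m3 <- sum(d^3) / n      # third central moment (divides by n)
  g1 <- m3 / m2^(3 / 2)   # moment-based coefficient
  c(
    g1 = g1,
    G1 = g1 * sqrt(n * (n - 1)) / (n - 2),  # adjusted for sample size
    b1 = g1 * ((n - 1) / n)^(3 / 2)         # equivalent to m3 / sd(x)^3
  )
}

data <- c(2, 3, 5, 6, 8, 9, 12, 15, 18, 21)
print(round(skewness_estimators(data), 4))
```

For this data the three values are approximately 0.454, 0.539 and 0.388, in line with the moments, base R and e1071 outputs respectively. The estimators converge as n grows, so the choice matters most for small samples.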
Types of Skewness in R
The sign of the skewness value determines the direction of the asymmetry seen in a plot of the data. The three cases are as follows:
Positive Skewness in R
Positive skewness refers to distributions where the tail extends towards higher values. In such cases, the mean is typically greater than the median.
# Required for skewness() function
library(moments)
x <- c(40, 41, 42, 43, 50)
# The sample skewness of this vector is positive
print(skewness(x))
hist(x)
Output:

A histogram showing positive skewness, with a tail extending towards higher values.
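Negative Skewness in R
Negative skewness refers to distributions where the tail extends towards lower values. The mean is typically less than the median in such cases.
# Required for skewness() function
library(moments)
x <- c(10, 11, 21, 22, 23, 25)
# The sample skewness of this vector is negative
print(skewness(x))
hist(x)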
Output:

A histogram showing negative skewness, with a tail extending towards lower values.
Zero Skewness in R
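Zero skewness indicates a symmetrical distribution, where the data is balanced around the mean and the mean and median are equal. A random sample drawn from a normal distribution has a sample skewness close to, but generally not exactly, zero.
library(moments)
# Fix the seed so the random sample is reproducible
set.seed(42)
# Draw 50 values from a normal distribution with mean 10 and sd 10
x <- rnorm(50, 10, 10)
print(skewness(x))
hist(x)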
Output:

A histogram showing an approximately symmetrical distribution.
Importance of Skewness
Here are some of the main reasons skewness matters:
- Normality Assumption: Many statistical methods, like t-tests and ANOVA, assume data is normally distributed. Skewness can indicate departures from normality.
- Impact on Mean and Median: In skewed distributions, the mean is pulled towards the tail, making the median a better measure of central tendency.
- Interpretation in Data Analysis: Understanding the skewness of your data can influence decisions in model selection, data transformation and interpretation of results.
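As an illustration of the data-transformation point, a log transform is one common way to reduce right skew in strictly positive data. The sketch below uses made-up data (powers of 2, chosen so that the logged values are evenly spaced) and a hypothetical base R helper, `sample_skewness`, that implements the moment-based coefficient. It is an example under these assumptions, not a recommendation for any particular data set.

```r
# Hypothetical helper: moment-based skewness, m3 / m2^(3/2)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Made-up, strictly positive, right-skewed data: 1, 2, 4, ..., 256
x <- 2^(0:8)

skew_raw <- sample_skewness(x)       # clearly positive
skew_log <- sample_skewness(log(x))  # near zero: logged values are evenly spaced

print(c(raw = skew_raw, log = skew_log))

# The mean is pulled towards the right tail, well above the median
print(c(mean = mean(x), median = median(x)))
```

Real data rarely becomes perfectly symmetrical after a transformation, so it is worth recomputing the skewness and replotting the histogram afterwards.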