Root-Mean-Square Error in R Programming
Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors. It is a useful error metric for numerical predictions. Because it is scale-dependent, it is mainly used to compare the prediction errors of different models or configurations for the same variable, not across variables measured in different units. RMSE summarizes how closely a model's predictions, such as those from a regression line, match the observed data.
Formula for RMSE
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2}
Where:
- y_i = actual value
- \hat{y_i} = predicted value
- n = number of observations
Note: The differences between the actual values and the predicted values are known as residuals.
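The formula can be traced one step at a time. The sketch below uses small made-up vectors (not data from this article) so that each intermediate value is easy to check by hand; the variable names are illustrative only.

```r
# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
actual_toy    <- c(3, 5, 2, 7)
predicted_toy <- c(2, 5, 4, 6)

residuals_toy <- actual_toy - predicted_toy   # 1  0 -2  1
squared_toy   <- residuals_toy^2              # 1  0  4  1
mse_toy       <- mean(squared_toy)            # 6 / 4 = 1.5
rmse_toy      <- sqrt(mse_toy)                # sqrt(1.5), about 1.2247

print(rmse_toy)
```

Each line corresponds to one part of the formula: the difference inside the parentheses, the square, the average over n, and the outer square root.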
Significance of RMSE
RMSE has several properties that affect how it should be used:
- Scale-Dependent: RMSE has the same units as the target variable. A lower RMSE indicates better model performance, but the value must be compared with the scale of the target variable to make sense.
- Sensitive to Outliers: Since RMSE squares the error terms, larger errors have a disproportionately large effect, making RMSE sensitive to outliers.
- Comparing Models: RMSE can be used to compare models. A model with a lower RMSE value is generally considered better at predicting the target variable.
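The sensitivity to outliers is easier to see next to a metric that does not square the errors. The sketch below compares RMSE with the mean absolute error (MAE) on made-up data; `mae_fn` and `rmse_fn` are hypothetical helper names defined here, not functions from any package.

```r
# Hypothetical data: both prediction sets have the same mean absolute error
actual_demo  <- c(10, 12, 14, 16, 18)
pred_even    <- actual_demo + c(2, -2, 2, -2, 2)   # five moderate errors of size 2
pred_outlier <- actual_demo + c(0, 0, 0, 0, 10)    # four exact predictions, one large miss

mae_fn  <- function(a, p) mean(abs(a - p))
rmse_fn <- function(a, p) sqrt(mean((a - p)^2))

mae_fn(actual_demo, pred_even)       # 2
mae_fn(actual_demo, pred_outlier)    # 2
rmse_fn(actual_demo, pred_even)      # 2
rmse_fn(actual_demo, pred_outlier)   # sqrt(20), about 4.47
```

Both sets of predictions have an MAE of 2, but the single large miss more than doubles the RMSE, which is the disproportionate effect described above.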
Computing RMSE in R
Now we will discuss different methods to compute RMSE in the R programming language.
1. Simple RMSE Calculation
Let’s first compute the RMSE between two vectors (actual and predicted values) manually.
actual <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)
rmse <- sqrt(mean((actual - predicted)^2))
rmse
Output:
[1] 0.3464102
The above code calculates the RMSE between the actual and predicted values manually by following the RMSE formula.
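If the same calculation is needed in several places, it can be wrapped in a small function. The following is a minimal sketch, not a standard API: `rmse_manual` is a hypothetical name, and the length check and `na.rm` argument are design choices assumed here, not something the article prescribes.

```r
# rmse_manual is a hypothetical helper name, not part of base R or any package
rmse_manual <- function(actual, predicted, na.rm = FALSE) {
  stopifnot(is.numeric(actual), is.numeric(predicted))
  # Guard against R's vector recycling, which would otherwise pair values incorrectly
  if (length(actual) != length(predicted)) {
    stop("`actual` and `predicted` must have the same length")
  }
  sqrt(mean((actual - predicted)^2, na.rm = na.rm))
}

actual    <- c(1.5, 1.0, 2.0, 7.4, 5.8, 6.6)
predicted <- c(1.0, 1.1, 2.5, 7.3, 6.0, 6.2)
rmse_manual(actual, predicted)   # 0.3464102, matching the manual calculation

# With a missing value, na.rm = TRUE drops that pair before averaging
rmse_manual(c(1, 2, NA), c(1, 4, 3), na.rm = TRUE)   # sqrt(2), about 1.414
```

The explicit length check matters because R would otherwise recycle the shorter vector and return a number that looks plausible but is wrong.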
2. Calculating RMSE Using the Metrics Package
The Metrics package offers a convenient rmse() function. First, install and load the package.
install.packages("Metrics")
library(Metrics)
rmse_value <- rmse(actual, predicted)
rmse_value
Output:
[1] 0.3464102
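3. Calculating RMSE Using the caret Package
The caret package is a popular package for machine learning and model evaluation. It provides a similar RMSE() function. Note that RMSE() takes the predictions as its first argument and the observed values as its second; because the errors are squared, swapping them does not change the result.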
install.packages("caret")
library(caret)
rmse_value <- RMSE(predicted, actual)
rmse_value
Output:
[1] 0.3464102
4. Calculating RMSE for Regression Models
In regression models, RMSE is used to evaluate the performance of the model. Let’s fit a linear regression model in R and compute the RMSE for the predicted values.
data(mtcars)
model <- lm(mpg ~ hp, data = mtcars)
predicted_values <- predict(model, mtcars)
actual_values <- mtcars$mpg
rmse_regression <- sqrt(mean((actual_values - predicted_values)^2))
rmse_regression
Output:
[1] 3.740297
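This example fits a linear regression model predicting the miles per gallon (mpg) of cars based on horsepower (hp) and computes the RMSE to evaluate the model's prediction accuracy. Because the predictions are made on the same data used to fit the model, this value measures in-sample fit.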
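For a model evaluated on its own training data, the same number can be obtained directly from the model's residuals. The sketch below also contrasts it with `sigma()`, the residual standard error reported by `summary()`, since the two are easy to confuse; the variable name `rmse_resid` is illustrative.

```r
data(mtcars)
model <- lm(mpg ~ hp, data = mtcars)

# The residuals of a fitted model already hold actual - predicted
rmse_resid <- sqrt(mean(residuals(model)^2))
rmse_resid        # 3.740297

# sigma() returns the residual standard error, which divides the sum of
# squared residuals by the residual degrees of freedom (n - 2 here) and
# not by n, so it is slightly larger than the RMSE
sigma(model)
sqrt(sum(residuals(model)^2) / df.residual(model))   # same value as sigma(model)
```

The two quantities converge as the number of observations grows relative to the number of coefficients, but on a 32-row dataset the difference is visible.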
Interpreting RMSE involves understanding its relationship with the data.
- Low RMSE: Indicates that the model's predictions are close to the actual values.
- High RMSE: Indicates large errors in prediction.
However, the RMSE value should always be interpreted in the context of the data. For example, an RMSE of 10 might be considered good for a dataset where the target variable ranges between 100 and 500, but it could indicate poor performance if the target variable ranges between 0 and 20.
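One way to make this comparison concrete is to divide the RMSE by the range of the target variable. This is one common normalization convention, assumed here for illustration and not taken from the article; the numbers are the hypothetical ones from the example above.

```r
# Hypothetical numbers from the example in the text: the same RMSE of 10
# judged against two different target ranges
rmse_example <- 10

nrmse_wide   <- rmse_example / (500 - 100)   # 0.025, i.e. 2.5% of the range
nrmse_narrow <- rmse_example / (20 - 0)      # 0.5, i.e. 50% of the range

c(wide = nrmse_wide, narrow = nrmse_narrow)
```

An error equal to 2.5% of the range is small, while an error equal to half the range suggests that the predictions carry little information.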
5. Visualizing RMSE
Visualizing the performance of our model can help in understanding where the model is underperforming. A scatter plot of actual vs. predicted values can provide insights into how well the model fits the data.
plot(actual_values, predicted_values, xlab = "Actual", ylab = "Predicted",
main = "Actual vs Predicted Values")
abline(0, 1, col = "red")
Output:
[Figure: scatter plot of actual vs. predicted values, with a red line where actual = predicted]
The closer the points are to the red line (where actual = predicted), the better the model's predictions.
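A complementary view, offered here as an optional sketch and not as part of the original walkthrough, is to plot the residuals against the predicted values. The block refits the same `mpg ~ hp` model so that it runs on its own; the variable names are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ hp, data = mtcars)

fitted_values <- fitted(model)      # predictions on the training data
model_resid   <- residuals(model)   # actual - predicted

# Residuals vs. predicted values: points should scatter evenly around zero
plot(fitted_values, model_resid,
     xlab = "Predicted mpg", ylab = "Residual (actual - predicted)",
     main = "Residuals vs Predicted Values")
abline(h = 0, col = "red")
```

A curved pattern, or a spread that widens or narrows across the plot, points to ranges of the target where the model's errors are systematically larger, which a single RMSE value cannot show.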