Mean Squared Error in Python
Mean Squared Error (MSE) is one of the most common metrics for evaluating the performance of regression models. It measures the average of the squared errors, that is, the average squared difference between the predicted and actual values. MSE quantifies how much error exists in a model's predictions and is particularly useful in fields like machine learning, data science, and statistics.
Mean Squared Error (MSE) Formula
The formula for the mean squared error is:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
Where:
- y_i is the actual value (true value).
- \hat{y}_i is the predicted value (from the model).
- n is the total number of data points.
Example: Given the actual and predicted values for a regression problem, calculate the MSE and RMSE.
Actual Values: [15, 25, 35, 45, 55]
Predicted Values: [18, 22, 38, 42, 52]
Solution:
Index | Actual Value (Y_i) | Predicted Value (\hat{y}_i) | Error (Y_i - \hat{y}_i) | Squared Error |
---|---|---|---|---|
1 | 15 | 18 | -3 | 9 |
2 | 25 | 22 | 3 | 9 |
3 | 35 | 38 | -3 | 9 |
4 | 45 | 42 | 3 | 9 |
5 | 55 | 52 | 3 | 9 |
Now, we sum up the squared errors: 9 + 9 + 9 + 9 + 9 = 45.
Using the formula for MSE: MSE = 45 / 5 = 9.
To calculate RMSE, take the square root of the MSE: RMSE = \sqrt{9} = 3.
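The arithmetic above can be verified with a few lines of plain Python (no libraries needed):

```python
# Worked example: actual vs. predicted values from the table above.
actual = [15, 25, 35, 45, 55]
predicted = [18, 22, 38, 42, 52]

# Squared error for each pair of values
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

# MSE is the mean of the squared errors; RMSE is its square root
mse = sum(squared_errors) / len(squared_errors)
rmse = mse ** 0.5

print(squared_errors)  # [9, 9, 9, 9, 9]
print(mse)             # 9.0
print(rmse)            # 3.0
```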
Example: Consider the given data points: (1,1), (2,1), (3,2), (4,2), (5,4).
Regression line equation: Y = 0.7X - 0.1
X | Y | Y' (Predicted) |
---|---|---|
1 | 1 | 0.6 |
2 | 1 | 1.29 |
3 | 2 | 1.99 |
4 | 2 | 2.69 |
5 | 4 | 3.4 |
1. Calculating MSE using Scikit-Learn
Scikit-learn, a popular machine learning library, provides a built-in function to calculate MSE, which simplifies the process.
from sklearn.metrics import mean_squared_error

# Given values
Y_true = [1, 1, 2, 2, 4]  # Y_true = Y (original values)

# Calculated values
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]  # Y_pred = Y'

# Calculation of Mean Squared Error (MSE)
mse = mean_squared_error(Y_true, Y_pred)
print(mse)
Output: 0.21606
Explanation: This code calculates the Mean Squared Error (MSE) using Scikit-learn's mean_squared_error function. It takes the true values (Y_true) and predicted values (Y_pred) as inputs, then computes the squared differences between them, averages them, and returns the MSE. This function simplifies the process of calculating MSE compared to manually implementing the steps.
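As in the first worked example, the RMSE follows directly from the MSE. A minimal sketch, taking the square root of Scikit-learn's result with NumPy (the rounded values shown in the comments are assumptions based on the data above):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

Y_true = [1, 1, 2, 2, 4]
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]

mse = mean_squared_error(Y_true, Y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE

print(round(mse, 5))   # 0.21606
print(round(rmse, 5))  # ≈ 0.46482
```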
2. Calculating MSE using NumPy
We can also compute the Mean Squared Error manually using NumPy.
import numpy as np

# Given values
Y_true = [1, 1, 2, 2, 4]  # Y_true = Y (original values)

# Calculated values
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]  # Y_pred = Y'

# Mean Squared Error
MSE = np.square(np.subtract(Y_true, Y_pred)).mean()
print(MSE)
Output: 0.21606
Explanation: This code calculates the Mean Squared Error (MSE) by first finding the difference between the true values (Y_true) and the predicted values (Y_pred). It then squares each of these differences to eliminate negative values and emphasize larger errors. Finally, the code computes the mean of these squared differences to obtain the MSE, which quantifies the average squared difference between the actual and predicted values.
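For reuse across experiments, the NumPy computation above can be wrapped in a small helper function (the function name `mse` is our own choice, not part of any library):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Element-wise difference, squared, then averaged
    return np.mean((y_true - y_pred) ** 2)

print(mse([1, 1, 2, 2, 4], [0.6, 1.29, 1.99, 2.69, 3.4]))  # 0.21606...
```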