In this article, I will go over the various evaluation metrics available for a regression model, along with the advantages and disadvantages of each. Please note, this article isn’t about the in-depth mathematics behind these metrics; instead, it focuses on their application.
The evaluation metrics which we will cover are:
- R Squared / Adjusted R Squared
- Mean Squared Error (MSE) / Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
Preparing our data
Before we jump into the metrics, let’s quickly import our data, do a little cleanup, and fit the data to a linear regression model. We will then evaluate that model’s performance using the metrics mentioned above.
# Importing Libraries
from sklearn.datasets import fetch_california_housing
# Importing Data
X, y = fetch_california_housing(as_frame=True, return_X_y=True)
# Splitting the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
# Pre-processing the data
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
# Creating pre-process pipeline
preprocessing = Pipeline(steps=[
('impute', SimpleImputer()),
('scale', StandardScaler()),
])
X_train_processed = preprocessing.fit_transform(X_train)
# Simple Linear Regression
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_train_processed, y_train)
# Predictions
y_pred = lin_reg.predict(X_train_processed)
The above code:
- Fetches the data
- Splits the data into training and test set
- Pre-processes the data
- Fits the data into a linear regression model
- Predicts the result using the linear regression model
Please note: this article focuses only on the evaluation metrics, not on the steps above. At the time of writing, I do have a separate article about splitting the data; a link to that article can be found here.
Evaluating the model
R Square/Adjusted R Square
- R Square measures how much of the variability in the dependent variable can be explained by the model. It is the square of the correlation coefficient between the actual and predicted values
- It’s a good measure of how well the model fits the dependent variable
- It doesn’t take the overfitting problem into consideration: adding more independent variables can only increase it
- The best possible score is 1
- Adjusted R Square penalises additional independent variables added to the model, adjusting the metric to mitigate the overfitting issue
from sklearn.metrics import r2_score
predicted_r2_score = r2_score(y_train, y_pred)
print(f'R2 score of predicted values: {predicted_r2_score}')
Output:
R2 score of predicted values: 0.6125511913966952
As we can see, about 61% of the variability in the dependent variable can be explained by the model.
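scikit-learn does not expose Adjusted R Square directly, but it can be derived from r2_score. The sketch below uses small hypothetical arrays in place of y_train and y_pred, with an assumed number of independent variables p:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values for illustration
y_true = np.array([3.0, 2.5, 4.0, 5.1, 3.3, 4.8])
y_pred = np.array([2.8, 2.7, 3.9, 5.0, 3.5, 4.6])

n = len(y_true)  # number of observations
p = 2            # number of independent variables (assumed for this example)

r2 = r2_score(y_true, y_pred)
# Adjusted R Square penalises the extra independent variables
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f'R2: {r2:.4f}, Adjusted R2: {adjusted_r2:.4f}')
```

Note that the adjusted value is always at most the plain R Square, and the gap widens as more independent variables are added.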
Mean Squared Error (MSE) / Root Mean Squared Error (RMSE)
- Mean Squared Error
- It is the sum of the squared prediction errors (actual output minus predicted output) divided by the number of data points
- It gives an absolute number indicating how much the predicted results deviate from the actual values
- It doesn’t provide much insight on its own, but it is a good metric for comparing different models
- It penalises large prediction errors more heavily
- Root Mean Squared Error
- It’s the square root of MSE
- It is more commonly used than MSE because it is on the same scale as the target variable
import numpy as np
from sklearn.metrics import mean_squared_error
predicted_mse = mean_squared_error(y_train, y_pred)
print(f'Predicted MSE: {predicted_mse}')
predicted_rmse = np.sqrt(predicted_mse)
print(f'Predicted RMSE: {predicted_rmse}')
Output:
Predicted MSE: 0.5179331255246699
Predicted RMSE: 0.7196757085831575
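To make the definition above concrete, MSE and RMSE can also be computed by hand with NumPy. This sketch uses small hypothetical arrays and checks the result against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values for illustration
y_true = np.array([3.0, 2.5, 4.0, 5.1])
y_pred = np.array([2.8, 2.7, 3.9, 5.0])

errors = y_true - y_pred    # prediction errors
mse = np.mean(errors ** 2)  # mean of the squared errors
rmse = np.sqrt(mse)         # square root of MSE

# Matches scikit-learn's implementation
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
print(f'MSE: {mse}, RMSE: {rmse}')
```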
Mean Absolute Error (MAE)
- It is similar to MSE. The only difference is that instead of summing the squares of the errors (as in MSE), it sums their absolute values
- Compared to MSE or RMSE, it is a more direct representation of the error terms
- It weights all errors equally, so it is less sensitive to outliers than MSE
from sklearn.metrics import mean_absolute_error
predicted_mae = mean_absolute_error(y_train, y_pred)
print(f'Predicted MAE: {predicted_mae}')
Output:
Predicted MAE: 0.5286283596581934
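As with MSE, the MAE formula reduces to a one-line NumPy expression. The sketch below uses the same hypothetical arrays and verifies against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted values for illustration
y_true = np.array([3.0, 2.5, 4.0, 5.1])
y_pred = np.array([2.8, 2.7, 3.9, 5.0])

mae = np.mean(np.abs(y_true - y_pred))  # mean of the absolute errors

# Matches scikit-learn's implementation
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
print(f'MAE: {mae}')
```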
Conclusion
There are other evaluation metrics as well, such as explained variance, max error, and root mean squared log error. I have only discussed the ones above since they are the most common and widely used.
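For completeness, scikit-learn exposes these additional metrics too. A quick sketch with hypothetical arrays:

```python
import numpy as np
from sklearn.metrics import (explained_variance_score, max_error,
                             mean_squared_log_error)

# Hypothetical actual and predicted values for illustration
y_true = np.array([3.0, 2.5, 4.0, 5.1])
y_pred = np.array([2.8, 2.7, 3.9, 5.0])

print(explained_variance_score(y_true, y_pred))         # 1.0 is a perfect score
print(max_error(y_true, y_pred))                        # worst-case absolute error
print(np.sqrt(mean_squared_log_error(y_true, y_pred)))  # root mean squared log error
```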