Analyzing a Linear Regression Result

Nov 30, 2020

Linear regression is one of the most common tools in statistics. As a data scientist, it is easy to get caught up in the technical improvements of a model: improving the R-squared, reducing the RMSE, and removing features with high p-values. However, it is important not to lose sight of the context of your analysis: what do the regression results actually mean? Here we will go through several steps of regression interpretation so that we can understand the results we produce and apply them to the business problem at hand!

Analysis

To begin, let’s look at a snapshot of a regression result. We will use this example throughout to explain how to interpret OLS results:

[Image: OLS regression results summary]

1. Interpreting the R-squared

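A common goal when tuning a regression model is to push the R-squared value as close to 1 as possible. In technical terms, the R-squared measures the share of the variance in the target variable that is explained by the model. So, in the case above, an R-squared of 0.699 means that our model explains 69.9% of the variance in price. Note that this is not the same as predicting with 69.9% accuracy: R-squared describes explained variance, not how often or how closely individual predictions hit the true value.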
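To make the definition concrete, here is a minimal sketch of the calculation, R-squared = 1 - SS_res / SS_tot. The prices and predictions below are made-up numbers for illustration only; they are not taken from the King County model.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the target around its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical prices (in $1,000s) and hypothetical model predictions.
observed = [300, 450, 500, 650, 800]
predicted = [320, 430, 520, 640, 790]

print(round(r_squared(observed, predicted), 3))
```

A model that always predicts the mean price scores 0 on this measure, and a model that reproduces every price exactly scores 1.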

2. Interpreting the Intercept

The intercept, also known as the “constant”, tells you the predicted value of Y when every X variable equals 0. In the case above, the Y value represents price, so the intercept answers the question: if all X values are zero, what is the price of a home? In this scenario, the intercept has no practical meaning. An X value of 0 for “sqft_living” would describe a house with essentially no living space (and, because sqft_living was log transformed in this model, a value of 0 actually corresponds to just 1 square foot). No real home has these dimensions, so the intercept has no intrinsic meaning here.

3. Interpreting Linear Coefficients

Linear coefficients are the most straightforward to interpret. We will look at the variable “view” as an example. The coefficient here is 2.425e+04 (or 24,250). Since neither view nor price was transformed, a one-unit increase in view is associated with a $24,250 increase in price, holding all other variables constant. Pretty straightforward!
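As a quick check on the scientific notation, the sketch below expands 2.425e+04 and scales it by a change in view. The function name is a hypothetical helper for illustration, and the calculation assumes every other variable in the model is held constant.

```python
VIEW_COEF = 2.425e+04  # coefficient on "view" as reported in the regression output


def price_change_from_view(delta_view, coef=VIEW_COEF):
    """Predicted change in price for a given change in view, all else fixed."""
    return coef * delta_view


print(f"${price_change_from_view(1):,.0f}")  # prints $24,250
print(f"${price_change_from_view(2):,.0f}")  # prints $48,500
```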

4. Interpreting Log Coefficients

Log coefficients are slightly more complex to analyze. For example, the variable sqft_living was log transformed while price was not. As a result, its coefficient has to be interpreted differently: a one percent increase in sqft_living changes price by approximately the value of the coefficient divided by 100. The coefficient is 7.924e+04 (more easily read as 79,240), so a 1% increase in sqft_living is associated with a price increase of about 79,240/100 = $792, holding all other variables constant.
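The "divide by 100" rule is an approximation. Assuming the transformation was a natural log (the article does not say which base was used), the exact change for a p% increase is the coefficient times ln(1 + p/100). The sketch below compares the two; the function name is a hypothetical helper.

```python
import math

SQFT_COEF = 7.924e+04  # coefficient on log-transformed sqft_living


def price_change_for_pct_increase(pct, coef=SQFT_COEF):
    """Exact price change when a natural-log-transformed X grows by pct percent."""
    return coef * math.log(1 + pct / 100)


approx_1pct = SQFT_COEF / 100                    # rule-of-thumb value
exact_1pct = price_change_for_pct_increase(1)    # coef * ln(1.01)
exact_10pct = price_change_for_pct_increase(10)  # coef * ln(1.10)

print(f"1% rule of thumb: ${approx_1pct:,.2f}")
print(f"1% exact:         ${exact_1pct:,.2f}")
print(f"10% exact:        ${exact_10pct:,.2f}")
```

For a 1% change the two values differ by only a few dollars, but the gap widens for larger changes, so the shortcut is best kept to small percentage increases.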

5. Interpreting Dummies

Interpreting dummies requires a different approach. Recall that, when a categorical variable is converted to dummy variables, one dummy must be dropped to avoid perfect multicollinearity (the “dummy variable trap”). The dataset used to build the regression above was from King County, and each dummy represents a different city. In this model, “Seattle” was dropped, which makes it the baseline. So, the coefficients on the remaining dummies, and thus the remaining cities, measure the difference in price relative to Seattle.

For example, the coefficient on City_Bellevue (close to where Bill Gates lives!) is 4.732e+04, or $47,320. So, if a home is located in Bellevue, its predicted price is $47,320 higher than that of an otherwise identical home in Seattle.
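The baseline idea can be sketched in a few lines of plain Python. The helper `dummy_encode` and the sample list of cities are hypothetical and only illustrate the encoding; the article's model may include different cities and was presumably built with a library rather than by hand.

```python
BELLEVUE_COEF = 4.732e+04  # coefficient on City_Bellevue from the regression output


def dummy_encode(values, baseline):
    """One-hot encode values, omitting the column for the baseline category."""
    categories = sorted(set(values) - {baseline})
    return [{f"City_{c}": int(v == c) for c in categories} for v in values]


# Illustrative sample; "Seattle" is the dropped (baseline) category.
cities = ["Seattle", "Bellevue", "Redmond", "Seattle"]
rows = dummy_encode(cities, baseline="Seattle")

for city, row in zip(cities, rows):
    # A Seattle home has 0 in every dummy column, so it gets no city adjustment.
    premium = BELLEVUE_COEF * row["City_Bellevue"]
    print(city, row, f"Bellevue premium: ${premium:,.0f}")
```

Because Seattle homes are all zeros, the Bellevue coefficient is exactly the predicted price gap between a Bellevue home and an otherwise identical Seattle home.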

6. Interpreting Interactions

Lastly, we will learn how to interpret interaction terms. There are two interaction terms in this regression example: sqft_living*floors and sqft_living*bathrooms. These interactions show us that the impact of sqft_living on price differs for different values of floors and for different values of bathrooms.

If we look at sqft_living*floors, the coefficient is 7.38e+04, or 73,800. This means that each additional floor raises the coefficient on sqft_living, that is, the impact of sqft_living on price, by 73,800.

If we look at sqft_living*bathrooms, the coefficient is 1.476e+04, or 14,760. This means that each additional bathroom raises the impact of sqft_living on price by 14,760. In both cases the interaction coefficient is a change in a slope rather than a flat dollar amount added to price.
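One way to read the two interaction terms together is to compute the total slope on sqft_living for a given home. The sketch below assumes the model contains the sqft_living main effect (7.924e+04) alongside both interactions and that nothing else interacts with sqft_living; `sqft_slope` is a hypothetical helper.

```python
SQFT_COEF = 7.924e+04         # main effect of sqft_living
SQFT_X_FLOORS = 7.38e+04      # sqft_living*floors interaction
SQFT_X_BATHROOMS = 1.476e+04  # sqft_living*bathrooms interaction


def sqft_slope(floors, bathrooms):
    """Total coefficient on sqft_living for a home with the given features."""
    return SQFT_COEF + SQFT_X_FLOORS * floors + SQFT_X_BATHROOMS * bathrooms


one_floor = sqft_slope(floors=1, bathrooms=1)
two_floors = sqft_slope(floors=2, bathrooms=1)

print(f"slope with 1 floor:  {one_floor:,.0f}")
print(f"slope with 2 floors: {two_floors:,.0f}")
print(f"difference:          {two_floors - one_floor:,.0f}")
```

The difference between the two slopes equals the sqft_living*floors coefficient, which is exactly what "the impact of sqft_living differs by floors" means.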

Conclusion

There are various types of variables that can be included in a regression model, and each type of coefficient needs to be interpreted correctly. Whether an X variable is linear, log transformed, dummied, or part of an interaction determines how its coefficient translates into an impact on the Y value.

This post should have given you a solid foundation on regression analysis, but there are many more types of X-Y relationships to explore. For more detail on how to interpret further relationships, take a look at the following link.

Good luck on your future regression models — I hope this helped you on your road to becoming a Data Scientist!

Written by Adina Steinman