Linear regression, feature scaling, and regression coefficients


Discussion Overview

The discussion revolves around the role of feature scaling in linear regression, exploring its impact on regression coefficients, interpretability, and the necessity of scaling in various algorithms. Participants examine the implications of scaling on the performance and accuracy of linear regression models, as well as the broader context of statistical significance and computational concerns.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that scaling affects the values of regression coefficients, suggesting that larger ranges in predictor variables lead to larger coefficients, which can mislead interpretations if not standardized.
  • Others argue that the statistical significance of coefficients should be prioritized over their magnitude, emphasizing the importance of p-values in assessing the relevance of predictors.
  • A participant shares an example where a coefficient is near zero but has a small p-value, indicating that scaling can lead to larger coefficients while maintaining statistical significance.
  • One participant asserts that linear regression coefficients are computed correctly as long as there are no data entry errors, but questions the appropriateness of the coefficients in context.
  • Concerns are raised about the necessity of scaling in certain algorithms, with some participants noting that methods like K-nearest neighbors require scaling to avoid dominance of variables with larger magnitudes.
  • There is discussion about whether scaling should be an automatic process in software or an option for users, with some participants advocating for its inclusion as a standard practice.
  • Multiple participants express that scaling is generally beneficial and can prevent numerical issues, though it is not universally required.

Areas of Agreement / Disagreement

Participants express a mix of views on the necessity and impact of scaling in linear regression. While there is some consensus on the benefits of scaling, particularly in avoiding numerical problems, there remains disagreement on whether it is always required and how it affects coefficient interpretation.

Contextual Notes

Participants highlight limitations regarding the assumptions made in linear regression models, the dependence on the context of the data, and the potential for numerical issues without scaling. The discussion does not resolve these complexities.

fog37
Hello,

In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:

a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned a larger regression coefficient, so comparing the relative importance of predictors solely on the basis of coefficient magnitude is misleading. The more appropriate way to compare coefficients for relative importance is to standardize the independent variables (standardization is a form of scaling) before building the model.

Another benefit of scaling the predictor variables (standardization, normalization, or any other scaling technique) is that it can make the coefficients easier to interpret: sometimes a regression coefficient is extremely small simply because of the particular scaling of the data. Properly rescaling the predictor variable can produce a larger coefficient and make the relationship between ##Y## and ##X## easier to read.
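To make this concrete, here is a minimal sketch (not part of the original post), assuming only NumPy; the data and the helper name ols_fit are invented for illustration. Dividing a predictor by a constant multiplies its fitted OLS coefficient by that constant, while the fitted values (and hence ##R^2## and the p-values) do not change.

Code:
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 100_000, n)                 # predictor with a large range
y = 30.0 + 0.0004 * x + rng.normal(0, 2, n)    # true slope is tiny on this scale

def ols_fit(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b0, b1 = ols_fit(x, y)             # slope ~ 0.0004 with x in its original units
c0, c1 = ols_fit(x / 10_000, y)    # slope ~ 4 with x rescaled to a smaller range
print(b1, c1, c1 / b1)             # ratio ~ 10,000; the fitted line itself is identical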

I also read that certain statistical and ML algorithms really require scaling, while others (rule-based ones) don't.

So, in essence, scaling is useful but not always required. However, in some cases, it is required as a pre-processing step...

Finally, my question: without any scaling of the independent variables, does linear regression (multiple or single) perform properly, i.e. are the regression coefficients computed correctly? Aside from interpretability issues, does linear regression (OLS) generate larger coefficients for variables with a larger range?

Thank you for any input on this!
 
Physics news on Phys.org
Whether or not to scale is primarily determined just by concerns about computational overflows and round-off errors. You should always look at the statistical significance of the coefficients (how many standard deviations they are away from zero) rather than just their magnitude. Any reasonable statistics package will have a regression algorithm that includes the information you need.
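As an illustration of reading significance rather than raw magnitude, here is a minimal sketch assuming the statsmodels package (the data are invented): the fitted results expose each coefficient together with its t-statistic and p-value, and a linear rescaling of ##X## changes the coefficient but not its t-statistic or p-value.

Code:
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 100_000, 200)
y = 30.0 + 0.0004 * x + rng.normal(0, 2, 200)

X = sm.add_constant(x)     # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)          # coefficient magnitudes (depend on the units of x)
print(fit.tvalues)         # standard deviations away from zero
print(fit.pvalues)         # unchanged if x is linearly rescaled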
 
True. I have an example where the coefficient is practically zero and the p-value is very small (<0.05).
Linear scaling leads to a larger regression coefficient while keeping the same p-value.

I guess my dilemma is that certain algorithms "require" feature scaling to perform correctly, and I am wondering if linear regression is one of them...
 
"Finally my question: without any type of scaling the independent variables, does linear regression (multiple or single) perform properly, i.e. are the regression coefficient computed correctly?"
Always, as long as there are no data entry errors. The problem here is that your question is not well-phrased: if you take any set of data, correctly entered, and apply least squares, then, assuming the program carries out LS correctly, the coefficients are computed correctly -- you get the answers you should get based on the inputs.
What I think you mean by "computed correctly" is this: are they the ones appropriate for the context of the problem? IMO the answer there is more subtle: note that we never know the true form of any model. Whenever you specify the form of a linear regression model, you are making an assumption that it is correct. This means that, by default, assuming no errors in data entry, recording, or in the calculations, the coefficients are computed correctly for the assumed model form.
If you're asking about scaling there are two things [at least] to think about.
First: suppose, as an extreme example, you're trying to perform linear regression predicting a person's age in years from their salary in dollars. Typically salaries will be in the thousands, while age will be at most 100 [and most likely under 70, since we're talking about salaries]. For an equation of the form
Age = intercept + slope × Salary
to work, the slope will need to be very small to bring the values on the right down to the scale of Age.
However, if Salary is recorded in tens of thousands of dollars, the slope won't be tiny, since the recorded values for salary are already roughly on the scale of age.
In short, in linear regression scaling is most often a matter of choice.
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations: here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
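A minimal sketch of that point, assuming scikit-learn (the feature names and numbers are invented for illustration): without scaling, the dollar-scale feature dominates the Euclidean distances KNN uses, so standardizing the features first is the usual practice.

Code:
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
salary = rng.uniform(20_000, 200_000, 300)   # dollar scale: dominates raw distances
years = rng.uniform(0, 40, 300)              # much smaller scale
X = np.column_stack([salary, years])
y = 0.0001 * salary + 0.5 * years + rng.normal(0, 1, 300)

# Distances computed on the raw features are driven almost entirely by salary.
knn_raw = KNeighborsRegressor(n_neighbors=5).fit(X, y)

# Standardizing first gives both features comparable weight in the distances.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)).fit(X, y)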
 
statdad said:
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations: here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
Would that be a standard step in the tool's algorithm, or at least an option that the user can select?
 
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
 
statdad said:
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
I can't think of any case where scaling was bad to do, and there are certainly cases where it should be done.
 
I can't think of any case where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
 
statdad said:
I can't think of any case where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
It's not a required part of the algorithm. It is just safer in some cases to avoid numerical problems with the calculations.
 
