Linear regression, feature scaling, and regression coefficients

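In summary: Scaling the predictors is not required for ordinary least squares to compute the coefficients correctly. Rescaling a predictor changes the magnitude of its coefficient but not its statistical significance. Scaling does matter for distance-based methods such as K-nearest neighbors, and it can help avoid numerical problems in the calculations.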
  • #1
Hello,

In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:

a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned a larger regression coefficient, and comparing the relative importance of the regression coefficients solely based on coefficient magnitude is misleading. The more appropriate way to compare coefficients to determine relative importance is to standardize the independent variables (standardization is a form of scaling) before building the model.

Another benefit of scaling the predictor variables (standardization, normalization or any other scaling technique) is to extract more meaning from the interpretation of the coefficients: sometimes a regression coefficient may be extremely small and that may just be due to the particular scaling of the data. It is possible to get a larger coefficient and extract more understanding about the relationship between ##Y## and ##X## by properly scaling the predictor variable.

I also read that certain statistical and ML algorithms really require scaling while others (rule-based ones) don't.

So, in essence, scaling is useful but not always required. However, in some cases, it is required as a pre-processing step...

Finally my question: without any type of scaling of the independent variables, does linear regression (multiple or single) perform properly, i.e. are the regression coefficients computed correctly? Aside from interpretability issues, does linear regression (OLS) generate larger coefficients for variables with larger range?

Thank you for any input on this!
 
  • #2
Whether or not to scale is primarily determined just by concerns about computational overflows and round-off errors. You should always look at the statistical significance of the coefficients (how many standard deviations they are away from zero) rather than just their magnitude. Any reasonable statistics package will have a regression algorithm that includes the information you need.
 
  • #3
True. I have an example where the coefficient is practically zero and the p-value is very very small (<0.05).
Linear scaling leads to a larger regression coefficient keeping the same p-value.
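The observation above can be checked with a short derivation. This is a sketch under the usual OLS setup, and the symbols ##D##, ##c## and ##\hat\sigma^2## are introduced here for illustration. Suppose predictor ##j## is divided by a constant ##c##, so the design matrix becomes ##X' = XD## with ##D## diagonal, ##D_{jj} = 1/c## and all other diagonal entries equal to 1. Then:

$$
\hat\beta' = (D X^\top X D)^{-1} D X^\top y = D^{-1} (X^\top X)^{-1} X^\top y = D^{-1}\hat\beta
\quad\Rightarrow\quad \hat\beta'_j = c\,\hat\beta_j
$$

$$
X'\hat\beta' = X D D^{-1} \hat\beta = X\hat\beta
\quad\Rightarrow\quad \text{same fitted values, same residuals, same } \hat\sigma^2
$$

$$
\widehat{\mathrm{Var}}(\hat\beta') = \hat\sigma^2 D^{-1} (X^\top X)^{-1} D^{-1}
\quad\Rightarrow\quad \mathrm{SE}'_j = c\,\mathrm{SE}_j,
\qquad t'_j = \frac{c\,\hat\beta_j}{c\,\mathrm{SE}_j} = t_j
$$

So the coefficient grows by the factor ##c## while its t-statistic, and therefore its p-value, stays the same.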

I guess my dilemma is that certain algorithms "require" feature scaling to perform correctly and I am wondering if linear regression is one of them...
 
  • #4
"Finally my question: without any type of scaling of the independent variables, does linear regression (multiple or single) perform properly, i.e. are the regression coefficients computed correctly?"
Always, as long as there are not any data entry errors. The problem here is that your question is not well-phrased: if you take any set of data, correctly entered, and apply least squares, then assuming the program carries out LS correctly the coefficients are computed correctly -- you get the answers you should get based on the inputs.
What I think you mean by "computed correctly" is this: are they the ones appropriate for the context of the problem? IMO the answer there is more subtle. Note that we never know the true form of any model: whenever you specify the form of a linear regression model you are making an assumption that it is correct.
This means that, by default, assuming no errors in data entry, recording, or in the calculations, the coefficients are computed correctly for the assumed model form.
If you're asking about scaling there are two things [at least] to think about.
First: suppose, as an extreme example, you're trying to use linear regression to predict a person's age in years from their salary in dollars. Typically salaries will be in the thousands, and age will be at most 100 [and most likely under 70, since we're talking about salaries]. In order for an equation that looks like
Age = intercept + slope × Salary
to work, the slope will need to be very small to bring the values on the right-hand side down to the scale of Age.
However, if Salary is recorded in tens of thousands of dollars the slope won't be tiny, since the recorded values for salary are already roughly on the scale of age.
In short, in linear regression scaling is most often a matter of choice.
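A minimal sketch of the Age/Salary point, using simulated data. The helper `simple_ols`, the generated numbers, and the choice of "thousands of dollars" as the alternative unit are all assumptions made for illustration, not anything from a particular statistics package. It fits the same model twice, once with salary in dollars and once in thousands of dollars, and compares the slope and its t-statistic.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, t_statistic_of_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom, then the slope's standard error.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return intercept, slope, slope / slope_se


# Simulated (made-up) data: salary in dollars, age loosely increasing with salary.
rng = random.Random(0)
salary_dollars = [rng.uniform(20_000, 120_000) for _ in range(200)]
age = [25 + 0.0003 * s + rng.gauss(0, 5) for s in salary_dollars]

# Same data, different unit: thousands of dollars.
salary_thousands = [s / 1000 for s in salary_dollars]

a_dollars, b_dollars, t_dollars = simple_ols(salary_dollars, age)
a_thousands, b_thousands, t_thousands = simple_ols(salary_thousands, age)

print("slope per dollar:          ", b_dollars)
print("slope per thousand dollars:", b_thousands)
print("t-statistics:              ", t_dollars, t_thousands)
```

The slope changes by exactly the unit factor (1000 here), while the intercept, the fitted values, and the t-statistic of the slope are unchanged up to floating-point rounding. The fit is the same model expressed in different units.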
Second: there are some more sophisticated methods [K-nearest neighbors for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations. Here, good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
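To illustrate the distance-based case, here is a small sketch with made-up (age, salary) records. The data, the helper names `nearest_index` and `standardize`, and the use of z-scores as the scaling method are assumptions for illustration. It finds the nearest neighbor of one person by Euclidean distance, first on the raw values and then after standardizing each variable.

```python
import math
import statistics

# Hypothetical records: (age in years, salary in dollars).
query = (30, 50_000)
candidates = [
    (31, 52_000),   # similar age, salary differs by 2,000
    (65, 50_500),   # very different age, salary differs by only 500
    (45, 90_000),
    (22, 30_000),
    (58, 120_000),
]


def nearest_index(point, others):
    """Index of the element of `others` closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, other) for other in others]
    return distances.index(min(distances))


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Raw distances: the salary column dominates because its numbers are much larger.
raw_nearest = candidates[nearest_index(query, candidates)]

# Standardize query and candidates together, then repeat the search.
scaled = standardize([query] + candidates)
scaled_nearest = candidates[nearest_index(scaled[0], scaled[1:])]

print("nearest on raw values:         ", raw_nearest)
print("nearest on standardized values:", scaled_nearest)
```

On the raw values the 65-year-old is "nearest" simply because the salaries are within 500 dollars, and the 35-year age gap barely registers. After standardization both variables contribute comparably and the 31-year-old becomes the nearest neighbor.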
 
  • #5
statdad said:
Second: there are some more sophisticated methods [K-nearest neighbors for one] where the essential calculations are based on distances between values, and if one or more of the variables are of significantly greater magnitude than the others, those variables will dominate the calculations. Here, good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
Would that be a standard step in the tool algorithm or at least an option that the user can select?
 
  • #6
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
 
  • #7
statdad said:
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
I can't think of any case where scaling was bad to do, and there are certainly cases where it should be done.
 
  • #8
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
 
  • #9
statdad said:
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
It's not a required part of the algorithm. It is just safer in some cases to avoid numerical problems with the calculations.
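One way to see the numerical side is a rough sketch that uses the condition number of ##X^\top X## as a stand-in for "how sensitive the calculation is to round-off". That choice of measure, the made-up salaries, and the helper `condition_number_2x2` are assumptions for illustration. For a model with an intercept and a single predictor, ##X^\top X## is the 2x2 matrix [[n, Σx], [Σx, Σx²]].

```python
import math
import statistics


def condition_number_2x2(a, b, d):
    """Condition number (largest / smallest eigenvalue) of the
    symmetric positive-definite matrix [[a, b], [b, d]]."""
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam_max = half_trace + radius
    lam_min = (a * d - b * b) / lam_max  # determinant / largest eigenvalue
    return lam_max / lam_min


def normal_matrix_condition(x):
    """Condition number of X^T X for the design matrix with columns [1, x]."""
    n = len(x)
    return condition_number_2x2(n, sum(x), sum(v * v for v in x))


# Hypothetical salaries in dollars.
salary = [30_000, 50_000, 52_000, 90_000, 120_000]

# The same predictor standardized to mean 0 and standard deviation 1.
mean = statistics.mean(salary)
stdev = statistics.pstdev(salary)
salary_z = [(s - mean) / stdev for s in salary]

cond_raw = normal_matrix_condition(salary)
cond_scaled = normal_matrix_condition(salary_z)

print("condition number, raw salary:         ", cond_raw)
print("condition number, standardized salary:", cond_scaled)
```

With the raw salaries the condition number is many orders of magnitude larger than with the standardized predictor, where it is essentially 1. The fitted model is the same either way. The difference only matters when the software solves the normal equations in finite precision, which is the "safer in some cases" point above.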
 
