Linear regression, feature scaling, and regression coefficients

SUMMARY

This discussion centers on the importance of feature scaling in linear regression, particularly regarding how the range of independent variables affects regression coefficients. Standardizing independent variables before model building is essential for accurate interpretation of coefficients, as larger ranges can lead to misleading coefficient magnitudes. While scaling is not always required, it is crucial for certain algorithms and can prevent numerical issues. The consensus is that linear regression coefficients are computed correctly without scaling, provided there are no data entry errors, but scaling can enhance interpretability and model performance.

PREREQUISITES
  • Understanding of linear regression and ordinary least squares (OLS)
  • Familiarity with feature scaling techniques such as standardization and normalization
  • Knowledge of statistical significance and p-values
  • Awareness of machine learning algorithms that require scaling, such as K-nearest neighbors
NEXT STEPS
  • Research the impact of feature scaling on regression coefficients in Python using libraries like scikit-learn
  • Explore the differences between standardization and normalization in data preprocessing
  • Learn about the assumptions underlying linear regression models and their implications
  • Investigate how different machine learning algorithms handle feature scaling and the consequences of neglecting it
USEFUL FOR

Data scientists, statisticians, and machine learning practitioners looking to enhance their understanding of linear regression and the role of feature scaling in model accuracy and interpretability.

fog37
Hello,

In studying linear regression more deeply, I learned that scaling plays an important role in multiple ways:

a) the range of the independent variables ##X## affects the values of the regression coefficients. For example, a predictor variable ##X## with a large range typically gets assigned a larger regression coefficient, and comparing the relative importance of predictors solely on coefficient magnitude is misleading. The more appropriate way to compare coefficients for relative importance is to standardize the independent variables (standardization is a form of scaling) before building the model, as in the sketch below.
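A minimal sketch of this point, assuming scikit-learn is available; the feature names and data are invented for illustration. On the raw data the income coefficient looks negligible next to the age coefficient purely because of units; after standardization the two are directly comparable, and income actually matters more per standard deviation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
income = rng.normal(50_000, 15_000, n)   # large range (dollars)
age = rng.normal(40, 10, n)              # small range (years)
y = 0.0003 * income + 0.2 * age + rng.normal(0, 1, n)
X = np.column_stack([income, age])

# Raw fit: the income coefficient looks tiny only because of its units.
raw = LinearRegression().fit(X, y)
print("raw coefficients:", raw.coef_)            # ~[0.0003, 0.2]

# Standardized fit: coefficients are now "effect per standard deviation"
# and can be compared directly; here income dominates after all.
Xs = StandardScaler().fit_transform(X)
std = LinearRegression().fit(Xs, y)
print("standardized coefficients:", std.coef_)   # ~[4.5, 2.0]
```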

Another benefit of scaling the predictor variables (standardization, normalization, or any other scaling technique) is that it can make the coefficients more interpretable: sometimes a regression coefficient is extremely small simply because of the particular scale of the data. Rescaling the predictor can yield a larger coefficient and make the relationship between ##Y## and ##X## easier to read off, as the sketch below illustrates.
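A minimal sketch of this effect, assuming statsmodels is available and using invented data: rescaling a predictor by a constant multiplies its coefficient by the inverse factor, while the t-statistic and p-value are unchanged.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
salary = rng.normal(50_000, 10_000, 200)            # predictor in dollars
y = 0.0002 * salary + rng.normal(0, 0.5, 200)

# Same model, two unit choices for the predictor.
fit_dollars = sm.OLS(y, sm.add_constant(salary)).fit()
fit_10k = sm.OLS(y, sm.add_constant(salary / 10_000)).fit()

print(fit_dollars.params[1], fit_dollars.pvalues[1])  # ~0.0002, tiny p-value
print(fit_10k.params[1], fit_10k.pvalues[1])          # ~2.0, same p-value
```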

I also read that certain statistical and ML algorithms really require scaling, while others (rule-based ones) don't.

So, in essence, scaling is useful but not always required; for some algorithms, however, it is a necessary pre-processing step...

Finally, my question: without any scaling of the independent variables, does linear regression (multiple or simple) perform properly, i.e., are the regression coefficients computed correctly? And, interpretability issues aside, does linear regression (OLS) generate larger coefficients for variables with a larger range?

Thank you for any input on this!
 
Whether or not to scale is primarily determined just by concerns about computational overflows and round-off errors. You should always look at the statistical significance of the coefficients (how many standard deviations they are away from zero) rather than just their magnitude. Any reasonable statistics package will have a regression algorithm that includes the information you need.
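A minimal sketch of this advice, assuming statsmodels is available (its OLS results expose standard errors, t-statistics, and p-values directly); the data are synthetic. The first predictor's coefficient is tiny in magnitude yet highly significant, because significance is judged in standard-error units, not raw units:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
X[:, 0] *= 10_000                       # first predictor has a huge range
y = 0.001 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, 300)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)     # magnitudes reflect units, not importance
print(fit.tvalues)    # coefficient / standard error: unit-free
print(fit.pvalues)
```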
 
True. I have an example where the coefficient is practically zero yet the p-value is very small (< 0.05). Linearly rescaling the predictor gives a larger regression coefficient with the same p-value.

I guess my dilemma is that certain algorithms "require" feature scaling to perform correctly, and I am wondering if linear regression is one of them...
 
"Finally my question: without any type of scaling the independent variables, does linear regression (multiple or single) perform properly, i.e. are the regression coefficient computed correctly?"
Always, as long as there are no data entry errors. The problem here is that your question is not well-phrased: if you take any set of data, correctly entered, and apply least squares, then, assuming the program carries out LS correctly, the coefficients are computed correctly -- you get the answers you should get based on the inputs.
What I think you mean by "computed correctly" is this: are they the ones appropriate for the context of the problem? IMO the answer there is more subtle. Note that we never know the true form of any model: whenever you specify the form of a linear regression model you are making an assumption that it is correct. This means that, by default, assuming no errors in data entry, recording, or calculation, the coefficients are computed correctly for the assumed model form.
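A minimal sketch of this point using only numpy: for a given design matrix and assumed model form, OLS has a single correct answer, the solution of the normal equations ##\hat\beta = (X^T X)^{-1} X^T y##, whatever the units of the predictors. Rescaling a column changes its coefficient by the inverse factor, but the fitted values are identical:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(0, 1, 100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)

# Solve the normal equations directly.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)                                   # ~[1.0, 2.0]

# Rescale the predictor column by 100: its coefficient shrinks by 100,
# but the fitted values are unchanged.
Xs = X.copy()
Xs[:, 1] *= 100.0
beta_s = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)
print(beta_s)                                 # ~[1.0, 0.02]
print(np.allclose(X @ beta, Xs @ beta_s))     # True
```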
If you're asking about scaling there are two things [at least] to think about.
First: suppose, as an extreme example, you're trying to predict a person's age in years from their salary in dollars. Typically salaries will be in the thousands, while age will be at most 100 [and most likely under 70, since we're talking about salaries]. In order for an equation of the form
##\text{Age} = \text{intercept} + \text{slope} \times \text{Salary}##
to work, the slope will need to be very small to bring the values on the right down to the scale of Age; a slope of, say, 0.001 years per dollar becomes 10 years per unit if salary is measured in units of $10,000. So if Salary is recorded in tens of thousands of dollars the slope won't be tiny, since the recorded values for salary are already roughly on the scale of age.
In short, in linear regression scaling is most often a matter of choice.
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables is of significantly greater magnitude than the others, those variables will dominate the calculations. Here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis, as in the sketch below.
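A minimal sketch, assuming scikit-learn and invented data: a K-nearest-neighbors classifier with one pure-noise feature on a much larger scale than the informative one. Without scaling, the distance calculation is dominated by the noise feature and accuracy is near chance; putting a StandardScaler in front of the classifier fixes it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 1_000
informative = rng.normal(0, 1, n)        # small scale, fully predictive
noise = rng.normal(0, 10_000, n)         # huge scale, pure noise
y = (informative > 0).astype(int)
X = np.column_stack([informative, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

unscaled = KNeighborsClassifier().fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_tr, y_tr)

print("unscaled accuracy:", unscaled.score(X_te, y_te))  # near chance
print("scaled accuracy:  ", scaled.score(X_te, y_te))    # much higher
```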
 
statdad said:
Second: there are some more sophisticated methods [K-nearest neighbors, for one] where the essential calculations are based on distances between values, and if one or more of the variables is of significantly greater magnitude than the others, those variables will dominate the calculations. Here good practice is to scale all variables to have the same magnitude and variability prior to performing the analysis.
Would that be a standard step in the tool's algorithm, or at least an option the user can select?
 
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
 
statdad said:
To some extent, whether the scaling is done automatically or left as an on/off option for the user depends on the software. Regardless, for the types of processes I mentioned, scaling should be done.
I can't think of any case where scaling was bad to do, and there are certainly cases where it should be done.
 
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
 
statdad said:
I can't think of any where it would be bad, and [I believe] your comment also implies that there are situations where it isn't required.
It's not a required part of the algorithm. It is just safer in some cases to avoid numerical problems with the calculations.
 
