Scaling and Standardization in Statistical Analysis

AI Thread Summary
Standardizing and scaling input variables in statistical analysis, particularly for multivariate linear regression, is often beneficial due to differing ranges and variances among variables. While it is not necessary to always standardize, doing so can prevent certain variables from disproportionately influencing model weights. Most statistical algorithms inherently handle scaling and normalization, reducing the need for manual adjustments. The statistical significance of independent variables remains unaffected by their scale, although the magnitude of coefficients may vary. Overall, careful consideration of variable scaling can enhance model accuracy and interpretability.
fog37
TL;DR Summary
scaling and standardization in statistical analysis
Hello everyone,

When working with the variables in a data set to find an appropriate statistical model (linear regression, nonlinear regression, etc.), the variables can have different ranges, standard deviations, means, etc.

Should all the input variables always be standardized and scaled before the analysis is applied, so that they have the same mean and range?

For example, when determining the price of a house (the target output variable) using a multivariate linear regression model, the input variables (square footage, year it was built, number of rooms, etc.) have very different ranges... It could happen that a certain variable gets a larger weight just because of the range of its values...

What to do?
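To make the question concrete, here is a minimal sketch of what "standardizing" usually means in practice: z-scoring each column so it has mean 0 and standard deviation 1. Note that this equalizes mean and spread, not range; min-max scaling would be the choice for equal ranges. The `standardize` helper and the house data below are made up for illustration and are not from the thread.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean, divide by its sample std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation
    return (X - mean) / std, mean, std

# Hypothetical houses: square footage, year built, number of rooms.
X = np.array([
    [1400.0, 1995.0, 3.0],
    [2100.0, 2008.0, 4.0],
    [ 950.0, 1972.0, 2.0],
    [3200.0, 2015.0, 6.0],
    [1800.0, 1988.0, 4.0],
])

Z, mean, std = standardize(X)
print(Z.round(2))
```

After this transformation every column is on the same footing, so a coefficient measures the effect of a one-standard-deviation change in that variable.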
 
I wouldn’t say “always”, but certainly “often”.
 
Most algorithms or equations will include the appropriate scaling and normalization. Usually, you do not need to do it yourself.
 
And often the variable of interest is a difference or a % change, in which case scale does not matter. This is how finance and economics mostly work. Standard deviations generally do not get whitened for OLS.
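A quick sketch of that point, with made-up prices and a hypothetical `pct_change` helper: a percent change is the same whatever units the series is measured in, while a plain difference is unaffected by a constant offset (though it still carries the units).

```python
import numpy as np

def pct_change(series):
    """Fractional change between consecutive observations."""
    series = np.asarray(series, dtype=float)
    return np.diff(series) / series[:-1]

# Hypothetical house prices, in dollars and in thousands of dollars.
prices_usd = np.array([250_000.0, 260_000.0, 255_000.0, 270_000.0])
prices_kusd = prices_usd / 1000.0

# Same percent changes regardless of units.
print(pct_change(prices_usd))
print(pct_change(prices_kusd))

# Differences ignore a constant shift, but not a change of units.
shifted = prices_usd + 10_000.0
print(np.diff(prices_usd), np.diff(shifted))
```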
 
fog37 said:
using a multivariate linear regression model, the input variables (square footage, year it was built, number of rooms, etc.) have very different ranges... It could happen that a certain variable gets a larger weight just because of the range of its values...
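The statistical significance of the independent variables in multivariate linear regression does not depend on the scale of the variable values. That effect is compensated for: if you multiply a variable by a constant, its coefficient and the standard error of that coefficient are both divided by the same constant, so their ratio (the t-statistic) is unchanged. The magnitude and variance of the multiplying coefficients are affected by the scale of the variables, but the statistical significance is not.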
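This is easy to check numerically. Below is a minimal sketch using plain NumPy on simulated data; the house-price numbers, the noise level, and the `ols_with_t_stats` helper are all invented for illustration. It fits the same model twice, once with square footage in square feet and once in thousands of square feet, and compares the results.

```python
import numpy as np

def ols_with_t_stats(X, y):
    """Least-squares fit of y on [1, X]; returns (coefficients, t-statistics)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS coefficients
    resid = y - A @ beta
    dof = n - A.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    se = np.sqrt(np.diag(cov))                    # standard errors
    return beta, beta / se

# Simulated data (assumed, not from the thread).
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(800, 3500, n)
rooms = rng.integers(2, 8, n).astype(float)
price = 50_000 + 120 * sqft + 8_000 * rooms + rng.normal(0, 20_000, n)

X_raw = np.column_stack([sqft, rooms])
X_scaled = np.column_stack([sqft / 1000.0, rooms])  # sqft in thousands

beta_raw, t_raw = ols_with_t_stats(X_raw, price)
beta_scaled, t_scaled = ols_with_t_stats(X_scaled, price)

print("coefficients (raw):   ", beta_raw)
print("coefficients (scaled):", beta_scaled)
print("t-stats (raw):   ", t_raw)
print("t-stats (scaled):", t_scaled)
```

The square-footage coefficient comes out 1000 times larger in the rescaled fit, while the t-statistics agree up to floating-point error. So a variable does not become "more important" in the significance sense just because its values are numerically large.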
 