Scaling and Standardization in Statistical Analysis

Discussion Overview

The discussion revolves around the necessity and implications of scaling and standardizing input variables in statistical analysis, particularly in the context of multivariate linear regression. Participants explore whether all input variables should be standardized to have the same mean and range before applying statistical models.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant questions if input variables should always be standardized and scaled, citing the potential for variables with larger ranges to disproportionately influence model weights.
  • Another participant suggests that standardization is not always necessary but is often beneficial.
  • A different viewpoint indicates that many algorithms inherently include scaling and normalization, implying that manual scaling may not be required.
  • It is noted that in some contexts, such as finance and economics, the scale of variables may not matter if the variable of interest is a difference or percentage change.
  • One participant asserts that while the magnitude and variance of coefficients in multivariate linear regression are affected by variable scale, the statistical significance of independent variables remains unaffected by the scale of the variable values.

Areas of Agreement / Disagreement

Participants express differing views on the necessity of scaling and standardization, indicating that there is no consensus on whether it should always be applied. Multiple competing perspectives on the topic remain unresolved.

Contextual Notes

Some assumptions regarding the applicability of scaling and normalization in various statistical contexts are not fully explored. The discussion does not resolve the implications of scaling on statistical significance versus coefficient magnitude.

fog37
TL;DR: Scaling and standardization in statistical analysis
Hello everyone,

When working with the variables in a data set to find an appropriate statistical model (linear regression, nonlinear regression, etc.), the variables can have different ranges, standard deviations, means, etc.

Should all the input variables always be standardized and scaled before the analysis, so that they have the same mean and range?

For example, when determining the price of a house (the target output variable) using a multivariate linear regression model, the input variables (square footage, year it was built, number of rooms, etc.) have very different ranges... It could happen that a certain variable gets a larger weight just because of the range of its values...

What to do?
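
For concreteness, here is a minimal sketch of what "standardizing" usually means in practice: a z-score transform of each column. The house data and the helper name `standardize` are made up for illustration. Note that this gives every column the same mean (0) and standard deviation (1), not the same range.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract its mean, divide by its sample SD."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation
    return (X - mean) / std, mean, std

# Hypothetical inputs: square footage, year built, number of rooms.
X = np.array([
    [1400.0, 1995.0, 3.0],
    [2100.0, 2008.0, 4.0],
    [850.0, 1972.0, 2.0],
    [3200.0, 2015.0, 6.0],
])

Z, mean, std = standardize(X)
print(Z.mean(axis=0))          # each column is centered at (numerically) zero
print(Z.std(axis=0, ddof=1))   # each column has unit sample SD
```

The returned `mean` and `std` would need to be kept so that new observations can be transformed the same way.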
 
I wouldn’t say “always”, but certainly “often”.
 
Most algorithms or equations will include the appropriate scaling and normalization. Usually, you do not need to do it yourself.
 
Often the variable of interest is a difference or a percentage change, in which case scale does not matter. This is how finance and economics mostly work. Standard deviations generally do not get whitened for OLS.
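
A small sketch of the percentage-change point, using an invented price series and a hypothetical helper `pct_change`: expressing the same series in different units leaves the percentage changes untouched.

```python
import numpy as np

def pct_change(x):
    """Fractional change between consecutive observations."""
    x = np.asarray(x, dtype=float)
    return x[1:] / x[:-1] - 1.0

# Made-up price series, once in dollars and once in cents.
prices_usd = np.array([100.0, 105.0, 102.9, 108.045])
prices_cents = prices_usd * 100.0

# The unit factor cancels in the ratio, so both calls agree.
print(pct_change(prices_usd))
print(pct_change(prices_cents))
```

Plain differences, by contrast, still carry the units of the original variable; only ratios such as percentage changes are unit-free.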
 
fog37 said:
using a multivariate linear regression model, the input variables (square footage, year it was built, number of rooms, etc.) have very different ranges... It could happen that a certain variable gets a larger weight just because of the range of its values...
The statistical significance of independent variables in multivariate linear regression does not depend on the scale of the variable values. That effect is compensated for. The magnitude and variance of the multiplying coefficients are affected by the scale of the variables but the statistical significance is not.
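
This can be checked numerically. The sketch below fits ordinary least squares on synthetic house data (the data, the coefficients used to generate it, and the helper `ols_t_stats` are all assumptions for illustration), then refits after expressing square footage in thousands of square feet.

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return (coefficients, t-statistics)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof               # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)      # coefficient covariance
    se = np.sqrt(np.diag(cov))
    return beta, beta / se

# Synthetic data, invented purely for this check.
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 4000, n)
rooms = rng.integers(1, 8, n).astype(float)
price = 50_000 + 120 * sqft + 8_000 * rooms + rng.normal(0, 20_000, n)

X = np.column_stack([sqft, rooms])
beta, t = ols_t_stats(X, price)

# Same data, with square footage measured in thousands of square feet.
X_scaled = X.copy()
X_scaled[:, 0] /= 1000.0
beta_s, t_s = ols_t_stats(X_scaled, price)

print("coefficients:", beta, beta_s)   # sqft coefficient differs by a factor of 1000
print("t-statistics:", t, t_s)         # agree up to floating-point error
```

The square-footage coefficient and its standard error both grow by the same factor of 1000, so their ratio (the t-statistic, and hence the p-value) does not change.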
 
