Why Standardize Data to Mean 0 & SD 1 in Multivariate Analysis?

  • Context: Undergrad 
  • Thread starter: Philip Wong
  • Tags: Data

Discussion Overview

The discussion centers on the rationale behind standardizing data to have a mean of 0 and a standard deviation (S.D.) of 1 in multivariate analysis. Participants explore the implications of this standardization process, particularly questioning the choice of S.D. and its relationship to data coverage and variation.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant questions why S.D. is set to 1, suggesting that using S.D. = 3 might cover more data (99.95%) and expressing concern about information loss during centering.
  • Another participant clarifies that S.D. = 1 is a result of standardization, which involves dividing each data point by the actual S.D.
  • A follow-up inquiry seeks further explanation on whether S.D. must be exactly 1 or if other values, such as S.D. = 3, could be used, linking this to the concept of z-scores and their implications for variation.
  • A participant provides a mathematical perspective, stating that the formula for z-scores applies to any linear combination of data points and explains how the variance of standardized data results in an S.D. of 1.

Areas of Agreement / Disagreement

Participants express differing views on the necessity of setting S.D. to 1, with some supporting the standardization process while others propose alternative values. The discussion remains unresolved regarding the optimal choice of S.D. and its implications.

Contextual Notes

Participants highlight the relationship between standard deviation, z-scores, and the coverage of data, but the discussion does not resolve the assumptions or implications of using different standard deviations.

Philip Wong
Hi guys,
In multivariate analysis, when we standardize our data to mean 0 (center our data), why do we have to set the S.D. to 1?

I mean, doesn't S.D. = 1 only cover 50% of the data? Shouldn't we use S.D. = 3, where 99.95% of the sampled data is covered instead? When we center our data we've already lost some information, so why do we still use S.D. = 1?

Can someone explain it to me? Thanks!
 
You don't set sd = 1 because you standardize the mean.

The reason for sd = 1 is that, as part of standardization, you are dividing each data point by the actual sd.
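
As a minimal sketch of the point above (not code from the thread), the NumPy snippet below standardizes each column of a made-up data matrix by subtracting its mean and dividing by its own sample sd. The function name `standardize`, the simulated data, and the choice of `ddof=1` are all assumptions made for illustration. The last lines rescale the result to sd = 3 to show that a different target sd only changes the units: the share of points within one sd of the mean stays the same. For roughly normal data that share is about 68%, and about 99.7% lies within three sd.

```python
import numpy as np

def standardize(data):
    """Center each column to mean 0, then divide by that column's own sd."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    sd = data.std(axis=0, ddof=1)  # sample sd; assumes no column is constant
    return (data - mean) / sd

rng = np.random.default_rng(0)

# Hypothetical data: two variables measured on very different scales.
raw = np.column_stack([
    rng.normal(170.0, 10.0, size=500),
    rng.normal(0.5, 0.02, size=500),
])

z = standardize(raw)
col_means = z.mean(axis=0)        # numerically 0 for every column
col_sds = z.std(axis=0, ddof=1)   # numerically 1 for every column

# Fraction of points within one sd of the mean, per column.
within_1sd = np.mean(np.abs(z) <= 1.0, axis=0)

# Rescaling to sd = 3 is only a change of units: the same points
# now lie within 3 units of the mean, so the fraction is unchanged.
z3 = 3.0 * z
within_3_units = np.mean(np.abs(z3) <= 3.0, axis=0)

print(col_means, col_sds)
print(within_1sd, within_3_units)
```

Dividing by each column's own sd is what puts variables with different units on a common scale, which is the usual reason for standardizing before a multivariate analysis.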
 
EnumaElish said:
You don't set sd = 1 because you standardize the mean.

The reason for sd = 1 is that, as part of standardization, you are dividing each data point by the actual sd.

Can you explain more about what you mean?
I understand up to the point where we use s.d. = 1 because that gives us the actual variation of the sample from the population (i.e. it has something to do with the z-score). But does it have to be 1? Or could it be any number up to and including s.d. = 3?

Let's say we could use any s.d. I'm aware that a higher s.d. gives a smaller z-score, correct? Meaning there is less variation between sample and population, correct?

The reason I came to this conclusion is that z = (mean - mu)/sigma.
 
z = (data point - mu)/sigma for any data point, and for any linear combination of data points, provided mu and sigma are the mean and sd of the quantity being standardized. Since the "mean" (sample average) is a linear combination of the data points, the same formula applies to the sample average.

In general, if the variance of a random variable X (read: the data points) is V, then the variance of bX + a is Vb^2, where a and b are constants. Substituting a = -mu/sigma and b = 1/sigma, with V = sigma^2, gives a variance of 1 for z. Since sd is the square root of the variance, it follows that the sd of z has to be 1.
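
Written out step by step, the argument above might look like the following. This is a sketch that assumes X has mean mu and variance V = sigma^2 with sigma > 0; the constant c in the last line is a hypothetical target sd, added only to address the "why not 3?" question.

```latex
\begin{align*}
z &= \frac{X - \mu}{\sigma} = bX + a,
  \qquad b = \frac{1}{\sigma},\quad a = -\frac{\mu}{\sigma} \\
\operatorname{E}[z] &= b\,\mu + a = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0 \\
\operatorname{Var}(z) &= b^{2}\,V = \frac{\sigma^{2}}{\sigma^{2}} = 1
  \quad\Longrightarrow\quad \operatorname{sd}(z) = 1 \\
\operatorname{Var}(c\,z) &= c^{2}
  \quad\Longrightarrow\quad \operatorname{sd}(c\,z) = |c|
\end{align*}
```

So sd = 1 is not picked for coverage. It is what dividing by sigma produces, and multiplying z by any other constant c simply re-expresses the same data in different units.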
 
