# Standardize data

1. May 25, 2010

### Philip Wong

hi guys,
I'm wondering: in multivariate analysis, when we standardize our mean to 0 (center our data), why do we also have to set our S.D. = 1?

I mean, doesn't S.D. = 1 only cover 50% of the data? Shouldn't we use S.D. = 3, where 99.95% of the sampled data is covered instead? When we center our data, we've already lost some information, so why do we still use S.D. = 1?

Can someone explain it to me? Thanks!

2. May 25, 2010

### EnumaElish

Setting sd = 1 is not a consequence of centering the mean at 0.

The sd comes out as 1 because, as part of standardization, you divide each (centered) data point by the actual sd.
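
As a minimal sketch of that division step, here is a small Python example using only the standard library. The sample values and the helper name `standardize` are made up for illustration, and the population sd (`pstdev`) is an assumed choice; the same holds with the sample sd as long as it is used consistently.

```python
import statistics


def standardize(data):
    """Return (x - mean) / sd for each x (hypothetical helper)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population sd; assumes sd > 0
    return [(x - mean) / sd for x in data]


# Made-up sample values, purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)

z_mean = statistics.fmean(z)   # mean of the standardized values
z_sd = statistics.pstdev(z)    # sd of the standardized values
print(z_mean, z_sd)
```

Whatever spread the original data had, the standardized values end up with mean 0 and sd 1, because every point was measured in units of the data's own sd.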

3. May 26, 2010

### Philip Wong

Can you explain more what you mean?
I understand up to the point where we use s.d. = 1 because that would give us the actual variation of the sample from the population (i.e. it has something to do with the z-score). But does it have to be 1? Or could it be any number up to and including s.d. = 3?

Let's say we could use any s.d. I'm aware that a higher s.d. gives us a smaller z-score, correct? Meaning there is less variation between sample and population, correct?

The reason I came to this conclusion is that z = (mean - mu)/sigma.

4. May 26, 2010

### EnumaElish

z = (data point - mu)/sigma for any data point, and for any linear combination of data points, where mu and sigma are the mean and sd of whatever quantity is being standardized. Since the "mean" (sample average) is a linear combination of the data points, the same formula applies to the sample average.

In general, if the variance of a random variable X (read: data points) is V, then the variance of bX + a is Vb^2, where a and b are constants. Make the substitutions a = -mu/sigma and b = 1/sigma, so that bX + a = (X - mu)/sigma = z and the variance of z is V/sigma^2 = 1. Since sd is the sqrt of variance, it follows that the sd of z has to be 1.
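
The variance rule can be checked numerically. The sketch below is only an illustration: the data values, the constants `a` and `b` in the first check, and the helper name `linear_transform` are all made up, and population variance (`pvariance`) is an assumed choice.

```python
import math
import statistics


def linear_transform(data, a, b):
    """Return b*x + a for each x (hypothetical helper)."""
    return [b * x + a for x in data]


# Made-up data, purely for illustration.
data = [1.0, 3.0, 6.0, 10.0, 15.0]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)   # population sd
V = statistics.pvariance(data)    # population variance, V = sigma^2

# Check 1: arbitrary constants. Var(bX + a) should equal V * b^2;
# the shift a has no effect on the variance.
a, b = 3.0, 2.5
var_y = statistics.pvariance(linear_transform(data, a, b))
print(math.isclose(var_y, V * b**2))

# Check 2: the standardizing choice a = -mu/sigma, b = 1/sigma,
# which gives z = (x - mu)/sigma and Var(z) = V / sigma^2 = 1.
z = linear_transform(data, -mu / sigma, 1.0 / sigma)
var_z = statistics.pvariance(z)
print(math.isclose(var_z, 1.0))
```

The point of the second check is that sd = 1 is not a number anyone picks: once you divide by sigma, the factor b^2 = 1/sigma^2 cancels V exactly.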