Why Standardize Data to Mean 0 & SD 1 in Multivariate Analysis?

In summary, when data are standardized in multivariate analysis, the standard deviation becomes 1 not as an arbitrary choice but as a consequence of the procedure: each centered data point is divided by the actual standard deviation, so the resulting z-scores necessarily have mean 0 and SD 1. Dividing by a larger quantity such as 3 sigma would merely shrink the z-scores without adding information, whereas dividing by sigma itself keeps z-scores interpretable as distances from the mean in standard-deviation units, and the same formula applies to any data point or linear combination of data points, including the sample average.
  • #1
Philip Wong
hi guys,
I'm wondering: in multivariate analysis, when we standardize our mean to 0 (center our data on the centroid), why do we have to set our S.D. = 1?

I mean, doesn't S.D. = 1 only cover about 68% of the data? Shouldn't we use S.D. = 3, where about 99.7% of the sampling data would be covered instead? When we center our data, we've already lost some information, so why do we still use S.D. = 1?

can someone explain it to me! thanks!
 
  • #2
You don't set sd = 1 because you standardize the mean.

The reason the sd comes out as 1 is that, as part of standardization, you divide each centered data point by the actual sd.
 
  • #3
EnumaElish said:
You don't set sd = 1 because you standardize the mean.

The reason the sd comes out as 1 is that, as part of standardization, you divide each centered data point by the actual sd.

can you explain a bit more what you mean?
I understand up to the point where we use s.d. = 1 because that gives us the actual variation of the sample from the population (i.e. it has something to do with the z-score). But does it have to be 1? Or could it be any number up to and including s.d. = 3?

Let's say we could use any s.d.: I'm aware that a higher s.d. gives us a smaller z-score, correct? Meaning there is less variation between the sample and the population, correct?

The reason I came to this conclusion is that z = (mean - mu)/sigma
 
  • #4
z = (data point - mu)/sigma for any data point, and for any linear combination of data points. Since the "mean" (sample average) is a linear combination of the data points, the same formula applies to the sample average.

In general, if the variance of a random variable X (read: the data points) is V, then the variance of bX + a is b^2 V, where a and b are constants. Make the substitutions a = -mu/sigma and b = 1/sigma: the variance of z is V/sigma^2 = sigma^2/sigma^2 = 1. Since the sd is the square root of the variance, it follows that the sd of z has to be 1.
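Here is a minimal numerical sketch of that algebra (using NumPy; the data are simulated and not from the thread): dividing the centered values by the sample sd forces the standardized values to have sd 1, exactly as Var(bX + a) = b^2 Var(X) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=12.0, size=10_000)   # arbitrary mean and sd

mu = x.mean()
sigma = x.std(ddof=1)            # sample standard deviation

# z = (x - mu) / sigma is the linear transform bX + a with b = 1/sigma, a = -mu/sigma
z = (x - mu) / sigma

print(round(z.mean(), 12))       # ~0: the mean has been shifted to zero
print(round(z.std(ddof=1), 12))  # 1.0: Var(bX + a) = b^2 * Var(X) = sigma^2 / sigma^2
```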
 
  • #5


Standardizing data to mean 0 and standard deviation (SD) 1 in multivariate analysis is a common practice with several benefits. First, it makes it easier to compare variables with different units and scales, because every variable ends up on the same unitless scale, which makes their relationships easier to interpret. For example, if one variable is measured in meters and another in kilograms, standardizing both to mean 0 and SD 1 puts them on a directly comparable footing (see the sketch below).
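A minimal sketch of that first point, assuming two hypothetical variables measured in meters and kilograms (the names and numbers are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical measurements in incompatible units
height_m  = rng.normal(1.75, 0.10, size=500)    # meters
weight_kg = rng.normal(70.0, 15.0, size=500)    # kilograms

def standardize(v):
    """Center to mean 0 and rescale to sd 1."""
    return (v - v.mean()) / v.std(ddof=1)

height_z = standardize(height_m)
weight_z = standardize(weight_kg)

# Both columns are now unitless and on the same scale, so scale-sensitive
# methods (distances, PCA, clustering) treat them comparably.
print(height_z.std(ddof=1), weight_z.std(ddof=1))   # both 1.0
```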

Second, because standardization is only a linear rescaling, it does not remove outliers, but it does prevent a variable with a large numerical scale from dominating scale-sensitive calculations such as distances and variances. Putting every variable on the same scale makes the analysis more robust to arbitrary choices of measurement units.

Additionally, setting the SD to 1 is the convention in multivariate analysis, and it gives the standardized values an intuitive meaning: a score of 1 is exactly one standard deviation from the mean. For approximately normal data, about 68% of observations then fall between -1 and +1, a commonly used benchmark in statistical analysis.

Furthermore, dividing by 3 × SD, as suggested in the question, would not preserve any extra information; it would only shrink every standardized value by a factor of 3, so that a score of 1 corresponded to three standard deviations. That makes the scores harder to interpret and to compare across variables and studies, without changing the shape of the data at all.

In summary, standardizing data to mean 0 and SD 1 in multivariate analysis is a common and useful practice: it makes variables directly comparable, keeps large-scale variables from dominating, and follows an established convention. The original units are lost in the process, but the benefits of standardization usually outweigh this drawback.
 

1. Why is standardizing data to mean 0 and SD 1 important in multivariate analysis?

Standardizing data to mean 0 and SD 1 is important in multivariate analysis because it allows for easier interpretation and comparison of variables. It also prevents differences in measurement units and scales between variables from distorting the analysis.

2. How does standardizing data to mean 0 and SD 1 affect the results of multivariate analysis?

Standardizing data to mean 0 and SD 1 does not change the relationships between variables (correlations, for example, are unchanged); it changes the location and scale of the data, not the shape of its distribution. This makes variables easier to interpret and compare, and can improve the behavior of scale-sensitive statistical models (see the sketch below).
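A minimal sketch of that invariance claim, using simulated data (the variables and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, size=1_000)
y = 2.5 * x + rng.normal(0.0, 4.0, size=1_000)   # y depends on x, plus noise

zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# The correlation coefficient is invariant under linear rescaling,
# so the relationship between the variables is unchanged.
print(np.corrcoef(x, y)[0, 1])
print(np.corrcoef(zx, zy)[0, 1])   # identical up to floating-point error
```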

3. Can I standardize data to mean 0 and SD 1 in any type of multivariate analysis?

Yes, standardizing data to mean 0 and SD 1 is commonly used in various types of multivariate analysis, such as principal component analysis, factor analysis, and cluster analysis. It is generally recommended when the variables are measured on different scales, since these methods are sensitive to scale (see the sketch below).
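A minimal sketch of why scale matters for PCA, assuming scikit-learn is available (the data are simulated; this is an illustration, not part of the original answer):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# hypothetical data: the second column has a much larger numerical scale
X = np.column_stack([
    rng.normal(0.0, 1.0,   size=300),
    rng.normal(0.0, 100.0, size=300),
])

# Without standardization, the large-scale column dominates the first component.
print(PCA(n_components=2).fit(X).explained_variance_ratio_)

# After standardization, both variables contribute on an equal footing.
Xz = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(Xz).explained_variance_ratio_)
```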

4. Is standardizing data to mean 0 and SD 1 always necessary in multivariate analysis?

No, standardizing data to mean 0 and SD 1 is not always necessary in multivariate analysis. It depends on the specific goals and methods of the analysis. For example, if the goal is to compare the means of different groups, standardization may not be necessary.

5. What are the potential drawbacks of standardizing data to mean 0 and SD 1 in multivariate analysis?

One potential drawback of standardizing data to mean 0 and SD 1 is that the data are no longer expressed in their original units, which can make results harder to interpret and harder to compare with studies that did not standardize. Additionally, because standardization is only a linear rescaling, it does not by itself reduce the influence of extreme outliers on the analysis.
