
Standardize data

  1. May 25, 2010 #1
    Hi guys,
    I'm wondering: in multivariate analysis, when we standardize our data so that the mean is 0 (center the data), why do we also have to set the S.D. to 1?

    I mean, doesn't S.D. = 1 only cover about 68% of the data? Shouldn't we use S.D. = 3, where about 99.7% of the sample data is covered instead? When we center our data we've already lost some information, so why do we still use S.D. = 1?

    Can someone explain it to me? Thanks!
     
  3. May 25, 2010 #2

    EnumaElish

    Science Advisor
    Homework Helper

    You don't set sd = 1 because you standardized the mean.

    The sd comes out as 1 because, as part of standardization, you divide each (centered) data point by the actual sd.
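
To make the point above concrete, here is a minimal sketch using Python's standard `statistics` module. The data values and the helper name `standardize` are made up for illustration; the only thing it is meant to show is that subtracting the mean and dividing by the data's own sd leaves an sd of 1 as a consequence, not as a separate choice.

```python
import statistics

def standardize(values):
    """Subtract the mean, then divide by the sample sd (hypothetical helper)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample sd, n - 1 in the denominator
    return [(v - mean) / sd for v in values]

# Made-up data, purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)

z_mean = statistics.mean(z)   # approximately 0
z_sd = statistics.stdev(z)    # approximately 1
print(z_mean, z_sd)
```

The sketch uses the sample sd both for scaling and for checking; using the population sd (`statistics.pstdev`) in both places would work the same way.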
     
  4. May 26, 2010 #3
    Can you explain more about what you mean?
    I understand up to the point where we use s.d. = 1 because that gives us the actual variation of the sample from the population (i.e. it has something to do with the z-score). But does it have to be 1? Or could it be any number up to and including s.d. = 3?

    Let's say we could use any s.d. I'm aware that a higher s.d. gives us a smaller z-score, correct? Meaning there is less variation between sample and population, correct?

    The reason I came to this conclusion is that z = (mean - mu)/sigma.
     
  5. May 26, 2010 #4

    EnumaElish

    Science Advisor
    Homework Helper

    z = (data point - mu)/sigma for any data point, and for any linear combination of data points, provided mu and sigma are the mean and sd of whatever quantity is being standardized. Since the "mean" (sample average) is a linear combination of the data points, the same formula applies to the sample average.

    In general, if the variance of a random variable X (read: data points) is V, then the variance of bX + a is Vb^2, where a and b are constants. Make the substitutions a = -mu/sigma and b = 1/sigma, so that bX + a = (X - mu)/sigma = z. Then the variance of z is V/sigma^2 = sigma^2/sigma^2 = 1. Since sd is the sqrt of variance, it follows that the sd of z has to be 1.
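
The variance rule above can be checked numerically. This is a rough sketch, again using only Python's `statistics` module. The data, the constants, and the helper name `linear_transform` are assumptions chosen for illustration. The last part also shows what would happen if you divided by 3*sigma instead of sigma: the result has sd 1/3, not 3.

```python
import statistics

def linear_transform(values, a, b):
    """Return b*x + a for each x (hypothetical helper)."""
    return [b * v + a for v in values]

# Made-up data, purely for illustration.
x = [1.0, 3.0, 4.0, 8.0, 10.0]
mu = statistics.mean(x)
sigma = statistics.pstdev(x)   # population sd
V = statistics.pvariance(x)    # population variance, V = sigma^2

# Arbitrary constants: Var(bX + a) should equal b^2 * V, whatever a is.
y = linear_transform(x, a=5.0, b=3.0)
var_y = statistics.pvariance(y)

# The substitutions from the post: a = -mu/sigma, b = 1/sigma.
z = linear_transform(x, a=-mu / sigma, b=1.0 / sigma)
sd_z = statistics.pstdev(z)

# Dividing by 3*sigma instead gives an sd of 1/3.
w = linear_transform(x, a=-mu / (3 * sigma), b=1.0 / (3 * sigma))
sd_w = statistics.pstdev(w)

print(var_y, 9 * V, sd_z, sd_w)
```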
     