
Homework Help: Difference between sample standard deviation and population standard deviation?

  1. Jun 24, 2012 #1
    1. The problem statement, all variables and given/known data

    Just as the title suggests, although this is more about the formula. I know that a sample is a subset of a population. Why does the formula for a sample divide by n-1, whereas the formula for the standard deviation of a population divides by the total number of elements in it?
  3. Jun 24, 2012 #2

    Simon Bridge

    Science Advisor
    Homework Helper

    Short Answer:
    The idea is to use the sample to estimate the population statistics.
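    Dividing by n-1 gives you a better estimator: the sample variance computed that way is, on average, equal to the population variance. (The standard deviation is its square root.)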

    Longer Answer:
    The reason that n-1 is used instead of n in the formula for the sample variance is as follows. The sample variance can be thought of as a random variable, i.e. a function which takes on different values for different samples from the same distribution. It is used as an estimate of the true variance of the distribution. In statistics, one typically does not know the true variance; one uses the sample variance to ESTIMATE the true variance. Since the sample variance is a random variable, it usually has a mean, or average value. One would hope that this average value is close to the actual value that the sample variance is estimating, i.e. close to the true variance. In fact, if n-1 is used in the defining formula for the sample variance, then it is possible to prove that the average value of the sample variance EQUALS the true variance. If we replace the n-1 by n, then the average value of the sample variance is only (n-1)/n times the true variance.

    A random variable X which is used to estimate a parameter p of a distribution is called an unbiased estimator of p if the expected value of X equals p. Thus, using n-1 gives an unbiased estimator of the variance of a distribution.
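The "it is possible to prove" step can be sketched as follows. This assumes the sample values X_1, ..., X_n are independent draws, each with mean \mu and variance \sigma^2 (symbols introduced here for illustration); it is the standard textbook argument, not a quote from the post.

```latex
% Assumption: X_1, ..., X_n independent, each with mean \mu and variance \sigma^2.
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}

% Write each deviation as (X_i - \mu) - (\bar{X} - \mu); the cross term uses
% \sum_i (X_i - \mu) = n(\bar{X} - \mu).
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2

% Take expectations term by term: E[(X_i - \mu)^2] = \sigma^2, E[(\bar{X} - \mu)^2] = \sigma^2 / n.
E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2

% Hence
E\left[\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2, \qquad
E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \frac{n-1}{n}\,\sigma^2
```

The deviations are measured from the sample mean rather than the unknown \mu, and the sample mean sits closer to its own data than \mu does. That shortfall is exactly one \sigma^2 on average, which is why the divisor n-1 compensates for it.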
  4. Jun 24, 2012 #3
    OK, thanks for the response. I understand what is being said there in general. However, I don't quite understand why n - 1 is still used. It says that if n - 1 is used in the defining formula then it's possible to prove etc... I do not understand that part. Can you give me a short numerical example?
  5. Jun 24, 2012 #4

    Simon Bridge

    Science Advisor
    Homework Helper

    The usual exercise is to get the student to work out the distribution of sample variances.

    But I think the confusion arises over the terms used, viz.: the "sample variance" is a technical term that does not quite mean the same thing as "the variance of the sample", but the "population variance" is the same thing as "the variance of the population".

    The sample variance is an approximation to the population variance which is agreed upon by convention. Dividing by (n-1) gives a better approximation than dividing by n (which would have given you the variance of the sample, and which tends to come out too small).
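To make the difference between "the variance of the sample" and "the sample variance" concrete, here is a small sketch for one made-up sample, 2, 4, 9 (the data and the function names `variance_of_sample` and `sample_variance` are hypothetical, chosen for illustration). Both use the same sum of squared deviations; only the divisor differs. As a cross-check it compares against Python's standard `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n-1).

```python
import math
import statistics


def variance_of_sample(xs):
    """Average squared deviation from the sample mean: divides by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """The conventional 'sample variance': same sum, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2, 4, 9]  # hypothetical sample: mean 5, squared deviations 9, 1, 16 (sum 26)
print(variance_of_sample(data))  # 26/3, about 8.667
print(sample_variance(data))     # 26/2 = 13.0

# Python's standard library uses the same two conventions.
assert math.isclose(variance_of_sample(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

For a single sample the n-1 version is always the larger of the two, by a factor of n/(n-1). Whether it is "better" is only visible on average over many samples.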

    To see what they are doing, remember that the idea is to figure out what the population mean and variance are without actually polling the entire population. You could take a sample of 1000 out of a population of several million ... what can you say, in general, about the entire population from such a small number?

    You could find the mean and variance for the sample ... OK. But if you took another sample of 1000 tomorrow you will very likely get a different mean and variance from them.

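    If you take a lot of samples, and they are all random, then the central limit theorem tells you that the sample means are (approximately) normally distributed. The sample variances have a distribution of their own too (for a normal population it is a scaled chi-squared distribution, which gets close to normal for large samples).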

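    If the population is normally distributed, then the mean of the sample means gets closer to the population mean as the number of samples increases, but the mean of the variances of the samples (dividing by n) comes out smaller than the population variance, by a factor of (n-1)/n. (That part holds for any population with a finite variance, not only a normal one.)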
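A rough Monte Carlo check of the "take a lot of samples" idea: draw many samples of size 4 from a normal population and average the sample means and both variance formulas. The population (mean 10, standard deviation 2, so variance 4), the sample size, the number of trials, the seed, and the helper name `simulate` are all arbitrary choices made for illustration.

```python
import random
import statistics


def simulate(n=4, mu=10.0, sigma=2.0, trials=100_000, seed=1):
    """Draw `trials` samples of size n from a normal population and return
    the average sample mean and the average of both variance formulas."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, var_n, var_n1 = [], [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)  # sum of squared deviations
        means.append(m)
        var_n.append(ss / n)         # variance of the sample
        var_n1.append(ss / (n - 1))  # sample variance
    return statistics.fmean(means), statistics.fmean(var_n), statistics.fmean(var_n1)


mean_of_means, mean_var_n, mean_var_n1 = simulate()
print(mean_of_means)  # close to mu = 10
print(mean_var_n)     # close to (n-1)/n * sigma**2 = 3, i.e. too small
print(mean_var_n1)    # close to sigma**2 = 4
```

The averages wobble a little from run to run (with a different seed), but with this many trials the divide-by-n average sits clearly near 3 rather than 4.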

    You should be able to confirm this by working it out for just three or four normal random variables. You should know how to add random variables by now.
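Here is the kind of short numerical example asked for, done exactly rather than by simulation. It takes a hypothetical three-member population 2, 4, 6 (true variance 8/3) and lists every ordered sample of size 2 drawn with replacement, each equally likely. Then it averages both variance formulas over all 9 samples using exact fractions. Drawing with replacement (independent draws) is an assumption here; it is the setting where the n-1 result is exact, and sampling without replacement from a small finite population behaves slightly differently. The function name `exact_average_variances` is made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def exact_average_variances(population, n):
    """Enumerate every ordered sample of size n drawn with replacement
    (all equally likely) and return the true population variance together
    with the exact averages of the divide-by-n and divide-by-(n-1) formulas."""
    N = len(population)
    mu = Fraction(sum(population), N)
    pop_var = sum((Fraction(x) - mu) ** 2 for x in population) / N

    total_n = total_n1 = Fraction(0)
    samples = list(product(population, repeat=n))
    for s in samples:
        m = Fraction(sum(s), n)
        ss = sum((Fraction(x) - m) ** 2 for x in s)  # sum of squared deviations
        total_n += ss / n
        total_n1 += ss / (n - 1)
    k = len(samples)
    return pop_var, total_n / k, total_n1 / k


pop_var, avg_div_n, avg_div_n1 = exact_average_variances([2, 4, 6], n=2)
print(pop_var)     # 8/3
print(avg_div_n)   # 4/3, which is (n-1)/n = 1/2 of the true value
print(avg_div_n1)  # 8/3, exactly the true value
```

Individual samples such as (2, 2) or (2, 6) give variances far from 8/3 either way. It is only the average over all samples that comes out exactly right with n-1 and too small with n.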

    What the quoted passage is saying is that if you define the sample variance by dividing by (n-1), its average over many samples equals the population variance. That makes it more convenient for estimating the population variance, which is what we are after.

    We don't have to do it that way, it's a convention.
  6. Jun 25, 2012 #5
    Thanks for that Simon, I have a clearer understanding now.