
Difference between sample standard deviation and population standard deviation?

by NewtonianAlch
Tags: deviation, difference, population, sample, standard
NewtonianAlch
#1
Jun24-12, 12:17 AM
P: 440
1. The problem statement, all variables and given/known data

Just as the title suggests, although this is more to do with the formula. I know that a sample is a subset of a population. Why does the formula for the sample standard deviation divide by n-1, whereas the formula for the population standard deviation divides by the total number of elements in the population?
Simon Bridge
#2
Jun24-12, 01:55 AM
Homework
Sci Advisor
HW Helper
Thanks
P: 12,447
Short Answer:
The idea is to use the sample to estimate the population statistics.
Dividing by n-1 gives you a better (unbiased) estimator of the variance, and so a better estimate of the standard deviation.

Longer Answer:
The reason that n-1 is used instead of n in the formula for the sample variance is as follows: The sample variance can be thought of as a random variable, i.e. a function which takes on different values for different samples of the same distribution. Its use is as an estimate for the true variance of the distribution. In statistics, one typically does not know the true variance; one uses the sample variance to ESTIMATE the true variance. Since the sample variance is a random variable, it usually has a mean, or average value. One would hope that this average value is close to the actual value that the sample variance is estimating, i.e. close to the true variance. In fact, if the n-1 is used in the defining formula for the sample variance, then it is possible to prove that the average value of the sample variance EQUALS the true variance. If we replace the n-1 by an n, then the average value of the sample variance is ((n-1)/n) times as large as the true variance.

A random variable X which is used to estimate a parameter p of a distribution is called an unbiased estimator if the expected value of X equals p. Thus, using the n-1 gives an unbiased estimator of the variance of a distribution.
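
For readers who want to see where the (n-1)/n factor comes from, here is a sketch of the standard argument. The symbols are assumptions introduced for illustration and are not in the quoted passage: X_1, ..., X_n are independent draws from a distribution with mean \mu and variance \sigma^2, and \bar{X} is their sample mean.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2, \\
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= \sigma^2,
\qquad
E\!\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  = \frac{n-1}{n}\,\sigma^2.
\end{align*}
```

The second line uses Var(\bar{X}) = \sigma^2 / n, which is where independence of the draws is needed. The deviations are measured from \bar{X} rather than from the unknown \mu, and that costs exactly one \sigma^2 in the sum.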
NewtonianAlch
#3
Jun24-12, 02:17 AM
P: 440
OK, thanks for the response. I understand what is being said there in general. However I don't quite understand why n - 1 is still used. It's saying if n - 1 is used in the defining formula then it's possible to prove etc... I do not understand that part, can you give me a short numerical example?

Simon Bridge
#4
Jun24-12, 11:16 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 12,447
The usual exercise is to get the student to work out the distribution of sample variances.

But I think the confusion arises over the terms used, viz.: the "sample variance" is a technical term that does not quite mean the same thing as "the variance of the sample", but the "population variance" is the same thing as "the variance of the population".

The sample variance is an approximation to the population variance which is agreed upon by convention. The division by (n-1) gives a better approximation than the division by n (which would have given you the variance of the sample).
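
A small exact example may help here. This is only a sketch under stated assumptions: the toy population {1, 2, 3} and the sample size of 2 (drawn with replacement) are hypothetical choices made for illustration, and the variable names are mine. It lists every possible sample, computes both versions of the variance for each, and averages them.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population and sample size, chosen only for illustration.
population = [1, 2, 3]
n = 2


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return sum(values, Fraction(0)) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


# Population variance: divide by the number of elements in the population.
pop_var = sum_sq_dev(population) / len(population)

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average over all samples of the divide-by-n and divide-by-(n-1) variances.
avg_div_n = mean([sum_sq_dev(s) / n for s in samples])
avg_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])

print("population variance:     ", pop_var)
print("average of /n variances: ", avg_div_n)
print("average of /(n-1) ones:  ", avg_div_n_minus_1)
```

In this toy case the divide-by-(n-1) average matches the population variance exactly, while the divide-by-n average comes out at (n-1)/n of it.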

To see what they are doing: remember that the idea is to figure out what the population mean and variance are without actually polling the entire population. You could take a sample of 1000 out of a population of several million ... what can you say, in general, about the entire population, from such a small number?

You could find the mean and variance for the sample ... OK. But if you took another sample of 1000 tomorrow you will very likely get a different mean and variance from them.

If you take a lot of samples, and they are all random, then the central limit theorem gives you distributions of the sample means and of the sample variances which are, themselves, approximately normal when the samples are large.

If the population was normally distributed, then the mean of the means will get closer to the population mean as the number of samples increases, but the mean of the variances (of the sample, dividing by n) will settle at a value smaller than the population variance, by a factor of (n-1)/n.

You should be able to confirm that by working them out for just three or four random normal variables. You should know how to add random distributions by now.
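
If you would rather check this by simulation than by hand, a rough sketch follows. Everything specific in it is an assumption made for illustration: the normal population with mean 10 and standard deviation 2, the sample size of 5, the number of trials and the fixed seed.

```python
import random

# Assumed values, chosen only for illustration.
rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0    # population mean and standard deviation
n = 5                    # size of each sample
trials = 20000           # number of independent samples drawn

means = []
var_div_n = []
var_div_n_minus_1 = []

for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    means.append(m)
    var_div_n.append(ss / n)                # "variance of the sample"
    var_div_n_minus_1.append(ss / (n - 1))  # "sample variance"

mean_of_means = sum(means) / trials
mean_div_n = sum(var_div_n) / trials
mean_div_n_minus_1 = sum(var_div_n_minus_1) / trials

print("population variance:        ", sigma ** 2)
print("mean of sample means:       ", mean_of_means)
print("mean of divide-by-n:        ", mean_div_n)
print("mean of divide-by-(n-1):    ", mean_div_n_minus_1)
```

With these settings the mean of the sample means should land near 10, the divide-by-(n-1) average near the population variance of 4, and the divide-by-n average near (n-1)/n of that, about 3.2. Exact figures depend on the seed and the number of trials.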

What the passage quoted is saying is that if you define the sample variance to divide by (n-1) it is more convenient for estimating the population variance which is what we are after.

We don't have to do it that way, it's a convention.
NewtonianAlch
#5
Jun25-12, 02:37 AM
P: 440
Thanks for that Simon, I have a clearer understanding now.

