Difference between sample standard deviation and population standard deviation?

In summary, the formula for the sample variance uses n-1 instead of n because this gives a better estimator of the true variance of a distribution: with n-1, the average value of the sample variance equals the true variance, making it an unbiased estimator of the population variance. Dividing by n-1 is the agreed convention for estimating the population variance from a sample.
  • #1
NewtonianAlch

Homework Statement



Just as the title suggests, although this is more to do with the formula. I know that a sample is a subset of a population. Why does the formula for a sample divide by n-1, whereas for calculating the standard deviation of a population you divide by the total number of elements in it?
 
  • #2
Short Answer:
The idea is to use the sample to estimate the population statistics.
Dividing by n-1 gives you a better estimator for standard deviation.

Longer Answer:
The reason that n-1 is used instead of n in the formula for the sample variance is as follows. The sample variance can be thought of as a random variable, i.e. a function which takes on different values for different samples from the same distribution. Its use is as an estimate of the true variance of the distribution: in statistics one typically does not know the true variance, so one uses the sample variance to ESTIMATE it.

Since the sample variance is a random variable, it has a mean, or average value. One would hope that this average value is close to the actual value that the sample variance is estimating, i.e. close to the true variance. In fact, if the n-1 is used in the defining formula for the sample variance, then it is possible to prove that the average value of the sample variance EQUALS the true variance. If we replace the n-1 by an n, then the average value of the sample variance is ((n-1)/n) times as large as the true variance.

A random variable X which is used to estimate a parameter p of a distribution is called an unbiased estimator if the expected value of X equals p. Thus, using the n-1 gives an
unbiased estimator of the variance of a distribution.
 
  • #3
OK, thanks for the response. I understand what is being said there in general, but I don't quite understand why n - 1 is still used. It says that if n - 1 is used in the defining formula then it's possible to prove, etc. I do not understand that part. Can you give me a short numerical example?
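(A short numerical example of the kind being asked for — my own illustration, not from the thread. Take the tiny population {0, 2}, whose true variance is 1, and enumerate every possible sample of size 2 drawn with replacement. The divide-by-n formula averages to only half the true variance, while divide-by-(n-1) averages to it exactly.)

```python
# Tiny exact example (illustrative): population {0, 2}, true variance 1.
# Enumerate all size-2 samples drawn with replacement and average the two
# candidate variance formulas over every possible sample.
from itertools import product

population = [0, 2]
mu = sum(population) / len(population)                               # 1.0
true_var = sum((x - mu) ** 2 for x in population) / len(population)  # 1.0

div_n, div_n_minus_1 = [], []
for sample in product(population, repeat=2):     # (0,0), (0,2), (2,0), (2,2)
    m = sum(sample) / 2
    ss = sum((x - m) ** 2 for x in sample)
    div_n.append(ss / 2)           # divide by n
    div_n_minus_1.append(ss / 1)   # divide by n-1

print(sum(div_n) / 4)              # 0.5 -> underestimates the true variance 1
print(sum(div_n_minus_1) / 4)      # 1.0 -> equals the true variance
```

The four per-sample values make the ((n-1)/n) factor from the previous post concrete: 0.5 is exactly (2-1)/2 times the true variance.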
 
  • #4
The usual exercise is to get the student to work out the distribution of sample variances.

But I think the confusion arises over the terms used, viz.: the "sample variance" is a technical term that does not quite mean the same thing as "the variance of the sample", whereas the "population variance" is the same thing as "the variance of the population".

The sample variance is an approximation to the population variance which is agreed upon by convention. The division by (n-1) gives a better approximation than the division by n (which would have given you the variance of the sample).

To see what they are doing: remember that the idea is to figure out what the population mean and variance is without actually polling the entire population. You could take a sample of 1000 out of a population of several million ... what can you say, in general, about the entire population, from such a small number?

You could find the mean and variance for the sample ... OK. But if you took another sample of 1000 tomorrow you will very likely get a different mean and variance from them.

If you take a lot of samples, and they are all random, then the central limit theorem gives you a distribution of means and variances which are, themselves, approximately normal distributions.

If the population was normally distributed, then the mean of the means will get closer to the population mean as the number of samples increases, but the mean of the variances of the samples (each computed by dividing by n) will be smaller than the population variance.

You should be able to confirm that by working them out for just three or four random normal variables. You should know how to add random distributions by now.
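(A Monte Carlo sketch of that exercise — mine, with an assumed sample size of 4: repeatedly draw small samples of standard normal variables and compare the average of the sample means, and of the divide-by-n variances, against the population values.)

```python
# Sketch of the suggested exercise: many small samples of standard normals.
# The sample means cluster around the population mean, but the divide-by-n
# variances average only (n-1)/n of the true variance.
import random
import statistics

random.seed(1)
mu, sigma = 0.0, 1.0           # population: standard normal, true variance 1
n, trials = 4, 100_000         # small samples, as suggested above

means, vars_div_n = [], []
for _ in range(trials):
    s = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(s))
    vars_div_n.append(statistics.pvariance(s))  # "variance of the sample": /n

print(statistics.fmean(means))       # close to the population mean 0
print(statistics.fmean(vars_div_n))  # close to (n-1)/n = 0.75, not 1.0
```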

What the passage quoted is saying is that if you define the sample variance to divide by (n-1), it is more convenient for estimating the population variance, which is what we are after.

We don't have to do it that way, it's a convention.
 
  • #5
Thanks for that Simon, I have a clearer understanding now.
 

1. What is the difference between sample standard deviation and population standard deviation?

The sample standard deviation measures the spread of data in a sample drawn from a population, while the population standard deviation measures the spread of data in the entire population. Because a sample is only a subset, its standard deviation also serves as an estimate of the population standard deviation.

2. How are sample standard deviation and population standard deviation calculated?

The sample standard deviation is calculated by taking the square root of the sum of the squared differences between each data point and the mean of the sample, divided by the number of data points minus one. The population standard deviation is calculated by taking the square root of the sum of the squared differences between each data point and the mean of the population, divided by the total number of data points.
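These two formulas are what Python's standard `statistics` module implements as `stdev` (divide by n-1) and `pstdev` (divide by n); a quick check on some made-up data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, sum of squared deviations 32

# Population standard deviation: sqrt(32 / 8) = 2.0
print(statistics.pstdev(data))

# Sample standard deviation: sqrt(32 / 7), about 2.138
print(statistics.stdev(data))
```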

3. Can the sample standard deviation be used to make inferences about the population standard deviation?

Yes, the sample standard deviation can be used to estimate the population standard deviation, but it is not as accurate as using the population standard deviation itself. As the sample size increases, the sample standard deviation becomes closer to the population standard deviation.
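A small simulation (mine, not part of the FAQ) of that convergence, assuming a normal population with standard deviation 3: as the sample size grows, the sample standard deviation settles near the population value.

```python
# Draw ever-larger samples from N(0, 3^2) and watch the sample standard
# deviation approach the population standard deviation of 3.0.
import random
import statistics

random.seed(2)
sigma = 3.0
for n in (10, 100, 10_000):
    sample = [random.gauss(0, sigma) for _ in range(n)]
    print(n, statistics.stdev(sample))   # moves toward 3.0 as n grows
```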

4. When should I use sample standard deviation and when should I use population standard deviation?

You should use sample standard deviation when you only have data from a sample and want to understand the variability within that sample. You should use population standard deviation when you have data from an entire population and want to understand the variability within that population.

5. What are the implications of using the wrong standard deviation measure?

Using the wrong standard deviation measure can lead to inaccurate conclusions about the data. If the population formula (dividing by n) is applied to a sample, the variability of the population will tend to be underestimated. This can result in incorrect statistical analyses and incorrect decisions based on the data.
