Why Do We Subtract 1 from n When Calculating Standard Deviation?

Summary
Subtracting 1 from n when calculating the sample standard deviation, known as Bessel's correction, removes the bias in the variance estimate: it makes the expected value of the sample variance equal to the true population variance. The adjustment is needed because the sample mean is used in place of the unknown population mean, which causes the raw average of squared deviations to understate the variability, most noticeably in small samples. Squaring the differences between each data point and the mean keeps positive and negative deviations from cancelling and weights larger deviations more heavily, giving a usable measure of spread. Understanding these ideas matters for interpreting statistical results, because the behaviour of an estimator depends on the sample size and the context. The discussion also stresses that statistical methods are not universally applicable; the right choice depends on the specific problem being addressed.
caironotebook
...I understand why it becomes more and more insignificant as the value of n increases, but what is the purpose of subtracting 1 from n in the sample population in the first place? Also, what is the purpose of squaring the result of x values minus the mean?

I can do the problems. I'm just a bit confused about what each step means and where the numbers are coming from. Any help you could give me would be most appreciated...
 
caironotebook said:
...I understand why it becomes more and more insignificant as the value of n increases, but what is the purpose of subtracting 1 from n in the sample population in the first place? Also, what is the purpose of squaring the result of x values minus the mean?

I can do the problems. I'm just a bit confused about what each step means and where the numbers are coming from. Any help you could give me would be most appreciated...

It's called Bessel's correction, and it removes the bias of the variance estimator: without it, the expected value of the sample variance is systematically smaller than the population variance.

http://en.wikipedia.org/wiki/Bessel's_correction
 
Using n-1 makes the average of the estimated variance equal to the true variance. This is a result of the fact that you are estimating the mean from the sample. If you knew the true mean, then you would use n not n-1.
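To make that concrete, here is a short derivation (a standard textbook argument, assuming independent draws ##x_1,\dots,x_n## with mean ##\mu## and variance ##\sigma^2##, none of which appears explicitly in the thread) showing why dividing by n underestimates and why n-1 fixes it. Expanding the squares gives the identity

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 \;=\; \sum_{i=1}^{n}(x_i-\mu)^2 \;-\; n(\bar{x}-\mu)^2 .$$

Taking expectations, ##E\left[\sum_i (x_i-\mu)^2\right] = n\sigma^2## and ##E\left[n(\bar{x}-\mu)^2\right] = n\,\mathrm{Var}(\bar{x}) = \sigma^2##, so

$$E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = (n-1)\sigma^2
\quad\Longrightarrow\quad
E\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \sigma^2 .$$

Dividing by n instead would give an expected value of ##\frac{n-1}{n}\sigma^2##, which is too small for every finite n.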
 
Hey caironotebook and welcome to the forums.

It might help you understand the above comments by understanding the properties of estimators in statistics.

Good estimators have, among other things, the property that their expectation (think of the average of the estimator's distribution) equals the parameter being estimated, for every possible sample size and for every sample drawn from the distribution in question.

You should see that if this didn't hold (for example, if the expected value of the estimate changed with the sample size), you would get inconsistent results for different sample sizes and would constantly have to correct for it. By making sure the estimator targets the parameter properly, for different sample sizes among other things, we can trust it to do its job, which is to estimate the parameter.

There is a lot more to this issue and the above is by no means complete, but it hopefully gives you an idea of one problem that would occur if you used a biased estimator (in particular when the bias itself varies a lot over different sample sizes).
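A quick simulation makes this visible. The sketch below is illustrative only (the population, sample sizes, and use of NumPy are my own choices, not anything from the thread): it draws many samples of several sizes from a normal population with known variance and compares the average of the divide-by-n and divide-by-(n-1) estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0        # population variance (so sigma = 2)
trials = 100_000      # simulated samples per sample size

for n in (2, 5, 10, 50):
    samples = rng.normal(loc=0.0, scale=true_var ** 0.5, size=(trials, n))
    # ddof=0 divides by n, ddof=1 divides by n-1 (Bessel's correction)
    mean_biased = samples.var(axis=1, ddof=0).mean()
    mean_unbiased = samples.var(axis=1, ddof=1).mean()
    print(f"n={n:3d}  average /n estimate: {mean_biased:.3f}  "
          f"average /(n-1) estimate: {mean_unbiased:.3f}")
```

The /(n-1) column should hover near 4 for every sample size, while the /n column sits near ##\frac{n-1}{n}\times 4## and only approaches 4 as n grows.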
 
So I previously responded to this post and I apparently offended someone. My post was taken down and I was chastised for not knowing what I was talking about. The points that I was trying to make were:

a) Statistics is not axiomatic. There is not necessarily a basis for making one right choice. It depends on the problem at hand, the data available, and what you are trying to accomplish.

b) Specifically in the case of Bessel's correction, correcting for bias introduces instability in the measurement of variance. Whether that instability matters depends on what you are doing. I searched and found this good explanation:

http://www.johndcook.com/bias_consistency.html

This article contains a link to a very well known paper on the subject. Statistics is a tool. What is "good" for one case may not be "good" in another. While removing bias may seem intuitively to be a desirable goal, it may introduce another problem. It depends on what you are trying to achieve. I think that we do a disservice to the original poster by leading him to believe that there is one correct way to do statistics (although there are a lot of bad ways).
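One standard way to make that trade-off precise is the mean squared error decomposition (a textbook identity, not something taken from the linked article):

$$\operatorname{MSE}(\hat{\theta}) \;=\; E\big[(\hat{\theta}-\theta)^2\big] \;=\; \operatorname{Bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta}).$$

Dividing by n-1 drives the bias term to zero but enlarges the variance term; for normally distributed data it is a standard result that, among estimators of the form ##\frac{1}{c}\sum_i (x_i-\bar{x})^2##, the divisor c = n+1 actually minimizes the MSE even though c = n-1 removes the bias. Which term matters more depends, as Alan2 says, on what you are trying to achieve.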
 
caironotebook said:
Also, what is the purpose of squaring the result of x values minus the mean?

I look at it this way:

When you consider geometric shapes, scaling factors play a key role. For example, in "similar" triangles, the corresponding sides are in the same ratios, so, in theory, if you wanted to have reference picture for a family of similar triangles, you could draw a picture of one and scale it up or down to get pictures of all the others.

When you consider the most important probability distribution, the normal distribution, you are really considering a "family" of distributions, each with a different mean and/or variance. However, there is a way to look at all these distributions as having "similar" shape. In fact, if you pick a coordinate system so the mean of the distribution is at the origin and you set 1 "unit" on the axis equal to the standard deviation then all these distributions plot as the same curve. So if you need to compute something about "the" normal distribution, you only need one numerical table or computer subroutine that handles this single "standard" normal. The mean is like a translation parameter. The standard deviation is like a scaling factor.
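In symbols (standard notation, not from the thread): if ##X## has mean ##\mu## and standard deviation ##\sigma##, the change of variable

$$Z = \frac{X-\mu}{\sigma}$$

produces a variable with mean 0 and standard deviation 1, and if ##X## is normal then ##Z## is the standard normal ##N(0,1)##, which is the single curve that the tables and subroutines describe.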

For some non-normal distributions, the standard deviation is not the only scaling factor needed to "standardize" them, but the standard deviation is still useful in many other cases.

Your question is not about the population standard deviation. You are asking about the procedure for estimating the variance and standard deviation.

Most people accept the procedure for computing the mean of a sample as "natural", since it looks like a version of the procedure for computing the mean of the population. Thinking this way can handicap you in understanding statistics. Still, if you do think this way, and you believe what I said about the importance of the standard deviation (and hence of the variance), then it seems equally natural to take the mean of the squared deviations from the sample average, since that imitates the calculation you would do for the population standard deviation and variance. So maybe you can use this way of thinking as a crutch.

The truth is that there is no mathematical principle that says, "If you want to estimate a quantity computed for the population, then do the analogous computation on your sample." This sometimes works well and sometimes doesn't. The scenario you must understand is this: when you try to estimate a parameter of the population (e.g. the mean, variance, median, mode, etc.), you usually do a computation on the sample values. This computation is an "estimator" of the parameter. It is not necessarily equal to the population parameter. If we consider that a sample is a random vector of values, then an "estimator" is also a random variable. Thus "the sample mean" can itself be considered a random variable. The custom of omitting the phrase "estimator of" from things like "sample mean" and "sample variance" causes some confusion: in some contexts "sample mean" means a specific number like 23.5, while in others it is treated as a random variable whose distribution has properties of its own.
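To see the "an estimator is a random variable" point concretely, here is a small sketch (the population, sample size, and trial count are arbitrary choices for illustration) that treats the sample mean as something that varies from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(1)
pop_mean, pop_sd = 10.0, 3.0
n = 20            # size of each sample
trials = 10_000   # how many samples to draw

# Each row is one sample; each row's average is one realization of the
# "sample mean" viewed as a random variable.
sample_means = rng.normal(pop_mean, pop_sd, size=(trials, n)).mean(axis=1)

print("standard deviation of the sample mean:", sample_means.std(ddof=1))
print("theory (sigma / sqrt(n)):             ", pop_sd / n ** 0.5)
```

The printout shows the sample mean scattering around the population mean with its own spread, which is exactly the "distribution of the estimator" being discussed.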

The case of "sample variance" is interesting. Some books define it as computed by dividing by n and some define it as computed by dividing by n-1. It's a rather arbitrary definition, if you aren't claiming that the result is an estimator. If you want to get as specific as using the terminology "the usual unbiased estimator of the sample variance", this refers to computing it by dividing by n-1.

An intuitive justification for the n-1 is that in a small sample, extreme values are unlikely to appear, so you will tend to underestimate the variability of the population. Dividing by n-1 instead of n bumps the value up. (In the extreme case n = 1, the single deviation from the sample mean is always zero, so dividing by n would report no variability at all, no matter how spread out the population is.) Of course, this doesn't explain why you wouldn't use some other formula to bump it up; you'll have to study the actual calculation to understand that.

I agree with Alan2's point that statistics is not axiomatic. (I can't comment on his deleted post; I didn't see it.) I don't interpret this to mean that the mathematics of statistics isn't axiomatic; I interpret it to mean that applying statistics to real-world problems is a subjective matter. Furthermore, terminology such as "confidence interval" and "statistical significance" can mislead laymen, who usually interpret those terms as giving the answers they want about a situation rather than what the calculations actually mean.
 
The standard _A " operator" maps a Null Hypothesis Ho into a decision set { Do not reject:=1 and reject :=0}. In this sense ( HA)_A , makes no sense. Since H0, HA aren't exhaustive, can we find an alternative operator, _A' , so that ( H_A)_A' makes sense? Isn't Pearson Neyman related to this? Hope I'm making sense. Edit: I was motivated by a superficial similarity of the idea with double transposition of matrices M, with ## (M^{T})^{T}=M##, and just wanted to see if it made sense to talk...
