Why Do We Subtract 1 from n When Calculating Standard Deviation?

caironotebook · Feb 4, 2012

...I understand why it becomes more and more insignificant as the value of n increases, but what is the purpose of subtracting 1 from n in the sample population in the first place? Also, what is the purpose of squaring the result of x values minus the mean?

I can do the problems. I'm just a bit confused about what each step means and where the numbers are coming from. Any help you could give me would be most appreciated...

SW VandeCarr · Feb 4, 2012

caironotebook said:

...I understand why it becomes more and more insignificant as the value of n increases, but what is the purpose of subtracting 1 from n in the sample population in the first place? Also, what is the purpose of squaring the result of x values minus the mean?

I can do the problems. I'm just a bit confused about what each step means and where the numbers are coming from. Any help you could give me would be most appreciated...

It's called Bessel's correction and it corrects the bias of the variance estimator. Without it, the uncorrected sample variance (dividing by n) underestimates the population variance on average: its expected value is (n-1)/n times the population variance.

http://en.wikipedia.org/wiki/Bessel's_correction

mathman · Feb 5, 2012

Using n-1 makes the average of the estimated variance equal to the true variance. This is a result of the fact that you are estimating the mean from the sample. If you knew the true mean, then you would use n not n-1.
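The calculation behind this can be sketched as follows, assuming the x_i are n independent draws from a population with mean \mu and finite variance \sigma^2, and \bar{x} is the sample mean (these symbols are introduced here for illustration and do not appear in the posts above):

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2,\\
E\!\left[\sum_{i=1}^{n}(x_i-\mu)^2\right] &= n\sigma^2,
\qquad
E\!\left[n(\bar{x}-\mu)^2\right] = n\cdot\frac{\sigma^2}{n} = \sigma^2,\\
E\!\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] &= n\sigma^2-\sigma^2 = (n-1)\sigma^2 .
\end{aligned}
```

So the sum of squared deviations from the sample mean has to be divided by n-1 to average out to \sigma^2, while the sum of squared deviations from a known true mean \mu can be divided by n.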

chiro · Feb 5, 2012

Hey caironotebook and welcome to the forums.

It might help you understand the above comments by understanding the properties of estimators in statistics.

Good estimators have, among other things, the property that their expectation (think of the average of the estimator's distribution) equals the value of the parameter being estimated, for every distribution in the family under consideration and for every possible sample size.

You should see that if this didn't hold (say the mean value of the estimator changed with the sample size), then you would get inconsistent results for different sample sizes and you would have to keep correcting them. By ensuring that the estimator targets the parameter properly across, among other things, different sample sizes, we can use the estimator to do its job, which is to estimate a parameter.

There is a lot more to this issue and the above is by no means complete, but it hopefully gives you an idea of one problem that would occur if you had a biased estimator (in particular when the bias itself varies a lot across sample sizes).
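A small simulation can make the dependence on sample size visible. This is only a sketch under assumptions not in the thread: the population is standard normal (true variance 1), and the function name `average_estimates` and its parameters are made up for illustration.

```python
import random

def average_estimates(n, trials=20000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many samples of size n drawn from a standard normal
    population (true variance 1, an assumption of this sketch)."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the *sample* mean.
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials

if __name__ == "__main__":
    for n in (2, 5, 20):
        div_n, div_n_minus_1 = average_estimates(n)
        # Theory says the divide-by-n average should be near (n - 1) / n.
        print(n, div_n, div_n_minus_1, (n - 1) / n)
```

The divide-by-n average should land near (n-1)/n, so its shortfall changes with n, while the divide-by-(n-1) average should stay near 1 for every n.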

alan2 · Feb 10, 2012

So I previously responded to this post and I apparently offended someone. My post was taken down and I was chastised for not knowing what I was talking about. The points that I was trying to make were:

a) Statistics is not axiomatic. There is not necessarily a basis for making one right choice. It depends on the problem at hand, the data available, and what you are trying to accomplish.

b) Specifically in the case of Bessel's correction, correcting for bias introduces instability in the measurement of variance. Whether that instability matters depends on what you are doing. I searched and found this good explanation:

http://www.johndcook.com/bias_consistency.html

This article contains a link to a very well known paper on the subject. Statistics is a tool. What is "good" for one case may not be "good" in another. While removing bias may seem intuitively to be a desirable goal, it may introduce another problem. It depends on what you are trying to achieve. I think that we do a disservice to the original poster by leading him to believe that there is one correct way to do statistics (although there are a lot of bad ways).
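One way to read the trade-off described here is in terms of mean squared error. The sketch below assumes normally distributed data with true variance 1, which is an assumption of this example and not something stated in the post; the function name `mean_squared_errors` is hypothetical. For normal data, the divide-by-n estimator is biased but has the smaller mean squared error, because the n-1 version spreads out more.

```python
import random

def mean_squared_errors(n, trials=20000, seed=1, true_variance=1.0):
    """Estimate the mean squared error of the divide-by-n and
    divide-by-(n-1) variance estimators for normal samples of size n."""
    rng = random.Random(seed)
    sigma = true_variance ** 0.5
    sq_err_div_n = 0.0
    sq_err_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)
        # Squared distance of each estimate from the true variance.
        sq_err_div_n += (ss / n - true_variance) ** 2
        sq_err_div_n_minus_1 += (ss / (n - 1) - true_variance) ** 2
    return sq_err_div_n / trials, sq_err_div_n_minus_1 / trials

if __name__ == "__main__":
    mse_div_n, mse_div_n_minus_1 = mean_squared_errors(5)
    print("divide by n:    ", mse_div_n)
    print("divide by n - 1:", mse_div_n_minus_1)
```

Which estimator is preferable then depends on whether bias or overall error matters more for the problem at hand, which is the point being made above.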

Stephen Tashi · Feb 10, 2012

caironotebook said:

Also, what is the purpose of squaring the result of x values minus the mean?

I look at it this way:

When you consider geometric shapes, scaling factors play a key role. For example, in "similar" triangles, the corresponding sides are in the same ratios, so, in theory, if you wanted to have a reference picture for a family of similar triangles, you could draw a picture of one and scale it up or down to get pictures of all the others.

When you consider the most important probability distribution, the normal distribution, you are really considering a "family" of distributions, each with a different mean and/or variance. However, there is a way to look at all these distributions as having "similar" shape. In fact, if you pick a coordinate system so the mean of the distribution is at the origin and you set 1 "unit" on the axis equal to the standard deviation then all these distributions plot as the same curve. So if you need to compute something about "the" normal distribution, you only need one numerical table or computer subroutine that handles this single "standard" normal. The mean is like a translation parameter. The standard deviation is like a scaling factor.
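The "one table handles every normal distribution" idea can be illustrated with Python's standard library. This is a minimal sketch; the helper name `prob_below` and the numbers 100, 15 and 115 are arbitrary choices for the example.

```python
from statistics import NormalDist

def prob_below(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and
    standard deviation, computed from the standard normal only."""
    z = (x - mean) / sd          # shift by the mean, rescale by the sd
    return NormalDist().cdf(z)   # NormalDist() defaults to mean 0, sd 1

# Same probability computed two ways: directly, and via the standard normal.
direct = NormalDist(mu=100.0, sigma=15.0).cdf(115.0)
via_standard = prob_below(115.0, 100.0, 15.0)

if __name__ == "__main__":
    print(direct, via_standard)
```

Here 115 is exactly one standard deviation above the mean, so both values equal the standard normal probability of falling below z = 1.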

For some non-normal distributions, the standard deviation is not the only scaling factor needed to "standardize" them, but the standard deviation is still useful in many other cases.

Your question is not about the population standard deviation. You are asking about the procedure for estimating the variance and standard deviation.

Most people accept the procedure for computing the mean of a sample as "natural" since it appears to be a version of the procedure for computing the mean of the population. Thinking this way can handicap you in understanding statistics. If you do think this way and believe what I said about the importance of the standard deviation (and hence of the variance), then taking the mean of the squared deviations from the sample average will also seem "natural", since it imitates the calculation you would do for the population standard deviation and variance. So maybe you can think this way as a crutch!

The truth is that there is no mathematical principle that says "If you want to estimate the value of something computed for the population then do a similar computation on your sample." This sometimes works well and sometimes doesn't. The scenario you must understand is this: When you try to estimate a parameter of the population (e.g. the mean, variance, median, mode etc.) you usually do a computation on the sample values. This computation is an "estimator" of the parameter. It is not necessarily equal to the population parameter. If we consider that a sample is a random vector of values, then an "estimator" is also a random variable. Thus "the sample mean" can be considered a random variable. The custom of omitting the phrase "estimator of" from things like "sample mean", "sample variance" etc. results in some confusion. In some contexts "sample mean" will mean a specific number like 23.5. In other contexts "sample mean" will be considered a random variable and properties of its distribution will be discussed.

The case of "sample variance" is interesting. Some books define it as computed by dividing by n and some define it as computed by dividing by n-1. It's a rather arbitrary definition, if you aren't claiming that the result is an estimator. If you want to get as specific as using the terminology "the usual unbiased estimator of the population variance", this refers to computing it by dividing by n-1.
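Both conventions are available in Python's standard library, which makes the difference easy to see on a concrete sample. The data values below are invented for the example.

```python
import statistics

# Illustrative data only: the mean is 5.0 and the squared deviations
# from the mean add up to 32.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

divide_by_n = statistics.pvariance(data)          # 32 / 8
divide_by_n_minus_1 = statistics.variance(data)   # 32 / 7

if __name__ == "__main__":
    print("dividing by n:    ", divide_by_n)
    print("dividing by n - 1:", divide_by_n_minus_1)
```

`statistics.pvariance` treats the data as the whole population and divides by n; `statistics.variance` treats it as a sample and divides by n-1.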

An intuitive justification for the n-1 is that if you take a small sample where extremes are improbable then you will tend to underestimate the variability of the population, since you are unlikely to pick up the extremes in your sample. So you divide by n-1 instead of n to bump up the value. (Of course this doesn't explain why you wouldn't use some other formula to bump it up. You'll have to study the real calculation to understand that.)

I agree with alan2's point that statistics is not axiomatic. (I can't comment on his deleted post; I didn't see it.) I don't interpret this to mean that the mathematics of statistics isn't axiomatic. I interpret it to mean that applying statistics to real world problems is a subjective matter. Further, terminology used in statistics, such as "confidence intervals" and "statistical significance", can be misleading to laymen. They usually interpret such terms to be the answers they want to know about a situation rather than what such calculations actually mean.
