Bessel's correction and degrees of freedom

chiropter
Wondering about degrees of freedom. So basically let me express my current understanding:
Comparing the statistical errors to the residuals: the former are the differences between a subset of observations from a population and the population mean, while the latter are the differences between the observations and the sample mean calculated from that same subset. The differences between a sample of observations and the population mean obviously do not have to sum to 0, while the differences between the observations and the sample mean always sum to 0.
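
A minimal sketch of that distinction in Python. The population mean `mu`, the spread, the sample size, and the seed are all made-up values chosen only for illustration:

```python
import random

random.seed(0)
mu = 10.0  # population mean; assumed known here only so the errors can be computed
sample = [random.gauss(mu, 2.0) for _ in range(5)]
xbar = sum(sample) / len(sample)

errors = [x - mu for x in sample]       # deviations from the population mean
residuals = [x - xbar for x in sample]  # deviations from the sample mean

print(sum(errors))     # generally nonzero
print(sum(residuals))  # zero up to floating-point rounding

# The last residual is pinned down by the others: it must cancel their sum.
print(residuals[-1], -sum(residuals[:-1]))
```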

Thus, if you are trying to estimate the 'average' difference between the population mean and the values in the population, using sample residuals (which are derived from a set of sample observations and their sample mean) means that the last 'residual' is NOT free to vary over the 'full range' that a true statistical error could; it just has to be whatever value makes the sum add up to 0.

So, if this is correct so far, it seems pretty straightforward to then claim that final sample residual won't contribute the required quantity to the sum of (squared) residuals that would be needed to estimate the 'average' of n (squared) residuals, and so instead, we redistribute that final quantity to all the other residuals and divide the sum by n-1 instead. So it's like you don't actually have n bits of variability from which you can estimate the errors (variance).

Ok, if that makes sense so far, my question is, why does n-1 take care of it? Why is it as if we have exactly one fewer residual with which to estimate the population variance, so that we take the remaining amount of 'variability' and distribute it equally among the other residuals before dividing by the number of things to get the average (my interpretation of dividing by n-1)? Why isn't it n-2 or 0.5n?
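
One way to probe the question empirically is a small simulation. This is only a sketch under assumed conditions (independent normal draws with variance 4, sample size 5; the function name `mean_variance_estimate` is made up for this example). It averages the sum of squared residuals divided by different candidate divisors over many samples. Since the expected sum of squared residuals is (n-1) times the population variance, only the n-1 divisor should average out near 4:

```python
import random

def mean_variance_estimate(n, divisor, trials=20000, sigma=2.0, seed=1):
    """Average, over many samples of size n, of sum((x - xbar)^2) / divisor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared residuals
        total += ss / divisor
    return total / trials

n = 5
for label, divisor in [("n", n), ("n-1", n - 1), ("n-2", n - 2)]:
    # True variance is sigma**2 = 4.0; compare each average against it.
    print(label, mean_variance_estimate(n, divisor))
```

Dividing by n tends to land below the true variance and dividing by n-2 above it, which is what "biased" means here.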

It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.
 
chiropter said:
It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.

I've read that people had the same outlook in the early days of statistics. They valued methods of estimating population properties from samples that used calculations analogous to how the population properties are computed from probability densities. (For example, the sample mean is the sum of the sample values, each weighted by its relative frequency of occurrence. The population mean is the sum of the population values, each weighted by its probability, or the integral of each value with respect to the probability density.) When a method of estimating a population parameter from a sample was analogous to the definition of how the parameter is computed from a probability density, the method of estimating was called "consistent". (This is not the modern definition of a "consistent" estimator.)

I suppose it's human nature to hope that a strong analogy between two processes will always show us the "best" way to do things. However, consider the following problem:

A random variable X is known to have a probability distribution given by P(X = k - 1/2) = 1/2, P(X = k + 1) = 1/2, where k is some unknown integer. If you have the set of sample values {3 1/2, 5, 3 1/2, 3 1/2}, how should you estimate the mean of X?

The population mean of X, which is 4 1/4, can be deduced from the values that occur in the sample: the observation 5 is an integer, so it must be k + 1, which gives k = 4 and a mean of k + 1/4 = 4 1/4. Using the sample mean as the estimate wouldn't be "best". The process of deducing the population mean from the sample can be described as a computer algorithm, which implements various "if...then..." statements. From the modern point of view of a "function", such a computer algorithm is as much a function as something written as a single algebraic expression.
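
A sketch of such an "if...then..." algorithm in Python. The function name `deduce_mean` is made up for this example, and the last sample value is read as 3 1/2:

```python
from fractions import Fraction

def deduce_mean(sample):
    """Deduce the mean of X when P(X = k - 1/2) = P(X = k + 1) = 1/2, k an unknown integer."""
    candidates = set()
    for value in sample:
        v = Fraction(value)
        if v.denominator == 1:
            # An integer observation can only be k + 1.
            candidates.add(v - 1)
        elif v.denominator == 2:
            # A half-integer observation can only be k - 1/2.
            candidates.add(v + Fraction(1, 2))
        else:
            raise ValueError(f"{v} cannot occur under this distribution")
    if len(candidates) != 1:
        raise ValueError("sample does not determine a single k")
    k = candidates.pop()
    # Mean = ((k - 1/2) + (k + 1)) / 2 = k + 1/4.
    return k + Fraction(1, 4)

sample = [Fraction(7, 2), 5, Fraction(7, 2), Fraction(7, 2)]
print(deduce_mean(sample))        # 17/4, i.e. 4 1/4
print(sum(sample) / len(sample))  # 31/8, the sample mean, which misses the true value
```

Under this distribution a single observation already determines k, so the deduced value is exact while the sample mean still fluctuates from sample to sample.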

You must open your mind to the complexity of the general scenario for estimation. When you consider estimating a population parameter from samples of a random variable X, don't think of X as "the" random variable in the problem. The values in the sample are themselves random variables, even if they are independent realizations of X. When you compute some function S of the random variables in the sample, you obtain another random variable, which need not have the same distribution as X. Such a random variable is called a "statistic". The random variable S has its own population mean, variance, etc., which need not be the same as the corresponding population mean, variance, etc. of X.
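
To make that concrete, here is a small simulation sketch that takes S to be the sample mean. The normal distribution, the sample size of 4, and the seed are assumptions picked only for illustration. The variance of S comes out close to the variance of X divided by n, not to the variance of X itself:

```python
import random
import statistics

rng = random.Random(2)
n, trials, sigma = 4, 20000, 3.0

# Each trial draws a fresh sample of size n and records the statistic S = sample mean.
sample_means = [
    statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
    for _ in range(trials)
]

var_x = sigma ** 2                          # variance of X itself
var_s = statistics.pvariance(sample_means)  # empirical variance of the statistic S

print(var_x)  # variance of X
print(var_s)  # should be near var_x / n
```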


The two common uses of statistics are 1) to test a hypothesis 2) to estimate something - often a parameter of the population distribution.

When we use a statistic to estimate something, we naturally call it an "estimator". An estimator is thus a random variable. If we have two different estimators for a population parameter, how do we decide which one is "best"? How do we even define "best" mathematically? I don't know whether your studies in statistics have gotten to that question yet.
 
So is what I'm saying in the second and third paragraphs off-base?
 
I wouldn't say your 2nd and 3rd paragraphs are "wrong". I'd say they are not clear enough to be mathematical arguments. You can find many threads on the forum that give a mathematical derivation of why the customary "unbiased" estimator of the population variance of a normal distribution uses n-1 in the denominator. (If you can't find any by searching on the keywords "unbiased", "estimator", "variance", then let me know and I'll look one up for you.)

"Degrees of freedom" means various things in various contexts. It usually has to do with situations where some given equations have solutions that can be formed by making "arbitrary" choices for the values of a certain number of variables. There are many possible sets of equations. There isn't a single simple verbal argument that explains the mathematics of all these situations.
 
Thanks, Stephen. I have already read and understood that the expectation of the uncorrected sample variance (the one that divides by n) equals the population variance multiplied by (n-1)/n, so multiplying by n/(n-1), which is equivalent to dividing by (n-1) instead of n in the formula for the sample variance, removes the bias.
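
For reference, a sketch of that calculation, assuming the observations $x_1, \dots, x_n$ are independent with common mean $\mu$ and finite variance $\sigma^2$, and writing $\bar{x}$ for the sample mean (these symbols are introduced here only for the sketch):

$$\begin{align}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \nonumber \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \nonumber
\end{align}$$

The second line uses $E[(x_i - \mu)^2] = \sigma^2$ and $E[(\bar{x} - \mu)^2] = \sigma^2/n$. The term that measuring from $\bar{x}$ instead of $\mu$ removes has expectation exactly $\sigma^2$, one observation's worth, which is why the divisor that makes the estimate unbiased is n-1 rather than n-2 or 0.5n.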
 