Bessel's correction and degrees of freedom

Discussion Overview

The discussion revolves around the concept of degrees of freedom in statistics, particularly in relation to Bessel's correction and the calculation of sample variance. Participants explore the implications of using sample means versus population means and the reasoning behind using n-1 instead of n in variance calculations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant describes the difference between statistical errors and residuals, noting that the last residual does not vary freely, which leads to the need for Bessel's correction.
  • Another participant reflects on the historical context of statistical estimation methods and the analogy between sample and population properties.
  • A question is raised about why n-1 is used in the calculation of sample variance, with a desire for clarity on the reasoning behind this choice.
  • One participant suggests that the second and third paragraphs of another's post lack clarity as mathematical arguments but does not label them as incorrect.
  • Another participant acknowledges understanding that the expectation of the sample variance equals the population variance multiplied by n/(n-1), reinforcing the rationale for using n-1.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and clarity regarding the concept of degrees of freedom and the use of n-1 in variance calculations. There is no consensus on the best way to articulate the mathematical reasoning behind these concepts, and some participants seek further clarification.

Contextual Notes

Participants note that "degrees of freedom" can have different meanings in various contexts, and there is an acknowledgment that a single verbal argument may not suffice to explain the mathematics involved in all situations.

chiropter
Wondering about degrees of freedom. So basically let me express my current understanding:
Comparing statistical errors to residuals: the former involves taking a subset of observations from a population and comparing them to the population mean, while the latter involves comparing the observations to the sample mean calculated from that same subset. The differences between a sample of observations and the population mean obviously do not have to sum to 0, while the differences between the observations and the sample mean do sum to 0.
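The zero-sum property described above is easy to check numerically. A minimal sketch (the variable names are my own, not from the thread): residuals about the sample mean always sum to zero, while "errors" about the population mean generally do not.

```python
# Residuals about the sample mean sum to zero by construction;
# errors about the population mean need not.
import random

random.seed(0)
population_mean = 10.0
sample = [random.gauss(population_mean, 2.0) for _ in range(5)]
sample_mean = sum(sample) / len(sample)

errors = [x - population_mean for x in sample]   # sum is generally nonzero
residuals = [x - sample_mean for x in sample]    # sum is zero (up to rounding)

print(round(sum(residuals), 10))
print(sum(errors))
```

Because the residuals are forced to sum to zero, knowing any n-1 of them determines the last one, which is exactly the constraint the post is describing.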

Thus, if you are trying to estimate the 'average' difference between the population mean and the set of values in the population, using sample residuals (which are derived from a set of sample observations and a sample mean) means the last 'residual' is NOT free to vary over the 'full range' a true statistical error would; it has to be whatever value makes the residuals sum to 0.

So, if this is correct so far, it seems pretty straightforward to then claim that the final sample residual won't contribute the quantity to the sum of (squared) residuals that would be needed to estimate the 'average' of n (squared) residuals, so instead we redistribute that final quantity over the other residuals and divide the sum by n-1. It's as if you don't actually have n bits of variability from which to estimate the errors (variance).

Ok, if that makes sense so far, my question is: why does n-1 take care of it? Why is it as if we have exactly one fewer residual with which to estimate the population variance, where we take the remaining amount of 'variability', distribute it equally among the other residuals, and then divide by the number of things to get the average (my interpretation of dividing by n-1)? Why isn't it n-2 or 0.5n?

It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.
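As a check on why the correction factor is exactly n-1 and not n-2 or 0.5n, one can simulate: average the "divide by n" and "divide by n-1" variance estimates over many samples and compare them to the true variance. A hedged sketch (the setup and names are my own, not from the thread):

```python
# Monte Carlo comparison of the biased (divide-by-n) and Bessel-corrected
# (divide-by-(n-1)) sample variance estimators.
import random

random.seed(1)
n, trials = 5, 200_000
sigma2 = 4.0  # true population variance

sum_biased = sum_unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    sum_biased += ss / n          # divide by n
    sum_unbiased += ss / (n - 1)  # divide by n - 1

print(sum_biased / trials)    # close to sigma2 * (n-1)/n = 3.2
print(sum_unbiased / trials)  # close to sigma2 = 4.0
```

The divide-by-n average lands near sigma2 * (n-1)/n, not near sigma2 * (n-2)/n or sigma2 / 2, which is why n-1 (and not some other correction) removes the bias.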
 
chiropter said:
It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.

I've read that people had the same outlook in the early days of statistics. They valued methods of estimating population properties from samples that used calculations analogous to how the population properties are computed from probability densities. (For example, the sample mean is the sum of the sample values, each weighted by its frequency of occurrence; the population mean is the sum of the population values, each weighted by its probability density, or the integral over the values with respect to the probability density.) When a method of estimating a population parameter from a sample was analogous to the definition of how the parameter is computed from a probability density, the method of estimation was called "consistent". (This is not the modern definition of a "consistent" estimator.)

I suppose it's human nature to hope that a strong analogy between two processes will always show us the "best" way to do things. However, consider the following problem:

A random variable X is known to have a probability distribution given by P(X = k - 1/2) = 1/2 and P(X = k + 1) = 1/2, where k is some unknown integer. If you have the set of sample values {3 1/2, 5, 3 1/2, 3 1/2}, how should you estimate the mean of X?

The population mean of X, which is 4 1/4, can be deduced from the values that occur in the sample, so using the sample mean as the estimate wouldn't be "best". The process of deducing the population mean from the sample can be described as a computer algorithm that implements various "if...then..." statements. From the modern point of view of a "function", such a computer algorithm is as much a function as something written as a single algebraic expression.
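The "if...then" estimator described above can be made concrete. A sketch, with a hypothetical helper name of my own: for this distribution, any single observation reveals k exactly (a half-integer value must be k - 1/2; an integer value must be k + 1), so the population mean k + 1/4 can be deduced rather than approximated by the sample mean.

```python
# An estimator that is an algorithm, not an algebraic formula.
# X takes the value k - 1/2 or k + 1, each with probability 1/2,
# for an unknown integer k.
from fractions import Fraction

def deduce_mean(sample):
    """Deduce the population mean k + 1/4 from any one observation."""
    x = Fraction(sample[0])
    if x.denominator == 2:       # observed value is k - 1/2
        k = x + Fraction(1, 2)
    else:                        # observed value is k + 1
        k = x - 1
    return k + Fraction(1, 4)

# The sample from the post: {3 1/2, 5, 3 1/2, 3 1/2}
sample = [Fraction(7, 2), Fraction(5), Fraction(7, 2), Fraction(7, 2)]
print(deduce_mean(sample))  # 17/4, i.e. 4 1/4
```

Note the ordinary sample mean of this sample is 3 7/8, so the "analogous" estimator is strictly worse here than the deduction.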

You must open your mind to the complexity of the general scenario for estimation. When you consider estimating a population parameter from samples of a random variable X, don't think of X as "the" random variable in the problem. The values in the sample are random variables, even if they are independent realizations of X. When you compute some function S of the random variables in the sample, you obtain another random variable, which need not have the same distribution as X. Such a random variable is called a "statistic". The random variable S has its own population mean, variance, etc., which need not be the same as the corresponding population mean, variance, etc. of X.
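The point that a statistic has its own distribution can be illustrated with the sample mean itself, which is a standard fact: the mean of n independent draws from X has variance sigma^2 / n, not sigma^2. A small sketch with my own variable names:

```python
# The sample mean is itself a random variable: its variance is
# sigma^2 / n, smaller than the variance sigma^2 of X.
import random

random.seed(2)
n, trials, sigma2 = 4, 100_000, 9.0

means = []
for _ in range(trials):
    sample = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    means.append(sum(sample) / n)

grand = sum(means) / trials
var_of_mean = sum((m - grand) ** 2 for m in means) / trials
print(var_of_mean)  # close to sigma2 / n = 2.25
```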


The two common uses of statistics are 1) to test a hypothesis 2) to estimate something - often a parameter of the population distribution.

When we use a statistic to estimate something, we naturally call it an "estimator". An estimator is thus a random variable. If we have two different estimators for a population parameter, how do we decide which one is "best"? How do we even define "best" mathematically? I don't know whether your studies in statistics have gotten to that question yet.
 
So is what I'm saying in the second and third paragraphs off-base?
 
I wouldn't say your 2nd and 3rd paragraphs are "wrong". I'd say they are not clear enough to be mathematical arguments. You can find many threads on the forum that give a mathematical derivation of why the customary "unbiased" estimator of the population variance of a normal distribution uses n-1 in the denominator. (If you can't find any searching on the keywords "unbiased", "estimator", "variance", then let me know and I'll look up one for you.)

"Degrees of freedom" means various things in various contexts. It usually has to do with situations where some given equations have solutions that can be formed by making "arbitrary" choices for the values of a certain number of variables. There are many possible sets of equations. There isn't a single simple verbal argument that explains the mathematics of all these situation.
 
Thanks, Stephen. I have already read and understand that the expectation of the divide-by-n sample variance equals the population variance multiplied by (n-1)/n, so correcting by the factor n/(n-1), which is equivalent to dividing by (n-1) instead of n in the formula for the sample variance, removes the bias.
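For reference, the standard derivation behind that factor can be written out in a few lines. Writing the population mean as \(\mu\) and variance as \(\sigma^2\), and using \(E[X_i^2] = \sigma^2 + \mu^2\) and \(E[\bar{X}^2] = \sigma^2/n + \mu^2\):

```latex
\begin{aligned}
E\left[\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2\right]
  &= E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] \\
  &= n\left(\sigma^2+\mu^2\right) - n\left(\frac{\sigma^2}{n}+\mu^2\right) \\
  &= (n-1)\,\sigma^2 .
\end{aligned}
```

Dividing the sum of squared residuals by n-1 therefore gives an estimator whose expectation is exactly \(\sigma^2\), which is why the correction is n-1 and not n-2 or 0.5n.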
 
