Bessel's correction and degrees of freedom

chiropter · Feb 28, 2014

Wondering about degrees of freedom. So basically let me express my current understanding:
Comparing the statistical errors to the residuals: the former are the differences between a sample of observations from a population and the population mean, while the latter are the differences between those observations and the sample mean calculated from the same sample. The differences between a sample of observations and the population mean obviously do not have to sum to 0, while the differences between the observations and the sample mean do sum to 0.
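A small numerical check of the distinction between errors and residuals, as a sketch only. The population mean of 10 and the five sample values are made-up numbers, not anything from the thread.

```python
# Hypothetical numbers chosen only for illustration.
population_mean = 10.0                     # assumed known for this demo
sample = [9.1, 11.4, 10.3, 12.0, 8.7]

sample_mean = sum(sample) / len(sample)

# Statistical errors: deviations from the population mean.
errors = [x - population_mean for x in sample]
# Residuals: deviations from the sample mean of the same observations.
residuals = [x - sample_mean for x in sample]

print("sum of errors:   ", sum(errors))     # need not be 0
print("sum of residuals:", sum(residuals))  # 0 up to floating-point rounding
```

Once any four of the residuals are known, the fifth is fixed by the requirement that they sum to 0. The errors have no such constraint.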

Thus, if you are trying to estimate the 'average' of the differences between the population mean and the values in the population, using sample residuals (which are derived from a set of sample observations and their sample mean) means the last 'residual' is NOT free to vary over the 'full range' that a true statistical error would. It just has to be whatever value makes the sum add up to 0.

So, if this is correct so far, it seems pretty straightforward to then claim that final sample residual won't contribute the required quantity to the sum of (squared) residuals that would be needed to estimate the 'average' of n (squared) residuals, and so instead, we redistribute that final quantity to all the other residuals and divide the sum by n-1 instead. So it's like you don't actually have n bits of variability from which you can estimate the errors (variance).

Ok, if that makes sense so far, my question is, why does n-1 take care of it? Why is it as if we have exactly one fewer residual with which to estimate the population variance, so that we take the remaining amount of 'variability' and distribute it equally among the other residuals before dividing by the number of things to get the average (my interpretation of dividing by n-1)? Why isn't it n-2 or 0.5n?
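One way to probe the "why not n-2 or 0.5n" question numerically is a simulation. The sketch below assumes normally distributed data with variance 4 and samples of size 5. The function name `average_scaled_ss` and all parameter values are invented for this illustration.

```python
import random

def average_scaled_ss(n, divisor, trials=20000, sigma=2.0, seed=0):
    """Average of sum((x - sample_mean)**2) / divisor over many samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        total += sum((x - m) ** 2 for x in xs) / divisor
    return total / trials

n = 5
true_variance = 2.0 ** 2
# Candidate divisors from the question: n, n-1, n-2 and 0.5n.
for label, divisor in [("n", n), ("n-1", n - 1), ("n-2", n - 2), ("0.5n", 0.5 * n)]:
    print(label, round(average_scaled_ss(n, divisor), 3), "vs true", true_variance)
```

Under these assumptions, only the n-1 divisor should average out close to the true variance. Dividing by n should come out too low, and dividing by n-2 or 0.5n too high.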

It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.

Stephen Tashi · Mar 1, 2014

chiropter said:

It bothers me that something as intuitive as calculating the average, or dividing, or summing can suddenly become so cryptic when we come to sample statistics. Hoping to resolve some of this confusion.

I've read that people had the same outlook in the early days of statistics. They valued methods of estimating population properties from samples that used calculations analogous to how the population properties were computed from probability densities. (For example, the sample mean is the sum of the sample values, each weighted by its frequency of occurrence. The population mean is the sum of the population values, each weighted by its probability density, or the integral over each value with respect to the probability density.) When a method of estimating a population parameter from a sample was analogous to the definition of how the parameter is computed from a probability density, the method of estimating was called "consistent". (This is not the modern definition of a "consistent" estimator.)

I suppose it's human nature to hope that a strong analogy between two processes will always show us the "best" way to do things. However, consider the following problem:

A random variable X is known to have a probability density given by P(X=k-1/2) = 1/2, P(X=k+1) = 1/2 where k is some unknown integer. If you have the set of sample values { 3 1/2, 5, 3 1/2, 3 1/2 } how should you estimate the mean of X?

The population mean of X, which is 4 1/4, can be deduced from the values that occur in the sample. Using the sample mean as the estimate wouldn't be "best". The process of deducing the population mean from the sample can be described as a computer algorithm, which implements various "if...then..." statements. From the modern point of view of a "function", such a computer algorithm is as much a function as something written as a single algebraic expression.
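The "if...then..." algorithm described above might look something like the following sketch. The function name `deduce_population_mean` is made up for this illustration, and exact fractions are used so that 3 1/2 and 4 1/4 are represented without rounding.

```python
from fractions import Fraction

def deduce_population_mean(sample):
    """Deduce the mean of X, where P(X = k - 1/2) = P(X = k + 1) = 1/2
    for some unknown integer k, from the values seen in the sample."""
    candidates = set()
    for value in sample:
        v = Fraction(value)
        if v.denominator == 1:
            # An integer value can only be k + 1.
            candidates.add(v - 1)
        elif v.denominator == 2:
            # A half-integer value can only be k - 1/2.
            candidates.add(v + Fraction(1, 2))
        else:
            raise ValueError(f"{value} cannot come from this distribution")
    if len(candidates) != 1:
        raise ValueError("sample values do not agree on a single k")
    k = candidates.pop()
    # Mean = (1/2)(k - 1/2) + (1/2)(k + 1) = k + 1/4.
    return k + Fraction(1, 4)

sample = [3.5, 5, 3.5, 3.5]
print("deduced mean:", deduce_population_mean(sample))   # 17/4, i.e. 4 1/4
print("sample mean: ", sum(sample) / len(sample))        # 3.875
```

A single observation is already enough to pin down k here, so the deduced mean is exact, while the sample mean is not.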

You must open your mind to the complexity of the general scenario for estimation. When you consider estimating a population parameter from samples of a random variable X, don't think of X as "the" random variable in the problem. The values in the sample are random variables, even if they are independent realizations of X. When you compute some function S of the random variables in the sample, you obtain another random variable, which need not have the same distribution as X. Such a random variable is called a "statistic". The random variable S has its own population mean, variance, etc., which need not be the same as the corresponding population mean, variance, etc. of X.
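To make the point that a statistic has its own distribution concrete, here is a rough simulation. It takes X to be standard normal and S to be the sample mean of 5 observations. Both choices are assumptions made only for illustration.

```python
import random
import statistics

rng = random.Random(1)
n, trials = 5, 20000

# X is taken to be standard normal (mean 0, variance 1) for this demo.
sample_means = []
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sample_means.append(sum(xs) / n)   # one realization of the statistic S

var_of_S = statistics.pvariance(sample_means)
print("variance of X: 1.0")
print("variance of S:", round(var_of_S, 3), "(theory: 1/n =", 1 / n, ")")
```

The simulated variance of S should land near 1/n rather than 1, so S is a random variable with a different distribution from X even though it is built entirely from realizations of X.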

The two common uses of statistics are 1) to test a hypothesis 2) to estimate something - often a parameter of the population distribution.

When we use a statistic to estimate something, we naturally call it an "estimator". An estimator is thus a random variable. If we have two different estimators for a population parameter, how do we decide which one is "best"? How do we even define "best" mathematically? I don't know whether your studies in statistics have gotten to that question yet.

chiropter · Mar 1, 2014

So is what I'm saying in the second and third paragraphs off-base?

Stephen Tashi · Mar 1, 2014

I wouldn't say your 2nd and 3rd paragraphs are "wrong". I'd say they are not clear enough to be mathematical arguments. You can find many threads on the forum that give a mathematical derivation of why the customary "unbiased" estimator of the population variance of a normal distribution uses n-1 in the denominator. (If you can't find any by searching on the keywords "unbiased", "estimator", "variance", then let me know and I'll look one up for you.)

"Degrees of freedom" means various things in various contexts. It usually has to do with situations where some given equations have solutions that can be formed by making "arbitrary" choices for the values of a certain number of variables. There are many possible sets of equations. There isn't a single simple verbal argument that explains the mathematics of all these situations.

chiropter · Mar 3, 2014

Thanks, Stephen. I have already read and understand that the expectation of the sample variance computed with n in the denominator equals the population variance multiplied by (n-1)/n, so that multiplying by n/(n-1) removes the bias, which is equivalent to dividing by (n-1) instead of (n) in the formula for calculating the sample variance.
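For reference, the expectation calculation referred to here can be sketched in a few lines. The notation is an assumption of this sketch: X_1, ..., X_n are independent observations with population mean \mu and population variance \sigma^2, and \bar{X} is their sample mean. The middle step uses Var(\bar{X}) = \sigma^2 / n.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2, \\
E\!\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= \frac{n-1}{n}\,\sigma^2,
\qquad
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2.
\end{align*}
```

The single subtracted term n(\bar{X} - \mu)^2 has expectation exactly \sigma^2, which is why the divisor drops by exactly one rather than by two or by some fraction of n.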
