Estimator for variance when sampling without replacement

logarithmic · Feb 22, 2011

Does anyone know the formula for an unbiased estimator of the population variance \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 when taking r samples without replacement from a finite population \{x_1, \dots, x_n\} whose mean is \bar{x}?

A Google search doesn't turn up anything useful beyond the special case r = n, where the estimator is of course \frac{r-1}{r}s^2, with s^2 = \frac{1}{r-1}\sum_{i=1}^{r}(x_i - \bar{x})^2, which is the unbiased estimator when taking r samples with replacement.

I know that a (relatively) simple formula exists; I've seen it somewhere before, I just don't remember where.

MaxManus · Feb 22, 2011

Is it the hypergeometric distribution?
http://en.wikipedia.org/wiki/Hypergeometric_distribution

logarithmic · Feb 23, 2011

MaxManus said:

Is it the hypergeometric distribution?
http://en.wikipedia.org/wiki/Hypergeometric_distribution

Not that one. That's the distribution for the number of black balls drawn without replacement from a box with black and white balls. It isn't an estimator for the population variance.

MaxManus · Feb 23, 2011

logarithmic said:

Not that one. That's the distribution for the number of black balls drawn without replacement from a box with black and white balls. It isn't an estimator for the population variance.

But isn't that the distribution you described? And isn't the variance formula in the table on the right an estimator?

logarithmic · Feb 23, 2011

MaxManus said:

But isn't that the distribution you described? And isn't the variance formula in the table on the right an estimator?

Not really. I'm looking for an estimator, that is, a function of the samples f(X_1, \dots, X_r), which is itself a random variable, such that E(f(X_1, \dots, X_r)) equals the true population variance.

That variance formula isn't a random variable (so it can't be an estimator); it's the variance of a certain random variable that counts. But a hypergeometric random variable isn't appropriate for counting the samples, since the number of samples is assumed to be fixed at r.

uart · Feb 25, 2011

logarithmic said:

Does anyone know the formula for an unbiased estimator of the population variance \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 when taking r samples without replacement from a finite population \{x_1, \dots, x_n\} whose mean is \bar{x}?

Hi logarithmic. With that definition of \bar{x} (which is actually the definition of the population mean), the unbiased estimator of the population variance is simply

\frac{1}{r}\sum_{i=1}^{r}(x_i - \bar{x})^2

However, I think you really meant \bar{x} to denote the sample mean of the r chosen samples rather than the population mean, in which case the unbiased estimator is

\left(\frac{n-1}{n}\right) \, \frac {1}{r-1}\sum_{i=1}^{r}(x_i - \bar{x})^2
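
Both formulas can be checked by brute force on a small population. The sketch below is not part of the original posts. It enumerates every size-r subset of a made-up population (each subset is equally likely when sampling without replacement), averages each estimator over all subsets using exact rational arithmetic, and compares the result with the population variance. The function names and the example population are assumptions chosen only for illustration. The check relies on the standard fact that, without replacement, E(s^2) = \frac{n}{n-1} times the population variance, which is why the factor \frac{n-1}{n} appears.

```python
from fractions import Fraction
from itertools import combinations


def population_variance(pop):
    """Population variance with the 1/n convention used in the question."""
    n = len(pop)
    mean = Fraction(sum(pop), n)
    return sum((x - mean) ** 2 for x in pop) / n


def estimator_sample_mean(sample, n):
    """((n-1)/n) * s^2, where s^2 uses the sample mean and divisor r-1."""
    r = len(sample)
    mean = Fraction(sum(sample), r)
    s2 = sum((x - mean) ** 2 for x in sample) / (r - 1)
    return Fraction(n - 1, n) * s2


def estimator_known_mean(sample, pop_mean):
    """(1/r) * sum (x_i - mu)^2, usable only if the population mean is known."""
    r = len(sample)
    return sum((x - pop_mean) ** 2 for x in sample) / r


def expected_value(pop, r, estimator):
    """Exact expectation: average the estimator over all size-r subsets."""
    samples = list(combinations(pop, r))
    return sum(estimator(s) for s in samples) / len(samples)


# Hypothetical example population, chosen only for illustration.
population = [2, 3, 5, 7, 11, 13]
n = len(population)
mu = Fraction(sum(population), n)
sigma2 = population_variance(population)

for r in range(2, n + 1):
    e_sample = expected_value(population, r, lambda s: estimator_sample_mean(s, n))
    e_known = expected_value(population, r, lambda s: estimator_known_mean(s, mu))
    # Both expectations should match the population variance exactly.
    print(r, e_sample == sigma2, e_known == sigma2)
```

Using Fraction avoids floating-point rounding, so the equality checks are exact rather than approximate. Exhaustive enumeration is only practical for small n; for larger populations a Monte Carlo average over random subsets would be the natural substitute.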
