Estimating Population Variance from Observations

ghostyc
Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.

How do I find the bias of \hat{\theta} when it doesn't say which distribution the Y_i follow?

thanks
 
ghostyc said:
population variance \theta

estimated using \hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2

how do I find the bias of \hat{\theta}

when it doesn't say which distribution the Y_i follow?

thanks

Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the distribution of the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.
 
SW VandeCarr said:
Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the distribution of the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.

hmmmmmm

that's as far as I can get...

\frac{1}{n} E \left( \sum_{i=1}^n Y_i^2 - n \bar{Y}^2 \right)

and I think from the CLT I have, approximately, \bar{Y} \sim N\left(\mu,\frac{\theta}{n} \right)

then I am stuck...
thanks
 
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

Slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, \overline X is an unbiased estimator of \mu, since

E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta (in symbols below).
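
In symbols, this is the standard definition of bias being applied (not specific to this problem):

\operatorname{Bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta, \qquad \hat{\theta}\ \text{is unbiased} \iff \operatorname{Bias}(\hat{\theta}) = 0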
 
statdad said:
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

Slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, \overline X is an unbiased estimator of \mu, since

E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta.

Hey there,

I know what 'bias' means.
The only problem is that I can't simplify that expression any further, as I did before.
I have tried various ways...
I know it looks like a simple question, I just can't get it right...
Maybe I will try tomorrow with a fresh brain LOL

thanks
 
statdad said:
E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta.

I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?
 
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]

First: since Y_1, Y_2, \dots, Y_n are independent and identically distributed,

E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n

so all the terms in the sum are equal.

Second:

Y_1 - \overline Y = \left(\frac{n-1}{n}\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i

Use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
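
For reference, the shorter route goes through the expansion the OP already wrote down. Assuming the Y_i are i.i.d. with mean \mu and variance \theta, so that E(Y_i^2) = \theta + \mu^2 and E(\overline Y^2) = \frac{\theta}{n} + \mu^2, we get

E\left(\sum_{i=1}^n Y_i^2 - n \overline Y^2\right) = n(\theta + \mu^2) - n\left(\frac{\theta}{n} + \mu^2\right) = (n-1)\theta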
 
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

is a biased estimator of \sigma^2.
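
A quick way to see this numerically: the following is a minimal sketch in Python (assuming NumPy is available; the normal distribution is an arbitrary choice, since the bias is distribution-free).

import numpy as np

rng = np.random.default_rng(0)
n, reps, theta = 5, 200_000, 4.0  # sample size, repetitions, true variance

# Draw many samples of size n and compute the 1/n sample variance of each.
samples = rng.normal(loc=10.0, scale=theta**0.5, size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)  # ddof=0 gives the 1/n estimator

print(theta_hat.mean())     # approximately (n-1)/n * theta = 3.2, not 4.0
print((n - 1) / n * theta)  # the theoretical expectation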
 
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

I am totally lost now...

Are you implying that it's an UNBIASED estimator of the variance?

Let me check my question in the textbook...

There is a follow-up question that says:

When a Bootstrap sample of size n is taken, the Bootstrap estimate \hat{\theta}^* is a biased estimator of \hat{\theta}. State its bias.

From what I can guess, the first one should be biased?
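
(Side note on the follow-up: conditional on the observed data, a bootstrap sample is i.i.d. from the empirical distribution, whose variance is exactly \hat{\theta}, so the same computation applies with \hat{\theta} in place of \theta, giving

\operatorname{E}^*(\hat{\theta}^*) = \frac{n-1}{n}\,\hat{\theta} \quad \implies \quad \operatorname{Bias}^*(\hat{\theta}^*) = -\frac{\hat{\theta}}{n}.)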
 
statdad said:
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]

First: since Y_1, Y_2, \dots, Y_n are independent and identically distributed,

E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n

so all the terms in the sum are equal.

Second:

Y_1 - \overline Y = \left(\frac{n-1}{n}\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i

Use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
To be honest, I now remember doing something similar in my lectures; however, the question is
====================================================

Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.
====================================================

No mention of i.i.d. at all...

Maybe it's supposed to mean that... Thanks
 
statdad said:
No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

is a biased estimator of \sigma^2.

You are correct that this estimator is biased. I was thinking of the estimator with Bessel's correction as an unbiased estimator of population variance.

http://mathworld.wolfram.com/BesselsCorrection.html

http://en.wikipedia.org/wiki/Bessel's_correction
 
ghostyc said:
To be honest, I now remember doing something similar in my lectures; however, the question is
====================================================

Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.
====================================================

No mention of i.i.d. at all...

Maybe it's supposed to mean that...

Thanks

The estimator you have is biased - the goal of your problem is to find an expression for that bias. And yes, the context of this problem is that the Y's are i.i.d.

The greater picture is this: while it is true that \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 converges in probability to \theta (it is consistent), the fact that it is biased means that when you take repeated samples of a fixed size n and calculate the sample variance for each, the mean of those sample variances will not converge to \theta but to \frac{n-1}{n}\theta.

The estimator

\frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline Y)^2

is both unbiased and consistent.
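
The same numerical check as above makes the contrast visible; again a minimal Python/NumPy sketch, this time with an exponential distribution to emphasize that normality is not needed.

import numpy as np

rng = np.random.default_rng(1)
n, reps, theta = 5, 200_000, 4.0
# An exponential with scale s has variance s**2, so scale = sqrt(theta).
samples = rng.exponential(scale=theta**0.5, size=(reps, n))

print(samples.var(axis=1, ddof=0).mean())  # biased:   ~ (n-1)/n * theta = 3.2
print(samples.var(axis=1, ddof=1).mean())  # unbiased: ~ theta = 4.0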
 
At last, I think I've got it:

\operatorname{E}(\hat{\theta})=\frac{1}{n}\,\operatorname{E}\left(\sum_{i=1}^n Y_i^2 - n\bar{Y}^2\right)=\frac{1}{n}\left(n(\theta+\mu^2) - n\left(\frac{\theta}{n}+\mu^2\right)\right)=\frac{1}{n}(n\theta-\theta)=\theta-\frac{\theta}{n}

\implies \quad \operatorname{Bias}(\hat{\theta})=\operatorname{E}(\hat{\theta})-\theta=-\frac{\theta}{n}
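
Equivalently, \operatorname{E}(\hat{\theta}) = \frac{n-1}{n}\,\theta, so rescaling by \frac{n}{n-1} removes the bias:

\operatorname{E}\left(\frac{1}{n-1}\sum_{i=1}^n (Y_i-\overline Y)^2\right) = \theta

which is exactly Bessel's correction mentioned earlier in the thread.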
 