Estimating Population Variance from Observations

ghostyc
Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.

How do I find the bias of \hat{\theta} when it doesn't say which distribution the Y_i follow?

thanks
 
ghostyc said:
population variance \theta

estimated using \hat{\theta}=\frac{1}{n}\sum_{i=1}^n (Y_i-\bar{Y})^2

how do I find the bias of \hat{\theta}

when it doesn't say which distribution the Y_i follow?

thanks

Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the distribution of the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.
 
SW VandeCarr said:
Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the distribution of the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples.

Bias arises when the population changes during the sampling period, or when part of the population is not accessible by the sampling method, for example.

hmmmmmm

that's as far as I can get...

\frac{1}{n} E \left( \sum_{i=1}^n Y_i^2 - n \bar{Y}^2 \right)

and I think from the CLT I have, approximately, \bar{Y} \sim N\left(\mu,\frac{\theta}{n} \right)

then I am stuck...
thanks
 
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

Slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, \overline X is an unbiased estimator of \mu, since

E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta (in symbols below).
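
In symbols, this is the standard definition of bias being applied (not specific to this problem):

\operatorname{Bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta, \qquad \hat{\theta}\ \text{is unbiased} \iff \operatorname{Bias}(\hat{\theta}) = 0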
 
statdad said:
"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

Slightly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator is the parameter. So, for all distributions that have means, \overline X is an unbiased estimator of \mu, since

E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta.

Hey there,

I know what 'bias' means.
The only problem is that I can't simplify that expression any further, as I did before.
I have tried various ways...
I know it looks like a simple question, I just can't get it right...
Maybe I will try tomorrow with a fresh brain LOL

thanks
 
statdad said:
E(\overline X) = \mu

An estimator is biased if its expectation is not equal to the parameter it estimates.

To answer your question, you need to do these steps:
1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is that the required expectations exist). This expectation will be a function of \theta, the population variance.

2) The bias is the difference between the result of step 1 and \theta.

I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?
 
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]

First: since Y_1, Y_2, \dots, Y_n are independent and identically distributed,

E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n

so all the terms in the sum are equal.

Second:

Y_1 - \overline Y = \left(\frac{n-1}{n}\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i

Use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
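
For reference, the shorter route goes through the expansion the OP already wrote down. Assuming the Y_i are i.i.d. with mean \mu and variance \theta, so that E(Y_i^2) = \theta + \mu^2 and E(\overline Y^2) = \frac{\theta}{n} + \mu^2, we get

E\left(\sum_{i=1}^n Y_i^2 - n \overline Y^2\right) = n(\theta + \mu^2) - n\left(\frac{\theta}{n} + \mu^2\right) = (n-1)\theta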
 
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

is a biased estimator of \sigma^2.
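
A quick way to see this numerically: the following is a minimal sketch in Python (assuming NumPy is available; the normal distribution is an arbitrary choice, since the bias is distribution-free).

import numpy as np

rng = np.random.default_rng(0)
n, reps, theta = 5, 200_000, 4.0  # sample size, repetitions, true variance

# Draw many samples of size n and compute the 1/n sample variance of each.
samples = rng.normal(loc=10.0, scale=theta**0.5, size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)  # ddof=0 gives the 1/n estimator

print(theta_hat.mean())     # approximately (n-1)/n * theta = 3.2, not 4.0
print((n - 1) / n * theta)  # the theoretical expectation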
 
SW VandeCarr said:
I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know \mu or \theta. The real issue is with random sampling. Why would you think that calculating sample variance in the usual way would introduce a bias?

I am totally lost now...

Are you implying that it's an UNBIASED estimator of the variance?

Let me check my question in the textbook...

There is a follow-up question that says:

When a Bootstrap sample of size n is taken, the Bootstrap estimate \hat{\theta}^* is a biased estimator of \hat{\theta}. State its bias.

From what I can guess, the first one should be biased?
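
(Side note on the follow-up: conditional on the observed data, a bootstrap sample is i.i.d. from the empirical distribution, whose variance is exactly \hat{\theta}, so the same computation applies with \hat{\theta} in place of \theta, giving

\operatorname{E}^*(\hat{\theta}^*) = \frac{n-1}{n}\,\hat{\theta} \quad \implies \quad \operatorname{Bias}^*(\hat{\theta}^*) = -\frac{\hat{\theta}}{n}.)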
 
statdad said:
Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]

First: since Y_1, Y_2, \dots, Y_n are independent and identically distributed,

E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n

so all the terms in the sum are equal.

Second:

Y_1 - \overline Y = \left(\frac{n-1}{n}\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i

Use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
To be honest, I now remember doing something similar in my lectures; however, the question is
====================================================

Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.
====================================================

No mention of i.i.d. at all...

Maybe it's supposed to mean that... Thanks
 
statdad said:
No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

is a biased estimator of \sigma^2.

You are correct that this estimator is biased. I was thinking of the estimator with Bessel's correction as an unbiased estimator of population variance.

http://mathworld.wolfram.com/BesselsCorrection.html

http://en.wikipedia.org/wiki/Bessel's_correction
 
ghostyc said:
To be honest, I now remember doing something similar in my lectures; however, the question is
====================================================

Suppose the population variance \theta is to be estimated from observations Y_1, Y_2, \dots, Y_n using

\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2

where \overline Y is the sample mean.
====================================================

No mention of i.i.d. at all...

Maybe it's supposed to mean that...

Thanks

The estimator you have is biased - the goal of your problem is to find an expression for that bias. And yes, the context of this problem is that the Y's are i.i.d.

The greater picture is this: while it is true that \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2 converges in probability to \theta (it is consistent), the fact that it is biased means that when you take repeated samples of a fixed size n and calculate the sample variance for each, the mean of those sample variances will not converge to \theta but to \frac{n-1}{n}\theta.

The estimator

\frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline Y)^2

is both unbiased and consistent.
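
The same numerical check as above makes the contrast visible; again a minimal Python/NumPy sketch, this time with an exponential distribution to emphasize that normality is not needed.

import numpy as np

rng = np.random.default_rng(1)
n, reps, theta = 5, 200_000, 4.0
# An exponential with scale s has variance s**2, so scale = sqrt(theta).
samples = rng.exponential(scale=theta**0.5, size=(reps, n))

print(samples.var(axis=1, ddof=0).mean())  # biased:   ~ (n-1)/n * theta = 3.2
print(samples.var(axis=1, ddof=1).mean())  # unbiased: ~ theta = 4.0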
 
At last, I think I've got it:

\operatorname{E}(\hat{\theta})=\frac{1}{n}\,\operatorname{E}\left(\sum_{i=1}^n Y_i^2 - n\bar{Y}^2\right)=\frac{1}{n}\left(n(\theta+\mu^2) - n\left(\frac{\theta}{n}+\mu^2\right)\right)=\frac{1}{n}(n\theta-\theta)=\theta-\frac{\theta}{n}

\implies \quad \operatorname{Bias}(\hat{\theta})=\operatorname{E}(\hat{\theta})-\theta=-\frac{\theta}{n}
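
Equivalently, \operatorname{E}(\hat{\theta}) = \frac{n-1}{n}\,\theta, so rescaling by \frac{n}{n-1} removes the bias:

\operatorname{E}\left(\frac{1}{n-1}\sum_{i=1}^n (Y_i-\overline Y)^2\right) = \theta

which is exactly Bessel's correction mentioned earlier in the thread.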
 