# Bias of theta

1. Nov 30, 2009

### ghostyc

Suppose the population variance $$\theta$$ is to be estimated from observations $$Y_1, Y_2, \dots, Y_n$$ using

$$\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2$$

where $$\bar{Y}$$ is the sample mean.

How do I find the bias of $$\hat{\theta}$$ when it doesn't say which distribution the $$Y_i$$ follow?

Thanks

2. Nov 30, 2009

### SW VandeCarr

Bias arises from an incorrect sampling technique, such that the estimated population mean does not converge to the true population mean under repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means, however, may not converge to the population mean if the sampling is biased. You cannot detect bias from just one or too few samples.

For example, bias arises when the population changes during the sampling period, or when part of the population is not accessible to the sampling method.

3. Nov 30, 2009

### ghostyc

Hmm, here's how far I can get:

$$\frac{1}{n} E \left( \sum_{i=1}^n Y_i^2 - n \bar{Y}^2 \right)$$

and I think from the CLT I have $$\bar{Y} \sim N\left(\mu,\frac{\theta}{n} \right)$$, at least approximately

and then I'm stuck...
Thanks

4. Nov 30, 2009

"Bias arises from incorrect sampling technique such that the estimated population mean does not converge to the true population mean with repeated sampling. You don't need to know the underlying population distribution. The Central Limit Theorem guarantees that the sample means will converge to a normal distribution. The mean of the sample means however may not converge to the population mean if the sampling is biased. You cannot detect bias with just one or too few samples."

Partly true, but not relevant here. We say an estimator of a parameter is unbiased if the expectation of the estimator equals the parameter. So, for all distributions that have means, $$\overline X$$ is an unbiased estimator of $$\mu$$, since

$$E(\overline X) = \frac 1 n \sum_{i=1}^n E(X_i) = \frac{n\mu}{n} = \mu$$

An estimator is biased if its expectation is not equal to the parameter it estimates.

1) Find the expectation of the sample variance (this doesn't depend on the underlying distribution: the assumption is only that the required expectations exist). This expectation will be a function of $$\theta$$, the population variance.

2) The bias is the difference between the result of step 1 and $$\theta$$
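For step 1, two standard identities do most of the work, assuming (as the context implies) that the $$Y_i$$ are i.i.d. with mean $$\mu$$ and variance $$\theta$$:

$$E(Y_i^2) = \operatorname{Var}(Y_i) + [E(Y_i)]^2 = \theta + \mu^2, \qquad E(\overline Y^2) = \operatorname{Var}(\overline Y) + [E(\overline Y)]^2 = \frac{\theta}{n} + \mu^2$$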

5. Nov 30, 2009

### ghostyc

Hey there,

I know what 'bias' means.
The only problem is that I can't simplify that expression any further than in my previous post.
I have tried various ways...
I know it looks like a simple question; I just can't get it right...
Maybe I'll try tomorrow with a fresh brain, LOL

Thanks

6. Nov 30, 2009

### SW VandeCarr

I'm assuming that you're using unbiased estimators. Your estimate is the sample variance. You don't know $$\mu$$ or $$\theta$$. The real issue is with random sampling. Why would you think that calculating the sample variance in the usual way would introduce a bias?

7. Nov 30, 2009

Sorry, I didn't get your first point.

ONE approach (not the only one):

You need to simplify

$$E\left(\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2\right) = \frac 1 n \sum_{i=1}^n E[(Y_i - \overline Y)^2]$$

First: since $$Y_1, Y_2, \dots, Y_n$$ are independent and identically distributed,

$$E[(Y_i - \overline Y)^2] = E[(Y_1-\overline Y)^2], \quad i = 1, 2, \dots, n$$

so all the terms in the sum are equal, and $$E(\hat{\theta}) = E[(Y_1 - \overline Y)^2]$$.

Second:

$$Y_1 - \overline Y = \left(\frac{n-1} n\right)Y_1 - \frac 1 n \sum_{i = 2}^n Y_i$$

use this to "simplify" the squares: it won't be pretty, but the expectations will be a little easier to deal with.
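As a quick sanity check on that decomposition, here is a minimal symbolic sketch (sympy assumed available; the fixed $$n = 4$$ is just for illustration):

```python
import sympy as sp

n = 4                                   # check the identity for one small n
Y = sp.symbols(f"Y1:{n + 1}")           # symbols Y1, Y2, Y3, Y4
Ybar = sp.Rational(1, n) * sum(Y)       # sample mean

lhs = Y[0] - Ybar
rhs = sp.Rational(n - 1, n) * Y[0] - sp.Rational(1, n) * sum(Y[1:])

print(sp.simplify(lhs - rhs))           # prints 0: the two sides agree
```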

8. Nov 30, 2009

No, the issue here is not sampling. Bias in this context has to do with whether or not the expectation of an estimator equals the target parameter. If the expectation equals the target parameter, the estimator is unbiased; if the expectation does not equal the target, the estimator is biased. That has nothing to do with how the sample was obtained; it has to do with the estimator itself.

And it is (should be) well known that

$$\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2$$

is a biased estimator of $$\sigma^2$$ (the $$\theta$$ of this problem).
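A quick numerical illustration of that bias, as a sketch (the normal distribution and the constants here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000
theta = 4.0                              # population variance of N(0, 2^2)

samples = rng.normal(0.0, 2.0, size=(reps, n))
theta_hat = samples.var(axis=1, ddof=0)  # the 1/n estimator from the thread

print(theta_hat.mean())                  # ~3.6, i.e. (n-1)/n * theta
print((n - 1) / n * theta)               # 3.6
```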

9. Nov 30, 2009

### ghostyc

I am totally lost now...

Are you implying that it's an UNBIASED estimator of the variance?

Let me check the question in my textbook...

There is a follow-up question that says:

When a Bootstrap sample of size n is taken, the Bootstrap estimate $$\hat{\theta}^*$$ is a biased estimator of $$\hat{\theta}$$. State its bias.

So I guess the first one should be biased?

10. Nov 30, 2009

### ghostyc

To be honest, I now remember doing something similar in my lectures; however, the question is
====================================================

Suppose the population variance $$\theta$$ is to be estimated from observations $$Y_1, Y_2, \dots, Y_n$$ using

$$\hat{\theta} = \frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2$$

where $$\bar{Y}$$ is the sample mean.

There is no mention of i.i.d. at all...

Maybe that's supposed to be implied...

Thanks

11. Nov 30, 2009

### SW VandeCarr

You are correct that this estimator is biased. I was thinking of the estimator with Bessel's correction as an unbiased estimator of population variance.

http://mathworld.wolfram.com/BesselsCorrection.html

http://en.wikipedia.org/wiki/Bessel's_correction

12. Nov 30, 2009

The estimator you have is biased; the goal of your problem is to find an expression for that bias. And yes, the context of this problem is that the $$Y_i$$ are i.i.d.

The bigger picture is this: while it is true that $$\frac 1 n \sum_{i=1}^n (Y_i - \overline Y)^2$$ converges in probability to $$\theta$$ (it is consistent), the fact that it is biased means that when you take repeated samples of a fixed size $$n$$ and calculate this sample variance for each, the mean of those sample variances will converge not to $$\theta$$ but to $$\frac{n-1}{n}\theta$$.

The estimator

$$\frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline Y)^2$$

is both unbiased and consistent.
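A short simulation sketch contrasting the two estimators (again with arbitrary normal data), showing the $$1/n$$ version's bias shrinking as $$n$$ grows while the $$1/(n-1)$$ version stays centered on $$\theta$$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 4.0, 100_000

for n in (5, 20, 100):
    samples = rng.normal(0.0, 2.0, size=(reps, n))
    biased = samples.var(axis=1, ddof=0).mean()    # divide by n
    unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n-1 (Bessel)
    print(n, round(biased, 3), round(unbiased, 3))
# the ddof=0 column is ~ (n-1)/n * 4.0, approaching 4.0 only as n grows;
# the ddof=1 column is ~4.0 for every n
```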

13. Dec 7, 2009

### ghostyc

At last, I think I've got it:

$$\operatorname{E}(\hat{\theta})=\frac{1}{n}\left(\sum_{i=1}^n (\theta + \mu^2) - n\left(\frac{\theta}{n} + \mu^2\right) \right)=\frac{1}{n}(n\theta-\theta)=\theta-\frac{\theta}{n} \quad \implies \quad \operatorname{Bias}(\hat{\theta})=\operatorname{E}(\hat{\theta})-\theta=-\frac{\theta}{n}$$
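Equivalently, $$\operatorname{E}(\hat{\theta}) = \frac{n-1}{n}\,\theta$$, which is exactly why dividing by $$n-1$$ (Bessel's correction) instead of $$n$$ removes the bias. For the bootstrap follow-up question quoted earlier, one way to see it: a bootstrap sample of size $$n$$ is drawn i.i.d. from the empirical distribution, whose variance is exactly $$\hat{\theta}$$, so the same calculation goes through with $$\hat{\theta}$$ in place of $$\theta$$:

$$\operatorname{E}^*(\hat{\theta}^*) = \frac{n-1}{n}\,\hat{\theta} \quad \implies \quad \operatorname{Bias}(\hat{\theta}^*) = -\frac{\hat{\theta}}{n}$$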