# Variances of samples

1. Mar 5, 2012

### georg gill

https://onlinecourses.science.psu.edu/stat414/node/167

The link proves the formula for the variance of the sample mean. The proof is a bit long to write out here, so I hope you can read it at the link. I will write my main problem here. It starts with:

$$\operatorname{Var}\left(\frac{X_1+X_2+\cdots+X_n}{n}\right)$$

They divide by n here because n samples are used to compute the sample mean. But later in the text they say:

If the instructor had taken larger samples of students, she would have seen less variability in the samples she was obtaining. That is a good thing, but of course, in general, the costs of research studies no doubt increase as the sample size n increases.

It seems they are referring to n as the number of elements in each sample, but from the proof it seems to be the number of samples. I don't get this. They also assume that the variance is the same for all samples (the second-to-last step), but the example about the teacher, on the page before the one linked, has different variances.

From the examples in my textbook it also seems that n is the number of tests in a sample, not the number of samples, which is what I thought would make sense from this proof.

2. Mar 5, 2012

### mathman

$X_k$ is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
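For reference, the proof in the link comes down to a few lines. This is a sketch that assumes the $X_k$ are independent and share a common variance $\sigma^2$ (the symbol $\sigma^2$ is introduced here for the population variance):

$$\operatorname{Var}\left(\frac{X_1+X_2+\cdots+X_n}{n}\right) = \frac{1}{n^2}\sum_{k=1}^{n}\operatorname{Var}(X_k) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

The first equality uses $\operatorname{Var}(aX) = a^2\operatorname{Var}(X)$ together with independence, which lets the variance of the sum split into a sum of variances.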

3. Mar 5, 2012

### georg gill

Here is an example from my book where they use n as the number of elements in a sample to calculate the standard deviation:

http://bildr.no/view/1124676

I just can't see how that is possible from the derivation given in the link in my first post.

Last edited: Mar 5, 2012

4. Mar 5, 2012

### Stephen Tashi

How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.
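A quick simulation can make the role of n concrete. The sketch below is only an illustration: the normal population, its mean of 70, its standard deviation of 10, the number of trials, and the function name `variance_of_sample_means` are all assumptions chosen for the example, not values from the linked pages. It draws many samples of size n, computes the mean of each, and compares the variance of those means with $\sigma^2/n$.

```python
import random
import statistics


def variance_of_sample_means(n, trials=5000, mu=70.0, sigma=10.0, seed=0):
    """Estimate Var(X-bar) by drawing `trials` independent samples of size n.

    The normal population with mean `mu` and standard deviation `sigma`
    is a hypothetical choice made only for this illustration.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # Spread of the sample means across all the simulated samples.
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma = 10.0
    for n in (1, 4, 25, 100):
        simulated = variance_of_sample_means(n, sigma=sigma)
        print(f"n={n:3d}  simulated={simulated:8.3f}  sigma^2/n={sigma**2 / n:8.3f}")
```

Here n is the sample size (the number of values averaged into each mean), while `trials` is how many such samples are drawn. Only n appears in $\sigma^2/n$.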

5. Mar 6, 2012

### georg gill

Say you have n samples with one observation in each.
You could say that they all have the same variance (though that does not make sense to me).
If you assume that, you could calculate the variance from the formula for the variance of samples:

$$\frac{\sigma^2}{n}$$ (a)

where n is the number of samples, each consisting of one observation.

But since these are n different samples with only one observation in each, one could instead have found the variation between samples (each sample being the same as one attempt, I would have thought) as:

$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2$$ (b)

(I used n-1 because the variance could otherwise be biased. I am not sure whether either choice is more correct than the other here.)

For example, here they use (b):

http://bildr.no/view/1124823

But these two are not the same. How does this work? One will get a different Z from each of them when approximating with the normal distribution.

If I were to solve this as an assignment I would assume only one variance would be correct. This confuses me.

Last edited: Mar 6, 2012

6. Mar 6, 2012

### Stephen Tashi

My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for a discrete distribution, by a sum involving the probabilities in the distribution.)

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let $\bar{X}$ be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of $n$ numbers $\{X_1, X_2, \ldots, X_n\}$ to be $\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$ and some define it to be $\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$.

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function $\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$. (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise the variance of a sample may not be the variance of the population. Likewise an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".
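To see the three notions side by side, here is a small sketch using a fair six-sided die as the population. The die, the observed `rolls`, and the helper names are all made up for illustration and do not come from any of the links. It computes the variance of the random variable from its probabilities, then the variance of one particular sample with divisor n, then the estimate with divisor n-1 from the same sample.

```python
from fractions import Fraction
import statistics


def variance_of_random_variable(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())


def sample_variance(data, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - ddof)


# 1) Variance of the random variable: computed from probabilities, no data.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print("variance of the random variable:", variance_of_random_variable(die))

# A hypothetical sample of eight rolls, treated as a fixed set of numbers.
rolls = [2, 6, 3, 3, 5, 1, 4, 6]

# 2) Variance of the sample using divisor n (statistics.pvariance agrees).
print("sample variance, divisor n:    ", sample_variance(rolls, 0))

# 3) Estimate from the unbiased estimator, divisor n-1
#    (statistics.variance agrees).
print("estimate, divisor n-1:         ", sample_variance(rolls, 1))
```

The three printed numbers differ, and none of them is $\sigma^2/n$, which is a fourth quantity: the variance of the sample mean regarded as a random variable.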

7. Mar 6, 2012

### georg gill

I guess my understanding is vague.

http://bildr.no/view/1124676 (b)

But I think I got it now. Here is an example that makes the difference a bit clearer. This is the normal distribution for one element, compared to many in (b):

http://bildr.no/view/1125322 (c)

But what I do not get is this definition from the central limit theorem:

http://bildr.no/view/1125334 (d)

The part I don't get is that, as $n \rightarrow \infty$, the limiting distribution is the standard normal distribution $n(z; 0, 1)$.

Is that possible to prove?

Last edited: Mar 6, 2012

8. Mar 6, 2012

### Stephen Tashi

Yes, but I'm not saying I can do the proof!

It's an interesting problem in itself just to define what it means for a sequence of distributions to approach another distribution.
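One common way to make this precise is convergence in distribution. As a sketch in standard notation (the symbols $Z_n$, $\bar{X}_n$, $\mu$, $\sigma$, and $\Phi$ are introduced here and are not taken from the linked image), let $X_1, \ldots, X_n$ be independent draws from the same population with mean $\mu$ and finite variance $\sigma^2 > 0$, and let $\bar{X}_n$ be their sample mean. Then the central limit theorem says:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}, \qquad \lim_{n \to \infty} P(Z_n \le z) = \Phi(z) \quad \text{for every real } z$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution. The statement is about the distribution functions of $Z_n$ converging pointwise to $\Phi$, not about any single sample.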
