
Variances of samples

by georg gill
Tags: samples, variances
georg gill
#1
Mar5-12, 01:15 PM
P: 100
https://onlinecourses.science.psu.edu/stat414/node/167

In the link they derive the variance of the sample mean. It is a bit long to write it all out here, so I hope you can read the link. I will state my main problem here. It starts with:

[tex]Var\left(\frac{X_1+X_2+\cdots+X_n}{n}\right)[/tex]

They write n here because n samples are averaged to get the mean of the samples. But later in the text they say:

If the instructor had taken larger samples of students, she would have seen less variability in the samples she was obtaining. That is a good thing, but of course, in general, the costs of research studies no doubt increase as the sample size n increases

It seems they are referring to n as the number of elements in each sample, but from the proof it seems to be the number of samples. I don't get this. They also assume that the variance is the same for all samples (in the second-to-last step), but the instructor example referred to on the previous page of the link has different variances.

From the examples in my textbook it also seems that n is the number of tests in a sample, not the number of samples, which is what I thought would make sense from this proof.
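To check what n means, I tried a numerical sketch (assuming Python with NumPy; the population parameters are made up): draw many samples of size n, take each sample's mean, and compare the variance of those means with [tex]\sigma^2/n[/tex].

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0      # hypothetical population standard deviation
n = 25            # sample size: number of elements in EACH sample
trials = 100_000  # number of samples drawn

# Draw many samples of size n and record each sample's mean.
samples = rng.normal(loc=70.0, scale=sigma, size=(trials, n))
sample_means = samples.mean(axis=1)

# The derivation says Var(sample mean) = sigma^2 / n.
print(sample_means.var())   # should come out close to 100 / 25 = 4
```

Here the variance of the means does shrink like sigma^2 / n, with n playing the role of elements per sample.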
mathman
#2
Mar5-12, 03:29 PM
Sci Advisor
P: 6,065
Xk is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
georg gill
#3
Mar5-12, 03:44 PM
P: 100
Quote Quote by mathman View Post
Xk is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.

Here is an example from my book where they use n as the number of elements in a sample to calculate the standard deviation:

http://bildr.no/view/1124676

I just can't see how that is possible from the derivation given in the link in my first post.

Stephen Tashi
#4
Mar5-12, 08:06 PM
Sci Advisor
P: 3,295

Quote Quote by georg gill View Post
I just can't see how that is possible from the derivation given in the link in my first post.
How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.
georg gill
#5
Mar6-12, 01:21 AM
P: 100
Quote Quote by Stephen Tashi View Post
How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.
Say you have n samples with one element in each.
You could say that they all have the same variance (that does not quite make sense to me, but suppose so).
If you assume that, you could calculate the variance of the sample mean from the formula:


[tex]\frac{\sigma^2}{n}[/tex] (a)

where n is the number of samples, each consisting of one element.

But since these are n different samples with only one element in each, one could also have found the variation between samples (since each sample would be the same as one trial, I would have thought) as:

[tex]\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2[/tex] (b)

(I used n−1 because the variance estimate could otherwise be biased; I am not sure if one choice is more correct than the other here.)

For example, here they use (b):

http://bildr.no/view/1124823

But these two are not the same. How does this work? One will get different values of Z from them when approximating the normal distribution.

If I were to solve this as an assignment, I would assume only one variance would be correct. This confuses me.
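A simulated sketch (assuming Python with NumPy; the numbers are made up) may make the difference concrete: formula (a) is the variance of the mean of the n values, while formula (b) estimates the variance of the population the values came from, so the two should differ by roughly a factor of n.

```python
import numpy as np

# Hypothetical setup: sigma is the population standard deviation,
# and we draw n "samples of one element each" from that population.
rng = np.random.default_rng(1)
sigma = 5.0
n = 100
x = rng.normal(loc=0.0, scale=sigma, size=n)

# Formula (a): the variance of the sample mean, sigma^2 / n.
var_of_mean = sigma**2 / n

# Formula (b)-style estimate of the population variance itself
# (using the sample mean in place of the unknown mu).
est_population_var = ((x - x.mean())**2).sum() / (n - 1)

# (a) and (b) answer different questions: (b) should come out near
# sigma^2 = 25, while (a) equals sigma^2 / n = 0.25.
print(var_of_mean, est_population_var)
```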
Stephen Tashi
#6
Mar6-12, 02:40 AM
Sci Advisor
P: 3,295
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for discrete distribution, by a sum involving the probabilities in the distribution).

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let [itex] \bar{X} [/itex] be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of [itex] n [/itex] numbers [itex] \{ X_1, X_2,...X_n \}[/itex]
to be [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n} [/itex] and some define it to be [itex] \frac { \sum_{i=1}^n (X_i- \bar{X})^2}{n-1} [/itex]

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n-1} [/itex] (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise, the variance of a sample may not be the variance of the population. Likewise, an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".
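These three senses also show up in numerical libraries. A small sketch (assuming Python with NumPy, which exposes the n versus n−1 choice through its ddof parameter):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
xbar = x.mean()   # sample mean, 5.0 for this data

# Sense (2) with n in the denominator (NumPy's default, ddof=0):
var_n = ((x - xbar)**2).sum() / len(x)
assert np.isclose(var_n, x.var())

# Sense (3), the unbiased estimator of the population variance,
# with n-1 in the denominator (ddof=1):
var_n1 = ((x - xbar)**2).sum() / (len(x) - 1)
assert np.isclose(var_n1, x.var(ddof=1))

print(var_n, var_n1)   # 4.0 and 32/7, about 4.57
```

Which convention a textbook uses for the "variance of a sample" has to be read off from its definition, exactly as described above.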
georg gill
#7
Mar6-12, 10:53 AM
P: 100
Quote Quote by Stephen Tashi View Post
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for discrete distribution, by a sum involving the probabilities in the distribution).

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let [itex] \bar{X} [/itex] be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of [itex] n [/itex] numbers [itex] \{ X_1, X_2,...X_n \}[/itex]
to be [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n} [/itex] and some define it to be [itex] \frac { \sum_{i=1}^n (X_i- \bar{X})^2}{n-1} [/itex]

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n-1} [/itex] (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise, the variance of a sample may not be the variance of the population. Likewise, an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".

I guess my understanding is vague.


http://bildr.no/view/1124676 (b)


But I think I get it now. Here is an example that makes the difference a bit clearer. This is the normal distribution for one element compared to many elements, as in (b):

http://bildr.no/view/1125322 (c)

But what I do not get is this statement of the central limit theorem:

http://bildr.no/view/1125334 (d)

The part I don't get is: as [tex]n \rightarrow \infty[/tex], the distribution approaches the standard normal distribution n(z; 0, 1).

Is that possible to prove?
Stephen Tashi
#8
Mar6-12, 07:25 PM
Sci Advisor
P: 3,295
Quote Quote by georg gill View Post

Is that possible to prove?
Yes, but I'm not saying I can do the proof!

It's an interesting problem in itself just to define what it means for a sequence of distributions to approach another distribution.
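The classical proof goes through characteristic functions, which I won't reproduce here, but the convergence is easy to watch numerically. A Python/NumPy sketch (the uniform population is just an arbitrary non-normal choice), standardizing the mean of n uniform draws as in the statement (d):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200          # elements per sample
trials = 20_000  # number of samples drawn

# Uniform(0, 1) has mu = 1/2 and sigma^2 = 1/12: decidedly non-normal.
mu, sigma = 0.5, (1 / 12) ** 0.5

# Standardize each sample mean: Z = (Xbar - mu) / (sigma / sqrt(n)).
xbar = rng.uniform(size=(trials, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))

# If the CLT holds, Z should look like N(0, 1) for large n:
print(z.mean(), z.std())   # both close to 0 and 1
```

A histogram of z against the standard normal density makes the convergence visible even for modest n.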

