Understanding Variances in Sampled Data | PSU STAT 414

  • Context: Graduate
  • Thread starter: georg gill

Discussion Overview

The discussion revolves around the concept of variance in sampled data, particularly in the context of statistical analysis and the implications of sample size on variability. Participants explore the definitions and calculations of variance, the distinction between sample size and the number of samples, and the assumptions underlying these concepts.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant expresses confusion regarding the notation of n in the variance formula, questioning whether it refers to the number of samples or the number of elements in each sample.
  • Some participants clarify that n is typically understood as the number of samples, and they assert that all samples drawn from the same population will have the same variance.
  • Another participant points out discrepancies between the derivation in the provided link and examples from their textbook, suggesting that the definitions of variance may differ.
  • A later reply discusses the different meanings of "variance," distinguishing between the variance of a random variable, the variance of a sample, and the unbiased estimator of the population variance.
  • Participants note that the mean and variance of a sample may not equal those of the population, and they highlight the importance of precise language in discussing these concepts.
  • One participant raises a question about the central limit theorem and its implications for the standard normal distribution as sample size approaches infinity.

Areas of Agreement / Disagreement

There is no clear consensus among participants regarding the definitions and implications of variance in sampled data. Multiple competing views and interpretations remain, particularly concerning the notation and assumptions about sample size and variance.

Contextual Notes

Participants express uncertainty about the definitions of variance and the implications of sample size on variability. There are references to different formulas for variance, which may depend on the context or the specific statistical approach taken.

georg gill
https://onlinecourses.science.psu.edu/stat414/node/167

In the link they derive the variance of samples. It is a bit long to write it all out here, so I hope one can read the link. I will write my main problem here. It starts with:

Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right)

They write n here because there are n samples used to get the mean of the samples. But later in the text they say:

If the instructor had taken larger samples of students, she would have seen less variability in the samples she was obtaining. That is a good thing, but of course, in general, the costs of research studies no doubt increase as the sample size n increases.

It seems they are referring to n as the number of elements in each sample, but from the proof it seems to be the number of samples. I don't get this. They also assume that the variance is the same for all samples (the second-to-last step), but the teacher example on the previous page of the link has different variances.

From examples in my textbook it also seems that n is the number of tests in a sample, not the number of samples, which is what I thought would make sense from this proof.
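The quantity the PSU derivation ends with, Var of the sample mean equal to \frac{\sigma^2}{n} with n the sample size, can be checked numerically. A minimal Python sketch, not from the thread, assuming uniform(0,1) draws (whose variance is 1/12):

```python
import random
import statistics

# The variance of the sample mean Xbar = (X_1 + ... + X_n)/n shrinks
# like sigma^2 / n. Here "n" is the sample size (elements per sample),
# and we draw many independent samples to estimate Var(Xbar) empirically.
random.seed(0)

def var_of_sample_mean(n, num_samples=20000):
    """Empirical variance of the mean of n uniform(0,1) draws."""
    means = [statistics.fmean(random.random() for _ in range(n))
             for _ in range(num_samples)]
    return statistics.pvariance(means)

sigma2 = 1 / 12            # variance of a uniform(0,1) variable
for n in (1, 4, 16):
    est = var_of_sample_mean(n)
    print(n, est, sigma2 / n)   # empirical value tracks sigma^2 / n
```

Doubling n roughly halves the printed variance, which is the "less variability with larger samples" the instructor example describes.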
 
X_k is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
 
mathman said:
X_k is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
Here is an example from my book where they use n as the number of elements in a sample to calculate the standard deviation:

http://bildr.no/view/1124676

I just can't see how that is possible from the derivation given in the link in my first post.
 
georg gill said:
I just can't see how that is possible from the derivation given in the link in my first post.

How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.
 
Stephen Tashi said:
How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.

Say you have n samples with one element in each:
You could say that they all have the same variance (which does not quite make sense to me).
If you assume that, you could calculate the variance of the sample mean from the formula

\frac{\sigma^2}{n} (a)

where n is the number of samples consisting of one element each.

But since these are n different samples with only one element in each, one could instead have found the variation between the samples (since each sample would be the same as one attempt, I would have thought) as:

\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2 (b)

(I used n - 1 because the variance could otherwise be biased; I am not sure whether one choice is more correct than the other here.)

for example here they use (b):

http://bildr.no/view/1124823

But these two are not the same. How does this work? One will get a different Z from each of them when approximating the normal distribution.

If I were to solve this as an assignment, I would assume only one variance would be correct. This confuses me.
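The two formulas do measure different things, which a simulation makes concrete: (a) describes the spread of the sample mean, while (b) estimates the spread of a single observation. A rough Python sketch, not from the thread, assuming uniform(0,1) data with known mean:

```python
import random

# (a) sigma^2 / n is the variance OF the sample mean;
# (b) sum (X_i - mu)^2 / (n - 1) estimates the variance of a single X_i.
# Treating n single-element samples as one sample of size n, (b)
# recovers sigma^2, while (a) describes how the mean itself fluctuates.
random.seed(1)

n = 1000
mu, sigma2 = 0.5, 1 / 12            # mean and variance of uniform(0,1)
xs = [random.random() for _ in range(n)]

b = sum((x - mu) ** 2 for x in xs) / (n - 1)   # formula (b), known mu
a = sigma2 / n                                  # formula (a)

print(b)   # close to sigma^2, about 0.0833
print(a)   # sigma^2 / n = 0.0000833, a thousand times smaller
```

So they should not agree: one is the variability of individual observations, the other the variability of their average.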
 
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for discrete distribution, by a sum involving the probabilities in the distribution).

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let \bar{X} be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of n numbers \{ X_1, X_2, \ldots, X_n \} to be \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n} and some define it to be \frac { \sum_{i=1}^n (X_i- \bar{X})^2}{n-1}

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n-1} (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise the variance of a sample may not be the variance of the population. Likewise an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".
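The three notions can be illustrated with Python's statistics module, which happens to implement both conventions from point 2. A toy example, not part of the original post; the die and the sample values are made up for illustration:

```python
import statistics

# 1) Variance of a random variable: for a fair die, E[(X - 3.5)^2],
#    a sum over the distribution's probabilities (each 1/6).
die_var = sum((x - 3.5) ** 2 for x in range(1, 7)) / 6

sample = [2, 4, 4, 4, 5, 5, 7, 9]
# 2) Variance of a sample, under the two textbook conventions:
pop_style = statistics.pvariance(sample)   # divide by n
# 3) The unbiased estimator of the population variance:
unbiased = statistics.variance(sample)     # divide by n - 1

print(die_var, pop_style, unbiased)
```

The three numbers differ, which is the point: they answer three different questions about variability.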
 
Stephen Tashi said:
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for discrete distribution, by a sum involving the probabilities in the distribution).

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let \bar{X} be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of n numbers \{ X_1, X_2, \ldots, X_n \} to be \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n} and some define it to be \frac { \sum_{i=1}^n (X_i- \bar{X})^2}{n-1}

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n-1} (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise the variance of a sample may not be the variance of the population. Likewise an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".
I guess my understanding was vague (http://bildr.no/view/1124676 is (b)). But I think I got it now. Here is an example that makes the difference a bit clearer, showing the normal distribution for one element compared to many, as in (b):

http://bildr.no/view/1125322 (c)

But what I do not get is this definition from the central limit theorem:

http://bildr.no/view/1125334 (d)

The part I don't get is: as n \rightarrow \infty, the limit is the standard normal distribution n(z; 0, 1).

Is that possible to prove?
 
georg gill said:
Is that possible to prove?

Yes, but I'm not saying I can do the proof!

It's an interesting problem in itself just to define what it means for a sequence of distributions to approach another distribution.
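While a proof needs a careful definition of convergence in distribution, the statement itself is easy to probe by simulation. A small Python sketch, not from the thread, assuming uniform(0,1) draws: if the standardized sample mean is approximately standard normal, about 95% of simulated values should land inside (-1.96, 1.96).

```python
import random
import statistics

# CLT claim: for X_i uniform(0,1), Z = (Xbar - mu) / (sigma / sqrt(n))
# is approximately N(0, 1) for large n. We check the 95% coverage of
# the interval (-1.96, 1.96) predicted by the standard normal.
random.seed(2)

mu, sigma = 0.5, (1 / 12) ** 0.5
n, trials = 50, 20000

zs = [
    (statistics.fmean(random.random() for _ in range(n)) - mu)
    / (sigma / n ** 0.5)
    for _ in range(trials)
]

frac = sum(abs(z) < 1.96 for z in zs) / trials
print(frac)   # close to 0.95
```

A simulation is of course not a proof, but it shows the sense in which the sequence of distributions approaches n(z; 0, 1).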
 
