Understanding Variances in Sampled Data | PSU STAT 414

  • Thread starter georg gill
  • #1
georg gill
https://onlinecourses.science.psu.edu/stat414/node/167

In the link they prove the formula for the variance of samples. It is a bit long to write it all out here, so I hope one can read the link. I will describe my main problem here. It starts with:

[tex]Var\left(\frac{X_1+X_2+\cdots+X_n}{n}\right)[/tex]

They write n here because it takes n samples to get the mean of the samples. But later in the text they say:

If the instructor had taken larger samples of students, she would have seen less variability in the samples she was obtaining. That is a good thing, but of course, in general, the costs of research studies no doubt increase as the sample size n increases.

It seems they are referring to n as the number of elements in each sample, but from the proof it seems to be the number of samples. I don't get this. They also assume that the variance is the same for all samples (in the second-to-last step), but the example about the teacher, which is on the previous page in the link, has different variances.

From the examples in my textbook it also seems that n is the number of tests in a sample, not the number of samples, which is what I thought would make sense from this proof.
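For concreteness, here is a quick simulation (not from the course page; the numbers sigma2 = 4 and n = 25 are made up) of the quantity in the first formula above: averaging n observations per sample and repeating many times gives a variance of the sample mean close to [itex]\frac{\sigma^2}{n}[/itex], with n playing the role of the number of observations in each sample.

[code]
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0   # assumed population variance (made up for this illustration)
n = 25         # number of observations averaged in each sample
reps = 10000   # number of repeated samples used to see the spread of the sample mean

# Each row is one sample of size n; reduce each row to its sample mean.
samples = rng.normal(loc=10.0, scale=np.sqrt(sigma2), size=(reps, n))
sample_means = samples.mean(axis=1)

print("empirical variance of the sample mean:", sample_means.var())
print("sigma^2 / n:", sigma2 / n)  # 0.16
[/code]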
 
  • #2
Xk is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
 
  • #3
mathman said:
Xk is a sample from a population. n is the number of samples. All samples are from the same population, so the variance will be the same for each sample.
Here is an example from my book where they use n as the number of elements in a sample to calculate the standard deviation:

http://bildr.no/view/1124676

I just can't see how that is possible from the derivation given in the link in my first post.
 
  • #4
georg gill said:
I just can't see how that is possible from the derivation given in the link in my first post.

How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.
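A minimal sketch of that terminology, with made-up numbers: one sample of size n = 5 and its sample mean [itex]\bar{X}[/itex], obtained by dividing the sum by n.

[code]
# One sample, i.e. a set of n realizations; the values here are made up.
sample = [2.1, 3.4, 1.9, 2.8, 3.0]
n = len(sample)              # the sample size
x_bar = sum(sample) / n      # the sample mean: divide the sum by n
print(n, x_bar)              # 5 2.64
[/code]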
 
  • #5
Stephen Tashi said:
How what is possible?

The term sample can refer to a single realization of a random variable or it can refer to a set of realizations of a random variable, so, as data, a "sample" may be one number or a set of numbers. When the sample consists of more than one number, the number of numbers in the sample is the "sample size".

When the sample size is n, the computation of the sample mean involves dividing by n. The link you gave in this post is an example about the distribution of the sample mean. Note the bar over the X in that example. This is the notation for the sample mean.

Say you have n samples with one observation in each.
You could say that they all have the same variance (which does not quite make sense to me, but still).
If you assume that, you could calculate it from the formula for the variance of samples: [tex]\frac{\sigma^2}{n}[/tex] (a)

where n is the number of samples, each consisting of one observation.

But since these are n different samples with only one observation in each, one could instead have found the variation between the samples (since each sample would be the same as a single attempt, I would have thought) as:

[tex]\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2[/tex] (b)

(I used n-1 because the variance estimate could otherwise be biased; I am not sure whether one choice is more correct than the other here.)

For example, here they use (b):

http://bildr.no/view/1124823

But these two are not the same. How does this work? One will get a different Z from each of them when approximating the normal distribution.

If I were to solve this as an assignment, I would assume only one variance would be correct. This confuses me.
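One way to see why (a) and (b) need not agree, sketched below with made-up numbers (sigma^2 = 9, n = 30), is that they measure different things: (b) uses the spread of the n individual values and so lands near [itex]\sigma^2[/itex] itself, while (a) is the spread of the average of n values over many repetitions, which is the smaller quantity [itex]\frac{\sigma^2}{n}[/itex]. A different Z is therefore expected, depending on whether one standardizes a single observation or the mean of n observations.

[code]
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, mu = 9.0, 30, 0.0     # assumed population variance, n, and population mean

# One "experiment": n samples of one observation each.
x = rng.normal(mu, np.sqrt(sigma2), size=n)

# (b): spread of the n individual values around mu -- roughly sigma^2.
estimate_b = ((x - mu) ** 2).sum() / (n - 1)

# (a): spread of the *average* of n values over many repeated experiments -- sigma^2 / n.
means = rng.normal(mu, np.sqrt(sigma2), size=(20000, n)).mean(axis=1)
estimate_a = means.var()

print("(b) is near sigma^2     =", sigma2, ":", estimate_b)
print("(a) is near sigma^2 / n =", sigma2 / n, ":", estimate_a)
[/code]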
 
  • #6
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance".

1) There is the "variance of a random variable". (If we speak of the sample mean as a random variable, its variance is computed by an expression involving the integral of a probability density, or, for a discrete distribution, by a sum involving the probabilities in the distribution).

2) There is the "variance of a sample" when we speak of the sample as a particular set of numbers. Let [itex] \bar{X} [/itex] be the numerical value of the sample mean. Textbooks vary in how they define the variance of a sample. Some define the variance of [itex] n [/itex] numbers [itex] \{ X_1, X_2,...X_n \}[/itex]
to be [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n} [/itex] and some define it to be [itex] \frac { \sum_{i=1}^n (X_i- \bar{X})^2}{n-1} [/itex]

3) There is the "unbiased estimator of the population variance". This is a function of the sample values. It is the function [itex] \frac { \sum_{i=1}^n (X_i - \bar{X})^2}{n-1} [/itex]. (Note that an "estimator" is technically a function, not a single number. When we have a particular sample, we can substitute particular numbers into the formula for the estimator and get a particular estimate.)

It should be clear that the mean of a sample of a population might not be the mean of the population. Likewise the variance of a sample may not be the variance of the population. Likewise an estimate produced by the unbiased estimator of the population variance may fail to equal the population variance.

All three of the above things are dealt with in the various links you gave. Your confusion comes from the fact that you think all the various links are referring to only one single concept of "variance".
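A small numerical sketch of the three senses, assuming, purely for illustration, an N(0, 4) population and a sample of 12 values; the sample-based numbers need not equal the population variance, which is exactly the point above.

[code]
import numpy as np

rng = np.random.default_rng(2)

# (1) the variance of the random variable: we *assume* an N(0, 4) population here,
#     so this theoretical variance is simply 4.
population_variance = 4.0

# One particular sample of n = 12 numbers drawn from that population.
x = rng.normal(0.0, np.sqrt(population_variance), size=12)
x_bar = x.mean()

# (2) the "variance of the sample", in both textbook conventions.
var_sample_div_n         = ((x - x_bar) ** 2).sum() / len(x)
var_sample_div_n_minus_1 = ((x - x_bar) ** 2).sum() / (len(x) - 1)

# (3) the unbiased estimator of the population variance is the (n - 1) formula;
#     plugging this particular sample into it gives one particular estimate.
estimate_of_population_variance = var_sample_div_n_minus_1

print(population_variance, var_sample_div_n, estimate_of_population_variance)
[/code]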
 
  • #7
Stephen Tashi said:
My analysis of your difficulty is that you don't use language precisely. In the first place, one must distinguish among three different terms involving the word "variance". [...]
I guess my understanding is vague (see http://bildr.no/view/1124676 (b)), but I think I got it now. Here is an example that makes the difference a bit clearer; it shows the normal distribution for one element compared to many in (b):

http://bildr.no/view/1125322 (c)

But what I do not get is this definition from the central limit theorem:

http://bildr.no/view/1125334 (d)

The part I don't get is the claim that, as [tex]n \rightarrow \infty[/tex], the distribution is the standard normal distribution [itex]n(z;0,1)[/itex].

is that possible to prove?
 
  • #8
georg gill said:
is that possible to prove?

Yes, but I'm not saying I can do the proof!

It's an interesting problem in itself just to define what it means for a sequence of distributions to approach another distribution.
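Not a proof, but a quick numerical illustration of the statement in (d), using an Exponential(1) population (an arbitrary non-normal choice, with mean and standard deviation both equal to 1): the standardized sample mean behaves more and more like N(0, 1) as n grows.

[code]
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0     # mean and standard deviation of the Exponential(1) population

for n in (2, 10, 100, 500):
    # 20000 samples of size n, each reduced to its sample mean, then standardized.
    x_bar = rng.exponential(scale=1.0, size=(20000, n)).mean(axis=1)
    z = (x_bar - mu) / (sigma / np.sqrt(n))
    # P(Z <= 1) should approach the standard normal value Phi(1) ~ 0.8413 as n grows.
    print(n, (z <= 1.0).mean())
[/code]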
 

1. What is the variance of a sample?

The variance of a sample is a statistical measure that represents the degree of spread or variability of a set of data points around the mean of the sample. It is calculated by finding the average of the squared differences between each data point and the mean of the sample.

2. How is the variance of a sample calculated?

The variance of a sample is calculated by finding the sum of the squared differences between each data point and the mean of the sample, and then dividing that sum by the total number of data points in the sample (or, under the other common convention, by that number minus one).
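A minimal sketch of that computation with made-up numbers; note that many textbooks divide by n - 1 rather than n, which NumPy exposes through the ddof argument.

[code]
import numpy as np

data = np.array([4.0, 7.0, 6.0, 5.0, 8.0])               # made-up sample
mean = data.mean()                                        # 6.0
var_n  = ((data - mean) ** 2).sum() / len(data)           # divide by n, as described above
var_n1 = ((data - mean) ** 2).sum() / (len(data) - 1)     # the common n - 1 convention
print(var_n, np.var(data))                                # 2.0 2.0
print(var_n1, np.var(data, ddof=1))                       # 2.5 2.5
[/code]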

3. Why is it important to calculate the variance of a sample?

The variance of a sample is an important measure in statistics because it provides information about the spread of the data points. It allows us to understand how much the data points deviate from the mean and can help identify any outliers in the data.

4. What is the relationship between the variance of a sample and its standard deviation?

The standard deviation of a sample is the square root of its variance. This means that the standard deviation is a measure of the spread of the data points in the same units as the original data, while the variance is in squared units.
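A quick check of that relationship, with made-up numbers:

[code]
import numpy as np

data = np.array([4.0, 7.0, 6.0, 5.0, 8.0])     # made-up sample
print(np.std(data), np.sqrt(np.var(data)))     # both ~1.4142: the std is the square root of the variance
[/code]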

5. Can the variance of a sample be negative?

No, the variance of a sample cannot be negative. It is always non-negative, because it is calculated by squaring the differences between data points and the mean, which eliminates any negative values; it is zero only when all the data points are equal.
