Distribution of sample mean and variance, and variance of sample means

SUMMARY

The discussion centers on the distributions of the sample mean (##\bar{X}##) and sample variance (##S^2##) in relation to a population variable (##X##). It is established that the sample mean does not share the same distribution as the population variable and typically has a smaller variance. When the population variable is normally distributed, the suitably scaled sample variance follows a chi-squared distribution. The conversation emphasizes the importance of precise terminology in statistics, particularly distinguishing between estimators and the properties of random variables.

PREREQUISITES
  • Understanding of random variables and their distributions
  • Familiarity with statistical concepts such as sample mean and sample variance
  • Knowledge of the Central Limit Theorem
  • Basic understanding of chi-squared distribution
NEXT STEPS
  • Study the properties of the Central Limit Theorem in depth
  • Learn about the chi-squared distribution and its applications in statistics
  • Explore different types of estimators and their properties
  • Investigate how sample size affects the distribution of sample variance
USEFUL FOR

Statisticians, data analysts, and students studying probability and statistics who seek to deepen their understanding of sample distributions and estimation techniques.

Nikitin

Hi. Say you have a population, and from it you can draw a random variable ##X## with a specified distribution. So you take a few sizeable samples from the population and calculate the mean and variance of ##X## for each sample.

A few easy questions:
1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?
2) What distribution will ##Var(X)## have?
3) What will the distribution of ##Var(\bar{X})## be? From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution. But what about the variance?

thanks for help :)
 
Nikitin said:
.. and calculate the mean and variance of ##X## for each sample.

You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming that the samples are independent realizations of X.

1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?
The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.
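A quick simulation sketch illustrates this (assuming a standard normal X; the sample size, number of trials, and seed here are arbitrary choices, not from the thread): the empirical variance of the sample means comes out near Var(X)/n, not Var(X).

```python
import random
import statistics

random.seed(0)
n = 25          # sample size (arbitrary choice)
trials = 20000  # number of independent samples

# Draw many independent samples of size n from a standard normal X
# and record the sample mean of each one.
sample_means = [
    statistics.fmean(random.gauss(0.0, 1.0) for _ in range(n))
    for _ in range(trials)
]

# Var(X) = 1, but the empirical variance of the sample means is close to
# Var(X)/n = 1/25 = 0.04.
print(round(statistics.variance(sample_means), 3))
```

The sample mean is centered on the same value as X, but its spread shrinks with the sample size.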

2) What distribution will ##Var(X)## have?

Assuming Var(X) denotes the sample variance, I don't think you can answer that question in general. In the special case when X has a normal distribution, the scaled sample variance ##(n-1)S^2/\sigma^2## has a chi-squared distribution with ##n-1## degrees of freedom.
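The standard result for the normal case is that ##(n-1)S^2/\sigma^2## has a chi-squared distribution with ##n-1## degrees of freedom. A simulation sketch (the sample size, seed, and choice of σ = 2 are arbitrary, not from the thread) checks its mean and variance against the chi-squared values ##n-1## and ##2(n-1)##:

```python
import random
import statistics

random.seed(1)
n = 10
trials = 20000
sigma = 2.0  # X ~ Normal(0, sigma^2), so the true variance is 4

# For each trial, compute the scaled sample variance (n - 1) * S^2 / sigma^2.
scaled = []
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    scaled.append((n - 1) * statistics.variance(xs) / sigma**2)

# A chi-squared variable with n - 1 = 9 degrees of freedom has mean 9
# and variance 2 * 9 = 18; the empirical values should be close to those.
print(round(statistics.fmean(scaled), 2), round(statistics.variance(scaled), 1))
```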

3) What will the distribution of ##Var(\bar{X})## be?

Your notation is unclear. If \bar{X} is the sample mean then it is a random variable, and its variance is a single number, not another random variable.

From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution.

I don't know what you mean by a "distribution" of E(\bar{X}). If you intend \bar{X} to denote the sample mean, then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?
 
Stephen Tashi said:
You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming that the samples are independent realizations of X.
OK. Thanks.
The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.
What if X is normally distributed?
Assuming Var(X) denotes the sample variance, I don't think you can answer that question in general. In the special case when X has a normal distribution, the scaled sample variance ##(n-1)S^2/\sigma^2## has a chi-squared distribution with ##n-1## degrees of freedom.
Is that because the estimate for the variance contains squared X-terms?

Your notation is unclear. If \bar{X} is the sample mean then it is a random variable, and its variance is a single number, not another random variable.
Well, ##\bar{X}## is a statistic which has its own distribution. Is that wrong?

And so ##Var(\bar{X})## also has a distribution?

I don't know what you mean by a "distribution" of E(\bar{X}). If you intend \bar{X} to denote the sample mean, then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.
I meant the distribution ##\mu_{\bar{X}}##. Isn't that the same as the expectation value of ##\bar{X}##?

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?

No, no experiments. I was talking about taking multiple samples from a population, estimating the means (1) and variances (2) of the samples, and the means (3) and variances (4) of the sample means, and seeing what distributions all of those have.

I just want to know what happens if X is, say, normally distributed or whatever. Just trying to understand statistics.
 
The "expectation" of a random variable is a single number, not a distribution. I suggest you ask your questions again after you study the technical definitions for the words you are using.
 
Kind of hard to study them when I don't know what I have understood correctly and what I haven't. I am actually learning quite a lot here about how to express myself (you can't learn to speak a language purely by reading a book, can you?). At any rate, I understand and appreciate your corrections and hope it's not too difficult to understand me.

Questions restated in a more correct manner:

1) What kinds of distributions can ##\bar{X}## have, and how do they depend on the distributions of ##X##?
2) What distribution will ##\sigma_{X}^2## have?
3) What will the distribution of ##\sigma_{\bar{X}}^2## be? From the central limit theorem, I know that ##\mu_{\bar{X}}## has a normal distribution. But what about the sample variance of the means?
 
Let's start with the fact that many statistical terms (such as "mean", "variance") have at least 3 different interpretations. Distinguishing among these meanings requires additional words or context. Such words may refer to:

1) A property of the distribution of a random variable. (e.g. The mean value of a normally distributed random variable.)

2) A random variable defined as a function of values in a sample of another random variable
(e.g. the sample mean of 10 independent samples from a normal distribution with mean 0 and variance 1. The sample mean is defined by a formula that makes it a function of the values in the sample.)

3) One specific numerical value computed as a function of the values in a sample.
(e.g. "The sample mean was 43.6")

You are using notation that has traditional interpretations, but your questions suggest you don't follow the interpretations. The traditional interpretation of the notation E(...) is that it is the "expected value of" a random variable. This is the same as the "mean of" a random variable. Thus this notation indicates a single number, not a random quantity that has a distribution.

Likewise, a common use of the notation Var(...) is to mean the variance of a random variable. By this interpretation, Var(...) denotes a single number, not a number that has some random variation in it.

I suspect you want to know things about the "sample mean" and "sample variance". Considering them to be random variables, it does make sense to ask about their distributions and the means and variances of those distributions. But it doesn't make sense to ask "What is the distribution of the expected value of the sample mean". The expected value of the sample mean will be a single number. It won't be randomly distributed over different values.
 
I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?
 
Nikitin said:
I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?

Yes. (At least: yes, the sample mean is an estimator of the mean. The "expectation of the sample means" is not a function of the values in a particular sample, so it is not an estimator.)

However, keep in mind that many different estimators of the expectation are possible.

The mean is intuitively pleasing because the computation of the sample mean (say, from a histogram of sample values) is analogous to how the expectation is computed from the distribution. In the early days of statistics, an estimator computed from a sample in a manner that is analogous to how a distribution parameter is computed from a distribution was called a "consistent" estimator. The modern definition of a "consistent" estimator is different.

A function of the values in a sample is called a "statistic". The modern definition of function allows a function to be any sort of rule or complicated algorithm that produces a unique result from a given set of values. An "estimator" is a "statistic". The term "estimator" conveys the human intention to use the statistic to estimate something.

Besides the mean, it is possible to create other estimators of the expectation of a random variable. For example: (max sample value - min sample value)/2 is a function of the values in a sample, so it is a "statistic", and if you use it to estimate the expected value of the random variable, it is an "estimator".

If we have two different estimators for the same property of a distribution, it is natural to ask which one is "best". This brings up the problem of defining "best". No unique way of defining "best" has been found. Instead, there are various technical definitions of "good aspects" of estimators. Common good aspects are "unbiased", "minimum variance", "maximum likelihood". These terms have technical definitions.
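As a sketch of that comparison (assuming a standard normal population; the sample size, number of trials, and seed are arbitrary choices, not from the thread), one can simulate both the sample mean and the midrange estimator mentioned above and compare their variances:

```python
import random
import statistics

random.seed(2)
n = 15
trials = 20000

means, midranges = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(xs))
    midranges.append((max(xs) + min(xs)) / 2)  # the midrange estimator

# Both estimators center on 0, but for a normal population the sample
# mean has the smaller variance of the two.
print(round(statistics.variance(means), 3), round(statistics.variance(midranges), 3))
```

For a normal population the sample mean wins on variance; for other distributions (e.g. a uniform one) the comparison can come out differently, which is why "best" needs a technical definition.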
 
Thank you. But how does the distribution of the sample variance of ##X##, ##S##, depend on the distribution of ##X##? And I assume it also is highly dependent on the sample size?
 
Are you asking for a formula of some kind that gives the parameters of the distribution of the sample variance S as a function of parameters of the distribution of X? I know of no such general formula.

Yes, the distribution of S would, in general, depend on the number of realizations of X in the sample. ( Statistics texts often use the word "sample" to mean a set of realizations of a random variable, and "sample size" to denote the number of realizations, but in other contexts, people use "sample" to mean a single realization of X.)

For particular families of distributions, the distribution of the sample variance may be known. But I don't know of any general approach that is applied to derive the distribution of the sample variance.
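A simulation sketch (assuming a standard normal X; the sample sizes, trial count, and seed are arbitrary choices, not from the thread) shows how the spread of the sample variance shrinks as the sample size grows:

```python
import random
import statistics

random.seed(3)
trials = 5000

# Empirical standard deviation of the sample variance S^2 of a standard
# normal X at several sample sizes; for a normal X the theoretical value
# is sqrt(2 / (n - 1)), which shrinks as n grows.
spread = {}
for n in (5, 20, 80):
    s2s = [
        statistics.variance([random.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    spread[n] = statistics.stdev(s2s)
    print(n, round(spread[n], 3))
```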
 
I was asking for important cases/relations, like the fact that if ##X## is normal, ##S## (suitably scaled) is chi-squared. But it's OK, I guess I will find out this stuff anyway as my experience with statistics increases.

Thanks for the help and patience :)
 
