Distribution of sample mean and variance, and variance of sample means

Nikitin · Mar 17, 2014

Distributions: sample mean and variance, and variance of sample means?

Hi. Say you have a population, and from there you can draw a stochastic variable ##X## with a specified distribution. So you take out a few sizeable samples from the population, and calculate the mean and variance of ##X## for each sample.

A few easy questions:
1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?
2) What distribution will ##Var(X)## have?
3) What will the distribution of ##Var(\bar{X})## be? From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution. But what about the variance?

thanks for help :)

Stephen Tashi · Mar 17, 2014

Nikitin said:

.. and calculate the mean and variance of ##X## for each sample.

You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming the samples are independent realizations of X.

1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?

The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.
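A small simulation can make this concrete. It is only a sketch that assumes X is exponential with rate 1 (so Var(X) = 1), a choice made purely for illustration. For n independent realizations the variance of the sample mean is Var(X)/n, so with n = 10 the simulated sample means should spread out about one tenth as much (in variance) as single draws of X.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n independent Exp(1) draws."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(trials)]

rng = random.Random(0)          # fixed seed so the run is reproducible
n, trials = 10, 20000

single_draws = [rng.expovariate(1.0) for _ in range(trials)]
means = sample_means(n, trials, rng)

var_x = statistics.pvariance(single_draws)   # should be close to Var(X) = 1
var_xbar = statistics.pvariance(means)       # should be close to Var(X)/n = 0.1
print(f"Var(X) ~ {var_x:.3f}, Var(sample mean) ~ {var_xbar:.3f}")
```

The sample means here are not exponential (their distribution is a gamma distribution), which is one way to see that the sample mean need not share the distribution of X.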

2) What distribution will ##Var(X)## have?

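Assuming [itex] Var(X) [/itex] denotes the sample variance, I don't think you can answer that question, in general. In the special case when X has a normal distribution, a scaled version of the sample variance has a chi-squared distribution: if [itex] S^2 [/itex] is the sample variance (with denominator n-1) of n independent realizations and [itex] \sigma^2 [/itex] is the variance of X, then [itex] (n-1)S^2/\sigma^2 [/itex] is chi-squared with n-1 degrees of freedom.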
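A quick simulation can check the chi-squared claim for normal X. A chi-squared variable with n-1 degrees of freedom has mean n-1 and variance 2(n-1), so the scaled sample variances ##(n-1)S^2/\sigma^2## should show roughly those two moments. The sketch assumes n = 5 and ##\sigma = 2## purely for illustration.

```python
import random
import statistics

rng = random.Random(1)
n, sigma, trials = 5, 2.0, 20000   # illustrative sample size, true sd, repetitions

# Scaled sample variances (n-1) S^2 / sigma^2 from normal samples
scaled = []
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)          # sample variance, n-1 denominator
    scaled.append((n - 1) * s2 / sigma**2)

# Chi-squared with k = n-1 = 4 degrees of freedom: mean 4, variance 8
sim_mean = statistics.fmean(scaled)
sim_var = statistics.pvariance(scaled)
print(sim_mean, sim_var)
```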

3) What will the distribution of ##Var(\bar{X})## be?

Your notation is unclear. If [itex] \bar{X} [/itex] is the sample mean then it is a random variable and its variance is a single number, not another random variable.
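One way to see that ##E(\bar{X})## and ##Var(\bar{X})## are single numbers is to compute them exactly in a small discrete case. This sketch assumes X is a fair six-sided die roll (so E(X) = 7/2 and Var(X) = 35/12) and enumerates every equally likely sample of n independent rolls; the helper name is just for illustration. The result should be ##E(\bar{X}) = 7/2## and ##Var(\bar{X}) = Var(X)/n##.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)        # illustrative choice: a fair six-sided die
p = Fraction(1, 6)         # probability of each face

def exact_mean_and_var_of_sample_mean(n):
    """Enumerate all 6**n equally likely samples of size n and return
    E(sample mean) and Var(sample mean) as exact fractions."""
    weight = p ** n
    means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]
    e = sum(weight * m for m in means)
    v = sum(weight * (m - e) ** 2 for m in means)
    return e, v

e, v = exact_mean_and_var_of_sample_mean(2)
print(e, v)   # 7/2 35/24
```

Both outputs are fixed numbers: nothing random is left once the distribution of X and the sample size are specified.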

From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution.

I don't know what you mean by a "distribution" of [itex] E( \bar{X}) [/itex]. If you intend [itex] \bar{X} [/itex] to denote the sample mean then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?

Nikitin · Mar 17, 2014

Stephen Tashi said:

You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming the samples are independent realizations of X.

OK. Thanks.

The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.

What if X is normally distributed?

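Assuming [itex] Var(X) [/itex] denotes the sample variance, I don't think you can answer that question, in general. In the special case when X has a normal distribution, a scaled version of the sample variance has a chi-squared distribution: if [itex] S^2 [/itex] is the sample variance (with denominator n-1) of n independent realizations and [itex] \sigma^2 [/itex] is the variance of X, then [itex] (n-1)S^2/\sigma^2 [/itex] is chi-squared with n-1 degrees of freedom.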

Is that because the estimate for the variance contains squared X-terms?

Your notation is unclear. If [itex] \bar{X} [/itex] is the sample mean then it is a random variable and its variance is a single number, not another random variable.

Well, ##\bar{X}## is a statistic which has its own distribution. Is that wrong?

And so ##Var(\bar{X})## also has a distribution?

I don't know what you mean by a "distribution" of [itex] E( \bar{X}) [/itex]. If you intend [itex] \bar{X} [/itex] to denote the sample mean then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.

I meant the distribution ##\mu_{\bar{X}}##. Isn't that the same as the expectation value of ##\bar{X}##?

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?

No, no experiments. I was talking about taking multiple samples from a population, estimating the means (1) and variances (2) of the samples, and the means (3) and variances (4) of the means of the samples, and seeing what distribution all of those got.

I just want to know what happens if X is, say, normally distributed or whatever. Just trying to understand statistics.

Stephen Tashi · Mar 17, 2014

The "expectation" of a random variable is a single number, not a distribution. I suggest you ask your questions again after you study the technical definitions for the words you are using.

Nikitin · Mar 17, 2014

Kind of hard to study them when I don't know what I have understood correctly and what not. I am actually learning quite a lot here about the way to express myself (you can't learn to speak a language purely by reading from a book, can you?). At any rate, I understand and appreciate your corrections and hope it's not too difficult to understand me.

Questions restated in a more correct manner:

1) What kinds of distributions can ##\bar{X}## have, and how do they depend on the distributions of ##X##?
2) What distribution will ##\sigma_{X}^2## have?
3) What will the distribution of ##\sigma_{\bar{X}}^2## be? From the central limit theorem, I know that ##\mu_{\bar{X}}## has a normal distribution. But what about the sample variance of the means?

Stephen Tashi · Mar 17, 2014

Let's start with the fact that many statistical terms (such as "mean", "variance") have at least 3 different interpretations. Distinguishing among these meanings requires additional words or context. Such words may refer to:

1) A property of the distribution of a random variable. (e.g. the mean value of a normally distributed random variable)

2) A random variable defined as a function of values in a sample of another random variable
(e.g. the sample mean of 10 independent samples from a normal distribution with mean 0 and variance 1. The sample mean is defined by a formula that makes it a function of the values in the sample.)

3) One specific numerical value computed as a function of the values in a sample.
(e.g. "The sample mean was 43.6")
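The three meanings can be told apart in code. This is a minimal sketch assuming the example's standard normal distribution and a sample size of 10; the function name `sample_mean_of_10` is hypothetical. Meaning 1 is a constant, meaning 2 is a function that produces a new random value each time it is applied to a fresh sample, and meaning 3 is one value that a particular call happened to return.

```python
import random
import statistics

# 1) A property of a distribution: the mean of the standard normal
#    distribution N(0, 1) is the fixed number 0.
population_mean = 0.0

# 2) A random variable defined as a function of a sample: the sample mean
#    of 10 independent N(0, 1) draws. Each call draws a fresh sample.
def sample_mean_of_10(rng):
    sample = [rng.gauss(0.0, 1.0) for _ in range(10)]
    return statistics.fmean(sample)

# 3) Specific numerical values computed from particular samples.
rng = random.Random(42)
first_value = sample_mean_of_10(rng)
second_value = sample_mean_of_10(rng)   # a new sample gives a different value
print(population_mean, first_value, second_value)
```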

You are using notation that has traditional interpretations, but your questions suggest you don't follow the interpretations. The traditional interpretation of the notation [itex] E(...) [/itex] is that it is the "expected value of" a random variable. This is the same as the "mean of" a random variable. Thus this notation indicates a single number, not a random quantity that has a distribution.

Likewise, a common use of the notation [itex] Var(...) [/itex] is to mean the variance of a random variable. By this interpretation, [itex] Var(...) [/itex] denotes a single number, not a number that has some random variation in it.

I suspect you want to know things about the "sample mean" and "sample variance". Considering them to be random variables, it does make sense to ask about their distributions and the means and variances of those distributions. But it doesn't make sense to ask "What is the distribution of the expected value of the sample mean". The expected value of the sample mean will be a single number. It won't be randomly distributed over different values.

Nikitin · Mar 18, 2014

I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?

Stephen Tashi · Mar 18, 2014

Nikitin said:

I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?

Yes. (At least: yes, the sample mean is an estimator of the mean. The "expectation of the sample means" is not a function of the values in a particular sample, so it is not an estimator.)
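The difference between the data-dependent "mean of sample means" and the fixed number ##E(\bar{X})## can be shown directly. This sketch assumes k normal samples of equal size n with an illustrative true mean mu = 5. With equal sample sizes, the mean of the sample means is exactly the sample mean of all k*n pooled values, so it is a statistic (and an estimator of mu), while ##E(\bar{X})## is simply mu.

```python
import math
import random
import statistics

rng = random.Random(7)
mu, k, n = 5.0, 8, 25    # illustrative: true mean, number of samples, sample size

samples = [[rng.gauss(mu, 2.0) for _ in range(n)] for _ in range(k)]
means = [statistics.fmean(s) for s in samples]

mean_of_means = statistics.fmean(means)                       # computed from data
pooled_mean = statistics.fmean([x for s in samples for x in s])

# Equal sizes: both statistics agree up to floating-point rounding.
same = math.isclose(mean_of_means, pooled_mean, rel_tol=1e-12)
print(mean_of_means, pooled_mean, mu, same)
```

Rerunning with a different seed changes `mean_of_means`, but `mu` (the expectation) never changes.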

However, keep in mind that many different estimators of the expectation are possible.

The mean is intuitively pleasing because the computation of the sample mean (say, from a histogram of sample values) is analogous to how the expectation is computed from the distribution. In the early days of statistics, an estimator computed from a sample in a manner that is analogous to how a distribution parameter is computed from a distribution was called a "consistent" estimator. The modern definition of a "consistent" estimator is different.
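The analogy can be written out side by side. This sketch assumes a fair die as the distribution and a short made-up list of rolls as the sample, both purely illustrative: the expectation is the sum of value times probability, and the sample mean computed from a histogram is the sum of value times relative frequency, which reproduces the ordinary average.

```python
from collections import Counter
from fractions import Fraction

# Expectation from a distribution: sum of value * probability.
# Illustrative distribution: a fair six-sided die.
dist = {x: Fraction(1, 6) for x in range(1, 7)}
expectation = sum(x * p for x, p in dist.items())

# Sample mean from a histogram: sum of value * relative frequency.
sample = [3, 6, 1, 3, 5, 2, 6, 4, 3, 1]      # made-up illustrative rolls
counts = Counter(sample)
hist_mean = sum(x * Fraction(c, len(sample)) for x, c in counts.items())

# The histogram formula gives exactly the ordinary average of the sample.
assert hist_mean == Fraction(sum(sample), len(sample))
print(expectation, hist_mean)   # 7/2 17/5
```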

A function of the values in a sample is called a "statistic". The modern definition of function allows a function to be any sort of rule or complicated algorithm that produces a unique result from a given set of values. An "estimator" is a "statistic". The term "estimator" conveys the human intention to use the statistic to estimate something.

Besides the mean, it is possible to create other estimators of the expectation of a random variable. For example: (max sample value + min sample value)/2 is a function of the values in a sample, so it is a "statistic", and if you use it to estimate the expected value of the random variable, it is an "estimator".
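The midrange, (max + min)/2, is a statistic of this kind (the half-range (max - min)/2 would measure spread rather than location). A minimal sketch with illustrative values shows that it and the sample mean are just two different functions applied to the same sample, and they generally give different estimates.

```python
import statistics

def midrange(sample):
    """(max + min) / 2: a statistic that can be used to estimate the expectation."""
    return (max(sample) + min(sample)) / 2

sample = [2.0, 3.5, 3.0, 4.5, 3.0]     # illustrative values
est_mean = statistics.fmean(sample)
est_mid = midrange(sample)
print(est_mean, est_mid)
```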

If we have two different estimators for the same property of a distribution, it is natural to ask which one is "best". This brings up the problem of defining "best". No unique way of defining "best" has been found. Instead, there are various technical definitions of "good aspects" of estimators. Common good aspects are "unbiased", "minimum variance", and "maximum likelihood". These terms have technical definitions.
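The "minimum variance" idea can be explored by simulation. This sketch compares the variance of the sample mean with that of the midrange, (max + min)/2, assuming samples of size 10 from either a standard normal or a uniform(0, 1) distribution (both choices are illustrative). For normal data the sample mean has the smaller variance; for uniform data the midrange does, so which estimator looks "better" by this criterion depends on the distribution.

```python
import random
import statistics

def midrange(sample):
    return (max(sample) + min(sample)) / 2

def estimator_variances(draw, n, trials, rng):
    """Simulated variances of the sample mean and of the midrange over
    `trials` independent samples of size n, each value produced by draw(rng)."""
    means, mids = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        mids.append(midrange(sample))
    return statistics.pvariance(means), statistics.pvariance(mids)

rng = random.Random(3)
normal = estimator_variances(lambda r: r.gauss(0.0, 1.0), 10, 20000, rng)
uniform = estimator_variances(lambda r: r.random(), 10, 20000, rng)
print("normal  (mean, midrange):", normal)
print("uniform (mean, midrange):", uniform)
```

For reference, the exact variances of the sample mean here are 1/10 (normal) and 1/120 (uniform), and the exact variance of the uniform midrange is 1/(2(n+1)(n+2)) = 1/264.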

Nikitin · Mar 19, 2014

Thank you. But how does the distribution of the sample variance of ##X##, ##S##, depend on the distribution of ##X##? And I assume it also is highly dependent on the sample size?

Stephen Tashi · Mar 19, 2014

Are you asking for a formula of some kind that gives the parameters of the distribution of the sample variance [itex]S [/itex] as a function of parameters of the distribution of [itex] X [/itex]? I know of no such general formula.

Yes, the distribution of [itex] S [/itex] would, in general, depend on the number of realizations of X in the sample. (Statistics texts often use the word "sample" to mean a set of realizations of a random variable, and "sample size" to denote the number of realizations, but in other contexts, people use "sample" to mean a single realization of X.)

For particular families of distributions, the distribution of the sample variance may be known. But I don't know of any general approach that is applied to derive the distribution of the sample variance.
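Even without the full distribution, the first two moments of the sample variance are known in general for independent realizations: ##E(S^2) = \sigma^2## whenever the variance exists, and ##Var(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)##, where ##\mu_4## is the fourth central moment of X. This sketch checks that formula by simulation at n = 10 for a standard normal (##\mu_4 = 3##) and an exponential with rate 1 (##\mu_4 = 9##); both have variance 1, yet their sample variances spread out very differently. The helper names are illustrative.

```python
import random
import statistics

def var_of_sample_variance(sigma2, mu4, n):
    """Theoretical Var(S^2) for i.i.d. data with variance sigma2 and fourth
    central moment mu4, where S^2 uses the n-1 denominator."""
    return (mu4 - sigma2**2 * (n - 3) / (n - 1)) / n

def simulated_var_of_s2(draw, n, trials, rng):
    """Variance of S^2 across `trials` simulated samples of size n."""
    s2 = [statistics.variance([draw(rng) for _ in range(n)]) for _ in range(trials)]
    return statistics.pvariance(s2)

rng = random.Random(11)
n, trials = 10, 20000
# Standard normal: sigma^2 = 1, mu4 = 3.  Exponential(rate 1): sigma^2 = 1, mu4 = 9.
cases = {
    "normal": (lambda r: r.gauss(0.0, 1.0), 1.0, 3.0),
    "exponential": (lambda r: r.expovariate(1.0), 1.0, 9.0),
}
results = {}
for name, (draw, sigma2, mu4) in cases.items():
    theory = var_of_sample_variance(sigma2, mu4, n)
    sim = simulated_var_of_s2(draw, n, trials, rng)
    results[name] = (theory, sim)
    print(f"{name:12s} theory {theory:.3f}  simulated {sim:.3f}")
```

The formula also shows the dependence on sample size: for normal data it reduces to ##2\sigma^4/(n-1)##, e.g. 2/99 for n = 100 and ##\sigma = 1##.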

Nikitin · Mar 21, 2014

I was asking for important cases/relations, like if ##X ## is normal, ##S## is chi-squared. But it's OK, I guess I will find out this stuff anyway as my experience with statistics increases.

Thanks for the help and patience :)

