Distribution of sample mean and variance, and variance of sample means

  • #1
Nikitin

Hi. Say you have a population, and from there you can draw a stochastic variable ##X## with a specified distribution. So you take out a few sizeable samples from the population, and calculate the mean and variance of ##X## for each sample.

A few easy questions:
1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?
2) What distribution will ##Var(X)## have?
3) What will the distribution of ##Var(\bar{X})## be? From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution. But what about the variance?

thanks for help :)
 
  • #2
Nikitin said:
.. and calculate the mean and variance of ##X## for each sample.

You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming that the samples are independent realizations of X.

1) What distribution will ##\bar{X}## have? I assume it will have the same distribution as ##X##? Makes intuitive sense, but can somebody explain it to me anyway?
The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.
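A quick simulation sketch (Python, assuming NumPy is available) of this point: for ##n## independent realizations, the variance of the sample mean is ##Var(X)/n##, so it is much more tightly concentrated than X itself. The distribution, sample size, and number of repetitions below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25          # sample size
reps = 100_000  # number of independent samples

# X ~ Normal(mean=5, sd=2), so Var(X) = 4
samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))
sample_means = samples.mean(axis=1)

print(np.var(sample_means))  # close to Var(X)/n = 4/25 = 0.16, far below Var(X) = 4
```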

2) What distribution will ##Var(X)## have?

Assuming [itex] Var(X) [/itex] denotes the sample variance, I don't think you can answer that question in general. In the special case when X has a normal distribution, the scaled sample variance [itex] (n-1)S^2/\sigma^2 [/itex] has a chi-squared distribution with [itex] n-1 [/itex] degrees of freedom.
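The normal case can be checked numerically: with ##n## independent draws from a normal distribution with variance ##\sigma^2##, the quantity ##(n-1)S^2/\sigma^2## is chi-squared with ##n-1## degrees of freedom, so its mean should be ##n-1## and its variance ##2(n-1)##. A simulation sketch (Python with NumPy; the specific parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 10, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # unbiased sample variance S^2
scaled = (n - 1) * s2 / sigma2     # should be chi-squared with n-1 df

# A chi-squared variable with k degrees of freedom has mean k and variance 2k
print(scaled.mean())  # close to n - 1 = 9
print(scaled.var())   # close to 2*(n - 1) = 18
```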

3) What will the distribution of ##Var(\bar{X})## be?

Your notation is unclear. If [itex] \bar{X} [/itex] is the sample mean then it is a random variable and its variance is a single number, not another random variable.

From the central limit theorem, I know that ##E(\bar{X})## has a normal distribution.

I don't know what you mean by a "distribution" of [itex] E( \bar{X}) [/itex]. If you intend [itex] \bar{X} [/itex] to denote the sample mean, then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?
 
  • #3
Stephen Tashi said:
You should reform your terminology. In general you can't "calculate" the mean and variance "of X" from a sample. Instead you can only "estimate" the mean and variance of X by doing computations on the sample. One often uses the "sample mean" and "sample variance" as estimators.

You should also mention that you are assuming that the samples are independent realizations of X.
OK. Thanks.
The sample mean won't, in general, have the same distribution as X. Intuition says it will have a smaller variance than X does.
What if X is normally distributed?
Assuming [itex] Var(X) [/itex] denotes the sample variance, I don't think you can answer that question in general. In the special case when X has a normal distribution, the scaled sample variance [itex] (n-1)S^2/\sigma^2 [/itex] has a chi-squared distribution with [itex] n-1 [/itex] degrees of freedom.
Is that because the estimate for the variance contains squared X-terms?

Your notation is unclear. If [itex] \bar{X} [/itex] is the sample mean then it is a random variable and its variance is a single number, not another random variable.
Well, ##\bar{X}## is a statistic which has its own distribution. Is that wrong?

And so ##Var(\bar{X})## also has a distribution?

I don't know what you mean by a "distribution" of [itex] E( \bar{X}) [/itex]. If you intend [itex] \bar{X} [/itex] to denote the sample mean, then the expectation of the sample mean is a single number. It is the expectation of a random variable, not an estimator computed from samples of that random variable.
I meant the distribution ##\mu_{\bar{X}}##. Isn't that the same as the expectation value of ##\bar{X}##?

Are you thinking about an experiment consisting of groups of sub-experiments, where each sub-experiment is a sample of several independent realizations of a random variable?

No, no experiments. I was talking about taking multiple samples from a population, estimating the means (1) and variances (2) of the samples, and the means (3) and variances (4) of the means of the samples, and seeing what distributions all of those get.

I just want to know what happens if X is, say, normally distributed or whatever. Just trying to understand statistics.
 
  • #4
The "expectation" of a random variable is a single number, not a distribution. I suggest you ask your questions again after you study the technical definitions for the words you are using.
 
  • #5
Kind of hard to study them when I don't know what I have understood correctly and what I haven't. I am actually learning quite a lot here about the way to express myself (you can't learn to speak a language purely by reading a book, can you?). At any rate, I understand and appreciate your corrections and hope it's not too difficult to understand me.

Questions restated in a more correct manner:

1) What kinds of distributions can ##\bar{X}## have, and how do they depend on the distributions of ##X##?
2) What distribution will ##\sigma_{X}^2## have?
3) What will the distribution of ##\sigma_{\bar{X}}^2## be? From the central limit theorem, I know that ##\mu_{\bar{X}}## has a normal distribution. But what about the sample variance of the means?
 
  • #6
Let's start with the fact that many statistical terms (such as "mean", "variance") have at least 3 different interpretations. Distinguishing among these meanings requires additional words or context. Such words may refer to:

1) A property of the distribution of a random variable. (e.g. the mean value of a normally distributed random variable.)

2) A random variable defined as a function of values in a sample of another random variable
(e.g. the sample mean of 10 independent samples from a normal distribution with mean 0 and variance 1. The sample mean is defined by a formula that makes it a function of the values in the sample.)

3) One specific numerical value computed as a function of the values in a sample.
(e.g. "The sample mean was 43.6")

You are using notation that has traditional interpretations, but your questions suggest you don't follow the interpretations. The traditional interpretation of the notation [itex] E(...) [/itex] is that it is the "expected value of" a random variable. This is the same as the "mean of" a random variable. Thus this notation indicates a single number, not a random quantity that has a distribution.

Likewise, a common use of the notation [itex] Var(...) [/itex] is to mean the variance of a random variable. By this interpretation, [itex] Var(...) [/itex] denotes a single number, not a number that has some random variation in it.

I suspect you want to know things about the "sample mean" and "sample variance". Considering them to be random variables, it does make sense to ask about their distributions and the means and variances of those distributions. But it doesn't make sense to ask "What is the distribution of the expected value of the sample mean". The expected value of the sample mean will be a single number. It won't be randomly distributed over different values.
 
  • #7
I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?
 
  • #8
Nikitin said:
I see. So the mean of sample means can be seen as an estimate of the expectation value of the original random variable. Or an estimate of the mean of all the events from the entire population that resulted from the random variable.

Am I correct?

Yes. (At least: yes, the sample mean is an estimator of the mean. The "expectation of the sample means" is not a function of the values in a particular sample, so it is not an estimator.)

However, keep in mind that many different estimators of the expectation are possible.

The mean is intuitively pleasing because the computation of the sample mean (say, from a histogram of sample values) is analogous to how the expectation is computed from the distribution. In the early days of statistics, an estimator computed from a sample in a manner that is analogous to how a distribution parameter is computed from a distribution was called a "consistent" estimator. The modern definition of a "consistent" estimator is different.

A function of the values in a sample is called a "statistic". The modern definition of function allows a function to be any sort of rule or complicated algorithm that produces a unique result from a given set of values. An "estimator" is a "statistic". The term "estimator" conveys the human intention to use the statistic to estimate something.

Besides the mean, it is possible to create other estimators of the expectation of a random variable. For example: (max sample value + min sample value)/2 is a function of the values in a sample, so it is a "statistic", and if you use it to estimate the expected value of the random variable, it is an "estimator".

If we have two different estimators for the same property of a distribution, it is natural to ask which one is "best". This brings up the problem of defining "best". No unique way of defining "best" has been found. Instead, there are various technical definitions of "good aspects" of estimators. Common good aspects are "unbiased", "minimum variance", "maximum likelihood". These terms have technical definitions.
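A small simulation sketch (Python with NumPy; population and sample size are arbitrary choices) comparing two estimators of the expectation: the sample mean, and the midrange (max + min)/2, which only makes sense as a mean estimator for a symmetric population. Both are centered on the true value here, but the sample mean has a much smaller spread:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 50_000

# Symmetric population (normal) with true expectation 10
samples = rng.normal(loc=10.0, scale=3.0, size=(reps, n))

mean_est = samples.mean(axis=1)                                  # sample mean
midrange_est = (samples.max(axis=1) + samples.min(axis=1)) / 2   # midrange

# Both are centered near 10, but the sample mean varies much less
print(round(mean_est.mean(), 3), round(mean_est.std(), 3))
print(round(midrange_est.mean(), 3), round(midrange_est.std(), 3))
```

Which estimator wins depends on the population: for a uniform distribution the midrange would actually beat the mean, which is one reason no single definition of "best" settles the question.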
 
  • #9
Thank you. But how does the distribution of the sample variance ##S## of ##X## depend on the distribution of ##X##? And I assume it is also highly dependent on the sample size?
 
  • #10
Are you asking for a formula of some kind that gives the parameters of the distribution of the sample variance [itex]S [/itex] as a function of parameters of the distribution of [itex] X [/itex]? I know of no such general formula.

Yes, the distribution of [itex] S [/itex] would, in general, depend on the number of realizations of X in the sample. ( Statistics texts often use the word "sample" to mean a set of realizations of a random variable, and "sample size" to denote the number of realizations, but in other contexts, people use "sample" to mean a single realization of X.)

For particular families of distributions, the distribution of the sample variance may be known. But I don't know of any general approach that is applied to derive the distribution of the sample variance.
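For the normal family, where the distribution of the sample variance is known, the dependence on sample size can be seen directly in a simulation (a sketch in Python; the sample sizes are arbitrary): the sample variance stays centered on ##\sigma^2## for every ##n##, but its spread shrinks like ##\sqrt{2/(n-1)}##.

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 100_000

spreads = {}
for n in (5, 20, 80):
    samples = rng.normal(size=(reps, n))   # standard normal, sigma^2 = 1
    s2 = samples.var(axis=1, ddof=1)       # unbiased sample variance
    spreads[n] = s2.std()
    # E[S^2] = 1 for every n, but for normal data Var(S^2) = 2/(n-1),
    # so the distribution of S^2 tightens around 1 as n grows
    print(n, round(s2.mean(), 3), round(s2.std(), 3))
```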
 
  • #11
I was asking for important cases/relations, like: if ##X## is normal, ##S## is chi-squared. But it's OK, I guess I will find out this stuff anyway as my experience with statistics increases.

Thanks for the help and patience :)
 

FAQ: Distribution of sample mean and variance, and variance of sample means

1. What is the distribution of sample mean and variance?

The distribution of the sample mean describes how the average of a sample varies across repeated samples from a population. For a normal population (or, approximately, for large samples) it is normal, with mean equal to the population mean and variance equal to the population variance divided by the sample size. The sample variance has its own, separate distribution; for a normal population, the scaled sample variance (n − 1)S²/σ² follows a chi-squared distribution with n − 1 degrees of freedom.

2. How is the variance of sample means calculated?

The variance of the sample mean is calculated by dividing the population variance by the sample size. Its square root is known as the standard error of the mean, and it measures the spread of sample means around the population mean.

3. What is the relationship between sample size and the variance of sample means?

The variance of sample means decreases as the sample size increases. This is because with a larger sample size, there is less variability in the sample means and they are more likely to be closer to the population mean.

4. How is the distribution of sample mean and variance affected by the central limit theorem?

The central limit theorem states that as the sample size increases, the distribution of the sample mean approaches a normal distribution, regardless of the shape of the population's distribution. This means that even if the population is not normally distributed, the sample mean will still be approximately normally distributed if the sample size is large enough. (The sample variance also becomes approximately normal for large samples, since it is essentially an average of squared deviations.)
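A simulation sketch of this effect (Python with NumPy; the exponential population and the sample size are arbitrary illustrative choices): the exponential distribution with rate 1 has mean 1, variance 1, and skewness 2, yet the means of samples of size 100 are nearly symmetric and tightly concentrated around 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 100_000

# Exponential(1) population: mean 1, variance 1, strongly right-skewed (skewness 2)
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)

print(means.mean())  # close to the population mean, 1
print(means.std())   # close to 1/sqrt(n) = 0.1
skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
print(skew)          # roughly 2/sqrt(n) = 0.2, far below the population skewness of 2
```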

5. Why is the distribution of sample mean and variance important in statistical analysis?

The distribution of sample mean and variance is important in statistical analysis because it allows us to make inferences about the population based on a sample. By understanding the distribution, we can calculate probabilities and make predictions about the population, which is crucial in making informed decisions and drawing accurate conclusions.
