Why does the normal distribution turn into a t distribution when the variance is unknown?

Discussion Overview

The discussion centers on the relationship between the normal distribution and the t distribution, particularly in the context of estimating the mean when the population variance is unknown. Participants explore the implications of using the sample variance in statistical tests and the conditions under which the t distribution arises.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant states that when the population variance is unknown and the sample size is small, the sample mean no longer follows a normal distribution but instead follows a t distribution.
  • Another participant argues that the distribution of the sample mean remains normal if the underlying population is normal, regardless of whether the population variance is known.
  • It is noted that the sample standard deviation is a random variable and, suitably scaled, follows a chi distribution, which contributes to the t distribution of the t-statistic.
  • Some participants clarify that the t-statistic is a ratio of a normal random variable to a (scaled) chi random variable, which leads to the t distribution.
  • There is a discussion about the independence of the sample mean and sample standard deviation, which is necessary for the t-statistic to have a t distribution.

Areas of Agreement / Disagreement

Participants express differing views on whether the distribution of the sample mean changes with knowledge of the population variance. Some assert that it does not change, while others maintain that the use of the t distribution is appropriate when the population variance is unknown and the sample size is small. The discussion remains unresolved with multiple competing views.

Contextual Notes

Participants highlight that the assumptions about the underlying distribution of the data and the independence of the sample mean and standard deviation are critical to the discussion, but these assumptions are not universally agreed upon.

Happiness
Suppose ##X \sim N(\mu, \sigma^2)##. Then ##\bar{X} \sim N(\mu, \frac{\sigma^2}{n})##, where ##\bar{X}## is the random variable for the sample mean of samples of size ##n##.

But when the population variance ##\sigma^2## is unknown and the sample size ##n## is small, ##\bar{X}## no longer follows a normal distribution but instead follows a t distribution, such that ##T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}##, where ##S^2=\frac{n}{n-1}\times(\text{sample variance})## is the unbiased estimator of the population variance and ##n-1## is the number of degrees of freedom of the t distribution.

My question is: why does the distribution of ##\bar{X}## change just because we do not know the population variance? Shouldn't the population variance still be some fixed value ##\sigma^2## (it's just that it's unknown to us at the moment), so that ##\bar{X}## still follows a normal distribution, ##\bar{X} \sim N(\mu, \frac{\sigma^2}{n})##? It seems that objective reality (the specific distribution of ##\bar{X}##) changes according to subjective knowledge (whether we know ##\sigma^2## or not). And this I find puzzling.
 
We should first observe that you are making some statements that are only true when ##X## has a normal distribution.

Happiness said:
But when the population variance ##\sigma^2## is unknown and the sample size ##n## is small, ##\bar{X}## no longer follows a normal distribution but instead

That is not correct. ##T## and ##\bar{X}## are different random variables. (A "statistic" is defined as a random variable that is a function of the values in a sample. ##T## and ##\bar{X}## are both statistics.)

It isn't the distribution of ##\bar{X}## that changes when the sample size is small. Instead it is the choice of which statistic people prefer to use when doing statistical tests.

When the sample size is large, people approximate ##\sigma^2## (the population variance) by the sample variance ##s^2## of the particular sample they have, in the belief that the sample variance computed from a large sample is probably close to the population variance. They treat ##\bar{X}## as having the distribution ##N(\mu,s^2/n)##. When the sample size is small, there is less reason to believe that the sample variance of a particular sample is close to the population variance, so people use ##t##-tests to make decisions. The distribution of ##T## is not a normal distribution. However, this does not change the fact that ##\bar{X}## still has the distribution ##N(\mu,\sigma^2/n)## (provided ##X## has a normal distribution).
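For anyone who wants to see this numerically, here is a minimal simulation sketch in standard-library Python. The values ##n=5##, ##\mu=10##, ##\sigma=2##, the seed, and the helper name `simulate` are arbitrary choices for illustration and are not from the thread. It draws many small normal samples and counts how often ##|Z|## and ##|T|## exceed 1.96, where ##Z## standardizes ##\bar{X}## with the true ##\sigma## and ##T## uses the sample standard deviation.

```python
import math
import random


def simulate(n=5, mu=10.0, sigma=2.0, trials=50_000, seed=0):
    """Draw many normal samples of size n and summarize X-bar, Z and T."""
    rng = random.Random(seed)
    xbars = []
    z_exceed = 0  # counts |Z| > 1.96, where Z uses the true sigma
    t_exceed = 0  # counts |T| > 1.96, where T uses the sample s
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Unbiased variance estimator (divide by n - 1), then its square root.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        xbars.append(xbar)
        z = (xbar - mu) / (sigma / math.sqrt(n))
        t = (xbar - mu) / (s / math.sqrt(n))
        if abs(z) > 1.96:
            z_exceed += 1
        if abs(t) > 1.96:
            t_exceed += 1
    mean_xbar = sum(xbars) / trials
    var_xbar = sum((xb - mean_xbar) ** 2 for xb in xbars) / (trials - 1)
    return {
        "var_xbar": var_xbar,          # compare with sigma**2 / n
        "z_tail": z_exceed / trials,   # compare with 0.05
        "t_tail": t_exceed / trials,   # larger than 0.05 for small n
    }


result = simulate()
print(result)
```

Under these assumptions the variance of ##\bar{X}## should come out near ##\sigma^2/n = 0.8## and the ##Z## tail frequency near 0.05, even though ##n## is small. The ##T## tail frequency should be noticeably larger (roughly 0.12 for 4 degrees of freedom), which is the heavier tail of the t distribution showing up in ##T##, not in ##\bar{X}##.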
 
As was said, the mean has a normal distribution.

As for the t-statistic T, you first need to see that the sample standard deviation is a random variable (it contains the sum of squares of the X values, hence it could never be a "fixed value" as you suggested), and that, up to a constant scale factor, it has a chi distribution (equivalently, ##(n-1)S^2/\sigma^2## has a chi-squared distribution with ##n-1## degrees of freedom). Check the 2nd post here for a proof: https://stats.stackexchange.com/que...bution-of-variance-a-chi-squared-distribution
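A rough numerical check of the chi-squared claim, as a sketch only: the function name `scaled_sample_variances` and the values ##n=5##, ##\sigma=2## are made up for illustration. It compares the simulated mean and variance of ##W=(n-1)S^2/\sigma^2## with those of a chi-squared distribution with ##n-1## degrees of freedom, which are ##n-1## and ##2(n-1)##.

```python
import random


def scaled_sample_variances(n=5, sigma=2.0, trials=50_000, seed=1):
    """Return W = (n-1) * S^2 / sigma^2 for many normal samples of size n."""
    rng = random.Random(seed)
    ws = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)  # this sum equals (n-1) * S^2
        ws.append(ss / sigma**2)
    return ws


ws = scaled_sample_variances()
mean_w = sum(ws) / len(ws)
var_w = sum((w - mean_w) ** 2 for w in ws) / (len(ws) - 1)
# A chi-squared variable with 4 degrees of freedom has mean 4 and variance 8.
print(mean_w, var_w)
```

Matching two moments does not prove the distribution, of course; the linked proof does that. The simulation only shows that ##S^2## fluctuates from sample to sample in the way the chi-squared result predicts.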

Afterward, you can see that the t-statistic T is a ratio of a normal random variable with a chi random variable. You can check why this means that it has a Student's t distribution here: https://stats.stackexchange.com/que...sqrt-chi2s-s-gives-you-a-t-distribution-proof, where his W corresponds to your ##(n-1)S^2/\sigma^2## and his s to your ##n-1##.
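Written out, the ratio looks like this. This is a sketch assuming the data are i.i.d. normal, so that ##Z## and ##W## below are independent; the symbols ##Z## and ##W## are introduced here only for the illustration.

```latex
T = \frac{\bar{X}-\mu}{S/\sqrt{n}}
  = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}
  = \frac{Z}{\sqrt{W/(n-1)}},
\qquad
Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1),
\quad
W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.
```

Note that ##\sigma## cancels between numerator and denominator, which is why ##T## can be computed without knowing the population variance.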

If the population's variance were known, you would use that instead of ##S^2##, and the only random variable in T would be the mean, with a normal distribution. In that case, T would have normal distribution as well.
 
ZeGato said:
and the only random variable in T would be the mean, with a normal distribution. In that case, T would have normal distribution as well.

The sample standard deviation ##s## is still a random variable, even if the population variance is known. The ##T##-statistic still has a ##t##-distribution instead of a normal distribution even if ##\sigma## is known. If you used a constant in place of ##s## in the formula for ##T##, you wouldn't be computing the ##T##-statistic. So while it is true that replacing ##s## by ##\sigma## in the formula for the ##T##-statistic turns it into a formula for a normally distributed random variable, technically that random variable is no longer ##T##.
 
I'm aware, and T would just be the variable's name and not representative of the t-statistic.
 
ZeGato said:
"Afterward, you can see that the t-statistic T is a ratio of a normal random variable with a chi random variable."

An important word is missing: independent. The sample mean and sample standard deviation have to be independent in order for the statistic T to have a t-distribution. Assuming the data come from a normal distribution ensures that independence.
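A quick way to see why the normality assumption matters here is to simulate the sample mean and sample standard deviation together. The sketch below uses hypothetical helper names (`pearson`, `mean_sd_correlation`) and arbitrary settings (##n=5##, a standard normal and a rate-1 exponential population). It estimates the correlation between ##\bar{X}## and ##S## for each population. Correlation is only a rough proxy, since zero correlation does not by itself prove independence, but a clearly nonzero correlation does rule independence out.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_sd_correlation(draw, n=5, trials=50_000):
    """Correlation between sample mean and sample SD over many samples."""
    means, sds = [], []
    for _ in range(trials):
        xs = [draw() for _ in range(n)]
        m = sum(xs) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        means.append(m)
        sds.append(sd)
    return pearson(means, sds)


rng = random.Random(2)
corr_normal = mean_sd_correlation(lambda: rng.gauss(0.0, 1.0))
corr_expo = mean_sd_correlation(lambda: rng.expovariate(1.0))
print(corr_normal, corr_expo)
```

For the normal population the correlation should be close to zero, consistent with independence. For the skewed exponential population it should be clearly positive, so the ratio no longer has the structure needed for an exact t distribution.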
 
