# Why can the t statistic deal with small samples?

1. Jun 9, 2010

### thrillhouse86

Hi,

I've been trying to get my head around z and t statistics, and I almost have a mantra in my head that "when the samples are small, use the t test; when the samples are big, use either the t or the z test".

Now, as I understand it, the z test requires a large number of samples because it assumes a normal distribution, and you need a certain number of samples before your sample mean will start to look normally distributed.

But why does the t test allow us to deal with smaller samples? What does it have (or what assumptions doesn't it have) that allows us to deal with smaller samples?

Is it that in the z test the standard error of the mean is determined from the KNOWN population variance, whereas in the t test the standard error of the mean is determined from an ESTIMATE of the population variance, and in the limit of a large number of samples the ESTIMATE of the population variance approaches the TRUE population variance?

If this is indeed the case, does the central limit theorem show us that in the limit of a large number of samples the estimate of the population variance will approach the true population variance?

Thanks

Last edited: Jun 9, 2010
2. Jun 9, 2010

"When the samples are small, use the t test; when the samples are big, use either the t or the z test" is really old advice, which originated before calculators and computers brought cheap and easy calculation to us.

The origin of the idea: Gosset (the developer of the t-procedure) found that when sample sizes were small, estimates based on the normal distribution (the Z-test and Z-confidence interval) gave results that didn't match experimental observations. (Statistics using the sample mean and standard deviation were more variable for small samples than the normal distribution "expected".) Methods based on the t-distribution were developed empirically to circumvent this.

By the time sample sizes were around 30, results from the two procedures were in general agreement, and this observation grew into the "small sample size" vs "large sample size" distinction. That was convenient: suppose you always create a 95% confidence interval. When you use the t-distribution intervals, you need a different critical value for each sample size, and many years ago this required tables. When you use the Z-distribution intervals, a single critical value works no matter what the sample size.
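Nowadays the comparison of critical values is cheap to do directly. A short sketch (assuming SciPy is available) of the two-sided 95% critical values at several sample sizes:

```python
from scipy.stats import norm, t

# Single critical value for the Z-procedure, regardless of sample size
z_crit = norm.ppf(0.975)  # about 1.96

# The t critical value depends on the sample size through df = n - 1
for n in (5, 10, 30, 100):
    t_crit = t.ppf(0.975, df=n - 1)
    print(f"n = {n:3d}: t critical = {t_crit:.3f}  (z critical = {z_crit:.3f})")
```

By n = 30 the t critical value (about 2.05) is already close to 1.96, which is exactly where the traditional "about 30" cutoff comes from.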

"Is it that in the z test the standard error of the mean distribution is determined from the KNOWN population variance"
It doesn't have to be - that really was the point of the distinction. If you are performing a hypothesis test, and all you have are the sample size, sample mean, and sample standard deviation, the form of the test statistic is

$$\frac{\overline x - \mu_0}{\frac s {\sqrt{n}}}$$

For "small samples" you would compare this to critical values from the appropriate t-distribution; for "large samples" you compare it to a value from the standard normal distribution.
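As a concrete small-sample illustration (the summary numbers here are made up for the example, not from real data): suppose n = 16, sample mean 2.8, sample standard deviation 1.2, and we test H0: mu = 2.5 at the 5% level, two-sided.

```python
import math
from scipy.stats import t

# Hypothetical summary statistics, chosen purely for illustration
n, xbar, s, mu0 = 16, 2.8, 1.2, 2.5

t_stat = (xbar - mu0) / (s / math.sqrt(n))  # (2.8 - 2.5) / (1.2 / 4) = 1.0
t_crit = t.ppf(0.975, df=n - 1)             # two-sided 5% critical value, 15 df

reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {reject}")
```

Here t = 1.0 falls well inside the critical value of about 2.13, so H0 is not rejected; with a "large-sample" Z cutoff of 1.96 the conclusion would be the same, but for a statistic nearer the boundary the two cutoffs can disagree.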

If you actually had the true population standard deviation, and you were sure the underlying population was normal, then the test statistic would be

$$\frac{\overline x - \mu_0}{\frac{\sigma}{\sqrt n}}$$

and you would compare it to a critical value from the normal distribution.
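The known-sigma version needs only the standard normal distribution, which the Python standard library's NormalDist provides. A sketch with made-up numbers (n = 16, sample mean 2.8, sigma = 1.2 treated as known, testing mu0 = 2.5):

```python
import math
from statistics import NormalDist

n, xbar, mu0 = 16, 2.8, 2.5
sigma = 1.2  # treated as the KNOWN population standard deviation

z_stat = (xbar - mu0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96, independent of n

reject = abs(z_stat) > z_crit
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Note that the only structural difference from the t-test is the denominator (sigma instead of s) and the critical value, which no longer depends on the sample size.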

"If this is indeed the case, does the central limit theorem show us that in the limit of a large number of samples the estimate of the population variance will approach the true population variance ?"

Essentially, yes, though strictly speaking it is the law of large numbers, not the central limit theorem, that guarantees the sample variance converges to the true population variance as the sample size grows. The central limit theorem is the result that makes the standardized sample mean approximately normal for large samples.
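A quick simulation (a sketch with arbitrary, made-up parameters) shows the sample variance settling onto the true population variance as n grows — the consistency that the law of large numbers guarantees:

```python
import random
import statistics

random.seed(42)
true_sigma2 = 4.0  # population variance of N(0, 2^2), chosen for the example

for n in (10, 100, 10_000, 100_000):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    s2 = statistics.variance(sample)  # unbiased sample variance
    print(f"n = {n:6d}: s^2 = {s2:.3f}  (true value {true_sigma2})")
```

For small n the estimate bounces around noticeably, which is precisely the extra variability Gosset observed and the t-distribution accounts for.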