Why can the t statistic deal with small numbers?

thrillhouse86
Hi,

I've been trying to get my head around z and t statistics, and I almost have a mantra in my head: "when the samples are small, use the t test; when the samples are big, use either the t or the z test".

Now, as I understand it, the z test requires a large number of samples because it assumes you have a normal distribution, and you need a certain number of samples before the distribution of your sample mean starts to look normal.

But why does the t test allow us to deal with smaller samples? What does it have (or what assumptions does it not make) that allows us to deal with smaller samples?

Is it that in the z test the standard error of the mean distribution is determined from the KNOWN population variance, whereas in the t test the standard error of the mean distribution is determined from an ESTIMATE of the population variance, and in the limit of a large number of samples the ESTIMATE of the population variance will approach the TRUE population variance?

If this is indeed the case, does the central limit theorem show us that in the limit of a large number of samples the estimate of the population variance will approach the true population variance?

Thanks
 
""when the sample are small, use the t test, when the samples are big, use either the t or the z test"." is really old advice, originated before calculators and even computers brought cheap and easy calculations to us.

The origin of the idea: Gosset (the developer of the t-procedure) found that when sample sizes were small, estimates based on the normal distribution (the Z-test and Z-confidence interval) gave results that didn't match experimental observations. (Statistics using the sample mean and standard deviation were more variable for small samples than the normal distribution "expected".) Methods based on the t-distribution were developed empirically to circumvent this. By the time sample sizes were around 30, results from the two procedures were in general agreement, and this observation grew into the "small sample size" vs. "large sample size" distinction.

That was convenient: suppose you always create a 95% confidence interval. When you use the t-distribution intervals, you need a different critical value for each sample size, and many years ago this required tables. When you use the Z-distribution intervals, a single critical value works no matter what the sample size.
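To make that concrete, here is a minimal sketch (Python with scipy; the two-sided 95% level is just an example) comparing the t critical values, which change with sample size, against the single normal critical value:

```python
# Two-sided 95% critical values: the t value depends on the sample size
# (through df = n - 1), while the normal value is the same for every n.
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # ~1.96, independent of sample size
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:4d}: t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")
```

Around n = 30 the two values are already quite close, which is where the traditional rule of thumb comes from.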

"Is it that in the z test the standard error of the mean distribution is determined from the KNOWN population variance"
It doesn't have to be - that really was the point of the distinction. If you are performing a hypothesis test, and all you have are the sample size, sample mean, and sample standard deviation, the form of the test statistic is

<br /> \frac{\overline x - \mu_0}{\frac s {\sqrt{n}}}<br />

For "small samples" you would compare this to critical values from the appropriate t-distribution: for "large samples" you compare it to a value from the standard normal distribution.

If you actually had the true population standard deviation, and you were sure the underlying population were normal, then the test statistic would be

<br /> \frac{\overline x - \mu_0}{\frac{\sigma}{\sqrt n}}<br />

and you would compare it to a critical value from the normal distribution.
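A corresponding sketch for that case, with an assumed known population standard deviation (the value sigma = 0.5 is invented, as is the sample):

```python
# z test: same form of statistic, but with the KNOWN population sigma in the
# denominator, and the reference distribution is the standard normal.
import numpy as np
from scipy import stats

data = np.array([4.8, 5.3, 4.9, 5.6, 5.1, 4.7, 5.4, 5.0])  # same hypothetical sample
mu0 = 5.0      # hypothesized mean
sigma = 0.5    # assumed known population standard deviation

n = len(data)
z_stat = (data.mean() - mu0) / (sigma / np.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z_stat))   # two-sided p-value from the standard normal
print(z_stat, p_value)
```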

"If this is indeed the case, does the central limit theorem show us that in the limit of a large number of samples the estimate of the population variance will approach the true population variance ?"

Yes, essentially: as the sample size grows, the sample estimate of the variance converges to the true population variance. (Strictly speaking that convergence is a law-of-large-numbers result rather than the central limit theorem itself, but the large-sample conclusion is the same.)
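A quick simulation sketch of that convergence (a normal population with standard deviation 2 is chosen arbitrarily):

```python
# As the sample size grows, the sample variance s^2 settles down
# around the true population variance.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                    # true population standard deviation (variance = 4)
for n in (10, 100, 10_000, 1_000_000):
    sample = rng.normal(loc=0.0, scale=sigma, size=n)
    s2 = sample.var(ddof=1)    # sample estimate of the variance
    print(f"n = {n:8d}: sample variance = {s2:.4f} (true variance = {sigma**2:.1f})")
```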
 