When not to use the Student t test?

Monique (Staff Emeritus, Science Advisor, Gold Member)
A Student t test assumes normally distributed data with equal variances.
I know you can test for a Gaussian distribution with the Kolmogorov-Smirnov test and test the variances with the F-test.

When the data are not normal you use a non-parametric test (e.g. the Mann-Whitney test); when the variances are significantly different you use the Welch-corrected t test.
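As an aside, the Welch correction mentioned above is simple enough to compute by hand. This is a minimal stdlib-Python sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom; the sample values are made up for illustration, not taken from the thread:

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)   # sample variances (n-1 denominator)
    se2 = v1 / n1 + v2 / n2             # squared standard error of the difference
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# two small hypothetical samples with visibly different spread
a = [4.1, 4.5, 3.9, 4.2, 4.4, 4.0]
b = [5.0, 6.5, 4.2, 7.1, 5.8, 6.3]
t, df = welch_t(a, b)
```

The degrees of freedom fall between min(n1, n2) - 1 and n1 + n2 - 2; the smaller the variance ratio, the closer df gets to the pooled-variance value.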

How strictly should I follow those rules?
According to this site (http://www.graphpad.com/articles/interpret/Analyzing_two_groups/choos_anal_comp_two.htm ) the rules work well for >100 samples and work poorly for <12 samples. How about the region in between?

I have sample sets of n around 20, and some are not normally distributed. Can I go ahead and do a t test, should I log transform all the data before doing the t test, or should I do a Mann-Whitney test?

Thanks for your input, here is a graph with the data distribution for the 4 samples, together with the 95% CI:
http://img301.imageshack.us/img301/9940/scatter95cifg4.jpg
 
The question of equal variances is easy: there is a variant of the t test designed for unequal variances. For example, proc ttest in SAS will produce one statistic under H0: equal variances and another statistic under unequal variances, and it will also test for equality of the variances.

A first "gut" reaction to the question of normality is that you should use both types of tests (parametric and non-parametric). If the results agree, there is no worry. You should think some more only if their results turn out differently from each other.
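The run-both-tests advice is easy to follow mechanically. For the non-parametric side, here is the Mann-Whitney U statistic computed by hand in stdlib Python (hypothetical data; the p-value lookup against the U distribution is omitted for brevity):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistics for two samples, using midranks
    (average rank over tied values) in the pooled, sorted sample."""
    pooled = sorted(x + y)

    def midrank(v):
        lo = pooled.index(v) + 1          # first 1-based position of v
        hi = lo + pooled.count(v) - 1     # last 1-based position of v
        return (lo + hi) / 2

    r1 = sum(midrank(v) for v in x)       # rank sum of the first sample
    n1, n2 = len(x), len(y)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                     # U for the other sample
    return u1, u2

# hypothetical samples; 4.2 appears in both, exercising the tie handling
a = [4.1, 4.5, 3.9, 4.2, 4.4, 4.0]
b = [5.0, 6.5, 4.2, 7.1, 5.8, 6.3]
u1, u2 = mann_whitney_u(a, b)
```

A useful sanity check is that the two U statistics always sum to n1 * n2; the test statistic is conventionally the smaller of the two.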

The data look as if a logarithmic transformation would do the trick, especially for the 3rd and 4th samples.

What I would have done is to estimate the linear regression Log(Y) = a + b2 d2 + ... + b4 d4 + ε, where di = 1 if Y is in the i'th sample (i = 2, 3, 4) and di = 0 otherwise; the b's are the parameters to be estimated, and ε is the error term. Each bi represents the difference between the mean of the i'th sample and the mean of the control sample. In this model, the first sample is made the control group by having been excluded from the regression, but one can easily change that. I'd first run this as an unweighted regression; alternatively I'd run a weighted regression to control for unequal variances (a problem technically known as heteroscedasticity).
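Because the regressors in that model are group-membership dummies, the unweighted least-squares fit has a closed form: the intercept a is the control group's mean of log(Y), and each bi is the i'th group's mean of log(Y) minus that intercept. A minimal stdlib-Python sketch with hypothetical data (not the data from the graph above):

```python
import math
from statistics import mean

def dummy_log_regression(groups):
    """OLS fit of log(Y) = a + b2*d2 + ... + bk*dk with group-membership
    dummies. With this design matrix the least-squares solution reduces
    to group means of log(Y): the intercept is the control-group mean
    and each b is a group mean minus that intercept."""
    logmeans = [mean(math.log(y) for y in g) for g in groups]
    a = logmeans[0]                       # control group (no dummy of its own)
    b = [m - a for m in logmeans[1:]]     # offsets for groups 2..k
    return a, b

# four hypothetical sample groups with increasing level
groups = [
    [1.0, 1.2, 0.9, 1.1],
    [2.1, 2.4, 1.9, 2.2],
    [4.0, 4.5, 3.8, 4.3],
    [8.2, 7.9, 8.5, 8.0],
]
a, b = dummy_log_regression(groups)
```

On the log scale each bi is a difference of log-means, so exp(bi) estimates the ratio of the i'th group's (geometric) mean to the control group's, which is often the natural way to report log-transformed comparisons.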
 
