When not to use the Student t test?

Monique (Staff Emeritus, Science Advisor, Gold Member)
A Student t test assumes normally distributed data with equal variances.
I know you can test for a Gaussian distribution with the Kolmogorov-Smirnov test and test the variances with the F-test.

When the data are not normal you use a non-parametric test (e.g. the Mann-Whitney test); when the variances are significantly different you use the Welch-corrected t test.
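As an aside, the Welch correction mentioned above is simple enough to compute by hand. This is a minimal stdlib-Python sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom; the sample values are made up for illustration, not taken from the thread:

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)   # sample variances (n-1 denominator)
    se2 = v1 / n1 + v2 / n2             # squared standard error of the difference
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# two small hypothetical samples with visibly different spread
a = [4.1, 4.5, 3.9, 4.2, 4.4, 4.0]
b = [5.0, 6.5, 4.2, 7.1, 5.8, 6.3]
t, df = welch_t(a, b)
```

The degrees of freedom fall between min(n1, n2) - 1 and n1 + n2 - 2; the smaller the variance ratio, the closer df gets to the pooled-variance value.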

How strictly should I follow those rules?
According to this site (http://www.graphpad.com/articles/interpret/Analyzing_two_groups/choos_anal_comp_two.htm ) the rules work well for >100 samples and work poorly for <12 samples. How about the region in between?

I have sample sets of n around 20, and some are not normally distributed. Can I go ahead and do a t test, should I log transform all the data before doing the t test, or should I do a Mann-Whitney test?

Thanks for your input, here is a graph with the data distribution for the 4 samples, together with the 95% CI:
http://img301.imageshack.us/img301/9940/scatter95cifg4.jpg
 
The question of equal variances is easy: there is a variant of the t test designed for unequal variances. For example, proc ttest in SAS will produce one statistic under H0: equal variances and another statistic under unequal variances, and it will also test for equality of the variances.

A first "gut" reaction to the question of normality is that you should use both types of tests (parametric and non-parametric). If the results agree, there is no worry. You should think some more only if their results turn out differently from each other.
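The run-both-tests advice is easy to follow mechanically. For the non-parametric side, here is the Mann-Whitney U statistic computed by hand in stdlib Python (hypothetical data; the p-value lookup against the U distribution is omitted for brevity):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistics for two samples, using midranks
    (average rank over tied values) in the pooled, sorted sample."""
    pooled = sorted(x + y)

    def midrank(v):
        lo = pooled.index(v) + 1          # first 1-based position of v
        hi = lo + pooled.count(v) - 1     # last 1-based position of v
        return (lo + hi) / 2

    r1 = sum(midrank(v) for v in x)       # rank sum of the first sample
    n1, n2 = len(x), len(y)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                     # U for the other sample
    return u1, u2

# hypothetical samples; 4.2 appears in both, exercising the tie handling
a = [4.1, 4.5, 3.9, 4.2, 4.4, 4.0]
b = [5.0, 6.5, 4.2, 7.1, 5.8, 6.3]
u1, u2 = mann_whitney_u(a, b)
```

A useful sanity check is that the two U statistics always sum to n1 * n2; the test statistic is conventionally the smaller of the two.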

The data look as if a logarithmic transformation would do the trick, especially for the 3rd and 4th samples.

What I would have done is to estimate the linear regression Log(Y) = a + b2 d2 + ... + b4 d4 + ε, where di = 1 if Y is in the i'th sample (i = 2, 3, 4) and di = 0 otherwise; the b's are the parameters to be estimated, and ε is the error term. Each bi represents the difference between the mean of the i'th sample and the mean of the control sample. In this model, the first sample is made the control group by having been excluded from the regression, but one can easily change that. I'd first run this as an unweighted regression; alternatively I'd run a weighted regression to control for unequal variances (a problem technically known as heteroscedasticity).
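Because the regressors in that model are group-membership dummies, the unweighted least-squares fit has a closed form: the intercept a is the control group's mean of log(Y), and each bi is the i'th group's mean of log(Y) minus that intercept. A minimal stdlib-Python sketch with hypothetical data (not the data from the graph above):

```python
import math
from statistics import mean

def dummy_log_regression(groups):
    """OLS fit of log(Y) = a + b2*d2 + ... + bk*dk with group-membership
    dummies. With this design matrix the least-squares solution reduces
    to group means of log(Y): the intercept is the control-group mean
    and each b is a group mean minus that intercept."""
    logmeans = [mean(math.log(y) for y in g) for g in groups]
    a = logmeans[0]                       # control group (no dummy of its own)
    b = [m - a for m in logmeans[1:]]     # offsets for groups 2..k
    return a, b

# four hypothetical sample groups with increasing level
groups = [
    [1.0, 1.2, 0.9, 1.1],
    [2.1, 2.4, 1.9, 2.2],
    [4.0, 4.5, 3.8, 4.3],
    [8.2, 7.9, 8.5, 8.0],
]
a, b = dummy_log_regression(groups)
```

On the log scale each bi is a difference of log-means, so exp(bi) estimates the ratio of the i'th group's (geometric) mean to the control group's, which is often the natural way to report log-transformed comparisons.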
 
