When not to use the Student t test?

  1. May 11, 2007 #1


    Staff Emeritus
    Science Advisor
    Gold Member

    A Student t test assumes normally distributed data with equal variances.
    I know you can check for normality with the Kolmogorov-Smirnov test and compare the variances with the F-test.

    When the data are not normal you use a non-parametric test (the Mann-Whitney test); when the variances are significantly different you use the Welch-corrected t test.

    How strictly should I follow those rules?
    According to this site (http://www.graphpad.com/articles/interpret/Analyzing_two_groups/choos_anal_comp_two.htm [Broken]) the rules work well for >100 samples and work poorly for <12 samples. What about the region in between?

    I have sample sets with n around 20, and some are not normally distributed. Can I go ahead and do a t test, or should I log-transform all the data before doing the t test? Or should I do a Mann-Whitney test?

    Thanks for your input, here is a graph with the data distribution for the 4 samples, together with the 95% CI:
    http://img301.imageshack.us/img301/9940/scatter95cifg4.jpg [Broken]
    Last edited by a moderator: May 2, 2017
  2. May 11, 2007 #2


    Science Advisor
    Homework Helper

    The question of equal variances is easy: there is a variant of the t-test designed for unequal variances. For example, proc ttest in SAS will produce one statistic that assumes equal variances and another that does not, and it will also test the variances for equality.
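
    To make the two statistics concrete, here is a minimal Python sketch (standard library only) of the pooled t statistic and the unequal-variance (Welch) t statistic with its Welch-Satterthwaite degrees of freedom. The function names and the sample numbers are made up for illustration and are not the data from this thread. Turning t and df into a p-value needs a t-distribution CDF from a statistics package, which is left out here.

```python
import math
import statistics


def pooled_t(x, y):
    """Classical Student t statistic, assuming equal variances."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # n-1 denominators
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)     # pooled variance
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    se2 = vx / nx + vy / ny                                   # squared standard error
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


# Hypothetical samples with clearly different spreads.
group_a = [4.1, 3.8, 4.4, 4.0, 4.2, 3.9]
group_b = [5.0, 7.5, 3.2, 9.1, 6.4, 4.8]

print("pooled:", pooled_t(group_a, group_b))
print("Welch: ", welch_t(group_a, group_b))
```

    With equal group sizes the two t values coincide and only the degrees of freedom differ; the Welch df shrinks as the variances become more unequal.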

    A first "gut" reaction to the question of normality is that you should use both types of tests (parametric and non-parametric). If the results agree, no worry. You need to think some more only if their results turn out differently from each other.
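
    For the non-parametric half of that comparison, here is a rough Python sketch of the Mann-Whitney U test using only the standard library. It assumes the large-sample normal approximation without tie or continuity corrections, which is a simplification; a statistics package would normally handle small samples and ties more carefully. The function name and the numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    U counts the pairs (xi, yj) with xi > yj; ties count one half.
    No tie or continuity correction is applied (simplifying assumption).
    """
    nx, ny = len(x), len(y)
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mu = nx * ny / 2                                  # mean of U under H0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12)   # sd of U under H0
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical skewed samples.
control = [1.2, 0.8, 1.5, 2.9, 1.1, 0.9, 1.3, 6.0]
treated = [2.4, 3.1, 1.9, 8.5, 2.2, 4.0, 2.8, 12.0]

print(mann_whitney_u(control, treated))
```

    Because the test uses only the ordering of the values, it gives the same U on the raw data and on log-transformed data, so only the t test is affected by the choice of transformation.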

    The data look as if a logarithmic transformation would do the trick, esp. for the 3rd and the 4th samples.

    What I would have done is to estimate the linear regression log(Y) = a + b2 d2 + b3 d3 + b4 d4 + ε, where di = 1 if Y is in the i'th sample and di = 0 otherwise (i = 2, 3, 4); a and the b's are the parameters to be estimated, and ε is the error term. Each bi represents the difference between the mean of log(Y) in the i'th sample and the mean of log(Y) in the control sample. In this model, the first sample is made the control group by excluding its dummy d1 from the regression, but one can easily change that. I'd first run this as an unweighted regression; alternatively I'd run a weighted regression to control for unequal variances (a problem technically known as heteroscedasticity).
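
    As a small illustration of the unweighted version: because the only regressors are group dummies, the least-squares estimates reduce to differences of group means of log(Y), so no matrix algebra is needed. The Python sketch below uses that closed form; the function name and the data are hypothetical, and it gives point estimates only (no standard errors and no weighting).

```python
import math
from statistics import mean


def dummy_regression_log(samples):
    """OLS point estimates for log(Y) = a + b2*d2 + ... + bk*dk + e.

    samples[0] is the control group (its dummy is left out).
    Returns the intercept a and the list [b2, ..., bk].
    """
    log_means = [mean(math.log(y) for y in s) for s in samples]
    a = log_means[0]                      # mean of log(Y) in the control group
    b = [m - a for m in log_means[1:]]    # differences from the control mean
    return a, b


# Made-up data: one control group and two other groups.
samples = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [10.0, 10.0]]
a, b = dummy_regression_log(samples)

# exp(b) is the ratio of each group's geometric mean to the control's.
ratios = [math.exp(bi) for bi in b]
print(a, b, ratios)
```

    On the log scale each coefficient is an additive shift, so exponentiating it gives a multiplicative effect relative to the control group, which is often the more natural way to report results for skewed data.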
    Last edited: May 11, 2007