
Degrees of freedom for t-test for 2 samples, 2 variances

  1. Nov 6, 2018 #1
    Hello.

    I would be grateful for your help in understanding the logical meaning of each part of the formula for the degrees of freedom used in a two-sample t-test when the population variances are unknown and assumed to be unequal.

    Please take a look at the formula and at how I have interpreted some parts of it, and help me understand the rest.

    The formula is as follows:

    $$\text{degrees of freedom} = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\left( s_1^2/n_1 \right)/n_1 + \left( s_2^2/n_2 \right)/n_2}$$

    where $s_1^2$ is the variance of the first sample,
    $s_2^2$ is the variance of the second sample,
    $n_1$ is the number of observations in the first sample,
    $n_2$ is the number of observations in the second sample.

    Here are the parts I managed to decipher:

    (1) $s_1^2/n_1$ means that by dividing the variance of the first sample by the number of observations in this sample we get the mean variance of the first sample, that is, the mean of all variances. Even if I am right, I don't understand what the mean variance gives us or what it means.

    The same holds for $s_2^2/n_2$, but for the second sample.

    (2) Hence the numerator is the squared sum of the two mean variances. Why do we need to square it? Usually squaring in such contexts is used to avoid negative numbers, but variances cannot be negative, since they are already built from squared numbers.

    (3) The meaning of the expressions in the denominator escapes me.
    Here again we square the mean variances, then divide each by the corresponding number of observations, and then sum both results. What does all this mean?

    Thank you very much.
     
  3. Nov 6, 2018 #2

    mjc123

    Science Advisor

    Let's start with some definitions. A gamma distribution with parameters $\alpha$ and $\beta$, $\Gamma(\alpha,\beta)$ (look up the gamma distribution for details), has mean $M = \alpha\beta$ and variance $V = \alpha\beta^2$.
    The variance $s^2$ of a set of independent observations from a Normal distribution with variance $\sigma^2$, with $f$ degrees of freedom, is distributed as $\Gamma(f/2, 2\sigma^2/f)$, so $M = \sigma^2$ and $V = 2\sigma^4/f$.
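    A quick Monte Carlo check of these two moments, as a sketch only: it repeatedly draws $n = 10$ values from a Normal distribution with an arbitrarily chosen $\sigma = 2$, computes $s^2$ with the usual $n - 1$ divisor (so $f = n - 1 = 9$ is assumed), and compares the empirical mean and variance of $s^2$ with $\sigma^2$ and $2\sigma^4/f$. The values of $\sigma$, $n$, the number of repetitions and the seed are illustrative choices, not part of the post.

```python
import random
import statistics


def simulate_sample_variances(sigma, n, reps, seed=0):
    """Return `reps` sample variances s^2, each computed from n draws of Normal(0, sigma)."""
    rng = random.Random(seed)
    return [
        # statistics.variance uses the n - 1 divisor, so each s^2 has f = n - 1
        statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


sigma, n = 2.0, 10
f = n - 1  # degrees of freedom of each s^2 (assumed f = n - 1)
s2_values = simulate_sample_variances(sigma, n, reps=20000)

mean_s2 = statistics.fmean(s2_values)
var_s2 = statistics.variance(s2_values)

print(f"mean of s^2: {mean_s2:.3f}  (theory sigma^2   = {sigma**2:.3f})")
print(f"var  of s^2: {var_s2:.3f}  (theory 2sigma^4/f = {2 * sigma**4 / f:.3f})")
```

    With 20000 repetitions the simulated values should land close to $4$ and $32/9 \approx 3.56$; the exact digits depend on the seed.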
    Gamma distributions have the property that if $x$ and $y$ are independent gamma variables $\Gamma(\alpha_1,\beta)$ and $\Gamma(\alpha_2,\beta)$ with the same value of $\beta$, then $x+y$ is $\Gamma(\alpha_1+\alpha_2,\beta)$.
    Now $f s^2$ is distributed as $\Gamma(f/2, 2\sigma^2)$, so if we have two independent estimates of the same $\sigma^2$, then
    $$f_1 s_1^2 + f_2 s_2^2 \sim \Gamma\left((f_1+f_2)/2,\ 2\sigma^2\right)$$
    This is equal to $f s^2$, where $f = f_1 + f_2$ and $s^2$ is the pooled estimate of variance. This is how we proceed when we assume the variances of the two samples are equal. The estimated variance of the difference between the sample means is $s^2(1/n_1 + 1/n_2)$, with $f_1 + f_2$ degrees of freedom.
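    The equal-variance (pooled) recipe can be sketched in a few lines. This assumes $f_i = n_i - 1$ for each sample, and the summary statistics in the example ($s_1^2 = 4$, $n_1 = 10$, $s_2^2 = 9$, $n_2 = 15$) are hypothetical; the function names are made up for illustration.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled variance s^2 and its degrees of freedom f, assuming equal population variances."""
    f1, f2 = n1 - 1, n2 - 1  # assumed f_i = n_i - 1
    f = f1 + f2
    s_sq = (f1 * s1_sq + f2 * s2_sq) / f  # from f * s^2 = f1 * s1^2 + f2 * s2^2
    return s_sq, f


def var_diff_of_means_pooled(s1_sq, n1, s2_sq, n2):
    """Estimated variance of (mean1 - mean2) under the equal-variance assumption."""
    s_sq, f = pooled_variance(s1_sq, n1, s2_sq, n2)
    return s_sq * (1 / n1 + 1 / n2), f


# Hypothetical summary statistics
v, f = var_diff_of_means_pooled(4.0, 10, 9.0, 15)
print(f"variance of difference: {v:.4f}, degrees of freedom: {f}")
```

    Here $s^2 = (9 \cdot 4 + 14 \cdot 9)/23 = 162/23$, and multiplying by $1/10 + 1/15 = 1/6$ gives $27/23 \approx 1.1739$ with $23$ degrees of freedom.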
    Now suppose we can't assume the variances are equal; call them $\sigma_1^2$ and $\sigma_2^2$. The above analysis doesn't apply exactly, but we assume that the deviation from a gamma distribution isn't too great, so we can use it as a reasonable approximation.
    Now the difference between the means has estimated variance $S^2 = s_1^2/n_1 + s_2^2/n_2$. The mean of this variable is the sum of the two means, i.e. $\sigma_1^2/n_1 + \sigma_2^2/n_2$. The variance is the sum of the variances, i.e. $2\sigma_1^4/(n_1^2 f_1) + 2\sigma_2^4/(n_2^2 f_2)$.
    If we assume that the distribution is approximately gamma, and remember that mean $M = \alpha\beta$ and variance $V = \alpha\beta^2$, then this is approximately an $s^2$ distribution with effective degrees of freedom $f$, where
    $$f = 2\alpha = \frac{2M^2}{V} = \frac{\left(\sigma_1^2/n_1 + \sigma_2^2/n_2\right)^2}{\sigma_1^4/(n_1^2 f_1) + \sigma_2^4/(n_2^2 f_2)}$$
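    The moment-matching step can be checked numerically, as a sketch: compute $M$ and $V$ of $S^2$ term by term, apply $f = 2M^2/V$, and compare with the final closed-form expression. The population values are hypothetical, $f_i = n_i - 1$ is assumed, and the function names are made up for illustration.

```python
def effective_df(mean, var):
    """Degrees of freedom of a gamma-shaped variance estimate with this mean and variance: f = 2M^2/V."""
    return 2 * mean ** 2 / var


def df_from_moments(sig1_sq, n1, f1, sig2_sq, n2, f2):
    """Compute M and V of S^2 = s1^2/n1 + s2^2/n2, then match them to a gamma."""
    # Each s_i^2 has mean sigma_i^2; dividing by n_i divides the mean by n_i.
    M = sig1_sq / n1 + sig2_sq / n2
    # Each s_i^2 has variance 2*sigma_i^4/f_i; dividing by n_i divides the variance by n_i^2.
    V = 2 * sig1_sq ** 2 / (n1 ** 2 * f1) + 2 * sig2_sq ** 2 / (n2 ** 2 * f2)
    return effective_df(M, V)


def df_closed_form(sig1_sq, n1, f1, sig2_sq, n2, f2):
    """Final expression above, written with a = sigma1^2/n1 and b = sigma2^2/n2."""
    a, b = sig1_sq / n1, sig2_sq / n2
    return (a + b) ** 2 / (a ** 2 / f1 + b ** 2 / f2)


# Hypothetical population values: sigma1^2 = 4, n1 = 10, sigma2^2 = 9, n2 = 15, f_i = n_i - 1
args = (4.0, 10, 9, 9.0, 15, 14)
print(df_from_moments(*args), df_closed_form(*args))
```

    The two functions agree because $\sigma_i^4/(n_i^2 f_i) = (\sigma_i^2/n_i)^2/f_i$, and the factor 2 in $2M^2$ cancels the factor 2 in each term of $V$.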
    To estimate this $f$ value from the data, we replace the $\sigma^2$s by the measured $s^2$s.
    I think there's a mistake in your formula: in the denominator, the terms of the form $(s^2/n)/n$ should be $(s^2/n)^2/(n-1)$.
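    A minimal sketch of the estimated effective degrees of freedom with the corrected denominator, computed directly from two lists of observations. The data are made up for illustration and `welch_df` is a hypothetical helper name. For a cross-check, SciPy's `scipy.stats.ttest_ind(x, y, equal_var=False)` performs this unequal-variance (Welch) t-test.

```python
import statistics


def welch_df(x, y):
    """Welch-Satterthwaite effective degrees of freedom for two independent samples."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1  # estimated variance of the mean of x, s1^2/n1
    b = statistics.variance(y) / n2  # estimated variance of the mean of y, s2^2/n2
    # Corrected denominator: (s^2/n)^2 / (n - 1) for each sample
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical measurements, made up for illustration
x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
y = [6.3, 7.1, 5.9, 8.2, 7.7, 6.8, 7.4, 6.1]
print(f"effective degrees of freedom: {welch_df(x, y):.2f}")
```

    The result is generally not an integer, and it always lies between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$, the pooled value.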
     
  4. Nov 7, 2018 #3
    Thank you very much for such a detailed explanation. But it strays from my question and is too complicated for me :) I don't have much knowledge of statistics; I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there. By meaning I mean what exactly is going on in each step, what it means, and why this or that type of math formula is used. For example, by computing $s^2/n$ we find the average of variances, and if we take the square root of that we get the standard error, which means the average number of standard deviations from the mean.
    I am not sure I made it clear what I am looking for: it is a sort of correct reading of the meaning of words in a sentence (what they mean, why they are combined in such a way, how that combination changes the meaning of the sentence, etc.). :)
     
  5. Nov 7, 2018 #4

    mjc123

    Science Advisor

    The short answer is that I am trying to answer your question, but it needs context to make sense. Briefly, in the expression for $f$, the numerator is the square of the mean of $S^2$ (the variance of the difference between two means) and the denominator is half the variance of $S^2$. Most of my post is just explaining why $f = 2M^2/V$.
    To discuss the individual terms, $s^2/n$ is not the "average of variances" (whatever that means) but the variance of the mean. If the variance of a single observation is $s^2$ (strictly, $\sigma^2$ estimated by $s^2$), the variance of the mean of $n$ observations is (estimated by) $s^2/n$. If you're looking at the difference between the means of two samples, its variance is the sum of the variances of the two means, $s_1^2/n_1 + s_2^2/n_2$. This is what I have called $S^2$.
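    A small simulation can make "variance of the mean" concrete, as a sketch under assumed values: it repeatedly draws two independent Normal samples ($\sigma_1 = 2$, $n_1 = 10$ and $\sigma_2 = 3$, $n_2 = 15$, chosen arbitrarily), records the difference of their means, and compares the spread of those differences with $\sigma_1^2/n_1 + \sigma_2^2/n_2 = 0.4 + 0.6 = 1.0$. The function name is made up for illustration.

```python
import random
import statistics


def simulated_var_of_mean_difference(sigma1, n1, sigma2, n2, reps=20000, seed=1):
    """Empirical variance of (mean of sample 1) - (mean of sample 2) over `reps` simulations."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        m1 = statistics.fmean(rng.gauss(0.0, sigma1) for _ in range(n1))
        m2 = statistics.fmean(rng.gauss(0.0, sigma2) for _ in range(n2))
        diffs.append(m1 - m2)
    return statistics.variance(diffs)


sigma1, n1, sigma2, n2 = 2.0, 10, 3.0, 15
v = simulated_var_of_mean_difference(sigma1, n1, sigma2, n2)
theory = sigma1 ** 2 / n1 + sigma2 ** 2 / n2  # variance of mean 1 plus variance of mean 2
print(f"simulated: {v:.3f}  theory: {theory:.3f}")
```

    Each mean alone varies much less than a single observation (variance $\sigma^2/n$ rather than $\sigma^2$), and because the two samples are independent their variances add when the means are subtracted.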
    If $s^2$ estimates $\sigma^2$, the mean (expectation) of $s^2$ is $\sigma^2$ and its variance is $2\sigma^4/f$.
    The variance of $S^2$ is the sum of the variances of its two components, i.e. $2s_1^4/(n_1^2 f_1) + 2s_2^4/(n_2^2 f_2)$. And I have assumed $f = n - 1$.

    I hope that makes things at least a little clearer.
     
  6. Nov 10, 2018 #5

    Stephen Tashi

    Science Advisor

    If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

    You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic?" Do you understand that concept?
     
