Degrees of freedom for t-test for 2 samples, 2 variances

AI Thread Summary
The discussion focuses on understanding the formula for degrees of freedom in a t-test when variances are unknown and assumed unequal. Participants seek clarity on the meaning of each component of the formula, particularly the significance of dividing variances by sample sizes and the rationale behind squaring mean variances. There is a debate about whether every term in a mathematical formula can have an intuitive interpretation, with some arguing that not all formulas lend themselves to complete logical understanding. The conversation emphasizes the importance of grasping the "big picture" in statistical testing and the challenges of learning complex statistical concepts. Recommendations for resources to improve mathematical understanding and intuition are also requested.
Vital
Hello.

I will be grateful for your help in finding the logical meaning of each part of the formula for the degrees of freedom computed for a t-test when the variances are unknown and assumed to be unequal.

Please take a look at the formula and at the way I managed to understand some parts of it, and help me understand the rest.

The formula is as follows:

$$\text{degrees of freedom} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)/n_1 + \left(s_2^2/n_2\right)/n_2}$$

where $s_1^2$ is the variance of the first sample,
$s_2^2$ is the variance of the second sample,
$n_1$ is the number of observations in the first sample,
$n_2$ is the number of observations in the second sample.

Here are the parts I managed to decipher:

(1) $s_1^2/n_1$ means that by dividing the variance of the first sample by the number of observations in this sample we get the mean variance of the first sample, that is, the mean of all variances. Even if I am right, I don't understand what the mean variance gives us or what it means.

The same goes for $s_2^2/n_2$, but for the second sample.

(2) Hence the numerator is the squared sum of the two mean variances. Why do we need to square it? Squaring in such contexts is usually used to avoid negative numbers, but variances cannot be negative, since they are already built from squared numbers.

(3) The meaning of the expressions in the denominator escapes me.
Here again we square mean variances, then divide each by the number of corresponding observations, and then sum both results. What does all this mean?

Thank you very much.
 
Let's start with some definitions. A gamma distribution with parameters $\alpha$ and $\beta$, $\Gamma(\alpha,\beta)$ (look up gamma distribution for details), has mean $M = \alpha\beta$ and variance $V = \alpha\beta^2$.
The variance $s^2$ of a set of independent observations from a Normal distribution with variance $\sigma^2$, with $f$ degrees of freedom, is distributed as $\Gamma(f/2, 2\sigma^2/f)$, so $M = \sigma^2$ and $V = 2\sigma^4/f$.
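As a quick numerical check of the two moments just stated, here is a minimal Python sketch using only the standard library. It assumes the usual unbiased sample variance (divisor $n-1$, so $f = n-1$); the sample size, $\sigma$, number of repetitions, and the helper name `simulate_sample_variances` are all hypothetical choices for illustration. It only compares means and variances, not the full gamma shape.

```python
import random
import statistics


def simulate_sample_variances(n, sigma, reps, seed=0):
    """Draw `reps` samples of size n from Normal(0, sigma^2) and return
    the unbiased sample variance s^2 (divisor n - 1) of each sample."""
    rng = random.Random(seed)
    return [
        statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


n, sigma = 6, 2.0
f = n - 1  # degrees of freedom of s^2 (assumed f = n - 1)
s2 = simulate_sample_variances(n, sigma, reps=20_000)

# Gamma(f/2, 2*sigma^2/f) predicts M = sigma^2 and V = 2*sigma^4/f.
print("mean of s^2:", statistics.fmean(s2), "theory:", sigma**2)
print("var  of s^2:", statistics.pvariance(s2), "theory:", 2 * sigma**4 / f)
```

The simulated values should land close to $4$ and $6.4$ for these settings; increasing `reps` tightens the agreement.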
Gamma distributions have the property that if $x$ and $y$ are independent gamma variables $\Gamma(\alpha_1,\beta)$ and $\Gamma(\alpha_2,\beta)$ with the same value of $\beta$, then $x+y$ is $\Gamma(\alpha_1+\alpha_2,\beta)$.
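The additivity property can be sanity-checked by simulation. This is a minimal sketch with arbitrary, hypothetical parameter values; it relies on Python's `random.gammavariate(alpha, beta)`, whose `beta` is a scale parameter, matching the convention $M = \alpha\beta$, $V = \alpha\beta^2$ used here. Matching the first two moments is consistent with $\Gamma(\alpha_1+\alpha_2,\beta)$ but is of course not a proof.

```python
import random
import statistics

rng = random.Random(1)
a1, a2, beta = 1.5, 2.5, 3.0  # two shape parameters and a shared scale

# Sum of independent Gamma(a1, beta) and Gamma(a2, beta) draws.
sums = [rng.gammavariate(a1, beta) + rng.gammavariate(a2, beta)
        for _ in range(50_000)]

# Gamma(a1 + a2, beta) would have mean (a1 + a2)*beta, variance (a1 + a2)*beta^2.
print("mean:", statistics.fmean(sums), "theory:", (a1 + a2) * beta)
print("var: ", statistics.pvariance(sums), "theory:", (a1 + a2) * beta**2)
```

With these settings the theoretical values are $12$ and $36$.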
Now $fs^2$ is distributed as $\Gamma(f/2, 2\sigma^2)$, so if we have two independent estimates of the same $\sigma^2$, then
$$f_1 s_1^2 + f_2 s_2^2 \text{ is } \Gamma\left(\frac{f_1+f_2}{2}, 2\sigma^2\right)$$
This is equal to $fs^2$, where $f = f_1 + f_2$ and $s^2$ is the pooled estimate of variance. This is how we proceed when we assume the variances of the two samples are equal. The estimated variance of the difference between the sample means is $s^2(1/n_1 + 1/n_2)$, with $f_1 + f_2$ degrees of freedom.
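The equal-variance recipe described above can be written out directly. This is a minimal sketch, assuming $f_i = n_i - 1$ and the unbiased sample variance; the function name `pooled_variance` and the toy data are hypothetical, chosen so the results are easy to verify by hand ($s_1^2 = 2.5$, $s_2^2 = 20/3$).

```python
import statistics


def pooled_variance(x, y):
    """Equal-variance case: return the pooled estimate s^2, the estimated
    variance of mean(x) - mean(y), and the degrees of freedom f1 + f2."""
    n1, n2 = len(x), len(y)
    f1, f2 = n1 - 1, n2 - 1
    # f1*s1^2 + f2*s2^2 = f*s^2 with f = f1 + f2
    s2 = (f1 * statistics.variance(x) + f2 * statistics.variance(y)) / (f1 + f2)
    var_diff = s2 * (1 / n1 + 1 / n2)
    return s2, var_diff, f1 + f2


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0]
s2, var_diff, df = pooled_variance(x, y)
print(s2, var_diff, df)  # s^2 = 30/7, variance of difference = 27/14, df = 7
```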
Now suppose we can't assume the variances are equal; call them $\sigma_1^2$ and $\sigma_2^2$. The above analysis doesn't apply exactly, but we assume that the deviation from a gamma distribution isn't too great, so we can use it as a reasonable approximation.
Now the difference between the means has estimated variance $S^2 = s_1^2/n_1 + s_2^2/n_2$. The mean of this variable is the sum of the two means, i.e. $\sigma_1^2/n_1 + \sigma_2^2/n_2$. The variance is the sum of the variances, i.e. $2\sigma_1^4/(n_1^2 f_1) + 2\sigma_2^4/(n_2^2 f_2)$.
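If we assume that the distribution is approximately gamma, and remember that mean $M = \alpha\beta$ and variance $V = \alpha\beta^2$ (so $M^2/V = \alpha$), then this is an approximate $s^2$ distribution with effective degrees of freedom $f$, where, since an $s^2$ with $f$ degrees of freedom has $\alpha = f/2$,
$$f = 2\alpha = \frac{2M^2}{V} = \frac{\left(\sigma_1^2/n_1 + \sigma_2^2/n_2\right)^2}{\sigma_1^4/(n_1^2 f_1) + \sigma_2^4/(n_2^2 f_2)}$$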
To estimate this $f$ value from the data, we replace the $\sigma^2$s by the measured $s^2$s.
I think there's a mistake in your formula: in the denominator, the terms of the form $(s^2/n)/n$ should be $(s^2/n)^2/(n-1)$.
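Putting the pieces together, the estimate with the $\sigma^2$s replaced by $s^2$s and $f_i = n_i - 1$ can be sketched as a small Python function. The name `welch_df` and the example numbers are hypothetical; the formula is the Welch–Satterthwaite approximation as written in the post. Note that each denominator term is squared and divided by $n-1$, not by $n$ as in the formula from the question.

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Effective degrees of freedom for S^2 = s1^2/n1 + s2^2/n2
    (Welch-Satterthwaite), taking f = n - 1 for each sample."""
    a = s1_sq / n1  # estimated variance of the first sample mean
    b = s2_sq / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Hypothetical summary statistics, for illustration only.
print(welch_df(2.5, 5, 20 / 3, 4))  # lies between min(4, 3) and 4 + 3
print(welch_df(3.0, 10, 3.0, 10))   # about 18 = 2*(10 - 1) when variances and sizes are equal
```

The second call shows that with equal variances and equal sample sizes the effective degrees of freedom reduce to the pooled value $n_1 + n_2 - 2$.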
 
mjc123 said:
Let's start with some definitions. A gamma distribution with parameters $\alpha$ and $\beta$, $\Gamma(\alpha,\beta)$ (look up gamma distribution for details), has mean $M = \alpha\beta$ and variance $V = \alpha\beta^2$.
[snip]
I think there's a mistake in your formula: in the denominator, the terms of the form $(s^2/n)/n$ should be $(s^2/n)^2/(n-1)$.
Thank you very much for such a detailed explanation. But it strays from my question and is too complicated for me :) I don't have that much knowledge of statistics; I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there. By meaning I mean what exactly is going on in each step, what it means, and why this or that type of math formula is used. For example, by computing $s^2/n$ we find the average of variances, and if we take the square root of that we get the standard error, which means the average number of standard deviations from the mean.
I am not sure I made it clear what I am looking for: it is a sort of correct reading of the meaning of words in a sentence (what they mean, why they are combined in such a way, how that combination changes the meaning of the sentence, etc.). :smile:
 
The short answer is that I am trying to answer your question, but it needs context to make sense. Briefly, in the expression for $f$, the numerator is the square of the mean of $S^2$ (the variance of the difference between two means) and the denominator is half the variance of $S^2$. Most of my post is just explaining why $f = 2M^2/V$.
To discuss the individual terms, $s^2/n$ is not the "average of variances" (whatever that means) but the variance of the mean. If the variance of a single observation is $s^2$ (strictly, $\sigma^2$ estimated by $s^2$), the variance of the mean of $n$ observations is (estimated by) $s^2/n$. If you're looking at the difference between the means of two samples, the variance of that is the sum of the variances of the two means, $s_1^2/n_1 + s_2^2/n_2$. This is what I have called $S^2$.
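The point that $s^2/n$ is the variance of the mean (and that the two variances add for a difference of means) can be seen numerically. This minimal sketch uses the true $\sigma$s rather than estimates, and the function name `var_of_mean_difference` and all parameter values are hypothetical; it simulates many pairs of independent Normal samples, keeps only $\bar{x} - \bar{y}$, and compares its variance with $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

```python
import random
import statistics


def var_of_mean_difference(sigma1, n1, sigma2, n2, reps, seed=2):
    """Simulate mean(x) - mean(y) for independent Normal samples
    x (size n1, sd sigma1) and y (size n2, sd sigma2); return its variance."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        m1 = statistics.fmean(rng.gauss(0.0, sigma1) for _ in range(n1))
        m2 = statistics.fmean(rng.gauss(0.0, sigma2) for _ in range(n2))
        diffs.append(m1 - m2)
    return statistics.pvariance(diffs)


sigma1, n1, sigma2, n2 = 2.0, 5, 3.0, 10
simulated = var_of_mean_difference(sigma1, n1, sigma2, n2, reps=30_000)
theory = sigma1**2 / n1 + sigma2**2 / n2  # each mean has variance sigma^2/n
print(simulated, theory)
```

For these settings the theoretical value is $0.8 + 0.9 = 1.7$.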
If $s^2$ estimates $\sigma^2$, the mean (expectation) of $s^2$ is $\sigma^2$ and its variance is $2\sigma^4/f$.
The variance of $S^2$ is the sum of the variances of its two components, i.e. $2s_1^4/(n_1^2 f_1) + 2s_2^4/(n_2^2 f_2)$. And I have assumed $f = n - 1$.

I hope that makes things at least a little clearer.
 
Vital said:
I don't have that much knowledge of statistics, and I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there.

If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?
 
Stephen Tashi said:
If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

Thank you.
But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning? I truly don't think so. Of course, I am very bad at math, and it is not my main subject (though, honestly, I wish I could know and understand it at a much, much deeper level; I love math), but it is created to explain, and in many cases to simplify, events, subjects, matters. Hence, I believe that every formula has a meaning and can be explained and interpreted (as in the simplest example of (10 / 3) - 1: 10 / 3 tells us how many times 3 is repeated in 10, and by subtracting 1 we see what percentage 3 has in 10). And if the formula is a complicated long one with many parts, I always thought that it can be deconstructed and each part should have a meaning, thus gradually forming the meaning of the whole formula.
As you mentioned, that is what truly helps us remember and understand math operations. It doesn't make sense to simply memorise steps if one doesn't understand what is going on in this or that math operation, and what it does.

Stephen Tashi said:
You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?
I hope I do, as I am not a student and have more than 20 years of working experience. What I am very weak at is understanding and interpreting math operations when I encounter them, especially now that I am learning new material.
 
Vital said:
But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning?
Yes, it is really true that, at a person's current level of knowledge and experience, some mathematical formulas have no intuitive or logical interpretation.

It's possible that by gaining more experience and knowledge, one can attain an intuitive or logical interpretation of an unfamiliar formula. But doing that can involve more than picking up a few details. It can involve adopting a completely different view of the world.

I hope I do, as I am not a student and have more than 20 years of working experience.
Experienced users of statistics get the mistaken impression that a "statistic" (in the abstract sense) must be some quantity that is significant or useful in interpreting data. This is not the mathematical definition of a statistic.

What I am very weak at is understanding and interpreting math operations when I encounter them, especially now that I am learning new material.
Statistics differs from other mathematical topics in that other topics can be presented so that new concepts are introduced gradually.

For example, learning calculus doesn't start on page 1 with the notion of a derivative. First, we present functions, limits, etc. Likewise, linear algebra doesn't start on page 1 with finding the minimal polynomial of a matrix. It begins by teaching the concepts of vector spaces, bases, etc.

By contrast, the simplest problems in statistics involve the concepts of random variables, functions of random variables, sample distributions, statistics, estimators, etc. Statistics texts may begin slowly by teaching about random variables and probability distributions. But after that, most of them rush into presenting practical applications that involve sophisticated concepts before students appreciate those concepts. Fortunately for the students, it is possible to work such problems after seeing a few examples without fully appreciating the concepts behind them.
 
Stephen Tashi said:
[snip]
Thank you :) Can you please recommend books that help to develop mathematical thinking, understanding, intuition, etc.?
 