Degrees of freedom for t-test for 2 samples, 2 variances

In summary, the formula for degrees of freedom in a two-sample t-test with unequal variances (the Welch-Satterthwaite formula) combines the two sample variances, each divided by its sample size, into the estimated variance of the difference between the sample means, and then uses the mean and variance of that quantity to approximate the effective degrees of freedom.
  • #1
Vital
Hello.

I will be grateful for your help in finding the logical meaning of each part of the formula for degrees of freedom, which is computed for a t-test when the variances are unknown and assumed to be unequal.

Please take a look at the formula and the way I managed to understand some parts of it, and please help me to understand the rest of it.

The formula is as follows:

degrees of freedom = [ s₁²/n₁ + s₂²/n₂ ]² / [ ( s₁²/n₁ ) / n₁ + ( s₂²/n₂ ) / n₂ ]

where s₁² is the variance of the first sample
s₂² is the variance of the second sample
n₁ - number of observations in the first sample
n₂ - number of observations in the second sample

Here are parts I managed to decipher:

(1) s₁²/n₁ means that by dividing the variance of the first sample by the number of observations in this sample we get the mean variance of the first sample, that is, the mean of all variances. Even if I am right, I don't understand what the mean variance gives us and what its meaning is.

Same for s₂²/n₂, but for the second sample.

(2) Hence the numerator is the squared sum of the two mean variances. Why do we need to square it, if usually squaring in such contexts is used to avoid negative numbers? With variances, negatives are excluded, as those are already squared numbers.

(3) The meaning of the expressions in the denominator has escaped me) Here again we square the mean variances, then divide each by the number of corresponding observations, and then sum both results. What does all this mean?

Thank you very much.
 
  • #2
Let's start with some definitions. A gamma distribution with parameters α and β, Γ(α,β) (look up the gamma distribution for details), has a mean M = αβ and variance V = αβ².
The variance s² of a set of independent observations with a Normal distribution with variance σ², with f degrees of freedom, is distributed as Γ(f/2, 2σ²/f). M = σ² and V = 2σ⁴/f.
Gamma distributions have the property that if x and y are independent gamma variables Γ(α₁,β), Γ(α₂,β) with the same value of β, then x+y is Γ(α₁+α₂, β).
Now fs² is distributed as Γ(f/2, 2σ²), so if we have two independent estimates of the same σ², then
f₁s₁² + f₂s₂² is Γ((f₁+f₂)/2, 2σ²)
This is equal to fs² where f = f₁ + f₂ and s² is the pooled estimate of variance. This is how we proceed when we assume the variances of the two samples are equal. The estimated variance of the difference between the sample means is s²(1/n₁ + 1/n₂), with f₁ + f₂ degrees of freedom.
Now suppose we can't assume the variances are equal, call them σ₁² and σ₂². The above analysis doesn't apply exactly, but we assume that the deviation from a gamma distribution isn't too great, so we can use it as a reasonable approximation.
Now the difference between the means has estimated variance S² = s₁²/n₁ + s₂²/n₂. The mean of this variable is the sum of the two means, i.e. σ₁²/n₁ + σ₂²/n₂. The variance is the sum of the variances, i.e. 2σ₁⁴/(n₁²f₁) + 2σ₂⁴/(n₂²f₂).
If we assume that the distribution is approximately gamma, and remember that the mean M = αβ and the variance V = αβ², then β = V/M and α = M/β = M²/V, so this is an approximate s² distribution with effective degrees of freedom f, where
f = 2α = 2M²/V
= (σ₁²/n₁ + σ₂²/n₂)² / (σ₁⁴/(n₁²f₁) + σ₂⁴/(n₂²f₂))
To estimate this f value from the data, we replace the σ²s by the measured s²s.
I think there's a mistake in your formula; in the denominator, the terms of the form (s²/n)/n should be (s²/n)²/(n-1).
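
To make that correction concrete, the following is a minimal Python sketch, assuming NumPy is available, that evaluates the corrected Welch-Satterthwaite formula side by side with the formula exactly as written in post #1. The samples x and y are invented purely for illustration.

```python
# Minimal sketch, assuming NumPy; the samples x and y are made up for illustration.
import numpy as np

def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom, with the corrected
    denominator terms (s^2/n)^2 / (n - 1)."""
    n1, n2 = len(x), len(y)
    a = np.var(x, ddof=1) / n1   # s1^2/n1, estimated variance of mean(x)
    b = np.var(y, ddof=1) / n2   # s2^2/n2, estimated variance of mean(y)
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def df_as_posted(x, y):
    """The formula as written in post #1, with (s^2/n)/n in the denominator,
    kept only to show how its result differs."""
    n1, n2 = len(x), len(y)
    a = np.var(x, ddof=1) / n1
    b = np.var(y, ddof=1) / n2
    return (a + b) ** 2 / (a / n1 + b / n2)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=12)   # hypothetical sample 1
y = rng.normal(0.5, 2.0, size=20)   # hypothetical sample 2

print("corrected Welch-Satterthwaite df:", welch_df(x, y))
print("df from the formula as posted:   ", df_as_posted(x, y))
```

The corrected value always lies between the smaller of n₁-1 and n₂-1 and n₁+n₂-2, which is a quick sanity check on any implementation.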
 
  • #3
mjc123 said:
Let's start with some definitions. A gamma distribution with parameters α and β, Γ(α,β) (look up the gamma distribution for details), has a mean M = αβ and variance V = αβ².
[snip]
I think there's a mistake in your formula; in the denominator, the terms of the form (s²/n)/n should be (s²/n)²/(n-1).
Thank you very much for such a detailed explanation. But it gets away from my question and is too complicated for me) I don't have that much knowledge of statistics, and I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there. By meaning I mean what exactly is going on in each step, what it means, and why this or that type of math formula is used. For example, by computing s²/n we find the average of variances, and if we take the square root of that we get the standard error, which means the average number of standard deviations from the mean.
I am not sure I made it clear what I am looking for - it is sort of a correct reading of the meaning of words in a sentence (what they mean, and why they are combined in such a way, how that combination changes the meaning of a sentence, etc). :smile:
 
  • #4
The short answer is that I am trying to answer your question, but it needs context to make sense. Briefly, in the expression for f, the numerator is the square of the mean of S² (the variance of the difference between two means) and the denominator is half the variance of S². Most of my post is just explaining why f = 2M²/V.
To discuss the individual terms, s²/n is not the "average of variances" (whatever that means) but the variance of the mean. If the variance of a single observation is s² (strictly, σ² estimated by s²), the variance of the mean of n observations is (estimated by) s²/n. If you're looking at the difference between the means of two samples, the variance of that is the sum of the variances of the two means, s₁²/n₁ + s₂²/n₂. This is what I have called S².
If s² estimates σ², the mean (expectation) of s² is σ² and its variance is 2σ⁴/f.
The variance of S² is the sum of the variances of its two components, i.e. 2s₁⁴/(n₁²f₁) + 2s₂⁴/(n₂²f₂). And I have assumed f = n-1.

I hope that makes things at least a little clearer.
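
The key point in this post, that s²/n estimates the variance of the mean rather than an "average of variances", can be checked numerically. Below is a rough simulation sketch, assuming NumPy; all parameter values (the standard deviations, sample sizes, and number of repetitions) are made up for illustration.

```python
# Rough simulation sketch, assuming NumPy; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
sigma1, sigma2 = 1.0, 2.0   # hypothetical population standard deviations
n1, n2 = 10, 25             # hypothetical sample sizes
reps = 200_000              # number of simulated experiments

# Draw many samples and keep only their means.
means1 = rng.normal(0.0, sigma1, size=(reps, n1)).mean(axis=1)
means2 = rng.normal(0.0, sigma2, size=(reps, n2)).mean(axis=1)

print("var of mean 1:     simulated", means1.var(), " theory", sigma1**2 / n1)
print("var of mean 2:     simulated", means2.var(), " theory", sigma2**2 / n2)
print("var of difference: simulated", (means1 - means2).var(),
      " theory", sigma1**2 / n1 + sigma2**2 / n2)
```

The simulated variances should land close to σ₁²/n₁, σ₂²/n₂, and their sum, which is the quantity that S² above estimates.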
 
  • #5
Vital said:
I don't have that much knowledge of statistics, and I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there.

If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?
 
  • #6
Stephen Tashi said:
If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

Thank you.
But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning? I truly don't think so. Of course, I am very bad at math, and it is not my main subject (though, honestly, I wish I could know and understand it at a much, much deeper level - I love math), but math is created to explain and, in many cases, to simplify events, subjects, and matters. Hence, I believe that every formula has a meaning and can be explained and interpreted (as in the simplest example of (10 / 3) - 1: the division tells us how many times 3 is repeated in 10, and by subtracting 1 we see what percentage 3 has in 10). And if a formula is a complicated, long one with many parts, then I always thought that it can be deconstructed and each part should have a meaning, thus gradually forming the meaning of the whole formula.
As you mentioned, that is what truly helps us to remember and understand math operations. It doesn't make sense to simply memorise steps if one doesn't understand what is going on in this or that math operation, and what it does.

Stephen Tashi said:
You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?
I hope I do, as I am not a student and have more than 20 years of working experience. What I am very weak at is understanding and interpreting math operations when I encounter them, especially now as I am learning new material.
 
  • #7
Vital said:
But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning?
Yes, it is really true that, at a person's current level of knowledge and experience, some mathematical formulas have no intuitive or logical interpretation.

It's possible that by gaining more experience and knowledge, one can attain an intuitive or logical interpretation of an unfamiliar formula. But doing that can involve more than picking up a few details. It can involve adopting a completely different view of the world.

I hope I do, as I am not a student and have more than 20 years of working experience.
Experienced users of statistics get the mistaken impression that a "statistic" (in the abstract sense) must be some quantity that is significant or useful in interpreting data. This is not the mathematical definition of a statistic.

What I am very weak at is understanding and interpreting math operations when I encounter them, especially now as I am learning new material.
Statistics differs from other mathematical topics in that, with other topics, it is possible to present the material so that new concepts are introduced gradually.

For example, learning calculus doesn't start on page 1 with the notion of a derivative. First, we present functions, limits, etc. Likewise, linear algebra doesn't start on page 1 with finding the minimal polynomial of a matrix. It begins by teaching the concepts of vector spaces, bases, etc.

By contrast, the simplest problems in statistics involve the concepts of random variables, functions of random variables, sample distributions, statistics, estimators, etc. Statistics texts may begin slowly by teaching about random variables and probability distributions. But after that, most of them rush into presenting practical applications that involve sophisticated concepts before students appreciate those concepts. Fortunately for the students, it is possible to work such problems after seeing a few examples, without fully appreciating the concepts behind them.
 
  • #8
Stephen Tashi said:
[snip]
Thank you) Can you please recommend books which help to develop mathematical thinking, understanding, intuition, etc.?
 

What is the concept of degrees of freedom for t-test for 2 samples, 2 variances?

Degrees of freedom for t-test for 2 samples, 2 variances refers to the number of independent pieces of information available for estimating a statistical parameter. In this case, it represents the number of values that are free to vary when calculating the t-statistic for comparing two sample means with unequal variances.

How is the degrees of freedom calculated for this t-test?

The degrees of freedom for t-test for 2 samples, 2 variances is calculated by the following formula: df = (s₁²/n₁ + s₂²/n₂)² / ((s₁²/n₁)²/(n₁-1) + (s₂²/n₂)²/(n₂-1)). Here, s₁ and s₂ represent the sample standard deviations, and n₁ and n₂ represent the sample sizes for the two groups being compared.
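
As a quick illustration of this formula, here is a small Python sketch that evaluates it directly from summary statistics; the function name and the example values are hypothetical.

```python
# Small sketch of the formula above, using summary statistics only;
# the function name and the input numbers are hypothetical.
def welch_df_from_stats(s1, s2, n1, n2):
    """Degrees of freedom from sample standard deviations and sample sizes."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Example: s1 = 2.1, n1 = 15 and s2 = 3.4, n2 = 12 give roughly 17.5.
print(welch_df_from_stats(s1=2.1, s2=3.4, n1=15, n2=12))
```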

Why is it important to consider degrees of freedom when performing this t-test?

Degrees of freedom play a crucial role in determining the critical value for the t-test statistic and, therefore, the likelihood of obtaining a significant result. Not considering degrees of freedom can result in incorrect conclusions and inflated false-positive rates.

What is the relationship between degrees of freedom and sample size in this t-test?

As the sample size increases, the degrees of freedom also increase, leading to a more precise estimate of the true population parameters. This means that larger samples will have a smaller standard error, resulting in a larger t-statistic for a given difference in means and a higher likelihood of obtaining a significant result.

Can degrees of freedom be negative in this t-test?

No, degrees of freedom cannot be negative. For this t-test, the Welch-Satterthwaite value always lies between the smaller of n₁-1 and n₂-1 and n₁+n₂-2, so each sample must contain at least two observations for the formula to be defined. However, having a low number of degrees of freedom can impact the accuracy and reliability of the t-test results.
