Degrees of freedom for t-test for 2 samples, 2 variances

Vital · Nov 6, 2018

Hello.

I will be grateful for your help in finding the logical meaning of each part of the formula of degrees of freedom, which are computed for a t-test when variances are unknown and are assumed to be unequal.

Please, take a look at the formula, the way I managed to understand some parts of it, and, please, help me to understand the rest of it.

The formula is as follows:

degrees of freedom = [ s₁²/n₁ + s₂²/n₂ ]² / [ ( s₁²/n₁ ) / n₁ + ( s₂²/n₂ ) / n₂ ]

where s₁² is the variance of the first sample
s₂² is the variance of the second sample
n₁ - number of observations in the first sample
n₂ - number of observations in the second sample

Here are parts I managed to decipher:

(1) s₁²/n₁: as I understand it, dividing the variance of the first sample by the number of observations in that sample gives the mean variance of the first sample, that is, the mean of all variances. Even if I am right, I don't understand what the mean variance tells us or what it means.

Same for s₂²/n₂ but for the second sample.

(2) Hence the numerator is the squared sum of the two mean variances. Why do we need to square it? Usually squaring in such contexts is used to avoid negative numbers, but variances cannot be negative, as they are already squared quantities.

(3) The meaning of the expressions in the denominator has escaped me.
Here again we square mean variances, then divide each by the corresponding number of observations, and then sum both results. What does all this mean?

Thank you very much.

mjc123 · Nov 6, 2018

Let's start with some definitions. A gamma distribution with parameters α and β, Γ(α,β) (look up gamma distribution for details) has a mean M = αβ and variance V = αβ².
The variance s² of a set of independent observations with a Normal distribution with variance σ², with f degrees of freedom, is distributed as Γ(f/2,2σ²/f). M = σ² and V = 2σ⁴/f.
Gamma distributions have the property that if x and y are independent gamma variables Γ(α₁,β), Γ(α₂,β) with the same value of β, then x+y is Γ(α₁+α₂,β).
Now fs² is distributed as Γ(f/2,2σ²), so if we have two independent estimates of the same σ², then
f₁s₁² + f₂s₂² is Γ((f₁+f₂)/2,2σ²)
This is equal to fs² where f = f₁ + f₂ and s² is the pooled estimate of variance. This is how we proceed when we assume the variances of the two samples are equal. The estimated variance of the difference between the sample means is s²(1/n₁+1/n₂), with f₁ + f₂ degrees of freedom.
Now suppose we can't assume the variances are equal, call them σ₁² and σ₂². The above analysis doesn't apply exactly, but we assume that the deviation from a gamma distribution isn't too great, so we can use it as a reasonable approximation.
Now the difference between the means has estimated variance S² = s₁²/n₁ + s₂²/n₂. The mean of this variable is the sum of the two means, i.e. σ₁²/n₁ + σ₂²/n₂. The variance is the sum of the variances, i.e. 2σ₁⁴/(n₁²f₁) + 2σ₂⁴/(n₂²f₂).
If we assume that the distribution is approximately gamma, and remember that mean M = αβ and variance V = αβ², then this is an approximate s² distribution with effective degrees of freedom f, where
f = 2α = 2M²/V
= (σ₁²/n₁ + σ₂²/n₂)² / (σ₁⁴/(n₁²f₁) + σ₂⁴/(n₂²f₂))
To estimate this f value from the data, we replace the σ²s by the measured s²s.
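One way to spell out the step f = 2α = 2M²/V is sketched below in LaTeX. It uses only the symbols already defined in this post and is meant as a reading aid, not as a rigorous proof.

```latex
\begin{align*}
M = \alpha\beta,\quad V = \alpha\beta^2
  &\;\Longrightarrow\; \alpha = \frac{M^2}{V},\\
s^2 \sim \Gamma\!\left(\tfrac{f}{2},\, \tfrac{2\sigma^2}{f}\right)
  &\;\Longrightarrow\; f = 2\alpha = \frac{2M^2}{V},\\
M = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},\quad
V = \frac{2\sigma_1^4}{n_1^2 f_1} + \frac{2\sigma_2^4}{n_2^2 f_2}
  &\;\Longrightarrow\;
f = \frac{\left(\sigma_1^2/n_1 + \sigma_2^2/n_2\right)^2}
         {\sigma_1^4/(n_1^2 f_1) + \sigma_2^4/(n_2^2 f_2)}.
\end{align*}
```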
I think there's a mistake in your formula; in the denominator, the terms of the form (s²/n)/n should be (s²/n)²/(n-1).
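As a minimal sketch (not code from this thread), the corrected formula can be evaluated in Python. The function name `welch_df` and the summary statistics are hypothetical, and f = n - 1 is assumed for each sample. This approximation is commonly called the Welch-Satterthwaite formula.

```python
def welch_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for a two-sample t-test with unequal variances.

    var1, var2 are the sample variances s1^2 and s2^2; n1, n2 are the sample sizes.
    Assumes f = n - 1 for each sample.
    """
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    numerator = (a + b) ** 2
    denominator = a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1)
    return numerator / denominator


# Hypothetical summary statistics, chosen only for illustration.
df_equal = welch_df(4.0, 10, 4.0, 10)     # equal variances, equal sizes
df_unequal = welch_df(1.0, 10, 16.0, 10)  # very different variances
print(df_equal, df_unequal)
```

With equal variances and equal sample sizes the result matches the pooled value f₁ + f₂ = 18. With very different variances it drops to about 10.1, closer to the degrees of freedom of the noisier sample.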

Vital · Nov 7, 2018

mjc123 said:

Let's start with some definitions. A gamma distribution with parameters α and β, Γ(α,β) (look up gamma distribution for details) has a mean M = αβ and variance V = αβ².
[snip]
I think there's a mistake in your formula; in the denominator, the terms of the form (s²/n)/n should be (s²/n)²/(n-1).

Thank you very much for such a detailed explanation. But it gets away from my question and is too complicated for me) I don't have that much knowledge of statistics, and I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there. By meaning I mean what exactly is going on in each step, what it means, and why this or that type of math formula is used. For example, by computing s² / n we find the average of variances, and if we take the square root of that we get the standard error, which means the average number of standard deviations from the mean.
I am not sure I made it clear what I am looking for - it is sort of a correct reading of the meaning of words in a sentence (what they mean, and why they are combined in such a way, how that combination changes the meaning of a sentence, etc).

mjc123 · Nov 7, 2018

The short answer is that I am trying to answer your question, but it needs context to make sense. Briefly, in the expression for f, the numerator is the square of the mean of S² (the variance of the difference between two means) and the denominator is half the variance of S². Most of my post is just explaining why f = 2M²/V.
To discuss the individual terms, s²/n is not the "average of variances" (whatever that means) but the variance of the mean. If the variance of a single observation is s² (strictly, σ² estimated by s²), the variance of the mean of n observations is (estimated by) s²/n. If you're looking at the difference between the means of two samples, the variance of that is the sum of the variances of the two means, s₁²/n₁ + s₂²/n₂. This is what I have called S².
If s² estimates σ², the mean (expectation) of s² is σ² and its variance is 2σ⁴/f.
The variance of S² is the sum of the variances of its two components, i.e. 2s₁⁴/(n₁²f₁) + 2s₂⁴/(n₂²f₂), where I have assumed f = n-1 for each sample.
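The three facts used above (the variance of the mean is σ²/n, the mean of s² is σ², and the variance of s² is 2σ⁴/f) can be checked with a small simulation. This is only a sketch under stated assumptions: the values of `SIGMA`, `N` and `TRIALS` are arbitrary choices for illustration, the data are drawn from a Normal distribution, and f = N - 1.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

SIGMA = 2.0     # assumed true standard deviation, so sigma^2 = 4
N = 10          # observations per sample, so f = N - 1 = 9
TRIALS = 20000  # number of simulated samples

means, variances = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    variances.append(statistics.variance(sample))  # divides by N - 1

var_of_mean = statistics.pvariance(means)    # compare with sigma^2 / N
mean_of_s2 = statistics.mean(variances)      # compare with sigma^2
var_of_s2 = statistics.pvariance(variances)  # compare with 2 * sigma^4 / f

print(var_of_mean, SIGMA ** 2 / N)
print(mean_of_s2, SIGMA ** 2)
print(var_of_s2, 2 * SIGMA ** 4 / (N - 1))
```

Each printed pair should agree closely but not exactly, since the first number in each pair is a simulated estimate and the second is the theoretical value.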

I hope that makes things at least a little clearer.

Stephen Tashi · Nov 10, 2018

Vital said:

I don't have that much knowledge of statistics, and I am merely trying to understand the meaning of each part of the formula I gave in my question, as I described there.

If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?

Vital · Nov 30, 2018

Stephen Tashi said:

If you expect every term in a mathematical formula to have a meaning, you are taking the wrong approach. It is handy when individual parts of a formula have specific interpretations. This helps us remember and understand the formula. However, it is unrealistic to expect every formula to have total meaning that can be understood by assigning each part of it an individual meaning.

Thank you.
But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning? I truly don't think so. Of course, I am very bad at math, and it is not my main subject (though, honestly, I wish I could know and understand it at a much, much deeper level - I love math), but it is created to explain and in many cases to simplify events, subjects, matters. Hence, I believe that every formula has a meaning and can be explained and interpreted (as in the simplest example of (10 / 3) - 1 tells us how many times 3 is repeated in 10, and by subtracting 1 we see what percentage 3 has in 10). And if the formula is a complicated long one with many parts, then I always thought that it can be deconstructed and each part should have a meaning, thus gradually forming the meaning of the whole formula.
As you mentioned, that is what truly helps us to remember and understand math operations. It doesn't make sense to simply memorise steps if one doesn't understand what is going on in this or that math operation, and what it does.

Stephen Tashi said:

You should first understand the "big picture" of what's going on in statistical testing. A fundamental question is "What is a statistic"? Do you understand that concept?

I hope I do, as I am not a student, and have more than 20 years of working experience. What I am very weak at is understanding and interpreting math operations when I encounter them, especially now as I am learning new material.

Stephen Tashi · Nov 30, 2018

Vital said:

But is it really so that not all formulas can have an intuitive, or logical, interpretation and meaning?

Yes, it is really true that, at a person's current level of knowledge and experience, some mathematical formulas have no intuitive or logical interpretation.

It's possible that by gaining more experience and knowledge, one can attain an intuitive or logical interpretation of an unfamiliar formula. But doing that can involve more than picking up a few details. It can involve adopting a completely different view of the world.

I hope I do, as I am not a student, and have more than 20 years of working experience.

Experienced users of statistics get the mis-impression that a "statistic" (in the abstract sense) must be some quantity that is significant or useful in interpreting data. This is not the mathematical definition of a statistic.

What I am very weak at, is understanding and interpreting math operations when I encounter them, especially now as I am learning new material.

Statistics differs from other mathematical topics in that with other topics, it is possible to present them so new concepts are introduced gradually.

For example, learning calculus doesn't start on page 1 with the notion of a derivative. First, we present functions, limits, etc. Likewise, linear algebra doesn't start on page 1 with finding the minimal polynomial of a matrix. It begins by teaching the concepts of vector spaces, bases, etc.

By contrast, the simplest problems in statistics involve the concepts of random variables, functions of random variables, sample distributions, statistics, estimators, etc. Statistics texts may begin slowly by teaching about random variables and probability distributions. But after that, most of them rush into presenting practical applications that involve sophisticated concepts before students appreciate those concepts. Fortunately for the students, it is possible to work such problems after seeing a few examples without fully appreciating the concepts behind them.

Vital · Dec 1, 2018

Stephen Tashi said:

[snip]

Thank you) Can you, please, recommend books, which help to develop mathematical thinking, understanding, intuition, etc?
