Hypothesis Testing: Comparing Gaussian Distributions

Discussion Overview

The discussion revolves around hypothesis testing for comparing two Gaussian distributions, specifically focusing on how to formulate and test hypotheses regarding the mean of the distributions. Participants explore the implications of using maximum likelihood ratios and the challenges of distinguishing between very close means.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes using the maximum likelihood ratio to define a test statistic for discriminating between two hypotheses regarding Gaussian distributions.
  • Another participant argues that the proposed hypotheses are not practically distinguishable without a large sample size, suggesting that a more reasonable alternative hypothesis should be formulated.
  • A third participant emphasizes the need for specificity in hypothesis formulation, noting that vague statements about the mean do not allow for effective hypothesis testing.
  • This participant also discusses the differences between Bayesian and frequentist approaches to hypothesis testing, highlighting the challenges of computing probabilities under certain assumptions.
  • One suggestion includes the use of power curves to analyze the effectiveness of a statistical test, although it is noted that this does not constitute a hypothesis test.
  • A later reply suggests that a one-tailed t-test might be appropriate for the scenario described, referencing external resources for further clarification.

Areas of Agreement / Disagreement

Participants express differing views on the formulation of hypotheses and the practicality of distinguishing between them, indicating that multiple competing perspectives remain unresolved.

Contextual Notes

Limitations include the lack of specificity in the alternative hypothesis, the dependence on sample size for distinguishing between means, and the unresolved nature of the mathematical steps involved in hypothesis testing.

Gaussian97
TL;DR: How to formalize the hypothesis of having a mean bigger than some value.
Hi, I have some set of data and I want to use Hypothesis Testing to discriminate between two hypotheses:
H0: My data follows a Gaussian distribution with a given mean and a given std (the actual values are ugly, so let's say mean = 0 and std = 1).
H1: My data follows a Gaussian distribution with mean > 0 and std = 1 (the same as before).

So, I want to use the maximum likelihood ratio to define my test statistic as
$$t(\vec{x})=\frac{f(\vec{x}|H_1)}{f(\vec{x}|H_0)}$$
So, for ##H_0## it's clear that ##f(x|H_0)=N(0,1)##, but how do I find the expression for ##f(x|H_1)##?

Would it be valid to compute the sample mean and, since it's actually bigger than 0, use $$f(x|H_1)=N(\bar{x},1)$$?

Thanks.
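
Editorial note: the substitution asked about above can be worked out explicitly. The following is a sketch, assuming ##n## independent samples with known std = 1, and assuming the unknown mean under ##H_1## is replaced by its maximum likelihood estimate restricted to the allowed range, ##\hat{\mu}=\max(\bar{x},0)## (the symbols ##n## and ##\hat{\mu}## are not in the original post):
$$\ln t(\vec{x})=\sum_{i=1}^{n}\left[\frac{x_i^2}{2}-\frac{(x_i-\hat{\mu})^2}{2}\right]=n\hat{\mu}\bar{x}-\frac{n\hat{\mu}^2}{2}$$
$$\ln t(\vec{x})=\begin{cases}\dfrac{n\bar{x}^2}{2} & \bar{x}>0\\[4pt] 0 & \bar{x}\le 0\end{cases}$$
Under these assumptions the statistic is a non-decreasing function of ##\bar{x}##, so putting a threshold on ##t## is the same as putting a threshold on the sample mean, whose distribution under ##H_0## is ##N(0,1/n)##.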
 
The two cases that you describe cannot have a useful test. The cases of mean=0 versus mean=0.0000000000001 will not be distinguishable without billions of samples. You must start with an alternative hypothesis where a reasonable sample has a chance of being convincing. It is not usually necessary to determine the distribution associated with the alternative hypothesis. The assumption of the null hypothesis gives you the distribution that you will use, and a sample that is out of line with that distribution allows you to convincingly state that the alternative hypothesis is the better choice.

CORRECTION: In your stated cases, if the sample mean is large enough, you can reject the null hypothesis.
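
Editorial note: the procedure described in this reply (use the null distribution and reject when the sample mean is large enough) can be sketched in a few lines. This is only an illustration, assuming independent samples and a known std; the function names `one_sided_z_test` and `required_sample_size`, and the defaults `alpha=0.05` and `power=0.8`, are choices made here and do not come from the thread.

```python
from math import ceil, sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def one_sided_z_test(sample, mu0=0.0, sigma=1.0):
    """Return (z, p_value) for H0: mean = mu0 against H1: mean > mu0.

    Assumes independent Gaussian samples with known standard deviation sigma,
    so the sample mean is N(mu0, sigma^2 / n) under H0.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) * sqrt(n) / sigma
    p_value = 1.0 - STD_NORMAL.cdf(z)  # upper-tail probability under H0
    return z, p_value


def required_sample_size(delta, alpha=0.05, power=0.8, sigma=1.0):
    """Approximate n needed to detect a true mean shift delta > 0.

    Uses the standard formula n = ((z_alpha + z_beta) * sigma / delta)^2
    for a one-sided z-test at level alpha with the requested power.
    """
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    z, p = one_sided_z_test([1.0, 1.0, 1.0, 1.0])
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("n for delta = 0.5:", required_sample_size(0.5))
    print("n for delta = 1e-13:", required_sample_size(1e-13))
```

With these assumed defaults, a shift of 0.5 needs about 25 samples, while a shift of 0.0000000000001 needs on the order of 10^26, which is the practical point made in the reply.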
 
Gaussian97 said:
Summary: How to formalize the hypothesis of having a mean bigger than some value.

A hypothesis for a hypothesis test must be specific enough to allow computing the probability of the observed data using the assumption that the hypothesis is correct. A statement like "The mean of the distribution is greater than 1" is not specific enough. So you must add additional assumptions.

A Bayesian approach is to treat the mean as a random variable that has a "prior" probability distribution. You must assume a particular distribution for the mean. (This is like saying that Nature picked the value of the mean in some random way when she created the population.)
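
Editorial note: as one hypothetical instance of the Bayesian route described above, suppose the prior on the mean under ##H_1## is a half-normal distribution on mean > 0 with scale `tau`. That prior, the scale value, and the function name `bayes_factor_half_normal` are assumptions made for this sketch; the thread does not name a prior. With a known std, the ratio of the marginal likelihood under ##H_1## to the likelihood under ##H_0## then has a closed form in terms of the sample mean:

```python
from math import sqrt
from statistics import NormalDist


def bayes_factor_half_normal(xbar, n, tau=1.0, sigma=1.0):
    """Bayes factor of H1 (mean > 0, half-normal prior) against H0 (mean = 0).

    xbar is the sample mean of n independent N(mean, sigma^2) observations.
    The prior on the mean under H1 is assumed to be half-normal with scale tau.
    """
    s2 = sigma**2 / n  # variance of the sample mean
    t2 = tau**2        # prior scale squared

    # Combining the likelihood of xbar with a N(0, tau^2) density gives a
    # normal density in the mean with these parameters.
    post_var = s2 * t2 / (s2 + t2)
    post_mean = xbar * t2 / (s2 + t2)

    # Marginal density of xbar under H1: the factor 2 normalizes the
    # half-normal prior, and the cdf term keeps only the region mean > 0.
    marginal_h1 = (
        2.0
        * NormalDist(0.0, sqrt(s2 + t2)).pdf(xbar)
        * NormalDist().cdf(post_mean / sqrt(post_var))
    )
    likelihood_h0 = NormalDist(0.0, sqrt(s2)).pdf(xbar)
    return marginal_h1 / likelihood_h0


if __name__ == "__main__":
    for xbar in (-0.2, 0.0, 0.2, 0.5):
        print(xbar, bayes_factor_half_normal(xbar, n=25))
```

Values above 1 favor ##H_1## and values below 1 favor ##H_0##; the result depends on the assumed prior, which is exactly the extra assumption this post says is required.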

If you wish to stay with the "frequentist" outlook that the mean has a "fixed but unknown value" (i.e. it is not a random variable) then you can't compute the probability of the data only knowing that the mean is greater than 1. So no hypothesis test is possible.

A frequentist might resort to looking at "power curves". These curves are used to analyze the "power" of a statistical test. In your example, the hypothesis that the mean and standard deviations have given values is specific enough to do a "one tailed" hypothesis test using those values as the null hypothesis. You can look at power curves for that test. However, making a decision based on the behavior of power curves is not a hypothesis test.
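
Editorial note: a power curve for the one-tailed test mentioned above can be tabulated directly. This is a minimal sketch, assuming a one-sided z-test with known std at level `alpha`; the function name `power_one_sided_z`, the sample size of 25, and the grid of means are illustrative choices, not values from the thread.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu, n, alpha=0.05, sigma=1.0, mu0=0.0):
    """Probability of rejecting H0: mean = mu0 when the true mean is mu.

    The test rejects when sqrt(n) * (xbar - mu0) / sigma exceeds the
    (1 - alpha) quantile of the standard normal distribution.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    shift = (mu - mu0) * sqrt(n) / sigma  # mean of the z statistic under mu
    return 1.0 - std_normal.cdf(z_alpha - shift)


if __name__ == "__main__":
    # Tabulate the power curve over a few candidate true means.
    for mu in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(f"mu = {mu:4.2f}  power = {power_one_sided_z(mu, n=25):.3f}")
```

At the null value the power equals `alpha` by construction, and it rises toward 1 as the true mean moves away from 0, which is how such a curve shows which alternatives a given sample size can realistically detect.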

It's important to keep in mind that a hypothesis test is not a mathematical deduction and, unless a lot more additional information is assumed than in your example, it is not a procedure that produces an optimal decision. (What function would we be optimizing?) A hypothesis test is simply a procedure. Hypothesis tests have proven empirically useful in many fields of study. However, the mathematical theory (Optimal Statistical Decisions) that justifies the use of particular hypothesis tests requires assuming more structure and given information than is found in the garden variety problems (like yours) that occur in introductory books on statistics.
 
