Hypothesis Testing: Comparing Gaussian Distributions

This thread discusses using a hypothesis test to discriminate between two hypotheses: H0, that the data follow a Gaussian distribution with a given mean and standard deviation (taken as 0 and 1 for illustration), and H1, that the data follow a Gaussian distribution with a mean greater than 0 and a standard deviation of 1. The likelihood ratio is proposed as the test statistic, which raises the question of how to write down f(x|H1). One reply explains that a hypothesis must be specific enough to allow computing the probability of the observed data, and suggests a Bayesian approach or looking at power curves as alternatives. It is also noted that a hypothesis test is not a mathematical deduction and may not produce an optimal decision.
  • #1
Gaussian97
TL;DR Summary
How to formalize the hypothesis of having a mean bigger than some value.
Hi, I have some set of data and I want to use Hypothesis Testing to discriminate between two hypotheses:
H0: My data follows a Gaussian distribution with a given mean and a given std (the actual values are ugly, so let's say mean = 0 and std = 1).
H1: My data follows a Gaussian distribution with mean > 0 and std = 1 (the same as before).

So, I want to use the maximum likelihood ratio to define my test statistic as
$$t(\vec{x})=\frac{f(\vec{x}|H_1)}{f(\vec{x}|H_0)}$$
So, for ##H_0## it's clear that ##f(x|H_0)=N(0,1)##, but how do I find the expression for ##f(x|H_1)##?

Would it be valid to compute the sample mean and, since it's actually bigger than 0, use $$f(x|H_1)=N(\bar{x},1)$$?

Thanks.
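
One way to read the proposal above is as a generalized likelihood ratio, in which the unknown mean under ##H_1## is replaced by its maximum likelihood estimate subject to ##\mu>0##. The following is only a sketch under the thread's stated assumptions (independent samples, ##\sigma=1##); the symbols ##n## (sample size) and ##\hat{\mu}## (constrained estimate) are introduced here for illustration and do not appear in the original post.

$$\hat{\mu}=\max(\bar{x},0),\qquad \ln t(\vec{x})=\sum_{i=1}^{n}\left[\frac{x_i^2}{2}-\frac{(x_i-\hat{\mu})^2}{2}\right]=n\bar{x}\hat{\mu}-\frac{n\hat{\mu}^2}{2}=\begin{cases}\dfrac{n\bar{x}^2}{2}, & \bar{x}>0\\[4pt] 0, & \bar{x}\le 0\end{cases}$$

Because ##\ln t## increases with ##\bar{x}## when ##\bar{x}>0##, rejecting for large ##t## amounts to rejecting for large ##\bar{x}##. The threshold would then be set from the distribution of ##\bar{x}## under ##H_0##, which is ##N(0,1/n)##, rather than from ##N(\bar{x},1)## treated as a fixed, fully specified alternative.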
 
  • #2
The two cases that you describe cannot have a useful test. The cases of mean=0 versus mean=0.0000000000001 will not be distinguishable without billions of samples. You must start with an alternative hypothesis where a reasonable sample has a chance of being convincing. It is not usually necessary to determine the distribution associated with the alternative hypothesis. The assumption of the null hypothesis gives you the distribution that you will use, and a sample that is out of line with that distribution allows you to convincingly state that the alternative hypothesis is the better choice.

CORRECTION: In your stated cases, if the sample mean is large enough, you can reject the null hypothesis.
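
The correction above can be sketched as a one-sided z test that uses only the null distribution. This is a minimal illustration, assuming independent samples with known standard deviation; the function names `one_sided_z_test` and `samples_needed`, and the default significance level and power, are hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sided_z_test(sample, mu0=0.0, sigma=1.0):
    """Return (z, p_value) for H0: mean = mu0 against larger means, sigma known."""
    n = len(sample)
    # Under H0 the sample mean is N(mu0, sigma^2 / n), so z is standard normal.
    z = (fmean(sample) - mu0) * sqrt(n) / sigma
    # Upper-tail probability: chance of a z at least this large if H0 holds.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


def samples_needed(delta, sigma=1.0, alpha=0.05, power=0.8):
    """Approximate sample size for the test to detect a true mean shift of delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(power)
    return ((z_alpha + z_beta) * sigma / delta) ** 2
```

With these illustrative defaults, `samples_needed(1e-13)` comes out on the order of 10^26, which is consistent with the point that a tiny shift in the mean is not detectable with any realistic sample.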
 
  • #3
Gaussian97 said:
Summary: How to formalize the hypothesis of having a mean bigger than some value.

A hypothesis for a hypothesis test must be specific enough to allow computing the probability of the observed data using the assumption that the hypothesis is correct. A statement like "The mean of the distribution is greater than 1" is not specific enough. So you must add additional assumptions.

A Bayesian approach is to treat the mean as a random variable that has a "prior" probability distribution. You must assume a particular distribution for the mean. (This is like saying that Nature picked the value of the mean in some random way when she created the population.)
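
As a rough illustration of the Bayesian route, the sketch below assumes a half-normal prior on the mean (positive values only, scale `tau`) and unit-variance Gaussian data. The prior, the name `bayes_factor_half_normal`, and the parameter `tau` are assumptions made for this example; the thread does not specify any prior. With that choice, integrating the likelihood over the prior gives f(x|H1), and its ratio to f(x|H0) has a closed form.

```python
from math import exp, sqrt
from statistics import NormalDist


def bayes_factor_half_normal(xbar, n, tau=1.0):
    """Ratio f(x|H1) / f(x|H0) for n unit-variance Gaussian samples with mean xbar.

    H0: mean = 0.
    H1: mean > 0, with an assumed half-normal prior of scale tau on the mean.
    """
    # f(x|mu) / f(x|0) = exp(n*xbar*mu - n*mu^2/2); multiplying by the prior
    # density and integrating over mu > 0 is a Gaussian integral in mu.
    a = n + 1.0 / tau**2  # coefficient of -mu^2/2 in the exponent
    b = n * xbar          # coefficient of mu in the exponent
    return (2.0 / (tau * sqrt(a))) * exp(b * b / (2.0 * a)) * NormalDist().cdf(b / sqrt(a))
```

Values above 1 favor H1 and values below 1 favor H0, but the result depends on the assumed prior scale, which is exactly the extra assumption the reply says must be supplied.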

If you wish to stay with the "frequentist" outlook that the mean has a "fixed but unknown value" (i.e. it is not a random variable) then you can't compute the probability of the data only knowing that the mean is greater than 1. So no hypothesis test is possible.

A frequentist might resort to looking at "power curves". These curves are used to analyze the "power" of a statistical test. In your example, the hypothesis that the mean and standard deviation have given values is specific enough to do a "one tailed" hypothesis test using those values as the null hypothesis. You can look at power curves for that test. However, making a decision based on the behavior of power curves is not a hypothesis test.
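
A power curve for the one-tailed test described above can be tabulated directly. This is a minimal sketch, assuming a z test with known standard deviation; the function name `power_one_sided_z`, the sample size of 25, and the significance level of 0.05 are illustrative assumptions, not values from the thread.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu, n, sigma=1.0, mu0=0.0, alpha=0.05):
    """Probability that the one-tailed z test rejects H0 when the true mean is mu."""
    std_normal = NormalDist()
    # Reject H0 when z exceeds this critical value.
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    # Under a true mean mu, the z statistic is N(shift, 1).
    shift = (mu - mu0) * sqrt(n) / sigma
    return 1.0 - std_normal.cdf(z_crit - shift)


# Tabulate the curve for a hypothetical sample size of n = 25.
for mu in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"mu = {mu:4.2f}  power = {power_one_sided_z(mu, n=25):.3f}")
```

At a true mean equal to the null value the power equals the significance level, and it rises toward 1 as the true mean moves away from it, which is how the curve shows which alternatives a given sample size can realistically detect.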

It's important to keep in mind that a hypothesis test is not a mathematical deduction and, unless a lot more additional information is assumed than in your example, it is not a procedure that produces an optimal decision. (What function would we be optimizing?) A hypothesis test is simply a procedure. Hypothesis tests have proven empirically useful in many fields of study. However, the mathematical theory (Optimal Statistical Decisions) that justifies the use of particular hypothesis tests requires assuming more structure and given information than is found in the garden variety problems (like yours) that occur in introductory books on statistics.
 

1. What is a hypothesis test?

A hypothesis test is a statistical procedure for deciding whether sample data are consistent with a stated assumption about a population. It compares the sample to the distribution implied by that assumption to see whether there is enough evidence to reject it.

2. What is a Gaussian distribution?

A Gaussian distribution, also known as a normal distribution, is a type of probability distribution that is symmetric and bell-shaped. It is characterized by its mean and standard deviation, and many natural phenomena can be described by this distribution.

3. How do you compare two Gaussian distributions?

To compare the means of two Gaussian populations, you can use a hypothesis test such as the two-sample t-test (or ANOVA when there are more than two groups). These tests assess whether the observed difference between the sample means is statistically significant. You can also visually compare the distributions by plotting them on the same graph and looking for any noticeable differences.
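
As a small illustration of the two-sample comparison, the sketch below computes a t statistic and its approximate degrees of freedom. It assumes Welch's variant, which does not require equal variances; that choice and the function name `welch_t` are assumptions made for this example. Turning the statistic into a p-value additionally requires the t distribution, which is left out here.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(sample_a, sample_b):
    """Return (t, dof) for Welch's two-sample t-test on the difference of means."""
    na, nb = len(sample_a), len(sample_b)
    # Estimated variance of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (fmean(sample_a) - fmean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, dof
```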

4. What is the null hypothesis in hypothesis testing?

The null hypothesis is the assumption that there is no difference between the two populations being compared. It is the default hypothesis that is assumed to be true unless there is enough evidence to reject it. In Gaussian distribution comparison, the null hypothesis would state that the means of the two populations are equal.

5. How do you interpret the results of a hypothesis test for comparing Gaussian distributions?

The results of a hypothesis test for comparing Gaussian distributions will provide a p-value, which represents the probability of obtaining data at least as extreme as those observed if the null hypothesis is true. If the p-value is less than the chosen significance level (commonly 0.05), then there is enough evidence to reject the null hypothesis and conclude that there is a significant difference between the means of the two distributions.
