# Hypothesis Testing


## Summary:

How to formalize the hypothesis of having a mean bigger than some value.
Hi, I have some set of data and I want to use Hypothesis Testing to discriminate between two hypotheses:
H0: My data follows a Gaussian distribution with a given mean and a given std (the actual values are ugly, so let's say mean = 0 and std = 1).
H1: My data follows a Gaussian distribution with mean > 0 and std = 1 (the same as before).

So, I want to use the maximum likelihood ratio to define my test statistic as
$$t(\vec{x})=\frac{f(\vec{x}|H_1)}{f(\vec{x}|H_0)}$$
So, for ##H_0## it's clear that ##f(x|H_0)=N(0,1)##, but how do I find the expression for ##f(x|H_1)##?

Would it be valid to compute the sample mean and, since it's actually bigger than 0, use $$f(x|H_1)=N(\bar{x},1)$$?

Thanks.
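
A sketch of what that substitution amounts to, assuming ##n## independent observations with known unit variance (the symbols ##n## and ##\hat{\mu}## are introduced here for illustration and are not in the original post): replacing the unknown mean under ##H_1## by its maximum likelihood estimate restricted to ##\mu \ge 0##, namely ##\hat{\mu}=\max(\bar{x},0)##, gives the generalized likelihood ratio
$$\ln t(\vec{x})=\sum_{i=1}^{n}\left[\frac{x_i^2}{2}-\frac{(x_i-\hat{\mu})^2}{2}\right]=n\bar{x}\hat{\mu}-\frac{n\hat{\mu}^2}{2}=\frac{n}{2}\max(\bar{x},0)^2$$
Since this is increasing in ##\bar{x}## for ##\bar{x}>0##, rejecting ##H_0## for large ##t## is the same as rejecting for large ##\sqrt{n}\,\bar{x}##, which follows ##N(0,1)## under ##H_0##.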

FactChecker
Gold Member
The two cases that you describe cannot have a useful test. The cases of mean=0 versus mean=0.0000000000001 will not be distinguishable without billions of samples. You must start with an alternative hypothesis where a reasonable sample has a chance of being convincing. It is not usually necessary to determine the distribution associated with the alternative hypothesis. The assumption of the null hypothesis gives you the distribution that you will use, and a sample that is out of line with that distribution allows you to convincingly state that the alternative hypothesis is the better choice.

CORRECTION: In your stated cases, if the sample mean is large enough, you can reject the null hypothesis.
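
One way to make "large enough" concrete is a one-sided z-test, which uses only the null distribution. The sketch below assumes independent observations with known std = 1; the function name `one_sided_z_test`, the significance level `alpha = 0.05`, and the sample values are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sided_z_test(data, mu0=0.0, sigma=1.0, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean > mu0, with sigma known."""
    n = len(data)
    # Under H0 the standardized sample mean follows N(0, 1).
    z = (fmean(data) - mu0) / (sigma / sqrt(n))
    # Upper-tail probability: chance of a z at least this large under H0.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical sample, not taken from the thread.
sample = [0.8, 1.3, -0.2, 0.9, 1.6, 0.4, 1.1, 0.7]
z, p, reject = one_sided_z_test(sample)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```

Note that the alternative only enters through the choice of tail: nothing about the actual value of the mean under H1 is needed to run the test.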

Stephen Tashi
Summary: How to formalize the hypothesis of having a mean bigger than some value.
A hypothesis for a hypothesis test must be specific enough to allow computing the probability of the observed data using the assumption that the hypothesis is correct. A statement like "The mean of the distribution is greater than 1" is not specific enough. So you must add additional assumptions.

A Bayesian approach is to treat the mean as a random variable that has a "prior" probability distribution. You must assume a particular distribution for the mean. (This is like saying that Nature picked the value of the mean in some random way when she created the population.)
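
As a rough sketch of that Bayesian route, one could compare the two hypotheses with a Bayes factor. Everything specific here is an assumption for illustration: the prior on the mean under H1 is taken to be half-normal with scale `tau = 1`, the std is known and equal to 1, the integral is approximated with a simple midpoint rule, and the sample values are made up.

```python
from math import sqrt
from statistics import NormalDist, fmean


def bayes_factor_half_normal(data, tau=1.0, sigma=1.0, grid=20000):
    """Bayes factor of H1 (mean > 0, half-normal prior) against H0 (mean == 0)."""
    n = len(data)
    xbar = fmean(data)
    se = sigma / sqrt(n)
    # The sample mean is sufficient for the mean, so its density can stand in
    # for the full likelihood; by symmetry, density of xbar given mu equals
    # the N(xbar, se) density evaluated at mu.
    sampling = NormalDist(xbar, se)
    prior = NormalDist(0.0, tau)
    upper = 10.0 * tau  # prior mass beyond this point is negligible
    step = upper / grid
    marginal = 0.0
    for i in range(grid):
        mu = (i + 0.5) * step  # midpoint of the i-th cell
        # Half-normal prior density on mu > 0 is twice the normal density.
        marginal += sampling.pdf(mu) * 2.0 * prior.pdf(mu) * step
    return marginal / NormalDist(0.0, se).pdf(xbar)


# Hypothetical sample, not taken from the thread.
sample = [0.8, 1.3, -0.2, 0.9, 1.6, 0.4, 1.1, 0.7]
print(f"Bayes factor (H1 vs H0): {bayes_factor_half_normal(sample):.2f}")
```

A value above 1 favors H1 and a value below 1 favors H0, but the number depends on the assumed prior, which is exactly the extra assumption described above.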

If you wish to stay with the "frequentist" outlook that the mean has a "fixed but unknown value" (i.e. it is not a random variable) then you can't compute the probability of the data only knowing that the mean is greater than 1. So no hypothesis test is possible.

A frequentist might resort to looking at "power curves". These curves are used to analyze the "power" of a statistical test. In your example, the hypothesis that the mean and standard deviation have given values is specific enough to do a "one tailed" hypothesis test using those values as the null hypothesis. You can look at power curves for that test. However, making a decision based on the behavior of power curves is not a hypothesis test.
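
For the one-tailed test with known standard deviation, the power curve has a closed form, so a minimal sketch is short. The function name `power_one_sided_z`, the sample size `n = 25`, and `alpha = 0.05` are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu_true, n, mu0=0.0, sigma=1.0, alpha=0.05):
    """Probability that the one-sided z-test rejects H0 when the mean is mu_true."""
    std_normal = NormalDist()
    # Reject H0 when the standardized sample mean exceeds this critical value.
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    # Under mu_true the standardized sample mean is N(shift, 1).
    shift = (mu_true - mu0) * sqrt(n) / sigma
    return 1.0 - std_normal.cdf(z_crit - shift)


# Tabulate the power curve at a few hypothetical true means.
for mu in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"true mean = {mu:4.2f}  power = {power_one_sided_z(mu, n=25):.3f}")
```

At a true mean equal to the null value the power equals `alpha`, and it rises toward 1 as the true mean moves away from 0, which is why a true mean extremely close to 0 is nearly undetectable at a fixed sample size.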

It's important to keep in mind that a hypothesis test is not a mathematical deduction and, unless a lot more additional information is assumed than in your example, it is not a procedure that produces an optimal decision. (What function would we be optimizing?) A hypothesis test is simply a procedure. Hypothesis tests have proven empirically useful in many fields of study. However, the mathematical theory (Optimal Statistical Decisions) that justifies the use of particular hypothesis tests requires assuming more structure and given information than is found in the garden variety problems (like yours) that occur in introductory books on statistics.