Hornbein said:
Suppose a grad student wants to prove that exposure to the music of Led Zeppelin increases the sexual potency of rats. The null hypothesis is that this is not so. I find this believable.
This is not a typical null hypothesis. It would usually be called an alternative hypothesis. So the hypothesis of interest is that the effect is positive, the alternative hypothesis is that the effect is non-positive (negative or zero), and the null hypothesis is that there is no effect, i.e. that the effect is exactly zero.
A point hypothesis is generally not believable. If a parameter, like the effect size, is continuous, then the probability that it takes any one specific value is zero.
Nevertheless, unbelievable null hypotheses are used because they allow easy calculation of the probability of the observed data under the point hypothesis. In other words, it is easy to calculate ##P(D|H)##, where ##D## is the data and ##H## is the hypothesis, when ##H## is a point hypothesis. In your example, calculating ##P(D|H)## would be just as difficult for the experimental hypothesis as for the alternative hypothesis, since both are composite. So the composite alternative offers no computational advantage over the point null.
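To make that concrete, here is a minimal sketch with made-up numbers (a coin-flip stand-in for the rat experiment). Under a point null the likelihood of the data is a single number; under a composite hypothesis like "the effect is positive" there is no single likelihood, because it depends on which parameter value inside the hypothesis you pick:

```python
from math import comb

# Hypothetical data: 14 "successes" in 20 trials.
n, k = 20, 14

# Point null H0: success probability is exactly 0.5.
# P(D|H0) is one well-defined number (a binomial probability).
p_d_given_h0 = comb(n, k) * 0.5**k * 0.5**(n - k)

# Composite hypothesis "p > 0.5": the likelihood of the same data
# differs depending on which p in (0.5, 1] you assume, so there is
# no single P(D|H) without extra assumptions (e.g. a prior over p).
p_if_p_06 = comb(n, k) * 0.6**k * 0.4**(n - k)
p_if_p_09 = comb(n, k) * 0.9**k * 0.1**(n - k)

print(p_d_given_h0, p_if_p_06, p_if_p_09)
```

The two composite-case likelihoods come out very different from each other, which is exactly why the point null is the computationally convenient choice.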
So typically your grad student would compare the data to the unbelievable null hypothesis, show that the data is unlikely to have arisen by chance under the null hypothesis, also show that the average effect is positive, and then claim that is evidence supporting the experimental hypothesis.
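The workflow described above can be sketched in a few lines of stdlib Python. The numbers are invented, and a sign-flip permutation test stands in for whatever test the grad student would actually run; under the point null of exactly zero effect, each treatment-minus-control difference is equally likely to be positive or negative:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical treatment-minus-control potency differences.
diffs = [0.8, 1.2, -0.3, 0.9, 1.5, 0.4, -0.1, 1.1, 0.7, 0.6]
observed = mean(diffs)  # step 1: the average effect is positive

# Step 2: sign-flip permutation test of H0 "effect is exactly zero".
# Under H0 each sign is a fair coin flip, so we can simulate the
# null distribution of the mean and see how extreme our data are.
n_perm = 10_000
count = sum(
    mean(d * random.choice((-1, 1)) for d in diffs) >= observed
    for _ in range(n_perm)
)
p_value = count / n_perm

print(observed, p_value)
```

If the p-value is small, the student rejects the null and, as the post says, typically goes on to claim support for the experimental hypothesis, which is precisely the leap discussed below.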
Hornbein said:
I share this misinterpretation, assuming an experiment is properly designed. A small p value suggests that the sought-for effect is real. Perhaps there is something I am missing. Of course all this depends on proper application of statistical methods.
A small p-value indicates only that the observed data is unlikely to have arisen by chance under the null hypothesis. Any other inference is suspect.
That the observed data is unlikely to have arisen by chance under the null hypothesis does not by itself indicate anything about the experimental hypothesis. The null hypothesis could be true and the experimenter simply unlucky. The null hypothesis could be true but the sampling non-random. The null hypothesis and the experimental hypothesis could both be false. The experimental hypothesis could be one of many hypotheses tested, with multiple comparisons never accounted for. Etc.
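The multiple-comparisons point above is easy to quantify. If all the null hypotheses are true and the tests are independent, the chance of at least one spuriously "significant" result at level ##\alpha## grows quickly with the number of tests ##m##, as this small sketch shows:

```python
# Probability of at least one false positive among m independent
# tests of true null hypotheses at significance level alpha:
# 1 - (1 - alpha)^m  (the complement of "all m tests stay quiet").
alpha = 0.05
for m in (1, 5, 20):
    p_any_false_positive = 1 - (1 - alpha) ** m
    print(m, round(p_any_false_positive, 3))
```

With 20 independent tests the chance of at least one p-value below 0.05 by luck alone is about 64%, so a single small p-value among many comparisons is weak evidence of anything.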
You are far from alone in this misinterpretation. It is one of the biggest problems with p-values.
It is actually kind of sad, because in statistics class it is carefully explained that you say "we reject the null hypothesis" and never "we accept the experimental hypothesis": the test only rejects the null and does not support the experimental hypothesis. And then we publish our first scientific paper, dutifully reject the null hypothesis in the results section as we were taught, and immediately accept the experimental hypothesis in the discussion section anyway.