Error Rate and Its Role in Significance

AI Thread Summary
The discussion revolves around the relationship between p-values, Type II error rates, and significance testing. It clarifies that a p-value does not change simply because results are not significant; it is determined by the specific null hypothesis and results. Increasing the alpha level may reduce Type II error when a specific alternative hypothesis is defined, but in practical scenarios without such specificity, the relationship can be complex. Power curves are introduced as a tool to visualize the probability of Type II error across different scenarios. Ultimately, enlarging the acceptance interval increases the likelihood of retaining the null hypothesis, thereby increasing the probability of Type II error.
Soaring Crane
I am trying to understand the relationship between p-values and Type II error rate. Type II error occurs when there is failure to reject the null when it is false. As an example, suppose that I find that my results are not significant. In this case, the p-value is increased, or higher (in relation to the alpha level). If the p-value increases, then we are more likely to retain the null, and doesn't this increase Type II error? If this error increases, then the error rate also increases?

Thanks.
 
Soaring Crane said:
As an example, suppose that I find that my results are not significant. In this case, the p-value is increased, or higher (in relation to the alpha level). If the p-value increases, then we are more likely to retain the null, and doesn't this increase Type II error? If this error increases, then the error rate also increases?

In a statistical test, the null hypothesis must be specific enough to let you compute a p-value. So if you have a given set of results, it doesn't make sense to say that the "p-value is increased" when the results aren't significant. For the given null hypothesis and given results, the p-value is whatever it is. It doesn't change if it fails to be significant. Perhaps you want to ask about the effect of increasing alpha.

I think in most practical situations increasing alpha does increase the probability of type II error, but I don't know any mathematical proof that this must always be true.

Unless you have a specific alternative hypothesis, you cannot compute type II error. For example, in flipping a coin ten times, the null hypothesis "the coin is fair" lets you compute the p-value of a given number of heads. But the hypothesis "the coin is not fair" is not specific enough to let you compute any probability. So unless you are very specific about the way in which the null hypothesis is false, you can't compute type II error.
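As a minimal sketch of that calculation (an illustration in Python with SciPy, not part of the original post; the two-sided "distance from n/2" definition of extremeness is one common choice):

```python
from scipy.stats import binom

def fair_coin_p_value(heads, n=10):
    """Two-sided p-value for H0: P(heads) = 0.5, given `heads` heads in n tosses.

    The p-value is the null probability of an outcome at least as extreme as
    the observed one, measuring extremeness by distance from the expected n/2.
    """
    deviation = abs(heads - n / 2)
    return sum(binom.pmf(k, n, 0.5)
               for k in range(n + 1)
               if abs(k - n / 2) >= deviation)

print(fair_coin_p_value(8))   # 8 heads in 10 tosses -> p ≈ 0.109
print(fair_coin_p_value(9))   # 9 heads in 10 tosses -> p ≈ 0.021
```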

In frequentist statistics people get a feeling for type II error by looking at graphs that represent, in a manner of speaking, all possible alternatives to the null hypothesis. These are called "power curves" for statistical tests. For example, for a specific probability Q of heads, you can compute the probability of a given number of heads in 10 tosses, and (for a given alpha) you can compute the probability that a hypothesis test of "the coin is fair" will accept the null hypothesis when the true probability of heads is Q. The power curve gives you a probability of type II error for each possible value of Q.
 
Stephen Tashi said:
The power curve gives you a probability of type II error for each possible value of Q.

I should have said "The power curve gives you the probability of rejecting the null hypothesis for each possible value of Q". From one minus that, you can get the probability of type II error.
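As a sketch of such a power curve for the coin example (illustration only; I've taken the rejection region to be 0, 1, 9, or 10 heads in 10 tosses, which gives α ≈ 0.021):

```python
from scipy.stats import binom

n = 10
reject = {0, 1, 9, 10}   # rejection region for H0: "the coin is fair"

def power(q):
    """Probability of rejecting H0 when the true probability of heads is q."""
    return sum(binom.pmf(k, n, q) for k in reject)

print(f"alpha = power at Q = 0.5: {power(0.5):.3f}")
for q in (0.6, 0.7, 0.8, 0.9):
    print(f"Q = {q:.1f}:  P(reject H0) = {power(q):.3f},  "
          f"P(type II error) = {1 - power(q):.3f}")
```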
 
Stephen Tashi said:
I think in most practical situations increasing alpha does increase the probability of type II error, but I don't know any mathematical proof that this must always be true.

On the contrary, when we have a specific alternative hypothesis, P[type II error] decreases as α (the size of the test) increases.
 
ssd said:
On the contrary, when we have a specific alternative hypothesis, P[type II error] decreases as α (the size of the test) increases.

I agree with that except for the "on the contrary" because in most practical situations there is no specific alternative hypothesis.
 
Well, let us have a little discussion in the spirit of the game. Hope, no one minds.
Consider a test for the sample mean, with a normal population and known σ. Notation is the usual.
H is the null and K is the alternative hypothesis.
To test H: μ = μ0 against K: μ > μ0. Test statistic Z (normally distributed), size of the test = α, critical region w: Z > Zα.

Since K does not specify a value of μ, we need to compare the power curves corresponding to different α in the context of the present problem.

Suppose we plot the power curves for a number of different α values on the same graph paper, with μ values (where μ > μ0, to see the type II error) on the horizontal axis.
It will be seen that for higher α, the curve is higher. This means the probability of type II error is lower for higher α, for any μ belonging to K.

Increasing α implies a higher probability of rejecting H, other factors remaining unchanged. Therefore a higher α implies a higher probability of rejecting H even when μ belongs to K, i.e. a lower probability of type II error.
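A sketch of that comparison in Python (illustration only; μ0 = 0, σ = 1, n = 25 are arbitrary assumed values):

```python
from scipy.stats import norm

mu0, sigma, n = 0.0, 1.0, 25          # assumed values, for illustration only
se = sigma / n ** 0.5                 # standard error of the sample mean

def type_ii(mu, alpha):
    """P(accept H0: mu = mu0) for the one-sided z-test when the true mean is mu > mu0.

    The test rejects when the sample mean exceeds mu0 + z_alpha * se.
    """
    threshold = mu0 + norm.ppf(1 - alpha) * se
    return norm.cdf(threshold, loc=mu, scale=se)

for mu in (0.2, 0.4, 0.6):
    for alpha in (0.01, 0.05, 0.10):
        print(f"mu = {mu}, alpha = {alpha:.2f}:  P(type II) = {type_ii(mu, alpha):.3f}")
```

For every μ in K, the larger α gives the smaller P(type II), which is the point of the power-curve comparison.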

Looking forward to discussion and counter-statements, which are most welcome for refining our understanding of the matter.
 
ssd said:
Consider a test for the sample mean, with a normal population and known σ.

I consider a known σ to be rare in practical situations.

I agree that your example shows a situation where a power curve argument proves that increasing the acceptance region increases type II error. It increases it by an unknown amount, since we don't know where on the curve we are.

There are many natural and good "beginners" questions about math that we see posted over and over on the forum, but there is one in statistics that I've yet to see. An example of it is this: Suppose we know the variance of a normal distribution is σ² = 1 and we are doing a test of the hypothesis that its mean μ = 0 at a 5% significance level. Why must we set the acceptance region (for the sample mean) to be a symmetrical interval that contains zero? After all, we could define the acceptance region to be any set of intervals that has a 95% probability of containing the sample mean and get the same probability of type I error. We could even define the acceptance region to be two disjoint intervals that don't contain the value 0 at all!

Perhaps the only answer to the above question is to resort to a power curve argument and show that a test with a symmetrical region about 0 is "uniformly most powerful" among all possible tests using the value of the sample mean. I've never read such a proof. (If it exists, I think it would have to be very technical, since "all possible" tests is a big class of tests!)
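Not a proof of the uniformly-most-powerful claim, but a sketch (illustration only; a single observation X ~ N(μ, 1) at the 5% level) showing how two acceptance regions with the same type I error can have very different type II behaviour depending on where the true mean lies:

```python
from scipy.stats import norm

# Two acceptance regions, each with ~95% probability under H0: mu = 0, sigma = 1:
#   symmetric:  [-1.96, 1.96]
#   one-sided:  (-inf, 1.645]
def accept_symmetric(mu):
    return norm.cdf(1.96, loc=mu) - norm.cdf(-1.96, loc=mu)

def accept_one_sided(mu):
    return norm.cdf(1.645, loc=mu)

# Type I error is the same (~5%) for both regions:
print("type I error:", 1 - accept_symmetric(0.0), 1 - accept_one_sided(0.0))

# P(type II error) = P(accept H0 | true mean mu) for a few alternatives:
for mu in (1.5, -1.5):
    print(f"mu = {mu}:  symmetric {accept_symmetric(mu):.3f},  "
          f"one-sided {accept_one_sided(mu):.3f}")
```

The one-sided region beats the symmetric one when the true mean is positive and is far worse when it is negative; comparing whole power curves is what sorts this out.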
 
I just gave the simplest (?) example. If σ is unknown, we simply have to think of a t-test.
The facts about power curves and varying α remain very similar.
 
ssd said:
I just gave the simplest (?) example. If σ is unknown, we simply have to think of a t-test.
The facts about power curves and varying α remain very similar.

A t-test of whether two distributions known to have the same standard deviation have the same mean has a power curve. If we take the frequently encountered situation of testing whether two distributions not known to have the same standard deviation have the same mean, then we get a "power surface" - say the rejection probability on the z axis, difference in means on x, and difference in standard deviations on y.
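A rough Monte Carlo sketch of one such power surface (illustration only; the sample sizes, grid values, and trial count are arbitrary assumptions), with the estimated rejection probability playing the role of the vertical axis:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n1 = n2 = 20
alpha = 0.05
trials = 2000

def rejection_rate(delta_mu, sigma2):
    """Estimated P(reject 'equal means') with Welch's t-test when the samples
    come from N(0, 1) and N(delta_mu, sigma2**2)."""
    rejections = 0
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, n1)
        y = rng.normal(delta_mu, sigma2, n2)
        _, p = ttest_ind(x, y, equal_var=False)   # Welch's t-test
        rejections += p < alpha
    return rejections / trials

# A small grid of (difference in means, second standard deviation):
for delta_mu in (0.0, 0.5, 1.0):
    for sigma2 in (1.0, 2.0):
        print(delta_mu, sigma2, rejection_rate(delta_mu, sigma2))
```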
 
Thinking more about the original question
Soaring Crane said:
If the p-value increases, then we are more likely to retain the null, and doesn't this increase Type II error?
I've tried to defog my mind as follows:

Suppose we define an interval such as I_A = [-1.5, 1.5].

And suppose we sign some contracts such as the following:

If the sample mean falls in I_A then I will dance a jig.

If the sample mean falls in I_A then I will accept the teaching of the philosopher Hegel

If the sample mean falls in I_A then I will reject the theory of Evolution

Now define a larger interval I_B = [-2.0, 2.0].

What can we say about analogous contracts that are based on I_B instead of I_A?

A contract such as

If the sample mean falls in I_B then I will dance a jig

is at least as likely to take effect as the similar contract based on I_A since I_B includes I_A as a subset. The contract based on I_B will be more likely to take effect if increasing the interval to I_B includes an additional event that has positive probability.

If we restrict ourselves to situations not mentioned in the contract, such as particular weather, the same conclusions apply. The contract about dancing a jig didn't mention anything about the weather. So we can say:

If it is raining then the probability that I will dance a jig under the contract using I_B is equal or greater than the probability that I will dance a jig under the contract using I_A.

The contracts don't mention any conditions about the absolute truth of the ideas to be accepted or rejected. Thus we can say:

If the ideas of Hegel are false then the probability that I will dance a jig under the contract using I_B is equal or greater than the probability that I will dance a jig under the contract using I_A.

and

If the ideas of Hegel are false then the probability that I will accept the teaching of the philosopher Hegel under the contract using I_B is equal or greater than the probability that I will accept his teaching under the contract using I_A.

and

If the ideas of Hegel are true then the probability that I will accept the teaching of the philosopher Hegel under the contract using I_B is equal or greater than the probability that I will accept his teaching under the contract using I_A.

and

If the theory of evolution is false then the probability that I will accept the teaching of the philosopher Hegel under the contract using I_B is equal or greater than the probability that I will accept his teaching under the contract using I_A.

and

If the theory of evolution is false then the probability that I will reject the theory of Evolution under the contract using I_B is equal or greater than the probability that I will reject it under the contract using I_A.

and

If the theory of evolution is true then the probability that I will reject the theory of Evolution under the contract using I_B is equal or greater than the probability that I will reject it under the contract using I_A.

The answer to the original question is that the probability of type II error will increase if you enlarge the acceptance interval and the enlargement includes some event with non-zero probability. But, more generally, the probability of your doing anything (dancing a jig, etc.) that is triggered by the result falling in a certain region will increase if you enlarge the region to include additional probability.
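To attach a number to that, here is a sketch using the two intervals above (illustration only; it assumes the sample mean is normal with standard error 1 and that the true mean is 2.5, an arbitrary choice of how the null is false):

```python
from scipy.stats import norm

true_mean, se = 2.5, 1.0     # assumed values, for illustration only

def p_type_ii(interval):
    """P(sample mean falls in the acceptance interval | true mean = true_mean)."""
    lo, hi = interval
    return norm.cdf(hi, loc=true_mean, scale=se) - norm.cdf(lo, loc=true_mean, scale=se)

I_A = (-1.5, 1.5)
I_B = (-2.0, 2.0)
print(p_type_ii(I_A))   # ≈ 0.159
print(p_type_ii(I_B))   # ≈ 0.309 -- larger, since I_B adds an event of positive probability
```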
 