Statistical Errors, Type I and Type II

  • #1
Agent Smith
TL;DR Summary
What do statistical errors, Type I and Type II, depend on?
Reached Hypothesis testing in my statistics notes (high school level).

It reads ...

1. Type I Error: Rejecting the null (hypothesis), ##H_0##, when ##H_0## is true. The risk of a Type I error can be reduced by lowering the significance level ##\alpha##. The downside is that this increases the risk of a Type II error.

2. Type II Error: Failing to reject ##H_0## when ##H_0## is false. The risk of a Type II error can be reduced by
a. Raising the significance level ##\alpha##. The downside is that this increases the risk of a Type I error.
b. Taking a larger sample (sample size).
c. Having less variation in the parent population (where that's possible).
d. I think I'm forgetting something here ...

I would like to know what the justifications for 2b and 2c are, and whether there is a 2d.

Many thanks
 
  • #2
The effect of raising or lowering the confidence level (significance level?) that you state seems backward to me. A very high confidence level, say 5-sigma, makes it very difficult to reject the null hypothesis, ##H_0##.
 
  • #3
Increasing the sample size will give you a better estimate of the standard deviation and the mean.

The population you use for your study must be chosen carefully and must represent the population the study is about. If the population has too many confounding factors that introduce greater variability, it will be harder to distinguish your study group from the control group.

What is your reason for suggesting 2d?
 
  • #4
FactChecker said:
The effect of raising or lowering the confidence level (significance level?) that you state seems backward to me. A very high confidence level, say 5-sigma, makes it very difficult to reject the null hypothesis, ##H_0##.
I agree with what you say, and I think that's what lowering ##\alpha## means. It would reduce the risk of Type I errors.

gleem said:
Increasing the sample size will give you a better estimate of the standard deviation and the mean.

The population you use for your study must be chosen carefully and must represent the population the study is about. If the population has too many confounding factors that introduce greater variability, it will be harder to distinguish your study group from the control group.

What is your reason for suggesting 2d?
Yes I kinda know that increasing sample size reduces variation, but how does that lead to reduced risk of type II errors?
 
  • #5
Agent Smith said:
Yes I kinda know that increasing sample size reduces variation, but how does that lead to reduced risk of type II errors?
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
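
To make that concrete, here is a rough numerical sketch (Python; the values ##\mu_0 = 100##, ##\mu_1 = 103##, ##\sigma = 10## are made up, not from the thread). With a fixed decision boundary halfway between the two means, both error probabilities shrink as ##n## grows, because the means end up separated by more standard errors of the sample mean:
```python
# Rough sketch, not from the thread: two made-up population means and a fixed
# decision boundary halfway between them.  As n grows, the means are separated
# by more standard errors and BOTH error probabilities shrink.
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 100.0, 103.0, 10.0
cut = (mu0 + mu1) / 2                        # decide for H1 if the sample mean exceeds this

for n in (10, 30, 100, 300):
    se = sigma / np.sqrt(n)                  # standard deviation of the sample mean
    type1 = norm.sf(cut, loc=mu0, scale=se)  # P(sample mean > cut | H0 true)
    type2 = norm.cdf(cut, loc=mu1, scale=se) # P(sample mean <= cut | H1 true)
    print(f"n={n:4d}  separation={(mu1 - mu0)/se:5.2f} SE  "
          f"Type I={type1:.4f}  Type II={type2:.4f}")
```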
 
  • Like
  • Skeptical
Likes Agent Smith and Hornbein
  • #6
FactChecker said:
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
I'll quibble that it would be better to say

That makes it less likely that the sample mean will be near the mean of the wrong hypothesis due to sampling error.

The same holds if there's less variation in the parent population. Sampling error will be less.
 
  • Like
Likes FactChecker
  • #7
FactChecker said:
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
So we have ##2## populations? That's confusing; do you have a link I can read? Thanks
 
  • #8
Agent Smith said:
So we have ##2## populations? That's confusing; do you have a link I can read? Thanks
We have a hypothesized population mean (say, ##0.99##). We also have the true population mean, which is unknown.
 
  • Like
Likes Agent Smith
  • #9
Agent Smith said:
So we have ##2## populations? That's confusing; do you have a link I can read? Thanks
You have the known theoretical mean, ##m_0##, of the null hypothesis, ##H_0##. You also have the theoretical mean, ##m_1##, of an alternative hypothesis, ##H_1##, at some distance ##d=|m_0-m_1|##. Finally, you have the sample mean, ##m_s##. You don't need to worry about ##m_1## being undetermined, because all calculations use the distribution of the null hypothesis. As the sample size increases, the confidence regions around ##m_0## and ##m_1## shrink. If the null hypothesis is correct, the probability of ##m_s## being near ##m_1## decreases.
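
A minimal numeric sketch of the same idea, with assumed values ##m_0 = 0##, ##m_1 = 1##, ##\sigma = 4## and a one-sided test at ##\alpha = 0.05## (none of these numbers come from the thread):
```python
# Rough sketch with assumed numbers.  All calculations use the null
# distribution, as described above; beta is the chance the sample mean m_s
# stays below the rejection threshold even though H1 is true.
import numpy as np
from scipy.stats import norm

m0, m1, sigma, alpha = 0.0, 1.0, 4.0, 0.05
for n in (16, 64, 256):
    se = sigma / np.sqrt(n)
    threshold = m0 + norm.ppf(1 - alpha) * se      # reject H0 if m_s > threshold
    beta = norm.cdf(threshold, loc=m1, scale=se)   # P(fail to reject | H1 true)
    print(f"n={n:4d}  threshold={threshold:.3f}  beta={beta:.3f}  power={1 - beta:.3f}")
```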
 
Last edited:
  • Skeptical
Likes Agent Smith
  • #10
FactChecker said:
You have the known theoretical mean, ##m_0##, of the null hypothesis, ##H_0##. You also have the theoretical mean, ##m_1##, of an alternative hypothesis, ##H_1##, at some distance ##d=|m_0-m_1|##. Finally, you have the sample mean, ##m_s##. You don't need to worry about ##m_1## being undetermined, because all calculations use the distribution of the null hypothesis. As the sample size increases, the confidence regions around ##m_0## and ##m_1## shrink. If the null hypothesis is correct, the probability of ##m_s## being near ##m_1## decreases.
Here is an example question. Maybe you can use it to clarify my doubts.

A swimming pool is regularly checked for contaminants. If the contaminant concentration is ##\geq 400## ppm, the pool is unfit for swimming and is closed for decontamination.
The Null Hypothesis ##H_0##: The contaminant concentration ##< 400## ppm
The Alternative Hypothesis ##H_a##: The contaminant concentration ##\geq 400## ppm

I take a sample of the pool's water and measure contaminant concentration.

A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.

Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking ##1## sample ( a ##1## sample statistic) or are we supposed to take multiple samples and compute their mean? 🤔
 
  • #11
Agent Smith said:
A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.
Good question. That shows how important it is to pick the right null and alternative hypotheses. That choice of ##H_0## and ##H_1## is bad; they should be swapped. The null hypothesis should be the choice that will be assumed unless the test and data prove otherwise (to a skeptical audience). Therefore, the pool should be assumed to be unsafe until the test and data show that it is safe. Higher confidence levels, like 99.5%, should require very strong evidence that the pool is safe. That will only work if the null hypothesis is that the pool is unsafe until very strong evidence indicates that the pool is safe (the alternative hypothesis). A type I error would then be to conclude that the pool is safe when it is really dangerous.

EDIT: The proper choice of null and alternative hypotheses will make Type I errors the errors that you really want/need to avoid (swimming in an unsafe pool). Type II errors are more like an opportunity lost (not swimming in a pool that is actually safe).
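
For illustration only, here is what that swapped framing might look like as a one-sided z-test. The readings and the measurement SD below are made-up numbers, not from the thread:
```python
# Sketch only, with hypothetical numbers: five concentration readings with an
# assumed measurement SD of 20 ppm.  With H0 "pool unsafe (>= 400 ppm)", the
# pool is declared safe only if the sample mean is convincingly BELOW 400.
import numpy as np
from scipy.stats import norm

readings = np.array([392.0, 388.0, 395.0, 390.0, 385.0])  # hypothetical data
sd, alpha = 20.0, 0.05
se = sd / np.sqrt(len(readings))
z = (readings.mean() - 400.0) / se        # SEs below the 400 ppm limit
p_value = norm.cdf(z)                     # one-sided: small p means strong evidence of safety
verdict = "declare safe" if p_value <= alpha else "keep the pool closed"
print(f"mean = {readings.mean():.1f} ppm, z = {z:.2f}, p = {p_value:.3f} -> {verdict}")
```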
 
Last edited:
  • Like
Likes Hornbein
  • #12
Agent Smith said:
Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking 1 sample ( a 1 sample statistic) or are we supposed to take multiple samples and compute their mean?
To discuss illustrative examples and learn from them, they should be realistic, since the devil is in the details. Some are not statistical per se, like your pool scenario, since it involves a single observation. How is the safe limit measured? Parts per million typically refers to the concentration by mass of a substance. For example, the limit for arsenic in drinking water is 10 parts per billion, or 10 µg/L. Of concern is the accuracy of measuring such a small amount of material. One way is to precipitate and weigh the soluble arsenic from the solution. Any instrument has an inherent uncertainty in its reading, which is often constant and may be due to calibration. Other uncertainties can come from preparing the sample or from the use of the equipment, like reading a meter. If your weighing uncertainty is fixed at ±1 µg and you need to measure 10 µg/L, then you need to take a larger sample to get a more precise reading. If your sampling method has some inherent variation, then the problem becomes statistical, since the result depends on how you take the sample. Now you need to take a number of samples and average them to get a mean and determine the sampling uncertainty. This example is like measuring a length with a ruler, where you need to place and read a scale.

On the other hand, if you are determining the amount of radium in drinking water, the safe limit is 0.185 disintegrations per second per liter. Here you count the decays of radium to check that the rate does not exceed this limit, to within some acceptable confidence. One would usually count for a length of time so that the decay rate can be determined sufficiently accurately. The standard deviation for N radioactive decays is the square root of the number of decays: 0.185 dps = 666 ± 26 d/h. To show that you do not exceed the limit, your measurement plus its uncertainties, to within some acceptable confidence, should not exceed the regulatory limit. For this example, the measured count plus two total standard deviations should not exceed 666 d/h, roughly the 5% level of significance.
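
A quick check of that counting arithmetic, as a sketch (the one-hour count of 600 below is a made-up example):
```python
# Quick check of the counting arithmetic above: Poisson statistics, so the SD
# of N counts is sqrt(N).  The 600-count figure below is purely hypothetical.
import math

counts_per_hour = 0.185 * 3600                 # 666 decays per hour per litre
sd = math.sqrt(counts_per_hour)                # ~25.8, quoted above as +/- 26
print(f"limit: {counts_per_hour:.0f} ± {sd:.0f} d/h")

measured = 600                                 # hypothetical one-hour count
within = measured + 2 * math.sqrt(measured) <= counts_per_hour
print("within the limit" if within else "exceeds the limit")
```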

Determining the number of bacteria per volume of water would probably be more of a statistical problem.
 
  • #13
@gleem As I mentioned earlier, this is a simplified presentation meant to introduce someone to the basics of statistics. Some assumptions regarding the preliminary conditions needed to draw a statistical inference have been made.

I wonder how sample size affects the probability of type II errors? 🤔
 
  • #14
Regarding the Type II error and sample size, let me give it a try. Type II errors occur when ##H_0## is wrongly accepted; a false negative. When you compare two samples (sets of observations), you are trying to determine whether the data likely come from the same parent population. Your sample provides you with one possible value of the mean and standard deviation. You compare the separation of the means to a certain number (m) of standard deviations (SD), depending on how much risk you are willing to take of being wrong. If they are closer than m SDs, you conclude that the chance you are wrong is small and that they are from the same parent distribution. The trouble starts when the difference is nearly m SDs.

The question is how close these estimates are to the actual mean and SD of the population from which the samples were taken. Repeating the sampling will give you different values of the mean and SD.
Any variation in the SD or the mean affects the chance that the two studies are judged to be from the same parent distribution. The variation of the SD, which follows a chi-square distribution, is not a usual topic in basic statistics, but the variation of the mean, which follows a normal distribution, is. This is where the standard error (SE) of the mean comes in. It is the estimate of the standard deviation of the mean value and is related to the standard deviation of the sample by
$$ SE\: = \frac{SD}{\sqrt{N}}$$
where N is the number of observations in the sample. The variation of the SD is of the order of that of the SE. The larger N is, the closer the measured mean is to the true mean and the more trustworthy the estimate of the SD becomes.

Consider the following example. A value of a certain variable is reported to be 79.1. You perform an experiment with 50 observations and get 72.4 ± 3.8, with an SE of ±0.54. Is the reported value likely to be a member of the population described by your data? You compare the means using m SD as a yardstick. If the difference between the two means (in this case 6.7) is less than the yardstick, then you conclude it could be a member, at least within the level of risk you have assumed. You take 2 SD (= 7.6) as your yardstick and find that the difference is less than the yardstick, so, within the level of risk you have taken, the reported value and your value are consistent.

Another group repeats your experiment, gets a result of 71.8 ± 3.6, and concludes that their result is not consistent with the reported value.

A third group wants to resolve this discrepancy and repeats the experiment with a larger sample of 200 observations (considered to be of higher power). They expect that the range of variation of the mean and the range of the SD will decrease. Their results give a mean value of 71.9 ± 3.4 with an SE of ±0.24. The difference from the reported value is now 7.2, compared to their yardstick of 6.8. They conclude that the reported value is not consistent with their data.

A higher number of observations (higher power) reduces the possibility of making a type II error. The power of a study is defined as the probability of not making a type II error, i.e. Power = ##1 - \beta##. Power is not a subject of introductory statistics since there is so much more to learn first.
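
A short sketch reproducing the numbers in this example with ##SE = SD/\sqrt{N}## (group 2's sample size is assumed to be 50, since the post only says they repeat the original experiment):
```python
# Reproducing the example above: SE = SD / sqrt(N), and the 2-SD "yardstick"
# comparison against the reported value 79.1.
import math

reported = 79.1

def compare(mean, sd, n, label):
    se = sd / math.sqrt(n)
    diff = abs(reported - mean)
    yardstick = 2 * sd
    verdict = "consistent" if diff < yardstick else "not consistent"
    print(f"{label}: SE = {se:.2f}, |diff| = {diff:.1f}, 2 SD = {yardstick:.1f} -> {verdict}")

compare(72.4, 3.8, 50, "group 1 (N = 50)")
compare(71.8, 3.6, 50, "group 2 (N = 50, assumed)")
compare(71.9, 3.4, 200, "group 3 (N = 200)")
```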
 
  • #15
Agent Smith said:
The Null Hypothesis ##H_0##: The contaminant concentration ##< 400## ppm
The Alternative Hypothesis ##H_a##: The contaminant concentration ##\geq 400## ppm
Short comment: there should never be any statement of equality in your alternative hypothesis, only strict inequality.
 
  • Like
Likes Agent Smith
  • #16
Agent Smith said:
A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.
"A type II error would mean (the null hypothesis) is false"

No, it doesn't. Just as we never talk about "accepting" either hypothesis, remember that

- rejecting the null does not prove the null value is false, and does not show the alternative value is true
- failing to reject the null does not prove the null value is true, nor that the alternative value is false

The results of classical testing should be taken as INDICATORS, given the data at hand, of which value is more likely to hold. It's always dangerous to make decisions based only on the often-misinterpreted p-value: some measure of the variation of the response should be discussed as well. In classical methods this is most often a standard error, or better, a confidence interval (confidence intervals and tests are inverse procedures and always give the same indication; the benefit of an interval estimate is the included display of variability).
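
As a small sketch of that last point (made-up data, and a normal approximation in place of a t-test), the interval and the test give the same indication:
```python
# Sketch with hypothetical data: a two-sided test of H0: mu = 50 and the
# matching 95% interval agree (the interval excludes 50 exactly when the test
# rejects).  A normal approximation is used throughout for simplicity.
import numpy as np
from scipy.stats import norm

mu0, alpha = 50.0, 0.05
x = np.array([52.1, 49.8, 53.4, 51.0, 52.7, 50.9, 53.1, 51.8])  # hypothetical data
se = x.std(ddof=1) / np.sqrt(len(x))
z = (x.mean() - mu0) / se
p = 2 * norm.sf(abs(z))
half = norm.ppf(1 - alpha / 2) * se
lo, hi = x.mean() - half, x.mean() + half
print(f"p = {p:.5f}, reject H0: {p <= alpha}")
print(f"95% CI = ({lo:.2f}, {hi:.2f}), excludes {mu0}: {not (lo <= mu0 <= hi)}")
```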
 
  • Like
Likes Agent Smith
  • #17
@statdad So how should I word it?

The null hypothesis is either rejectable or not rejectable.

A type II statistical error occurs when the null is rejectable but is not rejected?

The lessons I took do stress the conclusion we can draw from ##\text{P-value} > \alpha##: we do not accept the null, we simply fail to reject it. It's a very complicated concept. Can you refer me to a source I can consult? Thanks.
 
  • #18
Agent Smith said:
A type II statistical error occurs when the null is rejectable but is not rejected?
correct
 
  • #20
Agent Smith said:
@statdad So how should I word it?

The null hypothesis is either rejectable or not rejectable.

A type II statistical error occurs when the null is rejectable but is not rejected?

The lessons I took do stress the conclusion we can draw from ##\text{P-value} > \alpha##: we do not accept the null, we simply fail to reject it. It's a very complicated concept. Can you refer me to a source I can consult? Thanks.
We completely avoid saying that we have shown anything to be true. Instead we say we have failed to show strong evidence that the null hypothesis is false.
 
  • #21
@gleem
As far as I know, Type I Error is rejecting a true null hypothesis and Type II Error is failing to reject a false null.
 
  • #22
Agent Smith said:
@gleem
As far as I know, Type I Error is rejecting a true null hypothesis and Type II Error is failing to reject a false null.
No, we never conclude that the null hypothesis is true. That's why we have all that weird "failed to reject" dialogue.
 
  • #24
Hornbein said:
No, we never conclude that the null hypothesis is true. That's why we have all that weird "failed to reject" dialogue.
So we never establish that a null hypothesis is true or false.
We can only reject it or fail to reject it 🤔

What exactly is the terminology we use here?

We're proffering ##2## hypotheses: The Null ##H_0## and The Alternative ##H_a##.

Then we compute the P-value. If ##\text{P-value} \leq \alpha##, we reject ##H_0## and accept ##H_a##. If ##\text{P-value} > \alpha## we fail to reject ##H_0##

Perhaps I should've said:
Type I Error: Rejecting ##H_0## when ##H_0## shouldn't have been rejected (##\text{P-value} \leq \alpha##, but ##H_0## is true).
Type II Error: Failing to reject ##H_0## when ##H_0## should have been rejected (##\text{P-value} > \alpha##, but ##H_0## is false).
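
Putting that decision rule into a tiny sketch, with hypothetical p-values, to lay out the four possible outcomes:
```python
# Tiny sketch of the decision rule and the four possible outcomes.
def decide(p_value: float, alpha: float) -> str:
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def outcome(h0_is_true: bool, p_value: float, alpha: float = 0.05) -> str:
    if decide(p_value, alpha) == "reject H0":
        return "Type I error" if h0_is_true else "correct rejection"
    return "correct non-rejection" if h0_is_true else "Type II error"

print(outcome(h0_is_true=True, p_value=0.03))   # rejected a true H0 -> Type I error
print(outcome(h0_is_true=False, p_value=0.20))  # failed to reject a false H0 -> Type II error
```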
 
  • #25
The important take-away is that a Type I Error is something that you really do not want to make. It would be dangerous, embarrassing, or otherwise bad. The null hypothesis is given every benefit of the doubt and will not be rejected unless you have convincing evidence that it is unlikely. You assume that the null hypothesis is true by using its probability parameter values and setting a high (95%, 97.5%, etc.) standard for rejecting it. In extreme cases, like ##H_1 =## a new particle exists (in particle physics) versus ##H_0 =## no new particle, the evidence must reach ##5\sigma## to reject the null hypothesis.
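
For a sense of scale, the one-sided tail probability beyond ##5\sigma## is easy to check:
```python
# Quick check of the 5-sigma convention: the one-sided tail probability of a
# normal distribution beyond 5 standard deviations.
from scipy.stats import norm
print(norm.sf(5))   # ~2.9e-07, i.e. roughly 1 in 3.5 million
```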

A Type II Error is just an opportunity lost or delayed. It is less consequential and means that you just need more evidence if you want to reject the null hypothesis.
 
Last edited:
  • #26
FactChecker said:
A Type II Error is just an opportunity lost or delayed. It is less consequential and means that you just need more evidence if you want to reject the null hypothesis.
Less consequential? In a drug trial, you would lose an opportunity to advance a new treatment. How do you know you made a type II error? Yes it certainly would be a lost opportunity.

For a type I error you would waste time/resources but eventually discover your error.
 
  • #27
Those are good points, but as a consumer I look at it from the other direction. IMO, you have it reversed.

gleem said:
Less consequential? In a drug trial, you would lose an opportunity to advance a new treatment. How do you know you made a type II error? Yes it certainly would be a lost opportunity.
For a Type II error, a drug company can collect more data to reach the required significance of the result to show that it is safe and effective. If a drug company is not willing to do that, then it might not be so important. Or the drug may not be good enough.
gleem said:
For a type I error you would waste time/resources but eventually discover your error.
For a drug company, a Type I error could mean the drug kills people.
 
Last edited:
  • #28
FactChecker said:
For a drug company, a Type I error could mean the drug kills people.
Drugs are usually tested for toxicity first. If you make a type I error you reject the drug. If it passes, it goes to further studies to determine dose and effectiveness; if you make a type I error there and wrongly reject the null hypothesis, you are most likely seeing a marginal benefit, so why market it?
FactChecker said:
For a Type II error, a drug company can collect more data to reach the required significance of the result to show that it is safe and effective. If a drug company is not willing to do that, then it might not be so important. Or the drug may not be good enough.
You do not know that you made an error; that's why it is an error. This error might be more dangerous, since it can let a dangerous drug pass through, albeit at a low rate, but this is not likely, since one serious reaction should prompt a reevaluation of the study.

Whoever is involved in a study must understand the consequences of not making the correct assumptions about the risk they are willing to take if they are wrong.
 
  • #29
gleem said:
Drugs are usually tested for toxicity first. If you make a type I error you reject the drug.
That sounds backward. Type I Error: Reject the null hypothesis erroneously.
If the null hypothesis is that the drug is nontoxic, then it will be assumed to be nontoxic unless there is strong evidence at 95% or higher that it is toxic. That makes it easy for a toxic drug to remain assumed to be nontoxic. And a stricter test level like 99% would make it even easier for a toxic drug to pass the test.

Wouldn't you want it the other way? The sample should show, at a 95% confidence level, that the drug is not toxic for it to pass the test. And the stricter the test level, the more proof would be required for the drug to be assumed nontoxic.
 
Last edited:
  • Like
Likes Hornbein
  • #30
FactChecker said:
That sounds backward. Type I Error: Reject the null hypothesis erroneously.
The ##H_0## is that the drug is not toxic. If the drug is truly not toxic and you reject ##H_0##, there is no direct harm; that is a type I error. If the drug is toxic and you accept ##H_0##, then you make a type II error.
 
  • #31
gleem said:
The ##H_0## is that the drug is not toxic. If the drug is truly not toxic and you reject ##H_0##, there is no direct harm; that is a type I error. If the drug is toxic and you accept ##H_0##, then you make a type II error.
Suppose the null hypothesis is that the drug is not toxic.
Requiring a test significance level of 0.05 (95%) makes it hard to reject the assumption that the drug is not toxic, and a level of 0.01 (99%) requires even more substantial evidence of toxicity before that assumption is rejected.
That would allow a great many toxic drugs to be released on the public.
That is the opposite of what you want.
The test should require substantial evidence that the drug is not toxic; it should not assume the drug is nontoxic unless that is convincingly demonstrated.
 
  • Like
Likes Hornbein
  • #32
It's possible to have a null hypothesis that a drug is toxic.
 
  • Like
  • Haha
Likes FactChecker and Agent Smith
  • #34
Hornbein said:
It's possible to have a null hypothesis that s drug is toxic.
Yes, that is how it is (always?) done. For instance, the expiration date on a drug is set by the length of time it has been proven by tests to still be safe and effective. It may be effective much longer, but the tests are expensive and are stopped after a reasonable time. So the drug has not been proven effective past that date, and it is assumed (null hypothesis) to be expired past that date.

EDIT: I think of the null hypothesis as the assumption that will be made if no testing is done. A drug will not be approved for use without testing to prove it is safe; it will be assumed to be dangerous. A new particle will not be assumed to exist without evidence to prove it; it will be assumed not to exist.
 
Last edited:
  • #35
Is there any difference between assuming something is safe and assuming it isn't? If you compare it to a safe population, the process is the same. In one case you look for evidence that it can be a member of the safe population, and it is therefore assumed to be safe; in the other you look for evidence that it is not a member of the safe population, and it is therefore assumed to be unsafe.
 
