Statistical Errors, Type I and Type II

Agent Smith
TL;DR Summary
what do statistical errors, Type I and Type II, depend on?
Reached Hypothesis testing in my statistics notes (high school level).

It reads ...

1. Type I Error: Rejecting the null (hypothesis), ##H_0##, when ##H_0## is true. The risk of a Type I error can be reduced by lowering the significance level ##\alpha##. The downside is this increases the risk of a Type II error.

2. Type II Error: Failing to reject the ##H_0## when ##H_0## is false. The risk of a Type II error can be reduced by
a. Raising the significance level ##\alpha##. The downside, this increases the risk of a Type I error
b. Taking a larger sample (sample size)
c. It would be convenient if there's less variation in the parent population.
d. I think I'm forgetting something here ...

I would like to know what the justifications for 2b, 2c are and is there a 2d?

Gracias, muchas
 
The effect of raising or lowering the confidence level (significance level?) that you state seems backward to me. A very high confidence level, say 5-sigma, makes it very difficult to reject the null hypothesis, ##H_0##.
 
Increasing the sample size will give you a better estimate of the standard deviation and the mean.

The population you use for your study must be chosen carefully and represent the population the study is about. If the population has too many confounding factors that introduce greater variability it will be harder to distinguish your study from the control group.

What is your reason for suggesting 2d?
 
FactChecker said:
The effect of raising or lowering the confidence level (significance level?) that you state seems backward to me. A very high confidence level, say 5-sigma, makes it very difficult to reject the null hypothesis, ##H_0##.
I agree with what you say, and I think that's what lowering ##\alpha## means. It would reduce the risk of Type I errors.

gleem said:
Increasing the sample size will give you a better estimate of the standard deviation and the mean.

The population you use for your study must be chosen carefully and represent the population the study is about. If the population has too many confounding factors that introduce greater variability it will be harder to distinguish your study from the control group.

What is your reason for suggesting 2d?
Yes I kinda know that increasing sample size reduces variation, but how does that lead to reduced risk of type II errors?
 
Agent Smith said:
Yes I kinda know that increasing sample size reduces variation, but how does that lead to reduced risk of type II errors?
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
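To make this concrete with a formula (a sketch of the simplest case: a one-sided z-test with known population SD ##\sigma##, testing ##H_0:\ \mu = m_0## against ##H_1:\ \mu = m_1 > m_0## at significance level ##\alpha##; the symbols here are illustrative, not taken from the notes):

$$\beta \;=\; P(\text{fail to reject } H_0 \mid \mu = m_1) \;=\; \Phi\!\left(z_{1-\alpha} \;-\; \frac{(m_1 - m_0)\sqrt{n}}{\sigma}\right)$$

where ##\Phi## is the standard normal CDF and ##z_{1-\alpha}## the critical value. A larger sample size ##n## or a smaller population spread ##\sigma## makes the subtracted term bigger and drives ##\beta## down (that is 2b and 2c), while raising ##\alpha## shrinks ##z_{1-\alpha}##, which also lowers ##\beta## but at the cost of more Type I errors (2a).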
 
Agent Smith said:
TL;DR Summary: what do statistical errors, Type I and Type II, depend on?

Reached Hypothesis testing in my statistics notes (high school level).

It reads ...

1. Type I Error: Rejecting the null (hypothesis), ##H_0##, when ##H_0## is true. The risk of a Type I error can be reduced by lowering the significance level ##\alpha##. The downside is this increases the risk of a Type II error.

2. Type II Error: Failing to reject the ##H_0## when ##H_0## is false. The risk of a Type II error can be reduced by
a. Raising the significance level ##\alpha##. The downside, this increases the risk of a Type I error
b. Taking a larger sample (sample size)
c. It would be convenient if there's less variation in the parent population.
d. I think I'm forgetting something here ...

I would like to know what the justifications for 2b, 2c are and is there a 2d?

Gracias, muchas

FactChecker said:
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
I'll quibble that it would be better to say

That makes it less likely that the sample mean will be near the mean of the wrong hypothesis due to sampling error.

The same holds if there's less variation in the parent population. Sampling error will be less.
 
  • Like
Likes FactChecker
FactChecker said:
The null hypothesis population mean and the alternative hypothesis population mean must be different. Then, when the sample gets large enough, the separation of those two means will be a greater number of standard deviations of the sample mean. That makes it less likely that the sample mean will be near the mean of the wrong hypothesis. A large sample size reduces the probabilities of either type of error.
So we have ##2## populations??? It's confusing, do you have a link I can read? Gracias
 
Agent Smith said:
So we have ##2## populations??? It's confusing, do you have a link I can read? Gracias
We have a hypothesized population mean (say, 0.99). We also have the true population mean, which is unknown.
 
Agent Smith said:
So we have ##2## populations??? It's confusing, do you have a link I can read? Gracias
You have the known theoretical mean, ##m_0##, of the null hypothesis, ##H_0##. You also have the theoretical mean, ##m_1##, of an alternative hypothesis, ##H_1##, at some distance ##d=|m_0-m_1|##. Finally, you have the sample mean, ##m_s##. You don't need to worry about ##m_1## being undetermined, because all calculations will use the distribution of the null hypothesis. As the sample size increases, the confidence regions around ##m_0## and ##m_1## decrease. If the null hypothesis is correct, the probability of ##m_s## being near ##m_1## decreases.
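A minimal numerical sketch of that shrinking (the numbers ##m_0 = 100##, ##m_1 = 105##, ##\sigma = 10## and the normal model are hypothetical, chosen only for illustration):

```python
# How the sampling distribution of the sample mean m_s tightens as n grows.
# All numbers here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
m0, m1, sigma = 100.0, 105.0, 10.0   # hypothetical null mean, alternative mean, population SD

for n in (10, 50, 200):
    # Draw many samples of size n from the H0 population and keep their means.
    means = rng.normal(m0, sigma, size=(10_000, n)).mean(axis=1)
    # How often does a sample mean land closer to m1 than to m0?
    near_wrong_mean = np.mean(np.abs(means - m1) < np.abs(means - m0))
    print(f"n={n:4d}  SE={sigma/np.sqrt(n):.2f}  P(mean closer to m1)={near_wrong_mean:.4f}")
```

As ##n## grows, the standard error of ##m_s## falls like ##1/\sqrt{n}##, and the fraction of samples whose mean sits closer to the wrong hypothesis drops toward zero.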
 
  • #10
FactChecker said:
You have the known theoretical mean, ##m_0##, of the null hypothesis, ##H_0##. You also have the theoretical mean, ##m_1##, of an alternative hypothesis, ##H_1##, at some distance ##d=|m_0-m_1|##. Finally, you have the sample mean, ##m_s##. You don't need to worry about ##m_1## being undetermined, because all calculations will use the distribution of the null hypothesis. As the sample size increases, the confidence regions around ##m_0## and ##m_1## decrease. If the null hypothesis is correct, the probability of ##m_s## being near ##m_1## decreases.
Here is an example question. Maybe you can use it to clarify my doubts.

A swimming pool is regularly checked for contaminants. If contaminant concentration ##\geq 400## ppm, the pool is unfit for swimming and is closed for decontamination.
The Null Hypothesis ##H_0##: The contaminant concentration ##< 400## ppm
The Alternative Hypothesis ##H_a##: The contaminant concentration ##\geq 400## ppm

I take a sample of the pool's water and measure contaminant concentration.

A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.

Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking ##1## sample (a ##1##-sample statistic) or are we supposed to take multiple samples and compute their mean? 🤔
 
  • #11
Agent Smith said:
A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.
Good question. That shows how important it is to pick the right null and alternative hypotheses. That choice of ##H_0## and ##H_1## is bad. They should be swapped. The null hypothesis should be the choice which will be assumed unless the test and data prove otherwise (to a skeptical audience). Therefore, the pool should be assumed to be unsafe until the test and data prove that it is safe. The higher confidence levels, like 99.5%, should require very strong evidence that the pool is safe. That will only work if the null hypothesis is that the pool is unsafe until very strong evidence indicates that the pool is safe (the alternative hypothesis). A type 1 error would then be to conclude that the pool is safe when it is really dangerous.

EDIT: The proper choice of null and alternative hypothesis will make Type 1 errors the errors that you really want/need to avoid (swimming in an unsafe pool). Type 2 errors are more like an opportunity lost (not swimming in a pool that is actually safe).
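A hypothetical sketch of what that swapped test could look like in practice (the readings below are invented; scipy's one-sample t-test is used here only as one reasonable way to run a one-sided test):

```python
# Corrected pool test: H0 "unsafe" (mu >= 400 ppm) is rejected only if the
# measurements give strong evidence the concentration is below 400 ppm.
# The readings are made up for illustration.
import numpy as np
from scipy import stats

readings_ppm = np.array([388, 395, 379, 401, 384, 392, 387, 396, 383, 390])

# One-sided, one-sample t-test of mu = 400 against the alternative mu < 400.
t_stat, p_value = stats.ttest_1samp(readings_ppm, popmean=400, alternative="less")

alpha = 0.05
print(f"mean = {readings_ppm.mean():.1f} ppm, t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value <= alpha:
    print("Reject H0: strong evidence the concentration is below 400 ppm (declare the pool safe).")
else:
    print("Fail to reject H0: keep treating the pool as unsafe.")
```

With a stricter ##\alpha## (say 0.01), the same readings would have to pull the sample mean even further below 400 before the pool could be declared safe, which is the benefit of the doubt described above.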
 
  • #12
Agent Smith said:
Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking 1 sample ( a 1 sample statistic) or are we supposed to take multiple samples and compute their mean?
To discuss illustrative examples and learn from them, they should be realistic, since the devil is in the details. Some are not statistical per se, like your pool scenario, since it involves a single observation. How is the safe limit measured? Parts per million typically means the concentration by mass of a substance. For example, the limit for arsenic in drinking water is 10 parts per billion, or 10 µg/L. Of concern is the accuracy of measuring such a small amount of material. One way is to precipitate and weigh the soluble arsenic from the solution. Any instrument has an inherent uncertainty in its reading, often constant, which may be due to calibration. Other uncertainties can come from preparing the sample or from using the equipment, like reading a meter. If your weighing uncertainty is fixed at ±1 µg and you need to measure 10 µg/L, then you need to take a larger sample to get a more precise reading. If your sampling method has some inherent variation, then the problem becomes statistical, since any result depends on how you take the sample. Now you need to take a number of samples and average them to get a mean and determine the sampling uncertainty. This example is like measuring a length with a ruler, where you need to place and read a scale.

On the other hand, if you are determining the amount of radium in drinking water, the safe limit is 0.185 disintegrations per second per liter. Here you count the decays of radium to ensure that they do not exceed this limit within some acceptable margin. One would usually count for a length of time so that the decay rate can be determined sufficiently accurately. The standard deviation for ##N## radioactive decays is the square root of the number of decays: 0.185 dps = 666 ± 26 decays per hour. To ensure that you do not exceed the limit, your measurement plus all uncertainties should stay below the regulatory limit. For this example, your measurement ##N## plus 2 (total standard deviations) should be ##\leq 666## decays per hour, using roughly the 5% level of significance.
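As a quick numerical check of those radium figures (a sketch that treats the hourly count as Poisson; the observed count is a made-up number):

```python
# Poisson counting sketch for the radium example: at the 0.185 dps limit,
# one hour of counting corresponds to about 0.185 * 3600 = 666 expected decays,
# with a statistical standard deviation of sqrt(count) ~= 26 near the limit.
import math

limit_per_hour = 0.185 * 3600            # 666 decays/hour at the regulatory limit
counts_observed = 610                    # hypothetical one-hour measurement
sigma_stat = math.sqrt(counts_observed)  # Poisson counting uncertainty

# Decision rule described above: measurement plus ~2 standard deviations
# should stay below the limit.
upper_bound = counts_observed + 2 * sigma_stat
print(f"limit = {limit_per_hour:.0f}/h, observed = {counts_observed}, "
      f"observed + 2*sigma = {upper_bound:.0f}/h")
print("within the limit" if upper_bound <= limit_per_hour else "cannot rule out exceeding the limit")
```

Counting longer (a bigger effective sample) raises the total count ##N##, but the relative uncertainty ##\sqrt{N}/N## shrinks, which is the same sample-size effect discussed above.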

Determining the number of bacteria per volume of water would probably be more of a statistical problem.
 
  • #13
@gleem as I noted in a previous post, this is a simplified version of statistics meant to introduce someone to the basics. Some assumptions regarding the preliminary conditions needed to draw a statistical inference have been made.

I wonder how sample size affects the probability of type II errors? 🤔
 
  • #14
Regarding the Type II error and sample size, let me give it a try. Type II errors occur when H0 is wrongly not rejected; a false negative. When you compare two samples (sets of observations), you are trying to determine whether the data likely come from the same parent population. Your sample provides you with one possible value of the mean and standard deviation. You compare their mean separation to a certain number (m) of standard deviations (SD), depending on how much risk you are willing to take of being wrong. If they are closer than m SDs, you conclude that the chance you are wrong is small and that they are from the same parent distribution. The trouble starts when the difference is nearly m SD apart.

The question is how close are these estimates to the actual mean and SD of the population from which the samples were taken? Repeating the sample will provide you with different values of the mean and SD.
Any variation in the size of the SD or the mean affects the chance that the two studies are from the same parent distribution. The variation in the SD, which follows a chi-square distribution, is not a usual topic of basic statistics, but the variation of the mean, which follows a normal distribution, is. This is where the standard error (SE) of the mean comes in. It is the estimate of the standard deviation of the mean value and is related to the standard deviation of the sample by
$$ SE = \frac{SD}{\sqrt{N}}$$
where N is the number of observations in the sample. The variation of the SD is of the order of that of the SE. The larger the N, the closer the measured mean is to the true mean, and the more trustworthy the estimate of the SD becomes.

Consider the following example. A value of a certain variable is reported to be 79.1. You perform an experiment with 50 observations and get 72.4±3.8, with an SE of ±0.54. So is the reported value likely to be a member of the population described by your data? You compare the means using m·SD as a yardstick. If the difference between the two means (in this case 6.7) is less than the yardstick, then you conclude it could be a member, at least within the level of risk you have assumed. You take 2 SD (= 7.6) as your yardstick and find that the difference is less than the yardstick, so within the level of risk you have taken, the reported value and your value are consistent.

Another group repeats your experiment and gets a result of 71.8±3.6 and concludes that their results are not consistent with the reported data.

A third group wants to resolve this discrepancy and repeats the experiment with a larger sample of 200 observations (considered to have higher power). They expect that the range of variation of the mean and of the SD will decrease. Their results give a mean value of 71.9±3.4 with an SE of ±0.24. The difference from the reported value is now 7.2, compared to their yardstick of 6.8. They conclude that the reported value is not consistent with their data.

A higher number of observations (higher power) reduces the possibility of making a type II error. The power of a study is defined as the probability of not making a type II error, i.e., Power = 1 - β. Power is not a subject of introductory statistics, since there is so much more to learn first.
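For reference, a short script that just recomputes the numbers in the example above (the second group's sample size is not stated; 50 is assumed here):

```python
# Recompute the worked example: SE = SD / sqrt(N), and compare the difference
# from the reported value 79.1 against a 2*SD "yardstick".
import math

reported = 79.1
studies = [
    ("first group",  72.4, 3.8, 50),
    ("second group", 71.8, 3.6, 50),   # N assumed, not given in the post
    ("third group",  71.9, 3.4, 200),
]

for name, mean, sd, n in studies:
    se = sd / math.sqrt(n)
    diff = abs(reported - mean)
    yardstick = 2 * sd
    verdict = "consistent" if diff <= yardstick else "not consistent"
    print(f"{name}: SE = {se:.2f}, |diff| = {diff:.1f}, 2*SD = {yardstick:.1f} -> {verdict}")
```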
 
  • #15
Agent Smith said:
Here is an example question. Maybe you can use it to clarify my doubts.

A swimming pool is regularly checked for contaminants. If contaminant concentration ##\geq 400## ppm, the pool is unfit for swimming and is closed for decontamination.
The Null Hypothesis ##H_0##: The contaminant concentration ##< 400## ppm
The Alternative Hypothesis ##H_a##: The contaminant concentration ##\geq 400## ppm

I take a sample of the pool's water and measure contaminant concentration.

A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.

Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking ##1## sample ( a ##1## sample statistic) or are we supposed to take multiple samples and compute their mean? 🤔
Short comment: there should never be any statement of equality in your alternative hypothesis, only strict inequality.
 
  • Like
Likes Agent Smith
  • #16
Agent Smith said:
Here is an example question. Maybe you can use it to clarify my doubts.

A swimming pool is regularly checked for contaminants. If contaminant concentration ##\geq 400## ppm, the pool is unfit for swimming and is closed for decontamination.
The Null Hypothesis ##H_0##: The contaminant concentration ##< 400## ppm
The Alternative Hypothesis ##H_a##: The contaminant concentration ##\geq 400## ppm

I take a sample of the pool's water and measure contaminant concentration.

A type II error would mean ##H_0## is false, but I fail to reject ##H_0##, a dangerous situation because the pool water is not fit for swimming but I let people swim in the pool. So we have to reduce the risk of a type II error. I can do that by raising the significance level ##\alpha##.

Also, my notes say that I can reduce the risk of a type II error by increasing my sample size. What does this mean in this context? I take a larger volume of pool water for my sample? Are we counting the number of particles (since we're measuring contaminants as ppm) and hence volume is a surrogate for number of particles in my sample? Are we taking ##1## sample ( a ##1## sample statistic) or are we supposed to take multiple samples and compute their mean? 🤔
"A type II error would mean (the null hypothesis) is false"

No, it doesn't. Just as we never talk about "accepting" either hypothesis, remember that

- rejecting the null does not prove the null value is false, and does not show the alternative value is true
- failing to reject the null, does not prove the null value is true and the alternative value is false

The results of classical testing should be taken as INDICATORS, given the data at hand, of which value is more likely to hold. It's always dangerous to make decisions based only on the often misinterpreted p-value: some measure of variation of the response should be discussed as well. In classical methods this is most often a standard error, or better, a confidence interval (confidence intervals and tests are inverse procedures and always give the same indication; the benefit of an estimate is the included display of variability).
 
  • #17
@statdad So how should I word it?

The null hypothesis is either rejectable or not rejectable.

A type II statistical error occurs when the null is rejectable but is not rejected?

The lessons I took do stress the conclusion we can draw from ##\text{P-value} > \alpha##. We do not accept the null, we simply fail to reject it. It's a very complicated concept. Can you refer me to a source I can consult? Gracias.
 
  • #18
Agent Smith said:
A type II statistical error occurs when the null is rejectable but is not rejected?
correct
 
  • #20
Agent Smith said:
@statdad So how should I word it?

The null hypothesis is either rejectable or not rejectable.

A type II statistical error occurs when the null is rejectable but is not rejected?

The lessons I took do stress the conclusion we can draw from ##\text{P-value} > \alpha##. We do not accept the null, we simply fail to reject it. It's a very complicated concept. Can you refer me to a source I can consult? Gracias.
We completely avoid saying that we have shown anything to be true. Instead we say we have failed to show strong evidence that the null hypothesis is false.
 
  • #21
@gleem
As far as I know, Type I Error is rejecting a true null hypothesis and Type II Error is failing to reject a false null.
 
  • #22
Agent Smith said:
@gleem
As far as I know, Type I Error is rejecting a true null hypothesis and Type II Error is failing to reject a false null.
No, we never conclude that the null hypothesis is true. That's why we have all that weird "failed to reject" dialogue.
 
  • #23
Hornbein said:
No, we never conclude that the null hypothesis is true. That's why we have all that weird "failed to reject" dialogue.
So, a null hypothesis is neither true nor false.
We can either reject it or fail to reject it 🤔

What exactly is the terminology we use here?

We're proffering ##2## hypotheses: The Null ##H_0## and The Alternative ##H_a##.

Then we compute the P-value. If ##\text{P-value} \leq \alpha##, we reject ##H_0## and accept ##H_a##. If ##\text{P-value} > \alpha## we fail to reject ##H_0##

Perhaps I should've said ...
Type I Error: Rejecting ##H_0## when ##H_0## shouldn't be rejected.
Type II Error: Failing to reject ##H_0## when ##H_0## should be rejected.
 
  • #24
Hornbein said:
No, we never conclude that the null hypothesis is true. That's why we have all that weird "failed to reject" dialogue.
So, a null hypothesis is neither true nor false.
We can either reject it or fail to reject it 🤔

What exactly is the terminology we use here?

We're proffering ##2## hypotheses: The Null ##H_0## and The Alternative ##H_a##.

Then we compute the P-value. If ##\text{P-value} \leq \alpha##, we reject ##H_0## and accept ##H_a##. If ##\text{P-value} > \alpha## we fail to reject ##H_0##

Perhaps I should've said
Type I Error: Rejecting ##H_0## (##\text{P-value} \leq \alpha##) when ##H_0## shouldn't be rejected.
Type II Error: Failing to reject ##H_0## (##\text{P-value} > \alpha##) when ##H_0## should be rejected.
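A minimal sketch of that decision rule (the p-values are arbitrary illustrative numbers; whether a decision is an error depends on the unknown truth of ##H_0##):

```python
# Decision rule: reject H0 when the p-value is at or below alpha.
def decide(p_value, alpha=0.05):
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))  # "reject H0" -- a Type I error only if H0 is actually true
print(decide(0.20))  # "fail to reject H0" -- a Type II error only if H0 is actually false
```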
 
  • #25
The important take-away is that a Type I Error is something that you really do not want to make. It would be dangerous, embarrassing, or otherwise bad. The null hypothesis is given every benefit of the doubt and will not be rejected unless you have convincing evidence that it is unlikely. You assume that the null hypothesis is true by using its probability parameter values and setting a high (95%, 97.5%, etc.) standard for rejecting it. In extreme cases, like ##H_1 =## claiming a new particle in particle physics versus ##H_0 =## not enough evidence of a new particle, the evidence must be ##5\sigma## to reject the null hypothesis.

A Type II Error is just an opportunity lost or delayed. It is less consequential and means that you just need more evidence if you want to reject the null hypothesis.
 
  • #26
FactChecker said:
A Type II Error is just an opportunity lost or delayed. It is less consequential and means that you just need more evidence if you want to reject the null hypothesis.
Less consequential? In a drug trial, you would lose an opportunity to advance a new treatment. How do you know you made a type II error? Yes it certainly would be a lost opportunity.

For a type I error you would waste time/resources but eventually discover your error.
 
  • #27
Those are good points, but as a consumer I look at it from the other direction. IMO, you have it reversed.

gleem said:
Less consequential? In a drug trial, you would lose an opportunity to advance a new treatment. How do you know you made a type II error? Yes it certainly would be a lost opportunity.
For a Type II error, a drug company can collect more data to reach the required significance of the result to show that it is safe and effective. If a drug company is not willing to do that, then it might not be so important. Or the drug may not be good enough.
gleem said:
For a type I error you would waste time/resources but eventually discover your error.
For a drug company, with Type I error the drug may kill people.
 
  • #28
FactChecker said:
For a drug company, with Type I error the drug may kill people
Drugs are usually tested for toxicity first. If you make a type I error, you reject the drug. If it passes, it goes to further studies to determine dose and effectiveness. If you make a type I error there and wrongly reject the null hypothesis, you are most likely seeing a marginal benefit, so why market it?
FactChecker said:
For a Type II error, a drug company can collect more data to reach the required significance of the result to show that it is safe and effective. If a drug company is not willing to do that, then it might not be so important. Or the drug may not be good enough.
You do not know that you made an error; that's why it is an error. This error might be more dangerous, since it could let a dangerous drug through, albeit at a low level, but this is not likely, since one serious reaction should prompt a reevaluation of the study.

Whoever is involved in a study must understand the consequences of not making the correct assumptions about the risk that they are willing to take if they are wrong.
 
  • #29
gleem said:
Drugs are usually tested for toxicity first. If you make a type I error you reject the drug.
That sounds backward. Type I Error: Reject the null hypothesis erroneously.
If the null hypothesis is that the drug is nontoxic, then it will be assumed to be nontoxic unless there is strong evidence at 95% or higher that it is toxic. That makes it easy for a toxic drug to remain assumed to be nontoxic. And a stricter test level like 99% would make it even easier for a toxic drug to pass the test.

Wouldn't you want it the other way? The sample should show at a 95% confidence level that the drug is not toxic for it to pass the test. And the stricter the test level is the more proof would be required for the drug to be assumed nontoxic.
 
  • #30
FactChecker said:
That sounds backward. Type I Error: Reject the null hypothesis erroneously.
The H0 is that the drug is not toxic. If the drug is truly not toxic and you reject H0, there is no direct harm; that is a type I error. If the drug is toxic and you accept H0, then you make a type II error.
 
  • #31
gleem said:
The H0 is that the drug is not toxic. If the drug is truly not toxic and you reject H0, there is no direct harm; that is a type I error. If the drug is toxic and you accept H0, then you make a type II error.
Suppose the null hypothesis is that the drug is not toxic.
Requiring a test significance level of 0.05 (95%) makes it hard to reject the assumption that the drug is not toxic. A significance level of 0.01 (99%) requires much more substantial evidence that the drug is toxic before that assumption is rejected.
That would allow a great many toxic drugs to be released on the public.
That is the opposite of what you want.
The test should require substantial evidence that the drug is not toxic; it should not assume the drug is not toxic unless that is convincingly demonstrated.
 
  • #32
It's possible to have a null hypothesis that a drug is toxic.
 
  • #34
Hornbein said:
It's possible to have a null hypothesis that s drug is toxic.
Yes, that is how it is (always?) done. For instance, the expiration date on a drug is set by the length of time that it has been proven by tests to still be safe and effective. It may be effective much longer, but the tests are expensive and are stopped after a reasonable time. So they have not been proven to be effective past that date. The drugs are assumed (null hypothesis) to be expired past that date.

EDIT: I think of the null hypothesis as the assumption that will be made if no testing is done. A drug will not be approved for use without any testing to prove it is safe; it will be assumed to be dangerous. A new nuclear particle will not be assumed to exist without any testing to prove it; it will be assumed to not exist.
 
  • #35
Is there any difference in assuming something is or isn't safe? If you compare it to a safe population, the process is the same. In one case you look for evidence that it can be a member of the safe population and therefore is assumed to be safe, and in the other you look for evidence that it is not a member of the safe population and therefore is assumed to be unsafe.
 
  • #36
gleem said:
Is there any difference in assuming something is or isn't safe? If you compare it to a safe population, the process is the same. In one case you look for evidence that it can be a member of the safe population and therefore is assumed to be safe, and in the other you look for evidence that it is not a member of the safe population and therefore is assumed to be unsafe.
Good question. There is a big difference, because you are giving the null hypothesis every advantage. You start by picking one hypothesis as the null hypothesis, giving it all the benefit of the doubt by using its distribution and parameters and saying that you will only change that assumption if there are strong test indications (over 95%, 99%, etc.) that it might be wrong.
In the case of testing the safety and effectiveness of a drug, they should assume that it is not safe or not effective and run tests that would convince even a skeptical audience that it is safe and effective. The burden of proof must be on the drug company to prove its drug is probably (95%, 99%, etc.) safe and effective. Otherwise, many unsafe and/or ineffective drugs would pass a minimal test and be approved for public use.
 
  • #37
I don't think the procedure we are talking about is quite what drug companies use: the endpoint of toxicity studies is not death, or estimates of death, but instead some noticeable change in a physiological characteristic that could be detrimental if excessive, like anemia, constipation, vomiting, or reduced liver or kidney function. Typically they will start at a dose believed to have no untoward effects and increase the dose until side effects occur. They decide on what they think are acceptable side effects at an effective dose. If they do a comparison between equivalent populations of those who take the drug and those who don't, at a reasonable confidence level, and see no difference, then what? There are side effects, some serious, but there is a benefit from taking the drug; sometimes the risk can result in death. The patient must make a choice under guidance from their physician.
 
  • #38
Good point. The actual decision process is complicated. My point is that, in the simplest terms, the drug company has the burden of proof to convince a skeptical audience that their drug has a net benefit. They can only do that if the original assumption, the null hypothesis, is that the drug is not beneficial, and then show that the data results are strong enough to convince the skeptics otherwise.

On the other hand, if they start by assuming that the drug is beneficial and then set a very high standard (95%, 99%, etc.) to statistically indicate otherwise, they will not convince anyone.
 
  • #39
Would I be correct to say that ##H_0## = The drug is not beneficial?
 
  • #40
Agent Smith said:
Would I be correct to say that ##H_0## = The drug is not beneficial?
Yes. The null hypothesis, ##H_0##, is the option that you are willing to assume if there is no data, or inadequate data, to prove the alternative hypothesis, ##H_1##, because that is what you will statistically recommend in those cases. There is no level of confidence required for the null hypothesis.

In the case of a drug presented with no testing, you would not recommend its use for the general public without further testing. Instead, you would say that the drug company has the burden of proof and must conduct enough testing to indicate at a level of 95% (or higher) that it is safe and effective. That is, they must statistically support the alternative hypothesis, ##H_1## = "the drug is safe", at some level of confidence.
 
  • #41
FactChecker said:
Good point. The actual decision process is complicated. My point is that, in the simplest terms, the drug company has the burden of proof to convince a skeptical audience that their drug has a net benefit. They can only do that if the original assumption, the null hypotheses, is that the drug is not beneficial and then show that the data results are strong enough to convince the skeptics otherwise.

On the other hand, if they start by assuming that the drug is beneficial and then set a very high standard (95%, 99%, etc.) to statistically indicate otherwise, they will not convince anyone.
The null hypothesis contains an equality statement for a given population parameter (high school level statistics)
 
  • #42
Agent Smith said:
So, a null hypothesis is neither true nor false.
We can either reject it or fail to reject it 🤔

What exactly is the terminology we use here.

We're proffering ##2## hypotheses: The Null ##H_0## and The Alternative ##H_a##.

Then we compute the P-value. If ##\text{P-value} \leq \alpha##, we reject ##H_0## and accept ##H_a##. If ##\text{P-value} > \alpha## we fail to reject ##H_0##

Perhaps I should've said
Type I Error: Rejecting ##H_0## (##\text{P-value} \leq \alpha##) when ##H_0## shouldn't be rejected.
Type II Error: Failing to reject ##H_0## (##\text{P-value} > \alpha##) when ##H_0## should be rejected.

Imagine you want to test H0: mu = 98.6F versus Ha: mu < 98.6F

where the quantity we're concerned with is body temp of a "normal healthy" adult.

Concerning your "neither true nor false" question, think about it this way: are we trying to say that the mean temperature is EXACTLY 98.6 degrees F? Certainly not -- we're saying that the true mean temp is close enough to that value to make it a very usable description, so is H0 true? Not with that interpretation. Is H0 false? Not in the sense that we want to know if 98.6 is a good usable reference value. The purpose of this hypothesis test is this: determining whether the true mean temp is close enough to 98.6 that we can continue to use it or whether it is enough smaller than 98.6 that we need to move on to a new value. Remember that hypothesis testing is ALWAYS about examining the evidence against H0, not the evidence in its favor.

The possible decisions of the test are:
- Reject H0: the data indicate that the true mean is noticeably smaller than 98.6, enough so that Ha makes more sense than H0
- Do not reject H0: the data do not give convincing evidence that the true mean is smaller than 98.6. We might have a sample mean of 98.48, but after sample size and sample standard deviation are taken into account we decide that is not far enough below 98.6 to convince us Ha makes more sense than H0

Why not say "Accept H0" in the second case? Because the word "Accept" indicates we've proven H0 to be true.
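A sketch of how that test might actually be run (the temperature readings are invented so that their mean is about 98.48, but with enough spread that at ##\alpha = 0.05## the test fails to reject, matching the scenario above):

```python
# One-sided, one-sample t-test of H0: mu = 98.6 against Ha: mu < 98.6,
# on made-up temperature readings.
import numpy as np
from scipy import stats

temps_f = np.array([97.9, 99.2, 98.1, 99.0, 97.6, 98.8, 98.4, 97.8,
                    99.1, 98.4, 98.7, 99.0, 98.3, 98.6, 98.3])

t_stat, p_value = stats.ttest_1samp(temps_f, popmean=98.6, alternative="less")
print(f"sample mean = {temps_f.mean():.2f}, t = {t_stat:.2f}, p = {p_value:.3f}")

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: the data point to a true mean below 98.6.")
else:
    print("Fail to reject H0: no convincing evidence the true mean is below 98.6.")
```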

A final comment: imagine how the US justice system is supposed to work -- the philosophy of it. Please don't comment on views of how people believe it actually works: I won't respond and it isn't appropriate.

To avoid language awkwardness I'll assume the verdict comes from a judge instead of a jury.

There are two possibilities about the defendant: the defendant is Guilty (DG), or the defendant is Innocent (DI).
One tenet of the US judicial system is to assume DI. The goal of the prosecutor is to convince the judge that DG is the correct conclusion: in other words, the prosecutor has to convince the judge to reject the assumption of DI.

There are two possible verdicts: Guilty, G, or Not Guilty, NG. (Note that there is no such thing as a verdict of "Innocent".) Again, as noted above, to obtain a verdict of G the prosecutor has to convince the judge, beyond a reasonable doubt, to drop the assumption that the defendant is innocent. The prosecutor's job is not to present evidence showing the defendant is innocent; it's to present evidence of the defendant's guilt.

There are four possibilities for what can happen at the end of the trial:

The verdict is G and the defendant is Guilty: thus G and DG is a correct outcome
The verdict is G and the defendant is Innocent: thus G and DI is an incorrect outcome. Since this corresponds to the prosecutor convincing the judge to reject a basic assumption when that assumption should not be rejected, this can be considered a Type I Error
The verdict is NG and the defendant is innocent: here NG and DI is a correct outcome
The verdict is NG and the defendant is guilty: here NG and DG can be considered a Type II error: the judge should have rejected the assumption about the defendant but did not

In hypothesis testing:
- the null hypothesis corresponds to DI
- the alternative hypothesis corresponds to DG
- rejecting H0 corresponds to a verdict of guilty
- failing to reject H0 corresponds to a verdict of not guilty
 
  • #43
statdad said:
There are two possibilities about the defendant: the defendant is Guilty (DG), or the defendant is Innocent (DI).
One tenet of the US judicial system is to assume DI. The goal of the prosecutor is to convince the judge that DG is the correct conclusion: in other words, the prosecutor has to convince the judge to reject the assumption of DI.
That's a good analogy. Common hypothesis testing is set up to require that the alternative hypothesis will only be accepted if it is proven "beyond a reasonable doubt" to some level (95%, 99%, 5 sigma, etc.).
So the first thing to ask is: What hypothesis really requires proof?
That should be the alternative hypothesis.
 
  • #44
FactChecker said:
Good point. The actual decision process is complicated. My point is that, in the simplest terms, the drug company has the burden of proof to convince a skeptical audience that their drug has a net benefit. They can only do that if the original assumption, the null hypotheses, is that the drug is not beneficial and then show that the data results are strong enough to convince the skeptics otherwise.

On the other hand, if they start by assuming that the drug is beneficial and then set a very high standard (95%, 99%, etc.) to statistically indicate otherwise, they will not convince anyone.
The null hypothesis contains an equality statement for a given population parameter.
 
