Minimize the sum of Type I and Type II errors

Summary:
The discussion revolves around determining the rejection region R that minimizes the sum of the Type I and Type II error probabilities in a hypothesis test where the null hypothesis H0 states that the mean μ equals a value μ0 and the alternative hypothesis H1 states that μ equals a larger value μ1 > μ0. Participants express confusion about which parameters are known versus unknown and about how to compute the error probabilities without standard tables. The conversation emphasizes deriving expressions for the Type I and Type II errors from the cumulative distribution function (CDF) of the normal distribution, and ultimately finding the cutoff value α that defines the rejection region by minimizing the combined error probability.
GabrielN00

Homework Statement


Given ##X_1,\dots,X_n##, a simple random sample of normal ##(\mu, \sigma^2)## variables. We assume ##\mu## is known but ##\sigma^2## is unknown.

The hypotheses are
$$\begin{cases} H_0: & \mu=\mu_0 \\ H_1: & \mu=\mu_1 > \mu_0 \end{cases}$$

Determine the rejection region ##R## in order to minimize ##P_{H_0}(R)+P_{H_1}(R^c)##.

Homework Equations



The Attempt at a Solution

I'm having problems both to understand the rejection regions and to find the minimum of the sum.

The "plan" would be to consider z=\displaystyle\frac{\bar{X}-\mu}{(s/\sqrt{n})}

I could proceed to do a one-tail test and find the minimum, but the very first problem is that my \alpha value is unknown, so I cannot look it up in a table.

I'm clueless at even how to get a usable expression for each type error, since everything I am able to find suggest the use of a table, but the problem clearly doesn't make use of one.
 
I assume that α is the lower value of the region R. In that case, the problem is to determine α as a function of μ0 and μ1 (and the sample variance) by minimizing the sum of the probabilities. That is a calculus problem that requires you to use the equations of the CDF rather than a table.

The CDF of a normal distribution is known. See the CDF equation in https://en.wikipedia.org/wiki/Normal_distribution and the erf function in https://en.wikipedia.org/wiki/Error_function
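
As a concrete illustration of that point (not part of the original reply; the numbers for ##\mu_0##, ##\mu_1##, ##\sigma##, ##n## and the cutoff are made up, and ##\sigma## is treated as known), both error probabilities come straight out of the normal CDF with no table:

Python:
# Hypothetical illustration: both error probabilities straight from the normal CDF,
# no table lookup.  mu0, mu1, sigma, n and alpha are made-up values.
from math import sqrt
from scipy.stats import norm

mu0, mu1, sigma, n = 0.0, 1.0, 2.0, 25
se = sigma / sqrt(n)        # standard deviation of the sample mean
alpha = 0.6                 # candidate lower edge of the rejection region

type1 = 1 - norm.cdf(alpha, loc=mu0, scale=se)   # P(reject H0 | H0): Xbar > alpha
type2 = norm.cdf(alpha, loc=mu1, scale=se)       # P(accept H0 | H1): Xbar <= alpha
print(type1, type2, type1 + type2)

Minimizing type1 + type2 over alpha is then the calculus (or numerical) problem described above.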
 
GabrielN00 said:

Homework Statement


Given ##X_1,\dots,X_n##, a simple random sample of normal ##(\mu, \sigma^2)## variables. We assume ##\mu## is known but ##\sigma^2## is unknown.

The hypotheses are
$$\begin{cases} H_0: & \mu=\mu_0 \\ H_1: & \mu=\mu_1 > \mu_0 \end{cases}$$

Determine the rejection region ##R## in order to minimize ##P_{H_0}(R)+P_{H_1}(R^c)##.

Homework Equations



The Attempt at a Solution

I'm having problems both to understand the rejection regions and to find the minimum of the sum.

The "plan" would be to consider z=\displaystyle\frac{\bar{X}-\mu}{(s/\sqrt{n})}

I could proceed to do a one-tail test and find the minimum, but the very first problem is that my \alpha value is unknown, so I cannot look it up in a table.

I'm clueless at even how to get a usable expression for each type error, since everything I am able to find suggest the use of a table, but the problem clearly doesn't make use of one.

The question makes no sense. You say that ##\mu## is known, and then you say the hypotheses involve ##\mu##!

It makes sense to test hypotheses about ##\sigma## when ##\mu## is known, or to test hypotheses about ##\mu## when ##\sigma## is known (or even to test hypotheses about ##\mu## or ##\sigma## when neither of these is known).
 
FactChecker said:
I assume that α is the lower value of the region R. In that case, the problem is to determine α as a function of μ0 and μ1 (and the sample variance) by minimizing the sum of the probabilities. That is a calculus problem that requires you to use the equations of the CDF rather than a table.

The CDF of a normal distribution is known. See the CDF equation in https://en.wikipedia.org/wiki/Normal_distribution and the erf function in https://en.wikipedia.org/wiki/Error_function

I'm still not sure what ##\alpha## is, but do you actually need the CDF here in closed form (or anything close to it)?

Typically the way these problems are set up is that you have a function you want to minimize: the usual recipe is to differentiate once and set the result equal to zero. Once you differentiate, you can use the simple pdf of the Gaussian, of course, and so you only ever work abstractly with the CDF of the Gaussian, denoted by "CDF" or ##\Phi## or something like that.
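
(To make that recipe explicit — this identity is not spelled out in the post, but it is the only calculus fact needed:)
$$\frac{d}{d\alpha}\,\Phi\!\left(\frac{\alpha-\mu}{\sigma}\right)=\frac{1}{\sigma}\,\phi\!\left(\frac{\alpha-\mu}{\sigma}\right)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(\alpha-\mu)^2/(2\sigma^2)},$$
so differentiating the sum of error probabilities only ever produces Gaussian densities evaluated at the cutoff; the CDF itself never has to be written out.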
 
Ray Vickson said:
The question makes no sense. You say that ##\mu## is known, and then you say the hypotheses involve ##\mu##!

It makes sense to test hypotheses about ##\sigma## when ##\mu## is known, or to test hypotheses about ##\mu## when ##\sigma## is known (or even to test hypotheses about ##\mu## or ##\sigma## when neither of these is known).
That's what the problem says, but given what you point out I think it might have been a typo: it should probably say that ##\mu## is unknown while ##\sigma## is known.
I'm not sure how I can edit the main post. I don't see an edit button.

StoneTemplePython said:
I'm still not sure what ##\alpha## is, but do you actually need the CDF here in closed form (or anything close to it)?

Typically the way these problems are set up is that you have a function you want to minimize: the usual recipe is to differentiate once and set the result equal to zero. Once you differentiate, you can use the simple pdf of the Gaussian, of course, and so you only ever work abstractly with the CDF of the Gaussian, denoted by "CDF" or ##\Phi## or something like that.

##\alpha## is the critical value, the value we cross when we enter the rejection region.

Alright, let's consider the following: using the error function above, for mean ##0## and standard deviation ##\sigma## I have ##\operatorname{erf}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right)=\frac{2}{\sqrt{\pi}}\int_0^{\alpha/(\sigma\sqrt{2})} e^{-t^2}\,dt##.

This gives the probability of falling in ##(-\alpha,\alpha)##, but I am interested in the rejection region, which is ##(-\infty, -\alpha)\cup(\alpha, +\infty)##. Therefore, I think I should consider the complementary error function ##\operatorname{erfc}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right) = 1-\frac{2}{\sqrt{\pi}}\int_0^{\alpha/(\sigma\sqrt{2})} e^{-t^2}\,dt = \frac{2}{\sqrt{\pi}}\int_{\alpha/(\sigma\sqrt{2})}^{\infty} e^{-t^2}\,dt##.

Now I could differentiate and get ##\frac{d}{d\alpha}\,\operatorname{erfc}\!\left(\frac{\alpha}{\sigma\sqrt{2}}\right) = -\frac{2}{\sigma\sqrt{2\pi}}\,e^{-\alpha^2/(2\sigma^2)}##. I should set it to ##0## and find ##\alpha##, to "solve" the problem.

There are three issues here:
(1) ##e^{-\alpha^2/(2\sigma^2)}## will never be zero for any ##\alpha##.
(2) I haven't actually brought the hypothesis test into this.
(3) It is not clear what the ##\sigma## in the error function is. The Wikipedia entry linked above says the error generally has mean zero, but it can still have any variance. Is the ##\sigma## in the normal distribution the very same ##\sigma## as in the error function?
 
StoneTemplePython said:
I'm still not sure what ##\alpha## is, but do you actually need the CDF here in closed form (or anything close to it)?

Typically the way these problems are set up is that you have a function you want to minimize: the usual recipe is to differentiate once and set the result equal to zero. Once you differentiate, you can use the simple pdf of the Gaussian, of course, and so you only ever work abstractly with the CDF of the Gaussian, denoted by "CDF" or ##\Phi## or something like that.
Oh! Good point!
 
StoneTemplePython said:
I'm still not sure what ##\alpha## is, but do you actually need the CDF here in closed form (or anything close to it)?

Typically the way these problems are set up is that you have a function you want to minimize: the usual recipe is to differentiate once and set the result equal to zero. Once you differentiate, you can use the simple pdf of the Gaussian, of course, and so you only ever work abstractly with the CDF of the Gaussian, denoted by "CDF" or ##\Phi## or something like that.

I can't really see how. I guess you mean ##f_X=\frac{1}{\sigma\sqrt{2\pi}e^{-\frac{(x-\mu)^2}{2\sigma^2}}} ## but how can it be used to find the minimal sum of the errors?

Working solely with the error functions I thought I should consider ##erf(\alpha) ## to calculate ##P_{H_0}(R)## and ##erfc(\alpha)## to calculate ##P_{H_1}(R)##.
 
GabrielN00 said:
I can't really see how. I guess you mean ##f_X=\frac{1}{\sigma\sqrt{2\pi}e^{-\frac{(x-\mu)^2}{2\sigma^2}}} ## but how can it be used to find the minimal sum of the errors?

Working solely with the error functions I thought I should consider ##erf(\alpha) ## to calculate ##P_{H_0}(R)## and ##erfc(\alpha)## to calculate ##P_{H_1}(R)##.

No, the correct form is
$$f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-(x-\mu)^2/(2 \sigma^2) }$$.
The density function of the standard normal (mean = 0, s.d.= 1) is usually denoted by ##\phi## and its cumulative distribution by ##\Phi##:
$$\phi(t) = \frac{1}{\sqrt{2 \pi}} e^{-t^2/2}, \;\; \Phi(z) = \int_{-\infty}^z \phi(t) \, dt. $$
The relationship between ##\Phi## and ##\text{erf}## is
$$\Phi(z) =\frac{1}{2} + \frac{1}{2} \text{erf} \left( \frac{z}{\sqrt{2}} \right), $$
provided that your definition of "erf" is ##\text{erf}(z) = (2/\sqrt{\pi}) \int_0^z e^{-t^2} \, dt##.

Anyway, you want to test a value of ##\mu_0## (H0) against a larger value ##\mu_1## (H1), so you will accept the null hypothesis provided that the sample mean ##\bar{X}## is not too large. So you accept H0 if ##\bar{X} \leq \alpha## and reject H0 if ##\bar{X} > \alpha##. The type I error is ##E_1 = P(\bar{X} > \alpha | \mu = \mu_0)##, and you can work this out in terms of ##\Phi## (or erfc), ##\alpha##, ##\mu_0## and ##\sigma##. The type II error is ##E_2 = P(\bar{X} \leq \alpha | \mu = \mu_1)##, and you can work this out in terms of ##\Phi##, ##\alpha##, ##\mu_1## and ##\sigma##. Altogether, you get ##E_1+E_2 = G(\alpha)## for some function ##G## that you can write out in terms of ##\Phi## or erfc. Then, as usual, you look for a solution of ##G'(\alpha) = 0## in your search for a minimum.
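
Written out explicitly (this display is not in the original post; ##\sigma## here stands for the standard deviation of ##\bar{X}##, taken as known):
$$G(\alpha) = \underbrace{1-\Phi\!\left(\frac{\alpha-\mu_0}{\sigma}\right)}_{E_1} + \underbrace{\Phi\!\left(\frac{\alpha-\mu_1}{\sigma}\right)}_{E_2},
\qquad
G'(\alpha)=\frac{1}{\sigma}\left[\phi\!\left(\frac{\alpha-\mu_1}{\sigma}\right)-\phi\!\left(\frac{\alpha-\mu_0}{\sigma}\right)\right]=0.$$
Setting the two densities equal is exactly the step carried out in the posts below.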
 
Ray Vickson said:
Anyway, you want to test a value of ##\mu_0## (H0) against a larger value ##\mu_1## (H1), so you will accept the null hypothesis provided that the sample mean ##\bar{X}## is not too large. So you accept H0 if ##\bar{X} \leq \alpha## and reject H0 if ##\bar{X} > \alpha##. The type I error is ##E_1 = P(\bar{X} > \alpha | \mu = \mu_0)##, and you can work this out in terms of ##\Phi## (or erfc), ##\alpha##, ##\mu_0## and ##\sigma##. The type II error is ##E_2 = P(\bar{X} \leq \alpha | \mu = \mu_1)##, and you can work this out in terms of ##\Phi##, ##\alpha##, ##\mu_1## and ##\sigma##. Altogether, you get ##E_1+E_2 = G(\alpha)## for some function ##G## that you can write out in terms of ##\Phi## or erfc. Then, as usual, you look for a solution of ##G'(\alpha) = 0## in your search for a minimum.

Thank you. In regard to this last part, I am not entirely sure how to work out ##P(\bar{X} > \alpha | \mu = \mu_0)##. Normally I'd proceed as ##P(\bar{X} > \alpha | \mu = \mu_0)=1-P(\bar{X} \leq \alpha | \mu = \mu_0)##. But shouldn't computing the conditional probability involve some joint distribution function?
 
  • #10
GabrielN00 said:
Thank you. In regard to this last part, I am not entirely sure how to work out ##P(\bar{X} > \alpha | \mu = \mu_0)##. Normally I'd proceed as ##P(\bar{X} > \alpha | \mu = \mu_0)=1-P(\bar{X} \leq \alpha | \mu = \mu_0)##. But shouldn't computing the conditional probability involve some joint distribution function?

No. The conditional probability ##P(A|\mu=\mu_0)## assumes that ##\mu = \mu_0## and so uses the distribution ##\text{Normal}(\mu_0, \sigma)##, with both mean and variance known. No joint distributions are involved.
 
  • #11
Ray Vickson said:
No. The conditional probability ##P(A|\mu=\mu_0)## assumes that ##\mu = \mu_0## and so uses the distribution ##\text{Normal}(\mu_0, \sigma)##, with both mean and variance known. No joint distributions are involved.

Would it be right to say ## P(X\leq \alpha | \mu=\mu_0) = \frac{f_X(\alpha)}{Normal(\mu_0,\sigma)}=\frac{[2/(\sigma \sqrt{2\pi})]e^{-\frac{-(x-\mu)}{2\pi^2}}}{(1/\sqrt{2\pi})e^{-\alpha^2/2}} ## ?
 
  • #12
GabrielN00 said:
Would it be right to say ## P(X\leq \alpha | \mu=\mu_0) = \frac{f_X(\alpha)}{Normal(\mu_0,\sigma)}=\frac{[2/(\sigma \sqrt{2\pi})]e^{-\frac{-(x-\mu)}{2\pi^2}}}{(1/\sqrt{2\pi})e^{-\alpha^2/2}} ## ?
No. That equation still has x in it. The x values must be integrated over (-∞,α). And you should not divide it by anything. And the μ in the numerator should be μ0. (Those are the mistakes that immediately jump out at me. There may be more.)
 
  • #13
GabrielN00 said:
Would it be right to say ## P(X\leq \alpha | \mu=\mu_0) = \frac{f_X(\alpha)}{Normal(\mu_0,\sigma)}=\frac{[2/(\sigma \sqrt{2\pi})]e^{-\frac{-(x-\mu)}{2\pi^2}}}{(1/\sqrt{2\pi})e^{-\alpha^2/2}} ## ?

No. In probability we define ##P(A|B) = P(A\, \& \, B)/P(B)##, so if we know ##P(A\, \& \, B) ## and ##P(B)## we can compute ##P(A|B)##. However, that is not the usual way we deal with conditional probabilities. Most often we know ##P(A|B)## directly. If we also happen to know ##P(B)## then we could calculate ##P(A\, \& \, B)##.

In this problem we know how to compute ##P(\bar{X} \leq \alpha|\mu = \mu_0)## directly because---as I already stated very clearly---we use ##N(\mu_0,\sigma)## with both mean and variance known.
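
As a one-line sanity check of that point (illustrative numbers only, not from the problem):

Python:
# P(Xbar <= alpha | mu = mu0) is just the CDF of Normal(mu0, sigma) evaluated at alpha;
# no joint distribution is involved.  mu0, sigma and alpha are made-up values.
from scipy.stats import norm

mu0, sigma, alpha = 0.0, 1.0, 0.5
print(norm.cdf(alpha, loc=mu0, scale=sigma))   # about 0.69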
 
  • #14
FactChecker said:
No. That equation still has x in it. The x values must be integrated over (-∞,α). And you should not divide it by anything. And the μ in the numerator should be μ0. (Those are the mistakes that immediately jump out at me. There may be more.)

Thank you. I'm very sorry this is taking so long, but thank you again for answering my messages.

Maybe it goes like this?

##P_{H_0}(R)+P_{H_1}(R^c)=E_1(\alpha)+E_2(\alpha)=P(X>\alpha | \mu=\mu_1)+ P(X\leq\alpha | \mu=\mu_0) =1 - P(X\leq\alpha | \mu=\mu_1)+P(X\leq\alpha | \mu=\mu_0) = 1 - \int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma^2}}\,dx + \int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_0)^2}{2\sigma^2}}\,dx ##.

If I differentiate the integrals and set it to zero, the remaining equation is ## \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_0)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) - \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_1)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) = 0##

Then the equation to solve is ##e^{-\frac{(\alpha-\mu_0)^2}{2\sigma^2}} = e^{-\frac{(\alpha-\mu_1)^2}{2\sigma^2}} ## which happens only when ## \frac{(\alpha-\mu_0)^2}{2\sigma^2} = \frac{(\alpha-\mu_1)^2}{2\sigma^2} ##.

Then ##\alpha^2-2\alpha\mu_0+\mu_0^2 = \alpha^2-2\alpha\mu_1+\mu_1^2##, and it follows that ##2\alpha(\mu_1 -\mu_0)=\mu_1^2-\mu_0^2##.

Then both the Type I and Type II errors are minimized when ##\alpha = \frac{\mu_1^2-\mu_0^2}{2(\mu_1-\mu_0)}##.
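
A quick numerical check of that closed form (a sketch only; the values of ##\mu_0##, ##\mu_1## and ##\sigma## below are made up, while the thread's answer stays symbolic):

Python:
# Compare the closed-form cutoff with a brute-force minimizer of E1 + E2.
# All numeric values are illustrative only.
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 1.0, 3.0, 1.5

def total_error(a):
    return (1 - norm.cdf(a, loc=mu0, scale=sigma)) + norm.cdf(a, loc=mu1, scale=sigma)

alpha_star = (mu1**2 - mu0**2) / (2 * (mu1 - mu0))   # closed form from above
grid = np.linspace(mu0, mu1, 20001)
alpha_grid = grid[np.argmin(total_error(grid))]      # brute-force minimizer

print(alpha_star, alpha_grid)   # both come out at 2.0, the midpoint of mu0 and mu1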
 
  • #15
GabrielN00 said:
Thank you. I'm very sorry this is taking so long, but thank you again for answering my messages.

Maybe it goes like this?

##P_{H_0}(R)+P_{H_1}(R^c)=E_1(\alpha)+E_2(\alpha)=P(X\leq\alpha | \mu=\mu_0)+P(X>\alpha | \mu=\mu_1)=P(X\leq\alpha | \mu=\mu_0) + 1 - P(X\leq\alpha | \mu=\mu_1)=\int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_0)^2}{2\sigma^2}}\,dx + 1 - \int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma^2}}\,dx##.

If I differentiate the integrals and set it to zero, the remaining equation is ## \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_0)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) - \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_1)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) ##

I think you have the type I and type II errors backwards: the type I error is ##P(\bar{X} > \alpha| \mu=\mu_0)##, and that does not look like what you wrote. I don't think your final optimality equation will be much affected by this, but it is good to get things right before proceeding.

Your final equation should have an ##=0## in it. Then you can cancel out some things and be left with a solvable equation for which you can give a closed-form algebraic solution.
 
  • #16
Ray Vickson said:
I think you have the type I and type II errors backwards: the type I error is ##P(\bar{X} > \alpha| \mu=\mu_0)##, and that does not look like what you wrote. I don't think your final optimality equation will be much affected by this, but it is good to get things right before proceeding.

Your final equation should have an ##=0## in it. Then you can cancel out some things and be left with a solvable equation for which you can give a closed-form algebraic solution.
I will fix it now. I clicked Reply instead of Preview so it sent the post before I was done writing it :(
 
  • #17
GabrielN00 said:
Thank you. I'm very sorry this is taking so long, but thank you again for answering my messages.

Maybe it goes like this?

##P_{H_0}(R)+P_{H_1}(R^c)=E_1(\alpha)+E_2(\alpha)=P(X>\alpha | \mu=\mu_1)+ P(X\leq\alpha | \mu=\mu_0) =1 - P(X\leq\alpha | \mu=\mu_1)+P(X\leq\alpha | \mu=\mu_0) = 1 - \int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_1)^2}{2\sigma^2}}\,dx + \int_{-\infty}^\alpha \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu_0)^2}{2\sigma^2}}\,dx ##.

If I differentiate the integrals and set it to zero, the remaining equation is ## \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_0)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) - \left( \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\alpha-\mu_1)^2}{2\sigma^2}} - \frac{1}{\sigma \sqrt{2\pi}} \right) = 0##

Then the equation to solve is ##e^{-\frac{(\alpha-\mu_0)^2}{2\sigma^2}} = e^{-\frac{(\alpha-\mu_1)^2}{2\sigma^2}} ## which happens only when ## \frac{(\alpha-\mu_0)^2}{2\sigma^2} = \frac{(\alpha-\mu_1)^2}{2\sigma^2} ##.

Then ##\alpha^2-2\alpha\mu_0+\mu_0^2 = \alpha^2-2\alpha\mu_1+\mu_1^2##, and it follows that ##2\alpha(\mu_1 -\mu_0)=\mu_1^2-\mu_0^2##.

Then both the Type I and Type II errors are minimized when ##\alpha = \frac{\mu_1^2-\mu_0^2}{2(\mu_1-\mu_0)}##.

The answer looks a lot simpler if you recall that ##\mu_1^2 - \mu_0^2 = (\mu_1 - \mu_0) (\mu_1 + \mu_0)##.
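
Carrying out the factorisation Ray suggests (just the algebra, nothing new):
$$\alpha = \frac{\mu_1^2-\mu_0^2}{2(\mu_1-\mu_0)} = \frac{(\mu_1-\mu_0)(\mu_1+\mu_0)}{2(\mu_1-\mu_0)} = \frac{\mu_0+\mu_1}{2}.$$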
 
  • #18
Ha! So after all that, the answer is what might have been guessed (although maybe not any easier to prove):
To minimize the sum, place the start of the rejection region halfway between μ0 and μ1.

At the moment I don't have what it takes to figure it out, but there is probably a good intuitive "geometric" way to prove that.
 
  • #19
FactChecker said:
Ha! So after all that, the answer is what might have been guessed (although maybe not any easier to prove):
To minimize the sum, place the start of the rejection region halfway between μ0 and μ1.

At the moment I don't have what it takes to figure it out, but there is probably a good intuitive "geometric" way to prove that.

The mid-point is equidistant between the regions ##\alpha \leq \mu_0## and ##\alpha \geq \mu_1##, for what that is worth.
 
  • #20
In hindsight, now that the answer α = (μ0 + μ1)/2 has been found, it is easy to see. Since the variances are the same and only the means differ, the situation looks like the figure below when α is the midpoint value. It is clear that moving α down will increase the Type I error more than it decreases the Type II error, thus increasing the total error. Similarly, moving α up will increase the Type II error more than it decreases the Type I error, thus increasing the total error. So the midpoint value is the minimum.
[Attached figure: minimizeSumOfErrors.png — the two densities with α at the midpoint]
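
The same argument can be checked numerically (a sketch with made-up values for ##\mu_0##, ##\mu_1## and ##\sigma##): nudging α away from the midpoint always trades error unfavourably.

Python:
# Numerical version of the geometric argument: the total error grows on either
# side of the midpoint.  Values are illustrative only.
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 2.0, 1.0
mid = (mu0 + mu1) / 2

def total_error(a):
    return (1 - norm.cdf(a, loc=mu0, scale=sigma)) + norm.cdf(a, loc=mu1, scale=sigma)

for a in (mid - 0.3, mid, mid + 0.3):
    print(a, round(total_error(a), 4))   # smallest at the midpoint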
 
  • #21
The unusual symmetry of this problem makes it possible to solve geometrically. The analytical approach that was taken earlier is much more powerful for most minimization problems.
 