MHB Verifying Potato Weight: Hypothesis Testing and Beta Error Analysis

  • Thread starter: mathmari
  • Tags: Beta Error
mathmari
Hey! :o

A vegetable trader sells potatoes in 5 kg bags. Since potatoes differ in size, it is difficult to ensure that each bag contains exactly 5 kg. The dealer claims that the average weight of the potatoes in a bag is 5 kilograms. This has to be checked with a sample.
The sample (n = 25) gives an average value of 5.2 kg and an empirical standard deviation of 0.5 kg. Based on that sample, can we say that the average weight of the potatoes in the bag differs from 5 kg? The significance level is 0.05.

We have the following:

Null hypothesis ($H_0$): The average weight of the potatoes is 5 kg.

Alternative hypothesis ($H_1$): The average weight of the potatoes is not 5 kg, but 5.2kg.

Is this correct? (Wondering)
The Beta error is the probability that the null hypothesis is retained although it is false.

Then the following is asked:

How big is the Beta error?
a) 0,25
b) 0,05
c) 0,95
d) 0,1
e) cannot be given

I have done the following:

We have that the significance level is 0.05.

So, $P(Z < z) = 0.05$ and we get a z-value of $-1.645$.

We convert this to an X-value: $5-1.645\frac{0.5}{\sqrt{25}}=5-1.645\cdot 0.1=4.8355$

Then we have the following: $P(X > 4.8355) = P\left [Z > \frac{4.8355-5.2}{0.1}\right ] = P(Z > -3.645)=1-P(Z\leq -3.645)$. Now from a Normal table we get $\beta = 1-P(Z\leq -3.645) = 1-0.00014=0.99986$.

I must have done something wrong, since this answer is not one of the choices. But what? (Wondering)
 
mathmari said:
Hey! :o

A vegetable trader sells potatoes in 5 kg bags. Since potatoes differ in size, it is difficult to ensure that each bag contains exactly 5 kg. The dealer claims that the average weight of the potatoes in a bag is 5 kilograms. This has to be checked with a sample.
The sample (n = 25) gives an average value of 5.2 kg and an empirical standard deviation of 0.5 kg. Based on that sample, can we say that the average weight of the potatoes in the bag differs from 5 kg? The significance level is 0.05.

We have the following:

Null hypothesis ($H_0$): The average weight of the potatoes is 5 kg.

Alternative hypothesis ($H_1$): The average weight of the potatoes is not 5 kg, but 5.2kg.

Is this correct? (Wondering)

Hey mathmari! ;)

A hypothesis is not supposed to include a sample measurement - only a statement about the population that we want to test.

mathmari said:
The Beta error is the probability that the null hypothesis is retained although it is false.

Then the following is asked:

How big is the Beta error?
a) 0,25
b) 0,05
c) 0,95
d) 0,1
e) cannot be given

I have done the following:

We have that the significance level is 0.05.

So, $P(Z < z) = 0.05$ and we get a z-value of $-1.645$.

Since the alternative hypothesis is an inequality, shouldn't that be P(Z<z)<0.05 OR P(Z>z)<0.05? (Wondering)

Where or how did you get that z-value?

mathmari said:
We convert this to an X-value: $5-1.645\frac{0.5}{\sqrt{25}}=5-1.645\cdot 0.1=4.8355$

Then we have the following: $P(X > 4.8355) = P\left [Z > \frac{4.8355-5.2}{0.1}\right ] = P(Z > -3.645)=1-P(Z\leq -3.645)$. Now from a Normal table we get $\beta = 1-P(Z\leq -3.645) = 1-0.00014=0.99986$.

I must have done something wrong, since this answer is not one of the choices. But what? (Wondering)

To say anything about beta, we basically need to know what the real distribution of the population is, so that we can estimate whether the null hypothesis holds. That distribution is not given is it? (Wondering)
 
I like Serena said:
A hypothesis is not supposed to include a sample measurement - only a statement about the population that we want to test.

So, the hypothesis doesn't say anything about the average weight of the potatoes, does it? Is the null hypothesis then:
"The weight of one single bag is equal to the average weight. So, the weight of one single bag is 5kg." ? (Wondering)

I like Serena said:
Where or how did you get that z-value?

From the table we get the result $0.05$ for $z=-1.645$. Is this wrong? (Wondering)
I like Serena said:
To say anything about beta, we basically need to know what the real distribution of the population is, so that we can estimate whether the null hypothesis holds. That distribution is not given is it? (Wondering)

So, we cannot say anything about beta, can we? (Wondering)

What additional information about the distribution do we have to know to compute it? For example, whether the distribution is normal? (Wondering)
 
mathmari said:
So, the hypothesis doesn't say anything about the average weight of the potatoes, does it? Is the null hypothesis then:
"The weight of one single bag is equal to the average weight. So, the weight of one single bag is 5kg." ? (Wondering)

The null hypothesis is that the population mean $\mu$ is equal to a certain number.
The alternative hypothesis is that the population mean $\mu$ somehow differs from that certain number.

So we should have:
$$H_0: \mu = 5\text{ kg} \\ H_1: \mu \ne 5 \text{ kg}$$
This form of $H_1$ is called 2-sided, since the real population mean $\mu$ could be either higher or lower (from the word 'differs') than $5\text{ kg}$. (Nerd)
mathmari said:
From the table we get the result $0.05$ for $z=-1.645$. Is this wrong? (Wondering)

Ah. I see what you mean. This is the so called critical z-value for $\alpha=0.05$ of a 1-sided alternative hypothesis (like $H_1: \mu > 5 \text{ kg}$).
However, since we have a 2-sided alternative hypothesis, we should look up the z-value for $\frac\alpha 2=0.025$, which is $z^*=1.96$ ($z^*$ to denote the critical z-value).
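As a quick check of those two table lookups, here is a minimal Python sketch. It assumes the standard library's `statistics.NormalDist` as a stand-in for the printed normal table; the variable names are only illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-sided alternative: all of alpha sits in a single tail.
z_one_sided = std_normal.inv_cdf(1 - alpha)

# Two-sided alternative: alpha is split evenly over both tails.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-sided z* = {z_one_sided:.3f}")  # 1.645
print(f"two-sided z* = {z_two_sided:.3f}")  # 1.960
```

By symmetry, the lower-tail value $-1.645$ from the table is the same one-sided critical value with the opposite sign.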
mathmari said:
So, we cannot say anything about beta, can we? (Wondering)

What additional information about the distribution do we have to know to compute it? For example, whether the distribution is normal? (Wondering)

To calculate $\beta$ we typically need the real $\mu$ and $\sigma$ of the population that should be such that the alternative hypothesis is satisfied.
With those we can calculate the probability $\beta$ that we keep $H_0$ even though the real population distribution matches the $H_1$ hypothesis. (Thinking)
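To illustrate why $\beta$ depends on the unknown true mean, here is a rough Python sketch. It assumes a two-sided z-test with $z^*=1.96$, a normally distributed sample mean, and a population $\sigma$ equal to the empirical 0.5 kg; the function name `beta_error` and the candidate values of $\mu$ are hypothetical and not part of the problem.

```python
from math import sqrt
from statistics import NormalDist

def beta_error(mu_true, mu0=5.0, sigma=0.5, n=25, z_star=1.96):
    """Probability of keeping H0: mu = mu0 when the true mean is mu_true.

    Assumes the sample mean is normal with standard error sigma / sqrt(n).
    """
    se = sigma / sqrt(n)
    lower = mu0 - z_star * se   # below this sample mean we reject H0
    upper = mu0 + z_star * se   # above this sample mean we reject H0
    sampling = NormalDist(mu=mu_true, sigma=se)
    return sampling.cdf(upper) - sampling.cdf(lower)

# Hypothetical true means: each one gives a different beta.
for mu_true in (5.05, 5.1, 5.2, 5.3):
    print(f"mu = {mu_true:.2f} kg -> beta = {beta_error(mu_true):.3f}")
```

The closer the true mean is to 5 kg, the larger $\beta$ becomes, so without a stated true $\mu$ there is no single value to report.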
 
I like Serena said:
The null hypothesis is that the population mean $\mu$ is equal to a certain number.
The alternative hypothesis is that the population mean $\mu$ somehow differs from that certain number.

So we should have:
$$H_0: \mu = 5\text{ kg} \\ H_1: \mu \ne 5 \text{ kg}$$
This form of $H_1$ is called 2-sided, since the real population mean $\mu$ could be either higher or lower (from the word 'differs') than $5\text{ kg}$. (Nerd)

Ah ok. I see! (Smile)
I like Serena said:
Ah. I see what you mean. This is the so called critical z-value for $\alpha=0.05$ of a 1-sided alternative hypothesis (like $H_1: \mu > 5 \text{ kg}$).
However, since we have a 2-sided alternative hypothesis, we should look up the z-value for $\frac\alpha 2=0.025$, which is $z^*=1.96$ ($z^*$ to denote the critical z-value).

Ah ok. (Thinking)
I like Serena said:
To calculate $\beta$ we typically need the real $\mu$ and $\sigma$ of the population that should be such that the alternative hypothesis is satisfied.
With those we can calculate the probability $\beta$ that we keep $H_0$ even though the real population distribution matches the $H_1$ hypothesis. (Thinking)

We have that "The sample (n = 25) gives an average value of 5.2 kg and an empirical standard deviation of 0.5 kg.".
Do we not get from that that $\mu=5.2$ and $\sigma=0.5$ ? (Wondering)
 
mathmari said:
We have that "The sample (n = 25) gives an average value of 5.2 kg and an empirical standard deviation of 0.5 kg.".
Do we not get from that that $\mu=5.2$ and $\sigma=0.5$ ? (Wondering)

mathmari said:
The sample (n = 25) gives an average value of 5.2 kg and an empirical standard deviation of 0.5 kg. Based on that sample, can we say that the average weight of the potatoes in the bag differs from 5 kg? The significance level is 0.05.

The problem statement asks whether we can conclude that the alternative hypothesis is true.
The way to do that is to calculate the z-score given by:
$$SE = \frac{\sigma}{\sqrt n} \\ z = \frac{\bar x - 5\text{ kg}}{SE}$$
where $\sigma$ is the empirical standard deviation of the individual bag weights (not the standard deviation of the sample mean), and $SE$ is the so-called standard error.

We didn't do that yet did we? (Wondering)
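For reference, that calculation can be sketched in a few lines of Python. This assumes the normal approximation used in this thread with $z^*=1.96$; with n = 25 and an estimated standard deviation, a t distribution would give a slightly larger critical value, and the z-score here is close enough to the boundary for that to matter. The variable names are illustrative.

```python
from math import sqrt

mu0 = 5.0      # mean weight claimed under H0, in kg
x_bar = 5.2    # sample mean, in kg
s = 0.5        # empirical standard deviation, in kg
n = 25         # sample size
z_star = 1.96  # two-sided critical z-value for alpha = 0.05

se = s / sqrt(n)            # standard error of the sample mean
z = (x_bar - mu0) / se      # z-score of the observed sample mean
reject_h0 = abs(z) > z_star # two-sided decision rule

print(f"SE = {se:.2f} kg, z = {z:.2f}, reject H0: {reject_h0}")
# SE = 0.10 kg, z = 2.00, reject H0: True
```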

mathmari said:
The Beta error is the probability that the null hypothesis is retained although it is false.

Then the following is asked:

How big is the Beta error?

Well... the actual population mean is not given, so I don't think we can calculate $\beta$.
Alternatively, perhaps we're supposed to assume that the sample mean is somehow representative of the real distribution, which is a bit of a stretch... (Thinking)
If so we should assume that $\mu = 5.2\text{ kg}$ and $\sigma=0.5\text{ kg}$, after which we should calculate:
$$\beta = P\big((X < 5 + z^* \cdot SE) \land (X > 5 - z^* \cdot SE)\big)$$
or:
$$\beta = P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z > \frac{5 - 5.2}{SE} - z^*\right)\right)$$
which is the probability that we keep $H_0$ even though the population distribution is assumed to be given by the sample mean and the empirical standard deviation.
 
I like Serena said:
Alternatively, perhaps we're supposed to assume that the sample mean is somehow representative of the real distribution, which is a bit of a stretch... (Thinking)
If so we should assume that $\mu = 5.2\text{ kg}$ and $\sigma=0.5\text{ kg}$, after which we should calculate:
$$\beta = P\big((X < 5 + z^* \cdot SE) \land (X > 5 - z^* \cdot SE)\big)$$
or:
$$\beta = P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z < \frac{5 - 5.2}{SE} - z^*\right)\right)$$
which is the probability that we keep $H_0$ even though the population distribution is assumed to be given by the sample mean and the empirical standard deviation.

What is $z^*$ ? The one that you defined above, $z = \frac{\bar x - 5\text{ kg}}{SE}$ ? (Wondering)
 
mathmari said:
What is $z^*$ ? The one that you defined above, $z = \frac{\bar x - 5\text{ kg}}{SE}$ ? (Wondering)

No. $z^*$ is the so called critical z-value.

I like Serena said:
mathmari said:
From the table we get the result $0.05$ for $z=-1.645$. Is this wrong? (Wondering)

Ah. I see what you mean. This is the so called critical z-value for $\alpha=0.05$ of a 1-sided alternative hypothesis (like $H_1: \mu > 5 \text{ kg}$).
However, since we have a 2-sided alternative hypothesis, we should look up the z-value for $\frac\alpha 2=0.025$, which is $z^*=1.96$ ($z^*$ to denote the critical z-value).

In this problem we have $z^* = 1.96$, which is in your table, and which is the most common critical z-value. (Nerd)
 
I like Serena said:
No. $z^*$ is the so called critical z-value.
In this problem we have $z^* = 1.96$, which is in your table, and which is the most common critical z-value. (Nerd)

Ah ok. So, do we have the following? (Wondering)

\begin{align*}\beta &= P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z < \frac{5 - 5.2}{SE} - z^*\right)\right) \\ & =P\left(\left(Z < \frac{0.2}{0.1} + 1.96\right) \land \left(Z < \frac{0.2}{0.1} - 1.96\right)\right) \\ & =P\left(\left(Z < 3.96\right) \land \left(Z < 0.04\right)\right) \\ & =P\left(Z < 0.04\right) \\ & =0.51595\end{align*}
 
mathmari said:
Ah ok. So, do we have the following? (Wondering)

\begin{align*}\beta &= P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z < \frac{5 - 5.2}{SE} - z^*\right)\right) \\ & =P\left(\left(Z < \frac{0.2}{0.1} + 1.96\right) \land \left(Z < \frac{0.2}{0.1} - 1.96\right)\right) \\ & =P\left(\left(Z < 3.96\right) \land \left(Z < 0.04\right)\right) \\ & =P\left(Z < 0.04\right) \\ & =0.51595\end{align*}

Erm... (Blush)

That should be:
\begin{align*}
\beta &= P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z > \frac{5 - 5.2}{SE} - z^*\right)\right) \\
& =P\left(\left(Z < \frac{-0.2}{0.1} + 1.96\right) \land \left(Z > \frac{-0.2}{0.1} - 1.96\right)\right) \\
& =P\left(\left(Z < -0.04\right) \land \left(Z > -3.96\right)\right) \\
& \approx P\left(Z < -0.04\right)
\end{align*}
(Thinking)
 
I like Serena said:
Erm... (Blush)

That should be:
\begin{align*}
\beta &= P\left(\left(Z < \frac{5 - 5.2}{SE} + z^*\right) \land \left(Z > \frac{5 - 5.2}{SE} - z^*\right)\right) \\
& =P\left(\left(Z < \frac{-0.2}{0.1} + 1.96\right) \land \left(Z > \frac{-0.2}{0.1} - 1.96\right)\right) \\
& =P\left(\left(Z < -0.04\right) \land \left(Z > -3.96\right)\right) \\
& \approx P\left(Z < -0.04\right)
\end{align*}
(Thinking)

Oh yes... (Blush)

So, the result is then $0.48405$, or not? (Wondering)
 
mathmari said:
Oh yes... (Blush)

So, the result is then $0.48405$, or not? (Wondering)

That looks about right. (Nod)

And since it is not in the list of given answers, perhaps the answer should be "cannot be given" after all? (Wondering)
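As a numerical check of that value, here is a small Python sketch of the standardized form. It relies on the same stretch as above, namely that the true mean is 5.2 kg and $\sigma$ is 0.5 kg, and it uses `statistics.NormalDist` in place of the table; the names `upper_term` and `lower_term` are just labels for the two probabilities.

```python
from statistics import NormalDist

se = 0.1                   # standard error: 0.5 / sqrt(25)
shift = (5.0 - 5.2) / se   # H0 mean minus assumed true mean, in SE units
z_star = 1.96              # two-sided critical z-value for alpha = 0.05

phi = NormalDist().cdf             # standard normal CDF
upper_term = phi(shift + z_star)   # P(Z < -0.04)
lower_term = phi(shift - z_star)   # P(Z < -3.96), the part that was neglected
beta = upper_term - lower_term

print(f"P(Z < -0.04) = {upper_term:.5f}")
print(f"P(Z < -3.96) = {lower_term:.5f}")
print(f"beta         = {beta:.5f}")
```

The second term is tiny, which is why dropping it and reading $P(Z < -0.04)$ from the table gives essentially the same result.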
 
I like Serena said:
That looks about right. (Nod)

And since it is not in the list of given answers, perhaps the answer should be "cannot be given" after all? (Wondering)

It must be so.

Thank you very much! (Happy)
 
For the record, this is what the distributions look like when trying to determine $\beta$.
\begin{tikzpicture}
%preamble \usepackage{pgfplots}
\pgfmathdeclarefunction{gauss}{3}{%
\pgfmathparse{1/(#3*sqrt(2*pi))*exp(-((#1-#2)^2)/(2*#3^2))}%
}
\begin{axis}[
no markers, domain=4.5:5.7, samples=100,
axis lines*=left, xlabel=$x$, ylabel=$p$,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={5,5.2}, ytick=\empty,
enlargelimits=false, clip=false, axis on top,
grid = major
]
\addplot [fill=cyan!30, draw=none, domain=4.5:5.19] {gauss(x,5.2,0.1)} \closedcycle;
\addplot [fill=red!30, draw=none, domain=5.19:5.7] {gauss(x,5,0.1)} \closedcycle;
\addplot [fill=red!30, draw=none, domain=4.5:4.81] {gauss(x,5,0.1)} \closedcycle;
\addplot [very thick,cyan!50!black] {gauss(x,5,0.1)};
\addplot [very thick,cyan!50!black] {gauss(x,5.2,0.1)};

\draw [yshift=-0.6cm, latex-latex](axis cs:5,0) -- node [fill=white] {$1.96\,SE$} (axis cs:5.19,0);
\node at (axis cs:5.12, 1.1) {$\beta$};
\node at (axis cs:4.78, 0.16) {$\alpha/2$};
\node at (axis cs:5.22, 0.16) {$\alpha/2$};
\node at (axis cs:5, 4.3) {$N(5,SE)$};
\node at (axis cs:5.2, 4.3) {$N(\mu,SE)$};
\end{axis}
\end{tikzpicture}

The two light-red parts each have area $\alpha/2=0.025$; together they form the region where we reject the null hypothesis.
And the cyan part is $\beta \approx 0.48$ where we keep the null hypothesis even though it should have been rejected based on the actual population distribution (which is usually not known). (Thinking)
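The cyan area can also be approximated by simulation. The following Python sketch is only an illustration under strong assumptions: individual bag weights are taken to be normal with true mean 5.2 kg and $\sigma = 0.5$ kg, and the test is the two-sided z-test with $z^*=1.96$. The seed and number of trials are arbitrary choices.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

mu0, mu_true = 5.0, 5.2   # H0 mean and assumed true mean, in kg
sigma, n = 0.5, 25        # assumed population sd (kg) and sample size
z_star = 1.96             # two-sided critical z-value for alpha = 0.05

se = sigma / sqrt(n)
lower, upper = mu0 - z_star * se, mu0 + z_star * se  # acceptance region

trials = 20_000
kept = 0
for _ in range(trials):
    # Draw one sample of n bags from the assumed true distribution.
    sample = [random.gauss(mu_true, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    if lower < x_bar < upper:   # H0 is kept although mu_true != mu0
        kept += 1

beta_estimate = kept / trials
print(f"estimated beta = {beta_estimate:.3f}")
```

The estimate should land near the $\beta \approx 0.48$ shown in the figure, up to simulation noise.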
 