MHB Performance of students - Hypothesis testing

mathmari
Hey! :o

A teacher wants to find out whether the order of the exam tasks has an impact on the performance of the students. He therefore creates two versions ($X$ and $Y$) of an exam in which the tasks are arranged differently. The versions are distributed randomly, so that $n$ students receive version $X$ and $m = n$ students receive version $Y$. We denote the expected score on version $X$ by $\mu_X$ and the expected score on version $Y$ by $\mu_Y$; the variances are denoted $\sigma_X^2$ and $\sigma_Y^2$, and normal distributions are assumed.

(a) Formulate a suitable null hypothesis for the teacher's question.
(b) Consider that $n = 30, \overline{X} = 79, \overline{Y}= 74, S_X' = 14, S_Y' = 20$. Check the null hypothesis of (a) with significance level $\alpha=5\%$.
(c) Consider that $\overline{X} = 79, \overline{Y}= 80, S_X' = 14, S_Y' = 20$. For which sample size $n$ can we reject the null hypothesis at significance level $\alpha=1\%$?

I have done the following:

(a) The null hypothesis is $H_0: \mu_X=\mu_Y$, right? (Wondering)

(b) Since we don't know whether the variances are equal or not, we first test whether $\sigma_X=\sigma_Y$ with an F-test.
  • If $\sigma_X=\sigma_Y$, then we apply a two-sample t-test.
  • If $\sigma_X<\sigma_Y$, then we apply a Welch test.

The test is the following:

The null hypothesis and the alternative hypothesis is $H_0:\sigma_Y^2=\sigma_X^2$ and $H_1:\sigma_Y^2>\sigma_X^2$, respectively.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=n_Y-1=30-1=29$ and $\nu_X=n_X-1=30-1=29$.

We have that $1-\alpha=95\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.95}(29, 29)$.

It holds that $F_{0.95}(29, 29)=1.86$.

Since $F=2.0408>1.86=F_{0.95}(29, 29)$, we reject the null hypothesis of equal variances. So we apply a Welch test. The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test with unknown variances is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-74}{\sqrt{\frac{14^2}{30}+\frac{20^2}{30}}}=\frac{5}{\sqrt{\frac{196}{30}+\frac{400}{30}}}=\frac{5}{\sqrt{\frac{298}{15}}}\approx 1.1218\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is \begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{30}+\frac{20^2}{30}\right )^2}{\frac{1}{30-1}\left (\frac{14^2}{30}\right )^2+\frac{1}{30-1}\left (\frac{20^2}{30}\right )^2}=\frac{\left (\frac{196}{30}+\frac{400}{30}\right )^2}{\frac{1}{29}\left (\frac{196}{30}\right )^2+\frac{1}{29}\left (\frac{400}{30}\right )^2} \\ & =\frac{\left (\frac{596}{30}\right )^2}{\frac{1}{29}\left (\frac{38416}{900}+\frac{160000}{900}\right )}=\frac{\frac{355216}{900}}{\frac{1}{29}\cdot \frac{198416}{900}}=\frac{355216\cdot 29}{ 198416}=\frac{10301264}{ 198416}\approx 51.9175\end{align*} so $k=52$.

So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.007$.

Since $|T|=1.1218<2.007=t_{52;0.975}$, we don't reject the null hypothesis. Is everything correct? (Wondering)
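
For reference, here is a minimal sketch in base R of the computation above (only the quantile functions qf and qt are used; the decimals are just there to compare with the table values):
Code:
n <- 30; xbar <- 79; ybar <- 74; sx <- 14; sy <- 20; alpha <- 0.05
sy^2 / sx^2                                    # F statistic, approx. 2.0408
qf(1 - alpha, n - 1, n - 1)                    # F critical value, approx. 1.86
(xbar - ybar) / sqrt(sx^2/n + sy^2/n)          # Welch T statistic, approx. 1.12
k <- (sx^2/n + sy^2/n)^2 /
     ((sx^2/n)^2/(n - 1) + (sy^2/n)^2/(n - 1))
k                                              # Welch degrees of freedom, approx. 51.9
qt(1 - alpha/2, k)                             # t critical value, approx. 2.01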


(c) Do we have to do the same as in (b) just with unknown n? (Wondering)
 
mathmari said:
The number of degrees of freedom is $k\approx 51.9175$ so $k=52$.

Hey mathmari!

Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)
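
In R terms (the same qt quantile function used elsewhere in this thread, nothing else assumed), rounding $k$ down means using the slightly larger, i.e. safer, critical value:
Code:
qt(0.975, 51)   # approx. 2.008  (k rounded down: larger critical value)
qt(0.975, 52)   # approx. 2.007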
mathmari said:
So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.007$.

Since $|T|=1.1218<2.007=t_{52;0.975}$, we don't reject the null hypothesis. Is everything correct? (Wondering)


(c) Do we have to do the same as in (b) just with unknown n? (Wondering)

Yep. Yep. (Nod)
 
I like Serena said:
Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)

Ah ok, I understand! (Nerd)
I like Serena said:
Yep. Yep. (Nod)

We have the following for (c):

We check again with an F-test whether the variances are equal. Or is that not necessary, and does the same hold as in (b)? (Wondering)

The F-test would be the following:

The null hypothesis is $H_0:\sigma_Y^2=\sigma_X^2$ and the alternative hypothesis is $H_1:\sigma_Y^2>\sigma_X^2$.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=\nu_X=n-1$.

We have that $1-\alpha=99\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.99}(n-1, n-1)$.

How can we determine $F_{0.99}(n-1, n-1)$ without knowing $n$ ? (Wondering)
 
We already know that we'll need a bigger $n$ than we had for (b), don't we?

Let's inspect the F-table with the smaller $\alpha$ and with the same $F$-value (since the sample variances are the same as in (b)).
What happens if we increase the degrees of freedom of both the numerator and the denominator?
Is there a possibility that we can assume equal variances after all? (Wondering)
 
I like Serena said:
We already know that we'll need a bigger $n$ than we had for (b), don't we?

Let's inspect the F-table with the smaller $\alpha$ and with the same $F$-value (since the sample variances are the same as in (b)).
What happens if we increase the degrees of freedom of both the numerator and the denominator?
Is there a possibility that we can assume equal variances after all? (Wondering)

So we have to check in this table for which $n$ that happens. For $n-1\geq 30$, don't we get values smaller than $F=2.0408$, so that the null hypothesis is rejected, or not?

So we have to apply a Welch test again.

Or am I wrong? (Wondering)
 
mathmari said:
So we have to check in this table for which $n$ that happens. For $n-1\geq 30$, don't we get values smaller than $F=2.0408$, so that the null hypothesis is rejected, or not?

So we have to apply a Welch test again.

Or am I wrong? (Wondering)

There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

In R we can do:
Code:
> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824
So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)
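
If we want R to find that crossover directly, one way (just the same qf call scanned over a range of degrees of freedom) is:
Code:
> which(qf(0.99, 1:100, 1:100) < 400/196)[1]
[1] 44
which agrees with the three values above: the critical value drops below our $F\approx 2.0408$ from $n-1=44$ on.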
 
I like Serena said:
There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

Ah ok!
I like Serena said:
In R we can do:
Code:
> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824
So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)
We want to find for which $n$ the null hypothesis of (a) can be rejected.
So do we have to distinguish cases and find $n$ both if $\sigma_X=\sigma_Y$, i.e. with a two-sample t-test, and if $\sigma_X<\sigma_Y$, i.e. with a Welch test?

(Wondering)
 
mathmari said:
We want to find for which $n$ the null hypothesis of (a) can be rejected.
So do we have to distinguish cases and find $n$ both if $\sigma_X=\sigma_Y$, i.e. with a two-sample t-test, and if $\sigma_X<\sigma_Y$, i.e. with a Welch test?

Yep. We can do that.
So for $n-1 < 44$ we should assume equal variances, find the critical $n$, and verify that it indeed satisfies $n-1 < 44$.
And for $n-1 \ge 44$ we should assume unequal variances, find the critical $n$, and verify that it indeed satisfies $n-1 \ge 44$. (Thinking)
 
For $n-1\ge 44$ we apply the Welch-Test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test with unknown variances is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{196}{n}+\frac{400}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{2\sqrt{149}}\end{equation*} so that \begin{equation*}|T|=\frac{\sqrt{n}}{2\sqrt{149}}\geq \frac{\sqrt{45}}{2\sqrt{149}}\approx 0.2748\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is \begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{n}+\frac{20^2}{n}\right )^2}{\frac{1}{n-1}\left (\frac{14^2}{n}\right )^2+\frac{1}{n-1}\left (\frac{20^2}{n}\right )^2}=\frac{\left (\frac{196}{n}+\frac{400}{n}\right )^2}{\frac{1}{n-1}\left (\frac{196}{n}\right )^2+\frac{1}{n-1}\left (\frac{400}{n}\right )^2} \\ & =\frac{\left (\frac{596}{n}\right )^2}{\frac{1}{n-1}\left (\frac{38416}{n^2}+\frac{160000}{n^2}\right )}=\frac{\frac{355216}{n^2}}{\frac{1}{n-1}\cdot \frac{198416}{n^2}}=\frac{355216\cdot (n-1)}{ 198416}\geq \frac{355216\cdot 44}{ 198416}\approx 78.7714\end{align*} so $k=78$.

The critical value is therefore $t_{k;1-\alpha/2}=t_{78;0.995}\approx 2.64$.

How can we compare $|T|\geq 0.2748$ and $t_{78;0.995}\approx 2.64$ when we have inequalities? (Wondering)
 
  • #10
They are not inequalities if we pick a specific n.
In fact we have found that for n=45, we cannot reject H0.
We will need a bigger n.
How about n=100? Or n=1000? (Wondering)
 
  • #11
I like Serena said:
They are not inequalities if we pick a specific n.
In fact we have found that for n=45, we cannot reject H0.
We will need a bigger n.
How about n=100? Or n=1000? (Wondering)

Ahh.. We reject the null hypothesis if $$|T|>t_{78;0.995}\Rightarrow \frac{\sqrt{n}}{2\sqrt{149}}>2.64\Rightarrow n>4153.9$$ right? (Wondering)
 
  • #12
mathmari said:
Ahh.. We reject the null hypothesis if $$|T|>t_{78;0.995}\Rightarrow \frac{\sqrt{n}}{2\sqrt{149}}>2.64\Rightarrow n>4153.9$$ right? (Wondering)

If n is bigger, doesn't the degrees of freedom k also become bigger? (Wondering)
Then the critical t-value becomes smaller until it approaches the critical z-value. Doesn't it?
 
  • #13
I like Serena said:
If n is bigger, doesn't the degrees of freedom k also become bigger? (Wondering)
Then the critical t-value becomes smaller until it approaches the critical z-value. Doesn't it?

So, for a large number of degrees of freedom the critical t-value approaches the critical z-value, and so $t_{k;0.995}\approx z_{0.995}= 2.575$ ? (Wondering)
 
  • #14
mathmari said:
So, for a large number of degrees of freedom the critical t-value approaches the critical z-value, and so $t_{k;0.995}\approx z_{0.995}= 2.575$ ? (Wondering)

Ah. You already had the critical z-value! (Blush)
Then it's all correct.
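
For instance, the convergence is easy to see in R (qnorm gives the quantile of the standard normal distribution; just an illustration, nothing new is assumed):
Code:
qt(0.995, c(30, 100, 1000, 10000))   # decreases towards the normal quantile
qnorm(0.995)                         # approx. 2.5758, the z-value quoted above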
 
  • #15
For $n-1\ge 44$ we apply the Welch-test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1: \mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{S_X'^2}{n_X}+\frac{S_Y'^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{\sqrt{596}}\approx -0.04096\sqrt{n}\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}=t_{k;0.995}$.

From $n\geq 30$ on, the t-distribution can be approximated by the normal distribution.

Since this holds here ($n\geq 45$), we have that $t_{k;0.995}\approx z_{0.995}=2.575$.

Therefore, for the null hypothesis to be rejected, the following must hold: \begin{equation*}|T|>t_{k;0.995}\Rightarrow 0.04096\sqrt{n}>2.575 \Rightarrow n>3952.16\end{equation*}
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$. Is everything correct? (Wondering)
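
As a cross-check, here is a minimal base-R sketch (not part of the exercise) that searches for the smallest $n$ using the exact t-quantile instead of the normal approximation; it should land at, or a few units above, the $n\approx 3953$ found here, since $t_{k;0.995}$ is slightly larger than $z_{0.995}$:
Code:
T_abs <- function(n) abs(79 - 80) / sqrt(14^2/n + 20^2/n)         # |T| as a function of n
k     <- function(n) (14^2/n + 20^2/n)^2 /
                     ((14^2/n)^2/(n - 1) + (20^2/n)^2/(n - 1))     # Welch degrees of freedom
n <- 45:5000
min(n[T_abs(n) > qt(0.995, k(n))])   # smallest n for which the null hypothesis is rejected
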
Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1)S_X^2+(n_Y-1)S_Y^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}=-0.0409615 \sqrt{n}$.

How could we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)
 
  • #16
mathmari said:
For $n-1\ge 44$ we apply the Welch-test.
...
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$.

Is everything correct?

It looks correct to me. (Nod)

mathmari said:
Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1)S_X^2+(n_Y-1)S_Y^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}=-0.0409615 \sqrt{n}$.

How could we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)

How about checking every $n$ between $1$ and $44$?
Maybe we can find a pattern so that we have to check fewer values. (Wondering)
 
  • #17
I like Serena said:
How about checking every $n$ between $1$ and $44$?
Maybe we can find a pattern so that we have to check fewer values. (Wondering)

As I have now read, the null hypothesis is rejected if $|T|>t_{1-\alpha/2;n_X+n_Y-2}$, isn't it?

Then we have the following:
$$|T|>t_{1-\alpha/2;n_X+n_Y-2}\Rightarrow |T|>t_{0.995;n+n-2}\Rightarrow 0.0409615 \sqrt{n}>t_{0.995;2n-2}$$

To check that for every $n$ between $1$ and $44$ (i.e. for every $2n-2$ between $0$ and $86$) in R, do we write [m]qt(0.01, 0 : 86)[/m]? If yes, we get only negative values, and that would mean that the above holds for every $n$.

(Wondering)
 
  • #18
mathmari said:
As I have now read, the null hypothesis is rejected if $|T|>t_{1-\alpha/2;n_X+n_Y-2}$, isn't it?

Then we have the following:
$$|T|>t_{1-\alpha/2;n_X+n_Y-2}\Rightarrow |T|>t_{0.995;n+n-2}\Rightarrow 0.0409615 \sqrt{n}>t_{0.995;2n-2}$$

To check that for every $n$ between $1$ and $44$ (i.e. for every $2n-2$ between $0$ and $86$) in R, do we write [m]qt(0.01, 0 : 86)[/m]? If yes, we get only negative values, and that would mean that the above holds for every $n$.

(Wondering)

Shouldn't we check [m]qt(0.995, 0 : 86)[/m]? (Wondering)
 
  • #19
I like Serena said:
Shouldn't we check [m]qt(0.995, 0 : 86)[/m]? (Wondering)

Oh yes (Blush)

So, on the left side of the inequality the largest number that we get, i.e. for $n=44$, is about $0.271708$. On the right side every number is at least $2.634212$.
So that inequality doesn't hold for any $n$, i.e. in the equal-variance case the null hypothesis cannot be rejected for any $n\le 44$.

Is this correct? (Wondering)
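
To make that concrete, here is a minimal check in base R (nothing beyond the qt call we already used; I start at $n=2$ so that the degrees of freedom $2n-2$ stay positive):
Code:
n   <- 2:44
lhs <- abs(79 - 80) / (sqrt(596/2) * sqrt(2/n))   # |T| for the pooled two-sample t-test
rhs <- qt(0.995, 2*n - 2)                         # critical values
any(lhs > rhs)                                    # FALSE: no rejection for these n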
 
  • #20
mathmari said:
Oh yes (Blush)

So, on the left side of the inequality the largest number that we get, i.e. for $n=44$, is about $0.271708$. On the right side every number is at least $2.634212$.
So that inequality doesn't hold for any $n$, i.e. in the equal-variance case the null hypothesis cannot be rejected for any $n\le 44$.

Is this correct? (Wondering)

Looks correct to me. (Nod)
 
  • #21
I like Serena said:
Looks correct to me. (Nod)

Great! Thank you so much! (Yes)
 
