Performance of students - Hypothesis testing

mathmari · Apr 17, 2018

Hey!

A teacher wants to find out whether the order of the exam tasks has an impact on the performance of the students. Therefore, he creates two versions ($ X $ and $ Y $) of an exam in which the exam tasks are arranged differently. The versions are randomly distributed so that $ n $ students receive version $ X $ and $ m = n $ students receive version $ Y $. We denote the expected score on $ X $ by $ \mu_X $ and the expected score on $ Y $ by $ \mu_Y $. The variances are denoted $ \sigma_X^2 $ and $ \sigma_Y^2 $; the scores are assumed to be normally distributed.

(a) Formulate a suitable null hypothesis for the question of the teacher.
(b) Suppose that $n = 30, \overline{X} = 79, \overline{Y}= 74, S_X' = 14, S_Y' = 20$. Test the null hypothesis of (a) at significance level $\alpha=5\%$.
(c) Suppose that $\overline{X} = 79, \overline{Y}= 80, S_X' = 14, S_Y' = 20$. For which sample size $n$ can we reject the null hypothesis at significance level $\alpha=1\%$?

I have done the following:

(a) The null hypothesis is $H_0: \mu_X=\mu_Y$, right? (Wondering)

(b) Since we don't know whether the variances are equal, we first have to test that with an F-test.

If $\sigma_X=\sigma_Y$ then we apply a two-sample t-test.
If $\sigma_X<\sigma_Y$ then we apply a Welch test.

The test is the following:

The null hypothesis and the alternative hypothesis are $H_0:\sigma_Y^2=\sigma_X^2$ and $H_1:\sigma_Y^2>\sigma_X^2$, respectively.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=n_Y-1=30-1=29$, $\nu_X=n_X-1=30-1=29$.

We have that $1-\alpha=95\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.95}(29, 29)$.

It holds that $F_{0.95}(29, 29)=1.86$.
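
This comparison of the statistic with the tabulated value can be sketched in a few lines of Python. It is only an illustration: the critical value is taken from the table as quoted above and is not computed, and the names `f_statistic` and `F_CRIT_TABLE` are made up for this sketch.

```python
def f_statistic(s_large: float, s_small: float) -> float:
    """Ratio of the sample variances, larger standard deviation on top."""
    return s_large**2 / s_small**2

# Tabulated value F_{0.95}(29, 29) as quoted in the post (assumed, not computed here).
F_CRIT_TABLE = 1.86

F = f_statistic(20, 14)
reject_equal_variances = F > F_CRIT_TABLE
print(round(F, 4), reject_equal_variances)
```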

Since $F=2.0408>1.86=F_{0.95}(29, 29)$, we reject the null hypothesis. So, we apply a Welch test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test with unknown variances is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-74}{\sqrt{\frac{14^2}{30}+\frac{20^2}{30}}}=\frac{5}{\sqrt{\frac{196}{30}+\frac{400}{30}}}=\frac{5}{\sqrt{\frac{298}{15}}}\approx 1.1218\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is \begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{30}+\frac{20^2}{30}\right )^2}{\frac{1}{30-1}\left (\frac{14^2}{30}\right )^2+\frac{1}{30-1}\left (\frac{20^2}{30}\right )^2}=\frac{\left (\frac{196}{30}+\frac{400}{30}\right )^2}{\frac{1}{29}\left (\frac{196}{30}\right )^2+\frac{1}{29}\left (\frac{400}{30}\right )^2} \\ & =\frac{\left (\frac{596}{30}\right )^2}{\frac{1}{29}\left (\frac{38416}{900}+\frac{160000}{900}\right )}=\frac{\frac{355216}{900}}{\frac{1}{29}\cdot \frac{198416}{900}}=\frac{355216\cdot 29}{ 198416}=\frac{10301264}{ 198416}\approx 51.9175\end{align*} so $k=52$.

So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.007$.

Since $|T|=1.1218<2.007\approx t_{52;0.975}$ we don't reject the null hypothesis. Is everything correct? (Wondering)
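
The Welch statistic and the degrees of freedom for (b) can be reproduced numerically. The following is a minimal Python sketch of the two formulas used above; the function names `welch_statistic` and `welch_df` are chosen for illustration only.

```python
import math

def welch_statistic(mean_x, mean_y, s_x, s_y, n_x, n_y):
    """Welch t statistic for H0: mu_X - mu_Y = 0."""
    return (mean_x - mean_y) / math.sqrt(s_x**2 / n_x + s_y**2 / n_y)

def welch_df(s_x, s_y, n_x, n_y):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    v_x, v_y = s_x**2 / n_x, s_y**2 / n_y
    return (v_x + v_y) ** 2 / (v_x**2 / (n_x - 1) + v_y**2 / (n_y - 1))

# Data of part (b): n = 30 per group.
T = welch_statistic(79, 74, 14, 20, 30, 30)
k = welch_df(14, 20, 30, 30)
print(round(T, 4), round(k, 4))
```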

(c) Do we have to do the same as in (b) just with unknown n? (Wondering)

I like Serena · Apr 17, 2018

mathmari said:

The number of degrees of freedom is $k\approx 51.9175$ so $k=52$.

Hey mathmari!

Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)

mathmari said:

So we get the critical value $t_{k;1-\alpha/2}=t_{52;0.975}\approx 2.007$.

Since $|T|=1.1218<2.007\approx t_{52;0.975}$ we don't reject the null hypothesis. Is everything correct? (Wondering)

(c) Do we have to do the same as in (b) just with unknown n? (Wondering)

Yep. Yep. (Nod)

mathmari · Apr 17, 2018

I like Serena said:

Just a nitpick. Generally we round degrees of freedom down, so I believe we should pick $k=51$.
That's because we want to be sure with a confidence of 'at least' $1-\alpha$ before we reject the null hypothesis.
In case of doubt, we can't.
So we should round to the safe side. (Nerd)

Ah ok, I understand! (Nerd)

I like Serena said:

Yep. Yep. (Nod)

For (c) we have the following:

We check again with an F-test whether the variances are equal. Or is that not necessary, and the same holds as in (b)? (Wondering)

The F-test would be the following:

The null hypothesis is $H_0:\sigma_Y^2=\sigma_X^2$ and the alternative hypothesis is $H_1:\sigma_Y^2>\sigma_X^2$.

The test statistic is \begin{equation*}F=\frac{{S_Y'}^2}{{S_X'}^2}=\frac{20^2}{14^2}=\frac{400}{196}\approx 2.0408\end{equation*}
$F$ is F-distributed with degrees of freedom $\nu_Y=\nu_X=n-1$.

We have that $1-\alpha=99\%$.

The null hypothesis will be rejected if $F>F_{1-\alpha}(\nu_Y, \nu_X)=F_{0.99}(n-1, n-1)$.

How can we determine $F_{0.99}(n-1, n-1)$ without knowing $n$ ? (Wondering)

I like Serena · Apr 17, 2018

We already know that we'll need a bigger $n$ than we had for (b) don't we?

Let's inspect the F-table with the smaller $\alpha$ and with the same $F$-value (since the sample variances are the same as in (b)).
What happens if we increase the degrees of freedom of both the numerator and the denominator?
Is there a possibility that we can assume equal variances after all? (Wondering)

mathmari · Apr 17, 2018

So we have to check in this table for which $n$ the critical value falls below our $F$. For $n-1\geq 30$, don't we get values smaller than $F=2.0408$, so that the null hypothesis is rejected?

So we have to apply again a Welch-test.

Or am I wrong? (Wondering)

I like Serena · Apr 17, 2018

There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

In R we can do:

Code:

> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824

So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)

mathmari · Apr 17, 2018

I like Serena said:

There seem to be mistakes in that table. For instance $F_{0.99}(31,31)=1.98$ is lower than the values to the left and right of it, which is not possible. (Worried)
I think we should use another table.

Ah ok!

I like Serena said:
In R we can do:
Code:
> qf(0.99, 43:45, 43:45)
[1] 2.056934 2.039508 2.022824
So for $n-1\ge 44$ the critical $F$-values are below our $F=2.0408$, so we will have to reject the null hypothesis for those $n$, and apply the Welch-test. (Thinking)

We want to find for which $n$ the null hypothesis of (a) can be rejected.
So do we have to distinguish cases and find $n$ if $\sigma_X=\sigma_Y$, i.e. with a two-sample t-test, and also if $\sigma_X<\sigma_Y$, i.e. with a Welch test?

(Wondering)

I like Serena · Apr 17, 2018

Yep. We can do that.
So for $n-1 < 44$ we should assume equal variances, find the critical $n$, and verify that it indeed satisfies $n-1 < 44$.
And for $n-1 \ge 44$ we should assume unequal variances, find the critical $n$, and verify that it indeed satisfies $n-1 \ge 44$. (Thinking)

mathmari · Apr 17, 2018

For $n-1\ge 44$ we apply the Welch-Test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1:\mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test with unknown variances is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{196}{n}+\frac{400}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{2\sqrt{149}}, \quad\text{so}\quad |T|=\frac{\sqrt{n}}{2\sqrt{149}}\geq \frac{\sqrt{45}}{2\sqrt{149}}\approx 0.2748\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}$.

The number of degrees of freedom is \begin{align*}k&=\frac{\left (\frac{{S_X'}^2}{n_X}+\frac{{S_Y'}^2}{n_Y}\right )^2}{\frac{1}{n_X-1}\left (\frac{{S_X'}^2}{n_X}\right )^2+\frac{1}{n_Y-1}\left (\frac{{S_Y'}^2}{n_Y}\right )^2}=\frac{\left (\frac{14^2}{n}+\frac{20^2}{n}\right )^2}{\frac{1}{n-1}\left (\frac{14^2}{n}\right )^2+\frac{1}{n-1}\left (\frac{20^2}{n}\right )^2}=\frac{\left (\frac{196}{n}+\frac{400}{n}\right )^2}{\frac{1}{n-1}\left (\frac{196}{n}\right )^2+\frac{1}{n-1}\left (\frac{400}{n}\right )^2} \\ & =\frac{\left (\frac{596}{n}\right )^2}{\frac{1}{n-1}\left (\frac{38416}{n^2}+\frac{160000}{n^2}\right )}=\frac{\frac{355216}{n^2}}{\frac{1}{n-1}\cdot \frac{198416}{n^2}}=\frac{355216\cdot (n-1)}{ 198416}\geq \frac{355216\cdot 44}{ 198416}=78.7714\end{align*} so $k=78$ for $n=45$.

The critical value for $n=45$ is therefore $t_{k;1-\alpha/2}=t_{78;0.995}\approx 2.640$.

How can we compare $|T|\geq 0.2748$ and $t_{78;0.995}\approx 2.640$ where we have inequalities? (Wondering)

I like Serena · Apr 17, 2018

They are not inequalities if we pick a specific n.
In fact we have found that for n=45, we cannot reject H0.
We will need a bigger n.
How about n=100? Or n=1000? (Wondering)

mathmari · Apr 17, 2018

Ahh.. We reject the null hypothesis if $$|T|>t_{78;0.995}\Rightarrow \frac{\sqrt{n}}{2\sqrt{149}}>2.640\Rightarrow n>4153.9$$ right? (Wondering)

I like Serena · Apr 17, 2018

If n is bigger, don't the degrees of freedom k also become bigger? (Wondering)
Then the critical t-value becomes smaller until it approaches the critical z-value. Doesn't it?

mathmari · Apr 17, 2018

So, for a large number of degrees of freedom the critical t-value approaches the critical z-value, and so $t_{k;0.995}\approx z_{0.995}= 2.575$ ? (Wondering)

I like Serena · Apr 18, 2018

Ah. You already had the critical z-value! (Blush)
Then it's all correct.
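
To see how the critical t-value drifts toward the critical z-value as $n$ grows, here is a rough Python sketch. It is not an exact quantile function like R's [m]qt[/m]: it assumes a two-term Cornish-Fisher expansion of the t quantile around the normal quantile, which is only an approximation, and the helper names `t_quantile_approx`, `welch_df_equal_n` and `abs_t` are invented for this illustration.

```python
import math
from statistics import NormalDist

def t_quantile_approx(p: float, df: float) -> float:
    """Approximate t quantile: two-term Cornish-Fisher expansion around z_p (assumption)."""
    z = NormalDist().inv_cdf(p)
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))

def welch_df_equal_n(n: int, s_x: float = 14, s_y: float = 20) -> float:
    """Welch-Satterthwaite degrees of freedom when both groups have size n."""
    return (s_x**2 + s_y**2) ** 2 * (n - 1) / (s_x**4 + s_y**4)

def abs_t(n: int, diff: float = 1, s_x: float = 14, s_y: float = 20) -> float:
    """|T| of the Welch test for mean difference `diff` and equal group size n."""
    return abs(diff) * math.sqrt(n / (s_x**2 + s_y**2))

# Columns: n, |T|, degrees of freedom k, approximate critical value t_{k;0.995}.
for n in (45, 100, 1000, 4000):
    k = welch_df_equal_n(n)
    print(n, round(abs_t(n), 4), round(k, 1), round(t_quantile_approx(0.995, k), 4))
```

In this sketch $|T|$ grows like $\sqrt{n}$ while the approximate critical value slowly decreases toward $z_{0.995}$, which is why a fixed critical value taken at $n=45$ overstates the required $n$.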

mathmari · Apr 18, 2018

For $n-1\ge 44$ we apply the Welch-test.

The null hypothesis is $H_0: \mu_X-\mu_Y=0$ and the alternative hypothesis is $H_1: \mu_X-\mu_Y\neq 0$.

The test statistic $T$ for the t-test is \begin{equation*}T=\frac{\overline{X}-\overline{Y}-0}{\sqrt{\frac{S_X'^2}{n_X}+\frac{S_Y'^2}{n_Y}}}=\frac{79-80}{\sqrt{\frac{14^2}{n}+\frac{20^2}{n}}}=\frac{-1}{\sqrt{\frac{596}{n}}}=-\frac{\sqrt{n}}{\sqrt{596}}\approx -0.04096\sqrt{n}\end{equation*}

The null hypothesis will be rejected if $|T|>t_{k;1-\alpha/2}=t_{k;0.995}$.

For $n\geq 30$ the t-distribution can be approximated by the normal distribution.

Since $n\geq 45$ in this case, we have that $t_{k;0.995}\approx z_{0.995}=2.575$.

For the null hypothesis to be rejected, the following must hold: \begin{equation*}|T|>t_{k;0.995}\Rightarrow 0.04096\sqrt{n}>2.575 \Rightarrow n>3952.16\end{equation*}
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$. Is everything correct? (Wondering)

Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1){S_X'}^2+(n_Y-1){S_Y'}^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}=-0.0409615 \sqrt{n}$.

How could we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)
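
The pooled standard deviation and the resulting statistic can be checked with a short Python sketch. It assumes equal group sizes as in the exercise; the function names `pooled_sd` and `pooled_t` are illustrative only.

```python
import math

def pooled_sd(s_x, s_y, n_x, n_y):
    """Pooled standard deviation of the two-sample t-test (equal variances assumed)."""
    return math.sqrt(((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2))

def pooled_t(mean_x, mean_y, s_x, s_y, n):
    """Two-sample t statistic when both groups have the same size n."""
    s = pooled_sd(s_x, s_y, n, n)
    return (mean_x - mean_y) / (s * math.sqrt(2 / n))

# With equal group sizes the pooled S does not depend on n.
for n in (10, 20, 44):
    print(n, round(pooled_sd(14, 20, n, n), 4), round(pooled_t(79, 80, 14, 20, n), 4))
```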

I like Serena · Apr 18, 2018

mathmari said:

For $n-1\ge 44$ we apply the Welch-test.
...
So, the null hypothesis will be rejected for a sample of size $n\geq 3953$.

Is everything correct?

It looks correct to me. (Nod)
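
The normal-approximation step can be reproduced with a small Python sketch. It solves $\sqrt{n/596}>z$ for the smallest integer $n$; the function name `min_n_normal_approx` is made up here, and the result is sensitive to how $z_{0.995}$ and $1/\sqrt{596}$ are rounded, so it can differ by a few units from the hand computation.

```python
import math
from statistics import NormalDist

def min_n_normal_approx(z_crit, diff=1, s_x=14, s_y=20):
    """Smallest integer n with |diff| * sqrt(n / (s_x^2 + s_y^2)) > z_crit."""
    bound = z_crit**2 * (s_x**2 + s_y**2) / diff**2
    return math.floor(bound) + 1

z_exact = NormalDist().inv_cdf(0.995)   # unrounded normal quantile, about 2.5758
print(min_n_normal_approx(2.575))       # table value used in the thread
print(min_n_normal_approx(z_exact))     # unrounded quantile
```

With these inputs the sketch gives 3952 for the table value and 3955 for the unrounded quantile, both close to the hand-computed $n\geq 3953$. Since the true $t_{k;0.995}$ is slightly larger than $z_{0.995}$, the exact threshold sits a little higher still.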

mathmari said:

Let's consider the case $n-1<44$. Here we apply a two-sample t-test.

The test statistic is $T=\frac{\overline{X}-\overline{Y}}{S\cdot \sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}}$ with $S=\sqrt{\frac{(n_X-1){S_X'}^2+(n_Y-1){S_Y'}^2}{n_X+n_Y-2}}$, right?

So, we have that $S=\sqrt{\frac{(n-1)14^2+(n-1)20^2}{n+n-2}}=\sqrt{\frac{(n-1)196+(n-1)400}{2n-2}}=\sqrt{\frac{596(n-1)}{2(n-1)}}=\sqrt{\frac{596}{2}}\approx 17.2627$.

Therefore we get $T=\frac{79-80}{17.2627\cdot \sqrt{\frac{1}{n}+\frac{1}{n}}}=\frac{-\sqrt{n}}{17.2627\cdot \sqrt{2}}=-0.0409615 \sqrt{n}$.

How could we determine $t_{k;0.995}$ here? We cannot approximate the t-distribution by a normal distribution for $n<30$.

(Wondering)

How about checking every $n$ between $1$ and $44$?
Maybe we can find a pattern so that we have to check fewer values. (Wondering)

mathmari · Apr 18, 2018

As I read now, the null hypothesis is rejected if $|T|>t_{1-\alpha/2;n_X+n_Y-2}$, isn't it?

Then we have the following:
$$|T|>t_{1-\alpha/2;n_X+n_Y-2}\Rightarrow |T|>t_{0.995;n+n-2}\Rightarrow 0.0409615 \sqrt{n}>t_{0.995;2n-2}$$

To check that for every $n$ between $1$ and $44$ (i.e. for every $2n-2$ between $0$ and $86$) in R, do we write [m]qt(0.01, 0 : 86)[/m]? If so, we get only negative values, and that would mean that the above holds for every $n$.

(Wondering)

I like Serena · Apr 18, 2018

Shouldn't we check [m]qt(0.995, 0 : 86)[/m]? (Wondering)

mathmari · Apr 18, 2018

Oh yes (Blush)

So, on the left side of the inequality the largest number that we get, i.e. for $n=44$, is about $0.271708$. On the right side every number is at least $2.634212$.
So that inequality doesn't hold for any $n$.

Is this correct? (Wondering)

I like Serena · Apr 18, 2018

Looks correct to me. (Nod)
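
The same conclusion can be cross-checked without a t-table, as a sketch in Python. It relies on the fact that every critical value $t_{k;0.995}$ with finite $k$ exceeds $z_{0.995}$, so the normal quantile serves as a lower bound for the right side. The variable names are illustrative, and $n$ starts at $2$ because $n=1$ would give zero degrees of freedom.

```python
import math
from statistics import NormalDist

# Lower bound for every t critical value: t_{k;0.995} > z_{0.995} for any finite k.
z_lower_bound = NormalDist().inv_cdf(0.995)

# |T| = sqrt(n / 596) for the pooled test with S_X' = 14, S_Y' = 20 and mean difference 1.
lhs = {n: math.sqrt(n / (14**2 + 20**2)) for n in range(2, 45)}
max_lhs = max(lhs.values())

# If |T| never exceeds the lower bound, it never exceeds the actual critical value either.
can_reject_for_some_n = any(value > z_lower_bound for value in lhs.values())
print(round(max_lhs, 6), round(z_lower_bound, 4), can_reject_for_some_n)
```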

mathmari · Apr 18, 2018

Great! Thank you so much! (Yes)
