# Stats: Approximating a binomial with a normal distribution

1. Mar 16, 2016

### Amcote

1. The problem statement, all variables and given/known data
A multiple choice test consists of a series of questions, each with four possible answers.

How many questions are needed in order to be 99% confident that a student who guesses blindly at each question scores no more than 35% on the test?

2. Relevant equations

So I know that this is a binomial setting with p=0.25, and 'n' is what we are trying to solve for.
For a binomial, μ=n*p and σ=sqrt(n*p*(1-p)).
P(B(n,0.25)≤0.35*n)=0.99
Because the binomial is discrete and we approximate it with a continuous normal, we must use a continuity correction, in this case '+0.5'.
Z= (x - μ)/σ

3. The attempt at a solution

First I should say I know the answer is supposed to be n≥92.

So how I start this problem is I use the standardizing formula

Z= (x - μ)/σ

which in this case would be

Z= (0.35*n + 0.5 - n*0.25)/sqrt(n*0.25*0.75)

This simplifies to

Z=(0.1*n + 0.5)/(sqrt(n)*sqrt(0.1875))

I think what I have done so far is correct. Where I get confused is finding a value for Z.

I thought what I have to do is something like:

P(B(n,0.25)≤0.35*n)=Φ((0.1*n + 0.5)/(sqrt(n)*sqrt(0.1875))) = 0.99

so, Φ((0.1*n + 0.5)/(sqrt(n)*sqrt(0.1875))) = 0.99

Looking up an upper-tail area of 0.01 (cumulative 0.99) in the table gives z = 2.33, so Φ(2.33)=Φ((0.1*n + 0.5)/(sqrt(n)*sqrt(0.1875)))

and so I should be able to set those equal and solve for n:

2.33 = (0.1*n + 0.5)/(sqrt(n)*sqrt(0.1875))

When I solve for n I get a quadratic, but neither of the answers I get is the correct answer.
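
For reference, the equation above is a quadratic in sqrt(n), so it can be checked numerically. The following is only a minimal sketch, assuming z = 2.33 and the plain '+0.5' correction exactly as written above; the function name `solve_n` and its parameters are made up for illustration.

```python
import math

def solve_n(z=2.33, p=0.25, cutoff=0.35, correction=0.5):
    """Solve z = ((cutoff - p)*n + correction) / sqrt(n*p*(1-p)) for n.

    With s = sqrt(n) this becomes a*s**2 - b*s + c = 0, where
    a = cutoff - p, b = z*sqrt(p*(1-p)), c = correction.
    Returns both roots expressed as n = s**2, smallest first.
    """
    a = cutoff - p
    b = z * math.sqrt(p * (1 - p))
    c = correction
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real solution for these inputs")
    root = math.sqrt(disc)
    s_values = [(b - root) / (2 * a), (b + root) / (2 * a)]
    return [s * s for s in s_values]

n_small, n_large = solve_n()
print(n_small, n_large)           # the larger root is the candidate for n
print(math.ceil(n_large))         # smallest integer n at or above that root
```

Substituting each root back into the original equation is a quick way to catch arithmetic slips in the hand calculation.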

Any help would be appreciated.

Thanks!

Last edited: Mar 16, 2016
2. Mar 16, 2016

### Ray Vickson

When you write P(B(n,0.25)≥0.35*n)=0.99 you are asking to be 99% sure that the student scores at least 35% (i.e., 35% or better). That was not what you were asked!

3. Mar 16, 2016

### Amcote

Sorry, I made a typo in just that sentence; I meant P(B(n,0.25)≤0.35*n)=0.99.

4. Mar 16, 2016

### Ray Vickson

In the context of this particular problem, just blindly using the "1/2 correction" is a mistake. What would be correct would be to ask for the solution of
$$\Phi \left( \frac{.5 + \lfloor 0.35 n \rfloor -.25 n}{.433 \sqrt{n}} \right) = 0.99,$$
where "$\lfloor 0.35 n \rfloor$" is the largest integer $\leq 0.35 n$.

Personally, I would not try to solve that exactly as written; instead (if I were doing the question) I would drop rounding-down and the 1/2-correction altogether and just solve the resulting very simple problem. Then, if I really wanted to be sure of the solution, I would check manually one or two values of $n$ surrounding the solution, either using the exact binomial or the normal approximation with the more involved form of 1/2-correction indicated above.
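
The procedure described above (solve the simple uncorrected problem, then check a few neighboring values of $n$ with the exact binomial) could be sketched roughly as follows. This is an illustration only: the helper names, the choice z = 2.326, and the window of three values on either side are assumptions, not part of the original question.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def simple_n(z=2.326, p=0.25, cutoff=0.35):
    """n from z = (cutoff - p)*n / sqrt(n*p*(1-p)): no rounding, no correction."""
    return (z * math.sqrt(p * (1 - p)) / (cutoff - p)) ** 2

def exact_prob(n, p=0.25, cutoff_pct=35):
    """P(X_n <= floor(0.35*n)); the floor uses integer arithmetic to avoid float error."""
    k = (cutoff_pct * n) // 100
    return k, binom_cdf(k, n, p)

n0 = round(simple_n())
for n in range(n0 - 3, n0 + 4):
    k, prob = exact_prob(n)
    flag = "ok" if prob >= 0.99 else "too small"
    print(n, k, f"{prob:.6f}", flag)
```

Computing the floor as `(35 * n) // 100` rather than `math.floor(0.35 * n)` sidesteps cases where `0.35 * n` lands a hair below an integer in floating point.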

Last edited: Mar 16, 2016
5. Mar 16, 2016

### Amcote

It was made clear by my instructor that we should "apply a normal approximation with a continuity correction" for this problem. But even if I ignore all that and do what you suggest (as it should work that way) I get:

$$\Phi \left( \frac{ 0.1 n }{.433 \sqrt{n}} \right) = \Phi \left( 2.33 \right) = 0.99,$$

so,

$$\frac{ 0.1 n }{.433 \sqrt{n}} = 2.33$$

With all this I get $n = 101.786$,

and if I actually include the correction I get $n_1=101.539$ and $n_2=0.2453$ from the quadratic.

None of these are the correct answer so I'm wondering what it is I am doing wrong.

Thanks

6. Mar 16, 2016

### Ray Vickson

Further to my response in #4: for $X_n \sim \text{Binom}(n, .25)$ the probabilities $P(X_n \leq .35 n)$ are not monotone in $n$ over short intervals of $n$, so if you plot $P(X_n \leq .35 n)$ vs. $n$ you get a graph with a "sawtooth" behavior, which rises over the long run but wiggles in the short run. The normal approximation with modified 1/2-correction behaves that way too. The reason is that as $n$ increases the integer values $N_n = \lfloor .35 n \rfloor$ in the events $\{ X_n \leq .35 n \} = \{ X_n \leq N_n \}$ are non-decreasing but sometimes remain constant for a few neighboring values of $n$. If $N_n = N_{n+1}$ the probability $P(X_k \leq N_k)$ can go down as $k$ increases from $n$ to $n+1$. That happens because the distribution of $X_{n+1}$ is shifted to the right, but is narrower than that of $X_n$, so the end result could be a decrease or an increase in the probability for "$\leq N_n$". For example, for $n$ going from 79 to 86 the values of $N_n$, $P_{\text{exact}} = P(\text{Binom}(n, .25) \leq N_n)$ and $P_{\text{normal}} = \Phi((.5 + \lfloor .35 n \rfloor -.25 n)/\sqrt{.1875n})$ are
$$\begin{array}{cccc}
n & N_n & P_{\text{exact}} & P_{\text{normal}} \\
79 & 27 & 0.975007 & 0.977978 \\
80 & 28 & 0.983370 & 0.985907 \\
81 & 28 & 0.980154 & 0.982868 \\
82 & 28 & 0.976467 & 0.979337 \\
83 & 29 & 0.984286 & 0.986724 \\
84 & 29 & 0.981281 & 0.983895 \\
85 & 29 & 0.977840 & 0.980611 \\
86 & 30 & 0.985153 & 0.987495
\end{array}$$
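
A table of this kind can be recomputed with a few lines of Python. The sketch below is one possible way to do it, using only the standard library; the helper names are invented for illustration, and the floor is taken with integer arithmetic so that cases like $n = 80$ are not affected by floating-point rounding.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def table_row(n, p=0.25):
    """Return (n, N_n, exact probability, corrected normal approximation)."""
    big_n = (35 * n) // 100                      # N_n = floor(0.35*n)
    p_exact = binom_cdf(big_n, n, p)
    p_normal = phi((0.5 + big_n - p * n) / math.sqrt(n * p * (1 - p)))
    return n, big_n, p_exact, p_normal

rows = [table_row(n) for n in range(79, 87)]
for n, big_n, p_exact, p_normal in rows:
    print(f"{n:3d} {big_n:3d} {p_exact:.6f} {p_normal:.6f}")
```

Extending the range past 86 shows where each sawtooth peak first reaches 0.99 and where the dips stop falling below it.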

Last edited: Mar 16, 2016
7. Mar 16, 2016

### Ray Vickson

Without the continuity correction: if you use z = 2.33 you get your n = 101.786, but if you use the more accurate value z = 2.326 you get n = 101.436. OK, they are not that different, but one of them rounds to 102 while the other rounds to 101.
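
The sensitivity to the z value is easy to see numerically. The following sketch assumes the same rounded standard-deviation factor .433 used earlier in the thread; `n_without_correction` is a made-up helper name, and `statistics.NormalDist` is used only to get an unrounded 99% quantile for comparison.

```python
from statistics import NormalDist

def n_without_correction(z, sd_factor=0.433, gap=0.10):
    """Solve gap*n / (sd_factor*sqrt(n)) = z for n, i.e. n = (z*sd_factor/gap)**2."""
    return (z * sd_factor / gap) ** 2

z_table = 2.33                           # two-decimal table value
z_better = 2.326                         # three-decimal value
z_full = NormalDist().inv_cdf(0.99)      # unrounded 99% quantile

for z in (z_table, z_better, z_full):
    print(f"z = {z:.6f}  ->  n = {n_without_correction(z):.3f}")
```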

More serious, though, is the non-monotone behavior of the probability, as explained in post #6. That means that you can have several nearby solutions to the required inequality $P(\text{Binom}(n,.25) \leq .35 n) = P(\text{Binom}(n,.25) \leq \lfloor .35 n \rfloor) \geq 0.99$. This happens in both the exact analysis and in the normal approximation (with 1/2-correction included after the rounding down operation).

By the way: your statement that the answer is supposed to be n≥92 is misleading: n around 92 is too small to achieve the 99% probability.

Last edited: Mar 16, 2016