# Binomial Distribution Question

Hi, I am new here, and my name is Jonas. I'm a CS major at a university in the Northeast US. I'm a senior and wrapping up degree requirements which include a science track. I chose Chemistry because Physics was full.

The chemistry exams are multiple choice (because you couldn't grade 300 exams in a timely fashion any other way), choices A-E, and have 25 questions.

It turns out that the answers to these exams have, it seems to me, an unlikely distribution of A's, B's, C's, D's, and E's.

I was wondering how I would find the likelihood that, on a given exam, every choice appears no more than 6 times and no fewer than 4 times.

I was thinking it'd be easier to find 1 - (probability of 7 or more of the same selection + probability of 3 or fewer of the same selection). Assuming each choice is equally likely (p = 1/5), that is 1 - ((summation from k = 7 to k = 25 of (25 choose k) * (1/5)^k * (4/5)^(25-k)) + (summation from k = 0 to k = 3 of (25 choose k) * (1/5)^k * (4/5)^(25-k))).

However, it seems that this would be the likelihood that there are no more than 6, no less than 4 of one specific choice (A-E), rather than all choices. Is this correct, or am I totally off track?
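For reference, the single-choice calculation can be sketched in Python. This is only a sketch, assuming each question's answer is drawn independently and uniformly from A-E (p = 1/5), so that the count of any one letter over 25 questions is Binomial(25, 1/5). The helper name `binom_pmf` is illustrative, not from the thread.

```python
from math import comb

N_QUESTIONS = 25   # questions per exam
P_CHOICE = 1 / 5   # assumption: each of A-E is equally likely, independently


def binom_pmf(k, n=N_QUESTIONS, p=P_CHOICE):
    """Probability that one specific choice is the answer exactly k times."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Direct route: sum over k = 4, 5, 6
p_direct = sum(binom_pmf(k) for k in range(4, 7))

# Complement route: 1 - (P(7 or more) + P(3 or fewer))
p_complement = 1 - (
    sum(binom_pmf(k) for k in range(7, N_QUESTIONS + 1))
    + sum(binom_pmf(k) for k in range(0, 4))
)

print(round(p_direct, 6), round(p_complement, 6))
```

Both routes describe one specific letter only; neither says anything yet about all five letters falling in the 4-to-6 range at once.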

Here is what made me so curious. The reason there are four forms per exam is to prevent cheating. Both the person to your right and the person to your left will be working from different forms, as will the people in front of and behind you. Could someone point me in the right direction here?

Thanks!

http://img600.imageshack.us/img600/1001/47634578.png [Broken]

Here, I have an Excel table with the probabilities I mentioned. The probability that 1 choice (A-E) will have 4, 5, or 6 occurrences is .546042. The probability that all would, should at best be .048543 (that's .546042 ^ 5). But this ignores the fact that there are only 25 questions, so it is impossible for all choices to have 6 occurrences. I am not sure how to rectify this, but maybe you guys will be.

http://img577.imageshack.us/img577/3518/binomialdistributions.png [Broken]

chiro
Hey JonasJSchreibe.

My recommendation is you use what is called a multinomial distribution and a Pearson Chi-Square Goodness of Fit (GOF) test.

You can estimate the proportions from the sample: p_i = (number of times the ith possibility occurs) / (total number of counts).
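As a small illustration of that estimate, here is a sketch in Python. The answer key below is made up purely for demonstration; it is not data from the exams discussed in this thread.

```python
from collections import Counter

# Hypothetical 25-question answer key, invented for illustration only
answer_key = "ABCDEABCDEABCDEABCDEABCDE"

counts = Counter(answer_key)   # occurrences of each choice
total = len(answer_key)        # total number of questions

# Sample estimate p_i = (count of choice i) / (total count)
proportions = {choice: counts.get(choice, 0) / total for choice in "ABCDE"}
print(proportions)
```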

Have you used a Chi-Square GOF test before?

I don't believe I have. However, I realized that the only 3 possibilities for between 4 and 6 occurrences for choices A-E with 25 trials are {5,5,5,5,5}, {5,5,5,4,6}, or {5,4,6,4,6}. Should this help me in any way?

I suppose I could add up the number of answer keys for (5,5,5,5,5), for all permutations of (5,5,5,4,6), and for all permutations of (5,4,6,4,6), and then divide the total by 5^25. Would that yield the correct result?
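That counting approach can be sketched directly in Python. The sketch assumes every one of the 5^25 possible answer keys is equally likely, and it enumerates the ordered count tuples (A, B, C, D, E) by brute force rather than listing permutations by hand. The function name `multinomial` is illustrative.

```python
from itertools import product
from math import factorial

N_QUESTIONS = 25
N_CHOICES = 5


def multinomial(counts):
    """Number of answer keys with exactly these per-choice counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


# Every ordered assignment of counts to (A, B, C, D, E) with each count
# in 4..6 and the counts summing to 25
favorable = sum(
    multinomial(counts)
    for counts in product(range(4, 7), repeat=N_CHOICES)
    if sum(counts) == N_QUESTIONS
)

# Assumption: all 5**25 answer keys are equally likely
p_all_between_4_and_6 = favorable / N_CHOICES**N_QUESTIONS
print(p_all_between_4_and_6)
```

Under these assumptions the result comes out near 0.08. It need not match the .546042 ^ 5 product, because the five counts are not independent: they must sum to 25.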

chiro
If you want to see whether your observed (i.e. sample) data is different from some expected distribution, a typical way to test this is to use the Chi-Square Goodness of Fit.

What you can do is construct a variety of these tests, one for each distribution under your criteria, and see if you can reject all of them (or a significant number of them).

This will basically give you a way to statistically gauge the answer to your question.

Looking at the Wikipedia page for Goodness of Fit scares me. I seem to remember a least-squares regression analysis which was used to determine causality vs correlation that looked a bit like this. I balked at it when I saw the Wikipedia page. Hopefully there is a better resource online to help me understand this or just run the calculations with marginal participation on my part. It's just something I had an interest in, not for work or school or anything, just curiosity.

EDIT: I have covered the multinomial distribution in a probability in computing class, and I recognize the probability mass function as something that could help me. Is there any reason to use more advanced methods to solve this problem?

chiro
The GOF is a lot simpler than it looks.

You basically calculate the frequency cells for the expected distribution and then use the formula to get the X^2 statistic. (In other words add the (Oi-Ei)^2/Ei terms to get X^2).

Then you compare this test statistic to a Chi-Square distribution with the right degrees of freedom (for a multinomial with n choices, the DF is n - 1).

If P(Chi-Square > X^2) < alpha (usually 0.05), then you reject the hypothesis that the observed and expected distributions are statistically similar.

That's all there is to it.
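The steps above can be sketched in plain Python. The observed counts are hypothetical, and the expected distribution is assumed to be uniform over A-E. To avoid depending on a statistics library, the tail probability uses the closed form for a Chi-Square distribution with an even number of degrees of freedom, which covers the DF = 4 case here; the function names are illustrative.

```python
from math import exp, factorial


def chi_square_stat(observed, expected):
    """X^2 = sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """P(Chi-Square_df > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


# Hypothetical counts of A, B, C, D, E on one 25-question exam
observed = [5, 4, 6, 5, 5]
# Assumption: the expected distribution is uniform, 25 / 5 = 5 per choice
expected = [sum(observed) / len(observed)] * len(observed)

x2 = chi_square_stat(observed, expected)
df = len(observed) - 1            # n - 1 for n choices
p_value = chi_square_sf_even_df(x2, df)

alpha = 0.05
reject = p_value < alpha
print(x2, p_value, reject)
```

With these made-up counts the statistic is small and the p-value is large, so the test does not reject the uniform hypothesis.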

Well, my data puts the upper bound at .048543; I think that's correct. For two consecutive exams, I can say that the upper bound is .048543^2 ≈ .0024, or odds of roughly 420:1, i.e. very unlikely. I just wanted to determine whether this freak occurrence is just that, or by design. If it is the latter, I can use this information when I'm stuck on an answer or two in the final exam to gain an edge.

chiro