# Probability problem

1. Jun 12, 2013

### aaaa202

1. The problem statement, all variables and given/known data
The first 18 pages of a 269-page book are examined for typos, and 7 are found. Given this, find the probability that the entire book contains 7, 8, 9, ... typos in total.

2. Relevant equations
Probably the binomial distribution but I am very unsure.

3. The attempt at a solution
I am very unsure where to start. I might want to use the binomial distribution somewhere, but on the other hand it doesn't really fit the problem. I am not looking for an answer, merely a hint as to where to start. Thank you :)

2. Jun 12, 2013

### LCKurtz

I would think a logical assumption would be that the errors are uniformly distributed throughout the book. And your data suggests the probability that a given page has an error is 7/18. If you have independence among pages and you let $X_i = 1$ if there is an error on page $i$ and $0$ otherwise, aren't you inquiring about $X = X_1+X_2+\dots+X_{269}$? So...

The more I think about it, the less sure I am. Hint not guaranteed.

Last edited: Jun 12, 2013
3. Jun 12, 2013

### awkward

The Poisson distribution might work well here.

4. Jun 12, 2013

### aaaa202

If the probability that a page has an error is 7/18, what if there were 21 errors? Would it then be 21/18? That wouldn't make sense.

5. Jun 12, 2013

### haruspex

By what logic would you multiply a number of errors per page by a number of errors? Multiplying by a number of pages would be reasonable, and that would indeed give you the expected (average) number of errors in that many pages.
A precise answer is not possible because we're not told how many letters per page, so, as suggested, a Poisson distribution seems appropriate. (Poisson is an approximation to the binomial which works well when there's a large number of trials and relatively few 'successes'.)
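
A minimal sketch of the Poisson suggestion, under a loudly stated assumption: the typo rate is taken to be exactly the observed 7/18 per page (a plug-in estimate, ignoring the uncertainty in that rate), and typos on the 251 unexamined pages are treated as Poisson with that rate. The names `mu_rest`, `poisson_pmf`, and `prob_total` are made up for illustration.

```python
import math

PAGES_TOTAL = 269
PAGES_CHECKED = 18
TYPOS_FOUND = 7

# Assumption: the true rate equals the observed rate of 7 typos per 18 pages.
rate_per_page = TYPOS_FOUND / PAGES_CHECKED

# Expected number of typos on the pages that were not examined.
mu_rest = rate_per_page * (PAGES_TOTAL - PAGES_CHECKED)


def poisson_pmf(k, mu):
    """Poisson probability of exactly k events with mean mu (log-space for stability)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def prob_total(m):
    """P(book has m typos in total), given 7 already found, under the plug-in model."""
    extra = m - TYPOS_FOUND
    if extra < 0:
        return 0.0  # the total cannot be smaller than what was already found
    return poisson_pmf(extra, mu_rest)


if __name__ == "__main__":
    for m in (7, 50, 100, 105, 150):
        print(m, prob_total(m))
```

Under this model the total is 7 plus a Poisson count for the rest of the book, so totals near 7 come out very unlikely and the mass sits around 7 + `mu_rest`.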

6. Jun 13, 2013

### HallsofIvy

If you were told that there were 21 errors in 18 pages then there would, indeed, be an average of 21/18 = 7/6 errors per page. Why would that not make sense? Because it is larger than one? Are you assuming that you cannot have more than one error per page? Why?

7. Jun 13, 2013

### aaaa202

I thought he used 21/18 as a probability. Anyway, as it turns out, the correct answer is not found using the Poisson distribution.
Rather, use Bayes' theorem:

$$P(m \text{ errors} \mid 7 \text{ errors in first 18 pages}) = C \cdot P(7 \text{ errors in first 18 pages} \mid m \text{ errors})$$

where C is a normalization constant.

8. Jun 13, 2013

### Ray Vickson

This tells you precisely nothing. If $E_m$ = {m errors in book} and $E_{7,18}$ = {7 errors in first 18 pages}, we have
$$P(E_m|E_{7,18}) = P(E_{7,18}|E_m) \frac{P(E_m)}{P(E_{7,18})}$$
so your $C$ equals $P(E_m)/P(E_{7,18})$. Now $P(E_{7,18}|E_m)$ is computable from a model (say, a uniform distribution of m errors throughout the book), but we still need to know $P(E_m)$ to get anywhere.

Last edited: Jun 13, 2013
9. Jun 13, 2013

### aaaa202

P(E_m) is just the a priori probability that the book contains m errors given no background information. We can set that to a constant.

10. Jun 13, 2013

### Ray Vickson

You seem to be under the impression that problems of this type have "right" and "wrong" answers. That is not the case.

You are arguing for a Bayesian analysis using a so-called uniform prior, but in problems like this one it is perfectly acceptable for two different people to use two different "priors", and there is not really any way to say for sure that one is right and the other is wrong. For example, an editor or a publisher may have a lot of experience regarding misprints, and might use a prior very different from the uniform one you propose. Besides that, there are the so-called "classical statisticians" who would reject the use of Bayes Theorem entirely in such problems. (Do not misinterpret what I say: of course Bayes Theorem is a true theorem in Probability Theory, but the issue is how you apply it in certain situations---or, rather, whether it applies at all in some contexts.)

Even if we accept the Bayesian viewpoint, you still need a probability model for $P(E_{7,18}|E_m)$. What model would YOU use? What actual answer would you get?

11. Jun 13, 2013

### aaaa202

$$P(7 \text{ errors in first 18 pages} \mid m \text{ errors}) = K(m,7) \left(\frac{18}{269}\right)^{7} \left(1-\frac{18}{269}\right)^{m-7},$$

where $K(m,7) = \binom{m}{7}$ is the binomial coefficient. I don't see what model to use other than a binomial distribution. You could then find C by computing an infinite sum, but I don't think you can evaluate the sum; it's quite ugly.
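
A rough numerical sketch of this approach, assuming a uniform prior over the total number of errors m and truncating the infinite sum at a hypothetical cutoff `M_MAX` (chosen here so that the neglected tail is negligible). The names `likelihood`, `posterior`, and `M_MAX` are illustrative only.

```python
import math

P_FIRST = 18 / 269   # chance that a given typo lands in the first 18 pages
K_FOUND = 7          # typos actually found there
M_MAX = 2000         # assumption: truncation point for the infinite sum


def likelihood(m, k=K_FOUND, p=P_FIRST):
    """Binomial probability that exactly k of m typos fall in the first 18 pages."""
    if m < k:
        return 0.0
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


# With a uniform prior the posterior is just the normalized likelihood.
weights = {m: likelihood(m) for m in range(K_FOUND, M_MAX + 1)}
norm = sum(weights.values())           # this is 1/C in the notation above
posterior = {m: w / norm for m, w in weights.items()}
posterior_mean = sum(m * q for m, q in posterior.items())

if __name__ == "__main__":
    print("normalizing sum:", norm)
    print("posterior mean of m:", posterior_mean)
    for m in (7, 50, 100, 150, 200):
        print(m, posterior[m])
```

The truncated sum only approximates the infinite one; raising `M_MAX` is a quick way to check that the result has stabilized.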

12. Jun 13, 2013

### haruspex

You're overlooking that pages are arbitrary boundaries here. Typos occur at a much finer granularity. E.g. if there are 2000 letters to a page then the info is effectively that there were 7 errors in 36000 letters. This low hit rate makes Poisson entirely appropriate.
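
To illustrate the granularity point, here is a small sketch comparing a per-letter binomial model with its Poisson approximation. It assumes the 2000 letters per page from the example above and, purely for illustration, a per-letter error probability equal to the observed 7/36000.

```python
import math

N_LETTERS = 18 * 2000             # letters examined, using the example figure
TYPOS_FOUND = 7
p_letter = TYPOS_FOUND / N_LETTERS  # assumption: plug-in per-letter error rate
lam = N_LETTERS * p_letter          # matching Poisson mean (7 typos)


def binom_pmf(k, n, p):
    """Exact binomial probability of k errors among n letters."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """Poisson probability of k errors with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


# Largest pointwise gap between the two models over a range covering the bulk.
max_diff = max(
    abs(binom_pmf(k, N_LETTERS, p_letter) - poisson_pmf(k, lam))
    for k in range(0, 31)
)

if __name__ == "__main__":
    for k in (0, 3, 7, 10, 15):
        print(k, binom_pmf(k, N_LETTERS, p_letter), poisson_pmf(k, lam))
    print("max pointwise difference:", max_diff)
```

With this many trials and so few hits, the two sets of probabilities should agree to several decimal places, which is the sense in which Poisson stands in for the binomial here.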

13. Jun 13, 2013

### aaaa202

I think you misunderstand. The probability for a specific typo to occur in the first 18 pages is 18/269. This has nothing to do with whether my granularity is fine enough.

14. Jun 13, 2013

### Joffan

C=18/269 for a uniform prior (all total error counts equally likely).

All right, I admit, it's an empirical result from evaluating the sum in Excel. But completely consistent as I vary the parameters: sum(p(E_found|E_i)) = 1/(proportion_examined).
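
The empirical result appears to match a closed form. As a sketch, write $p$ for the proportion examined (here $18/269$), $k$ for the number of errors found (here $7$), and assume the binomial likelihood $\binom{m}{k} p^k (1-p)^{m-k}$ used earlier in the thread. Substituting $j = m - k$ and using the negative binomial series $\sum_{j \ge 0} \binom{j+k}{k} x^j = (1-x)^{-(k+1)}$ for $|x| < 1$:

$$\sum_{m=k}^{\infty} \binom{m}{k} p^{k} (1-p)^{m-k} = p^{k} \sum_{j=0}^{\infty} \binom{j+k}{k} (1-p)^{j} = p^{k} \cdot \frac{1}{p^{k+1}} = \frac{1}{p}$$

So under a uniform prior the normalization constant would be $C = p = 18/269$, independent of $k$, which is consistent with the sum staying at 1/(proportion_examined) as the parameters vary.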

15. Jun 13, 2013

### haruspex

I was commenting on this statement
which I took to be a general statement about the problem, regardless of approach.