Probability Problem: Find Total # of Typos in 269-Page Book

  • Thread starter: aaaa202
  • Tags: Probability
AI Thread Summary
The discussion centers on calculating the probability of finding a specific number of typos in a 269-page book based on the discovery of 7 typos in the first 18 pages. Participants suggest using the binomial distribution but express uncertainty about its applicability, leaning towards the Poisson distribution as a more suitable model due to the low frequency of errors relative to the total number of pages. The conversation shifts to Bayesian analysis, with emphasis on the need for a prior probability model to effectively apply Bayes' theorem. There is debate over the appropriateness of different statistical approaches, including the implications of assuming uniform distribution of errors. Ultimately, the complexity of the problem highlights the challenges in determining an accurate probability model without additional information.
aaaa202

Homework Statement


The first 18 pages of a 269-page book are examined for typos and 7 are found. Given this, find the probability that the entire book contains 7, 8, 9, ... typos in total.

Homework Equations


Probably the binomial distribution but I am very unsure.


The Attempt at a Solution


I am very unsure where to start. I might want to use the binomial distribution somewhere, but on the other hand it doesn't really fit the problem. I am not looking for an answer, merely a hint as to where to start. Thank you :)
 

I would think a logical assumption would be that the errors are uniformly distributed throughout the book. And your data suggest the probability that a given page has an error is 7/18. If you have independence among pages and you let ##X_i = 1## if there is an error on page ##i## and ##0## otherwise, aren't you inquiring about ##X = X_1+X_2+\dots+X_{269}##? So...

[Edit] The more I think about it, I'm not so sure, hint not guaranteed :frown:
 
The Poisson distribution might work well here.
 
If the probability that a page has an error is 7/18, what if there were 21 errors? Would it then be 21/18? That wouldn't make sense.
 
aaaa202 said:
If the probability that a page has an error is 7/18, what if there were 21 errors? Would it then be 21/18? That wouldn't make sense.
By what logic would you multiply a number of errors per page by a number of errors? Multiplying by a number of pages would be reasonable, and that would indeed give you the expected (average) number of errors in that many pages.
A precise answer is not possible because we're not told how many letters per page, so, as suggested, a Poisson distribution seems appropriate. (Poisson is an approximation to the binomial which works well when there's a large number of trials and relatively few 'successes'.)
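As a rough illustration of the Poisson suggestion, here is a minimal sketch. It assumes typos arrive as a Poisson process whose rate is simply the observed 7/18 per page (a plug-in estimate, not something the problem statement gives us), so the count in the 251 unexamined pages is Poisson and independent of the 7 already found. The function names are made up for this example.

```python
import math

pages_total = 269
pages_checked = 18
typos_found = 7

# Assumption: the true rate equals the observed rate in the sample.
rate_per_page = typos_found / pages_checked
pages_left = pages_total - pages_checked          # 251 unexamined pages
mean_remaining = rate_per_page * pages_left       # expected typos not yet seen


def poisson_pmf(k, mu):
    """P(K = k) for a Poisson variable with mean mu, evaluated in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def prob_total(m):
    """P(total typos in the book = m), given that 7 were found in the sample."""
    k = m - typos_found                           # typos in the unexamined pages
    return poisson_pmf(k, mean_remaining) if k >= 0 else 0.0


expected_total = typos_found + mean_remaining
print(f"expected total typos: {expected_total:.1f}")
print(f"P(total = 100) = {prob_total(100):.4f}")
```

Under these assumptions the expected total is 7 × 269/18, roughly 104.6 typos. The sketch ignores the uncertainty in the rate itself, which is exactly what a Bayesian treatment would try to account for.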
 
aaaa202 said:
If the probability that a page has an error is 7/18, what if there were 21 errors? Would it then be 21/18? That wouldn't make sense.
If you were told that there were 21 errors in 18 pages then there would, indeed, be an average of 21/18 = 7/6 errors per page. Why would that not make sense? Because it is larger than one? Are you assuming that you cannot have more than one error per page? Why?
 
I thought he used 21/18 as a probability. Anyway, as it turns out, the correct answer is not found using the Poisson distribution.
Rather, use Bayes' theorem:

P(m errors | 7 errors in first 18 pages) = C * P(7 errors in first 18 pages | m errors)

where C is a normalization constant.
 
aaaa202 said:
I thought he used 21/18 as a probability. Anyway, as it turns out, the correct answer is not found using the Poisson distribution.
Rather, use Bayes' theorem:

P(m errors | 7 errors in first 18 pages) = C * P(7 errors in first 18 pages | m errors)

where C is a normalization constant.



This tells you precisely nothing. If ##E_m## = {m errors in book} and ##E_{7,18}## = {7 errors in first 18 pages}, we have
$$P(E_m|E_{7,18}) = P(E_{7,18}|E_m) \frac{P(E_m)}{P(E_{7,18})}$$
so your C is equal to ##P(E_m)/P(E_{7,18})##. Now ##P(E_{7,18}|E_m)## is computable from a model (say, uniform distribution of m errors throughout the book), but we still need to know ##P(E_m)## to get anywhere.
 
P(E_m) is just the a priori probability that the book contains m errors given no background information. We can set that to a constant.
 
  • #10
aaaa202 said:
P(E_m) is just the a priori probability that the book contains m errors given no background information. We can set that to a constant.

You seem to be under the impression that problems of this type have "right" and "wrong" answers. That is not the case.

You are arguing for a Bayesian analysis using a so-called uniform prior, but in problems like this one it is perfectly acceptable for two different people to use two different "priors", and there is not really any way to say for sure that one is right and the other is wrong. For example, an editor or a publisher may have a lot of experience regarding misprints, and might use a prior very different from the uniform one you propose. Besides that, there are the so-called "classical statisticians" who would reject the use of Bayes Theorem entirely in such problems. (Do not misinterpret what I say: of course Bayes Theorem is a true theorem in Probability Theory, but the issue is how you apply it in certain situations---or, rather, whether it applies at all in some contexts.)

Even if we accept the Bayesian viewpoint, you still need a probability model for ##P(E_{7,18}|E_m)##. What model would YOU use? What actual answer would you get?
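To make the point about priors concrete, here is a small sketch comparing two of them. It assumes each of the m typos lands in the examined pages independently with probability 18/269, so the likelihood is binomial. The "editor" prior (Poisson with mean 50) and the truncation point `M_MAX` are purely hypothetical choices for illustration.

```python
import math

P_EXAMINED = 18 / 269   # chance that any single typo falls in the first 18 pages
FOUND = 7
M_MAX = 2000            # assumption: truncate the sums here; later terms are negligible


def likelihood(m):
    """P(7 typos in first 18 pages | m typos in book) under the binomial model."""
    if m < FOUND:
        return 0.0
    return math.comb(m, FOUND) * P_EXAMINED**FOUND * (1 - P_EXAMINED)**(m - FOUND)


def posterior(prior):
    """Normalised posterior over m = 0..M_MAX for an unnormalised prior function."""
    weights = [likelihood(m) * prior(m) for m in range(M_MAX + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def flat_prior(m):
    # every total typo count equally likely a priori
    return 1.0


def editor_prior(m, mean=50.0):
    # hypothetical prior of someone who expects about 50 typos in a book
    return math.exp(m * math.log(mean) - mean - math.lgamma(m + 1))


def mean_of(dist):
    return sum(m * p for m, p in enumerate(dist))


mean_flat = mean_of(posterior(flat_prior))
mean_editor = mean_of(posterior(editor_prior))
print(f"posterior mean, flat prior:   {mean_flat:.1f}")
print(f"posterior mean, editor prior: {mean_editor:.1f}")
```

With the same data and the same likelihood, the posterior mean is about 119 typos under the flat prior and about 54 under the hypothetical editor prior, so the choice of prior really does drive the answer.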
 
  • #11
P(7 errors in first 18 pages | m errors) = (18/269)^7 * (1-18/269)^(m-7) * K(m,7), where K is the binomial coefficient. I don't see what other models to use than a binomial distribution. You could then find C by computing an infinite sum, but I don't think you can evaluate the sum; it's quite ugly.
 
  • #12
aaaa202 said:
P(7 errors in first 18 pages | m errors) = (18/269)^7 * (1-18/269)^(m-7) * K(m,7), where K is the binomial coefficient. I don't see what other models to use than a binomial distribution. You could then find C by computing an infinite sum, but I don't think you can evaluate the sum; it's quite ugly.
You're overlooking that pages are arbitrary boundaries here. Typos occur at a much finer granularity. E.g. if there are 2000 letters to a page then the info is effectively that there were 7 errors in 36000 letters. This low hit rate makes Poisson entirely appropriate.
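A quick numerical check of how close the two models are at this granularity. This is only a sketch: it takes the 2000 letters per page from the example above and assumes each letter is independently a typo with probability 7/36000, chosen so that the expected count in 18 pages is 7.

```python
import math

letters = 18 * 2000          # 36,000 letters in the examined pages
p_letter = 7 / letters       # assumed per-letter typo probability (mean count = 7)


def binom_pmf(k, n, p):
    """Exact binomial probability of k typos among n letters."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def poisson_pmf(k, mu):
    """Poisson probability of k typos when the mean count is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


# Largest pointwise difference between the two distributions for k = 0..30
max_gap = max(
    abs(binom_pmf(k, letters, p_letter) - poisson_pmf(k, 7.0))
    for k in range(31)
)
print(f"P(7 typos): binomial {binom_pmf(7, letters, p_letter):.6f}, "
      f"Poisson {poisson_pmf(7, 7.0):.6f}")
print(f"largest gap over k = 0..30: {max_gap:.2e}")
```

With this many trials and so few hits, the binomial and Poisson probabilities differ only slightly, which is the sense in which Poisson is a good approximation here.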
 
  • #13
I think you misunderstand. The probability for a specific typo to occur in the first 18 pages is 18/269. This has nothing to do with whether my granularity is fine enough.
 
  • #14
aaaa202 said:
P(7 errors in first 18 pages | m errors) = (18/269)^7 * (1-18/269)^(m-7) * K(m,7), where K is the binomial coefficient. I don't see what other models to use than a binomial distribution. You could then find C by computing an infinite sum, but I don't think you can evaluate the sum; it's quite ugly.

C=18/269 for a uniform prior (all total error counts equally likely).

All right, I admit, it's an empirical result from evaluating the sum in Excel. But it is completely consistent as I vary the parameters: sum(p(E_found|E_i)) = 1/(proportion_examined).
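The same check can be reproduced outside Excel. The sketch below sums the binomial likelihood over m and compares the result with 1/(proportion examined). The cutoff `M_MAX` is an assumption, since the true sum is infinite but the terms die off quickly. The result is what you would expect from the negative binomial distribution, whose probabilities C(m,7) p^8 (1-p)^(m-7) sum to 1 over m = 7, 8, 9, ...

```python
import math

p = 18 / 269      # proportion of the book examined
found = 7
M_MAX = 3000      # assumption: truncation point for the infinite sum


def likelihood(m):
    """P(7 typos in first 18 pages | m typos in book), binomial model."""
    return math.comb(m, found) * p**found * (1 - p)**(m - found)


total = sum(likelihood(m) for m in range(found, M_MAX + 1))
C = 1 / total     # normalisation constant for a flat prior


def posterior(m):
    """P(m typos in book | 7 found in first 18 pages) under the flat prior."""
    return C * likelihood(m) if m >= found else 0.0


print(f"sum of likelihoods: {total:.6f}  (1/p = {1 / p:.6f})")
print(f"C = {C:.6f}  (18/269 = {18 / 269:.6f})")
print(f"P(exactly 7 typos in the whole book) = {posterior(7):.3e}")
```

So with a flat prior the posterior can be written in closed form as (18/269) * K(m,7) * (18/269)^7 * (1-18/269)^(m-7) for m = 7, 8, 9, ...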
 
  • #15
aaaa202 said:
I think you misunderstand. The probability for a specific typo to occur in the first 18 pages is 18/269. This has nothing to do with whether my granularity is fine enough.
I was commenting on this statement
I don't see what other models to use than a binomial distribution.
which I took to be a general statement about the problem, regardless of approach.
 