
Elementary probability problem I can't get my head round! Help!

  1. Jun 20, 2011 #1
    There is a bag in front of me. I am told it contains either 100 red balls, or 50 red balls and 50 blue balls. If I pick n balls from the bag and they are all red, what is the probability, as a function of n (call it p(n)), that the bag contains 50 red balls and 50 blue balls? Of course, if n is greater than 50 then p(n)=0, but what about n less than or equal to 50?!
  3. Jun 20, 2011 #2
    You actually need a little more information. The following simple question should bring it into relief: what's the answer for the case n=0? You need a probability before you have started looking at the balls. (This is the basis of 'Bayesian inference' in which this probability is called a 'prior'). Without any more information, there is a natural prior; namely that either case is equally likely.

    Once you've got that, it becomes an exercise in conditional probability: what is the probability they are all red given that at least n of them are?
  4. Jun 20, 2011 #3
    Hmm, many thanks for your answer. It certainly provides some relief, but it also makes things a lot more complicated for me, as in the problem I am working on (which is analogous to the problem I have given here) it is hard to determine p(0).

    Given that the bag contains 50 red balls and 50 blue balls, the probability of picking n red balls and no blue balls is very small for n close to 50. Is this sufficient to say that if I picked n red balls and no blue balls, then the probability that the bag contains 50 red balls and 50 blue balls is very small?

  5. Jun 20, 2011 #4
    Hi b4826161! :smile:

    Your problem is modelled by something called the hypergeometric distribution. Try to read this article http://en.wikipedia.org/wiki/Hypergeometric_distribution

    The article contains all the information you need for solving this problem. But be sure to ask if something is not clear!
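    For what it's worth, here is a minimal sketch of the hypergeometric calculation for this particular bag, assuming the balls are drawn without replacement. The function name prob_all_red is purely illustrative and not from the article.

```python
from fractions import Fraction
from math import comb


def prob_all_red(n, red=50, blue=50):
    """P(the first n draws are all red | bag holds `red` red and `blue` blue).

    Assumes drawing without replacement (the hypergeometric setting).
    """
    total = red + blue
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of balls in the bag")
    # Ways to choose n balls from the red ones, over ways to choose n from all.
    # math.comb(red, n) is 0 when n > red, so the result is 0 for n > 50.
    return Fraction(comb(red, n), comb(total, n))


if __name__ == "__main__":
    for n in (0, 1, 2, 10, 50, 51):
        print(n, float(prob_all_red(n)))
```

    Note that this only gives the probability of the draws given the 50/50 bag, not the probability of the 50/50 bag given the draws.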
  6. Jun 20, 2011 #5

    Stephen Tashi

    Science Advisor

    No, it isn't sufficient. As henry_m pointed out, you need to know (or assume) something about the probability that the bag has 50 reds and 50 blues before any balls are drawn in order to draw a conclusion about what that probability is after the results of the drawing.

    And furthermore, the probability of 50 red and 50 blue may not be small. Suppose the "bag" is some kind of cell and normal cells have 50 red things and 50 blue things in them. Suppose that among all the cells that have ever been observed in labs, none has been found with 100 red things and no blue things. If a test detects 50 red things in a cell, are the other 50 likely to be blue or red? You have to postulate some probability for nature producing a cell with all 100 things red to answer this question, but it conveys the thought that a rare outcome from the test might be more likely than a mutated cell.

    Unless you are willing to postulate a prior probability, you cannot say anything about the probability that "the bag contains 50 reds and 50 blues given that 50 reds have been drawn". All you can quantify is the probability that "50 reds have been drawn given that the bag contained 50 red and 50 blue" and the probability that "50 reds have been drawn given the bag contained 100 reds".
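    As a worked illustration, those two quantifiable probabilities can be written out as follows, assuming the balls are drawn without replacement (an assumption the thread implies but does not state):

```latex
P(\text{first 50 draws red} \mid \text{50 red, 50 blue})
  = \frac{\binom{50}{50}}{\binom{100}{50}}
  = \frac{1}{\binom{100}{50}}
  \approx 9.9 \times 10^{-30}

P(\text{first 50 draws red} \mid \text{100 red}) = 1
```

    Neither number says how likely each kind of bag was to begin with, which is exactly the missing prior.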

    This is quite a common situation. When people have an idea and collect data to investigate it, they want to know "What is the probability that the idea is true given the data I collected?" Unless they are willing to use Bayesian statistics (i.e. assume a prior probability) all they can calculate is "What is the probability of the data I collected given that the idea is true?".

    When you read statements of statistical evidence, you often see claims that a certain thing is true "with 95% significance" or "at the 95% level". You may see a statement that a quantity is between -10.6 and 20.3 "with 95% confidence". These may sound like they are telling you "the probability that the idea is true given the observed data". But what this type of statistics is actually doing is telling you something about the probability of the data given the assumption that certain ideas are true.

    In many practical problems, you can come up with a reasonable estimate of prior probabilities. You can also take the "Maximum Entropy" approach to prior probabilities advocated by E.T. Jaynes.
  7. Jun 22, 2011 #6
    Thank you for your replies.

    @micro mass: Thank you for the link. I see how it is relevant; however, since I do not know the specific number of red and blue balls in the bag, I don't see how I can use this distribution.

    @Stephen Tashi: Thank you for your illustrations. I got my head round what henry_m was saying last night and you have helped reinforce it.

    Suppose I worked out the probabilities for n=0. How would I go about working out the probability of the bag containing 50 red balls and 50 blue balls given I had chosen n red balls in a row? I see how it's an exercise in conditional probability but I really am very poor at probability in general!

  8. Jun 22, 2011 #7
    Would the answer go as follows:

    Let A be the event of the bag containing 50 Red balls, 50 Blue balls and let P(A) = p

    Let N be the event of picking n red balls and no blue balls.

    Suppose I have picked n red balls. Then P(N) = 1.

    Hence P(A|N) = P(N|A)P(A)/P(N) = p x P(N|A). Since we can calculate P(N|A) explicitly as f(N), a function of n, we have:

    P(A|N) = p x f(N)
  9. Jun 22, 2011 #8

    Stephen Tashi

    Science Advisor

    Based on your previous posts, I'll assume that the alternative possibility is that the bag has 100 Red balls and that this has probability 1-p. The event "not-A" will be the event that the bag has 100 red balls.

    (We have to be careful to distinguish between upper case P and lower case p.)

    With this notation:
    P(N|A) = the probability the first N balls drawn are red given the bag has 50 red and 50 blue
    P(A|N) = the probability the bag has 50 red and 50 blue given that the first N balls drawn are red.

    I would compute P(A|N) this way:

    P(A|N) = P(A and N) / P(N) = P(N|A) P(A) / P(N) = (P(N|A) p) / P(N)
    = (P(N|A) p) / ( P(N and A) + P(N and not-A) )
    = (P(N|A) p) / ( P(N|A) P(A) + P(N|not-A) P(not-A) )
    = (P(N|A) p) / ( P(N|A) p + P(N|not-A) (1-p) )

    For 0 < N < 101, P(N|not-A) = 1, so

    P(A|N) = (P(N|A) p) / ( P(N|A) p + (1)(1-p) )

    If you say P(N|A) = f(N) then we get

    P(A|N) = ( f(N) p ) / ( f(N) p + 1 - p )

    For N > 50, f(N) = 0 and P(A|N) = 0
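
    The final formula can be sketched in Python. This is only an illustration under the thread's assumptions (only two possible bags, draws without replacement). The name posterior_mixed is made up here, and f(N) is taken to be the hypergeometric value C(50,N)/C(100,N), which the post above leaves unspecified.

```python
from fractions import Fraction
from math import comb


def posterior_mixed(n, p):
    """P(A | N): probability the bag is 50 red / 50 blue, given that the
    first n balls drawn were all red.

    p is the prior P(A); the alternative (100 red balls) has prior 1 - p.
    """
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("the prior p must lie in [0, 1]")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    # Assumed form of f(N) = P(N | A): hypergeometric, zero once n > 50.
    f = Fraction(comb(50, n), comb(100, n))
    evidence = f * p + (1 - p)  # P(N) = P(N|A) p + P(N|not-A) (1 - p)
    if evidence == 0:
        raise ValueError("observation is impossible under this prior")
    return f * p / evidence


if __name__ == "__main__":
    half = Fraction(1, 2)
    for n in (0, 1, 2, 10, 51):
        print(n, float(posterior_mixed(n, half)))
```

    With the "equally likely" prior p = 1/2, a single red ball already moves the probability of the 50/50 bag from 1/2 down to 1/3.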
  10. Jun 22, 2011 #9
    Very well explained. Many thanks!

    Out of interest, why doesn't my method work?

    Cheers Tashi, you're a star!
  11. Jun 22, 2011 #10

    Stephen Tashi

    Science Advisor

    Your method brings up a tangle of semantics when you say P(N) = 1. If a random variable does have a certain outcome in one sample, this does not mean that the probability of the outcome is 1 if we are talking about "the space of all possible outcomes". On the other hand, if we are talking about a restricted set of events where N always happens, then the event N ceases to be probabilistic. In that context, you can say that P(N) = 1.

    You can't mix up the two contexts. If you are in the space of events where N always happens then P(X|N) = P(X) since P(X) refers to the probability of X given the "background" information for the problem and this would include the information that N always happens. And if N always happens then P(N|X)= 1.
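
    One way to see the two contexts side by side is a small simulation. This is only a sketch under assumptions of my own choosing: the bag is picked at random with prior p, balls are drawn without replacement, and the function name and parameters are invented for illustration. In the full space of outcomes P(N) is clearly less than 1; conditioning on N just means keeping only the trials in which N happened.

```python
import random


def simulate(n, p, trials=100_000, seed=0):
    """Estimate P(N) and P(A | N) by simulation.

    A = bag holds 50 red and 50 blue (prior probability p);
    otherwise the bag holds 100 red.
    N = the first n balls drawn (without replacement) are all red.
    Requires 0 <= n <= 100 and a setup in which N can actually occur.
    """
    rng = random.Random(seed)
    mixed_bag = ["R"] * 50 + ["B"] * 50
    kept = 0        # trials where N happened
    kept_mixed = 0  # trials where N happened and the bag was the 50/50 one
    for _ in range(trials):
        is_mixed = rng.random() < p
        if is_mixed:
            all_red = all(ball == "R" for ball in rng.sample(mixed_bag, n))
        else:
            all_red = True  # every ball in the other bag is red
        if all_red:
            kept += 1
            kept_mixed += is_mixed
    # P(N) over all trials, and P(A | N) over the restricted set of trials.
    return kept / trials, kept_mixed / kept


if __name__ == "__main__":
    p_n, p_a_given_n = simulate(n=3, p=0.5)
    print("estimated P(N)   =", p_n)
    print("estimated P(A|N) =", p_a_given_n)
```

    For n = 3 and p = 1/2 the exact values are P(N) = 37/66 and P(A|N) = 4/37, so the estimates should land close to 0.56 and 0.11.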
  12. Jun 22, 2011 #11

    Stephen Tashi

    Science Advisor

    ...to add to that:

    If we were in a space of events where the first 50 balls drawn are always red then we couldn't have a bag with 50 red and 50 blue.
  13. Jun 23, 2011 #12
    Thanks. Very well explained.