1. PF Contest - Win "Conquering the Physics GRE" book! Click Here to Enter
    Dismiss Notice
Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Using a recursive algorithm to find the value of a game

  1. Jun 28, 2017 #1
    1. The problem statement, all variables and given/known data
    Imagine you are playing a game with me, of drawing balls from a box. There are two blue balls and two red balls. They are picked with equal probability, and are drawn without replacement. If you draw a blue ball, I give you $1. If you draw a red ball, you pay me $1.25. What is the expected value of this game to you if you are allowed to stop at any point?

    2. Relevant equations
    I drew a lattice diagram that shows the value of the game in all the possible states of the balls.
    (b,r) means that at this point, there are b blue balls and r red balls.

    That is, something like this:
    The boxed values indicate the value at each point in time, calculated recursively.

    3. The attempt at a solution
    I was able to figure out the value of the game if you were required to play until the box was completely empty using a recursive method.
    the expected value at any point (b,r), V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)),0)
    where V(b,r) is the method of value of the game at the situation with b blue balls and r red balls.
    So V(2,2) is the value at the beginning of the complete game, so it was 0.4583 approximately if we were required to play until all the balls had been drawn.

    I'm not sure how to compute the value of the game to you if you have the option of stopping whenever you want. I was told it was (1/3) but I don't know how this number comes about. Can someone explain how the way you value the game given this option is different from the one above, and how it is calculated? Thanks.
  2. jcsd
  3. Jun 28, 2017 #2


    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    Your tree is a 're-combining tree' in which a node can have more than one edge coming in from the left. This problem is easier to analyse with a non-recombining tree in which each node has either one or two edges coming out of it to the right, but only ever one edge coming in from the left.

    Let each node be a draw of a ball, except for the first node (root node), which is the starting position. At each node write five things (give yourself plenty of space on the paper to do this):
    • B or R, being the colour drawn at that node (this is empty for the root node)
    • a pair (b,r) being the number of blues and reds remaining after that draw (this is (2,2) for the root node)
    • the payoff arising from that draw, which is 1 for B and -1.25 for R (this is empty for the root node)
    • the cumulative payoff (this is 0 for the root node)
    • the probability of reaching that node
    The leaf nodes are the nodes at which all blue balls have been drawn, ie the second datum is (0,2), (0,1) or (0,0). 'Leaf node' means a node from which you draw no outgoing lines to the right. The reason we stop when all blues have been drawn is that any further draws (if any are possible) can only give losses.

    For each leaf node, multiply the cumulative payoff by the probability (ie the last two items in the above list). Add that across leaves, and there's your expected value.

    I only had fifteen nodes in my final tree, and the expected value was indeed 1/3.
  4. Jun 29, 2017 #3

    Ray Vickson

    User Avatar
    Science Advisor
    Homework Helper

    If you are allowed to stop or not stop at any time, the expected payoff cannot be worse than if you are forced to play to the end! The reason is that if you have more options available you cannot do worse. So, the maximum expected payoff when you can stop or not as you see fit must be at least as large as your 0.4583 figure, and this is larger than 1/3.

    Using your recursive scheme
    $$V(b,r) = \max \left(0,\frac{b}{b+r} [1 + V(b-1,r)] + \frac{r}{b+r} [-(5/4) + V(b, r-1)] \right) , $$
    with the boundary condition that ##V(b,0) = b ## (just pick all ##b## remaining balls and get +$1 each) and ##V(0,r) = 0## (stop picking balls). Doing this I get ##V(2,2) = 31/48 \doteq 0.64583##.
    Last edited: Jun 29, 2017
  5. Jun 29, 2017 #4
    I also get 0.4583 = 11/24, so I think you must have made a mistake somewhere.
    V(1,1) = 1/2 * (V(0,1)+1) + 1/2 (V(1,0) - 5/4) = 1/2 + 1/2(1 - 5/4) = 3/8
    V(2,1) = 2/3(V(1,1) + 1) + 1/3(V(2,0)-5/4) = 2/3(3/8 + 1) + 1/3(2-5/4) = 7/6
    V(1,2) = 1/3(V(0,2) + 1_ + 2/3(V(1,1) -5/4) = 1/3(0+1) + 2/3(3/8 - 5/4) = 1/3 - 14/24 .it will become negative, so V(1,2) = 0.
    V(2,2) = 1/2(V(1,2) + 1) + 1/2 (V(2,1) -5/4) = 1/2(0+1) + 1/2 (7/6 - 5/4) = 1/2 - 1/2(1/12) = 11/24

    Attached Files:

  6. Jun 29, 2017 #5

    Ray Vickson

    User Avatar
    Science Advisor
    Homework Helper

    Sorry: you are correct; I mixed up the 1/3 and 2/3 when computing V(1,2).
    Anyway, the main point is that the OP has already taken care of the possibility of stopping, because that is built in to his recursion. Also: the answer is definitely > 1/3.
  7. Jun 29, 2017 #6


    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    The attached diagram demonstrates the basis on which I calculated the answer to be 1/3. The payment to the player at each node is s. The probability of reaching a node is P. The leaves are shaded green. We add up Σs across those leaves, multiplying by the probabilities P, which are all 1/6, and we get 1/3.

    EDIT: HOWEVER, I'm now convinced the value is what Ray says. See below.

    Attached Files:

    Last edited: Jun 29, 2017
  8. Jun 29, 2017 #7


    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    I've just realised that the value is indeed more than 1/3.
    The algorithm that gives 1/3 only stops when there are no more blues, on the simple expectation that one can only lose from then on by further draws.

    However, it can be advantageous to stop before then, even if there are remaining blues, because the expectation of payoffs from continued draws may be less than the expectation of stopping, which is 0.

    That will be the case when there are 1 blue and 2 reds remaining, because the expected payoff from continuing to draw until the blue is drawn is negative.

    In the diagram, that's the top node in the second column labelled


    If we remove the descendant nodes of that node, making it a leaf, the expectation over the leaves becomes ##0.458\dot3##

    If anybody's interested, here's R code for calculating V for any combination of reds and blues. It gives ##0.458\dot3## for ##V(2,2)##.

    Code (Text):

    V <- function (b,r){print(paste('b=',b,'r=',r))
      if (r==0)
        b else
          if (b==0)
            0 else
            max(0, b* (1 + V(b-1,r)) + r*(-(5/4) + V(b, r-1)) )/(b+r)
    and attached is a revised diagram.

    Attached Files:

    Last edited: Jun 29, 2017
  9. Jun 29, 2017 #8


    User Avatar
    Staff Emeritus
    Science Advisor
    Homework Helper
    Gold Member

    You do have an extraneous arrow.

    After initially picking two Reds, you have two arrows from that; one to a Blue (fine), one to a Red (not possible).

    Your numbers are fine.
  10. Jun 30, 2017 #9

    Ray Vickson

    User Avatar
    Science Advisor
    Homework Helper

    A corresponding Maple procedure is:

    > option remember;
    > if a = 0 then 0
    > elif b = 0 then a
    > else max(0,a/(a+b)*(1+V(a-1,b))+b/(a+b)*(-5/4+V(a,b-1)))
    > end if;
    > end proc;

    This gives V(2,2) = 11/24 = 0.46786 and V(5,3) = 449/224 = 2.00446, etc.
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?
Draft saved Draft deleted