Using a recursive algorithm to find the value of a game

Click For Summary
The discussion centers on calculating the expected value of a game involving drawing balls from a box, where players can stop at any time. The recursive formula for the expected value when forced to play until all balls are drawn is established as V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)), 0). The expected value is approximately 0.4583 when all balls must be drawn, but confusion arises regarding the expected value when players can stop, which is suggested to be 1/3. However, it is clarified that the expected value is actually greater than 1/3, as players can optimize their strategy by stopping before all draws are made, leading to a higher expected payoff. The conversation concludes with the assertion that the value of the game is indeed more than 1/3, emphasizing the advantage of stopping early in certain scenarios.
fignewtons
Messages
28
Reaction score
0

Homework Statement


Imagine you are playing a game with me, of drawing balls from a box. There are two blue balls and two red balls. They are picked with equal probability, and are drawn without replacement. If you draw a blue ball, I give you $1. If you draw a red ball, you pay me $1.25. What is the expected value of this game to you if you are allowed to stop at any point?

Homework Equations


I drew a lattice diagram that shows the value of the game in all the possible states of the balls.
(b,r) means that at this point, there are b blue balls and r red balls.

That is, something like this:
FullSizeRender.jpg

The boxed values indicate the value at each point in time, calculated recursively.

The Attempt at a Solution


I was able to figure out the value of the game if you were required to play until the box was completely empty using a recursive method.
the expected value at any point (b,r), V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)),0)
where V(b,r) is the method of value of the game at the situation with b blue balls and r red balls.
So V(2,2) is the value at the beginning of the complete game, so it was 0.4583 approximately if we were required to play until all the balls had been drawn.

I'm not sure how to compute the value of the game to you if you have the option of stopping whenever you want. I was told it was (1/3) but I don't know how this number comes about. Can someone explain how the way you value the game given this option is different from the one above, and how it is calculated? Thanks.
 
Physics news on Phys.org
Your tree is a 're-combining tree' in which a node can have more than one edge coming in from the left. This problem is easier to analyse with a non-recombining tree in which each node has either one or two edges coming out of it to the right, but only ever one edge coming in from the left.

Let each node be a draw of a ball, except for the first node (root node), which is the starting position. At each node write five things (give yourself plenty of space on the paper to do this):
  • B or R, being the colour drawn at that node (this is empty for the root node)
  • a pair (b,r) being the number of blues and reds remaining after that draw (this is (2,2) for the root node)
  • the payoff arising from that draw, which is 1 for B and -1.25 for R (this is empty for the root node)
  • the cumulative payoff (this is 0 for the root node)
  • the probability of reaching that node
The leaf nodes are the nodes at which all blue balls have been drawn, ie the second datum is (0,2), (0,1) or (0,0). 'Leaf node' means a node from which you draw no outgoing lines to the right. The reason we stop when all blues have been drawn is that any further draws (if any are possible) can only give losses.

For each leaf node, multiply the cumulative payoff by the probability (ie the last two items in the above list). Add that across leaves, and there's your expected value.

I only had fifteen nodes in my final tree, and the expected value was indeed 1/3.
 
  • Like
Likes scottdave
figNewtons said:

Homework Statement


Imagine you are playing a game with me, of drawing balls from a box. There are two blue balls and two red balls. They are picked with equal probability, and are drawn without replacement. If you draw a blue ball, I give you $1. If you draw a red ball, you pay me $1.25. What is the expected value of this game to you if you are allowed to stop at any point?

Homework Equations


I drew a lattice diagram that shows the value of the game in all the possible states of the balls.
(b,r) means that at this point, there are b blue balls and r red balls.

That is, something like this:
View attachment 206255
The boxed values indicate the value at each point in time, calculated recursively.

The Attempt at a Solution


I was able to figure out the value of the game if you were required to play until the box was completely empty using a recursive method.
the expected value at any point (b,r), V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)),0)
where V(b,r) is the method of value of the game at the situation with b blue balls and r red balls.
So V(2,2) is the value at the beginning of the complete game, so it was 0.4583 approximately if we were required to play until all the balls had been drawn.

I'm not sure how to compute the value of the game to you if you have the option of stopping whenever you want. I was told it was (1/3) but I don't know how this number comes about. Can someone explain how the way you value the game given this option is different from the one above, and how it is calculated? Thanks.

If you are allowed to stop or not stop at any time, the expected payoff cannot be worse than if you are forced to play to the end! The reason is that if you have more options available you cannot do worse. So, the maximum expected payoff when you can stop or not as you see fit must be at least as large as your 0.4583 figure, and this is larger than 1/3.

Using your recursive scheme
$$V(b,r) = \max \left(0,\frac{b}{b+r} [1 + V(b-1,r)] + \frac{r}{b+r} [-(5/4) + V(b, r-1)] \right) , $$
with the boundary condition that ##V(b,0) = b ## (just pick all ##b## remaining balls and get +$1 each) and ##V(0,r) = 0## (stop picking balls). Doing this I get ##V(2,2) = 31/48 \doteq 0.64583##.
 
Last edited:
Ray Vickson said:
Using your recursive scheme
$$V(b,r) = \max \left(0,\frac{b}{b+r} [1 + V(b-1,r)] + \frac{r}{b+r} [-(5/4) + V(b, r-1)] \right) , $$
with the boundary condition that ##V(b,0) = b ## (just pick all ##b## remaining balls and get +$1 each) and ##V(0,r) = 0## (stop picking balls). Doing this I get ##V(2,2) = 31/48 \doteq 0.64583##.

I also get 0.4583 = 11/24, so I think you must have made a mistake somewhere.
V(1,1) = 1/2 * (V(0,1)+1) + 1/2 (V(1,0) - 5/4) = 1/2 + 1/2(1 - 5/4) = 3/8
V(2,1) = 2/3(V(1,1) + 1) + 1/3(V(2,0)-5/4) = 2/3(3/8 + 1) + 1/3(2-5/4) = 7/6
V(1,2) = 1/3(V(0,2) + 1_ + 2/3(V(1,1) -5/4) = 1/3(0+1) + 2/3(3/8 - 5/4) = 1/3 - 14/24 .it will become negative, so V(1,2) = 0.
V(2,2) = 1/2(V(1,2) + 1) + 1/2 (V(2,1) -5/4) = 1/2(0+1) + 1/2 (7/6 - 5/4) = 1/2 - 1/2(1/12) = 11/24
 

Attachments

  • upload_2017-6-29_11-27-32.png
    upload_2017-6-29_11-27-32.png
    12.7 KB · Views: 485
willem2 said:
I also get 0.4583 = 11/24, so I think you must have made a mistake somewhere.
V(1,1) = 1/2 * (V(0,1)+1) + 1/2 (V(1,0) - 5/4) = 1/2 + 1/2(1 - 5/4) = 3/8
V(2,1) = 2/3(V(1,1) + 1) + 1/3(V(2,0)-5/4) = 2/3(3/8 + 1) + 1/3(2-5/4) = 7/6
V(1,2) = 1/3(V(0,2) + 1_ + 2/3(V(1,1) -5/4) = 1/3(0+1) + 2/3(3/8 - 5/4) = 1/3 - 14/24 .it will become negative, so V(1,2) = 0.
V(2,2) = 1/2(V(1,2) + 1) + 1/2 (V(2,1) -5/4) = 1/2(0+1) + 1/2 (7/6 - 5/4) = 1/2 - 1/2(1/12) = 11/24
Sorry: you are correct; I mixed up the 1/3 and 2/3 when computing V(1,2).
Anyway, the main point is that the OP has already taken care of the possibility of stopping, because that is built into his recursion. Also: the answer is definitely > 1/3.
 
The attached diagram demonstrates the basis on which I calculated the answer to be 1/3. The payment to the player at each node is s. The probability of reaching a node is P. The leaves are shaded green. We add up Σs across those leaves, multiplying by the probabilities P, which are all 1/6, and we get 1/3.

EDIT: HOWEVER, I'm now convinced the value is what Ray says. See below.
 

Attachments

  • ball draw diagram 2.png
    ball draw diagram 2.png
    22 KB · Views: 497
Last edited:
I've just realized that the value is indeed more than 1/3.
The algorithm that gives 1/3 only stops when there are no more blues, on the simple expectation that one can only lose from then on by further draws.

However, it can be advantageous to stop before then, even if there are remaining blues, because the expectation of payoffs from continued draws may be less than the expectation of stopping, which is 0.

That will be the case when there are 1 blue and 2 reds remaining, because the expected payoff from continuing to draw until the blue is drawn is negative.

In the diagram, that's the top node in the second column labelled

Blue
(1,2)
P=0.5
s=1
Σs=1

If we remove the descendant nodes of that node, making it a leaf, the expectation over the leaves becomes ##0.458\dot3##

If anybody's interested, here's R code for calculating V for any combination of reds and blues. It gives ##0.458\dot3## for ##V(2,2)##.

Code:
V <- function (b,r){print(paste('b=',b,'r=',r))
  if (r==0)
    b else
      if (b==0)
        0 else
        max(0, b* (1 + V(b-1,r)) + r*(-(5/4) + V(b, r-1)) )/(b+r)
}
V(2,2)
and attached is a revised diagram.
 

Attachments

  • ball draw diagram 2a.png
    ball draw diagram 2a.png
    24.4 KB · Views: 505
Last edited:
andrewkirk said:
The attachment process didn't cooperate so it didn't appear when I expected to. It should be visible now though.
You do have an extraneous arrow.

After initially picking two Reds, you have two arrows from that; one to a Blue (fine), one to a Red (not possible).

Your numbers are fine.
 
andrewkirk said:
I've just realized that the value is indeed more than 1/3.
The algorithm that gives 1/3 only stops when there are no more blues, on the simple expectation that one can only lose from then on by further draws.

However, it can be advantageous to stop before then, even if there are remaining blues, because the expectation of payoffs from continued draws may be less than the expectation of stopping, which is 0.

That will be the case when there are 1 blue and 2 reds remaining, because the expected payoff from continuing to draw until the blue is drawn is negative.

In the diagram, that's the top node in the second column labelled

Blue
(1,2)
P=0.5
s=1
Σs=1

If we remove the descendant nodes of that node, making it a leaf, the expectation over the leaves becomes ##0.458\dot3##

If anybody's interested, here's R code for calculating V for any combination of reds and blues. It gives ##0.458\dot3## for ##V(2,2)##.

Code:
V <- function (b,r){print(paste('b=',b,'r=',r))
  if (r==0)
    b else
      if (b==0)
        0 else
        max(0, b* (1 + V(b-1,r)) + r*(-(5/4) + V(b, r-1)) )/(b+r)
}
V(2,2)
and attached is a revised diagram.

A corresponding Maple procedure is:

V:=proc(a::nonnegint,b::nonnegint)
> option remember;
> if a = 0 then 0
> elif b = 0 then a
> else max(0,a/(a+b)*(1+V(a-1,b))+b/(a+b)*(-5/4+V(a,b-1)))
> end if;
> end proc;

This gives V(2,2) = 11/24 = 0.46786 and V(5,3) = 449/224 = 2.00446, etc.
 

Similar threads

  • · Replies 9 ·
Replies
9
Views
2K
  • · Replies 29 ·
Replies
29
Views
2K
  • · Replies 22 ·
Replies
22
Views
6K
  • · Replies 14 ·
Replies
14
Views
2K
  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 14 ·
Replies
14
Views
3K
  • · Replies 13 ·
Replies
13
Views
1K
  • · Replies 2 ·
Replies
2
Views
1K