Using a recursive algorithm to find the value of a game

fignewtons · Jun 28, 2017

Homework Statement

Imagine you are playing a game with me, of drawing balls from a box. There are two blue balls and two red balls. They are picked with equal probability, and are drawn without replacement. If you draw a blue ball, I give you $1. If you draw a red ball, you pay me $1.25. What is the expected value of this game to you if you are allowed to stop at any point?

Homework Equations

I drew a lattice diagram that shows the value of the game in all the possible states of the balls.
(b,r) means that at this point, there are b blue balls and r red balls.

That is, something like this:

The boxed values indicate the value at each point in time, calculated recursively.

The Attempt at a Solution

I was able to figure out the value of the game if you were required to play until the box was completely empty using a recursive method.
the expected value at any point (b,r), V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)),0)
where V(b,r) is the method of value of the game at the situation with b blue balls and r red balls.
So V(2,2) is the value at the beginning of the complete game, so it was 0.4583 approximately if we were required to play until all the balls had been drawn.

I'm not sure how to compute the value of the game to you if you have the option of stopping whenever you want. I was told it was (1/3) but I don't know how this number comes about. Can someone explain how the way you value the game given this option is different from the one above, and how it is calculated? Thanks.

andrewkirk · Jun 28, 2017

Your tree is a 're-combining tree' in which a node can have more than one edge coming in from the left. This problem is easier to analyse with a non-recombining tree in which each node has either one or two edges coming out of it to the right, but only ever one edge coming in from the left.

Let each node be a draw of a ball, except for the first node (root node), which is the starting position. At each node write five things (give yourself plenty of space on the paper to do this):

B or R, being the colour drawn at that node (this is empty for the root node)
a pair (b,r) being the number of blues and reds remaining after that draw (this is (2,2) for the root node)
the payoff arising from that draw, which is 1 for B and -1.25 for R (this is empty for the root node)
the cumulative payoff (this is 0 for the root node)
the probability of reaching that node

The leaf nodes are the nodes at which all blue balls have been drawn, ie the second datum is (0,2), (0,1) or (0,0). 'Leaf node' means a node from which you draw no outgoing lines to the right. The reason we stop when all blues have been drawn is that any further draws (if any are possible) can only give losses.

For each leaf node, multiply the cumulative payoff by the probability (ie the last two items in the above list). Add that across leaves, and there's your expected value.

I only had fifteen nodes in my final tree, and the expected value was indeed 1/3.

Ray Vickson · Jun 29, 2017

figNewtons said:

Homework Statement

Imagine you are playing a game with me, of drawing balls from a box. There are two blue balls and two red balls. They are picked with equal probability, and are drawn without replacement. If you draw a blue ball, I give you $1. If you draw a red ball, you pay me $1.25. What is the expected value of this game to you if you are allowed to stop at any point?

Homework Equations

I drew a lattice diagram that shows the value of the game in all the possible states of the balls.
(b,r) means that at this point, there are b blue balls and r red balls.

That is, something like this:
View attachment 206255
The boxed values indicate the value at each point in time, calculated recursively.

The Attempt at a Solution

I was able to figure out the value of the game if you were required to play until the box was completely empty using a recursive method.
the expected value at any point (b,r), V(b,r) = max((b/(b+r))(1 + V(b-1, r)) + (r/(b+r))(-1.25 + V(b, r-1)),0)
where V(b,r) is the method of value of the game at the situation with b blue balls and r red balls.
So V(2,2) is the value at the beginning of the complete game, so it was 0.4583 approximately if we were required to play until all the balls had been drawn.

I'm not sure how to compute the value of the game to you if you have the option of stopping whenever you want. I was told it was (1/3) but I don't know how this number comes about. Can someone explain how the way you value the game given this option is different from the one above, and how it is calculated? Thanks.

If you are allowed to stop or not stop at any time, the expected payoff cannot be worse than if you are forced to play to the end! The reason is that if you have more options available you cannot do worse. So, the maximum expected payoff when you can stop or not as you see fit must be at least as large as your 0.4583 figure, and this is larger than 1/3.

Using your recursive scheme
$$V(b,r) = \max \left(0,\frac{b}{b+r} [1 + V(b-1,r)] + \frac{r}{b+r} [-(5/4) + V(b, r-1)] \right) , $$
with the boundary condition that ##V(b,0) = b ## (just pick all ##b## remaining balls and get +$1 each) and ##V(0,r) = 0## (stop picking balls). Doing this I get ##V(2,2) = 31/48 \doteq 0.64583##.

willem2 · Jun 29, 2017

Ray Vickson said:

Using your recursive scheme
$$V(b,r) = \max \left(0,\frac{b}{b+r} [1 + V(b-1,r)] + \frac{r}{b+r} [-(5/4) + V(b, r-1)] \right) , $$
with the boundary condition that ##V(b,0) = b ## (just pick all ##b## remaining balls and get +$1 each) and ##V(0,r) = 0## (stop picking balls). Doing this I get ##V(2,2) = 31/48 \doteq 0.64583##.

I also get 0.4583 = 11/24, so I think you must have made a mistake somewhere.
V(1,1) = 1/2 * (V(0,1)+1) + 1/2 (V(1,0) - 5/4) = 1/2 + 1/2(1 - 5/4) = 3/8
V(2,1) = 2/3(V(1,1) + 1) + 1/3(V(2,0)-5/4) = 2/3(3/8 + 1) + 1/3(2-5/4) = 7/6
V(1,2) = 1/3(V(0,2) + 1_ + 2/3(V(1,1) -5/4) = 1/3(0+1) + 2/3(3/8 - 5/4) = 1/3 - 14/24 .it will become negative, so V(1,2) = 0.
V(2,2) = 1/2(V(1,2) + 1) + 1/2 (V(2,1) -5/4) = 1/2(0+1) + 1/2 (7/6 - 5/4) = 1/2 - 1/2(1/12) = 11/24

Ray Vickson · Jun 29, 2017

willem2 said:

I also get 0.4583 = 11/24, so I think you must have made a mistake somewhere.
V(1,1) = 1/2 * (V(0,1)+1) + 1/2 (V(1,0) - 5/4) = 1/2 + 1/2(1 - 5/4) = 3/8
V(2,1) = 2/3(V(1,1) + 1) + 1/3(V(2,0)-5/4) = 2/3(3/8 + 1) + 1/3(2-5/4) = 7/6
V(1,2) = 1/3(V(0,2) + 1_ + 2/3(V(1,1) -5/4) = 1/3(0+1) + 2/3(3/8 - 5/4) = 1/3 - 14/24 .it will become negative, so V(1,2) = 0.
V(2,2) = 1/2(V(1,2) + 1) + 1/2 (V(2,1) -5/4) = 1/2(0+1) + 1/2 (7/6 - 5/4) = 1/2 - 1/2(1/12) = 11/24

Sorry: you are correct; I mixed up the 1/3 and 2/3 when computing V(1,2).
Anyway, the main point is that the OP has already taken care of the possibility of stopping, because that is built into his recursion. Also: the answer is definitely > 1/3.

andrewkirk · Jun 29, 2017

The attached diagram demonstrates the basis on which I calculated the answer to be 1/3. The payment to the player at each node is s. The probability of reaching a node is P. The leaves are shaded green. We add up Σs across those leaves, multiplying by the probabilities P, which are all 1/6, and we get 1/3.

EDIT: HOWEVER, I'm now convinced the value is what Ray says. See below.

andrewkirk · Jun 29, 2017

I've just realized that the value is indeed more than 1/3.
The algorithm that gives 1/3 only stops when there are no more blues, on the simple expectation that one can only lose from then on by further draws.

However, it can be advantageous to stop before then, even if there are remaining blues, because the expectation of payoffs from continued draws may be less than the expectation of stopping, which is 0.

That will be the case when there are 1 blue and 2 reds remaining, because the expected payoff from continuing to draw until the blue is drawn is negative.

In the diagram, that's the top node in the second column labelled

Blue
(1,2)
P=0.5
s=1
Σs=1

If we remove the descendant nodes of that node, making it a leaf, the expectation over the leaves becomes ##0.458\dot3##

If anybody's interested, here's R code for calculating V for any combination of reds and blues. It gives ##0.458\dot3## for ##V(2,2)##.

Code:

V <- function (b,r){print(paste('b=',b,'r=',r))
  if (r==0)
    b else
      if (b==0)
        0 else
        max(0, b* (1 + V(b-1,r)) + r*(-(5/4) + V(b, r-1)) )/(b+r)
}
V(2,2)

and attached is a revised diagram.

SammyS · Jun 29, 2017

andrewkirk said:

The attachment process didn't cooperate so it didn't appear when I expected to. It should be visible now though.

You do have an extraneous arrow.

After initially picking two Reds, you have two arrows from that; one to a Blue (fine), one to a Red (not possible).

Your numbers are fine.

Ray Vickson · Jun 30, 2017

andrewkirk said:
I've just realized that the value is indeed more than 1/3.
The algorithm that gives 1/3 only stops when there are no more blues, on the simple expectation that one can only lose from then on by further draws.

However, it can be advantageous to stop before then, even if there are remaining blues, because the expectation of payoffs from continued draws may be less than the expectation of stopping, which is 0.

That will be the case when there are 1 blue and 2 reds remaining, because the expected payoff from continuing to draw until the blue is drawn is negative.

In the diagram, that's the top node in the second column labelled

Blue
(1,2)
P=0.5
s=1
Σs=1

If we remove the descendant nodes of that node, making it a leaf, the expectation over the leaves becomes ##0.458\dot3##

If anybody's interested, here's R code for calculating V for any combination of reds and blues. It gives ##0.458\dot3## for ##V(2,2)##.
Code:
V <- function (b,r){print(paste('b=',b,'r=',r))
  if (r==0)
    b else
      if (b==0)
        0 else
        max(0, b* (1 + V(b-1,r)) + r*(-(5/4) + V(b, r-1)) )/(b+r)
}
V(2,2)
and attached is a revised diagram.

A corresponding Maple procedure is:

V:=proc(a::nonnegint,b::nonnegint)
> option remember;
> if a = 0 then 0
> elif b = 0 then a
> else max(0,a/(a+b)*(1+V(a-1,b))+b/(a+b)*(-5/4+V(a,b-1)))
> end if;
> end proc;

This gives V(2,2) = 11/24 = 0.46786 and V(5,3) = 449/224 = 2.00446, etc.

Using a recursive algorithm to find the value of a game

Homework Help Overview

Discussion Character

Approaches and Questions Raised

Discussion Status

Contextual Notes

Homework Statement

Homework Equations

The Attempt at a Solution

Homework Statement

Homework Equations

The Attempt at a Solution

Attachments

Attachments

Attachments

Similar threads

Distance between a Clock's hands when the distance is increasing most rapidly

Polar integral

Deriving spatial derivatives

Is this the correct general solution of the given PDE?

J_1(x) = (x^2/10)*(J_1(x) + J_3(x)) How to solve?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect