Programming a recursive expected value card game

member 428835 · Jun 17, 2022

Hi PF!

Take a deck of cards. 26 are red and 26 are black. You draw cards randomly without replacement: if the card is red you gain a dollar, if black you lose a dollar. What's the expected value?

My thought process was to look at simple cases and build to a 52 card deck. We know if ##r=0## expected payout is 0, and if ##b=0## expected payout is ##r##. Otherwise the following rule should govern the expected payout ##E(r,b)## of the process:

$$E(r,b) = \frac{r}{r+b}(1 + E(r-1,b) ) + \frac{b}{r+b}(-1 + E(r,b-1) )$$

Obviously we draw another card when ##E(r,b)>0##, else we stop. My question is how to program this? I've tried the following but am stumped as to what's next. Can you help?

Python:

def cards(r, b):
    def expVal(r, b):
        if r == 0:
            return 0
        elif b == 0:
            return r
        else:
            return r/(r + b) * (1 + expVal(r-1, b)) + b/(r + b) * (-1 + expVal(r, b-1))
    expVal(r, b)
if __name__ == "__main__":
    r = 26
    b = 26
    cards(r, b)

FactChecker · Jun 17, 2022

In your problem statement, are you allowed to stop when there are no more red cards left? Are you allowed to keep going when there are no more black cards left? You have not defined the end-game rules.
1) Make sure that your logic in lines 3-6 match the end-game rules.
2) You need to print the result.

Borg · Jun 18, 2022

joshmccraney said:

Take a deck of cards. 26 are red and 26 are black. You draw cards randomly without replacement: if the card is red you gain a dollar, if black you lose a dollar. What's the expected value?

If these are the only rules, you would always win if you quit while ahead or at worse, have a draw. I like those odds. It would make an interesting machine learning problem to determine the optimal stopping point when you're ahead.

willem2 · Jun 18, 2022

Borg said:

If these are the only rules, you would always win if you quit while ahead or at worse, have a draw. I like those odds. It would make an interesting machine learning problem to determine the optimal stopping point when you're ahead.

You don't need anything that complicated. You can just change the program to let retval return 0, if it would ever return a negative number.
The other problem with the program is that it is too slow, because it calls expval many times with the same arguments. You'll need memoization.
if you put

Python:

rom functools import cache
@cache

before the definition of expval (and drop the redundant cards(b,r) definition) this will compute instantaneously, instead of never.

FactChecker · Jun 18, 2022

willem2 said:

The other problem with the program is that it is too slow, because it calls expval many times with the same arguments.

Does it? It seems that every call will have one fewer red or one fewer black -- never repeating the same two arguments.

PeroK · Jun 18, 2022

One principle is that as more cards are drawn the winnings that you will settle for must reduce. The first strategy to test is the quit as soon as you are 1 card ahead. You could calculate how much that wins.

The next strategy is to quit when you are two cards ahead until some point and then quit when you are one card ahead after that point. You have one variable here. If you optimise that then does that fix the point at which you drop to one card?

Then, you quit three cards ahead, then two then one. That means two variable points. But, we already have the optimum point for the switch from 2 to 1. Now we calculate the optimum point for the switch from 3 to 2.

We continue until the optimum point is as early as possible.

My guess is that we would quit at 3 cards ahead after 3 cards, then 2 cards ahead at some point and 1 card ahead at some final point. I doubt it would be worth gambling if you 3 cards ahead, possibly 4 at the very maximum and perhaps a maximum of 2 would be optimal.

Borg · Jun 18, 2022

That's why I thought it would be an interesting ML program. It would also take into account how many cards of each type are remaining when it decides whether to quit.

FactChecker · Jun 18, 2022

As soon as you are ahead, without replacement, the odds turn against you, and if you continue to draw all the cards, then you will only break even. The converse is that if you are behind, then the odds turn in your favor and you know that you can always break even by continuing to draw all the cards.
So I think that the optimum strategy is to continue until you are ahead and then stop immediately (or until there are no more cards)

UPDATE: @Borg has pointed out the error of this thinking. See posts #15 and #16.

Borg · Jun 18, 2022

I don't think that it's that simple. The ultimate goal in machine learning is to maximize outcomes. Since there are cases where you can earn more than $1, there is the possibility of earning more if there are still good cards left. Over thousands of games, it will learn when to try for more in order to increase the total amount won per game.

FactChecker · Jun 18, 2022

Borg said:

I don't think that it's that simple. The ultimate goal in machine learning is to maximize outcomes. Since there are cases where you can earn more than $1, there is the possibility of earning more if there are still good cards left. Over thousands of games, it will learn when to try for more in order to increase the total amount won per game.

There is no need for machine learning when the analysis gives an answer. Once you are ahead, it means that there are more black cards remaining than red cards. Therefore, the expected return for continuing the game is negative. So you should stop as soon as you get ahead. Do not continue when the odds are against you.

Borg · Jun 18, 2022

So you would quit if your first draw was a winner? I could definitely see being able to beat that strategy over the long term.

FactChecker · Jun 18, 2022

Borg said:

So you would quit if your first draw was a winner? I could definitely see being able to beat that strategy over the long term.

Yes, I would quit. It would not be a good bet to continue the game with 25 red cards and 26 black cards. I would expect, on average, to lose the following play.

Borg · Jun 18, 2022

However, you have a chance to double your money. Even if the next card was a loser, you would still have a very signicant chance of eventually getting back to +1 again. The odds of that happening for this example would be higher than being stuck with zero.

FactChecker · Jun 18, 2022

Borg said:

However, you have a chance to double your money. Even if the next card was a loser, you would still have a very signicant chance of eventually getting back to +1 again. The odds of that happening for this example would be higher than being stuck with zero.

If the next card is a loser, then there are 25 red and 25 black cards remaining. I would expect, on average, to break even by continuing play. But that is worse than the +1 advantage that I had before I picked the second card. So it was not wise to pick the second card.

Borg · Jun 18, 2022

Only if you never get above zero in 50 draws. That's little different than the 52 cards in the starting scenario. Statistically speaking, the odds are very high that you would get in positive territory at some point for either situation. Over time, the attempt to go for a second dollar (with just under 50-50 odds) would win me more money.

FactChecker · Jun 18, 2022

Borg said:

Only if you never get above zero in 50 draws. That's little different than the 52 cards in the starting scenario. Statistically speaking, the odds are very high that you would get in positive territory at some point for either situation. Over time, the attempt to go for a second dollar (with just under 50-50 odds) would win me more money.

AHa! I see your point. I stand corrected. If there is a high enough variance so that you can expect some greater advantage at some point later on, then it is better to play for that. You should stop when your current advantage is greater or equal to the expected maximum advantage if you continue the game.
That is really worth some thought. Thanks.

Borg · Jun 18, 2022

Early in the game, I would expect that there might even be favorable odds of going for a third dollar. What to go for and when in a problem like this is something that machine learning excels in.

willem2 · Jun 19, 2022

FactChecker said:

Does it? It seems that every call will have one fewer red or one fewer black -- never repeating the same two arguments.

expval (2,2) calls expval(1,2) and expval(2,1) and both call expval(1,1)
If you memoize, you have to compute a maxium of r*b nodes, and if you don't
With r=b=10, you only have to to 120 function calls, opposed to 369511 without memoization.
r=b=26 will just never get done without memoization

PeroK · Jun 19, 2022

FactChecker said:

AHa! I see your point. I stand corrected. If there is a high enough variance so that you can expect some greater advantage at some point later on, then it is better to play for that. You should stop when your current advantage is greater or equal to the expected maximum advantage if you continue the game.
That is really worth some thought. Thanks.

If we take a simple example. You draw a red card first.

If your strategy is to quit, then you win $1 every time this happens.

If your strategy is to play on to the second card, then you win $2 almost 50% of the time and are back to even about 50% of the time. That's an average win of almost the same as above, but you can still win $1 on most of the remaining games.

My first simulation gives an average win of ##0.963## for the first strategy; ##1.200## if you try to get two cards ahead only until the second card (then quit whenever you are one card ahead after that); ##1.318## if you try to get two cards ahead until the 4th card; etc.

And, in fact, the optimum strategy is to keep playing until the 48th (or 50th) card looking to get two cards ahead.

You can confirm analytically that it's better to play on from 46 to 48 as follows.

The test case is where you are ##+1## after 47 cards. With the "46" strategy, you would quit now and take the $1 whenever this happens. With the "48" strategy, you would play on and:

With probability ##\frac 2 5## win $2.

With probability ##\frac 3 5## be level after 48 cards.

With probability ##\frac 3 {10}## win $1 by quitting after 49 cards.

With probability ##\frac 3 {10}## be one card down after 49 cards. And, so have a final probability of ##\frac 1 {10}## of winning $1 after 51 cards.

That all adds up to an expected win of $1.2 dollars for the "48" strategy against $1 for the "46" strategy in these key cases. All other cases being equal.

Finally, it's just as good to play on if you are one card ahead after 49 cards from the "48" to the "50" strategy as your expected winning would be:

With probability ##\frac 1 3## you win $2.

With probability ##\frac 2 3## you are level after 50 cards, so with probability ##\frac 1 3## you win $1.

Both strategies win $1 on average in this key case.

I'll look at simulating playing on trying to get 3 cards ahead now. This must be better again.

PS The expected win is about $1.756 per game with the "48" or "50" strategy.

PeroK · Jun 19, 2022

The optimum strategy is to look for a win of $6 early on and then settle for $5, $4 etc. as more cards have gone.

The full optimum strategy is:

Look for a win of $6 until the 6th, 8th or 10th cards (I can't detect much difference in these cases). I got the best result using the 6th card.

Settle for a win of $5 from the 7th until the 25th card

Settle for a win of $4 win from the 26th until the 32nd card.

Settle for a $3 win from the 33rd until the 43rd card.

Settle for a $2 win from the 44th card until the 50th card.

Settle for $1 on the 51st card.

The average win is approx $2.62 per game.

There is only a marginal difference if you don't look for the $6 wins and just settle for a $5 win (from the start).

FactChecker · Jun 19, 2022

willem2 said:

expval (2,2) calls expval(1,2) and expval(2,1) and both call expval(1,1)
If you memoize, you have to compute a maxium of r*b nodes, and if you don't
With r=b=10, you only have to to 120 function calls, opposed to 369511 without memoization.
r=b=26 will just never get done without memoization

I see. I stand corrected.

PeroK · Jun 19, 2022

PS I ran some simulations with 10 million trials. The best result was looking for +6 until the 8th card. That gave a return of $2.622 per game.

PPS Here's a breakdown for the 10 million games in terms of the various wins:

$6: 0.0283
$5: 0.2213
$4: 0.1325
$3: 0.1539
$2: 0.1314
$1: 0.0912
$0: 0.3327

It's amazing how often you can win $5-$6. It's 25% of the time.

PeroK · Jun 19, 2022

I fixed the OP's program and it produced the answer of $2.6245.

I also used it to confirm the cutoff points (which resolved the marginal cases from my simulation):

There is no point in looking for +7

Settle for +6

Settle for +5 from the 11th card (adjustment from the 9th)

Settle for +4 from the 24th card (adjustment from the 26th)

Settle for +3 from the 35th card (adjustment from the 33rd)

Settle for +2 from the 44th card

Settle for +1 from either the 49th or 51st card

I reran the simulation over 10 million trials and the adjustments pushed the average up to 2.6243.

Programming a recursive expected value card game

1. What is a recursive expected value card game?

2. How is recursion used in programming a recursive expected value card game?

3. What are the benefits of using recursion in programming a recursive expected value card game?

4. Can you give an example of a recursive expected value card game?

5. How can I improve my skills in programming a recursive expected value card game?

Similar threads

Hot Threads

Recent Insights