Optimal Stopping Strategy for Winning Game with Two Bells

fignewtons

Homework Statement


You are playing a game with two bells. Bell A rings according to a homogeneous Poisson process at a rate of r per hour, and Bell B rings once, at a time T that is uniformly distributed from 0 to 1 hr (inclusive). You get $1 each time A rings and can quit at any time, but if B rings before you quit, you must return all the money you have received so far.

Homework Equations


P[A rings in the next Δt] ≈ rΔt
P[B rings in the next Δt | B has not rung by t] = Δt/(1-t)
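For clarity, the second relation is just the standard conditional-probability step: since ##T## is uniform on ##[0,1]## and B has not rung by time ##t##,
##P[\,t < T \le t + \Delta t \mid T > t\,] = \frac{P[\,t < T \le t+\Delta t\,]}{P[\,T > t\,]} = \frac{\Delta t}{1-t}.##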

The Attempt at a Solution


At time t, with $x in earnings, the optimal stopping strategy is to continue if E[W] > 0, where W is the net change in earnings over a small time interval Δt:
E[W] = E[W|A rings]P[A rings] + E[W|B rings]P[B rings] + E[W|no rings]P[no rings] + E[W|many rings]P[many rings]
E[W|A rings] = $1
E[W|B rings] = -$x
E[W|no rings] = $0
and P[many rings] ≈ 0 (since Δt is a small time interval)

So E[W] = E[W|A rings]P[A rings] + E[W|B rings]P[B rings]

substituting in the relevant equations...

the strategy is to continue only if x < r(1-t), and stop otherwise.
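Spelling out the substitution (keeping only first-order terms in ##\Delta t##):
##E[W] \approx (1)\,r\,\Delta t + (-x)\,\frac{\Delta t}{1-t} = \left(r - \frac{x}{1-t}\right)\Delta t## (in dollars),
which is positive exactly when ##x < r(1-t)##.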

I'm not sure how to find the maximum amount of money I can win under this strategy. I thought this means I should take the derivative of the winnings strategy, but taking a derivative doesn't make sense for an inequality.

I.e., the derivative with respect to x of x < r(1-t) would be d/dx(x - r(1-t)) < 0, which yields 1 < 0, and that doesn't make sense.
 
figNewtons said:
I thought this means I should take the derivative of the winnings strategy, but taking a derivative doesn't make sense for an inequality.
Why would you take the derivative of the winnings? Instead, I suggest thinking about what the maximal x is at which you will quit, based on your quitting criterion.
 
So the maximal x is just r(1-t), if you decide to stop at some time t?
 
figNewtons said:
the strategy is to continue only if x < r(1-t), and stop otherwise.

I'm not sure how to find the maximum amount of money I can win under this strategy.

Assuming that your ##x## really means expected earnings, you have ##x(t + \Delta t) = x(t)## if you stop at ##t## and ##x(t + \Delta t) = x(t) + (r - x(t)/(1-t)) \Delta t## if you wait until ##t + \Delta t##. Therefore, as long as you wait you have a differential equation ##dx(t)/dt = r - x(t)/(1-t)## with initial condition ##x(0) = 0##.
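(Side note: the ODE as written can be solved in closed form with the integrating factor ##1/(1-t)##:
##\frac{d}{dt}\left[\frac{x(t)}{1-t}\right] = \frac{r}{1-t} \;\Rightarrow\; x(t) = -r(1-t)\ln(1-t),##
which satisfies ##x(0) = 0##. Per the caveats in the next posts, though, treat this only as a heuristic, not as the answer to the question asked.)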
 
Ray Vickson said:
Assuming that your ##x## really means expected earnings, you have ##x(t + \Delta t) = x(t)## if you stop at ##t## and ##x(t + \Delta t) = x(t) + (r - x(t)/(1-t)) \Delta t## if you wait until ##t + \Delta t##. Therefore, as long as you wait you have a differential equation ##dx(t)/dt = r - x(t)/(1-t)## with initial condition ##x(0) = 0##.
We are not talking about expected winnings. The question was:
figNewtons said:
the maximum amount of money I can win under this strategy.
This has nothing to do with the expected earnings.
 
Orodruin said:
We are not talking about expected winnings. The question was:

This has nothing to do with the expected earnings.

Yes, in a way it does. His criterion is to continue whenever the expected earnings over the next interval ##\Delta t## are ##> 0##; that is exactly where his condition ##x < r(1-t)## comes from.

That being said, my response to him was ill-advised: ##x## is not a continuous variable, but can change only in increments of ##+1## or ##-x##, and solutions of differential equations do not do that.
 
Ray Vickson said:
Yes, in a way it does. His criterion is to continue whenever the expected earnings over the next interval ##\Delta t## are ##> 0##; that is exactly where his condition ##x < r(1-t)## comes from.

That being said, my response to him was ill-advised: ##x## is not a continuous variable, but can change only in increments of ##+1## or ##-x##, and solutions of differential equations do not do that.
No, the maximal possible winning depends only on the criterion for quitting, not on the discrete increment. It is true that the criterion in itself is based on the infinitesimal increment, but the maximal winnings can be deduced directly from the quitting criterion.
 
Sorry for being thick, but I am confused: is the maximal winning r(1-t), as in the quitting criterion, verbatim?
Or is it r(1-t) - 1, since x < r(1-t) means x can never reach r(1-t) exactly, and x is discrete and increases only in increments of $1?
Or am I completely off?
 
figNewtons said:
Sorry for being thick, but I am confused: is the maximal winning r(1-t), as in the quitting criterion, verbatim?
Or is it r(1-t) - 1, since x < r(1-t) means x can never reach r(1-t) exactly, and x is discrete and increases only in increments of $1?
Or am I completely off?
Almost there. You need to think a bit about when the quitting criterion will actually tell you to quit. Also, the maximal winning should not depend on t.
 
Orodruin said:
Almost there. You need to think a bit about when the quitting criterion will actually tell you to quit. Also, the maximal winning should not depend on t.

Thanks for the hint. I think the quitting criterion tells you whether to quit or continue at the end of Δt, given that you start at t. We want the maximum Δt, because playing longer allows more chances to win dollars, so the smallest t can be is 0 (which allows the maximum Δt). In this case, t = 0, so r(1-0) = r... the maximum winnings is $r...
It kind of makes sense intuitively (if you start at time 0, play for 1 hour, and quit just before bell B rings, you can get at most $r because A rings at a rate of r per hour), but I'm not sure if the logic makes sense... please verify/correct.
 
figNewtons said:
In this case, t = 0, so r(1-0) = r... the maximum winnings is $r...
This essentially assumes that ##r## is an integer. You probably should also discuss the case when ##r## is not an integer.

Essentially, the quitting criterion tells you when it is no longer profitable to continue. It is no longer profitable when the rate at which ##A## rings is lower than the rate at which ##B## rings weighted by the gain/loss -- which is what you discussed already in the first post. If the possible loss is already ##r##, you will never have ##B## ringing at a small enough rate to justify continuing.

The maximal winning occurs if you get really lucky and ##A## rings ##r## times in quick succession essentially at ##t = 0##.

Edit: Another interesting question is how much a casino should charge you for playing this game ...
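On the casino question, a quick Monte Carlo sketch can estimate both the expected winnings under the ##x < r(1-t)## rule (what a casino would care about when setting an entry fee) and the largest winnings that actually occur. The function names and the sample values of ##r## below are my own choices, not part of the problem:

Python:
import random

def play_once(r, rng):
    """One round: keep playing while the criterion x < r*(1 - t) holds."""
    t_b = rng.random()                    # time at which bell B rings, uniform on [0, 1]
    x, t = 0, 0.0                         # dollars collected so far, current time
    while True:
        t_quit = 1.0 - x / r              # time at which r*(1 - t) drops to the current x
        if t >= t_quit:
            return x                      # the criterion already says stop: keep $x
        t_next = t + rng.expovariate(r)   # time of the next A ring
        if t_next >= t_quit:
            # The criterion says quit at t_quit, before A rings again;
            # we keep the money only if B has not rung by then.
            return x if t_b > t_quit else 0
        if t_b <= t_next:
            return 0                      # B rang before the next A ring: hand it all back
        x += 1                            # A rang first: collect another $1
        t = t_next

def simulate(r, n=200_000, seed=0):
    rng = random.Random(seed)
    outcomes = [play_once(r, rng) for _ in range(n)]
    return sum(outcomes) / n, max(outcomes)

if __name__ == "__main__":
    for r in (2, 3.7, 10):
        mean, best = simulate(r)
        print(f"r = {r}: estimated expected winnings ~ {mean:.3f}, largest winnings seen = {best}")

For modest ##r## the largest winnings seen quickly match the maximal winning discussed above; for larger ##r## that outcome is rare enough that it may not appear at all in a run of this size.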
 
Orodruin said:
The maximal winning occurs if you get really lucky and ##A## rings ##r## times in quick succession essentially at ##t = 0##.
OK, so we can take Δt again to be a small number, just with t = 0. And as Δt → 0, P[B rings] → 0.
 
figNewtons said:
OK, so we can take Δt again to be a small number, just with t = 0. And as Δt → 0, P[B rings] → 0.
I would not see it this way. The probability that A rings in the time interval ##\Delta t## also goes to zero. The question is how they relate to each other and therefore whether the expectation value is positive or negative.
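To make this concrete with the same expansion as in the first post: at ##t = 0## with ##x## dollars in hand,
##E[W] \approx r\,\Delta t - x\,\frac{\Delta t}{1-0} = (r - x)\,\Delta t,##
so both ring probabilities indeed vanish as ##\Delta t \to 0##, but their ratio does not, and the sign of ##E[W]## is set by ##r - x##.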
 