Optimal Stopping Strategy for Winning Game with Two Bells

fignewtons

Homework Statement


You are playing a game with two bells. Bell A rings according to a homogeneous Poisson process at a rate of r per hour, and Bell B rings once, at a time T that is uniformly distributed from 0 to 1 hr (inclusive). You get $1 each time A rings and can quit at any time, but if B rings before you quit, you must return all the money you have received so far.

Homework Equations


P[A rings in the next Δt] ≈ rΔt
P[B rings in the next Δt | B has not rung by t] = Δt/(1-t)
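For clarity, the second relation is just the standard conditional-probability step: since ##T## is uniform on ##[0,1]## and B has not rung by time ##t##,
##P[\,t < T \le t + \Delta t \mid T > t\,] = \frac{P[\,t < T \le t+\Delta t\,]}{P[\,T > t\,]} = \frac{\Delta t}{1-t}.##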

The Attempt at a Solution


At time t, with $x in earnings, the optimal stopping strategy is to continue if E[W] > 0, where W is the net change in earnings over a small time interval Δt:
E[W] = E[W|A rings]P[A rings] + E[W|B rings]P[B rings] + E[W|no rings]P[no rings] + E[W|many rings]P[many rings]
E[W|A rings] = $1
E[W|B rings] = -$x
E[W|no rings] = $0
and P[many rings] ≈ 0 (since Δt is a small time interval)

So E[W] = E[W|A rings]P[A rings] + E[W|B rings]P[B rings]

substituting in the relevant equations...

the strategy is to continue only if x < r(1-t), and stop otherwise.
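Spelling out the substitution (keeping only first-order terms in ##\Delta t##):
##E[W] \approx (1)\,r\,\Delta t + (-x)\,\frac{\Delta t}{1-t} = \left(r - \frac{x}{1-t}\right)\Delta t## (in dollars),
which is positive exactly when ##x < r(1-t)##.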

I'm not sure how to find the maximum amount of money I can win under this strategy. I thought this means I should take the derivative of the winnings strategy, but taking a derivative doesn't make sense for an inequality.

I.e., the derivative with respect to x of x < r(1-t) would be d/dx(x - r(1-t)) < 0, which yields 1 < 0, and that doesn't make sense.
 
figNewtons said:
I thought this means I should take the derivative of the winnings strategy, but taking a derivative doesn't make sense for an inequality.
Why would you take the derivative of the winnings? Instead, I suggest thinking about what the maximal x is at which you will quit, based on your quitting criterion.
 
So the maximal x is just r(1-t), if you decide to stop at some time t?
 
figNewtons said:
the strategy is to continue only if x < r(1-t), and stop otherwise.

I'm not sure how to find the maximum amount of money I can win under this strategy.

Assuming that your ##x## really means expected earnings, you have ##x(t + \Delta t) = x(t)## if you stop at ##t## and ##x(t + \Delta t) = x(t) + (r - x(t)/(1-t)) \Delta t## if you wait until ##t + \Delta t##. Therefore, as long as you wait you have a differential equation ##dx(t)/dt = r - x(t)/(1-t)## with initial condition ##x(0) = 0##.
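(Side note: the ODE as written can be solved in closed form with the integrating factor ##1/(1-t)##:
##\frac{d}{dt}\left[\frac{x(t)}{1-t}\right] = \frac{r}{1-t} \;\Rightarrow\; x(t) = -r(1-t)\ln(1-t),##
which satisfies ##x(0) = 0##. Per the caveats in the next posts, though, treat this only as a heuristic, not as the answer to the question asked.)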
 
Ray Vickson said:
Assuming that your ##x## really means expected earnings, you have ##x(t + \Delta t) = x(t)## if you stop at ##t## and ##x(t + \Delta t) = x(t) + (r - x(t)/(1-t)) \Delta t## if you wait until ##t + \Delta t##. Therefore, as long as you wait you have a differential equation ##dx(t)/dt = r - x(t)/(1-t)## with initial condition ##x(0) = 0##.
We are not talking about expected winnings. The question was:
figNewtons said:
the maximum amount of money I can win under this strategy.
This has nothing to do with the expected earnings.
 
Orodruin said:
We are not talking about expected winnings. The question was:

This has nothing to do with the expected earnings.

Yes, in a way it does. His criterion is to continue whenever the expected earnings over the next interval ##\Delta t## are ##> 0##; that is exactly where his condition ##x < r(1-t)## comes from.

That being said, my response to him was ill-advised: ##x## is not a continuous variable, but can change only in increments of ##+1## or ##-x##, and solutions of differential equations do not do that.
 
Ray Vickson said:
Yes, in a way it does. His criterion is to continue whenever the expected earnings over the next interval ##\Delta t## are ##> 0##; that is exactly where his condition ##x < r(1-t)## comes from.

That being said, my response to him was ill-advised: ##x## is not a continuous variable, but can change only in increments of ##+1## or ##-x##, and solutions of differential equations do not do that.
No, the maximal possible winning depends only on the criterion for quitting, not on the discrete increment. It is true that the criterion in itself is based on the infinitesimal increment, but the maximal winnings can be deduced directly from the quitting criterion.
 
Sorry for being thick, but I am confused: is the maximal winning r(1-t), as in the quitting criterion, verbatim?
Or is it r(1-t) - 1, since x < r(1-t) means x can never reach r(1-t) exactly, and x is discrete and increases only in increments of $1?
Or am I completely off?
 
figNewtons said:
Sorry for being thick, but I am confused: is the maximal winning r(1-t), as in the quitting criterion, verbatim?
Or is it r(1-t) - 1, since x < r(1-t) means x can never reach r(1-t) exactly, and x is discrete and increases only in increments of $1?
Or am I completely off?
Almost there. You need to think a bit about when the quitting criterion will actually tell you to quit. Also, the maximal winning should not depend on t.
 
Orodruin said:
Almost there. You need to think a bit about when the quitting criterion will actually tell you to quit. Also, the maximal winning should not depend on t.

Thanks for the hint. I think the quitting criterion tells you whether to quit or continue at the end of Δt, given that you start at t. We want the maximum Δt, because playing longer allows more chances to win dollars, so the smallest t can be is 0 (which allows the maximum Δt). In this case, t = 0, so r(1-0) = r... the maximum winnings is $r...
It kind of makes sense intuitively (if you start at time 0, play for 1 hour, and quit just before bell B rings, you can get at most $r because A rings at a rate of r per hour), but I'm not sure if the logic makes sense... please verify/correct.
 
figNewtons said:
In this case, t = 0, so r(1-0) = r... the maximum winnings is $r...
This essentially assumes that ##r## is an integer. You probably should also discuss the case when ##r## is not an integer.

Essentially, the quitting criterion tells you when it is no longer profitable to continue. It is no longer profitable when the rate at which ##A## rings is lower than the rate at which ##B## rings weighted by the gain/loss -- which is what you discussed already in the first post. If the possible loss is already ##r##, you will never have ##B## ringing at a small enough rate to justify continuing.

The maximal winning occurs if you get really lucky and ##A## rings ##r## times in quick succession essentially at ##t = 0##.

Edit: Another interesting question is how much a casino should charge you for playing this game ...
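On the casino question, a quick Monte Carlo sketch can estimate both the expected winnings under the ##x < r(1-t)## rule (what a casino would care about when setting an entry fee) and the largest winnings that actually occur. The function names and the sample values of ##r## below are my own choices, not part of the problem:

Python:
import random

def play_once(r, rng):
    """One round: keep playing while the criterion x < r*(1 - t) holds."""
    t_b = rng.random()                    # time at which bell B rings, uniform on [0, 1]
    x, t = 0, 0.0                         # dollars collected so far, current time
    while True:
        t_quit = 1.0 - x / r              # time at which r*(1 - t) drops to the current x
        if t >= t_quit:
            return x                      # the criterion already says stop: keep $x
        t_next = t + rng.expovariate(r)   # time of the next A ring
        if t_next >= t_quit:
            # The criterion says quit at t_quit, before A rings again;
            # we keep the money only if B has not rung by then.
            return x if t_b > t_quit else 0
        if t_b <= t_next:
            return 0                      # B rang before the next A ring: hand it all back
        x += 1                            # A rang first: collect another $1
        t = t_next

def simulate(r, n=200_000, seed=0):
    rng = random.Random(seed)
    outcomes = [play_once(r, rng) for _ in range(n)]
    return sum(outcomes) / n, max(outcomes)

if __name__ == "__main__":
    for r in (2, 3.7, 10):
        mean, best = simulate(r)
        print(f"r = {r}: estimated expected winnings ~ {mean:.3f}, largest winnings seen = {best}")

For modest ##r## the largest winnings seen quickly match the maximal winning discussed above; for larger ##r## that outcome is rare enough that it may not appear at all in a run of this size.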
 
Orodruin said:
The maximal winning occurs if you get really lucky and ##A## rings ##r## times in quick succession essentially at ##t = 0##.
OK, so we can take Δt again to be a small number, just with t = 0. And as Δt → 0, P[B rings] → 0.
 
figNewtons said:
OK, so we can take Δt again to be a small number, just with t = 0. And as Δt → 0, P[B rings] → 0.
I would not see it this way. The probability that A rings in the time interval ##\Delta t## also goes to zero. The question is how they relate to each other and therefore whether the expectation value is positive or negative.
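To make this concrete with the same expansion as in the first post: at ##t = 0## with ##x## dollars in hand,
##E[W] \approx r\,\Delta t - x\,\frac{\Delta t}{1-0} = (r - x)\,\Delta t,##
so both ring probabilities indeed vanish as ##\Delta t \to 0##, but their ratio does not, and the sign of ##E[W]## is set by ##r - x##.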
 