# Estimating a mean from games of ruin

In summary: the gambler plays a casino game with fixed bet and odds and records only how many bets she can make before running out of money, returning each day with a larger bankroll. The thread discusses methods for estimating the true expected value of the game from this recorded data alone, including Maximum Likelihood Estimation, a Bayesian approach, a curve-fitting approximation for an unknown payout distribution, and finally an estimator based on Wald's equality.

#### lalekl

Imagine a gambler playing a casino game with fixed bet, fixed odds, no skill, and a starting bankroll, ##M_0##. She plays until she can no longer afford to bet and records only how many bets she was able to make, ##N_0##, until she could not afford to bet. Each day she goes back to the casino with a larger bankroll, ##M_i##, plays the same game until she could no longer afford to bet, and records how many bets she made, ##N_i##. Can she determine the true EV of the game using only this information ( her sequences of bankrolls, ##\{M_i\}##, and ##\{N_i\}##, how many bets she was able to make on day ##i## before she could no longer afford to bet that day)?

Last edited:
She can't determine it, but she can make an unbiased estimate of it, for which the expected error decreases with every new day's observation.

Write ##p## for the unknown probability of her winning a bet. On each day ##k## write the probability ##R_k## of losing the initial sum ##M_k## in exactly ##N_k## bets. That will be a binomial-type probability that is a function of ##p##, ##N_k## and ##M_k##. The product of all the ##R_k## for days ##k## she has experienced is the likelihood of the experience she has had. You can use this in Maximum Likelihood Estimation to estimate ##p##.
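For a concrete toy version: if we assume, purely for illustration, a unit-bet game paying +1 on a win and -1 on a loss, then a day that exhausts ##M_k## in exactly ##N_k## bets implies ##(N_k-M_k)/2## wins, and the combinatorial factor in ##R_k## does not depend on ##p##, so the MLE reduces to total wins over total bets. A minimal sketch (the function name and the ±1 payout assumption are mine, not from the thread):

```python
def mle_win_prob(M, N):
    """MLE of the win probability p for a hypothetical unit-bet, +/-1 game.

    Going broke from bankroll M_k in exactly N_k bets implies
    (N_k - M_k)/2 wins and (N_k + M_k)/2 losses.  The combinatorial
    factor in the ruin probability R_k does not depend on p, so the
    MLE is simply total wins / total bets.
    """
    total_wins = sum((n - m) / 2 for m, n in zip(M, N))
    total_bets = sum(N)
    return total_wins / total_bets
```

For example, days with bankrolls 10 and 20 lasting 30 and 40 bets give ##\hat p = 20/70##.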

that is an interesting answer, but the casino game could for instance be a slot machine with non-negative, unbounded, discrete support and i don't think it would have any commonly known distribution that would allow us to specify the kind of product you are suggesting.

andrewkirk said:
The product of all the ##R_k## for days ##k## she has experienced is the likelihood of the experience she has had. You can use this in Maximum Likelihood Estimation to estimate ##p##.
You can also use a Bayesian approach to calculate a new posterior distribution for p after data from each day’s game.

lalekl said:
that is an interesting answer, but the casino game could for instance be a slot machine with non-negative, unbounded, discrete support and i don't think it would have any commonly known distribution that would allow us to specify the kind of product you are suggesting.
It sounds like you're envisaging a game in which one pays 1 to enter, and can win an amount that is any non-negative integer, with probabilities ##p_0,p_1,...##. Then it will be the case that
$$p_k=f(k, a_1,a_2,...,a_m)$$
for some function ##f## and parameters ##a_1,...a_m##.

There are two possibilities
Case 1: Gambler knows ##f## but not the parameters ##a_1,...,a_m##. Then she can use MLE to estimate those parameters based on her experience.
Case 2: Gambler does not know ##f##. Then she can't use MLE directly. But if the experience is long enough, she can make a pragmatic approximation. Let ##n## be the largest win she's ever made in a single play. Then assume temporarily that the probability of winning more than ##n## in a single play is zero. So now we have a game in which she just needs to estimate ##p_1,...p_n##, which she can do using MLE. Having done that, she can draw a graph of the ##p_k## estimates and fit a suitable curve to it, that has support on ##\mathbb R_+##. She then multiplies that curve by a scalar so that its integral is 1. That accounts for the unobserved tail of big wins.
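A rough discrete sketch of the fit-and-rescale step (a discrete analogue of the continuous curve described above; the geometric family ##p_k \approx A r^k## and the function name are my own illustrative assumptions):

```python
import numpy as np

def fit_and_normalize(p_est):
    """Fit a geometric tail p_k ~ A * r**k to MLE estimates p_1..p_n of
    the per-play win probabilities, then rescale so the extrapolated
    distribution (including the unobserved tail) sums to 1."""
    k = np.arange(1, len(p_est) + 1)
    # log-linear fit: log p_k ~ log A + k log r
    slope, intercept = np.polyfit(k, np.log(p_est), 1)
    A, r = np.exp(intercept), np.exp(slope)
    # the infinite sum over k >= 1 of A * r**k is A * r / (1 - r)
    total = A * r / (1 - r)
    return A / total, r  # rescaled amplitude and decay rate
```

This needs ##r<1## (decaying estimates) for the tail sum to converge, which matches the assumption that big wins become increasingly rare.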

So we are definitely in case 2. But you could also win less than an integer, or any number between the integers; I'll have to think about how that affects your argument. However, she isn't allowed to use any information about the game other than the two recorded sequences, so she can't write down her largest win. Is there another way to do that?

lalekl said:
you could also win less than an integer
No she can't. In my formulas the winnings are gross not net of the price to play. If we wanted to do it based on net winnings we'd just include an extra parameter ##p_{-1}##. In the above, gross, formulation a win of zero on a single play means that she has lost the unit of money she paid to play.
lalekl said:
she isn't allowed to use any information about the slots other than the two recorded sequences
That's a peculiar condition, as it implies she has amnesia about everything that happened while inside the casino.
But it can be surmounted. It just makes the approximation a little more rough. She just guesses ##n## as a number greater than she expects any machine would pay out. The curve-fitting and scaling will adjust for this to some extent if she guesses too low.
lalekl said:
that throws much of a wrench in your argument
It's not an argument, but a way of estimating unknown parameters given limited information. There's nearly always a way to make an estimate in a given set-up. It's just that they become progressively more inaccurate and slower to converge as the information set is constrained.

so she bets a dollar, wins 0.33333 repeating or maybe 22+pi; we are meant to be completely ignorant of the support... for a given ##n##, the number of parameters to be estimated could be much larger than ##n##, and certainly larger than the number of observations we have.

Yes, it is peculiar. You can reframe it if you'd like: she doesn't even try to estimate it; she just records that information on a blog each day, and that's all you have access to in order to try to estimate the mean.

If the odds and the bankroll are known then you can calculate the likelihood of any given number of bets being played. So you can use Bayesian methods to go from the observed number of bets played to a posterior distribution on the odds. Once you know the odds then you can directly calculate the EV. So you wind up with a posterior distribution on the EV.
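As a hedged sketch of this pipeline under the simplest possible reading (a hypothetical unit-bet ±1 game, so the only unknown is the win probability ##p## and the net EV per bet is ##2p-1##; the conjugate Beta prior and the function name are my own illustration, not from the thread):

```python
def posterior_ev(alpha, beta, M, N):
    """Beta posterior on the win probability p, then the posterior mean
    of the net EV per bet.

    Assumes a unit-bet game paying +1 on a win and -1 on a loss, so
    ruin from M_k in exactly N_k bets contributes (N_k - M_k)/2 wins
    and (N_k + M_k)/2 losses.  (alpha, beta) is the Beta prior on p.
    """
    for m, n in zip(M, N):
        alpha += (n - m) / 2   # conjugate update: add wins
        beta += (n + m) / 2    # ... and losses
    p_mean = alpha / (alpha + beta)  # posterior mean of p
    return 2 * p_mean - 1            # EV is linear in p, so this is its posterior mean
```

Starting from a flat Beta(1, 1) prior, one day with bankroll 10 and 30 bets gives a posterior mean EV of ##2\cdot 11/32 - 1 = -0.3125##.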

Dale said:
If the odds and the bankroll are known then you can calculate the likelihood of any given number of bets being played. So you can use Bayesian methods to go from the observed number of bets played to a posterior distribution on the odds. Once you know the odds then you can directly calculate the EV. So you wind up with a posterior distribution on the EV.
we don't know the odds of the game, only that they don't change.

Right. But if the odds were known then you could calculate the likelihood of the different possible outcomes. That allows you to learn about the unknown odds by observing the outcomes.

Dale said:
Right. But if the odds were known then you could calculate the likelihood of the different possible outcomes. That allows you to learn about the unknown odds by observing the outcomes.
Sorry, I read your response too quickly. I'm still having an issue: if we don't know the distribution, how can we construct this likelihood function? It seems to me that the number of parameters grows faster than the observations... that's a problem, isn't it?

lalekl said:
It seems to me that the number of parameters grows faster than the observations
No, there is only one unknown parameter, the odds. Unless I am misunderstanding the problem

Dale said:
No, there is only one unknown parameter, the odds. Unless I am misunderstanding the problem
the support of the game is unbounded, should i treat an infinite vector of the odds as one parameter? that doesn't seem to work for me...

lalekl said:
the support of the game is unbounded,
What do you mean by this?

lalekl said:
It seems to me that the number of parameters grows faster than the observations
The parameters fix the distribution of the win from a single play, not the distribution of the number of plays to exhaust the initial sum on a given day. So the number of parameters does not grow at all. It is fixed.
lalekl said:
the support of the odds is unbounded
This was already covered in posts 5 and 7. And it's not the support of the odds that is unbounded but the support of the winnings from a single play. It's very important to keep one's terminology straight in probability theory or we just end up completely confused.

Dale said:
What do you mean by this?
the unknown distribution could pay out any non-negative number as far as the player is concerned

lalekl said:
so she bets a dollar, wins 0.33333 repeating or maybe 22+pi
Then the probability of losing all her money on a given day is negligible, because she will end up with an amount less than the price of a single play, unless she never wins anything. In addition, it makes no sense to have a play pay off a real number. Currency is integers, in cents if not in dollars. Nobody ever paid anybody pi dollars.

lalekl said:
the unknown distribution could pay out any non-negative number as far as the player is concerned
So what do you mean by “fixed odds” in your first post? If the payout is variable in what way are the odds fixed?

Can you describe the game a little more carefully please

Dale said:
So what do you mean by “fixed odds” in your first post? If the payout is variable in what way are the odds fixed?

Can you describe the game a little more carefully please
it means the odds of any given pay do not change day to day, but the player is ignorant of the paytable.

andrewkirk said:
Then the probability of losing all her money on a given day is negligible, because she will end up with an amount less than the price of a single play, unless she never wins anything.
i updated the wording to be less confusing in that respect

andrewkirk said:
In addition, it makes no sense to have a play pay off a real number. Currency is integers, in cents if not in dollars. Nobody ever paid anybody pi dollars.
Either way, the point is she is ignorant of how many parameters there are to estimate. Can you guarantee that your estimation method doesn't demand more parameters than there is data?

lalekl said:
it means the odds of any given pay do not change day to day, but the player is ignorant of the paytable.
Then you just treat the paytable as your unknown. The more succinctly that you can parameterize your unknown paytable then the faster you can gather information about it.

Dale said:
Then you just treat the paytable as your unknown. The more succinctly that you can parameterize your unknown paytable then the faster you can gather information about it.
Yes, I agree. I think I am having the same issue with your solution as I am with andrewkirk's: a paytable consisting of every pay and a fixed probability for that pay requires you to estimate an unknown number of parameters. But since she doesn't know which pays are possible, she has to consider every possible pay and assign zero probability to those that cannot occur. This list of parameters seems to me to be too big to be unique and to require more observations than she will ever have...

lalekl said:
This list of parameters seems to me to be too big to be unique and to require more observations than she will ever have...
Sure. As @andrewkirk mentioned, you can set up the player’s ignorance in a way that makes it so that very little information is obtained from each play and therefore it will take a long time (like longer than the age of the universe) to converge.

However, it is often possible to parameterize a good approximation to something like the unknown paytable with relatively few parameters. Even though the paytable may not be exact, it can still give you good information in a relatively short time.

The other approach could be to forget the paytable altogether. If all you want to estimate is the EV then simply use that as your parameter to calculate the likelihood of a given number of games

Dale said:
Sure.

The other approach could be to forget the paytable altogether. If all you want to estimate is the EV then simply use that as your parameter to calculate the likelihood of a given number of games
Now we’re cooking with whale oil! But how?

lalekl said:
Imagine a gambler playing a casino game with fixed bet, fixed odds, no skill, and a starting bankroll, ##M_0##.

Can she determine the true EV of the game using only this information ( her sequences of bankrolls, ##\{M_i\}##, and ##\{N_i\}##, how many bets she was able to make on day ##i## before she could no longer afford to bet that day)?
Imagining a gambler at a casino game makes us imagine more information than you specify as given. For example, the gambler would know her winnings on each play of the game. A gambler with a rudimentary understanding of the game would know the amounts that are possible to win, even if she does not know the probability of winning each amount.

Perhaps we should imagine a person Alice who gives her son Bob an amount of money m[j] on each day j and goes to watch a movie while Bob plays the game at a casino. Later in the day Alice asks Bob "How many turns did you play today?" Bob, who always plays till he runs out of money, tells Alice he played n[j] turns. So Alice's only information on day j is (m[j],n[j]). What's a good way for Alice to estimate the expected return for 1 play of the game?

Stephen Tashi said:
Imagining a gambler at a casino game makes us imagine more information than you specify as given. For example, the gambler would know her winnings on each play of the game. A gambler with a rudimentary understanding of the game would know the amounts that are possible to win, even if she does not know the probability of winning each amount.

Perhaps we should imagine a person Alice who gives her son Bob an amount of money m[j] on each day j and goes to watch a movie while Bob plays the game at a casino. Later in the day Alice asks Bob "How many turns did you play today?" Bob, who always plays till he runs out of money, tells Alice he played n[j] turns. So Alice's only information on day j is (m[j],n[j]). What's a good way for Alice to estimate the expected return for 1 play of the game?
Yeah, I like that... I also got complaints, I think, for using "runs out of money", so I rewrote it as "can no longer afford to bet" or something like that... although I will say that you can claim a player knows there are infinite possibilities on a slot machine, but it is stretching it to say that they know all the values...

Last edited:
It is always possible to do MLE. The general approach is to assume a distribution type (aka family) for the underlying random process. Say the distribution PDF is ##f(x;\vec a)## where ##\vec a## is a vector of unknown parameters. The functional form of ##f## has to be assumed and the parameters ##a_j## are to be estimated.

We have a set of observations which in this case are the pairs ##(M_k,N_k)##. To do MLE, we just need to find a formula for the likelihood of each observation, which will be a probability or a probability density for ##N_k## given ##M_k##. Then we construct the overall likelihood which is the product of the likelihoods of the individual daily observations:
$$L(\vec N;\vec M, \vec a) = \prod_{k=1}^n L(N_k; M_k,\vec a)$$
We take partial derivatives of this wrt each ##a_j##, set them to zero and so get ##m## equations in ##m## unknowns, where ##m## is the dimensionality of ##\vec a## (number of parameters). We solve the system to find ##\vec a##.
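In practice the partial-derivative step is usually done numerically: minimize the negative log of the product, which is a sum of per-day log-likelihoods. A toy sketch (the geometric per-day model is a hypothetical stand-in, nothing like the much harder integral likelihood for the casino problem):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(a, days, day_loglik):
    # The overall likelihood is a product over days, so its negative
    # log is a sum of per-day negative log-likelihoods.
    return -sum(day_loglik(N_k, M_k, a) for M_k, N_k in days)

def toy_day_loglik(N_k, M_k, a):
    # Hypothetical stand-in model: N_k is geometric with parameter a[0].
    p = a[0]
    return (N_k - 1) * np.log(1 - p) + np.log(p)

days = [(10, 30), (20, 55)]  # (M_k, N_k) pairs
res = minimize(neg_log_likelihood, x0=[0.05], args=(days, toy_day_loglik),
               bounds=[(1e-6, 1 - 1e-6)])
```

For this stand-in model the closed-form MLE is ##2/(30+55)##, which the numerical optimizer should recover.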

In case of this casino problem the underlying distribution is that of the payout on a single play. The only thing we know about ##f## is that its integral on ##[0,\infty)## must be 1. We make a pragmatic choice based on what shape we might expect and how many observations we have. The fewer observations we have, the smaller should be the number of parameters, to avoid overfitting.

Say we decide the observation set is large enough to justify ##m## parameters. Then a suitable functional form would be
$$f(x;\vec a)=a_m \delta(x) + C\exp\left(-x^m + \sum_{j=1}^{m-1} a_j x^j\right)$$
The ##\delta(x)## is the Dirac delta function. We use it to allow a nonzero probability ##a_m## of no payout.
The parameter ##C## is chosen to make the integral 1, so it is calculated from the other parameters, rather than estimated. This distribution is always positive and will tend asymptotically to zero as ##x\to\infty##, which is what we would expect for a payoff.

Given that, the likelihood of the observation ##(M_k,N_k)## is

$$L(N_k;M_k,\vec a) = \int_0^{D_k-B_1} \int_0^{D_k - B_2} ... \int_0^{D_k - B_{N_k}} \prod_{r=1}^{N_k} f(x_r;\vec a) dx_{N_k}...dx_1$$

where ##D_k\triangleq N_k-M_k+1## and ##B_j\triangleq \sum_{r=1}^{j-1} x_r##. Note that ##B_1=0## and ##B_2=x_1##.

In solving the MLE, we apply the constraint that ##a_m\in [0,1]## as that is the probability of a zero payout. All the other ##a_j## can be any real number.

Once we have solved the MLE, we have a full estimated specification of the PDF, so we can integrate that to find the expected payoff from a single play.

Given the problem has been specified in recent posts to have continuous rather than discrete support, talk of a 'paytable' is misplaced. Tables only make sense for discrete sample spaces. We are estimating a function, not a table, and we estimate it by assuming a suitable form, then estimating its parameters.

The minimum number of parameters that can be used in this method is one, in which case the parameter is the probability of no payout on a play and the distribution of nonzero payouts is exponential with parameter 1. As the number of observations increases (the number of days for which ##(M_k,N_k)## was observed), the number of parameters can be increased, but I prefer to err on the side of under-fitting, so would only add another parameter after a load more observations.

Listen, for any practical real application your method "works"; but it misses part of the spirit of the problem, which is to not have a different method depending on what kind of distributions you think you might be getting from the very limited information set.

The game still has discrete support, but a priori any value may be chosen. Your solution as phrased is very akin to Freiling's axiom of symmetry (https://en.wikipedia.org/wiki/Freiling's_axiom_of_symmetry). So maybe there is just a philosophical difference. The problem does not rule out any nonnegative real number, precisely to discourage you from creating too many parameters in a philosophically suspect way. Also, the spirit of the problem is to prove that your solution in general will give us what we want... if you can show that regardless of the support, how you mis-specify it, or the functional form of the distribution, you will get the correct mean, then you win. You could for instance explain how to update the parameter space or alter the distributional form based on new information...

Complaining about definitions of "paytables" or "going broke" or the impracticality of paying someone in infinite decimals is just poor sportsmanship. That is only natural for me to expect, given that I apparently haven't found a good way to phrase the problem. I am grateful for the efforts you have made, and for the record, for any existing slot machine your continuous method would work very well... although I wouldn't be able to formally prove it.

Last edited:
Camilla plays a casino game with fixed bet, fixed odds, no skill, and a starting bankroll, ##M_0##. She plays until she can no longer afford to bet and records in a journal only her bankroll and how many bets she was able to make, ##N_0##, until she could not afford to bet. She has amnesia from drinking too many free drinks at the casino so remembers nothing else about the game. Each day she goes back to the casino with a larger bankroll, ##M_i##, plays the same game until she can no longer afford to bet, records her bankroll and how many bets she made, ##N_i## , drinks too much and suffers amnesia again forgetting everything else about the game. Can she determine the true return of the game using only the information in her journal ( her sequences of bankrolls, ##\{M_i\}##, and ##\{N_i\}##, how many bets she was able to make on day ##i## before she could no longer afford to bet that day)?

Additional info: the distribution of the game has non-negative discrete support, potentially unbounded, potentially including any given real number >= 0, has zero weight on infinity, mean less than 1, and finite variance. Camilla's method of estimating the mean may not use information about the support she could have learned from playing the game unless she intuits it from the only information about the game she is allowed to use (the sequences ##\{M_i\}##, and ##\{N_i\}##) and the additional assumptions listed here. She does not even know the bet of the game.

Last edited:
Look, in the predecessor thread to this I suggested using Wald's Equality. With some insight related to rescaling, you can easily apply it here. Your response then was

lalekl said:
The problem with using ##E[J]## in practice for my games is that distributions like slot machines with very heavy tails tend to give absurdly high ##E[J]## with very low chances of ever experiencing it, whereas the order statistics have empirically very tantalizing properties just out of reach of my analytic abilities.

which was a non sequitur and misleading. The game is in the house's favor, each play is iid with finite mean and in fact finite variance. There's simply not much more we can ask for. You asked to estimate the expected payoff of the game, and this is the simplest, most direct way. Tails are in fact part of the payout mix of an unfair game. The question was about estimating how unfair the game is, and tails must be taken into account. When we have a finite second moment, probability distributions ultimately behave in an intelligible and intuitive way. High variance may necessitate a lot of sampling, but that's life. When you have only a finite first moment they behave a bit strangely. And when you have no first moment -- e.g. for the symmetric random walk (i.e. null recurrent chains) -- you get bizarre behavior. Thankfully we have a second moment and don't need to worry about any of this.

The order statistics were irrelevant to what was asked then and now.

- - - -
I'm increasingly wondering whether there is a real question underlying these postings -- you've repeatedly used nonstandard terminology and changed the problem each time someone tries to answer it. I've seen fairer games in a casino. As a result, I rather like @andrewkirk 's idea of loading you up with integrals and sending you on your way. If you want to try and apply Wald's Equality to solve your question, you're welcome to do so.

Sorry to mislead with the non sequitur... I didn't think it was one; I just don't think estimating ##E[J]## is practical, but I think other methods are, and empirically they converge faster, though I need a strong statement to make progress on them...

The question in the other thread was meant to be more general, and definitely I was not and am not 100% sure whether it is a coherent statement. Discussing it with you and the others pushed me to try to be very specific, and I got as far there as I thought I could. The question was always straightforward, though: is the limit of that sequence well defined and equal to the unconditional mean? It sounds like everyone believed it was incoherent, or undefined at best.

Not sure how much of that applies to the question in this thread. My criticism of the other answers here is that they are not real answers: they make stronger assumptions. There is no demonstration that picking arbitrary supports or parameters will ever get you to the right answer for an unknown distribution. I don't even believe probability statements can be assigned a priori in that way; see for instance Freiling's axiom of symmetry (https://en.wikipedia.org/wiki/Freiling's_axiom_of_symmetry), and focus on the criticisms of Freiling's intuition. I feel like you should be someone who is saying this.

On the other hand, the problem stated in this thread is unambiguous, but maybe has a lot of subtle details. You are suggesting that for this I try using: $$E[X_1+...+X_J]=E[J]E[X_1]$$
So obviously ##E[X_1]## is what we want to find. How do we get the other two terms from the information in the problem? Each day I see ##J_i## and ##M_i##, and I know that she experienced a return bounded by ##1-M_i/J_i## and ##1-M_i/J_i+1/J_i##. Can I take one of them as an estimator for ##E[X_1+...+X_J]/E[J]## and say that it converges to the same mean? It is conditional on a different information set... so that isn't obvious to me...

Last edited:
lalekl said:
Not sure how much of that applies to the question in this thread. My criticism of the other answers here is that they are not real answers: they make stronger assumptions. There is no demonstration that picking arbitrary supports or parameters will ever get you to the right answer for an unknown distribution. I don't even believe probability statements can be assigned a priori in that way; see for instance Freiling's axiom of symmetry (https://en.wikipedia.org/wiki/Freiling's_axiom_of_symmetry), and focus on the criticisms of Freiling's intuition. I feel like you should be someone who is saying this.

This axiom of symmetry really is a different thread in my view. It feels awfully close to philosophy, though, so watch out. I generally try hard to avoid getting into the weeds of individual distributions and in the process sidestep a lot of issues. This seems to be one of them -- though not one of the usual suspects I had in mind, to be honest.

All I need here for the below is that the game has a finite second moment and is in the House's favor. That's it.

lalekl said:
On the other hand, the problem stated in this thread is unambiguous, but maybe has a lot of subtle details. You are suggesting that for this I try using: $$E[X_1+...+X_J]=E[J]E[X_1]$$
So obviously ##E[X_1]## is what we want to find. How do we get the other two terms from the information in the problem? Each day I see ##J_i## and ##M_i##, and I know that she experienced a return bounded by ##1-M_i/J_i## and ##1-M_i/J_i+1/J_i##. Can I take one of them as an estimator for ##E[X_1+...+X_J]/E[J]## and say that it converges to the same mean? It is conditional on a different information set... so that isn't obvious to me...

So this rescaling approach / linearity exploit just jumps off the page at me. You basically have a memo or table with the results of a full 'day' of play from Bob or whomever:

##\begin{bmatrix}
\text{starting bankroll} = S_n& \Big \vert & \text{number of plays} \\
1 & & j^{(1)}\\
1 + a & & j^{(2)} \\
1 + b & & j^{(3)}\\
\vdots & & \vdots \\
\end{bmatrix}##

Or something like that. (Note I'm using the change of variable from that prior post #13.)
For starters the table has ##r## entries in it. Maybe it ends up having a countably infinite number. Maybe not.

The point is your gambler (Bob?) has different starting bankrolls each day. But ##\bar{Y}## is constant because the same games are played each day. We can rescale the starting bankroll to homogenize the ##J## that we are estimating an expected value for. That's a backwards take on it. The forward take is, ##\bar{Y}## is fixed, and as you proportionately increase the starting bankroll ##S_n##, I know that the associated ##E\big[J\big]## grows linearly with this change -- this is exactly what Wald's Equality tells me.

But we have to work backward at things because this is a curious kind of inference problem. The idea is that each row in there should be rescaled to have a constant (let's go with constant = 1) in the left hand entry -- so for instance your second row, right hand entry would have ##\frac{j^{(2)}}{1+a}## in it.

With this you'll have standardised your table to, in effect, have the same amount of starting capital. Each right-hand-side entry thus has the same expected number of turns until failure -- which you don't directly see. But you can still sum all of the results on the RHS and divide by ##n##. The Weak Law of Large Numbers still applies -- even though the trials aren't cleanly iid, they are still independent. And the average on the RHS tends to ##E\big[J\big]##. The left-hand side is fixed. This allows you to estimate ##\bar{Y}## as the scaling needed to convert your ##E\big[J\big]## to the constant on the LHS. It's a bit unpleasant to have these two be a product, but that's life. Asymptotically the estimate is right, though because it is in product form, it may be biased for any finite number of estimates. (I can feel Jensen's Inequality possibly being an issue, as we have ##E\big[J\big] = \frac{\text{constant}}{\bar{Y}}## in this setup and the mapping ##u \to \frac{1}{u}## is strictly convex for ##u \gt 0##; though maybe it isn't an issue, and in any case it's not something I'm inclined to dwell on.)
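The rescale-then-average recipe can be sketched in a few lines (assuming unit bets and reading ##\bar{Y}## as the mean net loss per play, which is my interpretation of the post's notation; the function name is mine):

```python
def wald_estimates(bankrolls, plays):
    """Wald-equality sketch of the rescaling estimator described above.

    Rescale each day's play count by its starting bankroll, so every
    day has an effective bankroll of 1.  By Wald's equality the number
    of plays J to exhaust a bankroll of 1 satisfies
    1 = E[J] * (mean net loss per play), so the mean net loss per play
    is 1 / E[J].  Assumes unit bets, iid plays, and a house edge.
    """
    rescaled = [n / m for m, n in zip(bankrolls, plays)]
    e_j_hat = sum(rescaled) / len(rescaled)  # estimate of E[J] for bankroll 1
    mean_loss = 1 / e_j_hat                  # ~ Ybar in the post's notation
    mean_payout = 1 - mean_loss              # gross mean payout per unit bet
    return mean_loss, mean_payout
```

For instance, bankrolls of 10 and 20 lasting 100 and 200 plays both rescale to 10, giving an estimated net loss of 0.1 per play.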

- - - -
As another aside: I've thought about re-framing as a simple linear regression -- there's a closely related point about minimizing squared errors though for the linear relationship between expected results that Wald's Equality tells us, though doing this requires potentially a lot more work to fully justify and I'm not going to go that route.
- - - -

There's various ways to interpret/ justify the rescaling done above. Hopefully this is enough to play around with for a while.

There can of course be other nits about 'chunkiness' of betting stakes -- e.g. minimum bet size of 1 and after a long day at the casino you have ##30 \text{ cents}## in your pocket. Boundary issues are a nuisance but not something I'd worry too much about. The solution of course is to gather more data farther from the boundary to smooth the result out. Alternatively, and what I'd suggest doing to start, is for convenience assume there are not meaningful boundary issues, and come back and add them later on to complicate the model.

Very interesting! So much to learn and explore here.

StoneTemplePython said:
The forward take is, ##\bar{Y}## is fixed, and as you proportionately increase the starting bankroll ##S_n##, I know that the associated ##E\big[J\big]## grows linearly with this change -- this is exactly what Wald's Equality tells me.

I think I am missing something about why ##S_n## and ##E[J]## increase linearly together. I see that ##E[J]## has to increase linearly with ##E[Y_1+...+Y_J]##. So does that rely on ##E[Y_1+...+Y_J]## and ##S_n## increasing linearly together, and is that obvious?

Last edited: