Confidence in investment yield, betting

neutrinomass · Feb 1, 2012

Hi everybody,

I'm a physicist, not a statistician, by training, and my knowledge of statistics is severely limited. I indulge in sports betting, where I typically bet on the exact score of a game. To see whether my approach works I have to know whether I'm winning or not - to be precise, I must know whether my small profit is genuine or merely a statistical fluctuation. In other words I want to know, given the number of games and the observed profit, what is the probability that the average yield per bet is positive? If I were betting just on events with two possible outcomes, I think I could just use the information in [1], but how does this generalise to a larger number of outcomes (i.e. so that 50% correct is not the break-even point)?

I know this is pretty broad and I don't really expect a page-long reply on how exactly to go about doing this (though it would be greatly appreciated). I'd be grateful if you could just steer me in the right direction -- what should I be googling for?

Thanks in advance.

[1] http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

SW VandeCarr · Feb 1, 2012

You could google "Gambler's Ruin". Seriously, breaking even in the long run is about as good as you will do in most pure gambling situations where no skill is involved. Blackjack is one card game where good winning strategies exist. In most types of poker, I think the main skills are knowing when to hold, when to fold and how to bluff. In nearly all casino games, the odds at least slightly favor the house. In sports betting, the betting line tends to even out the house's risk (and the intention is to slightly favor the house with lines like -3.5 and -7.5 in football). I don't know how you bet on exact scores.

EDIT: BTW, if you have gains, they are only "real" when you take them (cash out). Otherwise, they are imaginary.

Stephen Tashi · Feb 1, 2012

neutrinomass said:

To see whether my approach works I have to know whether I'm winning or not - to be precise, I must know whether my small profit is genuine or merely a statistical fluctuation.

Of course, you can't know. The best you can do is get some statement involving a probability. The two broad choices of statistical methods are 1) Bayesian and 2) Frequentist. With frequentist statistics, you assume the results are due to a certain probability model and compute things about the probability of the data given that we accept that particular model. With Bayesian statistics, you assume there is a probability distribution over the possible probability models and you compute things having to do with the probability of a model being the true one given the observed data.

"Confidence" has a technical meaning in frequentist statistics, and it is used in connection with the problem of estimating parameters. (The Wikipedia entry on "confidence interval" explains this. There are also the distinct concepts of "prediction intervals" and "credible intervals".)

If you want to take the typical frequentist approach to deciding whether your strategy is better than random, you should look up "Hypothesis Testing". If you are trying to estimate the profit of a gambling strategy (random or otherwise), then "Confidence intervals" might be relevant, but the meaning of these intervals is not what most laymen think it is.

neutrinomass said:

In other words I want to know, given the number of games and the observed profit, what is the probability that the average yield per bet is positive?

Calculating that would depend on the rules for your betting strategy and on what probability model you assume for the data. You have stated that a score can have various values and that you bet on the exact score, but you didn't state a probability distribution for the scores, the payoff for making a correct bet, or what kind of strategy we would assume for an unskilled guesser. For example, an unskilled guesser might be defined as one who always guesses the mean score, or one who picks a random score, or one who picks the score that happened in the previous game. Does an unskilled gambler always bet the same amount, or does he bet random amounts?

neutrinomass · Feb 2, 2012

Stephen Tashi, thank you for the reply, it was very helpful. I'm not sure at this stage if my question is reasonably well-defined, because I clearly do not know things like the probability distribution of the possible outcomes, and even the odds are not constant from event to event. I'll see, though, if I can make any sense of the Wikipedia articles (I suspect just playing more is an easier way to find out which way the balance swings).

Regarding SW VandeCarr, you are right, bookmakers take a cut so any random strategy is doomed to fail. As far as sports betting goes however, the idea is to pick the winner frequently enough so that your profits compensate for the bookie's take. This is not something you can do with casino games.

Stephen Tashi · Feb 2, 2012

My conception of sports bookmaking is that the odds offered on bets are determined empirically. If the bookmakers can change the odds as the betting develops, I understand how this is possible. For example, if an event has 3 outcomes, the bookmaker can look at the pool of existing bets and determine whether he will lose money on any result. If he risks a loss on a result, then he can offer different odds to attract more bets to other results. I suppose bookmakers can also hedge against losses by effectively buying and selling bets they have made to other bookmakers, or by making their own bets with other bookmakers.

Of course, that's just my conception. Is it correct?

neutrinomass · Feb 2, 2012

You are absolutely right. Sports bookmakers typically make no attempt to predict winners and/or the probability distributions of outcomes. Rather, they rely on public opinion and, based on the money coming in from clients, adjust their odds so that they win regardless of the outcome. So if the public is betting the same amounts on player A as they are on B, the bookmaker will offer 1.9 odds for both, keeping a handsome 10% for themselves regardless of the outcome. If it emerges that A had an injury in practice (or something) and the public starts placing more bets on B, the bookmaker will lower the odds offered for B and increase those for A.
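To make the arithmetic of that example concrete, here is a minimal Python sketch. It assumes the quoted odds are decimal odds (the payout per unit staked, stake included) and uses made-up stakes of 100 units on each side; the function names are only illustrative. With 1.9 on both sides the bookmaker keeps 10 units whichever side wins, which is 10% of one side's stakes or 5% of the total turnover.

```python
def book_margin(decimal_odds):
    """Return (overround, normalised implied probabilities) for one event."""
    raw = [1.0 / o for o in decimal_odds]      # implied probabilities before normalising
    overround = sum(raw)                       # exceeds 1 when the book has a margin
    implied = [r / overround for r in raw]     # rescaled so they sum to 1
    return overround, implied


def bookmaker_take(decimal_odds, stakes):
    """Bookmaker profit for each possible winning outcome.

    stakes[i] is the total money the public placed on outcome i.
    """
    total = sum(stakes)
    return [total - o * s for o, s in zip(decimal_odds, stakes)]


# Hypothetical balanced book: 100 units on A and 100 units on B, both at 1.9.
overround, implied = book_margin([1.9, 1.9])
take = bookmaker_take([1.9, 1.9], [100.0, 100.0])
print(f"overround = {overround:.4f}, implied probabilities = {implied}")
print(f"bookmaker profit if A wins, if B wins: {take}")
```

For a bettor, the same numbers say that a bet at decimal odds a breaks even only if it wins more often than 1/a of the time, which is the general replacement for the "50% correct" rule in the two-outcome, even-money case.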

I don't know about hedging at other bookies though. The odds don't typically vary across bookmakers enough to allow you to do that and it would also mean falling victim to the other bookmaker's commission as well!

Tosh5457 · Feb 2, 2012

You'll have to assume the distribution of your profit/loss is gaussian.

So basically, first you'll need to calculate the mean and standard deviation of the sample you got so far.

Second, you'll need to calculate the confidence interval, and a certainty of 95% seems ok. For 95% certainty, the interval is:

[tex]\left(m - \frac{1.96\,s}{\sqrt{n}},\; m + \frac{1.96\,s}{\sqrt{n}}\right)[/tex]

where s is the standard deviation, n is the number of results you got on your sample and m is the mean.

That's the interval where your real profitability will be, with a 95% certainty. If the lower value is less than 0, then you don't know if you're a winner or a loser. You'll find that for low n's, this will be the case, especially if the profitability you have now is low too.
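A minimal Python sketch of the calculation above, using only the standard library. The list of per-bet profits is invented for illustration (20 bets of 10 units, 3 of them won at decimal odds 7.0), and the function name is just a placeholder. It relies on the same Gaussian assumption as the formula, which is rough for a small number of long-odds bets.

```python
import math
import statistics


def mean_confidence_interval(profits, z=1.96):
    """Return (lower, mean, upper) for the mean profit per bet.

    Uses the normal approximation m +/- z * s / sqrt(n), with s the
    sample standard deviation of the per-bet profits.
    """
    n = len(profits)
    if n < 2:
        raise ValueError("need at least two bets to estimate a standard deviation")
    m = statistics.mean(profits)
    s = statistics.stdev(profits)  # sample standard deviation (n - 1 denominator)
    half_width = z * s / math.sqrt(n)
    return m - half_width, m, m + half_width


# Hypothetical record: each win pays +60 (10 units at odds 7.0), each loss costs 10.
profits = [60.0] * 3 + [-10.0] * 17
lower, m, upper = mean_confidence_interval(profits)
print(f"mean profit per bet = {m:.2f}")
print(f"95% interval = ({lower:.2f}, {upper:.2f})")
print("interval includes zero" if lower < 0.0 < upper else "interval excludes zero")
```

With so few bets the interval comfortably straddles zero, which is the "low n" situation described above.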

neutrinomass · Feb 3, 2012

Thank you very much, this is very helpful (and annoyingly simple). Central limit theorems and such aside, this approach seems reasonable because it also captures the fact that the profitability of riskier plays (which will increase s) will be harder to determine. I will give it a go tonight, though I suspect that my sample size is too small. Thanks again guys (or girls)!

Stephen Tashi · Feb 3, 2012

Tosh5457 said:

That's the interval where your real profitability will be, with a 95% certainty.

No, it isn't, at least not if "certainty" means probability and m is the observed sample mean. This is why the layman's interpretation of "confidence interval" has a problem.

Stephen Tashi · Feb 4, 2012

What I'm alluding to is the point made by the current Wikipedia article on "confidence interval", namely:

A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained. (An interval intended to have such a property, called a credible interval, can be estimated using Bayesian methods; but such methods bring with them their own distinct strengths and weaknesses).

Being a Bayesian, I'm sympathetic to the layman's interpretation of "confidence intervals". But I don't think the field of statistics should be granted any exceptions to the rules of logic and proof that are practiced in other branches of mathematics. We should be clear about the meaning of a confidence interval.

Let's say you do the confidence interval calculation suggested by Tosh5457 and use the numerical values for s and m that are computed from your data, so you get a specific numerical interval like (15.25, 23.22). It is not valid to make the assertion "There is a 0.95 probability that the mean of my distribution is in (15.25, 23.22)".

One way to think about this is to define the events
A = the population mean is in (15.25,23.22)
B = I observed the following data ...(list the data that you have).

If we assert that the probability of A given B = P(A|B) = 0.95 then we assert that
P(A|B) = P(A and B)/ P(B) = 0.95. However there is no information in the problem about a probability distribution for B. There is also no information about the probability distribution of A or P(B|A) or P(A and B). The "frequentist" statement of the problem doesn't grant that there is anything probabilistic about the location of the mean of the population. From that point of view the mean is in a single definite, but unknown place.

If we take the Bayesian point of view and grant that P(A and B) and P(B) are meaningful ideas, then it is still incorrect to assert P(A|B) = 0.95. You can make up various distributions for the mean of the normal distribution, get various different ratios for P(A and B)/P(B), and thus get different answers for P(A|B).

Making up a "prior distribution" for the mean lets you compute a "credible interval". From the Bayesian point of view, it lets specialists incorporate additional facts into the analysis or even express the idea of "total ignorance". For example, you may know that there is an upper limit on how much you could lose and an upper limit on how much you could gain on a bet. The mean of your bets would lie between those bounds. As "total ignorance" you could say "Let's assume the mean result of a bet is uniformly distributed between the lower and upper limit." This lets you compute a "credible interval". (You can also compute P(A|B) if you are interested in the particular interval given by event A.)

Often when you pick a prior distribution that expresses "total ignorance", the endpoints of a "credible interval" and those of a "confidence interval" associated with the same probability are approximately the same. To me, this is what saves the common misinterpretation of confidence intervals from leading to disasters in practical applications.
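As a rough illustration of the uniform-prior idea, here is a Python sketch that puts a flat prior on the mean between a lower and an upper bound and evaluates the posterior on a grid. Everything numerical in it is an assumption made for the example: the sample mean, standard deviation and number of bets are invented, the bounds correspond to a 10-unit stake at decimal odds 7.0, and the likelihood of the sample mean is taken to be normal with standard error s / sqrt(n), treating s as known.

```python
import math


def posterior_on_grid(m, s, n, lo, hi, grid=20001):
    """Posterior for the mean on an evenly spaced grid over [lo, hi].

    Flat prior on [lo, hi]; normal likelihood for the sample mean m with
    standard error s / sqrt(n). Returns (grid points, probabilities).
    """
    se = s / math.sqrt(n)
    step = (hi - lo) / (grid - 1)
    mus = [lo + i * step for i in range(grid)]
    # The prior is constant, so the unnormalised posterior is the likelihood.
    weights = [math.exp(-0.5 * ((m - mu) / se) ** 2) for mu in mus]
    total = sum(weights)
    return mus, [w / total for w in weights]


def credible_interval(mus, probs, mass=0.95):
    """Central interval holding `mass` of the posterior probability."""
    tail = (1.0 - mass) / 2.0
    cum = 0.0
    lower = None
    for mu, p in zip(mus, probs):
        cum += p
        if lower is None and cum >= tail:
            lower = mu
        if cum >= 1.0 - tail:
            return lower, mu
    return lower, mus[-1]


def prob_positive(mus, probs):
    """Posterior probability that the mean profit per bet is above zero."""
    return sum(p for mu, p in zip(mus, probs) if mu > 0.0)


# Hypothetical data: 200 bets, sample mean 0.5, sample standard deviation 25.6.
m, s, n = 0.5, 25.6, 200
mus, probs = posterior_on_grid(m, s, n, lo=-10.0, hi=60.0)
lower, upper = credible_interval(mus, probs)
half = 1.96 * s / math.sqrt(n)
print(f"95% credible interval:   ({lower:.2f}, {upper:.2f})")
print(f"95% confidence interval: ({m - half:.2f}, {m + half:.2f})")
print(f"P(mean > 0 | data) = {prob_positive(mus, probs):.3f}")
```

When the bounds are far from the bulk of the likelihood, as here, the two intervals nearly coincide; a different prior would give a different P(mean > 0 | data), which is the point of the argument above.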

Tosh5457 · Feb 6, 2012

I didn't mean certainty as probability. I know it's not a probability, but just because I've been told so.

Thanks for the explanation, I now see why it's not a probability, without the Bayesian interpretation.

neutrinomass · Feb 8, 2012

Ok, for the record I think that the simplest thing for my case is to compare against a random strategy by taking a list of the odds {a_1, a_2, ...} and money {b_1, b_2,...} placed on each bet and writing a program that will bet randomly. Run it a few times and histogram the profits. Presumably this will be a Gaussian with mean m and standard deviation s. Then by comparing the distance of my profit from the mean in units of the standard deviation I can obtain a rough picture of whether my system (probably) works or whether the data are insufficient to rule out statistical fluctuations. Thank you, you have been very helpful.
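A minimal sketch of such a program in Python. What "betting randomly" means has to be assumed: here each bet at decimal odds a_i is taken to win with probability 1 / (a_i * overround), i.e. the bookmaker's implied probability after removing an assumed 5% margin. The odds, stakes, observed profit and function names are all invented for illustration.

```python
import random
import statistics


def simulate_random_bettor(odds, stakes, overround=1.05, runs=2000, seed=0):
    """Total profit of a random bettor over the whole list of bets, repeated `runs` times.

    Assumption: bet i wins with probability 1 / (odds[i] * overround).
    """
    rng = random.Random(seed)
    win_probs = [1.0 / (a * overround) for a in odds]
    totals = []
    for _ in range(runs):
        total = 0.0
        for a, b, p in zip(odds, stakes, win_probs):
            if rng.random() < p:
                total += b * (a - 1.0)   # win: payout minus the stake
            else:
                total -= b               # loss: the stake is gone
        totals.append(total)
    return totals


def z_score(observed_profit, simulated_totals):
    """Distance of the observed profit from the simulated mean, in standard deviations."""
    m = statistics.mean(simulated_totals)
    s = statistics.stdev(simulated_totals)
    return (observed_profit - m) / s


# Hypothetical record: 100 bets of 10 units at long odds, observed profit of 150 units.
odds = [7.0, 8.5, 6.0, 9.0] * 25
stakes = [10.0] * 100
totals = simulate_random_bettor(odds, stakes)
print(f"random bettor: mean = {statistics.mean(totals):.1f}, sd = {statistics.stdev(totals):.1f}")
print(f"z-score of an observed profit of 150: {z_score(150.0, totals):.2f}")
```

With long odds and few bets the histogram of totals is noticeably skewed rather than Gaussian, so counting the fraction of simulated runs that beat the observed profit may be a safer summary than the z-score.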

The Investor · Feb 13, 2012

I will assume the bets are not correlated, so you are not betting on multiple outcomes within the same event for instance.

For each bet, record the odds bet at, and what you think the true probability should be, then run a Monte Carlo simulation based on your assumed probabilities.

If you think the prices are more than 5% out, say, you will make money pretty quickly if you are correct (which is extremely unlikely). Even being able to systematically break even long term would require a high level of know-how and/or skill. And of course many who think they are systematic winners have just been very lucky.

The bigger your assumed edge is, the smaller the sample you will need to test your accuracy. If your assumed edge is, say, 0.5%, you could have 10,000 bets and still not be sure whether it is actually zero.

Again, you can run a Monte Carlo simulation tossing a biased coin that pays out at evens, but has a 50.5% chance of landing on heads. You bet a fixed amount on heads each time, and see where you end up after 10,000 tosses. Play around with it by changing the bias, and using Kelly staking etc.

Of course you don't need to do a simulation to work it out, but actually seeing some randomly generated graphs makes it clearer what is happening and what the range of outcomes is. You will see graphs that appear to be trending up or down. A lot of this stuff can be counter-intuitive.
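A bare-bones version of that coin experiment in Python, using only the standard library. It covers fixed staking only (no Kelly staking), the number of repeated runs is an arbitrary choice, and the function names are placeholders. `bankroll_path` gives one profit curve that can be plotted; `final_profits` repeats the experiment to show the spread of end results.

```python
import random
import statistics


def bankroll_path(p_heads=0.505, tosses=10000, stake=1.0, seed=None):
    """Running profit after each toss, betting `stake` on heads at even money."""
    rng = random.Random(seed)
    profit = 0.0
    path = []
    for _ in range(tosses):
        profit += stake if rng.random() < p_heads else -stake
        path.append(profit)
    return path


def final_profits(runs=300, p_heads=0.505, tosses=10000, stake=1.0, seed=0):
    """Final profit of `runs` independent repetitions of the experiment."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        heads = sum(rng.random() < p_heads for _ in range(tosses))
        finals.append(stake * (2 * heads - tosses))  # heads win a stake, tails lose one
    return finals


finals = final_profits()
losing = sum(f < 0 for f in finals) / len(finals)
print(f"mean final profit = {statistics.mean(finals):.1f} stakes")
print(f"sd of final profit = {statistics.stdev(finals):.1f} stakes")
print(f"fraction of runs ending in a loss = {losing:.2f}")
```

The expected profit after 10,000 tosses is only about 100 stakes, while the standard deviation is also about 100 stakes, so a sizeable minority of runs with a genuine 50.5% coin still end up in the red.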
