Combining Probability Estimates for Event Prediction

In summary, this person is trying to combine two probability estimates for an event and is looking for a way to combine them in proportion to their accuracy in order to produce a more robust single probability estimate.
  • #1
BBITA
Firstly, hello from Sydney, Australia. It is terrific to have found your forum; it looks great and is full of strong contributions.

I'd be much obliged if anyone out there can help me with this problem...

I am trying to combine two probability estimates for an event. I have many thousands of events to test against. The two estimates are derived from two views of the same process (call one subjective, the other objective). When tested, both have predictive merit. However, I expect there is some crossover, and thus if I can combine them correctly I expect they should "inform" each other to produce a stronger single estimate.

I am looking for a way to a) estimate how correct each is overall; and b) then combine them in these proportions to (hopefully) produce a more robust single probability estimate.

Any help on this would be greatly appreciated.

Kind regards,

Bbita

(Though I have perused the board a little since I found it, I have not seen a similar query. My apologies if this has already been dealt with).
 
  • #2
BBITA said:
two probability estimates for an event

This is not a clear description of a situation involving probability. It isn't clear whether "an event" is an independent repetition of the same random variable (like tossing the same coin over and over again) or whether the thousands of events you have are all different random variables (like the outcomes of different sporting events). It isn't clear whether the "two estimates" are two estimates per event or just two estimates that are to be applied to all the events. It isn't clear whether the event has a yes-no, success-fail type of outcome or whether it is a continuous quantity like the amount of rainfall in a day.

If you are doing a real world problem, it's simplest just to state the problem.
 
  • #3
Good morning Stephen (and all),

With the start of the F1 season, I am back on a hobby project in which I am trying to assess the viability of a performance model I have developed. The output is the probability of each car in a race winning, summing to one over all entrants.

I also have the implied public probabilities via betting odds, also summing to one after normalization.

I suspect that the model and the public each pick up on various elements; however, the public would be more team / driver personality driven, whereas my model is driven more by performance data. The data cover many years.

With that context, and as in my original post, I suspect one (the betting market) will be more correct / accurate overall than my model, but that my model may still contribute some valuable implied information through its probabilities. And so I am looking for a method to a) estimate how correct each is overall; and b) then combine both in proportion to their accuracy to (hopefully) produce a more robust single probability estimate.

Many thanks,

BBITA
 
  • #4
That's a clear description of the situation. The question of what makes one prediction better than another isn't precise (e.g. is it crucial to get the probability of each car winning correct, or is it only important to get the probability of the few cars that are most likely to win correct?), however that can be worked on.

This is a challenging problem because you are not dealing with repetitions of the same random variable. It's rather like evaluating weather predictions. The forecast will say 20% chance rain one day and 80% chance another. I recall asking (on another forum) the question of how one evaluates a model that produces probabilities as an output if we don't have empirical data where the inputs repeat. I didn't get a satisfactory answer! So I suppose we are on our own unless someone else chimes in.

The most famous "goodness of fit" statistic for predictions of probability is the chi-squared statistic. The usual scenario is that you have divided the outcomes into which the data "falls" into a number of mutually exclusive cases called "cells". The chi-square statistic is computed from two inputs per cell: the predicted probability (which is used to compute the expected frequency of occurrences in the cell) and the number of occurrences in the cell actually observed in the data.

To see if chi-square can be applied to your problem, we must ask if there is a good way to define "cells". One thought is to use the predicted probability itself to define cells. For example, we can define 10 cells: Cell_0 is the set of cars whose predicted probability of winning is less than 0.1, Cell_1 is the set of cars whose predicted probability of winning is at least 0.1 and less than 0.2, etc.

I don't know if that is going to work. The chi-squared statistic has a tricky aspect called the "degrees of freedom", which depends on the number of constraints that are imposed on the data. The fact that only 1 car wins a race imposes some constraints. We'll have to think about what the number of degrees of freedom is.
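Setting aside the degrees-of-freedom question, here is a minimal sketch of that binning idea in Python (my own illustration, not an established recipe; the arrays of predicted probabilities and win indicators are assumed inputs). Grouping cars by bins of predicted probability like this is close in spirit to the Hosmer-Lemeshow calibration test.

```python
import numpy as np

def chi_square_by_probability_bins(pred_probs, won, n_bins=10):
    """Chi-square statistic whose cells are bins of predicted win probability.

    pred_probs: predicted win probability for each (race, car) entry.
    won: 1 if that car won its race, else 0 (same length as pred_probs).
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    won = np.asarray(won, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    chi2 = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (pred_probs >= lo) & (pred_probs < hi)
        n = in_bin.sum()                       # cars falling in this cell
        expected = pred_probs[in_bin].sum()    # expected number of wins
        observed = won[in_bin].sum()           # observed number of wins
        if n > 0 and 0.0 < expected < n:
            # Pearson terms for the "won" and "did not win" outcomes.
            chi2 += (observed - expected) ** 2 / expected
            chi2 += (observed - expected) ** 2 / (n - expected)
    return chi2
```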

Of course, if you are a computer programmer or can use software that does simulations, it is straightforward to simulate data sets under the assumption that the predicted probabilities are correct. You can see how various statistics are distributed and sidestep many theoretical worries.
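For instance, a minimal sketch of that kind of simulation (assuming each race is stored as an array of per-car win probabilities summing to one):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_winners(races, n_sims=10_000):
    """Draw a winner for every race, repeated n_sims times, under the
    assumption that the given probabilities are correct.

    races: list of 1-D arrays, one per race, each holding the per-car
           win probabilities (entries sum to 1).
    Returns an (n_sims, n_races) array of winning car indices.
    """
    winners = np.empty((n_sims, len(races)), dtype=int)
    for j, p in enumerate(races):
        winners[:, j] = rng.choice(len(p), size=n_sims, p=p)
    return winners
```

Any statistic (chi-square included) can then be computed on each simulated data set to see how it is distributed when the assumed probabilities are true.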

Are you familiar with doing simulations?
 
  • #5
I have about 500 events over about 30 years. Most are repeated editions of the same circuits. I have drawn correlations between practice and qualifying times and records of recent performance, created a driver performance coefficient, and normalised for weather conditions.

The probabilities generated through a logistic regression correspond well with the outcomes.
(Time intervals are not a reliable candidate for the dependent variable, and being continuous they would preclude binary logistic regression anyway.)
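For concreteness, a minimal sketch of this kind of model in Python with scikit-learn; the feature columns and data shapes below are hypothetical stand-ins for the practice / qualifying / form / driver / weather variables described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: one row per (race, car). Columns stand in for
# practice time, qualifying time, recent form, driver coefficient and
# a weather adjustment; y is 1 if that car won its race.
race_ids = np.repeat(np.arange(250), 20)        # 250 races x 20 cars (made up)
X = rng.normal(size=(race_ids.size, 5))
y = (rng.random(race_ids.size) < 0.05).astype(int)

model = LogisticRegression().fit(X, y)
raw = model.predict_proba(X)[:, 1]              # per-car win probabilities

# Renormalize within each race so the probabilities sum to one,
# since exactly one car wins each race.
probs = raw / np.bincount(race_ids, weights=raw)[race_ids]
```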

However, when there is a divergence in estimated probabilities between my model and the public's odds, experience shows that the public is usually more correct.

Thus my hope to combine model generated probabilities with those provided by the public.
 
  • #6
BBITA said:
Thus my hope to combine model generated probabilities with those provided by the public.

If you are asking whether there is some theoretical result that gives a definite formula for combining two probability estimates and guarantees that the method is a good way, or the best way, of combining them, then the answer is "No".

If you propose a specific method of combining the estimates, there are ways of testing how well the method works versus the historical data.

You didn't say whether you are familiar with doing computer simulations.

To say that we want to "combine" two estimates is too general a problem to solve. To get a mathematical answer, the usual method would be to specify a "family" of ways of combining them where the members of the family are distinguished by values of some parameters. Then we solve (theoretically or empirically) for the best value of the parameters.

For example, a simplistic way to combine two probabilities is to take a "convex combination". If p1 and p2 are probabilities and alpha is a number between 0 and 1, then p = (alpha)(p1) + (1 - alpha)(p2) is also a probability between 0 and 1.

A way to empirically solve for alpha would be to do simulations. For each race you have your own estimated probabilities of winning for each car and the public probabilities. Pick various values of alpha (e.g. 0.01, 0.02, 0.03, and so on). For each value of alpha, compute the convex combination of your probabilities and the public's. Use this set of "combined" probabilities to simulate all the historical races. Repeat this simulation thousands of times. Then analyze how well the particular value of alpha did. To do the analysis, you need some way to measure how well the combined probabilities fit the results of the simulation. The chi-square method that I mentioned in the previous post might work for that.
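As a sketch of the grid search (my own illustration; here each alpha is scored directly against the historical winners by log-likelihood, a simpler stand-in for the simulation-plus-chi-square evaluation described above):

```python
import numpy as np

def combine(p_model, p_public, alpha):
    """Convex combination of two per-car probability arrays for one race."""
    return alpha * p_model + (1.0 - alpha) * p_public

def log_score(races, winners, alpha):
    """Total log probability the combined estimates assign to the actual
    winners. races: list of (p_model, p_public) pairs; winners: index of
    the winning car in each race. Higher is better."""
    return sum(np.log(combine(p1, p2, alpha)[w])
               for (p1, p2), w in zip(races, winners))

# Grid search over alpha = 0.00, 0.01, ..., 1.00 (races, winners assumed):
# best_alpha = max(np.linspace(0, 1, 101),
#                  key=lambda a: log_score(races, winners, a))
```

Note that a convex combination of two probability distributions over the cars in a race is automatically a distribution, so no renormalization is needed here.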

Another way of evaluating how well a particular alpha did is to assume that you will use a particular betting strategy whether you really intend to have one or not. (An interesting popularized treatment of betting strategies is the book "Fortune's Formula".) Simulate using the betting strategy with the combined probabilities thousands of times and see how much it wins or loses on the average with a particular value of alpha.
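One concrete choice of strategy for that test is fractional Kelly betting, the strategy popularized in "Fortune's Formula". A sketch, where the race data, decimal odds, and the "true" probabilities used to draw winners are all assumed inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

def kelly_fraction(p, decimal_odds):
    """Kelly stake fraction for backing an outcome of probability p
    at the given decimal odds (0 if the bet has no edge)."""
    b = decimal_odds - 1.0                      # net profit per unit staked
    return max(0.0, (p * b - (1.0 - p)) / b)

def mean_final_bankroll(races, n_sims=1000, start=1.0):
    """Average final bankroll over simulated histories. Each race is a
    (combined_probs, decimal_odds, true_probs) triple; in every race we
    bet on the car with the largest expected return and draw the winner
    from true_probs."""
    finals = np.empty(n_sims)
    for s in range(n_sims):
        bank = start
        for probs, odds, true_p in races:
            i = int(np.argmax(probs * odds))    # car with the best edge
            stake = kelly_fraction(probs[i], odds[i]) * bank
            if rng.choice(len(true_p), p=true_p) == i:
                bank += stake * (odds[i] - 1.0)
            else:
                bank -= stake
        finals[s] = bank
    return finals.mean()
```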

If you want to implement your observation that the public is more likely to be correct when its estimates diverge from your model, then you can try more complicated formulas. For example, suppose your model's probability for a car winning is p1 and the public's probability is p2. Let beta = sqrt( abs(p1 - p2) ) and form the weighted average p = [ (alpha)(1-beta)p1 + (1-alpha)(beta)p2 ] / [ (alpha)(1-beta) + (1-alpha)(beta) ], where alpha is some number in the interval (0,1). When you and the public agree, beta is near 0 and p is near p1; the more you and the public disagree, the larger beta becomes and the more heavily p is weighted toward p2, the public's probability.
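A sketch of that rule in code, with one extra step the formula above doesn't mention: renormalizing within a race so the combined probabilities over all cars still sum to one:

```python
import numpy as np

def combine_divergence_weighted(p_model, p_public, alpha):
    """Weighted average that leans toward the public's estimate as the two
    estimates diverge. p_model, p_public: per-car win probabilities for one
    race; alpha: baseline weighting parameter strictly between 0 and 1."""
    beta = np.sqrt(np.abs(p_model - p_public))
    w1 = alpha * (1.0 - beta)           # weight on the model
    w2 = (1.0 - alpha) * beta           # weight on the public
    p = (w1 * p_model + w2 * p_public) / (w1 + w2)
    return p / p.sum()                  # renormalize over the race
```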

Of course there are an infinity of other formulas that implement the same qualitative behavior. You can also use more than one parameter in a formula. You can also make up formulas that give the probability of car J winning as a function of the probabilities (yours and the public's) for all the other cars, instead of just the two estimates for car J. So picking the best way out of "all possible ways" appears to be too ambitious an undertaking.
 

1. How do you combine two probability estimates?

The most common way to combine two probability estimates is by using Bayes' theorem. This theorem calculates the probability of an event occurring based on prior knowledge or assumptions about the event.

2. What is the formula for combining two probability estimates?

The formula for combining two probability estimates using Bayes' theorem is: P(A|B) = P(B|A) * P(A) / P(B), where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the evidence.
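As a small worked example of the formula (all numbers made up for illustration):

```python
# Made-up numbers: P(A) = 0.10 prior that a car wins, P(B|A) = 0.60
# likelihood of the evidence given a win, P(B) = 0.15 overall
# probability of the evidence.
p_a, p_b_given_a, p_b = 0.10, 0.60, 0.15
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)   # posterior: 0.60 * 0.10 / 0.15 = 0.40
```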

3. Can you combine more than two probability estimates?

Yes, it is possible to combine more than two probability estimates by applying Bayes' theorem repeatedly, updating the posterior with each new piece of evidence. This can be helpful in situations where multiple sources of information or data are available.

4. What are some common applications of combining two probability estimates?

Combining two probability estimates is often used in data analysis, machine learning, and decision making processes. It can also be used in risk assessment, medical diagnosis, and forecasting.

5. Are there any limitations to combining two probability estimates?

Yes, there are some limitations to combining two probability estimates. It is important to have accurate and unbiased prior probabilities and to consider the potential biases or uncertainties in the data being used. Additionally, simple applications of Bayes' theorem often assume that the pieces of evidence are conditionally independent, which may not always be the case in real-world situations.
