# Unorthodox probability theory

Here is a simple problem, but one with a lot of hidden difficulty:

We have a weather station somewhere in the country which, with the aid of satellite information, gives us the probability of rain every day.
Suppose it is an average value p for the season.

At some other place there is an Indian chief who used to do the job for us for hundreds of years, until he was overtaken by modern technology; suppose he assigns the probability q to rain.

We know that the modern weather station is more reliable so we are likely to discard the value q and we say that p is the correct probability.
But is that altogether true?
Could it be that the correct probability in such situations is some function f(p,q)? And if so, how do we express it?

Here is a short treatment I found:

http://www.mathpages.com/home/kmath267.htm

The author of mathpages investigates the issue in some depth.
But what happens if correlations are also present?
Any more references?

f(p,q) = p*q / (p*q+(1-p)*(1-q))

on the assumption that the separate predictions are uncorrelated.
But some correlation is likely to exist in such situations; in the case of the modern weatherman v. the Indian chief, it might come from certain methods being used by both.
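A minimal Python sketch of the formula, assuming (as the mathpages derivation does) that the two forecasts are conditionally independent given the weather and that rain and no rain are equally likely before any forecast is heard. The function names are made up for illustration. The log-odds form shows what the formula does: each forecaster's log-odds of rain are simply added, so a forecaster who says 0.5 (log-odds 0) changes nothing.

```python
import math

def combine(p: float, q: float) -> float:
    """Combine two forecasts of rain, assuming they are conditionally
    independent given the weather and the prior probability of rain is 1/2."""
    num = p * q
    return num / (num + (1 - p) * (1 - q))

def logit(x: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(x / (1 - x))

def combine_log_odds(p: float, q: float) -> float:
    """The same formula in log-odds form: the two log-odds add up."""
    z = logit(p) + logit(q)
    return 1 / (1 + math.exp(-z))

print(combine(0.75, 0.75))           # 0.9
print(combine_log_odds(0.75, 0.75))  # the same value, up to rounding
```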

But the mathpages article suggests this as the answer to a different question: if the weather station is correct p of the time and the chief q of the time, these are uncorrelated, and they both predict rain, what is the probability that it will rain? So if p = 0.75 and q = 0.75 then f(p,q) = 0.9.
Regarding the question you ask, if both say the probability of rain is 75%, then the most reasonable thing seems to be to accept 75% as the probability.
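The "different question" can be answered with Bayes' rule once the assumptions are made explicit. Assume (these are assumptions, not stated in the thread) that rain and no rain are equally likely a priori, that each forecaster is right with the same probability whether or not it rains, and that the two forecasts are conditionally independent given the weather. Write R for rain and W_R, I_R for "the weatherman (the chief) predicts rain":

$$P(R \mid W_R, I_R) = \frac{P(W_R \mid R)\,P(I_R \mid R)\,P(R)}{P(W_R \mid R)\,P(I_R \mid R)\,P(R) + P(W_R \mid \bar R)\,P(I_R \mid \bar R)\,P(\bar R)} = \frac{\tfrac12\,pq}{\tfrac12\,pq + \tfrac12\,(1-p)(1-q)} = \frac{pq}{pq + (1-p)(1-q)}$$

With p = q = 0.75 this gives 0.5625 / (0.5625 + 0.0625) = 0.9. If the prior probability of rain is some value other than 1/2, the two factors of 1/2 are replaced by P(R) and 1 - P(R) and no longer cancel.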

It doesn't look outrageous to me if f(p,q) = 0.9, provided the two pieces of evidence are uncorrelated.
Suppose the police receive a UFO call. They say "it's another loony at this time of night". But when they receive a second call, they jump to attention, don't they?
That's because they believe the two phone calls are uncorrelated, which again is not a certainty, just as the landing of the UFO itself is never a certainty.

But usually some degree of correlation exists; that's why I'd like to know if there are more references.

f(p,q) = 0.75 holds when they are 100% correlated.
Think of football. If we both make the statement "Real Madrid are going to win this coming Sunday", does that count as two opinions? More likely than not it doesn't. More likely than not we read the same sports journal and repeat the same thing.

Intuitively speaking, I would think that if observers A and B agree about something then f(p,q) is at its maximum. If A is a good predictor and B is a known bad predictor, then f cannot be raised above p.

Please write with references; good search terms for Google would also be useful.
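To see how correlation pulls the combined figure from 0.9 down to 0.75, here is a sketch under a hypothetical correlation model that is not from the thread: on each day the chief copies the weatherman's call with probability c, and otherwise makes his own independent call with the same hit rate p, so his overall hit rate stays p whatever c is. Equal prior odds of rain are assumed, and the function name is illustrative.

```python
def p_rain_given_both_say_rain(p: float, c: float, prior: float = 0.5) -> float:
    """P(rain | weatherman and chief both say 'rain') under the hypothetical
    copy model: the chief copies the weatherman with probability c, otherwise
    he calls independently and is right with probability p."""
    # Weatherman said 'rain'. If it rains he was right, and the chief agrees
    # by copying or by being right himself.
    chief_agrees_if_rain = c + (1 - c) * p
    # If it stays dry the weatherman was wrong, and the chief agrees by
    # copying or by being wrong himself.
    chief_agrees_if_dry = c + (1 - c) * (1 - p)
    rain = prior * p * chief_agrees_if_rain
    dry = (1 - prior) * (1 - p) * chief_agrees_if_dry
    return rain / (rain + dry)

for c in (0.0, 0.5, 1.0):
    print(c, round(p_rain_given_both_say_rain(0.75, c), 4))
# 0.0 0.9
# 0.5 0.8077
# 1.0 0.75
```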

In situations like this, non-zero correlations are to be expected. In fact, repeated sampling from two populations X and Y will usually give a sample correlation $$\rho=Cov(X,Y)/\sigma_x\sigma_y$$ that is not zero. On average, if X and Y are independent, $$\rho$$ may well tend to zero with repeated sampling. However, the concept of statistical independence is not defined in terms of correlations. For example, two non-interacting probabilistic systems may follow the same physical laws and therefore appear statistically correlated, but in the absence of interaction they may be said to be independent. This may be shown operationally when perturbations of one system do not affect the other.

Formally, events A and B are statistically independent if $$P(A\cap B)=P(A)P(B)$$.

http://www.sciencedirect.com/scienc...228cab704beb52587eb9f52e63fdd2de&searchtype=a
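The product-rule definition can be checked mechanically on a joint distribution. This sketch uses a textbook example that is not from the thread, X uniform on {-1, 0, 1} and Y = X^2, to show that zero covariance does not imply independence, in line with the point that independence is not defined in terms of correlation. Exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution: X uniform on {-1, 0, 1}, and Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def marginal(index):
    """Marginal distribution of coordinate 0 (X) or 1 (Y)."""
    out = {}
    for key, pr in joint.items():
        out[key[index]] = out.get(key[index], 0) + pr
    return out

px, py = marginal(0), marginal(1)
mean_x = sum(x * pr for x, pr in px.items())
mean_y = sum(y * pr for y, pr in py.items())
cov = sum((x - mean_x) * (y - mean_y) * pr for (x, y), pr in joint.items())

# Independence requires P(X=x, Y=y) = P(X=x) P(Y=y) for every pair of values.
independent = all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(px, py))

print(cov, independent)  # 0 False
```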

Does it make any sense to do this:

f(p,q) = p^a*q^b / (p^a*q^b + (1-p)^a*(1-q)^b)

where a, b are experimental constants?
The sum of f(p,q) over the various outcomes is not 1, but I could normalize f(p,q) to make it equal to 1. It's a kind of heuristic.

If a=1, b=0 then f(p,q) = p, and the auxiliary estimator q, as it were, is discarded.
If a=1, b=1 then f(p,q) = p*q / (p*q + (1-p)*(1-q)), the maximal case (independence).
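The heuristic can be written for any number of forecasters and outcomes as a weighted geometric pool, which the forecast-combination literature calls a logarithmic opinion pool (a useful search term). This is a sketch, not the poster's code: `pooled` is a hypothetical helper, the exponents play the role of a and b, and the explicit normalization is the step mentioned above. With exactly two outcomes the denominator of the formula already makes f(rain) + f(no rain) = 1; the normalization matters once there are more outcomes.

```python
def pooled(forecasts, weights):
    """Weighted geometric pooling of several probability forecasts.

    forecasts: list of dicts mapping each outcome to one forecaster's probability
    weights:   one exponent per forecaster (a, b, ... in the formula above)
    """
    outcomes = forecasts[0].keys()
    raw = {}
    for outcome in outcomes:
        score = 1.0
        for forecast, w in zip(forecasts, weights):
            score *= forecast[outcome] ** w
        raw[outcome] = score
    total = sum(raw.values())
    # Normalize so the pooled probabilities sum to 1 over all outcomes.
    return {outcome: s / total for outcome, s in raw.items()}

weatherman = {"rain": 0.75, "dry": 0.25}
chief = {"rain": 0.75, "dry": 0.25}
print(pooled([weatherman, chief], [1, 0])["rain"])  # a=1, b=0: 0.75
print(pooled([weatherman, chief], [1, 1])["rain"])  # a=1, b=1: 0.9
```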

My point was that we may expect statistical correlations in situations where weather predictions are made by two "independent" methods since both methods are addressing the same object. The issue of statistical independence is simply not very relevant here. IMO, the relevant test is to compare the two proportions to see if the scientific method is significantly superior to the alternative method.

Is the scientific method the first formula?
I don't believe so.
It is scientific, but only under the conditions it specifies.

Here is an obvious case of failure:
Predictor A is a learned journalist, predictor B is Paul the Octopus (and don't spoil it by saying you believed the octopus!).
With my heuristic formula it will surely turn out that the journalist gets exponent 1 and the octopus exponent 0.

The test is provided by sums of logarithms: we look for the exponents where the sum is maximum, and of course one needs statistical data on whatever is being measured.

But what might be a good approximation to begin with?
There are some commercial neural-network software products that I imagine do this kind of thing. I don't know if I can write the names of those products.
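One way to read "the test is provided by logarithmic sums" is as a log score: for each day, take the log of the probability the pooled forecast gave to what actually happened, average over days, and pick the exponents (a, b) with the highest average. The sketch below does this by brute-force grid search; the records are invented placeholders, the helper names are hypothetical, and forecasts strictly between 0 and 1 are assumed so that no log of zero occurs.

```python
import math
from itertools import product

def f_ab(p, q, a, b):
    """Heuristic pooled probability of rain, f = p^a q^b / (p^a q^b + (1-p)^a (1-q)^b)."""
    rain = p ** a * q ** b
    dry = (1 - p) ** a * (1 - q) ** b
    return rain / (rain + dry)

def mean_log_score(records, a, b):
    """Average log of the probability given to the outcome that actually happened."""
    total = 0.0
    for p, q, rained in records:
        f = f_ab(p, q, a, b)
        total += math.log(f if rained else 1 - f)
    return total / len(records)

def best_weights(records, grid):
    """Grid search for the (a, b) pair with the highest mean log score."""
    return max(product(grid, grid), key=lambda ab: mean_log_score(records, *ab))

# Invented daily records: (weatherman's p, chief's q, did it rain?)
records = [(0.8, 0.6, True), (0.7, 0.4, True), (0.2, 0.5, False),
           (0.9, 0.7, True), (0.3, 0.6, False), (0.6, 0.3, False)]
grid = [i / 10 for i in range(21)]  # 0.0, 0.1, ..., 2.0
print(best_weights(records, grid))
```

With real data the optimum may sit on the edge of the grid, in which case the grid should be widened.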

I don't follow your reasoning. Why do you need a good approximation? Why do you say "journalist=1"? Whether it's Paul the Octopus vs the journalist or the meteorologist vs the Indian, it's simply a case of comparing two proportions: correct/(correct+incorrect) predictions, where "correct" means agreeing with the subsequently observed outcomes.

Comparing two proportions is a straightforward standard statistical test; not unorthodox at all. What exactly is your point?
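A sketch of the standard pooled two-proportion z-test in plain Python, using the normal approximation (so the counts should not be tiny). The counts in the usage line are invented for illustration.

```python
import math

def two_proportion_z_test(correct1, n1, correct2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = correct1 / n1, correct2 / n2
    pooled = (correct1 + correct2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: weatherman right on 270 of 365 days, chief on 230 of 365.
z, p_value = two_proportion_z_test(270, 365, 230, 365)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because both forecasters predict the same days, their outcomes are paired; this unpaired test ignores that, and a paired test such as McNemar's, which looks only at the days on which they disagree, is the usual alternative.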

I see.
You break the problem into components and say "correct weather station + correct Indian = x%" and so on. And if the Indian is in fact worthless, then it should turn out that x = p.
Perhaps I was thinking along the lines of probability systems with lots of such components and observers, where breaking the problem into component cases is difficult.

EnumaElish
Homework Helper

The only thing unorthodox here is your notation. I would expect that the Indian would have a certain probability of correct and incorrect guesses: call that $$p_I, 1-p_I$$ and the same for the weatherman $$p_W, 1-p_W$$. From these data you could create a 2x2 table and calculate the odds ratio with a variance estimate. You could also look at the number of cases where I is correct and W is wrong, where they are both wrong, where they are both correct and where W is right and I is wrong.

In short, there are a lot of ways to analyze the data that are fairly standard. How would you use the fact that we found, say, that in a proportion 0.043 of the cases I is correct and W is wrong?
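The 2x2 analysis suggested above can be sketched as follows. The tallies are invented, chosen so that the chief is right and the weatherman wrong on 0.043 of the days, as in the example. `marginal_odds_ratio` uses the Woolf (log-scale) variance estimate on the correct/incorrect totals, which treats the two forecasters as independent samples; `mcnemar_chi2` uses only the discordant days and so respects the fact that both forecast the same days. All function names are hypothetical.

```python
import math

def concordance_table(days):
    """Tally (chief correct?, weatherman correct?) over a list of boolean pairs."""
    table = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for chief_ok, weatherman_ok in days:
        table[(bool(chief_ok), bool(weatherman_ok))] += 1
    return table

def marginal_odds_ratio(table, z=1.96):
    """Odds of a correct call, chief vs weatherman, with a Woolf approximate
    95% interval. Ignores the pairing of the two forecasters' days."""
    chief_right = table[(True, True)] + table[(True, False)]
    chief_wrong = table[(False, True)] + table[(False, False)]
    wm_right = table[(True, True)] + table[(False, True)]
    wm_wrong = table[(True, False)] + table[(False, False)]
    log_or = math.log(chief_right * wm_wrong / (chief_wrong * wm_right))
    se = math.sqrt(1 / chief_right + 1 / chief_wrong + 1 / wm_right + 1 / wm_wrong)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def mcnemar_chi2(table):
    """McNemar chi-square (1 df, no continuity correction) from the discordant days."""
    b = table[(True, False)]  # chief right, weatherman wrong
    c = table[(False, True)]  # chief wrong, weatherman right
    return (b - c) ** 2 / (b + c)

# Invented tallies over 1000 days.
table = {(True, True): 600, (True, False): 43, (False, True): 197, (False, False): 160}
n = sum(table.values())
print("chief right, weatherman wrong:", table[(True, False)] / n)  # 0.043
print("odds ratio and 95% interval:", marginal_odds_ratio(table))
print("McNemar chi-square:", mcnemar_chi2(table))
```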

If you have M guessers and N states of nature they are guessing, it becomes difficult.
The system (sun-rain, weatherman-Indian) is only 2x2, so it looks easy.
Among the N states of nature, or types of weather if you like, some are likely to be pretty infrequent, so how long do we tell our customers they have to wait for an answer before sufficient data are collected? 3000 years?
That's why approximations like the one on mathpages.com are being tried.

That's not what you asked. Please answer my question. How would you use the fact that for some, say small, percentage of cases the Indian is correct and the weatherman is incorrect even though the weatherman has a significantly higher correct prediction rate overall. (Note that correct/incorrect is a dichotomous variable regardless of the number of conditions involved.)

It adds something, like the 75% that became 90% a few posts above.
But at other times the weatherman is unfairly penalized.

The way I see it, the Indian is useless if, when in disagreement with the weatherman, he says random things.
Above this extreme it should be true that one opinion adds to the other.

Your answers are not coherent. I asked you a very specific question. The Indian and the weatherman are presumed to be making independent predictions, and the data are evaluated against the realized outcomes. I'm not interested in the way you see it. What do you mean, the Indian is saying random things? Is that a foregone conclusion? It has no place in a scientific investigation. We are only interested in the statistical evaluation of their predictions. I'll ask you again: how is the fact that the Indian and the weatherman sometimes give discordant predictions in favor of the Indian useful, when overall the weatherman is more reliable according to a statistical test?

My answers are not coherent, maybe because I have reached the present limits of my knowledge about the problem, which after all I sort of made up in the first place without knowing the answer. I'll think more about it in the light of day.
I did have some examples with figures showing a certain improvement in an old MS-DOS spreadsheet of mine, impossible to dig out now.

Let's call the statement "rain" statement A and the statement "no rain" statement B.
Prior to the event there are four possible predictive statements, A-A, A-B, B-A and B-B, each with its own probabilities (p1, 1-p1), (p2, 1-p2), etc. of a positive and a negative outcome.
So if we measure them, we can say those are the probabilities, and there are no ifs or buts.

But the task is to infer something without making the breakdown in this way.
For consider what happens when we have ten guessers.
Two possibilities again, A and B, but ten guessers.
Now we have a many-component situation, as follows:

A-A-A-A-A-A-A-A-A-A
A-A-A-A-A-A-A-A-A-B
..............................
B-B-B-B-B-B-B-B-B-A
B-B-B-B-B-B-B-B-B-B

It's 2^10 = 1024 cases.
We need some 1000 occurrences of each of the above to work out true ratios.
If the reports are daily, then we need 2^10 * 1000 days = 1,024,000 days = 2805 years !
And maybe some of the ABAB type words will not turn up many times so the significance of our calculations may be poor, even after 2805 years.

This is why we need an approximation, along the lines of mathpages.com.
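The arithmetic behind the 2805 years, as a small sketch. `days_needed` is a hypothetical helper; it assumes every joint prediction pattern is equally frequent, which is the best case, so rare patterns only make things worse.

```python
def days_needed(n_outcomes, n_guessers, per_cell=1000):
    """Number of joint prediction patterns, and days of daily reports needed to
    see each pattern per_cell times if all patterns were equally frequent."""
    cells = n_outcomes ** n_guessers
    return cells, cells * per_cell

cells, days = days_needed(2, 10)
print(cells, days, round(days / 365))  # 1024 1024000 2805
```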

EnumaElish
Homework Helper
The OP is related to the Bayesian "prior belief" paradigm/model in decision/game theory. I may have a prior belief that tomorrow it's going to rain because I dreamt about it last night -- the point is, although the belief may be empirically unsubstantiated or even irrational, people I interact with should take it into account. For example, I might cancel all my appointments because I believe it is going to rain with at least 40% probability. If my prior belief is "it'll rain with 50% probability if the forecast says there's a 90% chance, and it'll rain with only 25% probability if the forecast says there's less than a 90% chance" then the people who are expecting to meet with me tomorrow should take my "belief system" (as well as the forecast) into account when they are planning their day.
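The belief system in this example can be written down as a rule that other people can evaluate. In this sketch the function names are hypothetical, and reading "a 90% chance" as "at least 0.9" is an assumption.

```python
def my_rain_belief(forecast_prob):
    """Hypothetical belief system from the example: my probability of rain
    given the official forecast (assumed: 'a 90% chance' means >= 0.9)."""
    return 0.5 if forecast_prob >= 0.9 else 0.25

def cancels_appointments(forecast_prob, threshold=0.4):
    """I cancel everything if my own belief in rain is at least 40%."""
    return my_rain_belief(forecast_prob) >= threshold

# Someone planning to meet me can predict my behaviour from the forecast alone.
for forecast in (0.95, 0.9, 0.8, 0.3):
    print(forecast, my_rain_belief(forecast), cancels_appointments(forecast))
```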

I have some clues and some lines from you, and I'm trying to put them together.
There are some extremes in the problem:

- The first predictor makes reasonably good predictions, the other one just flips a coin. Then p2 = 0.5 and the formula from mathpages (the nature of which I'm still investigating) becomes f = p1.
- The first predictor is as good as can be again, and the second one simply copies him every day. So p1 = p2; the formula claims there will be an improvement, but it's a false promise. f = p1 is again the truth in such a case.
- The first predictor is good, the second is good only when he copies and the rest of the time he is at the 0.5 level. Again there is no real improvement.

* In the real world (of people professionally engaged in such schemes), there are assiduous copiers, and there is also the partial use of identical methods, which makes it look as if copying is taking place.

The good extreme:

- That seems to be the situation where the formula applies.

Notice also that if the second predictor's hit rate turns out to be below 0.5, then we simply call him a liar and reverse his predictions: his true p2 is then 1-p2, which is > 0.5.
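The extremes above can be checked numerically against the independent-forecaster formula. This is a sketch: `combine` is an illustrative name, equal prior odds of rain are assumed, and both forecasters' calls are taken to be "rain". The coin-flipper leaves p unchanged, the copier makes the formula report more than p even though a copy adds no information, and a predictor with hit rate below 0.5 gives the same answer whether his q is plugged in directly or his call is reversed and credited with 1-q.

```python
def combine(p, q):
    """Independent-forecaster formula when both forecasters say 'rain'."""
    return p * q / (p * q + (1 - p) * (1 - q))

p = 0.8
print(combine(p, 0.5))  # coin-flipping second predictor: p comes back unchanged
print(combine(p, p))    # pure copier: the formula reports about 0.94, the truth stays p
# A predictor right only 30% of the time says 'rain'. Reversing him means he
# effectively says 'no rain' with hit rate 0.7; Bayes' rule then gives the
# same number as plugging q = 0.3 straight into the formula.
reversed_call = p * (1 - 0.7) / (p * (1 - 0.7) + (1 - p) * 0.7)
print(combine(p, 0.3), reversed_call)
```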

* It's not difficult to measure the raw efficiencies of many such predictors, and even to have more than two outcomes in the "experiment". For example, we could have five: sun, partially clouded, overcast, slight rain, heavy rain. The difficulty is to put them together.

The argument used by mathpages.com to derive the formula is hellishly tortuous.
Let me try to do this differently:

Each of the two forecasters, weatherman and Indian, follows his own set of methods.
We can't tell to what extent those methods are the same.
The ultimate case of independence is this:
Weatherman goes to sleep every night and he is visited by good fairy Bruchilde. Bruchilde knows what is gonna happen but she tells the truth in the dream with probability p using a random number generator in her laptop. Weatherman, who is really Bruchilde's spokesperson, proceeds to tell us his view when he wakes up and naturally he scores p percent of the time.
Indian similarly goes to sleep every night and he is visited by good fairy Matilde, who also knows what is gonna happen and she reveals the truth to the Indian with probability q, using another laptop.

In a situation such as this, at the end of the proceedings if the event A=Rain occurs N times, it will be scored by both N*p*q times and missed by both N*(1-p)*(1-q) times (plus-minus the random fluctuation).
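A Monte Carlo sketch of the fairy thought experiment, under the added assumption that it rains on half of the days. `simulate_fairies` is a hypothetical helper; with a fixed seed it is reproducible. Besides the two counts above, it also records how often it actually rained on the days when both forecasters said rain, which should come out close to f(p,q).

```python
import random

def simulate_fairies(p, q, days, rain_rate=0.5, seed=1):
    """Each night one fairy tells the weatherman the truth with probability p,
    and another fairy independently tells the chief the truth with probability q.

    Returns (rainy days, rainy days both right, rainy days both wrong,
    fraction of 'both say rain' days on which it actually rained).
    """
    rng = random.Random(seed)
    rainy = both_right = both_wrong = 0
    both_say_rain = rained_when_both_say_rain = 0
    for _ in range(days):
        rain = rng.random() < rain_rate  # default assumption: rain on half the days
        w_says_rain = rain if rng.random() < p else not rain
        i_says_rain = rain if rng.random() < q else not rain
        if rain:
            rainy += 1
            both_right += w_says_rain and i_says_rain
            both_wrong += (not w_says_rain) and (not i_says_rain)
        if w_says_rain and i_says_rain:
            both_say_rain += 1
            rained_when_both_say_rain += rain
    return rainy, both_right, both_wrong, rained_when_both_say_rain / both_say_rain

p, q = 0.75, 0.75
rainy, both_right, both_wrong, freq = simulate_fairies(p, q, 200_000)
print(both_right / rainy, p * q)                  # both close to 0.5625
print(both_wrong / rainy, (1 - p) * (1 - q))      # both close to 0.0625
print(freq, p * q / (p * q + (1 - p) * (1 - q)))  # both close to 0.9
```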

But on any given day there are only three types of contest going on:

The AA v. BB, the AB v. BA or the BA v. AB.

So for the case of prediction AA v. prediction BB, it's PROB(A) = f(p,q) = p*q / (p*q + (1-p)*(1-q)).
While for the other cases the symmetric formulas apply.
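Spelled out, the symmetric formulas for all four forecast pairs come from one rule: a forecaster who says A contributes his hit rate to the rain side and one minus it to the no-rain side, and the other way round when he says B. A sketch assuming equal prior odds of rain; `prob_rain` is an illustrative name.

```python
def prob_rain(weatherman_says_rain, chief_says_rain, p, q):
    """P(rain) for each of the four forecast pairs under the independent
    model with equal prior odds of rain (assumption)."""
    like_rain = (p if weatherman_says_rain else 1 - p) * (q if chief_says_rain else 1 - q)
    like_dry = (1 - p if weatherman_says_rain else p) * (1 - q if chief_says_rain else q)
    return like_rain / (like_rain + like_dry)

p, q = 0.8, 0.6
for w in (True, False):
    for i in (True, False):
        print("A" if w else "B", "A" if i else "B", round(prob_rain(w, i, p, q), 3))
```

The AA and BB cases add up to 1, as do AB and BA, which is the symmetry referred to above.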

In the real world now where there are no fairies, the real problem has to be attacked.
This in my opinion ought to be done as follows:

Suppose it is true that p > q (weatherman somewhat superior).
Then we write q' = 0.5 + L*(q-0.5) = q(L)
L is a number between 0 and 1. For L = 0 q' = 0.5 while for L = 1, q' = q.

Then f(p,q) becomes f(p,q') = p*(0.5 + L*(q-0.5)) / (p*(0.5 + L*(q-0.5)) + (1-p)*(0.5 - L*(q-0.5)))

If L = 0 then f = p (meaning the Indian doesn't count). If L = 1 then both count.

We write down the daily forecasts and the daily outcomes.
We measure the p and q values.
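We let L = 1, and for every record we compute Y = log(f evaluated for the outcome that actually occurred, known after the fact) and add up the Y values. Then we compute I = exp(sum of Y's / number of records), the geometric mean of the probabilities given to the actual outcomes.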
Then we repeat with L = 0.99, 0.98 ... down to L = 0.
The value of L that makes I maximum is our best estimator.
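The procedure above as a Python sketch, with hypothetical helper names and invented records. Here I is taken as exp(+mean of the logs), i.e. the geometric mean of the probabilities given to the outcomes that actually occurred, which is the quantity to maximize (with a minus sign in the exponent it would have to be minimized instead). The measured hit rates must lie strictly between 0 and 1 so that no log of zero occurs.

```python
import math

def hit_rate(records, who):
    """Fraction of days a forecaster was right. records: (w_rain, i_rain, rained)."""
    idx = 0 if who == "weatherman" else 1
    return sum(rec[idx] == rec[2] for rec in records) / len(records)

def shrink(q, L):
    """q' = 0.5 + L*(q - 0.5): L = 0 ignores the chief, L = 1 trusts q fully."""
    return 0.5 + L * (q - 0.5)

def geometric_mean_score(records, p, q, L):
    """I = exp(mean log probability given to the outcome that occurred)."""
    qs = shrink(q, L)
    total = 0.0
    for w_rain, i_rain, rained in records:
        like_rain = (p if w_rain else 1 - p) * (qs if i_rain else 1 - qs)
        like_dry = (1 - p if w_rain else p) * (1 - qs if i_rain else qs)
        pr_rain = like_rain / (like_rain + like_dry)
        total += math.log(pr_rain if rained else 1 - pr_rain)
    return math.exp(total / len(records))

def best_L(records, steps=100):
    """Scan L = 1, 0.99, ..., 0 and return the L with the largest I."""
    p, q = hit_rate(records, "weatherman"), hit_rate(records, "chief")
    grid = [k / steps for k in range(steps, -1, -1)]
    return max(grid, key=lambda L: geometric_mean_score(records, p, q, L))

# Invented data 1: the chief simply copies the weatherman (right 8 days in 10).
copier = [(True, True, True)] * 8 + [(True, True, False)] * 2
# Invented data 2: pattern counts that exactly match independent forecasters
# with p = q = 0.75 and rain on 16 of 32 days.
independent = ([(True, True, True)] * 9 + [(True, False, True)] * 3
               + [(False, True, True)] * 3 + [(False, False, True)] * 1
               + [(False, False, False)] * 9 + [(False, True, False)] * 3
               + [(True, False, False)] * 3 + [(True, True, False)] * 1)
print(best_L(copier), best_L(independent))  # 0.0 1.0
```

The two invented data sets bracket the problem: for the pure copier the best L is 0 (the chief adds nothing), and when the counts match the independent model exactly the best L is 1.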

This can be done with more than two participants in the contest also.
Are there any better ideas ?

We established the probability formula for the case of two independent probability forecasts:

f(p,q) = p*q / (p*q + (1-p)*(1-q))

In practice the two forecasts are never independent.
If p >= q > 0.5 then the "true" q is somewhere between 0.5 and q.
That is we have to replace q in the formula with a q', such that 0.5 <= q' <= q.

In the previous post I described a way of computing a maximum likelihood estimate of L such that:

q' = 0.5 + L*(q-0.5)

Are there any better ways of doing it ?

A lot depends on the methods and assumptions of deriving p and q. Using modern instruments only does not assure correct estimation.

I believe you start with the ideal formula (independent p and q) and apply maximum likelihood estimators.