Probability of Binomial Variable ≥ Another Binomial Variable

SirTristan
If two binomially distributed variables are generated as paired events, how often will the variable with p=X be greater than the variable with p=Y? Also what is the "equity" if ties are counted as .5 for each?

For instance, in Excel I generated 10,000 numbers with p=.8 and 10,000 with p=.6. The first set of numbers was greater 3,173 times, they were equal 5,642 times, and the second set was greater 1,185 times. So p=.8 was greater than p=.6 31.73% of the time. Counting each tie as half a win, the total equity for the first set was (3173+5642/2)/10000=.5994.

Repeating this for p=.7 and p=.4, the first was greater 4,204 times, they were equal 4,610 times, and the second set was greater 1,186 times. p=.7 was greater than p=.4 42.04% of the time, and the "equity" for the first variable was (4204+4610/2)/10000=.6509.
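For readers without Excel, the paired simulation above can be sketched in Python. This is a minimal sketch, assuming each pair is drawn independently and that each "number" is a single 0/1 draw (n=1) unless n or m is raised; the function name simulate_pairs and its parameters are illustrative, not from the thread.

```python
import random

def simulate_pairs(p, q, trials=10_000, n=1, m=1, seed=None):
    """Monte Carlo estimate of P[X>Y], P[X=Y], P[X<Y] and the tie-split equity,
    where X ~ B(n, p) and Y ~ B(m, q) are drawn independently, one pair per trial.
    (Hypothetical helper for illustration.)"""
    rng = random.Random(seed)
    greater = equal = less = 0
    for _ in range(trials):
        # A binomial draw is the number of successes in n (or m) independent trials.
        x = sum(rng.random() < p for _ in range(n))
        y = sum(rng.random() < q for _ in range(m))
        if x > y:
            greater += 1
        elif x == y:
            equal += 1
        else:
            less += 1
    equity = (greater + equal / 2) / trials  # each tie counts as half a win
    return greater / trials, equal / trials, less / trials, equity

for p, q in [(0.8, 0.6), (0.7, 0.4)]:
    gt, eq, lt, equity = simulate_pairs(p, q, seed=1)
    print(f"p={p} q={q}: X>Y {gt:.4f}  X=Y {eq:.4f}  X<Y {lt:.4f}  equity {equity:.4f}")
```

With a fixed seed the run is reproducible; the printed frequencies will differ slightly from the Excel counts because the random draws differ.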
 
Hi SirTristan! :)

What you are looking for is

P\{X\geq Y\}=P\{X-Y\geq 0\}

Thus you must know the distribution of X-Y. Sadly, I do not know any nice formula for this. However, if X~B(n,p) and Y~B(m,q) and n and m are large, then we can apply the Central Limit Theorem.

Indeed, if n is large, then approximately X~N(np,np(1-p)), and if m is large, then approximately Y~N(mq,mq(1-q)). Assuming X and Y are independent, X-Y~N(np+mq,np(1-p)+mq(1-q)). Thus if Z is standard normal, then you need to calculate

P\left\{Z\geq \frac{-np-mq}{\sqrt{np(1-p)+mq(1-q)}}\right\}

which can easily be looked up in a standard normal table.
 
Looks like you mean Bernoulli variables (binomial with n=1). In that case it's easy to set up a 2x2 table: with P[X=1]=p and P[Y=1]=q (X and Y independent), you have P[X=0,Y=1]=(1-p)q, etc., and thus P[X>Y]=p(1-q) and P[X=Y]=pq+(1-p)(1-q). These should match the percentages you found by Monte Carlo simulation reasonably closely. You may like to try a chi-square test to see whether the observations are close enough to the predictions.
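A sketch of the 2x2 calculation, plus the suggested chi-square check against the counts from the question. The helper names are hypothetical; the test is Pearson's goodness-of-fit statistic over the three outcomes (X>Y, X=Y, X<Y), which has 2 degrees of freedom, and for 2 degrees of freedom the chi-square tail probability is exactly exp(-stat/2), so no table or library is needed.

```python
import math

def bernoulli_comparison(p, q):
    """Exact outcome probabilities for independent X ~ Bernoulli(p), Y ~ Bernoulli(q)."""
    gt = p * (1 - q)                  # X=1, Y=0
    eq = p * q + (1 - p) * (1 - q)    # both 1 or both 0
    lt = (1 - p) * q                  # X=0, Y=1
    equity = gt + eq / 2              # ties split evenly
    return gt, eq, lt, equity

def chi_square_df2(observed, probs):
    """Pearson chi-square statistic for 3 categories (df = 2) and its p-value.
    For df = 2 the chi-square survival function is exp(-x/2)."""
    total = sum(observed)
    stat = sum((o - total * pr) ** 2 / (total * pr) for o, pr in zip(observed, probs))
    return stat, math.exp(-stat / 2)

# Observed counts (X>Y, X=Y, X<Y) from the Excel runs in the question.
runs = {(0.8, 0.6): (3173, 5642, 1185), (0.7, 0.4): (4204, 4610, 1186)}
for (p, q), counts in runs.items():
    gt, eq, lt, equity = bernoulli_comparison(p, q)
    stat, p_value = chi_square_df2(counts, (gt, eq, lt))
    print(f"p={p} q={q}: X>Y {gt:.2f}  X=Y {eq:.2f}  equity {equity:.2f}  "
          f"chi2 {stat:.3f}  p-value {p_value:.3f}")
```

A large p-value (well above 0.05 here) means the simulated counts are consistent with the 2x2 predictions.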
 
Thank you very much guys :)

bpet, yes, those numbers seem to match the simulations quite precisely. Simpler math than I expected :) Here's what those formulas give:
Code:
P	Q	X>Y	X=Y	Equity
0.8	0.6	0.32	0.56	0.6
0.7	0.4	0.42	0.46	0.65
That's almost exactly the simulation numbers.
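The Equity column follows a simple pattern. As a short derivation from bpet's formulas (not stated in the thread), splitting ties evenly and using P[X>Y]+P[X=Y]+P[X<Y]=1 gives

E=P[X>Y]+\tfrac{1}{2}P[X=Y]=\tfrac{1}{2}+\tfrac{1}{2}\left(P[X>Y]-P[X<Y]\right)

and for independent Bernoulli variables P[X>Y]-P[X<Y]=p(1-q)-(1-p)q=p-q, so

E=\frac{1+p-q}{2}

For example, p=0.8, q=0.6 gives (1+0.2)/2=0.6, and p=0.7, q=0.4 gives (1+0.3)/2=0.65, matching the table.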

I'm having trouble with micromass's formula though - perhaps I'm doing something wrong? Since n=m=1, here's what I get for the numerator [-p-q] and the denominator [sqrt(p*(1-p)+q*(1-q))], the Z score, and the probability of being higher than that Z score:
Code:
P	Q	Num	Den	Z	Probability
0.8	0.6	-1.4	.6325	-2.2136	.9866
0.7	0.4	-1.1	.6708	-1.6398	.9495
0.6	0.8	-1.4	.6325	-2.2136	.9866
Perhaps I'm misapplying the formula, because as I read it, P less than Q gives the same result as P greater than Q. Shouldn't X-Y be distributed with a mean of np-mq rather than np+mq? And shouldn't the numerator be -(np-mq) rather than -np-mq? Using that numerator gives me:
Code:
P	Q	Num	Den	Z	Probability
0.8	0.6	-0.2	.6325	-.3162	.6241
0.7	0.4	-0.3	.6708	-.4472	.6726
0.6	0.8	0.2	.6325	.3162	.3759
These numbers make more sense to me, although I think they're a bit less accurate relative to the simulation.
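The corrected table can be reproduced with a few lines of Python. This is a sketch: normal_approx_geq is a hypothetical helper name, it uses the corrected mean np-mq with no continuity correction, and the standard normal tail P{Z ≥ z} is computed as 0.5·erfc(z/√2) instead of being read from a table.

```python
import math

def normal_approx_geq(p, q, n=1, m=1):
    """CLT approximation to P{X - Y >= 0} for independent X ~ B(n, p), Y ~ B(m, q),
    using mean np - mq and variance np(1-p) + mq(1-q), no continuity correction."""
    num = -(n * p - m * q)
    den = math.sqrt(n * p * (1 - p) + m * q * (1 - q))
    z = num / den
    # P{Z >= z} for a standard normal Z, via the complementary error function.
    prob = 0.5 * math.erfc(z / math.sqrt(2))
    return num, den, z, prob

print("P\tQ\tNum\tDen\tZ\tProbability")
for p, q in [(0.8, 0.6), (0.7, 0.4), (0.6, 0.8)]:
    num, den, z, prob = normal_approx_geq(p, q)
    print(f"{p}\t{q}\t{num:.1f}\t{den:.4f}\t{z:.4f}\t{prob:.4f}")
```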
 
I'm sorry SirTristan, you are correct! The numerator should indeed be -(np-mq).

Also, the formula I gave you will only approximate the real probability for large n and m. If you pick n=m=1, then this will be highly inaccurate, as your example shows!

Maybe you could try the same thing for n,m>20 or so; you'll see that the formula approximates your simulation quite closely!
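One way to try this without simulating is to compute the exact probabilities by summing the joint pmf of the two independent binomials and compare them with the corrected normal approximation. This is a sketch with hypothetical helper names. It also adds a continuity correction, which the thread does not mention: since X-Y only takes integer values, cutting the normal at -1/2 targets P{X≥Y}, while the uncorrected cut at 0 falls roughly halfway through the tie mass and so tends to track the tie-split equity instead.

```python
import math

def binom_pmf(k, n, p):
    """P{B(n, p) = k}."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_outcomes(n, p, m, q):
    """Exact P{X>Y} and P{X=Y} for independent X ~ B(n,p), Y ~ B(m,q)."""
    px = [binom_pmf(k, n, p) for k in range(n + 1)]
    py = [binom_pmf(k, m, q) for k in range(m + 1)]
    gt = sum(px[x] * py[y] for x in range(n + 1) for y in range(m + 1) if x > y)
    eq = sum(px[k] * py[k] for k in range(min(n, m) + 1))
    return gt, eq

def normal_tail(cut, n, p, m, q):
    """Normal approximation to P{X - Y >= cut} with mean np - mq."""
    mean = n * p - m * q
    sd = math.sqrt(n * p * (1 - p) + m * q * (1 - q))
    return 0.5 * math.erfc((cut - mean) / (sd * math.sqrt(2)))

for n, m, p, q in [(1, 1, 0.8, 0.6), (25, 25, 0.8, 0.6), (50, 50, 0.7, 0.4)]:
    gt, eq = exact_outcomes(n, p, m, q)
    print(f"n={n} m={m} p={p} q={q}")
    print(f"  exact P{{X>=Y}} {gt + eq:.4f}  approx (cut -0.5) {normal_tail(-0.5, n, p, m, q):.4f}")
    print(f"  exact equity  {gt + eq / 2:.4f}  approx (cut 0)    {normal_tail(0.0, n, p, m, q):.4f}")
```

For n=m=1 the exact P{X≥Y} is 0.88 while the uncorrected approximation gives about 0.62, which is the inaccuracy noted above; for n=m=25 the approximations should agree with the exact values to within a percentage point or so.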
 