Probability of Binomial Variable ≥ Another Binomial Variable

Discussion Overview

The discussion revolves around the probability of one binomially distributed variable being greater than another, specifically when generated as paired events. Participants explore the implications of different probabilities (p) for these variables and examine both simulation results and theoretical approaches to calculate these probabilities.

Discussion Character

  • Exploratory
  • Mathematical reasoning
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant presents simulation results showing the frequency with which one binomial variable exceeds another based on different probabilities.
  • Another participant suggests using the Central Limit Theorem to approximate the distribution of the difference between the two binomial variables.
  • A third participant clarifies that the variables in question are Bernoulli variables and proposes a method to calculate the probabilities using a 2x2 table.
  • Further calculations are shared, comparing simulation results with theoretical predictions, indicating a close match.
  • One participant expresses confusion regarding the application of a formula for calculating probabilities, questioning the signs in the numerator and denominator.
  • Another participant acknowledges the confusion and confirms the need for adjustments in the formula, emphasizing that the approximation is less accurate for small sample sizes.

Areas of Agreement / Disagreement

Participants generally agree on the methods for calculating probabilities and the close alignment of theoretical predictions with simulation results. However, there is disagreement regarding the application of specific formulas, particularly in the context of small sample sizes, which remains unresolved.

Contextual Notes

Limitations include the dependence on sample size for the accuracy of the Central Limit Theorem approximation, as well as the need for clarification on the correct application of formulas in specific cases.

Who May Find This Useful

This discussion may be useful for those interested in probability theory, statistical methods, and the application of binomial distributions in simulations and theoretical contexts.

SirTristan
If two binomially distributed variables are generated as paired events, how often will the variable with p=X be greater than the variable with p=Y? Also what is the "equity" if ties are counted as .5 for each?

For instance in Excel I generated 10,000 numbers with p=.8 and 10,000 with p=.6. The first set of numbers was greater 3,173 times, they were equal 5,642 times, and the second set was greater 1,185 times. So p=.8 was greater than p=.6 31.73% of the time. Counting each tie as half, the total equity for the first set was (3173+5642/2)/10000=.5994.

Repeating this for p=.7 and p=.4, the first was greater 4,204 times, they were equal 4,610 times, and the second set was greater 1,186 times. p=.7 was greater than p=.4 42.04% of the time, and the "equity" for the first variable was (4204+4610/2)/10000=.6509.
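
The Excel experiment described above can be sketched in Python. This is a minimal sketch, assuming the two variables are independent Bernoulli draws (one success/failure per pair); the function name `simulate_pairs` and the `seed` parameter are illustrative and not from the original post.

```python
import random

def simulate_pairs(p, q, trials=10_000, seed=0):
    """Draw `trials` independent Bernoulli pairs (X, Y) and tally the outcomes.

    Assumption: X and Y are independent, with P[X=1]=p and P[Y=1]=q.
    Returns (count X>Y, count X==Y, count X<Y, equity for X).
    """
    rng = random.Random(seed)
    greater = equal = less = 0
    for _ in range(trials):
        x = int(rng.random() < p)  # 1 with probability p, else 0
        y = int(rng.random() < q)  # 1 with probability q, else 0
        if x > y:
            greater += 1
        elif x == y:
            equal += 1
        else:
            less += 1
    # "Equity": a win counts 1, a tie counts 0.5, a loss counts 0.
    equity = (greater + equal / 2) / trials
    return greater, equal, less, equity

if __name__ == "__main__":
    for p, q in [(0.8, 0.6), (0.7, 0.4)]:
        print(p, q, simulate_pairs(p, q))
```

The counts vary with the seed, so they should only land near the Excel figures, not match them exactly.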
 
Hi SirTristan! :smile:

What you are looking for is

P\{X\geq Y\}=P\{X-Y\geq 0\}

Thus you must know the distribution of X-Y. Sadly, I do not know any nice formula for this. However, if X~B(n,p) and Y~B(m,q) and n and m are large, then we can apply the Central Limit Theorem.

Indeed, if n is large, then X~N(np,np(1-p)) and if m is large then Y~N(mq,mq(1-q)). Thus X-Y~N(np+mq,np(1-p)+mq(1-q)). Thus if Z is standard normal, then you need to calculate

P\left\{Z\geq \frac{-np-mq}{\sqrt{np(1-p)+mq(1-q)}}\right\}

which can be easily done by using some kind of table...
 
Looks like you mean Bernoulli variables (Binomial with n=1). For this case it's easy to set up a 2x2 table, e.g. with P[X=1]=p and P[Y=1]=q, and X and Y independent, you have P[X=0,Y=1]=(1-p)q etc. and thus P[X>Y]=p(1-q) and P[X=Y]=pq+(1-p)(1-q), which should match reasonably closely the percentages you found by Monte Carlo simulation. You may like to try a chi-square test to see if the observations are close enough to the predictions.
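
The 2x2 table can be written out directly. The following is a small sketch under the same independence assumption; the function name `bernoulli_comparison` is made up for illustration.

```python
def bernoulli_comparison(p, q):
    """Exact outcome probabilities for independent Bernoulli X (p) and Y (q).

    Each entry is one or two cells of the 2x2 table of (X, Y) outcomes.
    """
    p_greater = p * (1 - q)                # X=1, Y=0
    p_equal = p * q + (1 - p) * (1 - q)    # X=Y=1 or X=Y=0
    p_less = (1 - p) * q                   # X=0, Y=1
    # Ties count half for each side.
    equity = p_greater + p_equal / 2
    return p_greater, p_equal, p_less, equity

if __name__ == "__main__":
    for p, q in [(0.8, 0.6), (0.7, 0.4)]:
        print(p, q, bernoulli_comparison(p, q))
```

Expanding the equity expression gives (1 + p - q)/2, so under these assumptions the equity depends only on the difference p - q.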
 
Thank you very much guys :)

bpet, yes those numbers seem to match the simulations quite precisely. More simple math than I expected :) Here's what those formulas give:
Code:
P	Q	X>Y	X=Y	Equity
0.8	0.6	0.32	0.56	0.6
0.7	0.4	0.42	0.46	0.65
That's almost exactly the simulation numbers.

I'm having trouble with micromass's formula though - perhaps I'm doing something wrong? Since n=m=1, here's what I get for the numerator [-p-q] and the denominator [sqrt(p*(1-p)+q*(1-q))], the Z score, and the probability of being higher than that Z score:
Code:
P	Q	Num	Den	Z	Probability
0.8	0.6	-1.4	.6325	-2.2136	.9866
0.7	0.4	-1.1	.6708	-1.6398	.9495
0.6	0.8	-1.4	.6325	-2.2136	.9866
Perhaps I'm misapplying the formula, because as I read it, it gives the same result when P is less than Q as when P is higher than Q. Shouldn't X-Y be distributed with a mean of np-mq rather than np+mq, so that the numerator is -(np-mq) rather than -np-mq? Using that numerator gives me:
Code:
P	Q	Num	Den	Z	Probability
0.8	0.6	-0.2	.6325	-.3162	.6241
0.7	0.4	-0.3	.6708	-.4472	.6726
0.6	0.8	0.2	.6325	.3162	.3759
These numbers make more sense to me, although I think they're a bit less accurate relative to the simulation.
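
For anyone wanting to reproduce the table with the np-mq numerator, here is a minimal sketch using only the standard library. The function names are illustrative, and it assumes X and Y are independent and that neither variance term makes the denominator zero.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def clt_prob_x_ge_y(n, p, m, q):
    """Normal approximation to P(X >= Y) for X~B(n,p), Y~B(m,q).

    Uses mean np - mq and variance np(1-p) + mq(1-q) for X - Y.
    Returns (numerator, denominator, z, probability).
    """
    num = -(n * p - m * q)
    den = math.sqrt(n * p * (1 - p) + m * q * (1 - q))
    z = num / den
    return num, den, z, normal_upper_tail(z)

if __name__ == "__main__":
    for p, q in [(0.8, 0.6), (0.7, 0.4), (0.6, 0.8)]:
        print(p, q, clt_prob_x_ge_y(1, p, 1, q))
```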
 
I'm sorry SirTristan, you are correct! The numerator should indeed be -(np-mq).

Also, the formula I gave you will only approximate the real probability for large n and m. If you pick n=m=1, then this will be highly inaccurate, as your example shows!

Maybe you could try the same thing for n,m>20 or so, you'll see that the formula approximates your simulation quite closely!
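
One way to try this without a simulation is to compare the approximation against the exact value of P{X≥Y}, summed from the two binomial distributions. This is only a sketch, assuming X and Y are independent; the function names are illustrative, it needs Python 3.8+ for `math.comb`, and the approximation here uses no continuity correction.

```python
import math

def binom_pmf(k, n, p):
    """P(K = k) for K ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_prob_x_ge_y(n, p, m, q):
    """Exact P(X >= Y) for independent X~B(n,p), Y~B(m,q).

    Sums P(Y = y) * P(X >= y) over every possible value y of Y.
    """
    total = 0.0
    for y in range(m + 1):
        tail = sum(binom_pmf(x, n, p) for x in range(y, n + 1))
        total += binom_pmf(y, m, q) * tail
    return total

def approx_prob_x_ge_y(n, p, m, q):
    """Normal approximation with mean np - mq and variance np(1-p) + mq(1-q)."""
    z = -(n * p - m * q) / math.sqrt(n * p * (1 - p) + m * q * (1 - q))
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    for size in (1, 5, 30, 100):
        exact = exact_prob_x_ge_y(size, 0.8, size, 0.6)
        approx = approx_prob_x_ge_y(size, 0.8, size, 0.6)
        print(size, round(exact, 4), round(approx, 4))
```

Note that for n=m=1 the exact counterpart of P{X≥Y} is 1-(1-p)q, which includes all ties, so it is a different quantity from the "equity" that counts ties as half.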
 
