Neat Workable Formula for Probability of Co-Occurrence in a Fixed Group

Ahmed Abdullah · May 19, 2015

Of N people in total, m are good at mathematics and c are good at computer science. What is the expected number of people good at both mathematics and computer science? Or, what is the probability that exactly r people are good at both mathematics and computer science?

I have derived a formula, but it contains N!, m!, c!, etc., which are difficult to compute for large values (my real problem has large values for all of these). I am looking for a neat, workable formula; I am hoping one exists, since this is such a basic problem.

micromass · May 19, 2015

You don't have enough information to find this.

mathman · May 19, 2015

It very much depends on how being good at one subject depends on being good at the other. For example, if c < m but everyone good at computer science is also good at mathematics, then exactly c people are good at both, so P = 1 for r = c and P = 0 otherwise.

Ahmed Abdullah · May 19, 2015

What if being good at math and being good at computer science are independent of each other?

Adel Makram · May 20, 2015

Ahmed Abdullah said:

Of N people in total, m are good at mathematics and c are good at computer science. What is the expected number of people good at both mathematics and computer science? Or, what is the probability that exactly r people are good at both mathematics and computer science?

I have derived a formula, but it contains N!, m!, c!, etc., which are difficult to compute for large values (my real problem has large values for all of these). I am looking for a neat, workable formula; I am hoping one exists, since this is such a basic problem.

That depends on whether performance at math and performance at computer science (CS) are dependent or not.

In your example, the probability that a randomly chosen person is good at math is m/N, and the probability that they are good at CS is c/N. Call the first P(math) and the second P(CS). Then the probability of being good at both math and computer science is P(math, CS) = P(math) P(CS | math), where P(CS | math) is the conditional probability of being good at CS given that the person is good at math. Once P(math, CS) is calculated, the expected number of people good at both math and CS is N · P(math, CS).
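
As a worked instance of this chain rule, take the extreme case raised earlier in the thread, where c < m and everyone good at CS is also good at math. The value P(CS | math) = c/m below is an assumption specific to that case, not something the problem statement fixes in general.

```latex
\begin{align*}
P(\text{math}) &= \frac{m}{N}, \qquad P(\text{CS} \mid \text{math}) = \frac{c}{m}, \\
P(\text{math}, \text{CS}) &= \frac{m}{N} \cdot \frac{c}{m} = \frac{c}{N}, \\
N \cdot P(\text{math}, \text{CS}) &= c .
\end{align*}
```

So in that case the expected number of people good at both is c, which matches the fact that all c CS people are also math people.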

Adel Makram · May 20, 2015

Ahmed Abdullah said:

What if being good at math and being good at computer science are independent of each other?

In the case of independence, the problem becomes straightforward: P(math, CS) = P(math) P(CS).
The expected number of people good at both math and CS becomes N [P(math) P(CS)].
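
A minimal sketch of that calculation, assuming independence as stated. The function name `expected_both_independent` is made up for illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def expected_both_independent(N: int, m: int, c: int) -> Fraction:
    """Expected number of people good at both, ASSUMING independence."""
    p_math = Fraction(m, N)   # P(math)
    p_cs = Fraction(c, N)     # P(CS)
    return N * p_math * p_cs  # N * P(math) * P(CS) = m*c/N


print(expected_both_independent(4, 2, 2))   # 1
print(expected_both_independent(10, 3, 5))  # 3/2
```

The result reduces to m·c/N, so no factorials are needed for the expected value.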

Adel Makram · May 20, 2015

Ahmed Abdullah said:

Or, what is the probability that exactly r people are good at both mathematics and computer science?

Once we obtain the probability of being good at both math and CS, P(math, CS), we can calculate the probability of having r people who are good at both. This is a binomial probability distribution, with P(math, CS) as the success probability; call it u. So P(r) = C(N, r) u^r (1-u)^(N-r), where C(N, r) is the binomial coefficient (the number of combinations of r out of N), equal to N!/(r!(N-r)!).
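
For concreteness, here is a small sketch of the binomial formula as written, using Python's `math.comb`. The helper name `binomial_pmf` is hypothetical. Note that this model treats each of the N people as an independent trial, so it does not enforce fixed totals of m math people and c CS people.

```python
from math import comb


def binomial_pmf(r: int, N: int, u: float) -> float:
    """P(r) = C(N, r) * u^r * (1 - u)^(N - r)."""
    return comb(N, r) * u**r * (1 - u) ** (N - r)


# Example values (assumed for illustration): N = 4, m = 2, c = 2,
# so under independence u = (2/4) * (2/4) = 0.25.
u = (2 / 4) * (2 / 4)
for r in range(5):
    print(r, binomial_pmf(r, 4, u))
```

With these numbers the formula assigns probability 0.046875 to r = 3, even though only 2 people are good at each subject.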

Ahmed Abdullah · May 20, 2015

Adel Makram said:

Once we obtain the probability of being good at both math and CS, P(math, CS), we can calculate the probability of having r people who are good at both. This is a binomial probability distribution, with P(math, CS) as the success probability; call it u. So P(r) = C(N, r) u^r (1-u)^(N-r), where C(N, r) is the binomial coefficient (the number of combinations of r out of N), equal to N!/(r!(N-r)!).

Suppose we have 4 people, with 2 good at math and 2 good at computer science. According to your formula, we get a nonzero probability for r = 3.

Ahmed Abdullah · May 20, 2015

The formula I have derived is P(r) = C(N, r) * C(N-r, m-r) * C(N-m, c-r) / (C(N, m) * C(N, c))
N = total number of people
m = number of people good at math
c = number of people good at computer science
r = number of people good at both

I am assuming it is correct, but it involves large N, m and c. I guess I have to use Stirling's approximation. Hoping for some direction.
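
One possible direction, sketched under stated assumptions: read the last factor of the formula as C(N, c), and assume the m math people and the c CS people are each a uniformly random subset of the N people, chosen independently of each other. Since C(N, r) C(N-r, m-r) = C(N, m) C(m, r), the formula then simplifies to C(m, r) C(N-m, c-r) / C(N, c), which is the hypergeometric distribution. Working with log-gamma instead of factorials keeps the numbers manageable for large inputs. The names `log_comb` and `overlap_pmf` are made up for this sketch.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via log-gamma, so no huge factorial is ever formed."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def overlap_pmf(r: int, N: int, m: int, c: int) -> float:
    """P(exactly r people good at both) under the random-subset assumption."""
    # r must satisfy max(0, m + c - N) <= r <= min(m, c); otherwise P = 0.
    if r < max(0, m + c - N) or r > min(m, c):
        return 0.0
    # Simplified form: C(m, r) * C(N - m, c - r) / C(N, c)
    log_p = log_comb(m, r) + log_comb(N - m, c - r) - log_comb(N, c)
    return exp(log_p)


print(overlap_pmf(3, 4, 2, 2))  # 0.0, r = 3 is impossible here
print(overlap_pmf(1, 4, 2, 2))  # about 0.667
print(overlap_pmf(15, 10**6, 3000, 5000))  # large inputs, no overflow
```

Results are floating-point approximations; for very large N the absolute error in the log terms grows, so treat the last digits with care.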

Adel Makram · May 20, 2015

Ahmed Abdullah said:

Suppose we have 4 people, with 2 good at math and 2 good at computer science. According to your formula, we get a nonzero probability for r = 3.

The binomial formula treats the possible outcomes for people who are good at both math and CS as a complete probability space. In other words, if N` represents the number of people who are good at both, then N` is the number that should appear in my formula, not the original N, which is the total number of people. N` is then calculated from the formula in the last line of post #5.

Adel Makram · May 20, 2015

Ahmed Abdullah said:

Suppose we have 4 people, with 2 good at math and 2 good at computer science. According to your formula, we get a nonzero probability for r = 3.

This is not correct, because in this example we cannot have more than 2 people who are good at both math and CS.

Ahmed Abdullah · May 20, 2015

Actually, I am interested in cases where resources are limited (4 people, 2 math guys, 2 computer guys in the example).

mathman · May 20, 2015

Ahmed Abdullah said:

What if being good at math and being good at computer science are independent of each other?

If independent, the probability of both is simply the product of the probabilities of each. The expected number is then the probability times the size of the population - result [itex]\frac{mc}{N}[/itex].

Ahmed Abdullah · May 20, 2015

mathman said:

If independent, the probability of both is simply the product of the probabilities of each. The expected number is then the probability times the size of the population - result [itex]\frac{mc}{N}[/itex].

I am interested in cases where the people are fixed. For example, if we have 4 people, 2 of whom are good at computer science and 2 at math, then it is impossible to have 3 people good at both. That is different from your approach.

The formula I have derived is P(r) = C(N, r) * C(N-r, m-r) * C(N-m, c-r) / (C(N, m) * C(N, c))
N = total number of people
m = number of people good at math
c = number of people good at computer science
r = number of people good at both

I am not sure whether the expected number is the same in your case and mine, but the scenario is different. I am wondering whether there is a standard formula for this particular kind of problem.

mathman · May 21, 2015

I don't understand what you are looking for, but you seem to be overthinking. The formula I gave has an answer of 1 for good at both. Half the people are good at math and half at computer science. Independence leads to 1/4 good at both. The mean number is (1/4)(4) = 1. If you are looking for the distribution function, that is another question.

FactChecker · May 22, 2015

Ahmed Abdullah said:

I am interested in cases where the people are fixed. For example, if we have 4 people, 2 of whom are good at computer science and 2 at math, then it is impossible to have 3 people good at both. That is different from your approach.

People being fixed does not matter. Their answer for the expected value, c*m/N, is correct. The expected value will always be within the range of possible results, so you do not have to worry about "3 people good at both computer and math". In your example, you would expect 2*2/4 = 1 person to be good at both math and computers, and you must admit that that is correct. However, the expected value need not be an integer, even though a non-integer number of people can never actually occur.
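
A quick numerical check of this claim, as a sketch. It assumes the fixed-group formula from earlier in the thread with its last factor read as C(N, c), and it assumes the math and CS groups are independent uniformly random subsets. The function names are made up for illustration.

```python
from fractions import Fraction
from math import comb


def fixed_group_pmf(r: int, N: int, m: int, c: int) -> Fraction:
    """C(N,r) C(N-r,m-r) C(N-m,c-r) / (C(N,m) C(N,c)), computed exactly."""
    numerator = comb(N, r) * comb(N - r, m - r) * comb(N - m, c - r)
    return Fraction(numerator, comb(N, m) * comb(N, c))


def expected_overlap(N: int, m: int, c: int) -> Fraction:
    """Mean of r over its feasible range max(0, m+c-N) .. min(m, c)."""
    lo, hi = max(0, m + c - N), min(m, c)
    return sum(r * fixed_group_pmf(r, N, m, c) for r in range(lo, hi + 1))


print(expected_overlap(4, 2, 2))   # 1
print(expected_overlap(10, 3, 5))  # 3/2
```

In both cases the mean equals m*c/N, and the second case shows a non-integer expected value.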
