Neat Workable Formula for Probability of Co-Occurrence in a Fixed Group

  • Thread starter Ahmed Abdullah
  • Start date
  • Tags
    Probability
In summary: the formula derived in the thread for the probability that exactly r people are good at both math and computer science is P(r) = C(N,r) * C(N-r, m-r) * C(N-m, c-r) / (C(N,m) * C(N,c)), where
N = total number of people
m = number of people good at math
c = number of people good at computers
r = number of people good at both
  • #1
Ahmed Abdullah
Of N people in total, m are good at mathematics and c are good at computer science. What is the expected number of people good at both mathematics and computer science? Or, what is the probability that exactly r people are good at both?

I have derived a formula, but it contains n!, p!, c!, etc., which are difficult to compute for large values (my real problem has large values for all of these). I am looking for a neat, workable formula. I am hoping one exists, since this is such a basic problem.
 
  • #2
You don't have enough information to find this.
 
  • #3
It very much depends on how being good at one subject depends on being good at the other. For example, if c < m and everyone who is good at computer science is also good at math, then exactly c people are good at both, so P(r) = 1 for r = c and P(r) = 0 otherwise.
 
  • #4
What if being good at math and being good at computer science are independent of each other?
 
  • #5
Ahmed Abdullah said:
Of N people in total, m are good at mathematics and c are good at computer science. What is the expected number of people good at both mathematics and computer science? Or, what is the probability that exactly r people are good at both?

I have derived a formula, but it contains n!, p!, c!, etc., which are difficult to compute for large values (my real problem has large values for all of these). I am looking for a neat, workable formula. I am hoping one exists, since this is such a basic problem.
That depends on whether performance in math and performance in computer science (CS) are dependent or not.

In your example, the average probability of doing well in math is m/N, and in CS it is c/N. Call the first P(math) and the second P(CS). Then the probability of doing well in both math and CS is P(math, CS) = P(math) P(CS | math), where P(CS | math) is the conditional probability of doing well in CS given doing well in math. Once P(math, CS) is calculated, the expected number of people doing well in both math and CS is N · P(math, CS).
 
  • #6
Ahmed Abdullah said:
What if being good at math and being good at computer science are independent of each other?
In the case of independence, the problem becomes straightforward: P(math, CS) = P(math) P(CS).
The expected number of people good at both math and CS becomes N [P(math) P(CS)].
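
A minimal sketch of this calculation, assuming independence as stated in this post. The function name `expected_both` is made up for illustration, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def expected_both(N, m, c):
    """Expected number of people good at both, assuming independence.

    Computes N * P(math) * P(CS) with P(math) = m/N and P(CS) = c/N.
    """
    p_math = Fraction(m, N)
    p_cs = Fraction(c, N)
    return N * p_math * p_cs

# Example from later in the thread: 4 people, 2 good at math, 2 good at computers.
print(expected_both(4, 2, 2))  # 1
```

The result equals m*c/N, so it need not be an integer for other inputs.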
 
  • #7
Ahmed Abdullah said:
Or, what is the probability that exactly r people are good at both mathematics and computer science?
Once we have the probability of being good at both math and CS, P(math, CS), we can calculate the probability of having r people who are good at both. This is a binomial probability distribution, with P(math, CS) as the success rate; call it u. Then [itex]P(r) = C_{N,r}\, u^r (1-u)^{N-r}[/itex], where [itex]C_{N,r}[/itex] is the binomial coefficient (the number of combinations of r out of N), equal to [itex]\frac{N!}{r!\,(N-r)!}[/itex].
 
  • #8
Adel Makram said:
Once we have the probability of being good at both math and CS, P(math, CS), we can calculate the probability of having r people who are good at both. This is a binomial probability distribution, with P(math, CS) as the success rate; call it u. Then [itex]P(r) = C_{N,r}\, u^r (1-u)^{N-r}[/itex], where [itex]C_{N,r}[/itex] is the binomial coefficient (the number of combinations of r out of N), equal to [itex]\frac{N!}{r!\,(N-r)!}[/itex].

Suppose we have 4 people, with 2 good at math and 2 good at computers. According to your formula, we get a nonzero probability for r = 3.
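
As a quick check of this objection, here is a small sketch that evaluates the binomial formula for N = 4 and r = 3. It assumes the success rate u is taken from the independence case, u = (2/4)(2/4) = 1/4; the function name `binomial_pmf` is illustrative only.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8+

def binomial_pmf(N, r, u):
    """Binomial probability C(N, r) * u^r * (1 - u)^(N - r)."""
    return comb(N, r) * u**r * (1 - u)**(N - r)

# Assumed success rate under independence: P(math) * P(CS) = (2/4) * (2/4).
u = Fraction(2, 4) * Fraction(2, 4)

# With only 2 people good at each subject, r = 3 is impossible,
# yet the binomial model still assigns it positive probability.
print(binomial_pmf(4, 3, u))  # 3/64
```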
 
  • #9
The formula I have derived is P(r) = C(N,r) * C(N-r, m-r) * C(N-m, c-r) / (C(N,m) * C(N,c))
N = total number of people
m = number of people good at math
c = number of people good at computers
r = number of people good at both

I am assuming it is correct, but in my case N, m and c are all large. I guess I have to use the Stirling approximation. Hoping for some direction.
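
One possible way to handle large values without forming any factorial is to work with logarithms of the binomial coefficients. The sketch below is not from the thread: it takes the second factor of the denominator to be C(N, c), uses Python's standard `math.lgamma` (log-gamma) in place of an explicit Stirling approximation, and the names `log_comb` and `prob_both` are made up for illustration.

```python
from math import lgamma, exp

def log_comb(n, k):
    """Natural log of C(n, k) via log-gamma; -inf when the coefficient is zero."""
    if k < 0 or k > n:
        return float("-inf")
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def prob_both(N, m, c, r):
    """P(exactly r people good at both), evaluated in log space.

    Formula: C(N,r) * C(N-r,m-r) * C(N-m,c-r) / (C(N,m) * C(N,c)).
    """
    terms = [log_comb(N, r), log_comb(N - r, m - r), log_comb(N - m, c - r)]
    if float("-inf") in terms:
        return 0.0  # r is not a feasible overlap for these N, m, c
    return exp(sum(terms) - log_comb(N, m) - log_comb(N, c))

# Small case that can be checked by hand, then a large case.
print(prob_both(4, 2, 2, 1))
print(prob_both(1_000_000, 300_000, 200_000, 60_000))
```

Summing logarithms keeps every intermediate value small, so the only exponentiation happens at the end, on a number that is at most 0.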
 
  • #10
Ahmed Abdullah said:
Suppose we have 4 people, with 2 good at math and 2 good at computers. According to your formula, we get a nonzero probability for r = 3.
The binomial formula treats the outcomes for the people who are good at both math and CS as a complete probability space. In other words, if N' denotes the number of people who are good at both, then N' is the number that should appear in my formula, not the original N, which is the total number of people. N' is then calculated from the formula in the last line of post #5.
 
  • #11
Ahmed Abdullah said:
Suppose we have 4 people, with 2 good at math and 2 good at computers. According to your formula, we get a nonzero probability for r = 3.
This is not correct, because in this example we cannot have more than 2 people who are good at both math and CS.
 
  • #12
Actually, I am interested in cases where resources are limited (4 people, 2 good at math and 2 good at computers, in the example).
 
  • #13
Ahmed Abdullah said:
What if being good at math and being good at computer science are independent of each other?
If independent, the probability of both is simply the product of the probabilities of each. The expected number is then the probability times the size of the population - result [itex]\frac{mc}{N}[/itex].
 
  • #14
mathman said:
If independent, the probability of both is simply the product of the probabilities of each. The expected number is then the probability times the size of the population - result [itex]\frac{mc}{N}[/itex].
I am interested in cases where the people are fixed. For example, if we have 4 people, 2 good at computers and 2 good at math, then it is impossible to have 3 people good at both. That is different from your approach.

The formula I have derived is P(r) = C(N,r) * C(N-r, m-r) * C(N-m, c-r) / (C(N,m) * C(N,c))
N = total number of people
m = number of people good at math
c = number of people good at computers
r = number of people good at both

I am not sure whether the expected number is the same in your case and mine, but the scenarios are different. I am wondering whether there is a standard formula for this particular kind of problem.
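
For reference, a short derivation sketch. It assumes the formula counts equally likely choices of which m people are good at math and which c are good at computers, and it reads the second factor of the denominator as C(N, c). Under those assumptions the formula reduces to the standard hypergeometric distribution, whose mean is mc/N.

```latex
\binom{N}{r}\binom{N-r}{m-r}
  = \frac{N!}{r!\,(m-r)!\,(N-m)!}
  = \binom{N}{m}\binom{m}{r}
\quad\Longrightarrow\quad
P(r) = \frac{\binom{N}{r}\binom{N-r}{m-r}\binom{N-m}{c-r}}{\binom{N}{m}\binom{N}{c}}
     = \frac{\binom{m}{r}\binom{N-m}{c-r}}{\binom{N}{c}},
\qquad
\mathbb{E}[r] = \frac{mc}{N}.
```

The simplified form has three binomial coefficients instead of five, and P(r) is zero outside max(0, m + c - N) ≤ r ≤ min(m, c).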
 
  • #15
I don't understand what you are looking for, but you seem to be overthinking this. For your example, the formula I gave yields an expected value of 1 person good at both. Half the people are good at math and half at computers. Independence leads to 1/4 being good at both, so the mean number is (1/4)(4) = 1. If you are looking for the distribution function, that is another question.
 
  • #16
Ahmed Abdullah said:
I am interested in cases where the people are fixed. For example, if we have 4 people, 2 good at computers and 2 good at math, then it is impossible to have 3 people good at both. That is different from your approach.
People being fixed does not matter. The answer given for the expected value, c*m/N, is correct. The expected value will always lie within the range of possible results, so you do not have to worry about "3 people good at both computer and math". In your example, you would expect 2*2/4 = 1 person to be good at both math and computers, and you must admit that this is correct. However, the expected value need not be an integer, in which case it is a value that can never actually occur as a count of people.
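
To illustrate this point, the sketch below enumerates the exact distribution for a fixed group and compares its mean with c*m/N. It uses the hypergeometric form C(m,r) * C(N-m,c-r) / C(N,c), which is algebraically equal to the formula quoted in the thread when the second denominator factor is read as C(N, c); the function name `exact_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb  # available in Python 3.8+

def exact_distribution(N, m, c):
    """Map each feasible r to P(exactly r people are good at both)."""
    total = comb(N, c)
    lo, hi = max(0, m + c - N), min(m, c)  # feasible range of the overlap
    return {r: Fraction(comb(m, r) * comb(N - m, c - r), total)
            for r in range(lo, hi + 1)}

dist = exact_distribution(4, 2, 2)
mean = sum(r * p for r, p in dist.items())

print(dist)   # r = 3 does not appear: it is outside the feasible range
print(mean)   # 1
```

For inputs such as N = 10, m = 3, c = 5 the same code gives a mean of 3/2, a non-integer value that no single outcome can equal.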
 

What is the definition of probability of co-occurrence?

The probability of co-occurrence refers to the likelihood of two or more events happening together or in relation to each other. It is a measure of how often two events or variables occur in conjunction with each other.

How is the probability of co-occurrence calculated?

The probability of co-occurrence can be estimated from data by dividing the number of times the events occur together by the total number of trials or observations. It can also be calculated using conditional probability, which takes into account the probability of one event occurring given that another event has already occurred.

What is the difference between probability of co-occurrence and correlation?

The probability of co-occurrence measures the likelihood of two events happening together, while correlation measures the strength and direction of the relationship between two variables. Probability of co-occurrence is a simpler measure that only looks at whether two events occur together, while correlation takes into account the entire distribution of the variables.

How can the probability of co-occurrence be used in research?

The probability of co-occurrence can be used in research to identify patterns and relationships between variables. It can also be used to make predictions about the likelihood of future events based on past observations. Additionally, it can be used to test hypotheses and determine the strength of the relationship between variables.

What are some limitations of using probability of co-occurrence?

One limitation of using probability of co-occurrence is that it only measures the likelihood of two events occurring together and does not take into account the direction or strength of the relationship between the events. Additionally, it may not accurately represent the true relationship between variables if there are confounding variables or if the sample size is small.
