Question about determining probability

  • Thread starter: Mr.V.
  • Tags: Probability
SUMMARY

This discussion focuses on calculating the probability that two subsets drawn at random from the same larger dataset share a given number of elements. The user presents a scenario with 5000 unique elements, where two subsets of 100 elements each are chosen and 10 elements overlap. The original poster estimates the probability as (\frac{1}{50})^{10} * \frac{100!}{10!*90!}, approximately 0.00018. A reply outlines an exact counting approach using combinations: the number of ways to choose the overlapping elements, times the number of ways to choose the remaining elements from outside the first subset, divided by the total number of ways to choose the second subset.

PREREQUISITES
  • Understanding of basic probability theory
  • Familiarity with combinatorial mathematics, specifically combinations
  • Knowledge of probability notation and calculations
  • Experience with large datasets and sampling methods
NEXT STEPS
  • Study combinatorial probability calculations in depth
  • Learn about hypergeometric distribution for sampling without replacement
  • Explore advanced probability concepts such as conditional probability
  • Investigate statistical software tools for probability simulations, such as R or Python's SciPy library
USEFUL FOR

Statisticians, data scientists, mathematicians, and anyone interested in probability theory and combinatorial analysis.

Mr.V.
Hi!
I have a data set of ~5000 unique elements.
From that set I have 2 subsets that are not mutually exclusive. For example if the elements are letters from A-Z, the first set could be A, B, C, D, E, F, and G the second set could be E, F, G, H, I, and J.
Here's the question...
The first subset has 100 elements randomly chosen from the 5000. The second subset also has 100 elements randomly chosen from the 5000. Of interest is that 10 of the elements from subset 1 are also in subset 2.
What is the probability of that happening?
Here's my logic so far, though I'm not sure I'm right.
If we use subset 1 as the reference:
If subset 1 had only 1 element, the probability that this element is also in subset 2 is:
[tex]\frac{100}{5000}[/tex] or [tex]\frac{1}{50}[/tex]. If subset 1 had 2 elements, the probability that both are in subset 2 is: [tex]\frac{100}{5000} * \frac{99}{4999}[/tex] ... and if subset 1 had only 10 elements, the probability that all 10 are in subset 2 would be [tex]\frac{100!}{90!}*\frac{4990!}{5000!}[/tex] which is roughly [tex](\frac{1}{50})^{10} = 1.024*10^{-17}[/tex]
However, since I have a set of 100, I need to include the chance of getting that set of 10 in many different ways... I think I should use combinations, correct? So since I have a set of 100 and I want a subset of 10, there are [tex]\frac{100!}{10!*90!}[/tex] different ways of choosing that subset...
So is the correct answer...
[tex](\frac{1}{50})^{10} * \frac{100!}{10!*90!} = 0.00018[/tex]
?
Thanks for your help!
 
Hmm.

Let's say we have 2000 objects, and 20 are in a particular subset - call it set A - and we want to know what the odds are of picking a subset B of 20 elements at random so that exactly 2 of these elements are in A.

Clearly, there are 2000 choose 20 ways to choose B.
Now, if B is to have two elements from A, then there are 20 choose 2 ways to pick those two elements, and, since B will also have 18 elements from the remaining 1980 objects, there are 1980 choose 18 ways to pick those.

Thus, the probability of getting exactly 2 elements would be:
[tex]\frac{\binom{20}{2} \times \binom{1980}{18}}{\binom{2000}{20}}[/tex]
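
For anyone who wants to check the numbers, here is a minimal Python sketch of the counting argument above. The function name `overlap_probability` and its parameter names are made up for illustration and do not come from the thread. It evaluates the 2000-object example and then applies the same counting to the original question (5000 elements, two subsets of 100, exactly 10 shared), assuming each subset is drawn uniformly at random without replacement.

```python
from math import comb


def overlap_probability(population: int, size_a: int, size_b: int, k: int) -> float:
    """Probability that exactly k elements of B are also in A.

    Assumes A is a fixed subset of size_a elements and B is drawn
    uniformly at random, without replacement, from the population.
    """
    ways_shared = comb(size_a, k)                      # choose the k shared elements from A
    ways_rest = comb(population - size_a, size_b - k)  # choose the rest of B from outside A
    ways_total = comb(population, size_b)              # all possible choices of B
    return ways_shared * ways_rest / ways_total


# Example from the reply: 2000 objects, |A| = 20, |B| = 20, exactly 2 shared.
p_example = overlap_probability(2000, 20, 20, 2)

# Original question: 5000 objects, two subsets of 100, exactly 10 shared.
p_exact = overlap_probability(5000, 100, 100, 10)

# Estimate from the first post: (1/50)^10 * C(100, 10).
p_estimate = (1 / 50) ** 10 * comb(100, 10)

print(f"2000-object example:   {p_example:.6g}")
print(f"exact, 10 of 100:      {p_exact:.6g}")
print(f"first-post estimate:   {p_estimate:.6g}")
```

The exact value comes out smaller than the first-post estimate, because the estimate replaces each factor like 99/4999 with 1/50 and does not require the other 90 elements of subset 1 to be missing from subset 2.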
 
