Minimal Values of Repeated Numbers for Desired Probability?

AI Thread Summary
The discussion focuses on determining the minimal values of repeated numbers (b1, b2, ... , bk) such that the probability of selecting q identical numbers from a total of ∑r=1k br numbers equals a specified probability p. The challenge arises from having too many variables relative to the equations available, leading to the need for a more formal solution beyond brute force methods. It is noted that for sampling without replacement, at least one of the b_i must be greater than or equal to q to achieve the desired probability, while this condition is not necessary for sampling with replacement. The conversation also touches on the use of hypergeometric probabilities and the inclusion-exclusion principle to compute the necessary probabilities for various scenarios. Overall, the problem is complex and requires careful consideration of probability theory to derive a solution.
Seanskahn
Messages
29
Reaction score
0

Homework Statement


  • Let 0≤p≤1.
  • Let there be k distinct numbers (they can be natural numbers) a1, a2, ... , ak, each repeating respectively b1, b2, ... , bk times.
  • Let q < ∑r=1k br
Determine the minimal values of b1 ... bk such that the probability of q numbers chosen out of ∑r=1k br numbers being exactly the same is p.

Example : consider 1,1,1,1,2 . The probability that 2 numbers chosen out of the 5 is exactly the same is 6/10

Homework Equations



The only ones I can come up with is
  • h = ∑r=1k br
and
  • p = ∑r=1k qCb_r where q < br / qCh
C stands for Combination , in contrast to permutation.

The Attempt at a Solution


[/B]
Technically this is not a homework assignment, but needed for a project where a software needs to generate pseudo-random numbers, with some properties.

I notice that we have too many variables and just two equations. But a minimal solution should be possible. I personally am stuck, but it looks like I am missing / forgetting some basic properties of probability - with which this problem can be easily solved.. I could apply brute force. The example was generated via brute force. But I am sure a more formal solution is available.

This is why I classify this as precalculus homework..

Please help. My background is Physics MSc - so don't hesitate to apply more advanced notation on the solution than precalculus, if needed,.
 
Physics news on Phys.org
It seems as though the probability will always be a rational number. If p is irrational, the probability can never equal p. If you assume that p is rational, then maybe you can use its rational representation, p = m/n, to figure something out.
 
sean_s said:

Homework Statement


  • Let 0≤p≤1.
  • Let there be k distinct numbers (they can be natural numbers) a1, a2, ... , ak, each repeating respectively b1, b2, ... , bk times.
  • Let q < ∑r=1k br
Determine the minimal values of b1 ... bk such that the probability of q numbers chosen out of ∑r=1k br numbers being exactly the same is p.

Example : consider 1,1,1,1,2 . The probability that 2 numbers chosen out of the 5 is exactly the same is 6/10

Homework Equations



The only ones I can come up with is
  • h = ∑r=1k br
and
  • p = ∑r=1k qCb_r where q < br / qCh
C stands for Combination , in contrast to permutation.

The Attempt at a Solution


[/B]
Technically this is not a homework assignment, but needed for a project where a software needs to generate pseudo-random numbers, with some properties.

I notice that we have too many variables and just two equations. But a minimal solution should be possible. I personally am stuck, but it looks like I am missing / forgetting some basic properties of probability - with which this problem can be easily solved.. I could apply brute force. The example was generated via brute force. But I am sure a more formal solution is available.

This is why I classify this as precalculus homework..

Please help. My background is Physics MSc - so don't hesitate to apply more advanced notation on the solution than precalculus, if needed,.

For sampling without replacement (i.e., once a number is picked it becomes unavailable to be picked again) you need to have at least one of the ##b_i \geq q## in order to have any chance that you can get ##q## numbers the same. For sampling with replacement, this need not be the case, because with replacement you "put back" a number after it has been selected.

I get 6/10 in your 11112 example, but only if the sampling is without replacement.

There is a way of performing your calculations, but it is very computer-intensive. Let's say that all your ##b_i## are ##\geq q##. It was not stated clearly how many numbers you are choosing---you only said you wanted ##q## the same, so the number ##n## picked must be at least ##q##. So, we pick ##n## numbers without replacement. For each ##i = 1,2, \ldots, k##, let ##E_i## be the event that number ##i## is repeated exactly ##q## times in your sample of size ##n##. You want to know something like the probability that at least one of the events ##E_1, E_2, \ldots, E_k## occurs. However, this is still not sufficiently well-defined. Do you want just one of ##E_1, E_2, \ldots, E_k## to occur (so if ##q = 3## and you have 111, you cannot have 222 also, etc.), or do you allow several occurrences of ##q## identical numbers (so it would be OK to have both 111 and 222, etc.)? Each of these two cases can be treated using versions of the so-called inclusion-exclusion principle, but the final details are different in the two versions.

Let
$$\begin{array}{rcl}
S_1&=& \sum_{i=1}^j P(E_i) \\
S_2 &=& \sum_{i,j: \: i < j} P(E_i E_j) \\
S_3 &=&\sum_{i,j,l: \: i < j < l} P(E_i E_j E_l) \\
&\vdots& \\
S_k&=& P(E_1 E_2 \cdots E_n)
\end{array}
$$
(Here, ##P(AB)## means ##P(A\: \text{and}\: B)##, etc.)
For example, if ##k=4## we have
$$\begin{array}{rcl}
S_1 &=& P(E_1) + P(E_2) + P(E_3) + P(E_4)\\
S_2&=&P(E_1 E_2) + P(E_1 E_3) + P(E_1 E_4) + P(E_2 E_3) + \cdots + P(E_3 E_4) \\
S_3 &=& P(E_1 E_2 E_3) + P(E_1 E_2 E_4) + \cdots + P(E_2 E_3 E_4)\\
S_4 &=& P(E_1 E_2 E_3 E_4)
\end{array}
$$

If we let ##E_{{}\geq 1}## denote the event that at least one of the ##E_i## occur and ##E_{=1}## the event that exactly one of the ##E_i## occur, we have:
$$P(E_{\geq 1}) = S_1 - S_2 + S_3 - \cdots \pm S_k $$
and
$$P(E_{=1}) = S_1 - 2 S_2 + 3 S_3 - \cdots \pm k S_k$$

OK, so how do we compute the ##S_1, S_2, \ldots, S_k##?

To compute ##P(E_1)## we categorize the outcomes as "1" and "not 1". The probability of getting exactly ##q## 1s in ##n## picks is given by the hypergeometric probability
$$P(E_1) = \frac{C(b_1,q) C(B_1,n-q)}{C(N,n)}$$
Here, ##N = b_1 + b_2 + \cdots + b_k##, ##B_i = N - b_i## for each ##i##, and ##C(a,b)## is the binomial coefficient ##{}^aC_b##. Similarly for the other ##P(E_i)##, so we have
$$S_1 = \sum_{i=1}^k \frac{C(b_i,q)C(B_i,n-q)}{C(N,n)} $$
To compute ##P(E_1 E_2)##, categorize the outcomes into three classes, "1", "2" and "neither 1 nor 2", which contain ##b_1, b_2## and ##B_{12} = N - b_1 - b_2## members. Now we have a 3-class hypergeometric distribution, which is less familiar than the classic hypergeometric, but is still easy enough to get (from conditional probabilities, for example):
$$P(E_1 E_2 E_3) = \frac{C(b_1,q) C(b_2,q) C(B_{12}, n - 2q)}{C(N,n)},$$
etc.
So, in principle we can compute ##S_2##.

In a similar way we have
$$P(E_i E_j E_l) = \frac{C(b_i,q) C(b_j,q) C(b_l,q) C(N-b_i-b_k-b_l, n-3q)}{C(N,n)}, $$
etc.

Of course, in any practical example, some of the ##P(E_i E_j E_l \cdots E_r)## may be zero, just because ##qr > n## for example.

So, it is all computable in principle, but it will not be pleasant. Best of luck.
 
Last edited:
I picked up this problem from the Schaum's series book titled "College Mathematics" by Ayres/Schmidt. It is a solved problem in the book. But what surprised me was that the solution to this problem was given in one line without any explanation. I could, therefore, not understand how the given one-line solution was reached. The one-line solution in the book says: The equation is ##x \cos{\omega} +y \sin{\omega} - 5 = 0##, ##\omega## being the parameter. From my side, the only thing I could...
Back
Top