Minimal Values of Repeated Numbers for Desired Probability?

Seanskahn · Jun 24, 2017

Homework Statement

Let 0≤p≤1.
Let there be k distinct numbers (they can be natural numbers) a₁, a₂, ... , a_k, each repeating respectively b₁, b₂, ... , b_k times.
Let q < ∑_r=1^k b_r

Determine the minimal values of b₁ ... b_k such that the probability of q numbers chosen out of ∑_r=1^k b_r numbers being exactly the same is p.

Example : consider 1,1,1,1,2 . The probability that 2 numbers chosen out of the 5 is exactly the same is 6/10

Homework Equations

The only ones I can come up with is

h = ∑_r=1^k b_r

and

p = ∑_r=1^k ^qC_{b_r} where q < b_r / ^qC_h

C stands for Combination , in contrast to permutation.

The Attempt at a Solution

[/B]
Technically this is not a homework assignment, but needed for a project where a software needs to generate pseudo-random numbers, with some properties.

I notice that we have too many variables and just two equations. But a minimal solution should be possible. I personally am stuck, but it looks like I am missing / forgetting some basic properties of probability - with which this problem can be easily solved.. I could apply brute force. The example was generated via brute force. But I am sure a more formal solution is available.

This is why I classify this as precalculus homework..

Please help. My background is Physics MSc - so don't hesitate to apply more advanced notation on the solution than precalculus, if needed,.

FactChecker · Jun 24, 2017

It seems as though the probability will always be a rational number. If p is irrational, the probability can never equal p. If you assume that p is rational, then maybe you can use its rational representation, p = m/n, to figure something out.

Ray Vickson · Jun 24, 2017

sean_s said:

Homework Statement

Let 0≤p≤1.

Let there be k distinct numbers (they can be natural numbers) a₁, a₂, ... , a_k, each repeating respectively b₁, b₂, ... , b_k times.

Let q < ∑_r=1^k b_r

Determine the minimal values of b₁ ... b_k such that the probability of q numbers chosen out of ∑_r=1^k b_r numbers being exactly the same is p.

Example : consider 1,1,1,1,2 . The probability that 2 numbers chosen out of the 5 is exactly the same is 6/10

Homework Equations

The only ones I can come up with is

h = ∑_r=1^k b_r

and

p = ∑_r=1^k ^qC_{b_r} where q < b_r / ^qC_h

C stands for Combination , in contrast to permutation.

The Attempt at a Solution

[/B]
Technically this is not a homework assignment, but needed for a project where a software needs to generate pseudo-random numbers, with some properties.

I notice that we have too many variables and just two equations. But a minimal solution should be possible. I personally am stuck, but it looks like I am missing / forgetting some basic properties of probability - with which this problem can be easily solved.. I could apply brute force. The example was generated via brute force. But I am sure a more formal solution is available.

This is why I classify this as precalculus homework..

Please help. My background is Physics MSc - so don't hesitate to apply more advanced notation on the solution than precalculus, if needed,.

For sampling without replacement (i.e., once a number is picked it becomes unavailable to be picked again) you need to have at least one of the ##b_i \geq q## in order to have any chance that you can get ##q## numbers the same. For sampling with replacement, this need not be the case, because with replacement you "put back" a number after it has been selected.

I get 6/10 in your 11112 example, but only if the sampling is without replacement.

There is a way of performing your calculations, but it is very computer-intensive. Let's say that all your ##b_i## are ##\geq q##. It was not stated clearly how many numbers you are choosing---you only said you wanted ##q## the same, so the number ##n## picked must be at least ##q##. So, we pick ##n## numbers without replacement. For each ##i = 1,2, \ldots, k##, let ##E_i## be the event that number ##i## is repeated exactly ##q## times in your sample of size ##n##. You want to know something like the probability that at least one of the events ##E_1, E_2, \ldots, E_k## occurs. However, this is still not sufficiently well-defined. Do you want just one of ##E_1, E_2, \ldots, E_k## to occur (so if ##q = 3## and you have 111, you cannot have 222 also, etc.), or do you allow several occurrences of ##q## identical numbers (so it would be OK to have both 111 and 222, etc.)? Each of these two cases can be treated using versions of the so-called inclusion-exclusion principle, but the final details are different in the two versions.

Let
$$\begin{array}{rcl}
S_1&=& \sum_{i=1}^j P(E_i) \\
S_2 &=& \sum_{i,j: \: i < j} P(E_i E_j) \\
S_3 &=&\sum_{i,j,l: \: i < j < l} P(E_i E_j E_l) \\
&\vdots& \\
S_k&=& P(E_1 E_2 \cdots E_n)
\end{array}
$$
(Here, ##P(AB)## means ##P(A\: \text{and}\: B)##, etc.)
For example, if ##k=4## we have
$$\begin{array}{rcl}
S_1 &=& P(E_1) + P(E_2) + P(E_3) + P(E_4)\\
S_2&=&P(E_1 E_2) + P(E_1 E_3) + P(E_1 E_4) + P(E_2 E_3) + \cdots + P(E_3 E_4) \\
S_3 &=& P(E_1 E_2 E_3) + P(E_1 E_2 E_4) + \cdots + P(E_2 E_3 E_4)\\
S_4 &=& P(E_1 E_2 E_3 E_4)
\end{array}
$$

If we let ##E_{{}\geq 1}## denote the event that at least one of the ##E_i## occur and ##E_{=1}## the event that exactly one of the ##E_i## occur, we have:
$$P(E_{\geq 1}) = S_1 - S_2 + S_3 - \cdots \pm S_k $$
and
$$P(E_{=1}) = S_1 - 2 S_2 + 3 S_3 - \cdots \pm k S_k$$

OK, so how do we compute the ##S_1, S_2, \ldots, S_k##?

To compute ##P(E_1)## we categorize the outcomes as "1" and "not 1". The probability of getting exactly ##q## 1s in ##n## picks is given by the hypergeometric probability
$$P(E_1) = \frac{C(b_1,q) C(B_1,n-q)}{C(N,n)}$$
Here, ##N = b_1 + b_2 + \cdots + b_k##, ##B_i = N - b_i## for each ##i##, and ##C(a,b)## is the binomial coefficient ##{}^aC_b##. Similarly for the other ##P(E_i)##, so we have
$$S_1 = \sum_{i=1}^k \frac{C(b_i,q)C(B_i,n-q)}{C(N,n)} $$
To compute ##P(E_1 E_2)##, categorize the outcomes into three classes, "1", "2" and "neither 1 nor 2", which contain ##b_1, b_2## and ##B_{12} = N - b_1 - b_2## members. Now we have a 3-class hypergeometric distribution, which is less familiar than the classic hypergeometric, but is still easy enough to get (from conditional probabilities, for example):
$$P(E_1 E_2 E_3) = \frac{C(b_1,q) C(b_2,q) C(B_{12}, n - 2q)}{C(N,n)},$$
etc.
So, in principle we can compute ##S_2##.

In a similar way we have
$$P(E_i E_j E_l) = \frac{C(b_i,q) C(b_j,q) C(b_l,q) C(N-b_i-b_k-b_l, n-3q)}{C(N,n)}, $$
etc.

Of course, in any practical example, some of the ##P(E_i E_j E_l \cdots E_r)## may be zero, just because ##qr > n## for example.

So, it is all computable in principle, but it will not be pleasant. Best of luck.

Minimal Values of Repeated Numbers for Desired Probability?

Homework Statement

Homework Equations

The Attempt at a Solution

Homework Statement

Homework Equations

The Attempt at a Solution

Thread 'Family of lines that are at a distance of 5 from the origin'

Similar threads

Hot Threads

Geometry: Similar Shapes

Length of Diagonal

Eliminate ##\theta## between a pair of given equations

Approximate value of ##E=1/2! +1/3!+1/4!+... ##

Find integer points on this equation

Recent Insights

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers

Insights Fermat's Last Theorem

Insights Why Vector Spaces Explain The World: A Historical Perspective