Probability of at least two happening

  • #1
r0bHadz

Homework Statement


A computer program is tested by 5 independent tests. If there is an error, these tests
will discover it with probabilities 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. Suppose that the
program contains an error. What is the probability that it will be found by at least two tests?

Homework Equations




The Attempt at a Solution


I found the probability that it will be found by at least one test:

1-(.9*.8*.7*.6*.5) = .8488

Now my question is: what is the complement of "at least two tests"?

I think for some reason it is "1 and none," but how do I know whether it is "1 and none" or "1 or none"?

Pr(at least 2) + Pr(1 and none) = 1
Pr(at least 2) + Pr(1 or none) = 1

it has to be one of these but I'm not sure which.

With Pr(1 and none), Pr(1 and none) = Pr(1)*.8488

With Pr(1 or none) = Pr(1)+Pr(none) - Pr(1)*.8488
 

Answers and Replies

  • #2
Now my question is: what is the complement of "at least two tests"?
It should be Pr(none or 1), that is,

Required Probability = 1 - (Pr(none of the tests finds the error) + Pr(exactly one of the tests finds the error)).
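As a concrete check, here is a quick numerical sketch in Python using the problem's stated probabilities:

```python
p = [0.1, 0.2, 0.3, 0.4, 0.5]   # detection probabilities of the five tests
q = [1 - x for x in p]          # miss probabilities

# Pr(no test finds the error): every test misses
p_none = 1.0
for x in q:
    p_none *= x

# Pr(exactly one test finds the error): test i hits, every other test misses
p_one = sum(p[i] * p_none / q[i] for i in range(5))

print(round(p_none, 4))              # 0.1512
print(round(1 - p_none, 4))          # 0.8488, the "at least one" value above
print(round(1 - p_none - p_one, 4))  # 0.4774, "at least two"
```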
 
  • #3
Ray Vickson
Science Advisor
Homework Helper
Dearly Missed

Problems like this one cry out for the use of generating functions. Let
$$ I_i = \begin{cases} 1 & \text{if test } i \text{ discovers the fault} \\
0 & \text{otherwise}
\end{cases} $$ The different random variables ##I_1, I_2, I_3, I_4, I_5## are independent, and the number of tests that detect the fault is:
$$ \text{number that detect the fault} = N = I_1 + I_2 + I_3 + I_4 + I_5 $$
The distribution of ##N## can be found by looking at its generating function ##\tilde{N}(z),## which in this case is a fifth-degree polynomial:
$$ \tilde{N}(z) = \sum_{n=0}^5 P(N = n) z^n $$
The really neat thing about generating functions is the following theorem (which is actually quite easy to prove):

Theorem: The generating function of a sum of independent random variables is the product of their individual generating functions.

In this case, ##\tilde{I_1}(z) = 0.9 + 0.1 z, ## etc., so we have
$$\tilde{N}(z) = (.9+.1 z)(.8+.2 z) (.7 + .3 z) (.6 + .4 z) (.5 +.5 z).$$
If you expand that out algebraically to get the fifth-degree polynomial, the coefficient of ##z^j## is just ##P(N = j)##, so all you need to do is sum some coefficients.
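The expansion can also be done mechanically by convolving the factors; a short Python sketch (here `dist[j]` ends up holding ##P(N = j)##):

```python
# Coefficients of each factor (q_i + p_i z), lowest degree first
factors = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.6, 0.4), (0.5, 0.5)]

dist = [1.0]  # generating function of the empty product
for q, p in factors:
    # polynomial multiplication: dist(z) * (q + p*z)
    new = [0.0] * (len(dist) + 1)
    for j, c in enumerate(dist):
        new[j] += c * q      # contribution of the constant term q
        new[j + 1] += c * p  # contribution of the p*z term
    dist = new

# dist[j] is P(N = j); summing the coefficients of z^2 .. z^5 gives the answer
print([round(c, 4) for c in dist])
print(round(sum(dist[2:]), 4))  # P(N >= 2) = 0.4774
```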
 
  • #4
StoneTemplePython
Science Advisor
Gold Member
2019 Award
The generating function approach seems like a pretty good one in general. (Alternatively, various bounding methods like Chernoff were made for inhomogeneous Bernoulli trial setups like this.)

In this particular case, since the OP recently had a question on sample spaces and events, perhaps a third approach is illuminating.

- - - - -
We can look at the problem in terms of its complement:
##P\big(\text{at least two heads}\big) = 1- P\big(\text{at most one heads}\big)##

Let the (compound) event of exactly zero heads over all trials be denoted ##A_0## and the event of exactly one heads over all trials be denoted ##A_1##. These events are mutually exclusive.
- - - -
side note:
It may be a bit of overkill, but for the avoidance of doubt, we can easily derive the above from first principles (it is a fair amount of symbol manipulation for a basic result).

##P\big(\text{at least two heads}\big) = P\big(A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big)##
and over any sequence of ##n## tosses, exactly one of ##0, 1, 2, \ldots, n-1,## or ##n## heads must have occurred. These events are mutually exclusive, so the associated probabilities add. That is,

##P\big(A_0 \cup A_1 \cup A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) = 1##
because we've summed over all mutually exclusive probabilities in the sample space.

But taking advantage of mutual exclusivity, we know
## P\big(A_0 \cup A_1\big) + P\big(A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) = P\big(A_0 \cup A_1 \cup A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) = 1 ##

Hence the opening setup is to take the above equation and subtract ##P\big(A_0 \cup A_1\big)## from each side, giving

## P\big(\text{at least two heads}\big)= P\big(A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) =1 - P\big(A_0 \cup A_1\big) = 1- P\big(\text{at most one heads}\big) ##
which is what was shown at the top.
- - - -
##P\big(\text{at most one heads}\big) = P\big(A_0 \cup A_1\big) = P\big(A_0\big) + P\big(A_1\big)##
because the probabilities of mutually exclusive events add.

Now consider the ##n##-tuple of failure probabilities, one per trial, i.e.
##\mathbf q = (q_1, q_2, ..., q_n)##
(where ##n=5## in the problem).

##P\big(A_0\big) = q_1 \cdot q_2 \cdot ... \cdot q_n = e_n\big(\mathbf q\big)##
where ##e_n## is the ##n##th elementary symmetric function, i.e. just the product of all ##n## terms. These probabilities multiply because the underlying trials / coin tosses (the simple events) are independent.

Now,
##P\big(A_1\big)## is obtained by taking the probability that coin ##i## is heads and all other coins are tails, then summing over all ##i## -- equivalently, we are 'leaving out' one failure and 'including' one success, and we are summing over all ##\binom{n}{n-1} = n## possibilities here.

So consider the case ##i = 1##; the associated probability is
##(1-q_1)\cdot \big(q_2\cdot q_3 \cdot ... \cdot q_n \big) = \big(q_2\cdot q_3 \cdot ... \cdot q_n\big) - \big(q_1 \cdot q_2\cdot q_3 \cdot ... \cdot q_n\big) = \big( q_2\cdot q_3 \cdot ... \cdot q_n\big) - e_n\big(\mathbf q\big)##


In general, for trial ##i## being the success, this is given by
##\big(\prod_{k\neq i} q_k \big) - e_n\big(\mathbf q\big)##

- - - - -
side note:
Each of these terms corresponds to an event:
the above may be interpreted as the probability of the event in which trial ##i## is heads and all other trials are tails. We can denote this event ##A_1^{(i)}##. A little bit of thought tells us that

##A_1 = A_1^{(1)} \cup A_1^{(2)} \cup ... \cup A_1^{(n)}##
and these events are mutually exclusive (e.g. it is impossible for trial 1 to be the only one with heads and for trial 2 to be the only one with heads). So

##P\big(A_1\big) = P\big(A_1^{(1)} \cup A_1^{(2)} \cup ... \cup A_1^{(n)}\big) = P\big(A_1^{(1)} \big) + P\big( A_1^{(2)}\big) + ... + P\big(A_1^{(n)}\big)= \sum_{i=1}^n P\big(A_1^{(i)} \big)##

because we can add probabilities for mutually exclusive events. This gives the rationale for summing over all ##i##, which is what is shown below.

- - - - -
Summing over all ##i## gives

##P\big(A_1\big) =\sum_{i=1}^n P\big(A_1^{(i)} \big) = \sum_{i=1}^n \Big(\big(\prod_{k\neq i} q_k \big) - e_n\big(\mathbf q\big)\Big) = \Big(\sum_{i=1}^n \prod_{k\neq i} q_k \Big) - \sum_{i=1}^n e_n\big(\mathbf q\big) = e_{n-1}\big(\mathbf q\big) - n \cdot e_n\big(\mathbf q\big)##

where we recognized that
##e_{n-1}\big(\mathbf q\big)## is the ##(n-1)##th elementary symmetric function -- i.e. it is equivalent to choosing ##n-1## of the ##n## q values and multiplying them, and then summing over all ##\binom{n}{n-1} = n## combinations.
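As a numerical sanity check of this formula against the thread's values, here is a Python sketch that computes the elementary symmetric functions ##e_k## by brute force over subsets:

```python
from itertools import combinations
from math import prod

q = [0.9, 0.8, 0.7, 0.6, 0.5]   # failure (miss) probabilities
n = len(q)

def e(k, vals):
    """k-th elementary symmetric function: sum of products over all k-subsets."""
    return sum(prod(c) for c in combinations(vals, k))

p_A0 = e(n, q)                    # every test misses
p_A1 = e(n - 1, q) - n * e(n, q)  # exactly one test hits
print(round(p_A0, 4))             # 0.1512
print(round(p_A1, 4))             # 0.3714
print(round(1 - p_A0 - p_A1, 4))  # P(at least two) = 0.4774
```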

- - - - -
This approach may become a bit cumbersome if there is a followup problem of finding "probability of at least 3 happening" while Ray's method will easily capture it.

On the other hand, this approach is good practice for emphasizing the role of events and the sample space -- with special emphasis on the power of mutually exclusive events. (It also gives you another way of seeing that the special case of homogeneous probabilities for each coin toss simplifies all the calculations, leading to the binomial distribution.) The common thread between both approaches, I suppose, is that we are modelling / embedding the probabilities in terms of a polynomial -- Ray's method does this explicitly, whereas in my approach it's implicit -- the elementary symmetric functions are intimately connected with roots of polynomials.
 
  • #5
Ray Vickson
Science Advisor
Homework Helper
Dearly Missed
Along the same lines: Feller's book (among many others) develops usable formulas for such probabilities as "P{exactly k of the events occur}" and "P{at least k of the events occur}", for events ##A_1, A_2, \ldots, A_n##. These formulas are generalizations of the standard inclusion/exclusion formula. They involve the quantities
$$S_r = \displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq n}^{} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r})$$ for ##r = 1,2, \ldots, n.## Basically, they look at higher ##S_j, j > i## to cancel the double-counting contained in ##S_i##, and so the formulas have the ##S_j## occurring with alternating signs, together with some combinatorial coefficients. (These coefficients are different for the "exactly k" and the "at least k" cases.) The alternating signs in this method arise as an alternative to the generating-function method, which uses both ##p =P(A)## and ##q = 1-P(A).## Both methods involve subtraction, but in different ways and places.
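To make this machinery concrete on the thread's numbers, here is a hedged Python sketch: under independence, ##S_r## reduces to the ##r##-th elementary symmetric function of the ##p_i##, and the "at least ##m##" version of the generalized inclusion-exclusion formula uses coefficients ##\binom{k-1}{m-1}## with alternating signs:

```python
from itertools import combinations
from math import comb, prod

p = [0.1, 0.2, 0.3, 0.4, 0.5]
n = len(p)

# S_r = sum over r-subsets of P(intersection); intersections factor by independence
S = {r: sum(prod(c) for c in combinations(p, r)) for r in range(1, n + 1)}

def at_least(m):
    """P(at least m of the events occur), via generalized inclusion-exclusion."""
    return sum((-1) ** (k - m) * comb(k - 1, m - 1) * S[k]
               for k in range(m, n + 1))

print(round(at_least(1), 4))  # 0.8488, matching the OP's "at least one"
print(round(at_least(2), 4))  # 0.4774
```

Note the ##m = 1## case collapses to the familiar alternating-sign inclusion-exclusion sum ##S_1 - S_2 + S_3 - S_4 + S_5##.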

I avoided mentioning the generalized inclusion-exclusion method, just because it is harder to justify than the straightforward generating function method. On the other hand, it (the inclusion-exclusion approach) applies generally, even to events that are not independent, while the generating function method needs independence in order to be useful.
 
  • #6
StoneTemplePython
Science Advisor
Gold Member
2019 Award
The generating function approach here really is quite nice and even has very fast computational routines which would give the entire distribution for very large n. I probably don't reach for generating functions often (soon?) enough. (That said if the simple events were 'very rare' yet independent here, I would be tempted to mention Poisson approximation + Le Cam's theorem as extra credit.)

For the case of dependent simple events, it's much harder of course. I suppose inclusion-exclusion methods work in principle, though figuring out probabilities for high degree intersections is not pleasant and is frequently intractable. Much better to approximate and bound the error if at all possible. (Basically, for rare events: Stein-Chen/ Poisson Approximation, otherwise: martingale methods... though managing dependencies is something of an art.) I don't want to deprecate the role of the sample space though... even the Union Bound is surprisingly useful.
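To illustrate the Poisson-approximation remark on this example (where the simple events are not actually rare, so the approximation should be visibly loose), a short Python comparison against the exact value:

```python
from math import exp, prod

p = [0.1, 0.2, 0.3, 0.4, 0.5]
lam = sum(p)  # Poisson rate matched to the expected number of detections

# Poisson approximation: P(N >= 2) ~ 1 - e^{-lam} (1 + lam)
approx = 1 - exp(-lam) * (1 + lam)

# Exact value: complement of "none or exactly one"
q = [1 - x for x in p]
p0 = prod(q)
p1 = sum(p[i] * p0 / q[i] for i in range(len(p)))
exact = 1 - p0 - p1

print(round(approx, 4))  # 0.4422
print(round(exact, 4))   # 0.4774
```

The gap is consistent with Le Cam's bound, which scales with ##\sum p_i^2 = 0.55## here; with genuinely small ##p_i## the two values would be much closer.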
 