The discussion centers on calculating the probability that at least two out of five independent tests will detect an error in a computer program, given their respective detection probabilities of 0.1, 0.2, 0.3, 0.4, and 0.5. The initial calculation for the probability of at least one test detecting the error is established as 0.8488. The correct approach to find the probability of at least two tests detecting the error involves using the complement of the events "none" and "exactly one" test detecting the error, leading to the formula: P(at least 2) = 1 - (P(none) + P(exactly one)). Generating functions are suggested as a robust method for solving such problems.
PREREQUISITES
Understanding of probability theory, specifically independent events.
Familiarity with generating functions and their applications in probability.
Knowledge of elementary symmetric functions and their role in probability calculations.
Basic skills in algebraic manipulation of polynomials.
NEXT STEPS
Study the use of generating functions in probability, focusing on their application to independent random variables.
Learn about elementary symmetric functions and their significance in combinatorial probability.
Explore advanced probability concepts such as Chernoff bounds for inhomogeneous Bernoulli trials.
Practice calculating probabilities using complementary events and their implications in various scenarios.
USEFUL FOR
Mathematicians, statisticians, computer scientists, and anyone involved in software testing and error detection methodologies will benefit from this discussion.
#1
r0bHadz
Homework Statement
A computer program is tested by 5 independent tests. If there is an error, these tests
will discover it with probabilities 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. Suppose that the
program contains an error. What is the probability that it will be found by at least two tests
Homework Equations
The Attempt at a Solution
I found the probability that it will happen for at least one:
1-(.9*.8*.7*.6*.5) = .8488
Now my question is: what is the complement of "at least two tests"?
I think for some reason it is "1 and none", but how do I know whether it is "1 and none" or "1 or none"?
Pr(at least 2) + Pr(1 and none) = 1
Pr(at least 2) + Pr(1 or none) = 1
it has to be one of these but I'm not sure which.
With Pr(1 and none), Pr(1 and none) = Pr(1)*.8488
With Pr(1 or none) = Pr(1)+Pr(none) - Pr(1)*.8488
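(A minimal brute-force check, sketched in Python for illustration and using the probabilities from the problem statement: enumerating all ##2^5## detect/miss outcomes reproduces the 0.8488 figure for "at least one" and shows that the complement of "at least two" is "none or exactly one".)
```python
# Brute-force sketch: enumerate every detect/miss pattern over the 5 independent
# tests and accumulate probability by the number of detections.
from itertools import product

p = [0.1, 0.2, 0.3, 0.4, 0.5]        # detection probabilities from the problem

dist = [0.0] * (len(p) + 1)          # dist[k] = P(exactly k tests detect the error)
for outcome in product([0, 1], repeat=len(p)):
    prob = 1.0
    for detected, pi in zip(outcome, p):
        prob *= pi if detected else (1 - pi)
    dist[sum(outcome)] += prob

print(1 - dist[0])                   # P(at least one)  -> 0.8488
print(1 - dist[0] - dist[1])         # P(at least two) = 1 - P(none) - P(exactly one)
```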
Problems like this one cry out for the use of generating functions. Let
$$ I_i = \begin{cases} 1 & \text{if test i discovers the fault} \\
0 & \text{otherwise}
\end{cases} $$ The different random variables ##I_1, I_2, I_3, I_4, I_5## are independent, and the number of tests that detect the fault is:
$$ \text{number that detect the fault} = N = I_1 + I_2 + I_3 + I_4 + I_5 $$
The distribution of ##N## can be found by looking at its generating function ##\tilde{N}(z),## which in this case is a fifth-degree polynomial:
$$ \tilde{N}(z) = \sum_{n=0}^5 P(N = n) z^n $$
The really neat thing about generating functions is the following theorem (which is actually quite easy to prove):
Theorem: The generating function of a sum of independent random variables is the product of their individual generating functions.
In this case, ##\tilde{I_1}(z) = 0.9 + 0.1 z, ## etc., so we have
$$\tilde{N}(z) = (.9+.1 z)(.8+.2 z) (.7 + .3 z) (.6 + .4 z) (.5 +.5 z).$$
If you expand that out algebraically to get the fifth-degree polynomial, the coefficient of ##z^j## is just ##P(N = j)##, so all you need to do is sum some coefficients.
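For illustration, a minimal sketch of that expansion in Python (the detection probabilities are the ones given in the problem): multiplying the five factors is just repeated convolution of their coefficient lists, after which the coefficient of ##z^j## is ##P(N = j)##.
```python
# Illustrative sketch: build N~(z) by convolving the factors (q_i + p_i z),
# then read P(N = j) off the coefficients, lowest degree first.
import numpy as np

p = [0.1, 0.2, 0.3, 0.4, 0.5]                     # detection probabilities

coeffs = np.array([1.0])                          # start from the constant polynomial 1
for pi in p:
    coeffs = np.convolve(coeffs, [1.0 - pi, pi])  # multiply by (q_i + p_i z)

print(coeffs)                                     # [P(N=0), P(N=1), ..., P(N=5)]
print(coeffs[2:].sum())                           # P(N >= 2), the quantity asked for
```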
The generating function approach seems like a pretty good one in general. (Alternatively, various bounding methods like Chernoff were made for inhomogeneous Bernoulli trial setups like this.)
In this particular case, since OP recently had a question on sample spaces and events, perhaps a 3rd approach is illuminating.
- - - - -
We can look at the problem in terms of its complement:
##P\big(\text{at least two heads}\big) = 1- P\big(\text{at most one heads}\big)##
where the (compound) event of exactly zero heads over all trials is denoted by ##A_0## and the event of exactly one heads over all trials is denoted by ##A_1##. These events are mutually exclusive.
- - - -
side note:
it may be a bit overkill, but for avoidance of doubt, we can easily derive the above from first principles. (I spoiler tagged it as it may be too much symbol manipulation for a basic result.)
##P\big(\text{at least two heads}\big) = P\big(A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big)##
and over any sequence of ##n## tosses, either 0, 1, 2, ..., n-1, or n heads must have occurred. These events are mutually exclusive, so the associated probabilities add. That is
##P\big(A_0 \cup A_1 \cup A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) = 1##
because we've summed over all mutually exclusive probabilities in the sample space.
hence the opening setup is to take the above equation and subtract ##P\big(A_0 \cup A_1\big)## from each side, giving
## P\big(\text{at least two heads}\big)= P\big(A_2 \cup A_3 \cup... \cup A_{n-1} \cup A_n\big) =1 - P\big(A_0 \cup A_1\big) = 1- P\big(\text{at most one heads}\big) ##
which is what was shown at the top
- - - -
##P\big(\text{at most one heads}\big) = P\big(A_0 \cup A_1\big) = P\big(A_0\big) + P\big(A_1\big)##
because the probabilities of mutually exclusive events add.
now consider the n-tuple of failure probabilities, one per trial, i.e.
##\mathbf q = (q_1, q_2, ..., q_n)##
(where n=5 in the problem)
##P\big(A_0\big) = q_1 \cdot q_2 \cdot ... \cdot q_n = e_n\big(\mathbf q\big)##
where ##e_n## is the ##n##th elementary symmetric function, i.e. just the product of all ##n## terms. These probabilities multiply because the underlying trials / coin tosses (i.e. the simple events) are independent.
now
##P\big(A_1\big)## is the probability that coin ##i## is heads and all other coins are tails, summed over all ##i## -- equivalently, we are 'leaving out' one failure and 'including' one success, and we are summing over all ##\binom{n}{n-1} = n## possibilities here.
so consider the case ##i = 1##; the associated probability is
##(1-q_1)\cdot \big(q_2\cdot q_3 \cdot ... \cdot q_n \big) = \big(q_2\cdot q_3 \cdot ... \cdot q_n\big) - \big(q_1 \cdot q_2\cdot q_3 \cdot ... \cdot q_n\big) = \big( q_2\cdot q_3 \cdot ... \cdot q_n\big) - e_n\big(\mathbf q\big)##, and in general, for trial ##i## being the success, this is given by
##\big(\prod_{k\neq i} q_k \big) - e_n\big(\mathbf q\big)##
- - - - - side note:
each one of these terms corresponds to an event:
the above may be interpreted as the probability of the event in which trial ##i## is heads and all other trials are tails. We can denote this event by ##A_1^{(i)}##. A little bit of thought tells us that
##A_1 = A_1^{(1)} \cup A_1^{(2)} \cup ... \cup A_1^{(n)}##
and each of these events is mutually exclusive of the others (e.g. it is impossible for trial 1 to be the only heads and for trial 2 to also be the only heads). So
##P\big(A_1\big) = \sum_{i=1}^n P\big(A_1^{(i)}\big) = \sum_{i=1}^n \Big[\big(\prod_{k\neq i} q_k \big) - e_n\big(\mathbf q\big)\Big] = e_{n-1}\big(\mathbf q\big) - n\cdot e_n\big(\mathbf q\big)##
where we recognized that ##e_{n-1}\big(\mathbf q\big)## is the ##(n-1)##th elementary symmetric function -- i.e. it is equivalent to choosing ##n-1## of the ##n## q values, multiplying them, and then summing over all ##\binom{n}{n-1} = n## such combinations.
- - - - -
This approach may become a bit cumbersome if there is a follow-up problem of finding the "probability of at least 3 happening", whereas Ray's method captures that easily.
On the other hand, this approach is good practice for emphasizing the role of events and the sample space -- with special emphasis on the power of mutually exclusive events. (It also gives you another way of seeing that the special case of homogeneous probabilities for each coin toss simplifies all the calculations, leading to the binomial distribution.) The common thread between both approaches, I suppose, is that we are modelling / embedding the probabilities in a polynomial -- Ray's method does this explicitly, whereas in my approach it's implicit -- the elementary symmetric functions are intimately connected with the roots of polynomials.
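A minimal numerical sketch of the symmetric-function route above (Python, for illustration), with ##\mathbf q## taken from the problem: compute ##e_n(\mathbf q)## and ##e_{n-1}(\mathbf q)## directly and assemble ##P(\text{at least two}) = 1 - P(A_0) - P(A_1)##.
```python
# Illustrative sketch: P(at least 2) = 1 - P(A_0) - P(A_1), where
#   P(A_0) = e_n(q)   and   P(A_1) = e_{n-1}(q) - n * e_n(q),
# and q holds the failure probabilities q_i = 1 - p_i.
from itertools import combinations
from math import prod

q = [0.9, 0.8, 0.7, 0.6, 0.5]
n = len(q)

def e(k, vals):
    """k-th elementary symmetric function: sum of products over all k-subsets."""
    return sum(prod(subset) for subset in combinations(vals, k))

P_A0 = e(n, q)                      # P(no test detects)
P_A1 = e(n - 1, q) - n * e(n, q)    # P(exactly one test detects)
print(1 - P_A0 - P_A1)              # P(at least two detect)
```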
Along the same lines: Feller's book (among many others) develops usable formulas for such probabilities as "P{exactly k of the events occur}" and "P{at least k of the events occur}", for events ##A_1, A_2, \ldots, A_n##. These formulas are generalizations of the standard inclusion/exclusion formula. They involve the quantities
$$S_r = \displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq n}^{} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r})$$ for ##r = 1,2, \ldots, n.## Basically, they look at higher ##S_j, j > i## to cancel the double-counting contained in ##S_i##, and so the formulas have the ##S_j## occurring with alternating signs, together with some combinatorial coefficients. (These coefficients are different for the "exactly k" and the "at least k" cases.) The alternating signs in this method arise as an alternative to the generating-function method, which uses both ##p =P(A)## and ##q = 1-P(A).## Both methods involve subtraction, but in different ways and places.
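For illustration, a sketch using one standard form of the "at least ##m##" formula, ##P(\text{at least } m \text{ of the } A_i) = \sum_{r=m}^{n} (-1)^{r-m}\binom{r-1}{m-1} S_r## (the coefficients are not spelled out above, so treat this form as an assumption to be checked against Feller); with independence, ##S_r## reduces to the ##r##th elementary symmetric function of the ##p_i##.
```python
# Illustrative sketch of the generalized inclusion-exclusion route.
from itertools import combinations
from math import comb, prod

p = [0.1, 0.2, 0.3, 0.4, 0.5]        # P(A_i) for the five (independent) tests
n = len(p)

def S(r):
    # S_r = sum over all r-subsets of P(intersection); independence makes each
    # intersection probability a plain product of the chosen p_i.
    return sum(prod(subset) for subset in combinations(p, r))

def at_least(m):
    # assumed form: sum_{r=m}^{n} (-1)^(r-m) * C(r-1, m-1) * S_r
    return sum((-1) ** (r - m) * comb(r - 1, m - 1) * S(r) for r in range(m, n + 1))

print(at_least(2))                   # should agree with the generating-function answer
```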
I avoided mentioning the generalized inclusion-exclusion method, just because it is harder to justify than the straightforward generating function method. On the other hand, it (the inclusion-exclusion approach) applies generally, even to events that are not independent, while the generating function method needs independence in order to be useful.
The generating function approach here really is quite nice, and it even has very fast computational routines that would give the entire distribution for very large n. I probably don't reach for generating functions often (soon?) enough. (That said, if the simple events were 'very rare' yet independent, I would be tempted to mention Poisson approximation + Le Cam's theorem as extra credit.)
For the case of dependent simple events, it's much harder, of course. I suppose inclusion-exclusion methods work in principle, though figuring out probabilities for high-degree intersections is not pleasant and is frequently intractable. Much better to approximate and bound the error if at all possible. (Basically, for rare events: Stein-Chen / Poisson approximation; otherwise: martingale methods... though managing dependencies is something of an art.) I don't want to deprecate the role of the sample space though... even the Union Bound is surprisingly useful.
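As a small illustration of the Poisson remark (a sketch, not a claim that the approximation is good for this problem): with ##\lambda = \sum_i p_i##, the Poisson tail is quick to compare against the exact answer, and Le Cam's theorem bounds the total ##L_1## error by ##2\sum_i p_i^2## -- which is loose here since these ##p_i## are not small.
```python
# Illustrative sketch: Poisson approximation to N with lambda = sum(p_i),
# compared against the exact 'at least two' probability from brute force.
from itertools import product
from math import exp

p = [0.1, 0.2, 0.3, 0.4, 0.5]
lam = sum(p)                                 # lambda for the Poisson approximation

exact = 0.0                                  # exact P(N >= 2) by enumeration
for outcome in product([0, 1], repeat=len(p)):
    prob = 1.0
    for d, pi in zip(outcome, p):
        prob *= pi if d else (1 - pi)
    if sum(outcome) >= 2:
        exact += prob

poisson_tail = 1 - exp(-lam) * (1 + lam)     # Poisson(lam): P(N >= 2)
le_cam_bound = 2 * sum(pi ** 2 for pi in p)  # Le Cam bound on the L1 distance
print(exact, poisson_tail, le_cam_bound)
```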