# Basic Probability Concept

1. Jul 16, 2011

### I_am_learning

Suppose 5% of the flash disks manufactured by a company are faulty.
If we take 100 flash disks, how many are expected to be faulty?
It must be 5, right?
If we take 100 flash disks, what is the probability that exactly 5 of them are faulty?
This is what's bugging me.
We know 5% are most likely to be faulty. So, in 100 flash disks, 5 are most likely to be faulty. But how likely?

2. Jul 16, 2011

### micromass

Hi I_am_learning!

If I take an arbitrary flash-disk, what is the chance that it's faulty?

3. Jul 16, 2011

### I_am_learning

This is confusing to me. If EXACTLY 5% of the flash disks are faulty in the collection from which I take out an arbitrary flash disk, then the answer would be 0.05.
But can we be sure that exactly 5% are faulty? What is the chance that exactly 5% are faulty?

4. Jul 16, 2011

### micromass

Look at it this way. Say I give you 100 flash-disks, and I take out 1, what is the probability that it's faulty?

By the way, I think this question is actually very ill-presented. I know what they want you to do, but I think they could have done a better job in asking the question...

5. Jul 16, 2011

### I_am_learning

I understand your point, micromass. You mean to say this:
1.* The probability of any one flash disk being faulty is 0.05.
So, if I take 100 of them, the probability of 5 being faulty can be worked out.
I can do that.
But my confusion is in *. When they say 5% of flash disks are faulty, do they mean exactly 5% of their total production is faulty, or do they mean the probability distribution of the faulty percentage peaks at 5?

6. Jul 16, 2011

### micromass

The way I interpret it is that the chance that a disk will be faulty is 0.05. So, I mean that the probability distribution peaks at 5.
If they mean that exactly 5% of their production is faulty, then the question doesn't make much sense to me.

Again, I don't like this question. I'm pretty sure that they mean for you to take a binomial distribution with n = 100 and p = 0.05 and calculate the probability that 5 out of 100 are faulty. But they should have phrased it better.

7. Jul 16, 2011

### I_am_learning

So, in general (i.e. in books), when they say 5% of # are faulty, they mean the %faulty probability distribution peaks at 5 and follows a distribution (binomial?) such that the probability of any given # being faulty is 0.05.
Makes some sense.
Thanks.

8. Jul 16, 2011

### chiro

I am getting an ambiguous interpretation of this.

The reason I say this is because if you do multiple trials (i.e. pick up multiple disks), your probability depends on what the results of your last tests were.

If each trial is independent of every other, you have a binomial distribution with p = 0.05 (for faulty disks). Based on this model, if you want to know the chance of five being faulty you would calculate $\binom{100}{5}(0.05)^5(0.95)^{95}$.

The above model assumes that every disk failing is independent from every other disk failing. If this is not the case, then the calculations become more complex.
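The binomial formula above can be evaluated directly. The following is a minimal sketch under the independence assumption; the function name `binomial_pmf` is only an illustrative choice, not something from the thread.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k faulty disks among n independent disks,
    each faulty with probability p (independence is an assumption)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 100C5 * (0.05)^5 * (0.95)^95
p_five = binomial_pmf(5, 100, 0.05)
print(f"P(exactly 5 faulty out of 100) = {p_five:.4f}")  # roughly 0.18
```

So even though 5 is the expected count, getting exactly 5 faulty disks happens only about 18% of the time under this model.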

9. Jul 17, 2011

### pmsrw3

Not exactly. What they mean is that the probability for any given item is 0.05, and that they're all independent. Independence is the unstated assumption. In defense of whoever wrote the problem, this assumption often is left unstated, although it probably ought not to be. Independence is assumed, not because it's likely to be true (it isn't), but because it's the simplest assumption that will allow you to calculate something useful.

There is no explicit assumption that the distribution peaks at 5 in 100, or that it is binomial. In fact, both those things are true in this case, but they are not assumed: they follow from the other information and the assumption of independence.
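To see that the peak at 5 follows from the per-item probability and independence rather than being assumed, one can tabulate the resulting distribution. This is a rough sketch; the helper name `binomial_pmf` and the variable names are illustrative only.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k faulty) for n independent items, each faulty with prob. p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.05
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The most likely count (mode) and the expected count (mean).
mode = max(range(n + 1), key=lambda k: pmf[k])
mean = sum(k * q for k, q in enumerate(pmf))

print(f"mode = {mode}, mean = {mean:.3f}")
for k in range(11):
    print(f"P(X = {k:2d}) = {pmf[k]:.4f}")
```

The table shows the probabilities rising up to k = 5 and falling afterwards, with neighbouring counts such as 4 and 6 nearly as likely as 5.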

10. Jul 27, 2011

### thebiggerbang

Out of the 100 that you have, 5 of them have to be faulty (as given)!! So, I guess the probability should be 1.

11. Jul 27, 2011

### micromass

That's not what they said!! What they said is that 5% are faulty. So for example, if the company produced 1000000 disks, then it is known that 50000 disks are faulty. The question is: what if we take 100 disks out of these 1000000 disks, what is the probability that 5 are faulty? It's certainly not 1, since we can always take 100 good disks from the 1000000...

12. Jul 28, 2011

### alexfloo

Here's how I think of it. Instead of selecting 100 disks, select one disk, and repeat it 100 times.

You know that each selection is independent (we're going to assume there's a MASSIVE number of disks to select from). You also know that the probability of getting a faulty one is .05, and of getting a good one is .95, right?

Now consider this: you're wondering what the probability is that, out of your 100 trials, 5 of them come out negative (faulty) and 95 come out positive (good).

So now you have a probability assigned to each of your 100 trials. Let's presume that the faulty ones are the first ones you pull. Then you can use simple arithmetic to combine all the trials into one probability. (I won't tell you how, but I think you can figure it out)

Then you just consider the fact that it didn't *need* to be the first 5, right? So you can adjust your probability for the number of reorderings, using some simple counting principles.

When they said 5% are faulty, all they mean is that (like pmsrw3 said) if you select *one* disk, totally at random, P(faulty)=0.05, and P(good)=0.95. There are no distributions or random variables or any of that unless you choose to define them yourself.
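The "select one disk, and repeat it 100 times" picture can also be checked by simulation, without working out the arithmetic by hand. The sketch below assumes independent draws with P(faulty) = 0.05; the function name, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random

def simulate_exactly_k(k: int = 5, n: int = 100, p: float = 0.05,
                       trials: int = 20_000, seed: int = 0) -> float:
    """Estimate P(exactly k faulty in n draws) by repeating the
    n-draw experiment `trials` times with independent draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each draw is faulty with probability p, independently of the others.
        faulty = sum(rng.random() < p for _ in range(n))
        if faulty == k:
            hits += 1
    return hits / trials

estimate = simulate_exactly_k()
print(f"Estimated P(exactly 5 faulty out of 100) = {estimate:.3f}")
```

The estimate should land near the value obtained by multiplying the probability of one particular ordering by the number of orderings.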

13. Jul 30, 2011

### SW VandeCarr

The probability that exactly k=5 units are faulty can be approximated with a Poisson distribution. The approximation is reasonable when the number of units is large and the fault probability is small, as it is here (n=100, p=0.05); the exact value under independence comes from the binomial.

The probability mass function (pmf) is $f(k,\lambda)= \lambda^k e^{-\lambda}/k!$.

Lambda is the expectation (E=5); so f = (3125)(0.006738)/120 ≈ 0.1755

The cumulative probability of at most five errors involves summing over k=0,1,2,3,4,5, which turns out to be about 0.616.
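For comparison, the Poisson figures can be computed next to the binomial ones. This is only a sketch with illustrative function names, using λ = n·p = 5 and assuming independent disks for the binomial side.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf: lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial pmf for n independent items, each faulty with prob. p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.05
lam = n * p  # expected number of faulty disks

poisson_exact_5 = poisson_pmf(5, lam)
binomial_exact_5 = binomial_pmf(5, n, p)
poisson_at_most_5 = sum(poisson_pmf(k, lam) for k in range(6))
binomial_at_most_5 = sum(binomial_pmf(k, n, p) for k in range(6))

print(f"P(X = 5):  Poisson {poisson_exact_5:.4f}, binomial {binomial_exact_5:.4f}")
print(f"P(X <= 5): Poisson {poisson_at_most_5:.4f}, binomial {binomial_at_most_5:.4f}")
```

The two models agree to within about 0.005 for exactly five faulty disks and are nearly identical for the cumulative probability.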

Last edited: Jul 30, 2011

14. Jul 30, 2011

### thebiggerbang

Well, just a question. Will it depend on the number of disks that the manufacturer produces?

15. Jul 30, 2011

### alexfloo

Yes. We are assuming that they produced an infinite number of disks. Let's say they produce only 100, and 5% (5) are faulty. The probability that the first one is faulty is 5/100. If it is faulty, the probability that the second one is faulty is 4/99, and if it is not faulty, that probability is 5/99. The number of possibilities branches like that as you go, because each drawing affects the next one.

When your teacher or textbook tells you only that 5% are faulty, and then says that you "randomly" choose a certain number, though, you can assume that each choice is independent, even though in the real world that only happens if there are infinitely many (that is, it doesn't happen ever).
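The branching described above, drawing without replacement from a finite production run in which exactly 5% are faulty, is what the hypergeometric distribution counts. The sketch below is one way to look at how the population size enters; the function name and the population sizes are illustrative assumptions, not part of the original problem.

```python
from math import comb

def hypergeom_pmf(k: int, population: int, faulty_total: int, sample: int) -> float:
    """P(exactly k faulty in a sample drawn without replacement) when
    exactly `faulty_total` of the `population` disks are faulty."""
    good_total = population - faulty_total
    return (comb(faulty_total, k) * comb(good_total, sample - k)
            / comb(population, sample))

# Hypothetical production runs, each with exactly 5% faulty disks.
for population in (100, 1_000, 1_000_000):
    faulty_total = population // 20
    prob = hypergeom_pmf(5, population, faulty_total, 100)
    print(f"population {population:>9,}: P(5 faulty in 100) = {prob:.4f}")
```

With only 100 disks in total the sample is the whole run, so the probability is 1. As the run grows, the result approaches the independent-draws (binomial) value of about 0.18, which is why the independence assumption is harmless for a large production.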

16. Jul 30, 2011

### SW VandeCarr

I'm not sure I follow that. Clearly if you have only 100 and 5 are faulty, then your fail rate is 0.05. I'm not sure what information you're after if you sequentially remove one unit at a time without replacement. The order of the units in the batch is assumed to be random anyway. What's important is that 5% (on average) are faulty across batches of 100.

It is true that if the expected number of errors is larger (say over 10) you should use the binomial distribution where the variance depends, in part, on the batch size.

With many batches over time, the error rate may change so the overall average becomes less important. This is obviously particularly true if some change is made in the manufacturing process.

EDIT: By the way, I'm not making any assumptions about whether the batches are random samples, or we have 100% inspection. The same statistics apply without additional information on the manufacturing process.

Last edited: Jul 30, 2011

17. Jul 30, 2011

### alexfloo

You're correct, I didn't think that one all the way through. The total number does not change the probability that a certain number are in a sample of a certain size. It does, however, change the probability that a particular one in a (sans replacement) sequence of trials is faulty, if we condition on the previous trials.

18. Jul 31, 2011

### SW VandeCarr

Why would you condition on a previous trial unless you were given a reason to do so? The standard assumption is that the "trials" are independent, although this is not necessarily a sampling problem, as I indicated in my last post. If all units are evaluated over a sequence of batches, you would still expect the error rate to vary randomly around the mean. This expectation might not hold over time, however, which means that an evaluation for any trend over time might be indicated. However, this is still not a situation where the error rate is conditional on prior observations. Wear and tear on machinery, for example, might lead to a trend of increasing errors, but the observations are still independent. Under what circumstances would they not be?

19. Jul 31, 2011

### alexfloo

I think you're overthinking my comment. I was just explaining the logic behind my mistake, not suggesting that a series of dependent trials would in fact be useful. Apologies.