What is the probability of 5 faulty flash-disks out of 100?

In summary, the conversation discusses the probability of obtaining a specific number of faulty flash disks in a given sample. The group considers whether "5% faulty" means that exactly 5% of all disks are defective or that each disk is faulty with probability 0.05, and examines the role of the independence assumption in calculating the probability.
  • #1
I_am_learning
Suppose 5% of the flash-disks manufactured by a company are faulty.
If we take 100 flash-disks, how many are expected to be faulty?
It must be 5, right?
If we take 100 flash-disks, what is the probability that 5 of them are faulty?
This is what's bugging me.
We know 5% are most likely to be faulty, so in 100 flash-disks, 5 are most likely to be faulty. But how likely?
 
  • #2
Hi I_am_learning! :smile:

If I take an arbitrary flash-disk, what is the chance that it's faulty?
 
  • #3
This is confusing to me. If EXACTLY 5% of the flash-disks in the collection from which I take an arbitrary disk are faulty, then the answer would be 0.05.
But can we be sure that exactly 5% are faulty? What is the chance that exactly 5% are faulty?
 
  • #4
I_am_learning said:
This is confusing to me. If EXACTLY 5% of the flash-disks in the collection from which I take an arbitrary disk are faulty, then the answer would be 0.05.
But can we be sure that exactly 5% are faulty? What is the chance that exactly 5% are faulty?

Look at it this way. Say I give you 100 flash-disks, and I take out 1, what is the probability that it's faulty?

By the way, I think this question is actually very ill-presented. I know what they want you to do, but I think they could have done a better job in asking the question...
 
  • #5
I understand your point, micromass. You mean to say this:
1.* The probability of any one flash-disk being faulty is 0.05.
So, if I take 100 of them, the probability of 5 being faulty can be worked out.
I can do that.
But my confusion is in *. When they say 5% of flash-disks are faulty, do they mean that exactly 5% of their total production is faulty, or do they mean that the distribution of the number of faulty disks peaks at 5%?
 
  • #6
I_am_learning said:
I understand your point, micromass. You mean to say this:
1.* The probability of any one flash-disk being faulty is 0.05.
So, if I take 100 of them, the probability of 5 being faulty can be worked out.
I can do that.
But my confusion is in *. When they say 5% of flash-disks are faulty, do they mean that exactly 5% of their total production is faulty, or do they mean that the distribution of the number of faulty disks peaks at 5%?

The way I interpret it is that the chance that any one disk is faulty is 0.05. So, yes, I mean that the probability distribution peaks at 5.
If they mean that exactly 5% of their total production is faulty, then the question doesn't make much sense to me.

Again, I don't like this question. I'm pretty sure they mean for you to take a binomial distribution with n = 100 and p = 0.05 and calculate the probability that 5 out of 100 are faulty. But they should have phrased it better.
 
  • #7
So, in general (i.e. in books), when they say 5% of the items are faulty, they mean that the distribution of the number of faulty items peaks at 5% and follows a distribution (binomial?) such that the probability of any given item being faulty is 0.05.
Makes some sense.
Thanks.
 
  • #8
I_am_learning said:
Suppose 5% of the flash-disks manufactured by a company are faulty.
If we take 100 flash-disks, how many are expected to be faulty?
It must be 5, right?
If we take 100 flash-disks, what is the probability that 5 of them are faulty?
This is what's bugging me.
We know 5% are most likely to be faulty, so in 100 flash-disks, 5 are most likely to be faulty. But how likely?

I find this statement ambiguous.

The reason I say this is because if you do multiple trials (i.e. pick up multiple disks), your probability depends on what the results of your last tests were.

If each trial is independent of every other, you have a binomial distribution with p = 0.05 (for faulty disks). Based on this model, if you want to know the chance of five being faulty, you would calculate 100C5 (0.05)^5 x (0.95)^95.

The above model assumes that every disk failing is independent from every other disk failing. If this is not the case, then the calculations become more complex.
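A minimal Python sketch of this binomial calculation (it uses math.comb, available in Python 3.8+):

[code]
from math import comb

# Binomial model: each of the 100 disks is independently faulty with probability 0.05.
n, k, p = 100, 5, 0.05

# P(exactly 5 faulty) = C(100, 5) * 0.05^5 * 0.95^95
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(exactly {k} faulty out of {n}) = {prob:.4f}")  # roughly 0.18
[/code]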
 
  • #9
I_am_learning said:
So, in general (i.e. in books), when they say 5% of the items are faulty, they mean that the distribution of the number of faulty items peaks at 5% and follows a distribution (binomial?) such that the probability of any given item being faulty is 0.05.
Makes some sense.
Thanks.
Not exactly. What they mean is that the probability for any given item is 0.05, and that they're all independent. Independence is the unstated assumption. In defense of whoever wrote the problem, this assumption often is left unstated, although it probably ought not to be. Independence is assumed, not because it's likely to be true (it isn't), but because it's the simplest assumption that will allow you to calculate something useful.

There is no explicit assumption that the distribution peaks at 5 in 100, or that it is binomial. In fact, both those things are true in this case, but they are not assumed: they follow from the other information and the assumption of independence.
 
  • #10
I_am_learning said:
Suppose 5% of the flash-disks manufactured by a company are faulty.
If we take 100 flash-disks, how many are expected to be faulty?
It must be 5, right?
If we take 100 flash-disks, what is the probability that 5 of them are faulty?
This is what's bugging me.
We know 5% are most likely to be faulty, so in 100 flash-disks, 5 are most likely to be faulty. But how likely?

Out of the 100 that you have, 5 of them have to be faulty (as given)! So I guess the probability should be 1.
 
  • #11
thebiggerbang said:
Out of the 100 that you have, 5 of them have to be faulty (as given)! So I guess the probability should be 1.

That's not what they said! What they said is that 5% are faulty. So, for example, if the company produced 1,000,000 disks, then it is known that 50,000 disks are faulty. The question is: if we take 100 disks out of these 1,000,000, what is the probability that 5 are faulty? It's certainly not 1, since we could always happen to take 100 good disks from the 1,000,000...
 
  • #12
Here's how I think of it. Instead of selecting 100 disks, select one disk, and repeat it 100 times.

You know that each selection is independent (we're going to assume there's a MASSIVE number of disks to select from). You also know that the probability of getting a faulty one is .05, and of getting a good one is .95, right?

Now consider this: you're wondering what the probability is that, out of your 100 trials, 5 come out faulty and 95 come out good.

So now you have a probability assigned to each of your 100 trials. Let's presume that the faulty ones are the first ones you pull. Then you can use simple arithmetic to combine all the trials into one probability. (I won't tell you how, but I think you can figure it out)

Then you just consider the fact that it didn't *need* to be the first 5, right? So you can adjust your probability for the number of reorderings, using some simple counting principles.

When they said 5% are faulty, all they mean is that (like pmsrw3 said) if you select *one* disk, totally at random, P(faulty)=0.05, and P(good)=0.95. There are no distributions or random variables or any of that unless you choose to define them yourself.
 
  • #13
I_am_learning said:
Suppose 5% of the flash-disks manufactured by a company are faulty.
If we take 100 flash-disks, how many are expected to be faulty?
It must be 5, right?
If we take 100 flash-disks, what is the probability that 5 of them are faulty?

The probability that exactly k = 5 units are faulty can be approximated with a Poisson distribution. The Poisson is a good approximation to the binomial when the number of trials is large and the probability of a fault is small, so that the expected count is modest.

The probability mass function (pmf) is [itex] f(k,\lambda)= \lambda^k e^{-\lambda}/k![/itex].

Here lambda is the expected number of faulty disks (E = 5), so f = (3125)(0.00674)/120 ≈ 0.175.

The cumulative probability of at most five faulty disks is found by summing the pmf over k = 0, 1, 2, 3, 4, 5, which turns out to be about 0.616.
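A short Python sketch comparing this Poisson approximation with the exact binomial values (figures rounded):

[code]
from math import comb, exp, factorial

n, p = 100, 0.05
lam = n * p  # expected number of faulty disks (lambda = 5)

def binom_pmf(k):
    # Exact binomial probability of k faulty disks out of n
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    # Poisson approximation with mean lambda = n*p
    return lam**k * exp(-lam) / factorial(k)

print(f"binomial P(X = 5)  = {binom_pmf(5):.4f}")    # ~0.180
print(f"Poisson  P(X = 5)  = {poisson_pmf(5):.4f}")  # ~0.175
print(f"Poisson  P(X <= 5) = {sum(poisson_pmf(k) for k in range(6)):.3f}")  # ~0.616
[/code]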
 
  • #14
Well, just a question. Will it depend on the number of disks that the manufacturer produces?
 
  • #15
Yes. We are assuming that they produced an effectively infinite number of disks. Say they produce only 100, and 5% (i.e. 5) are faulty. The probability that the first one you draw is faulty is 5/100. If it is faulty, the probability that the second one is faulty is 4/99; if it is not, that probability is 5/99. The possibilities branch like that as you go, because each drawing affects the next one.

When your teacher or textbook tells you only that 5% are faulty, and then says that you "randomly" choose a certain number, though, you can assume that each choice is independent, even though, strictly speaking, that only holds if there are infinitely many disks (which never happens in the real world).
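A small Python sketch of this finite-batch effect, assuming for illustration a batch of N = 100 with exactly K = 5 faulty disks and a sample of n = 10 drawn without replacement; with a much larger batch the two models give nearly the same numbers:

[code]
from math import comb

# Hypothetical finite batch: N = 100 disks, exactly K = 5 faulty, sample of n = 10.
N, K, n = 100, 5, 10

def without_replacement_pmf(k):
    # Hypergeometric: choose k of the K faulty and n-k of the N-K good disks
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def independent_pmf(k, p=K / N):
    # Binomial model that treats every draw as an independent 5% chance
    return comb(n, k) * p**k * (1 - p)**(n - k)

for k in range(3):
    print(f"k = {k}: without replacement {without_replacement_pmf(k):.4f}, "
          f"independent model {independent_pmf(k):.4f}")
[/code]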
 
  • #16
alexfloo said:
Yes. We are assuming that they produced an effectively infinite number of disks. Say they produce only 100, and 5% (i.e. 5) are faulty. The probability that the first one you draw is faulty is 5/100. If it is faulty, the probability that the second one is faulty is 4/99; if it is not, that probability is 5/99. The possibilities branch like that as you go, because each drawing affects the next one.

When your teacher or textbook tells you only that 5% are faulty, and then says that you "randomly" choose a certain number, though, you can assume that each choice is independent, even though, strictly speaking, that only holds if there are infinitely many disks (which never happens in the real world).

I'm not sure I follow that. Clearly if you have only 100 and 5 are faulty, then your fail rate is 0.05. I'm not sure what information you're after if you sequentially remove one unit at a time without replacement. The order of the units in the batch is assumed to be random anyway. What's important is that 5% (on average) are faulty across batches of 100.

It is true that if the expected number of errors is larger (say over 10) you should use the binomial distribution, where the variance depends, in part, on the batch size.

With many batches over time, the error rate may change, so the overall average becomes less important. This is particularly true if some change is made in the manufacturing process.

EDIT: By the way, I'm not making any assumptions about whether the batches are random samples or whether we have 100% inspection. The same statistics apply without additional information on the manufacturing process.
 
  • #17
You're correct; I didn't think that one all the way through. The total number does not change the probability that a certain number are in a sample of a certain size. It does, however, change the probability that a particular one in a (sans replacement) sequence of trials is faulty, if we condition on the previous trials.
 
  • #18
alexfloo said:
It does, however, change the probability that a particular one in a (sans replacement) sequence of trials is faulty, if we condition on the previous trials.

Why would you condition on a previous trial unless you were given a reason to do so? The standard assumption is that the "trials" are independent, although this is not necessarily a sampling problem, as I indicated in my last post. If all units are evaluated over a sequence of batches, you would still expect the error rate to vary randomly around the mean. This expectation might not hold over time, however, which means that an evaluation for a trend over time might be indicated. Still, this is not a situation where the error rate is conditional on prior observations. Wear and tear on machinery, for example, might lead to a trend of increasing errors, but the observations are still independent. Under what circumstances would they not be?
 
  • #19
I think you're overthinking my comment. I was just explaining the logic behind my mistake, not suggesting that a series of dependent trials would in fact be useful. Apologies.
 

What is probability?

Probability is a mathematical concept that measures the likelihood of an event occurring. It is represented as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty.

What is the difference between theoretical and experimental probability?

Theoretical probability is based on mathematical calculations and predicts the likelihood of an event occurring, while experimental probability is based on actual observations and measures the frequency of an event occurring.

What is the Law of Large Numbers?

The Law of Large Numbers states that as the number of trials or experiments increases, the experimental probability will approach the theoretical probability.

What is the difference between independent and dependent events?

Independent events are events that do not affect each other's probability of occurring, while dependent events are events that do affect each other's probability of occurring.

How do you calculate the probability of multiple events occurring?

To calculate the probability that multiple independent events all occur, you multiply their individual probabilities together. This is known as the multiplication rule of probability. For dependent events, conditional probabilities must be used instead.
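A tiny Python illustration of the rule, assuming two independent draws with the thread's 5% fault probability:

[code]
# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
# Two disks drawn independently, each faulty with probability 0.05:
p_faulty = 0.05
p_both_faulty = p_faulty * p_faulty
print(p_both_faulty)  # 0.0025
[/code]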
