How large of a sample do I need?

  • #1
Xnn
Need assurance that all 185 devices in a population are OPERABLE.
A MODEL has been developed to predict each device's condition.
Periodically, it is necessary to TEST the devices in case the MODEL is wrong.
However, it is expensive to TEST these devices.
Therefore, it would make sense to periodically TEST only a subset of the population.

How should I select a subset to ensure reasonable confidence (95%) that all devices are OPERABLE?


My thoughts are that I could randomly select 20 of the devices to TEST.
I could then calculate the STDEV and CORREL between the MODEL and TEST results.

If the STDEV is sufficiently small and the CORREL is almost 1.000, then I should be able to use the MODEL to determine which devices to TEST. However, I'm not sure how to establish acceptable values.
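For concreteness, here is a minimal sketch of that comparison, with made-up numbers standing in for the 20 paired MODEL/TEST results (treating STDEV as the standard deviation of the prediction errors):

[code]
import numpy as np

# Hypothetical paired scores: model[i] is the MODEL's predicted
# condition for device i, test[i] the measured condition.
model = np.array([0.91, 0.84, 0.78, 0.95, 0.66])  # 20 values in practice
test = np.array([0.93, 0.80, 0.75, 0.97, 0.70])

residuals = test - model
stdev = residuals.std(ddof=1)            # sample std. dev. of the errors
correl = np.corrcoef(model, test)[0, 1]  # Pearson correlation

print(f"STDEV of residuals: {stdev:.4f}")
print(f"CORREL: {correl:.4f}")
[/code]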

Alternatively, how could the MODEL be modified or biased so that it could be used to determine the population for periodic testing?

Many thanks for all responses!
 
  • #2
Xnn said:
How should I select a subset to ensure reasonable confidence (95%) that all devices are OPERABLE?
You want to assure, at the p = 0.025 significance level, that there are no failures in the population, so you are looking for deviations from zero in random samples. However, you cannot do any statistical analysis unless you actually observe failures, which seems to contradict your goal of assuring there are no failures. To do that, you need to test every machine.

If you test 40 randomly selected machines and do not observe a failure, you could perhaps argue that the probability that the next machine fails is less than 1/40, but that doesn't tell you anything about the rest of the machines unless you already have a model.
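To put a number on a failure-free sample (this exact bound is my addition, not stated above): the one-sided upper confidence limit on the per-machine failure probability after zero failures in n tests is the largest p with [itex](1 - p)^n \geq \alpha[/itex].

[code]
# Upper 95% confidence bound on the failure probability after
# observing zero failures in n tests: p_upper = 1 - alpha**(1/n).
alpha = 0.05
n = 40
p_upper = 1 - alpha ** (1 / n)
print(f"p_upper = {p_upper:.4f}")  # ~0.072, near the rule-of-three 3/n = 0.075
[/code]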
 
  • #3
Thanks SW;

I do have a MODEL for predicting performance of each machine.
Not sure how good it is, but suspect that it is conservative.
That is to say, the MODEL seems to be overstating the deterioration of each machine.
So, perhaps instead of picking machines at random to test, I could pick the worst ones.

Say I test the 20 worst machines (as determined by the MODEL).
If they are all OPERABLE, could I somehow statistically reject the need to test any other machines, despite the MODEL predicting that they are likely to fail?

Or, could this just be telling me that the MODEL is useless for predicting failures?
If the MODEL is useless, then perhaps machines should be chosen at random for testing.

So, I'm still struggling to come up with a statistically valid approach for minimizing testing. It's obviously tied to the validity of the MODEL.
 
  • #4
Xnn said:
Or, could this just be telling me that the MODEL is useless for predicting failures?
If the MODEL is useless, then perhaps machines should be chosen at random for testing.

So, I'm still struggling to come up with a statistically valid approach for minimizing testing. It's obviously tied to the validity of the MODEL.

Statistically, the usual model for this kind of problem is the Poisson probability distribution, but I don't know how to apply it if you have no failures in your random samples. You stated a 95% confidence interval, which corresponds to a P = 0.025 significance level. This means you're willing to tolerate 185/40 = 4.625, or 4, failures among all your machines (rounding down assures you stay within your tolerance level).

However, if you can identify a subset of n machines that are more likely to fail, you simply test those machines and get a failure rate for them. This is not a random sample; it's simply a determination of the failure rate of that subset of n machines. With this subset removed, you can test a random sample of about 60 machines from the remaining population. If you get no failures, you may assume with 95% confidence that the failure probability is within your tolerance. If you do get failures, you can use this sample to estimate the fail rate among the remaining 185 - n machines, using the Poisson model to test your confidence limits. These can be obtained from online calculators for the Poisson distribution.
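If you'd rather script it than use an online calculator, here is a sketch of the exact Poisson upper confidence limit (the sample numbers below are hypothetical, not from this thread):

[code]
from scipy.stats import chi2

# Exact one-sided upper confidence limit for a Poisson count:
# lambda_upper = chi2.ppf(1 - alpha, 2*(k + 1)) / 2 for k observed failures.
alpha = 0.05
k, m = 2, 60  # assumed: k failures observed in a random sample of m machines
lam_upper = chi2.ppf(1 - alpha, 2 * (k + 1)) / 2
rate_upper = lam_upper / m  # upper bound on the per-machine failure rate
print(f"95% upper bound on failure rate: {rate_upper:.3f}")
print(f"implied bound on failures among 185 machines: {rate_upper * 185:.1f}")
[/code]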

However, with low fail rates I don't see this as a practical alternative to testing all N = 185 machines if you want to be sure of eliminating the bad ones.

EDIT: My previous calculation of 40 without failures in my first response was incorrect.
 
  • #5
Regarding my previous posts: your confidence limit is one-sided, since the lower limit is zero. Therefore your failure tolerance for 185 machines is 9.25 (9) given a 95% CI. This might be higher than you want. However, the calculation of a failure-free sample size of about 60 is correct based on [itex]\alpha = 0.05[/itex]. If you go to [itex]\alpha = 0.025[/itex], you will need close to 150 in your sample, which is so close to 185 that you really don't save much by not testing all machines.

The calculation is based on requiring [itex] (1 - \alpha)^{x} \leq \alpha[/itex], where x is the number to be sampled, without observing a failure, to assure the fail rate is within your tolerance.
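In code, that is a one-liner (reproducing the figures above):

[code]
import math

# Smallest x with (1 - alpha)**x <= alpha: the failure-free
# sample size needed to bound the fail rate at alpha.
def failure_free_sample_size(alpha: float) -> int:
    return math.ceil(math.log(alpha) / math.log(1 - alpha))

print(failure_free_sample_size(0.05))   # 59, i.e. "about 60"
print(failure_free_sample_size(0.025))  # 146, i.e. "close to 150"
[/code]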
 
  • #6
Thanks again SW.

A focus team meeting was held today regarding the testing plan.
Practicality is a major factor, so our team recommendation was to test 28 machines.
These are the easiest and cheapest machines to test, and the group also contains the 20 most likely to fail. At least it is a start.

Target date for testing is October 15th. It's going to take a lot of work to get ready. Also, it's not clear how long it will take us to analyze the data and then decide what to do next.

In addition, our team recommendation must pass an internal peer review, Corporate review and then finally a Management review. So, the plan could easily change.

What strikes me as so odd is that without much of any data, but knowing that a statistical approach is available, almost everyone becomes an optimist regarding the testing outcome.
 
  • #7
Xnn said:
What strikes me as so odd is that without much of any data, but knowing that a statistical approach is available, almost everyone becomes an optimist regarding the testing outcome.

OK. But the only way to assure there are no failures is to test every machine. By isolating the machines most likely to fail, you have a basis for deciding what to do next. The ideal is to be assured that none of the remaining machines will fail; my suggestion was conservative. There is also a Bayesian approach where you sequentially recalculate the probability after each test given no failures, conditional on the remaining number to be tested. Assuming no failures, this can reduce the sample size needed, but it's fairly technical. My recommendation, if you want to rely on a statistical approach, is for your company to hire an industrial statistician as a consultant if you don't have one in house. Any approach based on a less-than-total "sample" will always leave a nonzero probability of a failure among the untested machines.
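To give a flavor of the sequential idea (this particular Beta-prior formulation is my illustration, not a worked-out recommendation):

[code]
from scipy.stats import beta

# Put a Beta(a, b) prior on the per-machine failure probability and
# update after each failure-free test; stop once we are 95% sure the
# rate is below a tolerated level. Prior and tolerance are assumptions.
a, b = 1.0, 1.0  # uniform prior
p_tol = 0.05     # tolerated per-machine failure probability

for n_tested in range(1, 100):
    # After n_tested failure-free tests the posterior is Beta(a, b + n_tested).
    confidence = beta.cdf(p_tol, a, b + n_tested)  # P(rate <= p_tol | data)
    if confidence >= 0.95:
        print(f"stop after {n_tested} failure-free tests "
              f"(P(rate <= {p_tol}) = {confidence:.3f})")
        break
[/code]

Note this simple version does not yet condition on the finite population of 185 machines, which is part of what makes the full treatment technical.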
 

1. How do I determine the appropriate sample size for my study?

The appropriate sample size for a study depends on various factors such as the research question, the desired level of confidence, and the expected effect size. It is important to consult with a statistician or use a sample size calculator to determine the appropriate sample size for your study.
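For instance, a typical sample-size calculator for estimating a proportion uses the normal approximation [itex]n = z^2 p(1-p)/e^2[/itex]; a sketch follows (the defaults are the conventional worst-case choices, not universal values):

[code]
from math import ceil
from scipy.stats import norm

# Sample size to estimate a proportion p within margin e at a given
# confidence level, via n = z^2 * p * (1 - p) / e^2.
def sample_size_for_proportion(confidence=0.95, margin=0.05, p=0.5):
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion())  # 385, the familiar "about 400" figure
[/code]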

2. What is the minimum sample size needed for statistical significance?

The minimum sample size needed for statistical significance depends on the desired level of significance and the expected effect size. Generally, a larger sample size is needed for smaller effect sizes and higher levels of significance. It is recommended to consult with a statistician to determine the minimum sample size for your study.
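As one concrete version of that calculation, the usual normal-approximation power formula for comparing two means gives the sample size per group (again a sketch; the alpha and power values are conventional defaults):

[code]
from math import ceil
from scipy.stats import norm

# n per group to detect a difference delta between two means with
# standard deviation sigma:
# n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

print(n_per_group(delta=0.5, sigma=1.0))  # 63 per group for a "medium" effect
[/code]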

3. Can I use a smaller sample size to save time and resources?

Using a smaller sample size may lead to less reliable results and decrease the power of the study to detect significant effects. It is important to carefully consider the trade-off between time/resources and the validity of the study results before deciding on a sample size.

4. How does the sample size affect the generalizability of my results?

The sample size can affect the generalizability of the study results. A larger sample size allows for more diversity and representation of the population, increasing the generalizability of the results. However, a smaller sample size may still provide valid results if the sample is representative of the population of interest.

5. Is there a maximum sample size that I should not exceed?

There is no specific maximum sample size that should not be exceeded, but it is important to consider practical limitations such as time, resources, and feasibility. It is also important to ensure that the sample size is not too large compared to the population size, as this may lead to overgeneralization of the results.
