Sample Size calculation, defect analysis

cinger · Oct 5, 2011

I am testing a change made to a software application; a function had to be rewritten to perform in a completely different way. The specific part of this application iterates through input data sets and outputs the data to another application. In order to test that data was being sent correctly, 15,000 data sets were passed through. This is an time-expensive test, taking about 10 hrs of labor for processing, and 2-3hrs for data analysis. During this test, 1 defect was encountered where the specific function failed to send data. Subsequently, they reviewed the code and attributed the defect to a incorrectly double incrementing a counter where it should have only incremented once. The code was corrected, and another test of 15,000 was performed with 0 defects of this specific type seen. Other defects occurred, but they were completely outside the scope of the specific function in question. Once the software is complete and ships, it will be responsible for passing millions of transactions in this manner.
How would confidence levels be determined for the first and second tests, i.e., how confident can we be with 0 defects seen in the second test?
The first test showed 1 defect in 15000 samples, or a defect per million encounters of 67 defects per million; how confident is this defect per million estimate?
How would adequate sample size be determined?
Would p-charts (http://en.wikipedia.org/wiki/P-chart" ) be appropriately used here?
If p-charts are useful, I have attempted to use the adequate sample size calculation listed on the wiki page; I wasn't sure of the units but could post some of that as well if it is applicable here.

mXSCNT · Oct 5, 2011

It doesn't apply because you fixed the bug, so you should have no defects of that type in the future.

Also, your test data sets are not necessarily representative of the data sets that you will find in practice - a bug that happens once in 15000 runs might actually happen 1 in 100 runs for real data, or it might happen 1 in 2 million runs for real data. So you can't draw conclusions about real usage based on your test run.

cinger · Oct 5, 2011

The data from the tests had been selected to be as representative of actual field data as possible, with an increased percentage of input data that would catch errors associated with the type of bug encountered, due to the software's new functionality and software changes. This particular bug has been fixed, and performing the test again no defects were seen; but perhaps there are other bugs with smaller probability of error, it is very large software. May not have defects of this type attributed to this bug, but it is possible to have defects of this type attributed to perhaps different bug. Can any statistical information be derived from these tests? Can an adequate sample size be determined?

mXSCNT · Oct 5, 2011

The problem is still that your test data is not field data. You could use the student's t distribution to get a confidence interval for the bug rate... but it wouldn't be very meaningful. Especially since there may be many bugs that exist that you have not noticed.

You could just track the absolute number of unfixed functional bugs. Maybe divide it by the total lines of code or by the number of current developers to get a proportional figure.

Sample Size calculation, defect analysis

Graduate Expected numbers of cards of a last color remaining

Graduate Probability puzzle

Undergrad The problem of points

Undergrad The countability paradox of computable numbers

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Sample Size calculation, defect analysis

Similar threads