I am testing a change made to a software application; a function had to be rewritten to work in a completely different way. This part of the application iterates through input data sets and outputs the data to another application. To test that data was being sent correctly, 15,000 data sets were passed through. This is a time-expensive test, taking about 10 hours of labor for processing and 2-3 hours for data analysis. During this test, 1 defect was encountered where the function failed to send data. The code was then reviewed, and the defect was attributed to a counter being incremented twice where it should have been incremented only once. The code was corrected, and another test of 15,000 data sets was performed with 0 defects of this type seen. Other defects occurred, but they were completely outside the scope of the function in question. Once the software is complete and ships, it will be responsible for passing millions of transactions in this manner.

How would confidence levels be determined for the first and second tests, i.e., how confident can we be with 0 defects seen in the second test?
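To make the zero-defect question concrete, here is a minimal sketch of one common way to frame it. It assumes every data set is an independent trial with the same unknown defect probability (a binomial model), which may not hold for this application. Under that assumption, seeing 0 defects in n trials gives a one-sided upper confidence bound of 1 - (1 - confidence)^(1/n) on the defect rate. The function name and the 95% level are my own choices for illustration.

```python
def zero_defect_upper_bound(n, confidence=0.95):
    """One-sided upper bound on the defect probability after 0 defects in n trials.

    Assumes independent trials with a constant defect probability (binomial model).
    Solves (1 - p)**n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n = 15_000  # size of the second test
p_max = zero_defect_upper_bound(n)
print(f"95% upper bound: {p_max * 1e6:.1f} defects per million")

# The "rule of three" approximates the same 95% bound as 3 / n.
print(f"Rule of three:   {3 / n * 1e6:.1f} defects per million")
```

Read this way, a clean run of 15,000 does not show the defect rate is zero; it only bounds how large the rate could plausibly be.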

The first test showed 1 defect in 15,000 samples, or about 67 defects per million; how much confidence can be placed in this defects-per-million estimate?
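As a sketch of how uncertain a 1-in-15,000 estimate is, the following computes an exact (Clopper-Pearson style) one-sided upper bound on the defect probability using only the standard library. It assumes independent data sets with a constant defect probability; the helper names `binom_cdf` and `upper_bound` are hypothetical, and the 95% level is an arbitrary choice.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term (fine for small k)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k, n, confidence=0.95):
    """Largest defect probability still consistent with seeing only k defects in n trials.

    Finds p where P(X <= k) = 1 - confidence by bisection; the CDF decreases as p grows.
    """
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # mid is still plausible, so the bound lies higher
        else:
            hi = mid
    return hi


n = 15_000
point_estimate = 1 / n
bound = upper_bound(1, n)
print(f"Point estimate:  {point_estimate * 1e6:.1f} defects per million")
print(f"95% upper bound: {bound * 1e6:.1f} defects per million")
```

With a single observed defect, the upper bound comes out several times larger than the point estimate, so the 67 per million figure on its own is only a rough indication.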

How would adequate sample size be determined?
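One possible way to approach sample size, sketched under the same assumption of independent trials with a constant defect probability: pick the largest defect rate that would be acceptable and a confidence level, then find how many consecutive defect-free data sets are needed to demonstrate it. The targets used below (200 and 1 defect per million) are hypothetical and only there for illustration.

```python
import math


def required_samples(p_max, confidence=0.95):
    """Defect-free trials needed to show the defect probability is below p_max.

    Solves (1 - p_max)**n <= 1 - confidence for the smallest integer n.
    """
    alpha = 1.0 - confidence
    # log1p keeps precision when p_max is very small
    return math.ceil(math.log(alpha) / math.log1p(-p_max))


for dpm in (200, 1):  # hypothetical target rates in defects per million
    n = required_samples(dpm / 1e6)
    print(f"Target {dpm} per million: {n:,} defect-free data sets at 95% confidence")
```

This only covers a zero-defect acceptance test; allowing some defects in the sample would need a larger n and a different calculation.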

Would p-charts (http://en.wikipedia.org/wiki/P-chart) be appropriately used here?

If p-charts are useful, I have attempted the adequate sample size calculation listed on the wiki page; I wasn't sure of the units, but I could post some of that work as well if it is applicable here.

# Sample Size calculation, defect analysis

