# Calculating the confidence interval of your data

1. Sep 5, 2014

### tsaitea

Hello guys,

I would like to calculate a confidence interval for my data. In other words, I would like to know how much confidence we can have that the data is correct.

Could anyone direct me where to start?

Thank you.

2. Sep 5, 2014

### Simon Bridge

Googling "confidence interval" is a good place to start.
Note: you decide how much confidence you want, e.g. 95%. The confidence interval is the range within which you can have that confidence.

3. Sep 9, 2014

### tsaitea

That definitely helped me understand the confidence interval. I am struggling to put it into context now. Say for example I am loading 300 lines of data into a database. Now I want to figure out the # of errors that would occur during the load (data not loaded properly).

What I was thinking is maybe I would have to perform the load, and calculate the sample size I would need based on a 95% confidence interval and randomly check each line until I have checked up to the sample size. And based on the # of errors found, I could determine the confidence interval?

4. Sep 9, 2014

### DrDu

Let me try to restate your problem. Say you want to load millions of lines into a database and want to estimate what percentage of the lines is corrupt. You decide to check, say, N=250 lines at random. Let's assume that the number of corrupt lines follows a Poisson distribution and you find that n=25 lines (i.e. 10%) are corrupt, hence your estimate of the probability or fraction of corrupted lines is p=n/N=0.1. For a Poisson distribution the variance equals the mean, so it is estimated as 25, and the standard deviation is $\sigma=\sqrt{n}=5$. Using a normal approximation you can construct a 95% confidence interval
for the true percentage as $[ (n- z \sigma)/N, (n+z \sigma)/N]$ where for a 95% interval z=2 (1.96 to be exact).
So with 95% confidence the true fraction lies in the range [15/250, 35/250], i.e. between 6% and 14%.
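
For anyone who wants to plug in their own numbers, here is a rough Python sketch of the same arithmetic. The function name `poisson_normal_ci` is made up for illustration, and clamping the result to [0, 1] is an added assumption, not part of the formula above. It relies on the same Poisson and normal approximations, so it is only reasonable when the error count n is not too small.

```python
import math

def poisson_normal_ci(n_errors, n_checked, z=1.96):
    """Approximate confidence interval for the fraction of corrupt lines.

    Assumes the error count is Poisson distributed (variance = mean),
    so sigma = sqrt(n_errors), and uses a normal approximation.
    """
    sigma = math.sqrt(n_errors)
    lower = (n_errors - z * sigma) / n_checked
    upper = (n_errors + z * sigma) / n_checked
    # Clamp to [0, 1], since a fraction cannot fall outside that range
    # (an added assumption, not part of the formula in the post).
    return max(lower, 0.0), min(upper, 1.0)

# The example from the post: N = 250 lines checked, n = 25 corrupt, z rounded to 2.
low, high = poisson_normal_ci(25, 250, z=2.0)
print(f"95% CI for the corrupt fraction: {low:.2f} to {high:.2f}")
```

With z=2 this gives the same range as above, 0.06 to 0.14. Leaving `z` at its default of 1.96 gives a slightly narrower interval.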