Calculating the confidence interval of your data

tsaitea · Sep 5, 2014

Hello guys,

I would like to calculate the confidence interval of the data in which the data is correct. In otherwords I would like to know how much confidence we have that the data is correct.

Could anyone direct me where to start?

Thank you.

Simon Bridge · Sep 5, 2014

Google for "confidence integral" is a good place to start.
Note: you decide how much confidence you want i.e. 95% - the confidence interval the the range within which you can have that confidence.

tsaitea · Sep 9, 2014

Thank you for your reply Simon.

That definitely helped me understand the confidence interval. I am struggling to put it into context now. Say for example I am loading 300 lines of data into a database. Now I want to figure out the # of errors that would occur during the load (data not loaded properly).

What I was thinking is maybe I would have to perform the load, and calculate the sample size I would need based on a 95% confidence interval and randomly check each line until I have checked up to the sample size. And based on the # of errors found, I could determine the confidence interval?

DrDu · Sep 9, 2014

Let me try to repeat your problem. You maybe want to load millions of lines into a database and what to estimate what percentage of the lines is corrupt. You decide to check at random say N=250 lines. Let's assume that the distribution of corrupt lines follows a poisson distribution and you find that n=25 lines (i.e. 10% ) are corrupt, hence your estimate of the probability or fraction of corrupted lines is p=n/N=0.1. The variance is also 25 for a poisson distribution and the standard deviation ## \sigma=\sqrt{n}##. Using a normal approximation you can construct a 95% confidence interval
for the true percentage as ##[ (n- z \sigma)/N, (n+z \sigma)/N]## where for a 95% interval z=2 (1.96 to be exact).
So your true value is in the range [15/250, 35/250].

Calculating the confidence interval of your data

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad How do E[X] and E[|X|] relate?

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight