
Calculating the confidence interval of your data

  1. Sep 5, 2014 #1
    Hello guys,

    I would like to calculate a confidence interval for my data. In other words, I would like to know how much confidence we can have that the data is correct.

    Could anyone direct me where to start?

    Thank you.
     
  3. Sep 5, 2014 #2

    Simon Bridge

    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    Googling "confidence interval" is a good place to start.
    Note: you decide how much confidence you want, e.g. 95%; the confidence interval is the range within which you can have that confidence.
     
  4. Sep 9, 2014 #3
    Thank you for your reply Simon.

    That definitely helped me understand the confidence interval. I am struggling to put it into context now. Say for example I am loading 300 lines of data into a database. Now I want to figure out the # of errors that would occur during the load (data not loaded properly).

    What I was thinking is that I would perform the load, calculate the sample size I need for a 95% confidence level, and then randomly check lines until I have checked that many. Then, based on the # of errors found, could I determine the confidence interval?
     
  5. Sep 9, 2014 #4

    DrDu

    Science Advisor

    Let me try to restate your problem. You want to load perhaps millions of lines into a database and want to estimate what percentage of the lines is corrupt. You decide to check, say, N=250 lines at random. Let's assume that the number of corrupt lines follows a Poisson distribution and you find that n=25 lines (i.e. 10%) are corrupt, hence your estimate of the probability or fraction of corrupted lines is p=n/N=0.1. For a Poisson distribution the variance equals the mean, so it is also estimated as 25, and the standard deviation is ## \sigma=\sqrt{n}=5 ##. Using a normal approximation you can construct a 95% confidence interval
    for the true fraction as ## [(n - z\sigma)/N, (n + z\sigma)/N] ##, where for a 95% interval z=2 (1.96 to be exact).
    So with 95% confidence the true value is in the range [15/250, 35/250], i.e. roughly 6% to 14%.
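
    For anyone who wants to plug in their own counts, here is a minimal Python sketch of the calculation above. It assumes the same Poisson model and normal approximation; the function name and its parameters are made up for illustration, and the lower bound is clamped at zero as an extra safeguard for small counts.

```python
import math

def poisson_normal_interval(n_bad, n_checked, z=1.96):
    """Approximate confidence interval for the corrupt fraction.

    Assumes the count of corrupt lines is Poisson, so its standard
    deviation is sqrt(n_bad), and uses a normal approximation.
    """
    sigma = math.sqrt(n_bad)
    # Clamp at zero: a fraction cannot be negative (matters for small counts).
    lower = max(0.0, (n_bad - z * sigma) / n_checked)
    upper = (n_bad + z * sigma) / n_checked
    return lower, upper

# The numbers from the post: 25 corrupt lines out of 250 checked, z rounded to 2.
low, high = poisson_normal_interval(25, 250, z=2.0)
print(f"{low:.3f} to {high:.3f}")  # prints "0.060 to 0.140"
```

    With the exact z=1.96 the interval tightens slightly, to about 0.061 to 0.139.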
     