Calculating the confidence interval of your data

Discussion Overview

The discussion revolves around calculating the confidence interval for data accuracy, particularly in the context of loading data into a database and estimating the number of errors that may occur during this process. Participants explore the theoretical underpinnings and practical applications of confidence intervals.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant seeks guidance on how to calculate the confidence interval for their data, expressing a desire to understand the confidence in the correctness of the data.
  • Another participant suggests starting with a Google search for "confidence interval" and notes that the user chooses the desired confidence level, such as 95%.
  • A participant describes a scenario involving loading 300 lines of data into a database and proposes checking a sample size based on a 95% confidence interval to estimate the number of errors during the load.
  • Another participant interprets the scenario as estimating the percentage of corrupt lines in a larger dataset, suggesting a Poisson distribution model and providing a method to calculate a 95% confidence interval using a normal approximation.

Areas of Agreement / Disagreement

Participants present various approaches to calculating confidence intervals, but there is no consensus on a single method or interpretation of the problem. Multiple viewpoints and methods are discussed without resolution.

Contextual Notes

The discussion includes assumptions about the distribution of corrupt lines and the choice of sample size, which may affect the calculations and interpretations of confidence intervals. Specific mathematical steps and their implications remain unresolved.

tsaitea
Hello guys,

I would like to calculate a confidence interval for how correct my data is. In other words, I would like to know how much confidence we can have that the data is correct.

Could anyone direct me where to start?

Thank you.
 
Googling "confidence interval" is a good place to start.
Note: you decide how much confidence you want, e.g. 95%. The confidence interval is the range within which you can have that confidence.
 
Thank you for your reply Simon.

That definitely helped me understand the confidence interval. I am struggling to put it into context now. Say for example I am loading 300 lines of data into a database. Now I want to figure out the # of errors that would occur during the load (data not loaded properly).

What I was thinking is maybe I would have to perform the load, and calculate the sample size I would need based on a 95% confidence interval and randomly check each line until I have checked up to the sample size. And based on the # of errors found, I could determine the confidence interval?
 
Let me try to restate your problem. Maybe you want to load millions of lines into a database and want to estimate what percentage of the lines is corrupt. You decide to check, say, N=250 lines at random. Let's assume that the number of corrupt lines follows a Poisson distribution and you find that n=25 lines (i.e. 10%) are corrupt, hence your estimate of the probability or fraction of corrupted lines is p=n/N=0.1. For a Poisson distribution the variance equals the mean, so it is also estimated as 25, and the standard deviation is ## \sigma=\sqrt{n}=5 ##. Using a normal approximation you can construct a 95% confidence interval for the true fraction as ## [(n - z\sigma)/N, (n + z\sigma)/N] ##, where for a 95% interval z=2 (1.96 to be exact).
So with about 95% confidence the true value is in the range [15/250, 35/250], i.e. between 6% and 14%.
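
The calculation above can be sketched in a few lines of Python. This is only a minimal illustration of the normal approximation to the Poisson count described in this post; the function name `poisson_normal_ci` and the clamp of the lower bound at zero are assumptions added for the example, not part of the original reply.

```python
import math


def poisson_normal_ci(n_corrupt, n_checked, z=1.96):
    """Approximate confidence interval for the fraction of corrupt lines.

    Treats the corrupt count as Poisson, so its standard deviation is
    sqrt(n_corrupt), and applies a normal approximation with multiplier z.
    (Hypothetical helper name, used here for illustration only.)
    """
    sigma = math.sqrt(n_corrupt)  # Poisson: variance equals the count
    # Clamp at zero (an added assumption): a fraction cannot be negative.
    lower = max(0.0, (n_corrupt - z * sigma) / n_checked)
    upper = (n_corrupt + z * sigma) / n_checked
    return lower, upper


# Numbers from the post: 25 corrupt lines out of 250 checked, z rounded to 2.
low, high = poisson_normal_ci(25, 250, z=2.0)
print(f"{low:.3f} to {high:.3f}")  # prints "0.060 to 0.140"
```

With the exact z=1.96 the interval tightens slightly, to roughly [0.0608, 0.1392]. The approximation is only reasonable when the corrupt count is not too small; for a handful of errors an exact interval would be a safer choice.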
 