# Finding the confidence interval

• Pietair
In summary, a sample of 100 entries is checked from a database of 4000 entries, and 98% of the sampled entries are consistent with the source information. The question is what confidence level and interval can be attached to this 98% when it is used as an estimate for the entire database.

## Homework Statement

What formula do I need to find the confidence interval, when I have got:

- Number of samples
- Level of Confidence
- The assumed (1st guess) accuracy

## Homework Equations

I found the following equation online: µ = z * [p * (1 - p) / n] ^ (-1/2)

## The Attempt at a Solution

When I fill in this formula, I get µ = 125.5, while I think the confidence interval should be around 3 percent.
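A quick numerical check suggests the exponent is the likely culprit: the usual normal-approximation margin of error for a proportion is $z\sqrt{p(1-p)/n}$, with exponent $+1/2$ rather than $-1/2$. The sketch below assumes z = 1.96 (95% confidence), p = 0.975 and n = 100. These inputs are not stated in the post; they are guesses that happen to reproduce the reported 125.5.

```python
# Assumed inputs (not given in the post): 95% confidence, first-guess
# accuracy of 97.5%, and 100 samples.
z, p, n = 1.96, 0.975, 100

variance_of_proportion = p * (1 - p) / n

# Exponent -1/2, exactly as the formula is written in the post.
as_posted = z * variance_of_proportion ** (-1 / 2)

# Exponent +1/2, the usual normal-approximation margin of error.
margin_of_error = z * variance_of_proportion ** (1 / 2)

print(f"exponent -1/2: {as_posted:.1f}")
print(f"exponent +1/2: {margin_of_error:.4f}")
```

With these assumed inputs, the first value comes out near 125.5 and the second near 0.031, i.e. roughly 3 percentage points on either side of the estimate.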

If you want a 95% CI, then you want $P(-a < Z < a) = 0.95$, where

$$Z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}.$$

So $\bar{x} \pm a \frac{\sigma}{\sqrt{n}}$ will be a 95% CI for μ.
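As a minimal sketch of this recipe, the critical value $a$ can be read off the inverse CDF of the standard normal distribution. The function name `z_interval` and the sample numbers are hypothetical, and σ is assumed to be known.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided z-interval for the mean, assuming sigma is known."""
    # a satisfies P(-a < Z < a) = confidence, so it is the
    # (1 + confidence) / 2 quantile of the standard normal.
    a = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = a * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical numbers, for illustration only.
low, high = z_interval(x_bar=50.0, sigma=10.0, n=100)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

For a 95% level the critical value is approximately 1.96, so with these made-up numbers the half-width is about 1.96.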

Pietair said:

I found the following equation online: µ = z * [p * (1 - p) / n] ^ (-1/2)

This may or may not be relevant to your problem. This formula looks vaguely related to a binomial distribution. You haven't said what the distribution is, so it's hard to say if this is something you need to use.

Pietair said:

When I fill in this formula, I get µ = 125.5, while I think the confidence interval should be around 3 percent.

Again, you have not provided enough information for me to tell whether this is a reasonable value for µ. What you said about the confidence interval makes no sense at all. A confidence interval is an interval, with a left endpoint and a right endpoint. It is not given as a percentage.

Here is all the information I have about this practical situation:

Information written down on a form will be put in a database. The information in the database can be correct (it matches the information written on the form) or incorrect (it does not match the information written on the form). A mismatch occurs when the database administrator enters the wrong information (for example: putting "b" in the database when "a" is written on the form).

Now I would like to take a sample to judge whether the data found in the database is reliable (i.e. consistent with the source information) or not. The database contains a total of 4000 entries. I would like to take a sample because it is quite time consuming to check whether all 4000 entries are correct. With this sample I would like to state something about the reliability of the entire database (4000 entries).

So, suppose I have 100 entries checked, and 2 of them do not match. Then I find that 98% of the database entries in the sample are consistent with the source information. But what can I say about the confidence level and interval of this 98% for the entire database (4000 entries)?

Has anyone got an idea regarding this practical situation?
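One common way to approach this, sketched under stated assumptions: treat the 100 checked entries as a simple random sample drawn without replacement from the 4000, estimate the proportion of correct entries, and attach a normal-approximation interval with a finite population correction. Because 98% is close to 100%, the normal approximation is rough, so the sketch also computes a Wilson score interval (which ignores the finite population) for comparison. The function names are hypothetical and the 95% level is an assumption.

```python
import math
from statistics import NormalDist


def wald_interval_fpc(successes, n, population, confidence=0.95):
    """Normal-approximation interval with finite population correction."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    # Sampling without replacement from a finite population shrinks the error.
    fpc = math.sqrt((population - n) / (population - 1))
    half_width = z * standard_error * fpc
    # Clip to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval; behaves better when p_hat is near 0 or 1."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    p_hat = successes / n
    denominator = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denominator
    )
    return centre - half_width, centre + half_width


# Figures from the post: 98 of 100 sampled entries match, 4000 entries in total.
wald_low, wald_high = wald_interval_fpc(98, 100, 4000)
wilson_low, wilson_high = wilson_interval(98, 100)
print(f"Wald + FPC: ({wald_low:.3f}, {wald_high:.3f})")
print(f"Wilson:     ({wilson_low:.3f}, {wilson_high:.3f})")
```

With 98 of 100 entries matching, both intervals put the lower bound for the share of correct entries in the low-to-mid 90% range. The unclipped normal-approximation interval would extend past 100%, which is a sign that a larger sample or the Wilson interval is the safer choice here.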

## What is a confidence interval?

A confidence interval is a range of values that is likely to contain the true value of a population parameter with a certain level of confidence. It is typically calculated from a sample of data and is used to estimate the range of values that the population parameter may fall within.

## Why is it important to calculate a confidence interval?

Calculating a confidence interval quantifies the uncertainty of an estimate of a population parameter. It takes into account the variability in the data and the sample size, and provides a range of values instead of a single point estimate. This helps to avoid drawing incorrect conclusions from a single estimate.

## How is a confidence interval calculated?

A confidence interval is calculated using a specific formula that takes into account the sample size, the standard deviation of the data, and the desired level of confidence. This formula may vary depending on the type of data and the population parameter being estimated.

## What is the significance level in a confidence interval?

The significance level, also known as alpha (α), is the complement of the confidence level: a 95% confidence interval corresponds to α = 0.05. In hypothesis testing, α is the probability of making a Type I error, that is, of incorrectly rejecting the null hypothesis when it is actually true. It is typically set at 0.05 or 0.01, but can vary depending on the desired level of certainty in the results.

## How can I interpret a confidence interval?

A confidence interval can be interpreted as a range of plausible values for the true value of the population parameter. The wider the interval, the less precise the estimate is. A higher confidence level also results in a wider interval. A 95% confidence level means that, over repeated sampling, about 95% of intervals constructed this way would contain the true value; any single calculated interval may or may not contain it.
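The effect of the confidence level and the sample size on the width can be illustrated with a small sketch. It uses the normal-approximation half-width for a proportion; the helper name `half_width` and the values p = 0.98, n = 100 and n = 400 are chosen for illustration only.

```python
import math
from statistics import NormalDist


def half_width(p, n, confidence):
    """Normal-approximation half-width of a CI for a proportion."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative values: an observed proportion of 0.98 at two sample sizes.
for confidence in (0.90, 0.95, 0.99):
    for n in (100, 400):
        width = half_width(0.98, n, confidence)
        print(f"confidence={confidence:.2f}, n={n}: +/- {width:.4f}")
```

Raising the confidence level widens the interval, while quadrupling the sample size halves it.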