# Correct understanding of confidence intervals...

#### fog37

Hello,
I am attempting to correctly interpret what a confidence interval means.

This is what I know: a confidence interval is a continuous interval of values with a lower bound and an upper bound centered around a sample mean. For example, given a certain population, we are interested in the true population mean and the 95% confidence interval CI:
• We can extract from the population N equal samples (all samples having identical size $n$). Assume we pick N=100 samples.
• Each sample will have its own sample mean and its own sample standard deviation $s$.
• Each sample will also generate its own confidence interval centered at its own sample mean. The CI limits of each sample depend on the standard deviation $s$ and the $z$ score we choose (the $z$ score value determines whether we talk about a 95% or 99% or 100% confidence interval). We pick $z=1.96$.
• We end up with 100 samples and 100 confidence intervals. 95 among those 100 confidence intervals will contain the true population mean and 5 confidence intervals will surely not.
• The best estimate of the population mean is the average of the sample means. And as far as confidence interval...which confidence interval do we pick among the 95 CI that we are sure all contain the true population mean?
Thanks!


#### mathman

Why not use the standard deviation of the 100n samples?

#### fog37

Apparently that is not how the CI limits are calculated. See https://www.mathsisfun.com/data/confidence-interval.html

They show that the interval is centered at the sample mean and the limits are calculated using the sample size $n$, the $Z$ score, and the sample standard deviation $s$: $$\pm Z \frac{s}{\sqrt{n}}$$
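As a minimal sketch of that formula, the snippet below computes the interval $\bar{x} \pm Z \frac{s}{\sqrt{n}}$ for one sample. The data are simulated and purely hypothetical (population mean 50, standard deviation 10, sample size 40), and the helper name `z_interval` is made up for illustration.

```python
import math
import random
import statistics

def z_interval(sample, z=1.96):
    """Return (lower, upper) for mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)            # sample standard deviation (n - 1 denominator)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width

rng = random.Random(0)                      # fixed seed so the run is repeatable
sample = [rng.gauss(50.0, 10.0) for _ in range(40)]   # hypothetical data
low, high = z_interval(sample)
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

The interval is always centered on the sample mean, so a different sample gives a different interval.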

#### Dale

Mentor
The best estimate of the population mean is the average of the sample means. And as far as confidence interval...which confidence interval do we pick among the 95 CI that we are sure all contain the true population mean?
I am not sure what you are asking here. There is no such thing as a population confidence interval. So what are you trying to estimate here?

#### fog37

If there are 100 samples, each sample generates its own 95% CI with its own limits. 95 of these CI contain the true population value for sure and 5 do not.
Which, among the 95 intervals, should we consider? As far as a point estimate, we take the average of the sample averages. But as far as these many, 95, intervals, which one do we choose to represent the interval estimate?

#### WWGD

Gold Member
Overall, an x% confidence interval is interpreted as saying that, in the long run, x% of the intervals computed from samples collected in the same way will contain the true population parameter (mean, variance, etc.)
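This long-run reading can be checked with a small simulation. The sketch below assumes a normal population with hypothetical parameters (mean 10, standard deviation 2) and counts how often the interval $\bar{x} \pm 1.96\, s/\sqrt{n}$ contains the true mean; the function name `coverage` and all numbers are illustrative choices.

```python
import math
import random
import statistics

def coverage(true_mean=10.0, true_sd=2.0, n=50, trials=2000, z=1.96, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.fmean(sample)
        half = z * statistics.stdev(sample) / math.sqrt(n)
        if m - half <= true_mean <= m + half:
            hits += 1
    return hits / trials

rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The fraction comes out close to 0.95 but not exactly 0.95, and in any particular batch of 100 intervals the count that contain the true mean can be 93 or 97 rather than exactly 95. Nothing in the data tells you which intervals are the ones that missed.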

#### Dale

Mentor
If there are 100 samples, each sample generates its own 95% CI with its own limits. 95 of these CI contain the true population value for sure and 5 do not.
Which, among the 95 intervals, should we consider? As far as a point estimate, we take the average of the sample averages. But as far as these many, 95, intervals, which one do we choose to represent the interval estimate?
None of the individual CI will be as good as a single CI formed with all 100*n samples. Calculating the mean of the means is the same as calculating the mean of the overall data set of 100*n samples (provided each sample is the same size). The CI is not so simple and you need to actually calculate it on the whole data set.
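A quick numerical sketch of both points, using simulated data with hypothetical parameters (100 samples of size 30 from a normal population): the mean of the sample means equals the pooled mean, and the interval computed from the pooled data is much narrower than one computed from a single sample.

```python
import math
import random
import statistics

rng = random.Random(2)
n, num_samples = 30, 100
samples = [[rng.gauss(5.0, 1.5) for _ in range(n)] for _ in range(num_samples)]

# Mean of the 100 sample means vs. mean of all 100*n values.
mean_of_means = statistics.fmean([statistics.fmean(s) for s in samples])
pooled = [x for s in samples for x in s]
pooled_mean = statistics.fmean(pooled)

def half_width(data, z=1.96):
    """Half-width z * s / sqrt(n) of the interval for this data set."""
    return z * statistics.stdev(data) / math.sqrt(len(data))

single_hw = half_width(samples[0])   # interval from one sample of size n
pooled_hw = half_width(pooled)       # interval from all 100*n values

print(f"mean of means = {mean_of_means:.4f}, pooled mean = {pooled_mean:.4f}")
print(f"half-width: single sample = {single_hw:.3f}, pooled = {pooled_hw:.3f}")
```

The equality of the two means relies on every sample having the same size, as noted above.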

#### fog37

Ok, if we know the entire population, as one huge sample containing all the items, we could calculate its true mean, variance, standard deviation, and confidence interval CI.

But we often need to do sampling statistics and work with a finite number of samples of finite size. And we get a confidence interval from each sample. I guess any confidence interval is good as long as the confidence interval contains the true population parameter...

#### stevendaryl

Staff Emeritus
I just want to point out something that is well-known, but is often glossed over. A confidence interval of 95% doesn't really mean that there is a 5% chance that the true number (the true mean, for instance) is outside the interval. If you don't know the actual distribution, then you don't have any idea whether your sample accurately reflects that distribution.

Suppose that you're measuring the heights of American males. For your particular sample, you find the mean, say $1.8$ meters, and the standard deviation, say $0.2$ meters. What you would like to know is: What's the probability that the true mean (if we checked every American male) is between $1.75$ and $1.85$? You don't know. You have no way of knowing in a non-subjective way.

Let's define some variables:
• $M$ = the actual (unknown) mean among all American males.
• $\sigma$ = the actual standard deviation
• $M_s$ = the mean for our sample
• $\sigma_s$ = the standard deviation for our sample
What you would like to be able to compute is:

$P(M, \sigma | M_s, \sigma_s)$

the probability that the actual mean is $M$ and the actual standard deviation is $\sigma$ given that our sample mean is $M_s$ and the sample standard deviation is $\sigma_s$. You'd like to be able to say:

Claim 1: The probability that $M < 1.75$ is less than 5%.

But you can't compute that. What you can compute is the reverse: $P(M_s, \sigma_s | M, \sigma)$

This is the probability of getting a sample mean $M_s$ and sample standard deviation $\sigma_s$ under the assumption that the true mean and standard deviation are $M, \sigma$. So you can say:

Claim 2: If the true mean were 1.75 or less, and the true standard deviation were 0.2, then the probability that our sample mean would be 1.8 or more is less than 5%.

That's a different statement. People very often sloppily act as if confidence intervals tell you something like Claim 1, when they actually tell you something like Claim 2.
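To make Claim 2 concrete, here is a rough sketch of the probability it refers to. It assumes the sample mean is normally distributed with standard error $\sigma/\sqrt{n}$; the sample size $n$ is not given in the post, so the two values tried below (25 and 100) are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def tail_probability(observed_mean, true_mean, true_sd, n):
    """P(sample mean >= observed_mean) if the population has the given mean and sd."""
    standard_error = true_sd / math.sqrt(n)
    return 1.0 - NormalDist(true_mean, standard_error).cdf(observed_mean)

# Hypothetical sample sizes; true mean 1.75 and sd 0.2 are taken from Claim 2.
p_small = tail_probability(1.8, 1.75, 0.2, n=25)
p_large = tail_probability(1.8, 1.75, 0.2, n=100)
print(f"n=25:  {p_small:.4f}")
print(f"n=100: {p_large:.4f}")
```

With these assumptions the probability is roughly 0.106 for $n=25$ and roughly 0.006 for $n=100$, so whether the "less than 5%" threshold is met depends on the sample size. Either way, the number is a statement about the sample given the parameters, not about the parameters given the sample.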

#### Dale

Mentor
But you can't compute that
Well, you can with Bayesian statistics, but I assume you know that and were just making a point about misunderstanding of frequentist statistics.

#### Stephen Tashi

• Each sample will also generate its own confidence interval center at its own sample mean.
That's false unless we use an ambiguous definition of "confidence interval". For example, if our sampling scheme has a 95% confidence interval of $\pm 5.3$ for the population mean then there is a .95 probability that the randomly selected sample mean will lie within $\pm 5.3$ of the unknown population mean. But if a particular sample has a sample mean of 47, this does not imply that there is a .95 probability that the unknown population mean is within $\pm 5.3$ of 47.

Calling $( 47 - 5.3, 47 + 5.3)$ a "confidence interval" is technically incorrect, although people unfamiliar with mathematical statistics call it such. The "frequentist" analysis of data regards the population mean as having a fixed but unknown value. This is different from modeling the population mean as a random variable. From the point of view that the population mean has a fixed but unknown value, it is logically inconsistent to assign a probability for the population mean to be in an interval with specific numerical endpoints.

If you want to assign a probability that the population mean is in a specific interval, you need to formulate the problem in a Bayesian way and model the population mean as a random variable. Then specific Bayesian "credible intervals" can be computed from sample data.
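One simple way to see what such a computation looks like is the textbook case of a normal prior on the population mean with a known population standard deviation. Everything numeric below except the sample mean of 47 is a hypothetical choice (prior mean 40, prior standard deviation 10, known standard deviation 15, sample size 30), and `normal_posterior` is an illustrative helper, not a library function.

```python
import math
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, sample_mean, known_sd, n):
    """Posterior for the population mean: normal prior, normal data, known sd."""
    prior_precision = 1 / prior_sd**2
    data_precision = n / known_sd**2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    return NormalDist(post_mean, math.sqrt(post_var))

# Hypothetical prior and data summary.
post = normal_posterior(prior_mean=40.0, prior_sd=10.0,
                        sample_mean=47.0, known_sd=15.0, n=30)
low, high = post.inv_cdf(0.025), post.inv_cdf(0.975)
print(f"posterior mean {post.mean:.2f}, 95% credible interval ({low:.2f}, {high:.2f})")
```

Under this model, and only given the chosen prior, the statement "the population mean lies in this interval with probability 0.95" is meaningful. A different prior gives a different interval.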

• The best estimate of the population mean is the average of the sample means.

If you want to get the concepts of statistics straight in your mind, you must be clear what you mean by "best". Study the different concepts of "unbiased estimators", "minimum variance estimators", "maximum likelihood estimators".

For example, suppose we know a random variable has equal probability of taking on each of the values $x, x+1, x+3$ and we take 3 independent samples. If the samples we observe are {8,8,10}, is it "best" to estimate the population mean as (8+8+10)/3 or is it "best" to estimate it as (7+8+10)/3 ?

(An estimator need not be defined by a simple algebraic expression. It can be defined by a complicated algorithm that employs various if-then branches.)
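For the $x, x+1, x+3$ example, an estimator with branches might look like the sketch below. It assumes the observations come exactly from that model, so the gap between the smallest and largest observed values identifies $x$ whenever at least two distinct values are seen. The function name and the fallback rule for the all-equal case are illustrative choices, not a claim about which estimator is "best".

```python
def estimate_mean(observations):
    """Estimate the mean of a variable that is uniform on {x, x+1, x+3}."""
    lo, hi = min(observations), max(observations)
    gap = hi - lo
    if gap in (1, 3):      # lo can only be x (pairs x, x+1 or x, x+3)
        x = lo
    elif gap == 2:         # lo can only be x+1 and hi can only be x+3
        x = lo - 1
    else:                  # all observations equal: x is not identified,
        return sum(observations) / len(observations)   # fall back to the sample mean
    return x + 4 / 3       # population mean is (x + (x+1) + (x+3)) / 3

print(estimate_mean([8, 8, 10]))   # same value as (7+8+10)/3
print(sum([8, 8, 10]) / 3)         # plain sample mean, (8+8+10)/3
```

For the sample {8,8,10} the gap of 2 forces $x = 7$, so the branch-based estimate recovers the population mean exactly, while the plain sample mean does not.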

#### FactChecker

Gold Member
2018 Award
That's false unless we use an ambiguous definition of "confidence interval".
The definition does leave us with a difficult interpretation. https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation
For example, if our sampling scheme has a 95% confidence interval of $\pm 5.3$ for the population mean then there is a .95 probability that the randomly selected sample mean will lie within $\pm 5.3$ of the unknown population mean. But if a particular sample has a sample mean of 47, this does not imply that there is a .95 probability that the unknown population mean is within $\pm 5.3$ of 47.

Calling $( 47 - 5.3, 47 + 5.3)$ a "confidence interval" is technically incorrect, although people unfamiliar with mathematical statistics call it such.
This is how "confidence interval" is usually defined. That leaves us with the common misconception that we can interpret it in a simple probability manner.


#### stevendaryl

Staff Emeritus
The definition does leave us with a difficult interpretation. https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation
This is how "confidence interval" is usually defined. That leaves us with the common misconception that we can interpret it in a simple probability manner.
As I have said about it, what you really want to know is "There is a 95% chance that the true mean is within $\Delta x$ of the sample mean". But there is no way to know that without making subjective assumptions (Bayesian reasoning does that, as @Dale points out). So what people instead calculate is something that is kind of complicated and whose significance is questionable, the confidence interval.

The alternative quantity described in the Wikipedia article would allow you to say something backwards like "There is a 95% chance that a randomly collected sample mean is within $\Delta x$ of the true mean". That sounds similar, but isn't exactly the same. Of course, you can't actually calculate that, either, unless you make assumptions about the true standard deviation. (The Wikipedia article says that using the Student's t-distribution, you can avoid making assumptions about the true standard deviation, but I don't understand how that works.)
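For reference, a sketch of the standard argument behind the Student's t remark, under the added assumption (not stated in the thread) that the $n$ observations are independent draws from a normal distribution, and with $\sigma_s$ computed using the $n-1$ denominator:

$$T = \frac{M_s - M}{\sigma_s / \sqrt{n}} \sim t_{n-1}$$

The distribution of $T$ does not involve $\sigma$, so if $t^*$ is the 97.5th percentile of $t_{n-1}$, then

$$P\left(M_s - t^* \frac{\sigma_s}{\sqrt{n}} \le M \le M_s + t^* \frac{\sigma_s}{\sqrt{n}}\right) = 0.95$$

for every value of $M$ and $\sigma$. The probability here refers to the random sample (through $M_s$ and $\sigma_s$), not to $M$.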

#### FactChecker

Gold Member
2018 Award
As I have said about it, what you really want to know is "There is a 95% chance that the true mean is within $\Delta x$ of the sample mean".
Yes, that is what people want. But that is not what the confidence interval really does. The originator of the method, Neyman, knew that and warned about it. A proper interpretation is "If the true parameter is outside of this interval, the odds of getting a sample like this is less than xxx." The OP asks about the confidence interval and the interpretation of it. He seems to recognize that there are some issues, and he is correct. It is good to address this issue because it comes up in hypothesis testing in general.
But there is no way to know that without making subjective assumptions (Bayesian reasoning does that, as @Dale points out).
At least the Bayesian approach identifies the issue in concrete terms, but it opens up an entire set of questions and issues.


#### stevendaryl

Staff Emeritus
At least the Bayesian approach identifies the issue in concrete terms, but it opens up an entire set of questions and issues.
Sure. It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis. For one thing, you couldn't just publish experimentally derived numbers, you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors. It would be a mess.

On the other hand, "confidence intervals" don't actually give any confidence at all, if you understand what they mean, unless you supplement them with subjective judgments. If someone proves that "The probability of gathering this sample data by chance if the tooth fairy doesn't exist is less than 5%", that doesn't actually tell us anything about the likelihood that the tooth fairy exists.

#### atyy

Sure. It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis. For one thing, you couldn't just publish experimentally derived numbers, you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors. It would be a mess.
https://arxiv.org/abs/astro-ph/9812133
This is an example of a paper that uses Bayesian analysis (in addition to Frequentist statistics). They describe their priors.

#### FactChecker

Gold Member
2018 Award
On the other hand, "confidence intervals" don't actually give any confidence at all, if you understand what they mean, unless you supplement them with subjective judgments.
I tend to disagree, although the "confidence" does not have an authoritative numerical value. If the sample results make one very skeptical of a certain parameter range, that is confidence. IMHO, it is better to leave it there than to think that the Bayesian results are any more authoritative. They formalize the issue, but do not really put the results on any more firm ground -- so it can be deceptive.

#### Dale

Mentor
It would be very difficult and confusing to try to rewrite statistical results in a way that uses Bayesian analysis.
That is true, but only because we have been doing it a different way for so long. If we had more institutional experience with Bayesian methods then it would not be any more difficult and confusing. Particularly since the Bayesian quantities are typically those that are actually of interest and more aligned with how people think about their data. E.g. the Bayesian “95% credible interval” for a parameter is an interval that has a 95% probability that it contains the parameter. That is directly what people wish a confidence interval told them.

you would have to also publish your prior probabilities, and it would be difficult to establish what are sensible priors, and it would be difficult for researchers to use the results of researchers that used different priors.
On the contrary, with Bayesian methods you can directly use another researcher's data in your own analysis with your own priors and together with your new data. You cannot do that using standard methods because of the multiple comparisons issue.

In my mind, the reuse of data and naturalness of the results are the two main reasons to switch to Bayesian analyses. In particular, I would think that the data reuse is something that scientists using public funds should feel ethically obligated to do.

If your fellow-citizens have paid millions of dollars to purchase your data and if you can make it “use once and dispose” or “permanent” simply by choice of analysis methods, then how can anyone justify the “disposable data” approach? Unfamiliarity with the alternative seems a poor excuse for not making the best use of the public stewardship.

IMHO, it is better to leave it there than to think that the Bayesian results are any more authoritative. They formalize the issue, but do not really put the results on any more firm ground -- so it can be deceptive.
I don’t think it is about “authoritative” or not. I think it is about naturalness. You have to tie yourself in mental knots to understand what a frequentist result means and how to interpret it in the context of your study. Yes, the Bayesian tools are a little more complicated, but in the end they get you where you wanted to go.

What researcher actually cares about the probability of their data given the null hypothesis? They are interested in their hypothesis, and yet have to test the null hypothesis simply because of the statistical tools.

https://arxiv.org/abs/astro-ph/9812133
This is an example of a paper that uses Bayesian analysis (in addition to Frequentist statistics). They describe their priors.
That is my preferred approach. Use both sets of tools. Swap them as appropriate for your use case.


#### FactChecker

Gold Member
2018 Award
I don’t think it is about “authoritative” or not. I think it is about naturalness. You have to tie yourself in mental knots to understand what a frequentist result means and how to interpret it in the context of your study. Yes, the Bayesian tools are a little more complicated, but in the end they get you where you wanted to go.
I think it just hides the subjective prior beneath a thin veneer of another layer of math. And it creates a great many questions, such as, how much does it take to overcome an erroneous prior to any given accuracy.

#### stevendaryl

Staff Emeritus
On the contrary, with Bayesian methods you can directly use another researcher's data in your own analysis with your own priors and together with your new data.
Maybe you could (in an Insight article, or a regular article) explain how to combine Bayesian results from different researchers using different priors?

The basic formula used by Bayesian statistics is: (where $\lambda$ is the parameter that you're interested in, and $D$ is the data that is supposed to shed light on the value of $\lambda$)

$P(\lambda | D) = \frac{P(D | \lambda) P(\lambda)}{P(D)} = \frac{P(D|\lambda) P(\lambda)}{\sum_{\lambda'} P(D|\lambda') P(\lambda')}$

where $P(\lambda)$ is the prior probability for $\lambda$.

If someone wants to use a different prior, then it seems to me that that means throwing out the actual value for $P(\lambda|D)$, because that is sensitive to the choice of priors. You could still reuse the value of $P(D | \lambda)$, because that is theory-dependent but not dependent on the prior.
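A small sketch of that point on a discrete grid: the likelihood $P(D|\lambda)$ is computed once and then combined with two different priors. The data (7 heads in 10 coin flips, with $\lambda$ the probability of heads), the grid, and both priors are hypothetical choices made only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a discrete grid; both arguments are dicts keyed by lambda."""
    unnormalized = {lam: likelihood[lam] * prior[lam] for lam in prior}
    evidence = sum(unnormalized.values())          # P(D), the sum in the denominator
    return {lam: w / evidence for lam, w in unnormalized.items()}

grid = [i / 10 for i in range(1, 10)]              # candidate values of lambda
heads, flips = 7, 10                               # hypothetical data D

# P(D | lambda): depends on the model, not on anyone's prior.
likelihood = {lam: lam**heads * (1 - lam)**(flips - heads) for lam in grid}

# Researcher A: flat prior. Researcher B: prior that favors lambda = 0.5.
flat_prior = {lam: 1 / len(grid) for lam in grid}
weights = {lam: 1 - abs(lam - 0.5) for lam in grid}
total = sum(weights.values())
skeptical_prior = {lam: w / total for lam, w in weights.items()}

post_flat = posterior(flat_prior, likelihood)
post_skeptical = posterior(skeptical_prior, likelihood)

for lam in grid:
    print(f"{lam:.1f}  flat: {post_flat[lam]:.3f}  skeptical: {post_skeptical[lam]:.3f}")
```

The two posteriors differ, but both are built from the same `likelihood` table, which is the reusable part.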

#### Dale

Mentor
If someone wants to use a different prior, then it seems to me that that means throwing out the actual value for $P(\lambda|D)$, because that is sensitive to the choice of priors.
You are exactly correct, you could discard their analysis including both the prior and their posterior. The point I was making is that you can reuse their data in your own analysis with your own priors and your own hypotheses.

You cannot do the same with frequentist methods. If you do one test and then later do another test on the same data then you have actually done a more complicated experiment which effectively increases the p value of the original test. This is the root of the multiple comparisons issue. In frequentist methods data should only be used once, and if you use it multiple times then you need to account for it by an adjustment for multiple comparisons.
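A rough illustration of why repeated testing needs an adjustment: if $k$ tests are each run at level $\alpha$ and all null hypotheses are true, the chance of at least one false positive grows with $k$. The sketch below assumes the tests are independent, which is a simplification (tests on the same data usually are not), and uses the Bonferroni correction as one common adjustment.

```python
def familywise_error_rate(alpha, num_tests):
    """P(at least one false positive) for independent tests, all nulls true."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(alpha, num_tests):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / num_tests

for k in (1, 5, 20):
    unadjusted = familywise_error_rate(0.05, k)
    adjusted = familywise_error_rate(bonferroni_alpha(0.05, k), k)
    print(f"{k:2d} tests: unadjusted {unadjusted:.3f}, Bonferroni {adjusted:.3f}")
```

With five tests at the 0.05 level the unadjusted rate is already about 0.226, while the Bonferroni-adjusted rate stays just under 0.05.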

That is not an issue for Bayesian methods. This guy is a little “partisan” for Bayesian methods, but he highlights the statistical issue well http://www.indiana.edu/~kruschke/articles/Kruschke2010WIRES.pdf

#### stevendaryl

Staff Emeritus
You are exactly correct, you could discard their analysis including both the prior and their posterior. The point I was making is that you can reuse their data in your own analysis with your own priors and your own hypotheses.
Yes, that's true.

You cannot do the same with frequentist methods. If you do one test and then later do another test on the same data then you have actually done a more complicated experiment which effectively increases the p value of the original test. This is the root of the multiple comparisons issue. In frequentist methods data should only be used once, and if you use it multiple times then you need to account for it by an adjustment for multiple comparisons.

That is not an issue for Bayesian methods. This guy is a little “partisan” for Bayesian methods, but he highlights the statistical issue well http://www.indiana.edu/~kruschke/articles/Kruschke2010WIRES.pdf
I would be all in favor of switching to Bayesian methods, because I actually think it's the correct way to think about probabilities, but there is a lot of inertia to overcome.

#### Dale

Mentor
there is a lot of inertia to overcome
That is the number one problem. I see no quick cure for that.
