- #1

liquidFuzz


In summary, the choice between using standard deviation σ or confidence interval ci depends on personal preference, the preference of peer-reviewers, the field, and the analysis. In particle physics, confidence intervals are often used when experimental precision is low and the analysis is looking for new effects. However, once the existence of an effect has been shown, standard deviations are often used for measurements. The confidence interval is divided by the square root of the number of samples because it is an estimate from the sample and not a known value. A confidence interval cannot be interpreted as the probability that the interval contains the true value of the parameter, and making any statement about this requires Bayes theorem. Asymmetric distributions can result in asymmetric confidence intervals.

I have a question regarding when to go for standard deviation σ or confidence interval ci. I am not sure of how to interpret ci. It seems to give me an option to choose the level of confidence, but I don't see why the ci is divided by the root of the number of samples.

- #2

mfb

Mentor


In particle physics, it is typical to give a confidence interval if the experimental precision is not good, especially if the analysis tries to find some new effect (e.g. to see if a parameter is different from zero, like the signal strength of the Higgs boson). If the existence of the effect has been shown, measurements will give the standard deviation (e.g. for the mass of the Higgs boson); confidence intervals are often given in addition to that.

liquidFuzz said: "but I don't see why the ci is divided by the root of the number of samples"

What do you mean here?

- #3

liquidFuzz


Thanks!

- #4

MarneMath

Education Advisor


I presume that when you ask about the square root of the sample size you are referring to something like this:

##\bar{x} \pm t \cdot (s/\sqrt{n})##. The square root of ##n## comes from the standard error of the estimated mean: the variance of ##\bar{x}## is ##\sigma^2/n##, so its standard deviation is ##\sigma/\sqrt{n}## (estimated by ##s/\sqrt{n}##). If you're interested in learning more, feel free to research biased and unbiased estimators.
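As a minimal sketch of that formula in Python (the sample values are made up; for brevity the normal critical value z ≈ 1.96 stands in for t, which is what t approaches for large n - for small samples `scipy.stats.t.ppf` would give the wider t critical value):

```python
import math
import statistics
from statistics import NormalDist

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]   # made-up measurements
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation (n - 1 in the denominator)
se = s / math.sqrt(n)             # standard error of the mean: this is where sqrt(n) enters

z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, ~1.96
ci = (x_bar - z * se, x_bar + z * se)
print(ci)
```

The interval shrinks like 1/√n as the sample grows, which is exactly the point being asked about.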

- #5

Number Nine


MarneMath said: "Both are generally good things to include. Your C.I. is basically saying that if repeated samples were taken then 95% of the time the interval will contain the mean. Thus you're 95% certain that the interval contains the population true mean."

You cannot interpret a confidence interval in that way. A confidence interval says nothing about the probability that the interval contains the true value of the parameter. To illustrate: Flip a biased coin (one which lands on heads with probability .95); if the coin lands on heads, then define our confidence interval to span the entire real line; if it lands on tails, then set the interval to be empty. This is a 95% confidence interval for the mean (it is a random interval which contains the true mean 95% of the time). If the coin lands on tails, then it is still a 95% confidence interval, but it clearly cannot contain the true mean.

In general, making any statement about the probability that a hypothesis is true or false, or the probability that a parameter takes a particular value, requires Bayes theorem.

- #6

MarneMath

Education Advisor


Your example of making the entire line the interval based on one flip is odd. It makes no reference to alpha level, deviations or sample size.

- #7

MarneMath

Education Advisor


Your confidence interval is ##\int_{-\alpha}^{\alpha} p(x)\,dx = \text{some fraction}##, where ##p(x)## is some pdf. Do with that as you wish.
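Concretely, for a standard normal pdf the symmetric bounds holding 95% of the probability mass can be found from the cdf; a sketch using only the Python standard library:

```python
from statistics import NormalDist

nd = NormalDist()                 # standard normal pdf p(x)
a = nd.inv_cdf(0.975)             # choose a so the integral of p(x) from -a to a is 0.95
mass = nd.cdf(a) - nd.cdf(-a)     # numerically confirm the integral
print(a, mass)                    # a ~ 1.96, mass ~ 0.95
```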

- #8

FactChecker

Science Advisor


liquidFuzz said: "I have a question regarding when to go for standard deviation σ or confidence interval ci. I am not sure of how to interpret ci."

If your assumed distribution is not symmetric about the mean, the confidence interval can be asymmetric. That is better than using a standard deviation symmetrically around the mean.

liquidFuzz said: "It seems to give me an option to choose the level of confidence, but I don't see why the ci is divided by the root of the number of samples."

If either the confidence interval or the standard deviation is estimated from the sample, there will be a denominator with the root of the number of samples. If they are known somehow and not estimated from the sample, neither will have that. So the confidence interval and the standard deviation are the same in that way.

- #9

mfb

Mentor


Number Nine said: "You cannot interpret a confidence interval in that way. A confidence interval says nothing about the probability that the interval contains the true value of the parameter. To illustrate: Flip a biased coin (one which lands on heads with probability .95); if the coin lands on heads, then define our confidence interval to span the entire real line; if it lands on tails, then set the interval to be empty. This is a 95% confidence interval for the mean (it is a random interval which contains the true mean 95% of the time). If the coin lands on tails, then it is still a 95% confidence interval, but it clearly cannot contain the true mean. In general, making any statement about the probability that a hypothesis is true or false, or the probability that a parameter takes a particular value, requires Bayes theorem."

Your protocol requires prior knowledge about the coin to work. A proper confidence interval cannot use that.

You also require that all values within the confidence interval lead to a higher likelihood than values outside it, to make the interval unambiguous (at least in the general case).

FactChecker said: "If your assumed distribution is not symmetric about the mean, the confidence interval can be asymmetric. That is better than using a standard deviation symmetrically around the mean."

Uncertainties can also be asymmetric: ##35^{+3}_{-4}##.

- #10

FactChecker

Science Advisor


mfb said: "Uncertainties can also be asymmetric: ##35^{+3}_{-4}##."

Right. That is an asymmetric confidence interval.

- #11

Number Nine


mfb said: "Your protocol requires prior knowledge about the coin to work. A proper confidence interval cannot use that."

I'm not sure what you mean by this -- the coin is the procedure used to generate the confidence interval, it is not the parameter being estimated. The procedure generates a perfectly valid confidence interval for any real valued parameter. It just happens to be a particularly terrible CI, which doesn't take into account any information from the sample.

Of course, if that example isn't compelling, there's always the great example from Jaynes, Confidence Intervals vs. Bayesian Inference (p. 197), where he constructs a confidence interval for the lower bound of a shifted exponential distribution that lies entirely above the smallest data point (which is clearly impossible).

The point is that a confidence interval can't be interpreted as our certainty that a parameter lies within a given range.


- #12

Stephen Tashi

Science Advisor


The subtle point about a "confidence interval" that is being discussed highlights the important and subtle distinction between assuming a quantity has a "fixed but unknown value" versus assuming it is a random variable with an associated probability distribution.

For example, suppose we have 10 empty boxes that are distinguished by labels "1", "2",..."10". If we are only given the information that one of the boxes was opened and a ball was placed inside it, we cannot prove statements like "There is probability of 1/10 that the ball is in box "2"". No probabilistic procedure for placing the ball inside the box was specified. For example, we can't rule out the possibility that the ball was put in a box by someone who had a preference for the number "7".

In a practical treatment of the above example, it is common to take a Bayesian approach by assuming that there is a probability of 1/10 of the ball being placed in a given box. The interesting justification for this assumption is that a uniform distribution accurately portrays *our knowledge* of how the ball was placed. (Our "knowledge", in this case, happens to be total ignorance!)

The definition of "confidence interval" in frequentist statistics (the kind commonly taught in introductory statistics courses) takes the view that the parameter to be estimated has a "fixed, but unknown" value. There are two distinct types of "confidence interval" discussed in frequentist statistics: formally defined confidence intervals versus intervals that are informally called "confidence intervals".

The formal definition of a confidence interval doesn't require specifying enough information to give the interval any numerical endpoints. Such intervals are usually stated using a variable representing the "fixed, but unknown" parameter. For example, ##(\mu - 20.3, \mu + 20.3)## is such an interval. The numerical endpoints of this confidence interval can't be known unless we are given the value of ##\mu##, which is the "fixed, but unknown" parameter. (If we were given that value, then ##\mu## would no longer be unknown and there would be no point in creating confidence intervals.)

The informal type of "confidence interval" is the kind where definite numerical endpoints are obtained by using an *estimated* value of the "fixed, but unknown" parameter. For example, if a sample mean is ##\hat{\mu} = 9.0## then an informal confidence interval might be ##(9.0 - 20.3, 9.0 + 20.3)##. However, even if this is a "95% confidence interval", it cannot be proven that there is a 95% probability that the "fixed, but unknown" value of ##\mu## is within this interval. The assumption "fixed, but unknown" does not provide any information about a random process that was used in setting the "fixed, but unknown" value. For example, maybe the population was generated by a person who liked to have ##\mu##'s greater than 30.

The informal Bayesian approach to such an informal confidence interval is simply to believe that there is a 95% probability that the value of ##\mu## is within (9.0 - 20.3, 9.0 + 20.3).

The rigorous Bayesian approach is to make the assumption that ##\mu## was selected as a random variable having some particular probability distribution. It is possible to use that information and the values in the sample to compute the probability that ##\mu## is within (9.0 - 20.3, 9.0 + 20.3). (In this approach, some people use the terminology "credible interval" instead of a "confidence interval" ) However, the calculated probability depends on what particular distribution is assumed for ##\mu##.

A further point about (formal) confidence intervals: there is nothing in the definition of a "confidence interval" that specifies that the procedure for generating confidence intervals must make sense! The scenario for "confidence intervals" is that you have some function ##f(S)## that maps each possible sample ##S## to some interval. As we know, mathematical functions are permitted to do strange things. For example, ##f## might be a constant function that maps all samples to the interval (-1, 1). Or ##f## might be a function with eccentric rules - like "Map ##S## to (-1, 1) if the sample contains the value 7.36, otherwise map ##S## to the interval (-2, 19)".

Of course the commonly encountered kind of confidence intervals are those that make the interval a function of both the sample data and the "fixed, but unknown" parameter of interest.


- #13

MarneMath

Education Advisor


Number Nine said: "I'm not sure what you mean by this -- the coin is the procedure used to generate the confidence interval, it is not the parameter being estimated. The procedure generates a perfectly valid confidence interval for any real valued parameter. It just happens to be a particularly terrible CI, which doesn't take into account any information from the sample. Of course, if that example isn't compelling, there's always the great example from Jaynes, Confidence Intervals vs. Bayesian Inference (p. 197), where he constructs a confidence interval for the lower bound of a shifted exponential distribution that lies entirely above the smallest data point (which is clearly impossible). The point is that a confidence interval can't be interpreted as our certainty that a parameter lies within a given range."

Jaynes' paper is rather famous, but it's also deceptive. The problem is that Jaynes interprets the frequentist confidence interval as something it isn't usually interpreted as (i.e. a probability of where we might expect the true value of whatever statistic we are interested in to live). Hence, naturally, Jaynes finds some form of contradiction.

The fact of the matter is, if you ask the question Jaynes is basically asking (give me some interval where the true parameter resides), then yeah, it's obviously better to use a credible interval over a confidence interval. However, if the question is give me an interval where if I were to repeat this experiment a "large" number of times and the true value of the statistic will reside in that interval p*100 % of the time, then Confidence intervals clearly give you that answer.

I generally agree with Jaynes' view that oftentimes researchers will apply a confidence interval when a credible interval makes more sense, but that doesn't make a confidence interval any less reasonable when used for the right intent. Confusing the two often causes odd problems (e.g. a 95% credible interval eventually leading to 0% coverage in a frequentist approach, over the long run, whatever that means).

- #14

mfb

Mentor


MarneMath said: "However, if the question is give me an interval where if I were to repeat this experiment a "large" number of times and the true value of the statistic will reside in that interval p*100 % of the time, then Confidence intervals clearly give you that answer."

That is the point.

If you look at the result after the measurement, you cannot say "the true value will be in the interval with 95% probability". That statement doesn't even make sense in frequentist statistics and it is wrong in Bayesian statistics.

But you can say, before doing the measurement, that you have a 95% probability to produce an interval which will cover the true value. And you can make sure in advance that your confidence intervals will be meaningful (e. g. not cover the whole real axis with 95% probability).
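That before-the-measurement reading can be illustrated by simulation; a sketch with made-up true mean and sigma, and sigma treated as known for simplicity:

```python
import math
import random
from statistics import NormalDist

random.seed(1)
TRUE_MU, SIGMA, N = 10.0, 2.0, 50            # illustrative values
z = NormalDist().inv_cdf(0.975)              # ~1.96
half = z * SIGMA / math.sqrt(N)              # known-sigma interval half-width

trials = 20_000
covered = 0
for _ in range(trials):
    # generate a fresh sample and its interval, then check coverage
    x_bar = sum(random.gauss(TRUE_MU, SIGMA) for _ in range(N)) / N
    covered += (x_bar - half <= TRUE_MU <= x_bar + half)

coverage = covered / trials
print(coverage)   # the *procedure* covers the true mean ~95% of the time
```

The 95% attaches to the interval-generating procedure across repetitions; any single realized interval either covers the true mean or it doesn't.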

- #15

Stephen Tashi

Science Advisor


MarneMath said: "However, if the question is give me an interval where if I were to repeat this experiment a "large" number of times and the true value of the statistic will reside in that interval p*100 % of the time, then Confidence intervals clearly give you that answer."

The use of singular and plural in that statement could be made more precise ( "give me an interval", "that interval", "reside in that interval" vs the plural "confidence intervals"). If the question is "give me a procedure for generating intervals from sample data such that the true value of the parameter will reside in the generated interval p*100% of the time" then (the commonly used) procedures for confidence intervals give you that answer.

So it's fair to ask if a procedure for credible intervals can answer the same question. It can.

The difference between a procedure for generating credible intervals and a procedure for generating confidence intervals is not that they necessarily answer different questions in the same mathematical problem. They are different because they answer questions about different mathematical problems. The difference begins when the same real life situation is modeled in two different ways. It's analogous to having a real life situation of pulleys and strings and making distinct sets of decisions about how to model it - e.g. whether to consider the pulleys massless, or whether to assume the strings obey Hooke's law.

- #16

Stephen Tashi

Science Advisor


mfb said: "If you look at the result after the measurement, you cannot say "the true value will be in the interval with 95% probability". That statement doesn't even make sense in frequentist statistics"

True.

mfb said: "...and it is wrong in Bayesian statistics."

That can be debated! It might be a vocabulary exercise in defining "credible interval".

Returning to the example of 10 empty boxes: if a ball is put "at random" in one of the ten boxes (by a procedure that gives a probability of 1/10 of putting it in each particular box) then, mathematically, the probability that the ball is in box "2" is 1/10. If we pick box 2 and open it, then the ball either will or will not be in box 2. So, with the new information, the probability that it is in box 2 changes to either 0 or 1. This is not because mathematics gives different answers to the same problem. It is because two different mathematical problems have been stated. One problem states that box 2 was examined; the other doesn't assert that.

If we have a specific Bayesian credible interval (i.e one with numerical endpoints like ( 19.2 , 28.6) ) then mathematically we can compute the probability that the value of the parameter of interest is in that specific interval. That mathematical result is done after the data that produced that interval is known.

Some real life event may change the mathematical model for the problem. For example, suppose the data is being generated by a simulation. If, in addition to the specific sample data, we also get a glimpse of some print-out from the simulation program that reveals the value of the parameter of interest, then the probability that the parameter is in (19.2, 28.6) changes to either 0 or 1. But the change in the possible probability values is due to a change in the "givens". The mathematical problem has been changed.

The usual scenario for dealing with a credible interval like (19.2, 28.6) isn't going to be that the sample data reveals the true value of the parameter of interest. So it is not analogous to the example of opening box "2" to see if the ball is in it.

If the Bayesian model of the problem implies there is a 95% probability that the actual value of the parameter of interest is in (19.2, 28.6), one may always raise the philosophical objection that the actual value either is in the interval or is not in the interval - thus the probability it is in the interval is either 0 or 1. The same objection can be raised to saying that the probability that a fair coin lands heads is 1/2 - because it either will land heads or will not land heads.

- #17

mfb

Mentor


Stephen Tashi said: "That can be debated! It might be a vocabulary exercise in defining "credible interval"."

Take 20 coins, throw them 100 times each to determine how likely they are to land heads up, for every coin, calculate the usual confidence intervals. If one coin gets a 95% confidence interval that does not include 0.5, are you 95% confident that this coin is biased? I hope not!

- #18

Stephen Tashi

Science Advisor


mfb said: "Take 20 coins, throw them 100 times each to determine how likely they are to land heads up, for every coin, calculate the usual confidence intervals. If one coin gets a 95% confidence interval that does not include 0.5, are you 95% confident that this coin is biased? I hope not!"

That example isn't a well-posed mathematical problem, so I wouldn't be "confident" about any mathematical conclusion.

- #19

mfb

Mentor


What is not well-posed about it? You get the result of an experiment, and you are asked about your Bayesian confidence that something is true. And see above: if you just take the confidence interval as your confidence, you are doing it wrong.

- #20

Stephen Tashi

Science Advisor


mfb said: "What is not well-posed about it?"

If you intend to illustrate something about Bayesian credible intervals, you need to state a Bayesian model for the problem. You haven't stated anything about a prior distribution.

If you are making the point that creating a Bayesian model for real life problem is controversial or unreliable then that sort of objection can be made to any method of mathematically modelling a real life problem.

- #21

mfb

Mentor


Stephen Tashi said: "You haven't stated anything about a prior distribution."

That was my point.

If you perform an analysis and make a confidence interval, then looking at this confidence interval and claiming "the true value will be in the interval with 95% probability" does not work (in general).

- #22

Stephen Tashi

Science Advisor


mfb said: "That was my point. If you perform an analysis and make a confidence interval, then looking at this confidence interval and claiming "the true value will be in the interval with 95% probability" does not work (in general)."

I agree.

When I'm speaking of "credible intervals", I mean those intervals that are constructed from an actual Bayesian model. I don't mean confidence intervals created from a frequentist model and then misinterpreted as if they were Bayesian credible intervals.

A confidence interval is a range of values, calculated from a sample of data, constructed so that under repeated sampling it contains the true population mean a specified fraction of the time (the confidence level). It gives an estimate of the range of values compatible with the data and a sense of the precision of the estimate.

The confidence interval is calculated using the sample mean, standard deviation, and the desired level of confidence. The formula for a confidence interval is: CI = x̄ ± z * (σ / √n), where x̄ is the sample mean, z is the z-score corresponding to the desired confidence level, σ is the population standard deviation, and n is the sample size.
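That formula can be evaluated directly; a minimal sketch with made-up numbers (x̄ = 9.0, σ = 4.0 treated as known, n = 25):

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 9.0, 4.0, 25      # illustrative values; sigma assumed known
z = NormalDist().inv_cdf(0.975)     # z for a 95% confidence level, ~1.96
half_width = z * sigma / math.sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)
print(ci)
```

When σ must be estimated from the sample, the sample standard deviation s and a t critical value replace σ and z, widening the interval for small n.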

The purpose of a confidence interval is to provide a range of values that is likely to contain the true population mean with a specified level of confidence. This allows us to estimate the true population mean from a sample of data and assess the precision of our estimate.

A confidence interval is calculated using the standard deviation of the sample data. The larger the standard deviation, the wider the confidence interval will be. This is because a larger standard deviation indicates more variability in the data, making it more difficult to estimate the true population mean.

A confidence interval is a range of values intended to bracket the true population mean, while the standard deviation is a measure of the spread of the data around the mean. A confidence interval is used to estimate the true population mean, while the standard deviation is used to describe the variability of the data. Additionally, a confidence interval is calculated using the sample mean and standard deviation, while the standard deviation is calculated from the individual data points.
