
Plus/minus What? How to Interpret Error Bars


People sometimes find themselves staring at a number with a ± in it when a new physics result is presented. But what does it mean? The aim of this Insight is to give a quick overview of how physicists (and other scientists) tend to present their results in terms of statistics and measurement errors. If we are faced with a value ##m_H = 125.7\pm 0.4## GeV, does that mean that the Higgs mass definitely has to be within the range 125.3 to 126.1 GeV? Is 125.3 GeV as likely as the central value of 125.7 GeV?

Confidence Levels

When performing a statistical analysis of an experiment there are two possible probability interpretations. We will here only deal with the one most common in high-energy physics, called the frequentist interpretation. This interpretation answers questions regarding how likely a certain outcome was given some underlying assumption, such as a physics model. This is quantified by quoting with which frequency we would obtain the outcome, or a more extreme one, if we repeated the experiment an infinite number of times.

Naturally, we cannot perform any experiment an infinite, or even a large enough, number of times, which is why this is generally inferred through assumptions on the distribution of the outcomes or by numerical simulation. For each outcome, the frequency with which it, or a more extreme outcome, would occur is called the “p-value” of the outcome.

When an experiment is performed and a particular outcome has occurred, we can use the p-value to infer the “confidence level” (CL) at which the underlying hypothesis can be ruled out. The CL is given by one minus the p-value of the outcome that occurred, i.e., if we have a hypothesis where the outcome from our experiment is among the 5% most extreme ones, we would say that the hypothesis is ruled out at the 100% - 5% = 95% CL.
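To make the p-value and CL concrete, here is a minimal sketch for a simple counting experiment. Everything specific in it is an assumption made for illustration: the hypothesis predicts 3 events on average, 8 events are observed, the counts are Poisson distributed, and "more extreme" is taken to mean "at least as many events". The function names are hypothetical. The p-value is computed both exactly and by simulating many toy experiments, mirroring the two approaches mentioned above.

```python
import math
import random


def poisson_p_value(observed, mean):
    """Probability of seeing `observed` or more counts when `mean` are expected."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(observed))
    return 1.0 - below


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def toy_p_value(observed, mean, n_toys, seed=1):
    """Fraction of simulated experiments at least as extreme as the observed one."""
    rng = random.Random(seed)
    extreme = sum(sample_poisson(mean, rng) >= observed for _ in range(n_toys))
    return extreme / n_toys


# Hypothetical numbers: the hypothesis predicts 3 events on average, 8 were seen.
p_exact = poisson_p_value(8, 3.0)
p_toys = toy_p_value(8, 3.0, n_toys=100_000)
print(f"p-value (exact): {p_exact:.4f}")
print(f"p-value (toys):  {p_toys:.4f}")
print(f"hypothesis ruled out at {100 * (1 - p_exact):.1f}% CL")
```

The toy estimate only approaches the exact value as the number of simulated experiments grows, which is why very small p-values are usually obtained from an assumed distribution rather than by brute-force simulation.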

Sigmas

Another commonly encountered nomenclature is that of using a number of σ. This is really not much different from the CL introduced above: it is simply a way of referring to the CL at which an outcome would be ruled out if the distribution were Gaussian and the outcome were a given number of standard deviations away from the mean. The following list summarises the confidence levels associated with the most common numbers of sigmas:

1σ = 68.27% CL
2σ = 95.45% CL
3σ = 99.73% CL
5σ = 99.999943% CL
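The numbers in this list can be reproduced with a few lines of Python. This is a sketch assuming a Gaussian distribution and a two-sided definition of "more extreme" (deviations in either direction count). The function names are our own.

```python
import math


def confidence_level(n_sigma):
    """Two-sided Gaussian probability of lying within n_sigma of the mean."""
    return math.erf(n_sigma / math.sqrt(2.0))


def p_value(n_sigma):
    """Two-sided Gaussian tail probability beyond n_sigma (accurate for small tails)."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n in (1, 2, 3, 5):
    print(f"{n} sigma: CL = {100 * confidence_level(n):.6f}%, p-value = {p_value(n):.3g}")
```

The complementary error function is used for the p-value because computing one minus the CL directly loses precision when the tail is as small as it is at 5σ.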

In particle physics, the confidence levels of 3σ and 5σ have a special standing. If the hypothesis that a particle does not exist can be ruled out at 3σ, we refer to the outcome as “evidence” for the existence of the particle, while if it can be ruled out at 5σ, we refer to it as a “discovery” of the particle. Therefore, when we say that we have discovered a particle, what is really being implied is that, if the particle did not exist, the experimental outcome would only happen by chance in 0.000057% of the experiments if we repeated the experiment an infinite number of times. This means that if you could perform an experiment every day, it would take you on average about 5000 years to get such an extreme result by chance.
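The waiting-time estimate is simple arithmetic and can be checked as follows. The sketch assumes one independent experiment per day and a two-sided Gaussian tail at 5σ. The variable names are illustrative.

```python
import math

# Two-sided Gaussian tail probability at 5 sigma (the 0.000057% quoted above).
p_five_sigma = math.erfc(5.0 / math.sqrt(2.0))

# With one independent experiment per day, the mean wait for a result this
# extreme is 1/p days (the mean of a geometric distribution).
expected_days = 1.0 / p_five_sigma
expected_years = expected_days / 365.25

print(f"p-value at 5 sigma: {100 * p_five_sigma:.6f}%")
print(f"expected wait: about {expected_years:.0f} years")
```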

It is worth noting that this interpretation of probabilities makes no attempt to quantify how likely it is that the hypothesis is true or false, but only refers to how likely the outcomes are if it is true. Many physicists get this wrong too! You may often hear statements such as “we are 99.999943% certain that the new particle exists”. Such statements will generally not be accurate and rather belong to the other interpretation of probability, which we will not cover here.

Error Bars

So what does all of this have to do with the ± we talked about at the beginning when discussing the errors in some parameter? Unless otherwise specified, the quoted errors generally refer to the errors at 1σ, i.e., 68.27% CL. What this means is that all of the values outside the error bars are excluded at the 1σ level or stronger. Consequently, all the values inside the error bars are not excluded at this confidence level, meaning that the observed outcome was among the 68.27% less extreme ones for those values of the parameter.

This also means that the error bars are not sharp cutoffs: a value just outside the error bars will generally not be excluded at a level much stronger than 1σ, and a value just inside the error bars will generally be excluded at almost 1σ.
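As a rough illustration of how smoothly the exclusion level changes across the edge of the error bar, the sketch below takes the Higgs mass example from the introduction and assumes that the outcome is Gaussian distributed with the quoted error as its standard deviation. A real analysis need not satisfy this assumption, and the function name is hypothetical.

```python
import math


def exclusion_level(value, central, sigma):
    """Return (number of sigmas, CL) at which `value` is excluded, Gaussian case."""
    n_sigma = abs(value - central) / sigma
    return n_sigma, math.erf(n_sigma / math.sqrt(2.0))


central, sigma = 125.7, 0.4  # the Higgs mass example from the introduction, in GeV

for mass in (125.7, 126.0, 126.1, 126.2, 126.5):
    n_sigma, cl = exclusion_level(mass, central, sigma)
    print(f"m_H = {mass:.1f} GeV: excluded at {n_sigma:.2f} sigma ({100 * cl:.1f}% CL)")
```

Under this assumption, the values 126.0 and 126.2 GeV sit on either side of the upper error bar, yet they are excluded at similar confidence levels.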

The last observation we will make is that the confidence level of a given interval may not be interpreted as the probability that the parameters are actually in that interval. Again, this is a question which is not treated by frequentist statistics. You may therefore not say that the true value of the parameter will be within the 1σ confidence interval with a probability of 68.27%. The confidence interval is only a means of telling you for which values of the parameter the outcome was not very unlikely. To conclude, let us return to the statement ##m_H = 125.7\pm 0.4## GeV and interpret its meaning:

“If the Higgs mass is between 125.3 and 126.1 GeV, then what we have observed so far is among the 68.27% less extreme results. If the Higgs mass is outside of the interval, it is among the 31.73% more extreme results.”

Associate professor in theoretical astroparticle physics. He did his thesis on phenomenological neutrino physics and is currently also working with different aspects of dark matter as well as physics beyond the Standard Model. Author of “Mathematical Methods for Physics and Engineering” (see Insight “The Birth of a Textbook”). A member at Physics Forums since 2014.

20 replies
  1. Stephen Tashi
    Stephen Tashi says:

    I think you should clarify the point that there are two distinct types of confidence intervals in frequentist statistics. The kind of confidence interval treated by statistical theory is an interval defined with respect to a fixed but unknown population parameter, such as the unknown population mean. This type of interval can have a known numerical width, but it does not have known numerical endpoints since the population parameter is unknown. For this type of interval, we can state a probability (e.g. 0.6827) that the true population parameter is within it – we just don't know where the interval is!

    The second type of confidence interval is an interval stated with respect to an observed or estimated value of a population parameter, such as 125.7. (Some statistics textbooks say that it is "an abuse of language" to call such an interval a "confidence interval".)

    As you point out, the interpretation of a "confidence interval" with numerical endpoints is best done by putting the scenario for statistical confidence intervals out of one's mind and replacing it by the scenario for hypothesis testing.

  2. Orodruin
    Orodruin says:
    Stephen Tashi

    I think you should clarify the point that there are two distinct types of confidence intervals in frequentist statistics.

    While I understand your concern, my main aim was to give as short an overview as possible of what people should take away from a number quoted from a physics experiment, not to be strictly correct in a mathematical statistics sense. Far too many times I have seen people drawing the conclusion that 0 is impossible because the experiment said 1.1±1.0.

  3. Dale
    Dale says:

    Good overview of confidence intervals.

    Sometimes a plot with an error bar just shows a mean and standard deviation. Would you still interpret those as confidence intervals rather than descriptive statistics?

  4. Orodruin
    Orodruin says:
    DaleSpam

    Sometimes a plot with an error bar just shows a mean and standard deviation. Would you still interpret those as confidence intervals rather than descriptive statistics?

    No. If you have a family of measurements (essentially repeated experiments) it is something different. Of course, it is up to the authors to make sure that what is shown cannot be misinterpreted. Of course, from the data given and a model of how data relates to the model parameters, you could do parameter estimation with the data and obtain confidence intervals for the parameters. With such a collection of measurements, the descriptive statistic which is the standard deviation will generally not encompass any systematic error made in the measurement. In HEP, it is most often the case that you do not have several measurements of the same quantity, rather you are concerned with binned (or unbinned) energy spectra (i.e., a number of events per bin, or a number of events with associated energies). Error bars in such plots will generally contain both the statistical uncertainty as well as any systematic uncertainties.

  5. Dale
    Dale says:
    Orodruin

    Of course, it is up to the authors to make sure that what is shown cannot be misinterpreted.

    I agree 100%. I work mostly in the medical literature. I think that they are less careful about such things, and it always irritates me.

  6. Orodruin
    Orodruin says:
    DaleSpam

    I agree 100%. I work mostly in the medical literature. I think that they are less careful about such things, and it always irritates me.

    I do understand what you mean. My mother is a medical doctor and was writing a paper at the beginning of the year. Their experimental result was in conflict with a seminal paper in the field, which was used to advise patients due to a claimed linear correlation between two quantities (given with four significant digits!), so she asked for some advice. When looking into it, it turned out that the statistical material available to the seminal paper was not sufficient to even claim a positive correlation within the linear model used (and on top of it the model provided an awful best fit, the one given with four significant digits). This is what happens when you just look at your data and want to see a line.

  7. Dale
    Dale says:
    Orodruin

    When looking into it, it turned out that the statistical material available to the seminal paper was not sufficient to even claim a positive correlation within the linear model used (and on top of it the model provided an awful best fit — the one given with four significant digits).

    Unfortunately that is quite common. I review a lot of papers with really basic statistical mistakes. It probably did not even occur to the authors of the seminal paper that they should put a confidence interval around their fit parameter.

    I think that medical researchers would do better at understanding Bayesian statistics in terms of what they mean, but it will be quite some time before those methods are developed enough to be "routine" enough for them to use.

  8. gleem
    gleem says:

    It might be instructive to show the type of data that is used to get the result ##m_H = 125.7\pm 0.4## GeV. Unlike the biological sciences, physics experiments are not usually dominated by statistical uncertainties.

  9. mfb
    mfb says:

    Nice post!

    gleem

    Unlike the biological sciences Physics experiments are not usually dominated by statistical uncertainties.

    Most analyses in particle physics are limited by statistics. The Higgs mass measurement is one of them – the statistical uncertainty is two times the systematic uncertainty in the most recent combination.
