
Confidence interval interpretation mistake

  1. Jul 4, 2013 #1
    Hi all. I've been thinking about this question a lot for the past few days and it seems to me that I'm committing a mistake somewhere along the way, but certainly can't figure out where. Here's one of the interpretations which I've encountered most frequently and think is the right one (here's the Wiki version):

    The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter x% of the time."

    Here's what I derive from this statement. Let's say that we set x=95. This means that if I keep sampling from a population whose true mean is μ and I repeat the procedure for obtaining the 95% confidence interval, say, 100 billion times, about 95 billion of those intervals will include μ. Now imagine that you put all those hypothetically calculated 100 billion confidence intervals in a giant imaginary bag. If I know that I'm about to sample from the population and calculate the confidence interval based on the sample mean and the standard error, this is equivalent to randomly drawing one confidence interval from the giant bag. I know that 95% of the confidence intervals inside the bag cover the (unknown to me) population mean μ, so the probability of the confidence interval I randomly picked from the bag covering μ is 95%.
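    The repeated-sampling picture described here is easy to check numerically. A quick Monte Carlo sketch (μ, σ, and the sample size are arbitrary illustrative values, and σ is treated as known for simplicity):

```python
import math
import random

# Monte Carlo sketch of the repeated-sampling interpretation: draw
# many samples from N(mu, sigma), build the standard 95% CI for the
# mean (sigma assumed known), and count how often the CI covers mu.
mu, sigma, n = 10.0, 2.0, 30   # arbitrary illustrative values
z = 1.96                       # two-sided 95% normal quantile
trials = 20_000
half_width = z * sigma / math.sqrt(n)

covered = 0
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```

    Across the simulated samples, roughly 95% of the computed intervals cover μ, which is the "giant bag" picture.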

    After calculating my 95% CI, I reason like this.

    1. Since there is a 95% chance that this interval covers μ, then there is a 95% chance that μ's true value is one of the values inside the CI (call this statement p).

    2. The complementary statement Not-p = μ's true value is not one of the values inside the CI must, therefore, have a probability of 5%.

    3. Therefore, the probability that any value outside of the CI is the true population mean is at most 5%.

    4. Therefore, if the value associated with the null hypothesis lies outside of the CI, we can say there is at most a 5% chance that the null hypothesis is true.

    Now, I know the last statement is dead wrong. I'm quite aware of it and I don't need any convincing. But I keep looking back at the logical steps I took and I just can't figure out where I'm making a mistake.

    I'm already quite confused, so please only respond if you're an expert in the field and/or you are very confident in your response. Don't respond based only on intuition because that would confuse me even more :)
  3. Jul 4, 2013 #2



    Staff: Mentor

    That is true only if you did not look at the result yet.
    Otherwise, you do not have a random value any more - you have a single, specific interval, which is either "true" (true value is in the interval) with 100% probability or "false" with 100% probability. You just don't know which of those cases you have.

    Between 2 and 3, you switch between the probability to get some result in your study, and a probability for μ (which is not even well-defined within a frequentist approach).
  4. Jul 4, 2013 #3

    Stephen Tashi

    Science Advisor

    That isn't correct. Confidence intervals are from "frequentist" statistics. The "frequentist" point of view is that a particular interval either does or does not contain the true mean - there is no probability about the matter (except for a 1 or a 0).

    The distinction between a result having a probability and a result being "definite but unknown" is very interesting. Superficially, it seems that allowing this distinction could eliminate probability altogether. For example, if a person flips a fair coin then you might say it lands with a "definite but unknown result", not "with a probability of 0.5 of being heads".

    There can be contradictory mathematical models for the same real world situation and, as far as I know, there is no way to look at a real world problem and distinguish whether an unmeasured outcome should be called "definite but unknown" or regarded as having various probabilities. In mathematics, you can model it either way. But after you pick a point of view, you can't contradict it.

    The Bayesian statistical outlook allows and requires some "prior" probability about the location of the mean. It doesn't say the mean is in a "definite but unknown" location. You can imagine that Nature could have put the mean in many different locations and that our particular world is a random draw from those choices. The frequentist point of view is that the mean is in a definite but unknown location. So there is no probability about it being somewhere - before or after you make a measurement.

    The way you phrase your argument shows that you have Bayesian inclinations. In a practical problem, you can approximately reach the conclusion that you got by picking a "prior" distribution for the mean that spreads the possibilities over a wide range of values. (If you want to hear an extreme Bayesian point of view, look up E. T. Jaynes's Probability Theory: The Logic of Science on the web.)

    What the commonsense person wants to know from data is "What is the probability that such-and-such is true given the data I observed". If you don't get hypnotized by the impressive vocabulary of frequentist statistics ("confidence", "significance") and look at the numbers that can be calculated, they all quantify "What is the probability of the data given that such-and-such is true". It's a matter of [itex] P(X|Y) [/itex] vs [itex] P(Y|X) [/itex]. The most common misinterpretation of frequentist statistics by laymen is to confuse one of those for the other.

    If we claim that we've calculated [itex] P(A|B) [/itex] = 0.95, it's an interesting question whether we can make a mathematical model where [itex] P(A) [/itex] doesn't exist, or is 1 or 0.
    Last edited: Jul 4, 2013
  5. Jul 4, 2013 #4

    Stephen Tashi

    Science Advisor

    Here's a toy version of the confidence interval problem. A coin was tossed and landed in a definite but unknown position. It was either heads or tails. If it was heads, a red dot was drawn on the x-axis at the location x = -1/2. If it was tails, a red dot was drawn on the x-axis at the position x = +1/2. From a set of 3 intervals on the x-axis [-1,0],[0,1],[5,7], an interval is chosen at random.

    1.) What is the probability that the interval chosen contains the red dot?

    2.) The interval chosen is [-1,0]. What is the probability that it contains the red dot?

    If you give an answer for 2.), it has to be correct for any kind of coin, because the problem didn't say that a "fair" coin was tossed.
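    A quick simulation of this toy problem (the coin's bias p_heads is left as a free parameter, since the problem doesn't specify it) shows that question 1.) has the answer 1/3 no matter what the coin is:

```python
import random

# Toy problem above: the dot sits at -1/2 (heads) or +1/2 (tails);
# one of three fixed intervals is chosen uniformly at random.  The
# coin's bias is unknown in the problem, so it is a free parameter
# here -- yet the answer to question 1.) is 1/3 regardless, because
# exactly one of the three intervals contains the dot either way.
def coverage(p_heads, trials=50_000):
    intervals = [(-1, 0), (0, 1), (5, 7)]
    hits = 0
    for _ in range(trials):
        dot = -0.5 if random.random() < p_heads else 0.5
        lo, hi = random.choice(intervals)
        if lo <= dot <= hi:
            hits += 1
    return hits / trials

for p in (0.1, 0.5, 0.9):
    print(p, coverage(p))  # all close to 1/3
```

    Question 2.), by contrast, asks about the specific interval [-1,0]: its answer would be the probability of heads, which the problem doesn't supply.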
  6. Jul 4, 2013 #5
    No. Careful.

    Frequentist statistics (including confidence intervals) generally doesn't attach probabilities to hypotheses, since you can't say anything specific about the probability of a hypothesis without some kind of knowledge of the associated prior probability. The standard example of why this kind of reasoning fails is breast cancer detection.

    It's entirely possible to calculate p-values very close to zero even when the probability of the null hypothesis being true is close to 100%.
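    A minimal numerical sketch of this point, with entirely made-up numbers: even when P(data | H0) is tiny, P(H0 | data) can stay near 1 if the prior strongly favors the null.

```python
# Made-up numbers illustrating P(X|Y) vs P(Y|X): a small probability
# of the data under H0 does not make H0 improbable if the prior
# strongly favors H0.
prior_h0 = 0.999          # assume hypotheses like this one are almost always true
p_data_given_h0 = 0.001   # small "p-value-like" probability of the data under H0
p_data_given_h1 = 0.01    # data only modestly more likely under the alternative

posterior_h0 = (prior_h0 * p_data_given_h0) / (
    prior_h0 * p_data_given_h0 + (1 - prior_h0) * p_data_given_h1
)
print(posterior_h0)  # ~0.99: the null remains very probably true
```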
  7. Jul 5, 2013 #6
    Thank you all for the replies.

    I don't understand the difference between before I looked at the result and after I looked at it. Since looking at it doesn't actually change anything (it's a passive process), why should the probability collapse from 95% to 0 or 1?

    Do I understand correctly that frequentists don't give probabilities to single propositions or events? Let's say I know for sure that a coin is fair. I flip it and it lands under a table so that I don't see which side it landed on. If I'm a frequentist, I'm not allowed to say that the coin is showing heads with a 0.5 probability?

    If that's the case, I'm having a hard time believing that "true" consistent frequentists exist. Why do I say that? Imagine I measure the height of every male police officer in a random US police department (say, NY) and calculate the mean. If I asked a frequentist the question "what is the probability that the mean is in the interval [160 cm, 220 cm]?", they would say "this question is meaningless; you can't attach a probability to the mean". But now imagine that I am a very generous rich person and ask them to pick one of the following three intervals:

    [0 cm, 150 cm]
    [150 cm, 220 cm]
    [220 cm, ∞)

    Once they pick one of the three intervals, I will reveal the true mean to them and if it falls inside the interval they picked, I will give them $50 000. I think there should be no doubt that all frequentists would pick the second interval without hesitation. This suggests that they think it's more likely for the true mean to be in the second interval than in either of the other two. It also suggests that they don't just view the probabilities as 0s and 1s, but are capable of assigning a number between 0 and 1.

    I don't see why one would argue against the idea of quantifying any existing uncertainty if it complies with Kolmogorov's axioms.

    Also, do frequentists think that Bayes' theorem is not about probabilities?

    But the 95% CI has a probability of covering the mean, correct? If that's the case, then the disagreement seems just terminological to me.

    Okay, but if the problem had said the coin was fair, then I don't see a problem with answering the second question.
  8. Jul 5, 2013 #7

    Stephen Tashi

    Science Advisor

    Frequentists do assign probabilities to certain propositions, but they don't assign "prior" probabilities to things they wish to estimate or test. For example, in hypothesis testing, they don't assign any prior probability to the null hypothesis being true. They compute probabilities based on the assumption that the null hypothesis is true, not that it is true with a certain probability.

    Statistically, I think that among the wide population of people who use frequentist statistics, only the small fraction who understand the underlying theory of probability can resist the interpretations you describe. As you say, given a particular result and a particular "confidence", the typical user of statistics wants to interpret the "confidence" as a probability of the true value being in the particular interval. This demonstrates my point that the average person wants to know "What is the probability of this fact, given the data" and not vice-versa.

    Nevertheless, if you are mathematically consistent, you cannot use "confidence" as a synonym for "probability". I explain the emotional behavior of the average user of statistics as showing a correct practical approach to life. Frequentist statistics can't justify the approach. However, as I said, you can justify the approach in practical problems if you assume a prior that is uniform over the range of plausible values. (Most unknowns in practical problems can't really be "anywhere between minus infinity and plus infinity".) In my opinion, people's basic survival instincts cause them to behave like Bayesians. Most find it easier to misinterpret the well-known frequentist approach than to apply the Bayesian approach.

    There are big philosophical arguments about that. Being a Bayesian, I personally see nothing wrong with it either.

    Competent frequentists understand that Bayes theorem is a theorem of probability theory. It's just that they don't set up statistical problems by using prior distributions, so they have less recourse to use Bayes theorem.

    No. You might say " 95% of the confidence intervals cover the mean", but a particular confidence interval either covers the mean or it doesn't if we are given only that the mean is in a "definite but unknown location".

    You might try the verbal argument: "95% of the confidence intervals cover the mean. I picked a confidence interval at random. Hence there is 95% chance that it covers the mean". However, you can't justify the statement that you picked a confidence interval "at random" unless you can specify a probability distribution for the confidence intervals. If you try to do that precisely, I think you'll find it is impossible unless you also assume some prior distribution for the location of the mean.

    I agree, of course. But the problem is posed in a frequentist manner. The dot is at a "specific but unknown location".
    Last edited: Jul 5, 2013
  9. Jul 5, 2013 #8


    Science Advisor

    But does the breast cancer example have anything to do with Frequentist versus Bayesian ideas? If Number of people with cancer / Total number of people ≈ 0.01, then the prior is "objective", and doesn't necessarily require a Bayesian "subjective" interpretation.

    Most physicists are not "true" Frequentists or Bayesians. A short discussion on both concepts written by a physicist is http://pdg.lbl.gov/2012/reviews/rpp2012-rev-statistics.pdf, and a different group of physicists did their analysis both ways in http://arxiv.org/abs/astro-ph/9812133 . Concerning Bayesian analyses, the former remarks "Bayesian statistics supplies no unique rule for determining the prior ... this reflects the experimenter’s subjective degree of belief (or state of knowledge) ... before the measurement was carried out. For the result to be of value to the broader community, whose members may not share these beliefs, it is important to carry out a sensitivity analysis, that is, to show how the result changes under a reasonable variation of the prior probabilities."

    Usually a theory is accepted as useful as it makes more and more successful predictions, regardless of past confidence intervals. In principle, one can use Bayesian updating to incorporate all the additional data, but I'm not aware of anyone who really has an estimate of the probability that Maxwell's equations are correct.
    Last edited: Jul 5, 2013
  10. Jul 5, 2013 #9
    But I'm a little suspicious that even that minority would be able to resist the temptation. In any case, they would secretly know that if they choose the second interval, they can be all but sure they will leave with $50 000 in their pocket, whereas if they choose any of the other intervals, they will get nothing. From my understanding so far, it seems like there is no rational way to justify this certainty if one sticks to the frequentist philosophy. But perhaps I'm missing something.

    This is a bit off-topic but I'm wondering how many people would continue using frequentist statistical methods if they found out that the answers they get don't actually answer the questions they ask.

    If they agree that Bayes' theorem is a theorem of probability theory, shouldn't they also agree that one can assign probabilities to hypotheses? Or is it that they agree with this but just refuse to do it for philosophical reasons?

    Isn't it possible to justify that I picked it at random with the principle of indifference?
  11. Jul 5, 2013 #10
    Thank you for the links! I was actually curious if there was a discussion like this from the perspective of physics, so it sounds like a really interesting read.

    Do physicists ever use p-values and the like? Clearly, the most important theories were developed without any reference to them (and I don't think there was any need for that either). I am not a physicist and haven't really read any papers from physics journals, so can one find t-tests and F-tests in the average physics publication?
  12. Jul 5, 2013 #11


    Science Advisor

    Yes, p-values are used in physics. For example, http://arxiv.org/abs/1207.7235 and http://arxiv.org/abs/1207.7214 talk about "statistical significance". Fig. 13 of the former shows the data compared to the expected given their null hypothesis.
  13. Jul 5, 2013 #12
    You misunderstand the use of the term "subjective". The terms "objective probability" and "subjective probability" are used to describe philosophical positions on the nature of probabilistic inference; it doesn't mean that one party doesn't "believe" in the use of Bayes' theorem. Bayes' theorem is a theorem; it's true provided that you accept the Kolmogorov axioms of probability. It relates the probability of some number of occurrences to their conditional probabilities; it's going to come up when you start talking about conditional probability. In particular, it comes up whenever you ask a research question of the form "What is the probability that hypothesis X is true, given my data?" (frequentist methods, in general, don't ask these kinds of questions).

    The breast cancer example above requires the use of Bayes' theorem; there's no getting around it. In general, any time you start talking about conditional probabilities, you're going to run into Bayes' theorem pretty quickly.

    What's funny to me is that most researchers will happily assume that their data are normal, that the variances of all their groups are equal, and that all of their measurements are independent, etc. But assume a little bit of information about the prior probability of an outcome, and all of a sudden you're not doing "objective science" -- you're assuming too much.
  14. Jul 5, 2013 #13

    Stephen Tashi

    Science Advisor

    Well, that's an issue about the statistics of human behaviors, not about mathematics.

    That's another question about human behavior. As I've said, the common misinterpretation of a confidence interval amounts to assuming a uniform Bayesian prior. So if you take the Bayesian point of view, the common misinterpretation of the frequentist result is the correct answer. It's just the way they arrive at it that's wrong.

    I haven't read extensively about the history of statistics, but my impression is that widespread application of it is a relatively new development in human history. I think it got off the ground in the 1800s in England. The early practitioners would have sought to imitate the "purely objective" approach of other sciences. That would make the assignment of prior probability distributions a suspicious practice. I suppose different frequentists have different philosophical objections to Bayesian priors.

    Can you state that principle as an assumption or theorem of probability theory?

    Think of concrete examples. Suppose the mean [itex]\mu [/itex] of a random variable X is at a "definite but unknown" location. Suppose X is uniformly distributed on the interval [itex] [\mu - 1/2, \mu + 1/2] [/itex]. Generate an interval by the following random process. Pick the left endpoint [itex] a [/itex] of the interval at random from the distribution of X and let the interval be [itex] [a,\ a+1/4] [/itex].

    You can answer the question "What is the probability that the randomly selected interval contains the mean?" since this only involves knowing the distribution of X relative to the unknown mean [itex]\mu [/itex].

    But can you answer the question "What is the probability that the interval [itex][0.25, 0.50][/itex] contains the mean?"? If you could answer that without knowing a definite location for the mean, the answer would be the same number regardless of where the mean is located.
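    A simulation of this setup (the values of μ below are made up) confirms that the answer to the first question is 1/4 wherever the mean happens to sit:

```python
import random

# Simulation of the interval-generating process above: a is drawn
# uniformly from [mu - 1/2, mu + 1/2] and the interval is [a, a + 1/4].
# Coverage only depends on where a lands relative to mu, so the
# probability is 1/4 for any mu (the mu values below are made up).
def coverage(mu, trials=100_000):
    hits = 0
    for _ in range(trials):
        a = random.uniform(mu - 0.5, mu + 0.5)
        if a <= mu <= a + 0.25:
            hits += 1
    return hits / trials

for mu in (-3.0, 0.0, 7.5):
    print(mu, coverage(mu))  # each close to 0.25
```

    The second question, about the specific interval [0.25, 0.50], is different: once μ is fixed, that interval either contains it or it doesn't, and the simulation cannot say which without being told μ.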
  15. Jul 5, 2013 #14


    Science Advisor

    No, you assume Frequentists reject Bayes's theorem. That is untrue.
  16. Jul 5, 2013 #15
    What? When have I ever said such a ridiculous thing?
  17. Jul 5, 2013 #16


    Science Advisor

    Good, so maybe we agree, ie. that both Frequentists and Bayesians accept Bayes's theorem?
  18. Jul 5, 2013 #17
    What does this have to do with the discussion in this thread? When has anyone suggested that Frequentists somehow don't acknowledge Bayes' theorem?
  19. Jul 5, 2013 #18


    Science Advisor

    Here I thought you were trying to give an example in which Frequentist statistics fails:

  20. Jul 6, 2013 #19



    Staff: Mentor

    It's like rolling a die. In advance, you don't know which result will show up - but once you looked at the result, you know it, and the probability that you got a 6 is either 0 or 1.

    They do, but they laugh at 0.05, and often the p-values are translated to standard deviations of a (sometimes hypothetical) gaussian distribution. At least in particle physics, everything below 3σ (p<0.0014 single-sided) is not considered as significant, and it requires 5σ (p<0.0000003) to call it "observation".
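    Assuming the usual one-sided Gaussian tail convention, the σ ↔ p translation quoted here can be reproduced with the complementary error function:

```python
import math

# One-sided p-value for a z-sigma Gaussian excess:
# p = P(Z > z) = erfc(z / sqrt(2)) / 2.
def one_sided_p(z_sigma):
    return 0.5 * math.erfc(z_sigma / math.sqrt(2))

print(one_sided_p(3))  # ~0.00135, the 3-sigma threshold quoted above
print(one_sided_p(5))  # ~2.9e-7, the 5-sigma "observation" threshold
```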

    At least in astronomy and particle physics, you can do that.

    Bayes' theorem is a mathematical theorem, rejecting it is pointless. I think there are no pure frequentists and bayesians (as scientists) - just something in between, maybe with a tendency towards one side.
    You have to make decisions beyond frequentist statistics on a daily basis. Should you carry an umbrella around? Either it will rain or it will not today - but you need some estimate about the subjective probability to make a decision.
  21. Jul 6, 2013 #20

    Stephen Tashi

    Science Advisor

    One can debate whether probabilities exist as a part of physical reality. Whether anything collapses in any physical sense can be submerged in that debate!

    Putting the question of physical reality aside, looking at the result only changes the opinion of an observer who consistently applies the laws of probability.

    An interesting aspect of probability theory is that getting additional information can seem to destroy what you know about a situation.

    For example, think of a set of outcomes numbered by the integers 1, 2, ..., 20. (Don't assume all outcomes have the same probability.) Let A be the set {1,2,3,4,5,6,7,8,9,10} and let B be the set {6,7,8,9,10,11,12,13,14,15}.

    Suppose we are given that P(A|B) = 0.35. Does it help to be given more information?

    For example, what is P(A | the elements of B that are odd numbers)?

    Or what is P(A| {6,15})?

    Adding information to the "given" part makes the question more specialized.
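    With a concrete, entirely made-up non-uniform distribution on the outcomes 1..20, one can check that each extra restriction on the "given" part generally changes the answer:

```python
from fractions import Fraction

# A concrete, entirely made-up distribution on outcomes 1..20 for the
# sets above: A = {1..10}, B = {6..15}.  Weights are unnormalized and
# non-uniform (weight k for outcome k).
weights = {k: Fraction(k) for k in range(1, 21)}
A = set(range(1, 11))
B = set(range(6, 16))

def cond_prob(event, given):
    # P(event | given) = sum of weights in the overlap / sum in "given"
    num = sum(weights[k] for k in event & given)
    den = sum(weights[k] for k in given)
    return num / den

odd_B = {k for k in B if k % 2 == 1}
print(cond_prob(A, B))        # P(A|B) = 8/21
print(cond_prob(A, odd_B))    # P(A | odd elements of B) = 16/55
print(cond_prob(A, {6, 15}))  # P(A | {6,15}) = 2/7
```

    The three conditional probabilities come out different (8/21, 16/55, and 2/7 here), so knowing P(A|B) alone pins down neither of the more specialized ones.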

    As another example, suppose you have a process that generates 95% confidence intervals for the mean of a continuous distribution. Suppose your process generates an interval whose left hand endpoint has a "2" immediately after the decimal point. What is the probability that the interval contains the mean?

    I don't think you can answer that question. You don't have to be given the whole interval to be cast into ignorance.
    Last edited: Jul 6, 2013
  22. Jul 7, 2013 #21
    The thing is, I actually tried making a philosophical point about the statistical method itself. Clearly, it is irrational NOT to consider the second interval as much more likely than the other two. But if the theory underlying frequentist statistics says otherwise, there must be something wrong with it, it seems to me.

    Do I understand correctly that the conclusion I reached in the OP would be true, as long as I assume a uniform prior for H0?

    I basically view it as a lack of assumption.

    I am not sure I understand the example correctly. Do we know that X is uniformly distributed with some mean μ when doing the analysis? If so, then I like the conclusion that whatever the true mean actually is, it will have the same probability of being in the interval, since the interval we generated with that process is dependent on μ's location.
  23. Jul 7, 2013 #22
    But once I've seen the interval, I still don't know what the mean is, so the analogy isn't exactly correct. In my opinion the coin example I gave exactly captures the CI situation. Before I toss a coin, what is the probability that it will come up heads? It's 0.5. After I toss it, but before I see the result, what's the probability that it came up heads? Still 0.5.

    In both cases I know that the result was obtained but don't know what it is. Sure, I know what the CI is, but I actually care about estimating the mean, which is still as unknown as it was before I generated the CI.

    To be honest, I don't think the problem with NHST is the 0.05 convention. I've seen plenty of 0.000001 results which are in no way more interesting. From the conversation here I am left with the impression that the approach has more serious philosophical problems.

    But I'm just curious, how widespread is the use of p-values in physics? Is it really as widely used as in the social sciences? Somehow, I always thought that it wasn't.
  24. Jul 7, 2013 #23
    I certainly agree, but that happens according to the result. Meaning, depending on what the result actually is, we can increase, decrease, or not change our probability. But this isn't the case with the CI problem. Regardless of what the interval actually is, we cannot say that it has a probability of X of containing the mean, yet at the same time we know that a fraction X of the generated intervals will contain the mean. I just can't see the disconnect.
  25. Jul 7, 2013 #24



    Staff: Mentor

    You can use the multi-quote button to quote multiple posts in a single reply.

    That is in agreement with my post. I considered the situation after you looked.

    If the maths has been done right, and there were not thousands of different possible studies, it shows at least that the null hypothesis is probably wrong.
    In social sciences, I can imagine that this can be pointless - if the null hypothesis assumes no correlation, for example, as there is always some correlation.
    In physics, on the other hand, the null hypothesis is usually "the current laws of physics". Any deviation from that is extremely interesting.

    For the discovery of new particles or for deviations from the Standard Model, it is used frequently, together with confidence levels for the results. If there is no theory value for a comparison, it does not make sense to give a p-value, of course.
    Exoplanet searches use it, too.
    In other areas, I don't see it so frequently, but there I read fewer publications.
    Last edited by a moderator: May 6, 2017
  26. Jul 7, 2013 #25

    Stephen Tashi

    Science Advisor

    Your feeling about the matter is shared by many Bayesians, but there is nothing mathematically unsound about the frequentist analysis of the problem once you accept the frequentist way of posing the problem. If you don't accept that way of posing the problem and prefer another, that's a matter of philosophy and personal taste.

    By "H0"? I'm not talking about hypothesis testing; I'm talking about estimation. For many commonly used distributions, such as the Gaussian, suppose you assume a prior distribution for the mean that is uniform over "all plausible values" (e.g. if the population is people's heights, you could assume a uniform prior over 0 to 10 ft; there is no uniform distribution from minus infinity to infinity). Then suppose you take a sample of size N and compute a 95% confidence interval for the mean from it. The Bayesian posterior probability that the confidence interval around that particular sample mean also contains the population mean will be approximately 0.95, provided the sample mean was within the plausible range of values you used.

    I don't claim the probability of the population mean being in the interval is precisely 0.95. It's approximately 0.95, close enough so that people who reason by the wrong interpretation of frequentist statistics are getting approximately the correct answer from a Bayesian way of setting up the problem.
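    A sketch of that claim under the stated assumptions (Gaussian likelihood with known σ, prior effectively flat around the sample mean; all numbers are arbitrary): the posterior for the mean is then approximately N(x̄, σ/√n), so the posterior mass inside the frequentist 95% interval comes out near 0.95.

```python
import math

# Posterior mass inside the frequentist 95% CI, assuming a Gaussian
# likelihood with known sigma and a prior on the mean that is flat
# over a wide range: the posterior is then ~ N(xbar, sigma/sqrt(n)).
# xbar, sigma, n are arbitrary illustrative values.
def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

sigma, n = 2.0, 25
xbar = 10.3                       # some observed sample mean
se = sigma / math.sqrt(n)
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se

posterior_mass = normal_cdf(hi, xbar, se) - normal_cdf(lo, xbar, se)
print(posterior_mass)  # ~0.95
```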


    You might like the conclusion but it is mathematically incorrect.

    You're claiming that if C is a proper subset of B then P(A|B) = P(A|C) provided we select an outcome from a probability distribution defined on B. Think of a probability distribution on the elements of B. Pick an outcome at random from that distribution. (This process "generates" the sample.). The probability that the process generated an outcome that is in A is P(A|B). This doesn't imply that if the process happens to pick an outcome in C that P(A|C) = P(A|B).

    Let B be the set of confidence intervals and let A be the set of confidence intervals that contain the population mean, and assume P(A|B) = 0.95. Now take some proper subset C of the confidence intervals (such as one particular interval, or the intervals where the left endpoint has a "2" immediately after the decimal point). You are making the claim that P(A|C) = P(A|B).