Interpretation of Confidence Intervals and the like

MadRocketSci2 · Apr 12, 2012

Statistics question: I'm having trouble understanding my statistics professor's objection to this interpretation of confidence intervals.

If you have some distribution with parameter x: X = Dist(x), and you perform a random experiment drawing N random variables from it, and derive from those some estimator y related to x, then I *want* to say the following, but my statistics prof insists it's an invalid interpretation:

If you have a space with the (unknown) parameter x and the statistic y, for each x there is a conditional probability distribution for the y that will be yielded from a draw of N variables. This leads to a joint probability distribution linking x and y. Now that you have a value for y, you are in a subspace of the space (y=whatever you got), and there is a distribution of probability for x. The probability that you find x within a certain range relates to this distribution.

My statistics prof objects that x is a specific value, not a random variable. While I understand that x *in fact* is either within or not within the interval, the best you can give with the imperfect information available is a finite probability. You have limited knowledge due to the statistic y, which is more than no knowledge and less than exact knowledge of what the parameter x is.

He insists you cannot make a probability statement about it, and that known confidence intervals don't give you any information about the underlying distribution, only future ones. He says that x either is or is not within any given interval, and that you cannot say anything more about it.

(I already understand the point about a certain proportion of future confidence interval draws containing the parameter).

I don't know why you can't do this. Can anyone try to enlighten me? Any perspectives you have might help me understand the error.

chiro · Apr 12, 2012

I think your professor is correct.

If you really don't know the distribution (which is what happens in most cases), then you can't really say what the distribution is for an unknown process.

But let's say that you absolutely know that your process is constrained to always be of that distribution, but your parameters are unknown.

If you construct a confidence interval you need to be aware in hypothesis testing that there are four probabilities to consider: P(H1 True| Interval), P(H1 False| Interval), P(H0 True | Interval) and P(H0 False | Interval), where H1 and H0 are alternative and null hypotheses respectively.

The thing is that we can get a false interpretation, which means we can't say that the value has to lie in a given interval with such-and-such probability. It's a very subtle thing, and depending on how you say it, some people will interpret it one way and some another. To understand what I am saying, think about the four situations above involving H0 and H1, and in particular look at Type I and Type II errors.

Also to understand it a bit clearer, let's say you play a game called the 'trust game'.

It's very simple: you have to figure out after N questions and N observations of checking the answers whether someone is telling the truth.

Now let's say you do it 100 times. Everything is true. Now 10,000 times. Again all true. Now 1,000,000 times. All true. But then on the 1,000,001st time the person lies. Although the guy has told the truth a million times, he is still not entirely truthful, and the premise that he is truthful is now false.

This is the kind of thing that you have to be aware of in statistics. More data will make the conclusion stronger, but it won't guarantee it. If you fall into the trap of assuming that a huge number of data points (the 1,000,000 observations above) proves that the guy is truthful, then you are setting yourself up for some very bad interpretations and reasoning.

Stephen Tashi · Apr 13, 2012

MadRocketSci2 said:

If you have a space with the (unknown) parameter x and the statistic y, for each x there is a conditional probability distribution for the y that will be yielded from a draw of N variables. This leads to a joint probability distribution linking x and y.

It doesn't lead to a joint probability distribution of x and y, since you only know a conditional density that tells you P(Y = y | X = x). You could get a joint density if you also had a density that gave P(X = x) unconditionally. There is a type of statistics that involves assuming such information, which is called a "prior distribution" for X. That type of statistics is Bayesian statistics. The type of statistics that your professor is teaching is "frequentist" statistics, and he is correct that confidence intervals cannot be interpreted as giving a probability that the parameter X is in a particular interval. (The type of interval that does this, in Bayesian statistics, is called a "credible interval".)
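To make the role of the prior concrete, here is a rough sketch in Python (standard library only). It assumes the thread's example, the sample mean of n normal draws with known sigma, and restricts x to a coarse grid so that sums replace integrals. The function names, the grid, and both priors are made up for illustration. The same observed y gives different probabilities for "x is in [0.5, 1.5]" depending on which prior is assumed.

```python
from statistics import NormalDist

def posterior_on_grid(x_grid, prior_weights, y_obs, sigma, n):
    """Posterior P(X = x | Y = y_obs) over a discrete grid of candidate x.

    Y is the sample mean of n draws from Normal(x, sigma), so
    Y | X = x is Normal(x, sigma / sqrt(n)).
    """
    se = sigma / n ** 0.5
    # Joint weight = f(y | x) * P(X = x). Without a prior there is
    # nothing to multiply the conditional density by.
    joint = [NormalDist(x, se).pdf(y_obs) * p
             for x, p in zip(x_grid, prior_weights)]
    total = sum(joint)  # marginal weight of the observed y
    return [j / total for j in joint]

def mass_between(x_grid, weights, lo, hi):
    """Total probability assigned to grid points with lo <= x <= hi."""
    return sum(w for x, w in zip(x_grid, weights) if lo <= x <= hi)

# Hypothetical setup: candidate means from -5.0 to 5.0 in steps of 0.1.
x_grid = [i / 10 for i in range(-50, 51)]

# Assumed prior 1: flat over the grid.
flat = [1 / len(x_grid)] * len(x_grid)

# Assumed prior 2: concentrated around x = 2.
raw = [NormalDist(2.0, 0.5).pdf(x) for x in x_grid]
raw_total = sum(raw)
peaked = [r / raw_total for r in raw]

y_obs, sigma, n = 1.0, 1.0, 4
post_flat = posterior_on_grid(x_grid, flat, y_obs, sigma, n)
post_peaked = posterior_on_grid(x_grid, peaked, y_obs, sigma, n)

mass_flat = mass_between(x_grid, post_flat, 0.5, 1.5)
mass_peaked = mass_between(x_grid, post_peaked, 0.5, 1.5)
print("P(0.5 <= x <= 1.5 | y), flat prior:  ", round(mass_flat, 3))
print("P(0.5 <= x <= 1.5 | y), peaked prior:", round(mass_peaked, 3))
```

Both posteriors sum to 1, but they disagree about the interval, which is why the conditional density alone cannot give "the" probability that x lies in it.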

Most commonsense people ask questions like "What is the probability that my idea is true given the data?" or "What is the probability that the parameter is in this interval, given the data?". If you can cut through the terminology of frequentist statistics, you find that these questions are never answered. Instead you get numbers that quantify the probability of the data, given the assumption that a hypothesis is true or that a parameter is somewhere specific. It's the difference between the probability of A given B versus the probability of B given A.
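As an illustration of what the frequentist number does describe, here is a small simulation sketch (Python, standard library only). It assumes normal data with known sigma and a fixed true mean; the function name `coverage` and all parameter values are hypothetical. The parameter never changes, the interval is what varies from experiment to experiment, and the 95% refers to how often the procedure captures the parameter.

```python
import random
from statistics import NormalDist, fmean

def coverage(true_mean, sigma, n, level=0.95, trials=20000, seed=0):
    """Fraction of simulated experiments whose confidence interval
    for the mean (known sigma) contains the fixed true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # One experiment: draw n values and compute the sample mean.
        ybar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if ybar - half_width <= true_mean <= ybar + half_width:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Assumed example values, not taken from the thread.
    print("empirical coverage:", coverage(true_mean=3.0, sigma=2.0, n=10))
```

The result should land close to the nominal level. Note that this is a statement about the probability of the data (and hence the interval) given x, not about x given one particular interval.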

MadRocketSci2 · Apr 13, 2012

Thanks, Stephen, chiro.

That clears a few things up. I forgot about the prior probability of P(X=x) which is necessary to construct the joint distribution.

Without making some assumption there, you can't get there from here. That's probably what my professor was getting at.

In the specific example we were working through (normal distributions with a known variance, and the sample mean), a uniform prior probability was what I was unconsciously assuming. Now that I think about it, for general statistics and general parameters, this might not always work.

If you were to choose a prior probability distribution to assume, you need a distribution such that integral(f(x,y)dx | y) = 1 for all y. (or sums if discrete) This places constraints on P(X=x), right? Not all prior fx(x) satisfy this, it would seem, and so you aren't completely free in your choice of prior probability? (If a choice meeting this constraint exists in the first place!)

Stephen Tashi · Apr 13, 2012

MadRocketSci2 said:

... a uniform prior probability was what I was unconsciously assuming. Now that I think about it, for general statistics and general parameters, this might not always work.

If you were thinking about a uniform probability over all real numbers, this would never work, since there cannot be such a distribution. However, it is possible to assume a uniform probability over the interval [-L,L] where L is a large number. From that you can compute a credible interval. It is (in your example) also possible to take the limit of this answer (as a function of L) as L approaches infinity. What you get, as I recall, is a Bayesian "credible interval" for the mean that is exactly the same numerical interval as the frequentist "confidence interval" - only now the interval has the interpretation that you want. Of course, taking such a limit raises interesting philosophical questions.
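A quick numerical check of that limit, as a sketch under the thread's assumptions (normal data, known sigma, sample mean ybar). With a uniform prior on [-L, L], the posterior for the mean is a Normal(ybar, sigma/sqrt(n)) density truncated to [-L, L], so the posterior probability of the usual confidence interval can be written with normal CDFs. The function name and the numbers are made up for illustration.

```python
from statistics import NormalDist

def posterior_mass_in_ci(ybar, sigma, n, L, level=0.95):
    """Posterior probability, under a uniform prior on [-L, L], that the
    mean lies inside the usual known-sigma confidence interval."""
    se = sigma / n ** 0.5
    untruncated = NormalDist(ybar, se)  # posterior shape before truncation
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = ybar - z * se, ybar + z * se
    # The prior puts zero mass outside [-L, L], so clip the interval.
    lo_c, hi_c = max(lo, -L), min(hi, L)
    if hi_c <= lo_c:
        return 0.0
    # Normalizing constant of the truncated posterior.
    norm = untruncated.cdf(L) - untruncated.cdf(-L)
    return (untruncated.cdf(hi_c) - untruncated.cdf(lo_c)) / norm

if __name__ == "__main__":
    # Assumed example: ybar = 1, sigma = 1, n = 4.
    for L in (1.2, 2.0, 5.0, 50.0):
        print(L, posterior_mass_in_ci(1.0, 1.0, 4, L))
```

For small L the truncation matters and the posterior mass of the interval differs from 0.95. As L grows, it settles at the nominal level, which matches the recollection above for this particular model.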

MadRocketSci2 said:

If you were to choose a prior probability distribution to assume, you need a distribution such that integral(f(x,y)dx | y) = 1 for all y. (or sums if discrete) This places constraints on P(X=x), right? Not all prior fx(x) satisfy this, it would seem, and so you aren't completely free in your choice of prior probability?

You'll find that almost any legitimate probability density for X will satisfy that in real-life problems. You do need the density f(y|x) to exist for all x where the prior is non-zero.

The question is what prior density can a researcher assume without being accused of "fixing" the outcome of his statistical tests. Sometimes there is actual prior data about X and you just fit a prior distribution to it. People study which mathematical families of prior distributions give results that are easily worked with and use these families. (Look up "conjugate prior distribution".) A more philosophical approach called "The Maximum Entropy Principle" (advocated by William Jaynes, some of whose writings you can find online) is to assume a prior distribution that has maximum entropy subject to other constraints that the researcher knows about X.
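For the conjugate case, here is a minimal sketch assuming the thread's example (normal data with known sigma) together with a normal prior on the mean, which is the standard conjugate pairing for that model. The prior parameters m0 and s0 and the function names are hypothetical. The posterior is again normal, with precisions (inverse variances) adding.

```python
from statistics import NormalDist

def normal_posterior(m0, s0, ybar, sigma, n):
    """Posterior for the mean given a Normal(m0, s0) prior and the sample
    mean ybar of n draws from a normal with known sigma."""
    prior_prec = 1 / s0 ** 2      # precision of the prior
    data_prec = n / sigma ** 2    # precision contributed by the data
    post_prec = prior_prec + data_prec
    # Precision-weighted average of prior mean and sample mean.
    post_mean = (prior_prec * m0 + data_prec * ybar) / post_prec
    return NormalDist(post_mean, post_prec ** -0.5)

def credible_interval(post, level=0.95):
    """Central credible interval of a normal posterior."""
    tail = (1 - level) / 2
    return post.inv_cdf(tail), post.inv_cdf(1 - tail)

if __name__ == "__main__":
    # Assumed numbers: informative prior vs. a very wide prior.
    informative = normal_posterior(m0=0.0, s0=1.0, ybar=1.0, sigma=1.0, n=4)
    wide = normal_posterior(m0=0.0, s0=1e6, ybar=1.0, sigma=1.0, n=4)
    print("informative prior:", credible_interval(informative))
    print("very wide prior:  ", credible_interval(wide))
```

With the very wide prior, the credible interval is numerically almost the same as the frequentist interval ybar ± 1.96·sigma/sqrt(n), while the informative prior pulls it toward m0.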
