Confidence interval:

Main Question or Discussion Point

When we calculate a value at, say, a 95% confidence level, we get something like
$$9.99 \pm 0.002$$

This value is describing a parameter of the true population, right? It is not saying that if we take samples repeatedly, 95% of the time they will fall in this range, correct?

We never infer anything about statistics, only parameters of populations, right?

I'm pretty sure that is correct; I just came across something that interprets this differently.

Homework Helper
The $$9.99 \pm .002$$ is an estimate of some parameter. The 95% is the confidence level, also called the confidence coefficient. We typically say that we have 95% confidence that the given confidence interval contains the parameter.

The construction of one such interval is but one use of a procedure that, in the long run, produces intervals that will contain the parameter 95% of the time. It does not mean that, 95% of the time we take samples, the results will fall inside this particular interval.
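The long-run interpretation above can be checked with a quick simulation (a sketch with assumed, arbitrary population parameters, not numbers taken from this thread): build a 95% t-interval for the mean from each of many independent samples and count how often the interval covers the true mean.

```python
# Sketch: empirical coverage of the 95% t-interval for a normal mean.
# TRUE_MEAN and TRUE_SD are hypothetical choices for illustration.
import random
import statistics
import math

random.seed(0)
TRUE_MEAN, TRUE_SD = 9.99, 0.01
N, TRIALS = 30, 2000
T_CRIT = 2.045  # approximate 97.5% quantile of t with 29 df

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    half = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    if m - half <= TRUE_MEAN <= m + half:
        covered += 1

print(covered / TRIALS)  # close to 0.95
```

Each simulated interval is different, but the *procedure* covers the true mean in roughly 95% of repetitions; that is what the confidence coefficient describes.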

D H
Staff Emeritus
Confidence intervals are closely allied with the concept of statistical significance. Suppose that a finite sample size experiment yields 9.99 as an estimate of some parameter. Whether the true value of that parameter is 9.991, 9.992, 9.993, or 200 thousand, we don't know.

That 200 thousand value can easily be rejected if one has any idea of the underlying random process: a true parameter value of 200 thousand will not be credible at any reasonable level of significance. How about a true value of 9.993? In this case, we can reject the possibility that the true value of the parameter exceeds 9.99 + 0.002 at the 5% (100% - 95%) significance level.

We never infer anything about statistics, only parameters of populations right?
I don't know exactly what you mean by this statement. However, we often infer many things about and from statistics. For example, one might ask "are these the right statistics?" Suppose you use the standard set of equations to compute a sample mean and standard deviation. These equations implicitly assume that the underlying data are normally distributed. What if they aren't? Those estimates might be very wrong if the random process is not normal. The collected data give you some ammunition to test whether that assumption is correct. You can infer something from the statistics as well as about the statistics themselves.

The purpose of statistical inference is to infer information about and from statistics.
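As a minimal sketch of testing the normality assumption from the data themselves, here is a hand-rolled Jarque-Bera test (an assumed choice of test, not one named in the thread): JB = n/6 · (S² + K²/4), where S is the sample skewness and K the excess kurtosis; under normality JB is approximately chi-square with 2 degrees of freedom, so JB > 5.99 rejects normality at the 5% level.

```python
# Sketch: Jarque-Bera normality check from sample moments.
import random

def jarque_bera(xs):
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3  # excess kurtosis
    return n / 6 * (skew ** 2 + kurt ** 2 / 4)

random.seed(1)
normal_data = [random.gauss(0, 1) for _ in range(500)]
skewed_data = [random.expovariate(1) for _ in range(500)]

print(jarque_bera(normal_data))  # typically small under normality
print(jarque_bera(skewed_data))  # large: normality rejected
```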

DrDu
The situation is even more complicated: when you repeat the experiment, you will in general get both a different estimate (e.g. 9.97 instead of 9.99) and a different confidence interval (e.g. $$9.97 \pm 0.003$$ instead of $$9.99 \pm 0.002$$). Of all these different confidence intervals from different experiments, 95% will cover the true value.

Homework Helper
"These equations implicitly assume that the underlying data is normally distributed. What if it isn't? Those estimates might be very wrong if the random process is not normal. "

? The formula for variance doesn't assume normality at all - many distributions have a variance.

Second, nothing is truly normally distributed: the normal distribution, like every distribution, is an idealized description of population behavior. The "test of the assumption" really indicates whether the data differ enough from this idealized behavior to make the assumption of normality a poor one.

D H
Staff Emeritus
The formula for variance doesn't assume normality at all - many distributions have a variance.
The simple expression for the sample variance,

$$s^2 = \frac 1 N \sum_{i=1}^N (x_i-\bar x)^2$$

is a biased estimator. Removing the bias leads to

$$s^2 = \frac 1 {N-1} \sum_{i=1}^N (x_i-\bar x)^2$$

But this is still a biased estimator of the standard deviation. The UMVU estimator for one distribution will not be the same as the UMVU estimator for another.
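The biases discussed here can be illustrated by simulation (a sketch with assumed parameters): average the 1/N variance estimator, the 1/(N−1) estimator, and the square root of the latter over many small normal samples, and compare with the true σ² and σ.

```python
# Sketch: bias of 1/N variance, unbiasedness of 1/(N-1) variance,
# and bias of sqrt(unbiased variance) as an estimator of sigma.
import random
import math

random.seed(0)
SIGMA, N, TRIALS = 2.0, 5, 200_000

sum_biased = sum_unbiased = sum_sd = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(0, SIGMA) for _ in range(N)]
    m = sum(xs) / N
    ss = sum((x - m) ** 2 for x in xs)
    sum_biased += ss / N
    sum_unbiased += ss / (N - 1)
    sum_sd += math.sqrt(ss / (N - 1))

print(sum_biased / TRIALS)    # near sigma^2 * (N-1)/N = 3.2: biased low
print(sum_unbiased / TRIALS)  # near sigma^2 = 4.0: unbiased
print(sum_sd / TRIALS)        # below sigma = 2.0: sqrt(s^2) is biased low
```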

Homework Helper
$$s^2$$ is not an estimator of the standard deviation - you misspoke.

The sample variance (your second version) is unbiased for the population variance regardless of the distribution, as long as second moments exist. When you begin speaking about other properties, distributional assumptions may come forward; but that wasn't the original comment, which was "the formula for variance doesn't assume normality at all."

Also, remember it isn't simply distributional assumptions: even with normality, if $$\mu$$ is known $$s^2$$ is no longer UMVU for $$\sigma^2$$.

D H
Staff Emeritus
$$s^2$$ is not an estimator of the standard deviation - you misspoke.
Taking the square root of $$s^2$$ obviously provides an estimate of the standard deviation. Did I really need to spell that out? Taking the square root of this unbiased estimate of the variance leads to a biased estimate of the standard deviation.

Homework Helper
I understand all this, and I said you simply misspoke. I simply pointed out that

1) The formulae for variance and standard deviation do not depend on the assumption of normality.
2) Properties of those estimators can change when the distributional assumptions change.

The result you state about the sample standard deviation not being an unbiased estimator of $$\sigma$$ has nothing to do with normality either: it is because

$$E\bigg[\sqrt{s^2}\bigg] \ne \sqrt{E[s^2]}$$

This isn't a big problem since

$$\sqrt{\frac{n-1}2} \frac{\Gamma\left(\frac{n-1}2\right)}{\Gamma\left(\frac n 2\right)} s$$

is UMVU for $$\sigma$$ in the normally distributed setting.
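As a sketch with assumed parameters, one can check by simulation that the correction factor $$\sqrt{(n-1)/2}\,\Gamma\left(\frac{n-1}{2}\right)/\Gamma\left(\frac{n}{2}\right)$$ removes the bias of s under normality, while plain s underestimates σ:

```python
# Sketch: Gamma-factor correction makes s unbiased for sigma
# when the data are normal.
import random
import math

random.seed(0)
SIGMA, N, TRIALS = 1.0, 5, 200_000
c = math.sqrt((N - 1) / 2) * math.gamma((N - 1) / 2) / math.gamma(N / 2)

sum_s = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(0, SIGMA) for _ in range(N)]
    m = sum(xs) / N
    sum_s += math.sqrt(sum((x - m) ** 2 for x in xs) / (N - 1))

print(sum_s / TRIALS)      # below sigma = 1: s is biased low
print(c * sum_s / TRIALS)  # close to 1 after the correction
```

Note that the correction constant depends on the normality assumption: for a different underlying distribution, a different factor (and a different UMVU estimator) would be needed.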