Confidence Interval: Calculating Parameters of Populations


Discussion Overview

The discussion revolves around the interpretation and calculation of confidence intervals in statistics, particularly focusing on their relationship to population parameters and the assumptions underlying statistical methods. Participants explore the implications of confidence levels, the nature of statistical inference, and the properties of estimators.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants assert that a confidence interval, such as 9.99 ± 0.002, describes a parameter of the true population rather than indicating that 95% of samples will fall within this range.
  • Others clarify that the 95% confidence level means we have 95% confidence that the interval contains the parameter, not that 95% of samples will fall within this specific interval.
  • One participant discusses the relationship between confidence intervals and statistical significance, noting that certain values can be rejected based on the underlying random process.
  • Concerns are raised about the assumptions of normality in statistical methods, with some arguing that variance formulas do not depend on normality, while others emphasize the importance of distributional assumptions.
  • Participants debate the properties of estimators, particularly the sample variance and standard deviation, discussing biases and the implications of different distributions on these properties.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of confidence intervals and the assumptions underlying statistical methods. There is no consensus on the implications of normality for variance and standard deviation estimators, and the discussion remains unresolved regarding the nuances of these statistical concepts.

Contextual Notes

Limitations include the potential misunderstanding of statistical inference, the dependence on underlying distribution assumptions, and the unresolved nature of biases in estimators across different distributions.

ktran03
When we calculate a value at 95% confidence, say 9.99 \pm 0.002,

this value is describing a parameter of the true population, right?
It is not saying that 95% of the samples we take will fall in this range, correct?

We never infer anything about statistics, only parameters of populations, right?

I'm pretty sure that is correct; I just came across something that interprets this differently.
 
The 9.99 \pm 0.002 is an estimate of some parameter. The 95% is the confidence level, also called the confidence coefficient. We typically say that we have 95% confidence that the given confidence interval contains the parameter.

The construction of one of these intervals is but one use of a procedure that, in the long run, produces results that will fall around the parameter 95% of the time. It does not mean that 95% of the samples we take will fall inside this particular interval.
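This long-run coverage property can be checked by simulation. The sketch below uses hypothetical values (a true mean of 9.99, a known standard deviation, and a known-sigma interval) purely for illustration; none of these numbers come from the thread's actual experiment.

```python
import math
import random

random.seed(1)
TRUE_MU, SIGMA, N, TRIALS = 9.99, 0.01, 100, 2000
Z95 = 1.96  # two-sided 95% critical value for a normal distribution

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    half = Z95 * SIGMA / math.sqrt(N)  # known-sigma interval, for simplicity
    # Each trial produces a *different* interval; count how often
    # the interval happens to contain the fixed true parameter.
    if xbar - half <= TRUE_MU <= xbar + half:
        covered += 1

print(covered / TRIALS)  # close to 0.95
```

Note that it is the intervals that vary from trial to trial while the parameter stays fixed, which is exactly the "procedure covers the parameter 95% of the time" reading.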
 
Confidence intervals are closely allied with the concept of statistical significance. Suppose that a finite sample size experiment yields 9.99 as an estimate of some parameter. Whether the true value of that parameter is 9.991, 9.992, 9.993, or 200 thousand, we don't know.

That 200 thousand value can easily be rejected if one has any idea of the underlying random process: that the true parameter value is 200 thousand will not be credible at any reasonable level of significance. How about a true value of 9.993? In this case, we can reject the possibility that the true value of the parameter exceeds 9.99 + 0.002 at the 5% (100% - 95%) significance level.
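The duality between the interval and a significance test can be made concrete. The sketch below assumes the 0.002 half-width corresponds to 1.96 standard errors of a normal sampling distribution; that assumption, and the specific hypothesized values, are illustrative only.

```python
# Hypothetical numbers from the thread: estimate 9.99, 95% half-width 0.002
estimate, half_width = 9.99, 0.002
se = half_width / 1.96  # implied standard error, assuming a normal sampling distribution

def z_stat(hypothesized):
    """Standardized distance between the estimate and a hypothesized true value."""
    return abs(estimate - hypothesized) / se

# 9.993 lies outside the interval: |z| > 1.96, so reject at the 5% level
print(z_stat(9.993))
# 9.991 lies inside the interval: |z| < 1.96, so we cannot reject it
print(z_stat(9.991))
```

Values inside the 95% interval are exactly those that survive a two-sided test at the 5% level, which is the connection the post is describing.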
ktran03 said:
We never infer anything about statistics, only parameters of populations right?
I don't know what exactly you mean by this statement. However, we often infer many things about and from statistics. For example, one might ask "are these the right statistics?" Suppose you use the standard set of equations to compute a sample mean and standard deviation. These equations implicitly assume that the underlying data is normally distributed. What if it isn't? Those estimates might be very wrong if the random process is not normal. The collected data gives you some ammunition to test whether that assumption is correct. You can infer something from the statistics as well as about the statistics themselves from your collected data.

The purpose of statistical inference is to infer information about and from statistics.
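One crude way to "test the assumption" from the collected data itself is to look at the sample skewness and excess kurtosis, which are both near zero for normal data. This is only a rough moment-based diagnostic, not the formal tests a statistician would typically use; the two example distributions below are chosen arbitrarily for illustration.

```python
import random

random.seed(4)

def moments_check(xs):
    """Crude normality diagnostic: sample skewness and excess kurtosis.
    Both should be near 0 if the data came from a normal distribution."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

normal_data = [random.gauss(0, 1) for _ in range(20000)]
skewed_data = [random.expovariate(1.0) for _ in range(20000)]

print(moments_check(normal_data))  # both values near 0
print(moments_check(skewed_data))  # skewness near 2, excess kurtosis near 6
```

For an exponential distribution the population skewness is 2 and the excess kurtosis is 6, so even this rough check clearly flags the departure from normality.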
 
The situation is even more complicated: when you repeat the experiment, you will in general get both a different estimate (e.g. 9.97 instead of 9.99) and a different confidence interval (e.g. 9.97 \pm 0.003 instead of 9.99 \pm 0.002). Of all these different confidence intervals from different experiments, 95% will cover the true value.
 
"These equations implicitly assume that the underlying data is normally distributed. What if it isn't? Those estimates might be very wrong if the random process is not normal. "

The formula for variance doesn't assume normality at all - many distributions have a variance.

Second - nothing is truly normally distributed. The normal distribution, like every distribution, is an idealized description of population behavior. The "test about the assumption" really indicates whether things differ enough from this idealized behavior to make the assumption of normality a poor one.
 
statdad said:
The formula for variance doesn't assume normality at all - many distributions have a variance.
The simple expression for the sample variance,

s^2 = \frac 1 N \sum_{i=1}^N (x_i-\bar x)^2

is a biased estimator. Removing the bias leads to

s^2 = \frac 1 {N-1} \sum_{i=1}^N (x_i-\bar x)^2

But this is still a biased estimator of the standard deviation. The UMVU estimator for one distribution will not be the same as the UMVU estimator for another.
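The bias of the 1/N formula, and the effect of the 1/(N-1) correction, are easy to see by simulation. The true variance, sample size, and trial count below are arbitrary choices for illustration.

```python
import random

random.seed(2)
MU, SIGMA2, N, TRIALS = 0.0, 4.0, 5, 50000

biased_sum = unbiased_sum = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(MU, SIGMA2 ** 0.5) for _ in range(N)]
    xbar = sum(xs) / N
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / N          # 1/N version: systematically underestimates sigma^2
    unbiased_sum += ss / (N - 1)  # 1/(N-1) version: unbiased for sigma^2

# Expected values: (N-1)/N * 4 = 3.2 for the 1/N version, 4.0 for 1/(N-1)
print(biased_sum / TRIALS)
print(unbiased_sum / TRIALS)
```

The 1/N average settles near (N-1)/N times the true variance, which is exactly the bias the N-1 denominator removes; note this unbiasedness needs only finite second moments, not normality.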
 
s^2 is not an estimator of the standard deviation - you misspoke.

The sample variance (your second version) is unbiased for the population variance regardless of the distribution, as long as second moments exist. When you begin speaking about other properties then distributional assumptions may come forward: but that isn't the original comment: that comment was "the formula for variance doesn't assume normality at all"

Also, remember it isn't simply distributional assumptions: even with normality, if \mu is known s^2 is no longer UMVU for \sigma^2.
 
statdad said:
s^2 is not an estimator of the standard deviation - you misspoke.
Taking the square root of s^2 obviously provides an estimate of the standard deviation. Did I really need to spell that out? Taking the square root of this unbiased estimate of the variance leads to a biased estimate of the standard deviation.
 
I understand all this, and I said you simply misspoke. I simply pointed out that

1) The formulae for variance and standard deviation do not depend on the assumption of normality.
2) Properties of those estimators can change when the distributional assumptions change.

The result you state about the sample standard deviation not being an unbiased estimator of \sigma has nothing to do with normality either: it is because

E\left[\sqrt{s^2}\right] \ne \sqrt{E[s^2]}

This isn't a big problem since

\sqrt{\frac{n-1}2} \frac{\Gamma\left(\frac{n-1}2\right)}{\Gamma\left(\frac n 2\right)} s

is UMVU for \sigma in the normally distributed setting.
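Both claims, that s is biased low for \sigma and that the \Gamma-ratio factor removes the bias under normality, can be checked numerically. The true \sigma, sample size, and trial count below are arbitrary illustrative choices.

```python
import math
import random

random.seed(3)
SIGMA, N, TRIALS = 2.0, 10, 40000

def unbias_factor(n):
    # Multiplier that makes s unbiased for sigma under normality:
    # sqrt((n-1)/2) * Gamma((n-1)/2) / Gamma(n/2)
    return math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

s_sum = corrected_sum = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(0.0, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (N - 1))
    s_sum += s                             # plain s: biased low for sigma
    corrected_sum += unbias_factor(N) * s  # corrected estimator

print(s_sum / TRIALS)          # noticeably below 2.0
print(corrected_sum / TRIALS)  # near 2.0
```

The plain average of s lands below the true \sigma (by Jensen's inequality, since the square root is concave), while the corrected average recovers it; the correction factor itself depends on normality, which is why it is only claimed UMVU in the normal setting.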
 
