la6ki
Hi all. I've been thinking about this question a lot for the past few days, and it seems to me that I'm making a mistake somewhere along the way, but I can't figure out where. Here's the interpretation I've encountered most frequently and think is the right one (here's the Wiki version):
The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter x% of the time."
Here's what I derive from this statement. Let's say that we set x=95. This means that if I keep sampling from a population whose true mean is μ and I repeat the procedure for obtaining the 95% confidence interval, say, 100 billion times, about 95 billion of those intervals will include μ. Now imagine that you put all those hypothetically calculated 100 billion confidence intervals in a giant imaginary bag. If I know that I'm about to sample from the population and calculate the confidence interval based on the sample mean and the standard error, this is equivalent to randomly drawing one confidence interval from the giant bag. I know that 95% of the confidence intervals inside the bag cover the (unknown to me) population mean μ, so the probability of the confidence interval I randomly picked from the bag covering μ is 95%.
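The "giant bag" thought experiment above can be sketched as a quick simulation. This is a minimal sketch under assumptions that are not in the original post: a normal population with known σ, so the 95% interval is x̄ ± 1.96·σ/√n. The values of `mu`, `sigma`, `n` and the number of repetitions are arbitrary, and the function names are hypothetical.

```python
import math
import random
import statistics


def z_interval(sample, sigma, z=1.959964):
    """95% CI for the mean, ASSUMING a normal population with known sigma."""
    xbar = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return xbar - half_width, xbar + half_width


def coverage(mu=10.0, sigma=2.0, n=25, reps=20_000, seed=1):
    """Fraction of repeated-sample intervals (the 'bag') that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample and build its interval: one draw from the bag.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        if lo <= mu <= hi:
            hits += 1
    return hits / reps
```

`coverage()` should return a value close to 0.95. Note that the 95% is a property of the whole bag: each individual interval in it either contains `mu` or does not.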
After calculating my 95% CI, I reason like this.
1. Since there is a 95% chance that this interval covers μ, there is a 95% chance that μ's true value is one of the values inside the CI (call this statement p).
2. The complementary statement, not-p ("μ's true value is not one of the values inside the CI"), must therefore have a probability of 5%.
3. Therefore, the probability that any value outside of the CI is the true population mean is at most 5%.
4. Therefore, if the value associated with the null hypothesis lies outside of the CI, we can say there is at most a 5% chance that the null hypothesis is true.
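One way to check numerically that the conclusion in step 4 cannot hold in general is to simulate many hypothetical studies and count, among those whose 95% CI excludes the null value, how often the null was actually true. Everything here is an assumption made for illustration: a normal population with known σ, a null value `mu0`, a single alternative mean `mu_alt`, and a made-up base rate `p_null` for how often the null is true.

```python
import math
import random


def frac_null_true_given_outside(p_null=0.9, mu0=0.0, mu_alt=0.4,
                                 sigma=1.0, n=25, studies=200_000,
                                 z=1.959964, seed=2):
    """Among simulated studies whose 95% CI excludes mu0, return the fraction
    in which mu0 really was the true mean (all settings hypothetical)."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    outside = 0           # studies whose CI excludes mu0
    outside_and_null = 0  # ...of which the null was actually true
    for _ in range(studies):
        null_true = rng.random() < p_null
        mu = mu0 if null_true else mu_alt
        # For a normal population with known sigma, the sample mean is
        # exactly N(mu, se), so we draw it directly instead of n points.
        xbar = rng.gauss(mu, se)
        lo, hi = xbar - z * se, xbar + z * se
        if not (lo <= mu0 <= hi):
            outside += 1
            if null_true:
                outside_and_null += 1
    return outside_and_null / outside
```

With these made-up settings the result should come out near 0.47 (about 0.466 analytically), far above 5%; with `p_null=0.5` it drops to roughly 0.09. The same 95% procedure gives very different answers depending on inputs the computed interval itself does not contain.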
Now, I know the last statement is dead wrong. I'm quite aware of it and I don't need any convincing. But I keep looking back at the logical steps I took and I just can't figure out where I'm making a mistake.
I'm already quite confused, so please only respond if you're an expert in the field and/or you are very confident in your response. Don't respond based only on intuition because that would confuse me even more :)