Statistics - Confidence interval

Homework Help Overview

The problem involves constructing a 95% confidence interval for the standard deviation σ based on a sample of 50 balls drawn from a bucket, where 9 of the balls are red. The context includes statistical concepts such as hypothesis testing and the application of different probability distributions, specifically the normal and binomial distributions.

Discussion Character

  • Exploratory, Assumption checking, Problem interpretation

Approaches and Questions Raised

  • Participants discuss the expected number of red balls and the calculation of the null hypothesis. There are attempts to use a reference variable for constructing confidence intervals, with some questioning the appropriateness of the normal distribution in this context. Others explore the transition from normal to binomial distribution and the implications of sample size on the choice of distribution.

Discussion Status

The discussion is ongoing, with participants providing calculations and questioning the validity of assumptions regarding the distribution of the balls. Some participants have offered alternative perspectives on the use of the binomial distribution and its variance formula, while others are exploring the implications of the sample size on the statistical model.

Contextual Notes

Participants note the potential for confusion regarding the assumptions about the total number of balls in the bucket and how this affects the choice between binomial and hypergeometric distributions. There is also mention of the need for clarity on the definition of the probability of drawing red balls and how it relates to the overall population of balls.

nossren

Homework Statement


Suppose you have a bucket containing a lot of balls with different colors. You randomly pick 50 balls, 9 of which are red (X = 9, where X ~ N(μ, σ²)). The probability of picking a red ball is 15%. From this you want to construct a 95% confidence interval for the standard deviation σ and do a hypothesis test.
$$
\begin{align}
X &= 9 \\
\mu &= 7.5 \\
\sigma^* & \approx 0.581 \\
\alpha &= 0.05 \\
H_0: \sigma &= \sigma^* \\
H_1: \sigma &\neq \sigma^*
\end{align}
$$

Homework Equations


$$
\begin{align}
V(X) &= E[(X-\mu)^2] \\
D(X) &= \sqrt{V(X)}
\end{align}
$$

The Attempt at a Solution


The expected number of red balls per 50 balls, μ, ought to be 0.15*50 = 7.5. I estimated σ as σ* (above) to obtain a null hypothesis to test. Then I tried using a reference variable ##R = \frac{X-\mu}{\sigma} \sim N(0,1)## and putting
$$
1-\alpha = P(-\lambda_{\alpha/2} < R < \lambda_{\alpha/2}) = P(-1.96 < \frac{X-\mu}{\sigma} < 1.96) \Rightarrow I = \left(\frac{X-\mu}{\sigma} \pm 1.96\right)
$$
but this doesn't seem to make any sense. Is there another reference variable/distribution I can use? I tried the t-distribution, but with a single observation it leads to division by 0 due to the N-1 denominator in the sample standard deviation.
 
Where do you get the value ##\sigma^* \doteq 0.581##? This is wrong.
 
I redid the calculation using the definition
$$
\sqrt{V(X)} = \sqrt{\sum_k (k-\mu)^2p(k)} = \sqrt{(9-7.5)^2\cdot 0.149} \approx 0.579
$$
 
If you use the binomial distribution for ##X## there is a standard formula for the variance; look it up. It gives results much different from yours.
 
The variance for X is then, according to my book, ##V(X) = nqp = 50\cdot(1-0.15)\cdot 0.15##. How can I justify going from the normal distribution (N) to the binomial (Bin)?

edit: p was supposed to be 0.15, mixed it up with another exercise
 
Justification depends on the "model". When the problem states that the probability of drawing a red is 15% (without giving other details) you more-or-less have to assume that the same 15% applies to the first, second, third,..., 50th balls. Then, if the drawings are independent, you get the Binomial distribution for sure.

However, if the 15% figure really means that 15% of the balls are red, then whether or not a binomial is good depends on the size of the ball population. For example, if there are only slightly more than 50 balls altogether, then the initial drawing of some red balls changes the red percentage in later draws, and so you do not get the binomial; instead, you get the so-called hypergeometric distribution. The variance formula is a bit more complicated, and depends explicitly on the total ball population size, N. However, if N is much larger than 50 the binomial distribution is a good approximation, becoming exact in the limit ##N \to \infty##. Exactly how large N should be and how good the approximation is can be studied numerically, by comparing the binomial and hypergeometric results.
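
The numerical comparison suggested here could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the bucket is taken to hold ##R = 0.15N## red balls (rounded to an integer), and the helper names `binom_pmf`, `hypergeom_pmf` and `max_pmf_gap` are made up for this example. It prints the largest pointwise difference between the two probability mass functions for a few population sizes.

```python
from math import comb

n, p = 50, 0.15  # sample size and red-ball probability from the problem


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, N, R, n):
    """P(X = k) when n balls are drawn without replacement from N balls, R of them red."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes get probability 0 automatically.
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)


def max_pmf_gap(N):
    """Largest |hypergeometric - binomial| probability over k = 0..n."""
    R = round(p * N)  # assumption: 15% of the N balls are red
    return max(
        abs(hypergeom_pmf(k, N, R, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )


for N in (60, 100, 1000, 10_000, 100_000):
    print(f"N = {N:>6}: max pmf difference = {max_pmf_gap(N):.5f}")
```

The gap should shrink as N grows, which is one way to judge how large N has to be before the binomial model is acceptable for a given tolerance.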
 
Ray Vickson said:
However, if N is much larger than 50 the binomial distribution is a good approximation [...]

Yes, the number of balls in the "bucket" can be assumed to tend towards infinity, therefore the probability is constant. However, what I have learned is that when you have a sample with distribution N(μ, σ²) you want to construct a reference variable with some distribution ##N(0,1),\ t(n-1),\ \chi^2(n-1)## (depending on what is given), in order to construct a confidence interval using the quantiles.

The "model" in this case is basically: 50 balls are drawn simultaneously, 9 of them turned out to be red and the red ball mean for 50 balls is 7.5 (expected value). Is it necessary to get into binomial distribution in order to get a confidence interval for σ?
 
If the distribution is binomial you do not need a "confidence interval" for ##\sigma##; you just compute it from the formula. After all, if you are entitled to say ##\mu = 0.15 \times 50 = 7.5## you are also entitled to say ##\sigma^2 = 0.15 \times 0.85 \times 50 = 6.375##. In fact, for the binomial it makes no sense at all to even speak of a confidence interval for ##\sigma##.
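
The arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the standardized value `z` at the end is only a descriptive extra (how far the observed count 9 lies from the mean in units of ##\sigma##), not something computed in the thread.

```python
from math import sqrt

n, p = 50, 0.15          # sample size and probability of red, as given
x_obs = 9                # observed number of red balls

mu = n * p               # binomial mean
var = n * p * (1 - p)    # binomial variance n p q
sigma = sqrt(var)        # binomial standard deviation

# Descriptive only: distance of the observation from the mean, in units of sigma.
z = (x_obs - mu) / sigma

print(f"mu = {mu}, variance = {var:.3f}, sigma = {sigma:.3f}, z = {z:.3f}")
```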

It is difficult to see how to make any sense of the question, but one possibility might be to take the hypergeometric case; that is, the bucket contains ##N## balls (where ##N \geq 50## is unknown). Somehow you know that the number ##R## of red balls in the bucket is ##R = 0.15 N##; the other ##N-R## balls are not red. You draw ##n = 50## balls (without replacement) from the bucket and observe that ##k = 9## are red. If ##X## = number of reds in the sample, ##X## has a hypergeometric distribution. While the expected value of ##X## is still given by ##EX = 0.15 \times 50 = 7.5##, the variance does, in fact, depend on the unknown ball population ##N##:
$$
\text{Var}(X) = n p (1-p)\, \frac{N-n}{N-1}
$$
where ##p = R/N = 0.15##. (See, e.g., http://en.wikipedia.org/wiki/Hypergeometric_distribution .)
Presumably, you can use the observation ##X = 9## to cook up a maximum-likelihood estimation of ##N## and find some type of probable interval for ##N##. Then you could translate that ##N##-interval into a ##\sigma##-interval. However, that all seems far-fetched to me, and so I continue to be baffled by what on Earth the question could possibly mean.
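
To see how strongly ##\sigma## depends on ##N## in this hypergeometric reading, the variance formula above can simply be evaluated for several population sizes. The following is a minimal Python sketch: `hypergeom_sd` is a hypothetical helper, and the bounds `N_lo` and `N_hi` are made-up placeholders standing in for whatever interval for ##N## one might obtain, not estimates derived from the data.

```python
from math import sqrt


def hypergeom_sd(N, n=50, p=0.15):
    """Standard deviation of the number of reds: sqrt(n p (1-p) (N-n)/(N-1))."""
    if N < n:
        raise ValueError("population must contain at least n balls")
    return sqrt(n * p * (1 - p) * (N - n) / (N - 1))


for N in (50, 60, 100, 1000, 100_000):
    print(f"N = {N:>6}: sigma = {hypergeom_sd(N):.3f}")

# sigma increases with N, so an interval for N maps endpoint-to-endpoint
# onto an interval for sigma. These bounds are placeholders, not estimates.
N_lo, N_hi = 60, 500
print(f"sigma between {hypergeom_sd(N_lo):.3f} and {hypergeom_sd(N_hi):.3f}")
```

For very large N the value approaches the binomial ##\sqrt{6.375}##, consistent with the limiting argument earlier in the thread.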
 
