Statistics - Confidence interval

nossren

Homework Statement


Suppose you have a bucket containing a lot of balls with different colors. You randomly pick 50 balls, 9 of which are red (X = 9, where X ~ N(μ, σ²)). The probability of picking a red ball is 15%. From this you want to construct a 95% confidence interval for the standard deviation σ and do a hypothesis test.
$$
\begin{align}
X &= 9 \\
\mu &= 7.5 \\
\sigma^* & \approx 0.581 \\
\alpha &= 0.05 \\
H_0: \sigma &= \sigma^* \\
H_1: \sigma &\neq \sigma^*
\end{align}
$$

Homework Equations


$$
\begin{align}
V(X) &= E[(X-\mu)^2] \\
D(X) &= \sqrt{V(X)}
\end{align}
$$

The Attempt at a Solution


The expected number of red balls per 50 balls, μ, ought to be 0.15*50 = 7.5. I estimated σ as σ* (above) to obtain a null hypothesis to test. Then I tried using a reference variable ##R = \frac{X-\mu}{\sigma} \sim N(0,1)## and putting
$$
1-\alpha = P(-\lambda_{\alpha/2} < R < \lambda_{\alpha/2}) = P(-1.96 < \frac{X-\mu}{\sigma} < 1.96) \Rightarrow I = \left(\frac{X-\mu}{\sigma} \pm 1.96\right)
$$
but this doesn't seem to make any sense. Is there another reference variable/distribution I can use? I tried the t-distribution, but it leads to division by 0 because of the N-1 denominator in the sample standard deviation (there is only one observation, so N-1 = 0).
 

Where do you get the value ##\sigma^* \doteq 0.581##? This is wrong.
 
I redid the calculation using the definition
$$
\sqrt{V(X)} = \sqrt{\sum_k (k-\mu)^2p(k)} = \sqrt{(9-7.5)^2\cdot 0.149} \approx 0.579
$$
 

If you use the binomial distribution for ##X##, there is a standard formula for the variance; look it up. It gives results very different from yours.
 
According to my book, the variance of X is then ##V(X) = nqp = 50\cdot(1-0.15)\cdot 0.15##. How can I justify going from the normal distribution (N) to the binomial (Bin)?

edit: p was supposed to be 0.15, mixed it up with another exercise
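
As a rough check of the book formula, here is a minimal Python sketch that evaluates the definition ##V(X) = \sum_k (k-\mu)^2 p(k)## over all ##k = 0, \dots, 50##, not just ##k = 9##. It assumes ##X \sim \text{Bin}(50, 0.15)##; the helper name `binom_pmf` is made up for illustration.

```python
import math

n, p = 50, 0.15  # sample size and red-ball probability from the problem

def binom_pmf(k: int) -> float:
    """P(X = k) for X ~ Bin(n, p) (binomial model is an assumption)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Apply the definitions directly, summing over every possible count k.
mean = sum(k * binom_pmf(k) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binom_pmf(k) for k in range(n + 1))

print(f"mean     = {mean:.4f}  (np  = {n * p:.4f})")
print(f"variance = {variance:.4f}  (npq = {n * p * (1 - p):.4f})")
print(f"sd       = {math.sqrt(variance):.4f}")
```

The full sum should agree with ##npq##; keeping only the ##k = 9## term is what produces a much smaller number.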
 

Justification depends on the "model". When the problem states that the probability of drawing a red is 15% (without giving other details) you more-or-less have to assume that the same 15% applies to the first, second, third,..., 50th balls. Then, if the drawings are independent, you get the Binomial distribution for sure.

However, if the 15% figure really means that 15% of the balls are red, then whether the binomial is a good model depends on the size of the ball population. For example, if there are only slightly more than 50 balls altogether, then drawing some red balls early changes the red percentage in later draws, so you do not get the binomial; instead, you get the so-called hypergeometric distribution. Its variance formula is a bit more complicated and depends explicitly on the total ball population size, ##N##. However, if ##N## is much larger than 50, the binomial distribution is a good approximation, becoming exact in the limit ##N \to \infty##. Exactly how large ##N## should be and how good the approximation is can be studied numerically, by comparing the binomial and hypergeometric results.
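
One way such a numerical comparison might look is sketched below in Python. The population sizes in `populations` are arbitrary illustrative choices, not values from the problem, and the number of red balls is assumed to be ##0.15 N## rounded to an integer. The script prints the largest pointwise gap between the two probability mass functions for each ##N##.

```python
import math

n, p = 50, 0.15  # draws and red fraction from the problem

def binom_pmf(k: int) -> float:
    """P(X = k) when each draw is red independently with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k: int, N: int) -> float:
    """P(k reds in n draws without replacement) with round(p*N) reds among N balls."""
    R = round(p * N)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes automatically get probability 0.
    return math.comb(R, k) * math.comb(N - R, n - k) / math.comb(N, n)

def max_pmf_gap(N: int) -> float:
    """Largest absolute difference between the two pmfs over k = 0..n."""
    return max(abs(binom_pmf(k) - hypergeom_pmf(k, N)) for k in range(n + 1))

populations = [100, 1_000, 10_000, 100_000]  # illustrative sizes (assumption)
gaps = {N: max_pmf_gap(N) for N in populations}
for N, gap in gaps.items():
    print(f"N = {N:>7}: max |binomial - hypergeometric| = {gap:.6f}")
```

The gap should shrink as ##N## grows, which is the sense in which the binomial becomes exact in the limit.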
 
Yes, the number of balls in the "bucket" can be assumed to tend towards infinity, so the probability is constant. However, what I have learned is that when you have a sample with distribution N(μ, σ²), you construct a reference variable with a known distribution, ##N(0,1)##, ##t(n-1)## or ##\chi^2(n-1)## (depending on what is given), in order to construct a confidence interval using the quantiles.

The "model" in this case is basically: 50 balls are drawn simultaneously, 9 of them turned out to be red, and the expected number of red balls in 50 draws is 7.5. Is it necessary to get into the binomial distribution in order to get a confidence interval for σ?
 

If the distribution is binomial you do not need a "confidence interval" for ##\sigma##; you just compute it from the formula. After all, if you are entitled to say ##\mu = 0.15 \times 50 = 7.5## you are also entitled to say ##\sigma^2 = 0.15 \times 0.85 \times 50 = 6.375##. In fact, for the binomial it makes no sense at all to even speak of a confidence interval for ##\sigma##.
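
For reference, the arithmetic under the binomial model (##n = 50##, ##p = 0.15##) works out as follows; the last line simply expresses the observed ##X = 9## in units of that standard deviation.

$$
\sigma = \sqrt{np(1-p)} = \sqrt{50 \times 0.15 \times 0.85} = \sqrt{6.375} \approx 2.525
$$

$$
\frac{X - \mu}{\sigma} = \frac{9 - 7.5}{2.525} \approx 0.59
$$

So the standard deviation is about 2.5 balls, far from the value 0.581 proposed earlier.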

It is difficult to see how to make any sense of the question, but one possibility might be to take the hypergeometric case; that is, the bucket contains ##N## balls (where ##N \geq 50## is unknown). Somehow you know that the number ##R## of red balls in the bucket is ##R = 0.15 N##; the other ##N-R## balls are not red. You draw ##n = 50## balls (without replacement) from the bucket and observe that ##k = 9## are red. If ##X## = number of reds in the sample, ##X## has a hypergeometric distribution. While the expected value of ##X## is still given by ##EX = 0.15 \times 50 = 7.5##, the variance does, in fact, depend on the unknown ball population ##N##:
$$
\text{Var}(X) = n p (1-p)\, \frac{N-n}{N-1}
$$
where ##p = R/N = 0.15##. (See, e.g., http://en.wikipedia.org/wiki/Hypergeometric_distribution .)
Presumably, you can use the observation ##X = 9## to cook up a maximum-likelihood estimate of ##N## and find some type of probable interval for ##N##. Then you could translate that ##N##-interval into a ##\sigma##-interval. However, that all seems far-fetched to me, and so I continue to be baffled by what on Earth the question could possibly mean.
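
Purely as an illustration of that speculative reading, a maximum-likelihood search for ##N## could be sketched in Python as below. Everything beyond ##n = 50##, ##k = 9## and ##p = 0.15## is an assumption: the candidate grid (multiples of 20, so that ##R = 0.15N## is a whole number) and the search range 60 to 2000 are arbitrary choices, and the sketch gives only a point estimate, not an interval.

```python
import math

n, k, p = 50, 9, 0.15  # sample size, observed reds, red fraction (from the thread)

def likelihood(N: int) -> float:
    """P(X = k) under the hypergeometric model with R = 0.15 N reds among N balls."""
    R = round(p * N)
    return math.comb(R, k) * math.comb(N - R, n - k) / math.comb(N, n)

def hypergeom_sd(N: int) -> float:
    """sqrt(n p (1-p) (N-n)/(N-1)), the hypergeometric standard deviation."""
    return math.sqrt(n * p * (1 - p) * (N - n) / (N - 1))

# Assumption: only N for which 0.15 N is an integer, over an arbitrary range.
candidates = list(range(60, 2001, 20))
N_hat = max(candidates, key=likelihood)  # grid-search maximum-likelihood estimate

print(f"N_hat = {N_hat}, likelihood = {likelihood(N_hat):.4f}")
print(f"sd at N_hat    = {hypergeom_sd(N_hat):.3f}")
print(f"binomial limit = {math.sqrt(n * p * (1 - p)):.3f}")
```

A single observation carries very little information about ##N##, so the likelihood is likely to be fairly flat across the grid; that fits the doubts expressed above about whether this is what the question intends.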
 