# Statistics - Confidence interval

1. May 22, 2014

### nossren

1. The problem statement, all variables and given/known data
Suppose you have a bucket containing a lot of balls with different colors. You randomly pick 50 balls, 9 of which are red (X = 9, where X ~ N(μ, σ²)). The probability of picking a red ball is 15%. From this you want to construct a 95% confidence interval for the standard deviation σ and do a hypothesis test.
\begin{align} X &= 9 \\ \mu &= 7.5 \\ \sigma^* & \approx 0.581 \\ \alpha &= 0.05 \\ H_0: \sigma &= \sigma^* \\ H_1: \sigma &\neq \sigma^* \end{align}
2. Relevant equations
\begin{align} V(X) &= E[(X-\mu)^2] \\ D(X) &= \sqrt{V(X)} \end{align}
3. The attempt at a solution
The expected number of red balls per 50 balls, μ, ought to be 0.15*50 = 7.5. I estimated σ as σ* (above) to obtain a null hypothesis to test. Then I tried using a reference variable $R = \frac{X-\mu}{\sigma} \sim N(0,1)$ and putting
$$1-\alpha = P(-\lambda_{\alpha/2} < R < \lambda_{\alpha/2}) = P(-1.96 < \frac{X-\mu}{\sigma} < 1.96) \Rightarrow I = \left(\frac{X-\mu}{\sigma} \pm 1.96\right)$$
but this doesn't seem to make any sense. Is there another reference variable/distribution I can use? I tried t-distribution, but it leads to division by 0 due to the N-1 denominator in the sample standard deviation.

Last edited: May 22, 2014
2. May 22, 2014

### Ray Vickson

Where do you get the value $\sigma^* \doteq 0.581$? This is wrong.

3. May 22, 2014

### nossren

I redid the calculation using the definition
$$\sqrt{V(X)} = \sqrt{\sum_k (k-\mu)^2p(k)} = \sqrt{(9-7.5)^2\cdot 0.149} \approx 0.579$$

4. May 22, 2014

### Ray Vickson

If you use the binomial distribution for $X$ there is a standard formula for the variance---look it up. It gives results much different from yours.

5. May 22, 2014

### nossren

The variance for $X$ is then, according to my book, $V(X) = nqp = 50\cdot(1-0.15)\cdot0.15$. How can I justify going from the normal distribution N to the binomial distribution Bin?

edit: p was supposed to be 0.15, mixed it up with another exercise

Last edited: May 22, 2014
6. May 22, 2014

### Ray Vickson

Justification depends on the "model". When the problem states that the probability of drawing a red is 15% (without giving other details) you more-or-less have to assume that the same 15% applies to the first, second, third,..., 50th balls. Then, if the drawings are independent, you get the Binomial distribution for sure.

However, if the 15% figure really means that 15% of the balls are red, then whether or not a binomial is good depends on the size of the ball population. For example, if there are only slightly more than 50 balls altogether, then the initial drawing of some red balls changes the red percentage in later draws, and so you do not get the binomial---instead, you get the so-called hypergeometric distribution. The variance formula is a bit more complicated, and depends explicitly on the total ball population size, N. However, if N is much larger than 50 the binomial distribution is a good approximation---becoming exact in the limit $N \to \infty$. Exactly how large N should be and how good the approximation is can be studied numerically, by comparing the binomial and hypergeometric results.
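The numerical comparison mentioned above could be sketched roughly as follows. This is only an illustration: the population sizes tried, the helper names, and the choice of total variation distance as the measure of "how good the approximation is" are all assumptions, not something given in the problem.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, n, N, R):
    """P(X = k) when n balls are drawn without replacement
    from N balls, R of which are red."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes automatically get probability 0.
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def total_variation(n, p, N):
    """Total variation distance between Bin(n, p) and the
    hypergeometric distribution with R = round(p * N) red balls."""
    R = round(p * N)  # assumption: p * N is (close to) an integer
    return 0.5 * sum(
        abs(binom_pmf(k, n, p) - hypergeom_pmf(k, n, N, R))
        for k in range(n + 1)
    )

n, p = 50, 0.15
# Hypothetical population sizes, chosen so that 0.15 * N is an integer.
distances = {N: total_variation(n, p, N) for N in (60, 100, 1000, 10000)}

for N, d in distances.items():
    print(f"N = {N:>6}: total variation distance = {d:.4f}")
```

The distance should shrink as N grows, which is one way of seeing that the binomial becomes a better approximation for a large bucket.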

7. May 22, 2014

### nossren

Yes, the number of balls in the "bucket" can be assumed to tend towards infinity, therefore the probability is constant. However, what I have learnt is that when you have a sample with distribution N(μ, σ²) you want to construct a reference variable with some distribution $N(0,1),\ t(n-1),\ \chi^2(n-1)$ (depending on what is given), in order to construct a confidence interval using the quantiles.

The "model" in this case is basically: 50 balls are drawn simultaneously, 9 of them turned out to be red, and the expected number of red balls among 50 is 7.5. Is it necessary to bring in the binomial distribution in order to get a confidence interval for σ?

Last edited: May 22, 2014
8. May 22, 2014

### Ray Vickson

If the distribution is binomial you do not need a "confidence interval" for $\sigma$; you just compute it from the formula. After all, if you are entitled to say $\mu = 0.15 \times 50 = 7.5$ you are also entitled to say $\sigma^2 = 0.15 \times 0.85 \times 50 = 6.375$. In fact, for the binomial it makes no sense at all to even speak of a confidence interval for $\sigma$.
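For concreteness, the binomial computation can be written out as a minimal sketch. The variable names are illustrative, and the standardized value `z` is an extra that was not asked for in the problem.

```python
from math import sqrt

n, p = 50, 0.15          # sample size and red-ball probability from the thread
observed = 9             # number of red balls actually drawn

mu = n * p               # binomial mean: n * p
var = n * p * (1 - p)    # binomial variance: n * p * (1 - p)
sigma = sqrt(var)        # binomial standard deviation

# Assumption for illustration: how far the observation lies from the
# mean, measured in standard deviations.
z = (observed - mu) / sigma

print(f"mu = {mu:.3f}, var = {var:.3f}, sigma = {sigma:.3f}, z = {z:.3f}")
```

Under this model $\sigma$ is roughly 2.52, so the observed 9 lies well within one standard deviation of the mean 7.5.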

It is difficult to see how to make any sense of the question, but one possibility might be to take the hypergeometric case; that is, the bucket contains $N$ balls (where $N \geq 50$ is unknown). Somehow you know that the number $R$ of red balls in the bucket is $R = 0.15 N$; the other $N-R$ balls are not red. You draw $n = 50$ balls (without replacement) from the bucket and observe that $k = 9$ are red. If $X$ = number of reds in the sample, $X$ has a hypergeometric distribution. While the expected value of $X$ is still given by $EX = 0.15 \times 50 = 7.5$, the variance does, in fact, depend on the unknown ball population $N$:
$$\text{Var}(X) = n p (1-p)\, \frac{N-n}{N-1}$$
where $p = R/N = 0.15$. (See, e.g., http://en.wikipedia.org/wiki/Hypergeometric_distribution .)
Presumably, you can use the observation $X = 9$ to cook up a maximum-likelihood estimation of $N$ and find some type of probable interval for $N$. Then you could translate that $N$-interval into a $\sigma$-interval. However, that all seems far-fetched to me, and so I continue to be baffled by what on earth the question could possibly mean.
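To see how strongly $\sigma$ depends on the unknown $N$, the hypergeometric variance formula above can be evaluated for a few population sizes. This is only a sketch: the function name and the values of $N$ are hypothetical, and it ignores the requirement that $0.15N$ be an integer for a real bucket.

```python
from math import sqrt

def hypergeom_sd(n, p, N):
    """Standard deviation of the number of red balls when n balls are
    drawn without replacement from N balls, a fraction p of them red."""
    if N < max(n, 2):
        raise ValueError("N must be at least max(n, 2)")
    # Binomial variance times the finite-population correction factor.
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return sqrt(var)

n, p = 50, 0.15
# Hypothetical population sizes, from "exactly the sample" to "very large".
sd_by_N = {N: hypergeom_sd(n, p, N) for N in (50, 60, 100, 1000, 10**6)}

for N, sd in sd_by_N.items():
    print(f"N = {N:>7}: sigma = {sd:.4f}")
```

At $N = 50$ the whole bucket is drawn, so $\sigma = 0$; as $N$ grows, $\sigma$ approaches the binomial value $\sqrt{6.375}$. Any interval for $N$ would therefore map to an interval for $\sigma$ between these two extremes.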