# Confidence interval for a cohort

1. Sep 8, 2014

### Mogarrr

1. The problem statement, all variables and given/known data
Latency Time(years): Number of patients
0: 2
1: 6
2: 9
3: 33
4: 49
5: 66
6: 52
7: 37
8: 18
9: 11
10: 4

(I don't know how to make a proper table with latex... tried \being{tabular}{l r} but this doesn't work)

2. Relevant equations

When the variance is unknown, the t-distribution may be used
$$\mu = \bar{x} \pm t_{n-1,1- \frac {\alpha}2} \cdot \frac {s}{\sqrt {n}}$$

and estimating the variance, we have...

$(n-1) \cdot \frac {s^2}{ \chi^2_{n-1,1- \frac {\alpha}2}} \leq \sigma^2 \leq (n-1) \cdot \frac {s^2}{ \chi^2_{n-1,\frac {\alpha}2}}$

Lastly, for the Poisson distribution with observed count $x$, the confidence interval is given by the $\mu_1, \mu_2$ that satisfy

$\frac {\alpha}2 = P(X \geq x | \mu = \mu_1) = \sum_{k=x}^{\infty} \frac {e^{-\mu_1} \mu_1^{k}}{k!}$

$\frac {\alpha}2 = P(X \leq x | \mu = \mu_2) = \sum_{k=0}^{x} \frac {e^{-\mu_2} \mu_2^{k}}{k!}$
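For reference, the two Poisson tail conditions above can be solved numerically. The following is only a minimal sketch using plain bisection; the helper names (`poisson_cdf`, `bisect`, `poisson_interval`), the default `alpha = 0.05`, and the search brackets are illustrative assumptions, not part of the problem.

```python
import math

def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_interval(x, alpha=0.05):
    """Two-sided interval (mu_1, mu_2) for a single observed count x."""
    # Lower limit: P(X >= x | mu_1) = alpha/2, written as 1 - P(X <= x - 1).
    if x == 0:
        mu_1 = 0.0
    else:
        mu_1 = bisect(lambda mu: 1 - poisson_cdf(x - 1, mu) - alpha / 2, 0.0, x)
    # Upper limit: P(X <= x | mu_2) = alpha/2.
    # Assumed bracket: generous enough that the tail is below alpha/2 at its end.
    upper_bracket = x + 10 * math.sqrt(x + 1) + 10
    mu_2 = bisect(lambda mu: poisson_cdf(x, mu) - alpha / 2, x, upper_bracket)
    return mu_1, mu_2

mu_1, mu_2 = poisson_interval(10)
print(mu_1, mu_2)
```

For an observed count of zero the lower limit is taken as 0, and the upper limit reduces to $-\ln(\alpha/2)$, which is a quick sanity check on the solver.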

3. The attempt at a solution

I'm not really sure how to handle this. I'm used to just one column, from which I can compute the mean and sample variance. Here I'm asked to compute the mean and variance of the latency time. Since this is a time interval, I think I should be using the Poisson distribution; however, it's given that the distribution is normal.

I don't know how to proceed. Any help would be appreciated.

2. Sep 8, 2014

### Staff: Mentor

You can ignore the medical background - it is an interval that always starts at zero, so mean and variance of the interval are the mean and variance of your data. Sure, the values cannot get negative (so it cannot be a perfect gaussian distribution), but that's not important here.
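A rough way to see why the non-negativity is not important here: fit a normal distribution to the table and check how much probability it puts below zero. This is only a sketch; the dictionary `counts` simply restates the table from the first post, and fitting by the sample mean and sample standard deviation is an assumption made for illustration.

```python
from statistics import NormalDist

# Latency (years) -> number of patients, copied from the problem statement
counts = {0: 2, 1: 6, 2: 9, 3: 33, 4: 49, 5: 66, 6: 52, 7: 37, 8: 18, 9: 11, 10: 4}

n = sum(counts.values())
mean = sum(t * c for t, c in counts.items()) / n
var = sum(c * (t - mean) ** 2 for t, c in counts.items()) / (n - 1)

# Assumed model: a normal distribution with the sample mean and standard deviation
fitted = NormalDist(mu=mean, sigma=var ** 0.5)
p_negative = fitted.cdf(0)  # probability mass the model puts on negative latencies
print(f"P(latency < 0) under the fitted normal: {p_negative:.4f}")
```

The fitted model places well under one percent of its mass on negative values, so treating the data as normal loses very little.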

3. Sep 8, 2014

### Mogarrr

Not sure what you mean by a perfect Gaussian distribution. Stripping away the medical terminology, I still don't know what to do.

Given that I have two columns of data, I picture (perhaps incorrectly) the first column as values of X and the second column as the associated probabilities. From there I don't know what to do.

Perhaps I find an interval for $\mu$. Continuing the previous tangent, how could I relate this back to the first column? It's not like I know $f^{-1}(x)$.

4. Sep 8, 2014

### Mogarrr

Wait a sec...

Am I just finding a confidence interval for the data in the first column?

5. Sep 9, 2014

### Ray Vickson

Yes, using probabilities estimated from the second column.

7. Sep 9, 2014

### Mogarrr

Well, then I'd use Student's t-test, but....

How am I supposed to use the probabilities? I thought I could just find $\bar{x}$ and $s^2$ from the 1st column.

8. Sep 9, 2014

### Ray Vickson

You can, if you expand it all out to get 287 sample points, like this:
X = 0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3, ....,9,9,10,10,10,10. Of course there is an easier way, and that is what you need to figure out.
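The "expand it all out" approach can be sketched in a few lines. This assumes the table from the first post is stored in a dictionary (the name `counts` is only illustrative) and uses the standard-library `statistics` module.

```python
import statistics

# Latency (years) -> number of patients, copied from the problem statement
counts = {0: 2, 1: 6, 2: 9, 3: 33, 4: 49, 5: 66, 6: 52, 7: 37, 8: 18, 9: 11, 10: 4}

# Repeat each latency value as many times as it was observed
sample = [latency for latency, n_patients in counts.items() for _ in range(n_patients)]

n = len(sample)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)  # sample variance, divides by n - 1
print(n, x_bar, s2)
```

This is the brute-force route; it gives the same numbers as any shortcut that works directly with the frequencies.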

9. Sep 9, 2014

### Mogarrr

Ok, I think I got it now. This is like a table describing the frequency of a distribution.

Computing $s^2$ might take a while.

10. Sep 9, 2014

### Ray Vickson

Not if you think first and calculate later.

11. Sep 9, 2014

### Mogarrr

Thought about it, though I already did the calculation in Excel the long way.

Expressing the rows as ordered pairs $(a_i , b_i)$, then I have...

$\bar{x} = (\sum b_i \cdot a_i)/( \sum b_i)$, and

$s^2 = \frac 1{\sum b_i - 1} \cdot \sum b_i (a_i - \bar{x} )^2$....

That's what I think the easy computation is.
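The two formulas above can be checked directly against the table. The sketch below also builds an interval for the mean under two assumptions that are not in the problem statement: a 95% confidence level, and the standard normal quantile standing in for the t quantile (reasonable only because the sample is large, with 286 degrees of freedom).

```python
import math
from statistics import NormalDist

# Rows of the table as ordered pairs (a_i, b_i): latency in years, number of patients
rows = [(0, 2), (1, 6), (2, 9), (3, 33), (4, 49), (5, 66),
        (6, 52), (7, 37), (8, 18), (9, 11), (10, 4)]

n = sum(b for _, b in rows)                                  # total number of patients
x_bar = sum(a * b for a, b in rows) / n                      # frequency-weighted mean
s2 = sum(b * (a - x_bar) ** 2 for a, b in rows) / (n - 1)    # sample variance

# Assumption: 95% confidence, normal quantile used in place of the t quantile
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * math.sqrt(s2 / n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"n={n}, mean={x_bar:.4f}, s^2={s2:.4f}, interval=({lower:.3f}, {upper:.3f})")
```

With a t table at hand, replacing `z` by the tabulated t quantile would widen the interval very slightly.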