Confidence interval for a cohort

Homework Help Overview

The discussion revolves around calculating confidence intervals for the mean and variance of latency times to AIDS among a cohort of hemophiliacs following seroconversion. The problem involves interpreting a frequency distribution table and applying statistical methods, particularly under the assumption of a normal distribution.

Discussion Character

  • Exploratory, Conceptual clarification, Mathematical reasoning

Approaches and Questions Raised

  • Participants explore how to compute the mean and variance from a frequency distribution, questioning the relevance of the medical context. Some express uncertainty about using the Poisson distribution versus the normal distribution. Others discuss the implications of the data structure and how to handle the two columns of data.

Discussion Status

There is an ongoing exploration of how to approach the problem, with some participants suggesting that the confidence interval can be derived directly from the data in the first column, using the second column for probabilities. Multiple interpretations of the data and its implications are being considered, with no explicit consensus reached yet.

Contextual Notes

Participants note the challenge of working with a frequency distribution and the need to calculate sample points from the provided data. There is also mention of prior calculations done in Excel, indicating varying levels of familiarity with statistical methods.

Mogarrr

Homework Statement


A cohort of hemophiliacs is followed to elicit information on the distribution of time to onset of
AIDS following seroconversion (referred to as latency time). All patients who seroconvert become
symptomatic within 10 years, according to the distribution in Table 6.11.

Table 6.11 Latency time to AIDS among hemophiliacs who become HIV positive

Latency time (years): Number of patients
0: 2
1: 6
2: 9
3: 33
4: 49
5: 66
6: 52
7: 37
8: 18
9: 11
10: 4

(I don't know how to make a proper table with latex... tried \being{tabular}{l r} but this doesn't work)

6.64 Assuming an underlying normal distribution, compute 95% CIs for the mean and variance of
the latency times.

Homework Equations



When the variance is unknown, the t-distribution may be used
\mu = \bar{x} \pm t_{n-1,1- \frac {\alpha}2} \cdot \frac {s}{\sqrt {n}}

and estimating the variance, we have...

(n-1) \cdot \frac {s^2}{ \chi^2_{n-1,1- \frac {\alpha}2}} \leq \sigma^2 \leq (n-1) \cdot \frac {s^2}{ \chi^2_{n-1,\frac {\alpha}2}}

Lastly, for the Poisson distribution with observed count x, the confidence interval is given by the \mu_1, \mu_2 that satisfy

\frac {\alpha}2 = P(X \geq x | \mu = \mu_1) = \sum_{k=x}^{\infty} \frac {e^{-\mu_1} \mu_1^{k}}{k!}

\frac {\alpha}2 = P(X \leq x | \mu = \mu_2) = \sum_{k=0}^{x} \frac {e^{-\mu_2} \mu_2^{k}}{k!}
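
As a rough illustration of the t and chi-square intervals above applied to Table 6.11, here is a minimal Python sketch using only the standard library. Two approximations are assumed and are not part of the original problem: the normal quantile stands in for t_{n-1,1-\alpha/2} (reasonable only because n is large here), and the chi-square quantiles come from the Wilson-Hilferty approximation. If SciPy is available, scipy.stats.t.ppf and scipy.stats.chi2.ppf would give the exact quantiles instead. The names counts and chi2_quantile are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

# Table 6.11: latency time (years) -> number of patients
counts = {0: 2, 1: 6, 2: 9, 3: 33, 4: 49, 5: 66,
          6: 52, 7: 37, 8: 18, 9: 11, 10: 4}

n = sum(counts.values())                                  # sample size
mean = sum(x * f for x, f in counts.items()) / n          # sample mean
s2 = sum(f * (x - mean) ** 2 for x, f in counts.items()) / (n - 1)  # sample variance

alpha = 0.05
std_normal = NormalDist()

# ASSUMPTION: normal quantile used in place of the t quantile (large n).
z = std_normal.inv_cdf(1 - alpha / 2)


def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (not exact)."""
    zp = std_normal.inv_cdf(p)
    return df * (1 - 2 / (9 * df) + zp * sqrt(2 / (9 * df))) ** 3


# Interval for the mean: x_bar +/- quantile * s / sqrt(n)
half_width = z * sqrt(s2 / n)
mean_ci = (mean - half_width, mean + half_width)

# Interval for the variance: (n-1) s^2 / upper quantile, (n-1) s^2 / lower quantile
df = n - 1
var_ci = (df * s2 / chi2_quantile(1 - alpha / 2, df),
          df * s2 / chi2_quantile(alpha / 2, df))

print(f"n = {n}, mean = {mean:.3f}, s^2 = {s2:.3f}")
print(f"approx. 95% CI for the mean:     ({mean_ci[0]:.3f}, {mean_ci[1]:.3f})")
print(f"approx. 95% CI for the variance: ({var_ci[0]:.3f}, {var_ci[1]:.3f})")
```

Note that the larger chi-square quantile goes in the denominator of the lower bound, which matches the ordering in the variance interval above.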

The Attempt at a Solution



I'm not really sure how to handle this. I'm used to just one column, from which I can compute the sample mean and sample variance. Here I'm asked to compute the mean and variance of the latency time. Since this is a time interval, I think I should be using the Poisson distribution; however, it's given that the distribution is normal.

I don't know how to proceed. Any help would be appreciated.
 
You can ignore the medical background - it is an interval that always starts at zero, so mean and variance of the interval are the mean and variance of your data. Sure, the values cannot get negative (so it cannot be a perfect gaussian distribution), but that's not important here.
 
mfb said:
You can ignore the medical background - it is an interval that always starts at zero, so mean and variance of the interval are the mean and variance of your data. Sure, the values cannot get negative (so it cannot be a perfect gaussian distribution), but that's not important here.

Not sure what you mean by a perfect Gaussian distribution. Stripping away the medical terminology, I still don't know what to do.

Given that I have two columns of data, I picture (perhaps incorrectly) the first column as values of X and the second column as the associated probabilities. From there I don't know what to do.

Continuing the previous tangent: perhaps I find an interval for \mu, but then how could I relate this back to the first column? It's not like I know f^{-1}(x).
 
Wait a sec...

Am I just finding a confidence interval for the data in the first column?
 
Mogarrr said:
Wait a sec...

Am I just finding a confidence interval for the data in the first column?

Yes, using probabilities estimated from the second column.
 
Ray Vickson said:
Yes, using probabilities estimated from the second column.

Well, then I'd use Student's t distribution, but...

How am I supposed to use the probabilities? I thought I could just find \bar{x} and s^2 from the 1st column.
 
Mogarrr said:
Well, then I'd use Student's t distribution, but...

How am I supposed to use the probabilities? I thought I could just find \bar{x} and s^2 from the 1st column.

You can, if you expand it all out to get 287 sample points, like this:
X = 0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3, ...,9,9,10,10,10,10. Of course there is an easier way, and that is what you need to figure out.
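
The expansion described above can be sketched in Python as follows. This is only an illustration of the "long way"; the variable names are made up for the example, and statistics.variance is used because it computes the sample variance with an n - 1 denominator.

```python
from statistics import mean, variance

# Table 6.11: latency times (years) and the number of patients at each value
latency_years = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
patients = [2, 6, 9, 33, 49, 66, 52, 37, 18, 11, 4]

# Repeat each latency time once per patient to recover the raw sample
sample = [x for x, f in zip(latency_years, patients) for _ in range(f)]

print(len(sample))    # number of sample points: 287
print(sample[:19])    # 0,0,1,1,1,1,1,1,2,...,2,3,3 as in the post above

x_bar = mean(sample)      # sample mean
s2 = variance(sample)     # sample variance (n - 1 denominator)
print(x_bar, s2)
```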
 
Ok, I think I got it now. This is like a table describing the frequency of a distribution.

Computing s^2 might take a while.
 
Mogarrr said:
Ok, I think I got it now. This is like a table describing the frequency of a distribution.

Computing s^2 might take a while.

Not if you think first and calculate later.
 
Thought about it, though I already did the calculation in Excel the long way.

Expressing the rows as ordered pairs (a_i , b_i), I have...

\bar{x} = (\sum b_i \cdot a_i)/( \sum b_i), and

s^2 = \frac 1{\sum b_i - 1} \cdot \sum b_i (a_i - \bar{x} )^2...

That's what I think the easy computation is.
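
For what it's worth, the two frequency-weighted formulas above translate directly into a few lines of Python. This is a minimal sketch with a and b named after the ordered pairs (a_i, b_i); the second variance expression is the usual algebraic shortcut, added here only as a cross-check and not taken from the thread.

```python
# Rows of Table 6.11 as (a_i, b_i): latency time in years, number of patients
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 6, 9, 33, 49, 66, 52, 37, 18, 11, 4]

n = sum(b)  # total number of patients, i.e. sum of b_i

# x_bar = (sum b_i * a_i) / (sum b_i)
x_bar = sum(bi * ai for ai, bi in zip(a, b)) / n

# s^2 = 1 / (sum b_i - 1) * sum b_i * (a_i - x_bar)^2
s2 = sum(bi * (ai - x_bar) ** 2 for ai, bi in zip(a, b)) / (n - 1)

# Cross-check: s^2 = (sum b_i * a_i^2 - n * x_bar^2) / (n - 1)
s2_shortcut = (sum(bi * ai ** 2 for ai, bi in zip(a, b)) - n * x_bar ** 2) / (n - 1)

print(f"n = {n}, x_bar = {x_bar:.4f}, s^2 = {s2:.4f}, shortcut = {s2_shortcut:.4f}")
```

No expansion into individual sample points is needed: each row contributes its value weighted by its count.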
 
