1. Oct 31, 2008

### kenewbie

1. The problem statement, all variables and given/known data

The probability that a random person has disease A is 0.04. The probability that the person has disease B is 0.05. The probability that he has both disease A and B is 0.002.

Are the two diseases independent?

3. The attempt at a solution

I can solve this, that is not the problem. What bothers me is that my book states _as a rule_ that if P(A [intersection] B) = P(A) * P(B) then A and B are independent of each other.

This sounds mind-bogglingly wrong to me. You cannot simply pick 100,000 people, tally how many have AIDS and how many have the sniffles, and, if the product happens to agree with the intersection, conclude that lo and behold, they are independent of each other!

What am I missing here?

k

Last edited: Oct 31, 2008
2. Oct 31, 2008

### Avodyne

There is not enough information to tell. You would also have to know the probabilities that a random person has disease A and NOT disease B, and vice versa.

And, the question gives you the probabilities, which strictly speaking have to come from an infinite sample size. There is no way to know the actual probabilities from a finite sample.

3. Oct 31, 2008

### kenewbie

Hm? I don't quite follow. The probability that a person has A and not B would be 0.038, since there is a 0.002 intersection between the two.
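
Spelled out, using $$B^c$$ for "not B" (notation assumed here, not taken from the book), the subtraction and its counterpart for B are:

$$\Pr(A \cap B^c) = \Pr(A) - \Pr(A \cap B) = 0.04 - 0.002 = 0.038$$

$$\Pr(B \cap A^c) = \Pr(B) - \Pr(A \cap B) = 0.05 - 0.002 = 0.048$$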

k

4. Nov 1, 2008

These are independent.

There are two (equivalent) ways to check for independence of events. If

$$\Pr(A \mid B) = \Pr(A)$$

then the events $$A, B$$ are independent.

This condition is equivalent to the one you give:

$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B)$$

Both your condition and the condition I gave above are satisfied by the numbers the OP gave.
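
As a quick sanity check, here is a minimal Python sketch that tests both conditions on the numbers from the problem statement. The variable names are made up for illustration, and exact fractions are used only to avoid floating-point rounding in the equality tests.

```python
from fractions import Fraction

# Probabilities from the problem statement, stored as exact fractions
p_a = Fraction(4, 100)     # Pr(A)
p_b = Fraction(5, 100)     # Pr(B)
p_ab = Fraction(2, 1000)   # Pr(A and B)

# Condition 1: Pr(A and B) == Pr(A) * Pr(B)
product_rule_holds = (p_ab == p_a * p_b)

# Condition 2: Pr(A | B) == Pr(A), with Pr(A | B) = Pr(A and B) / Pr(B)
p_a_given_b = p_ab / p_b
conditional_rule_holds = (p_a_given_b == p_a)

print(product_rule_holds, conditional_rule_holds)
```

Both checks only say something about the given probabilities themselves; they are not a test on sampled data.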

5. Nov 1, 2008

### kenewbie

Allow me to restate: I know the formulas, I know how to solve this. What I am interested in is the scope in which the formula is valid. I have my doubts that if you count the occurrence of two diseases in a population and find the intersection of the two, the simple check would be valid medical proof of whether or not the diseases are dependent on each other.

k

6. Nov 1, 2008

"Allow me to restate: I know the formulas, I know how to solve this. What I am interested in the scope in which the formula is valid. I have my doubts that if you count the occurence of two diseases in a population and find the intersection of the two, that the simple check would be valid medical proof of the diseases being dependent on each other or not. "

First, I never implied you didn't know how to use the formulas. I did misunderstand the focus of your question: I took it to mean you didn't understand the probability interpretation.

The use of the word "independent" in these problems refers only to the statistical/probabilistic interpretation, which is what the calculations address. I do not know whether this carries to a medical interpretation of the type you are questioning.
However, in studies, when different phenomena are investigated, if we have proof of statistical independence, as here, that is taken as implying a lack of interaction between the two.

7. Nov 2, 2008

8. Nov 2, 2008

### borgwal

That probability IS given of course.

Probabilities certainly don't have to come from an infinite sample. I thought that idea was refuted in the 50s or 60s.

9. Nov 2, 2008

### borgwal

If you wanted to show there IS some dependence between having disease A and disease B, then you'd agree you'd need to *violate* the "independence" condition, right?
That's all that is meant.

10. Nov 2, 2008

### D H

You are correct. This is not what researchers do. The problem in the original post is an artificial problem for what appears to be an introductory statistics class. You were given the true probabilities. On the other hand, when you compute the frequency of some disease in a population of 1,000 (or 100,000, or whatever), you are arriving at an estimate of the true probability. Suppose in a sample of 1,000 people, 43 have disease A, 54 have disease B, and 3 have both disease A and B. Are the diseases independent? The simple test says they are not independent since 0.043*0.054=0.002322 rather than 0.003.

What researchers do instead is use various statistical tests of independence, the most common being the chi-squared test. In my artificial example, one cannot say with a reasonable degree of certainty that the diseases are not statistically independent. Note the double negative. That was intentional. Researchers develop a "null hypothesis" (diseases A and B are statistically independent) and an "alternative hypothesis" (diseases A and B are correlated). The statistical tests indicate whether one should reject the null hypothesis on the basis of the observed data being incredibly unlikely if the null hypothesis were true.
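
To make that concrete, here is a rough Python sketch of Pearson's chi-squared test of independence applied to the 1,000-person example (43 with A, 54 with B, 3 with both). The function name is hypothetical, no continuity correction is applied, and the p-value uses the fact that for one degree of freedom the chi-squared tail probability equals `erfc(sqrt(x / 2))`. Treat it as an illustration under those assumptions, not as the procedure any particular study uses.

```python
import math

def chi_squared_independence_2x2(n_total, n_a, n_b, n_both):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Hypothetical helper: returns (chi2 statistic, p-value) for 1 degree of freedom.
    """
    # Observed counts: rows = has A / no A, columns = has B / no B
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    row_totals = [n_a, n_total - n_a]
    col_totals = [n_b, n_total - n_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if A and B were independent
            expected = row_totals[i] * col_totals[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = chi_squared_independence_2x2(1000, 43, 54, 3)
print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

The statistic comes out at roughly 0.22 with a p-value far above the usual 0.05 threshold, so the null hypothesis of independence is not rejected. One caveat: the expected count for "both diseases" is only about 2.3, and with counts that small an exact test is often preferred over the chi-squared approximation.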

Several problems can arise in doing this kind of analysis. Systematic errors in the collection process may make the gathered statistics suspect. Even after removing these, there is a chance that the researcher rejected the null hypothesis when the null hypothesis was in fact true (a type I error) or accepted the null hypothesis when the null hypothesis was in fact false (a type II error).

11. Nov 3, 2008

### kenewbie

Thanks a lot for clarifying, D H.

I think my book is doing a very bad job of qualifying these formulas. In fact, the way it explains them is flat-out misleading. Then again, it is not a statistics book, rather a general math sort of thing. I guess I should get a more narrowly focused source to get better information.

k