Badly worded probability question?

kenewbie · Oct 31, 2008

Homework Statement

The probability that a random person has disease A is 0.04. The probability that the person has disease B is 0.05. The probability that he has both disease A and B is 0.002.

Are the two diseases independent?

The Attempt at a Solution

I can solve this, that is not the problem. What bothers me is that my book states _as a rule_ that if P(A [intersection] B) = P(A) * P(B) then A and B are independent of each other.

This sounds mind-bogglingly wrong to me. You cannot simply pick 100,000 people, tally how many have AIDS and how many have the sniffles, and if the product happens to agree with the intersection then lo and behold, they are independent of each other!

What am I missing here?

k

Avodyne · Oct 31, 2008

There is not enough information to tell. You would also have to know the probabilities that a random person has disease A and NOT disease B, and vice versa.

And, the question gives you the probabilities, which strictly speaking have to come from an infinite sample size. There is no way to know the actual probabilities from a finite sample.

kenewbie · Oct 31, 2008

Avodyne said:

There is not enough information to tell. You would also have to know the probabilities that a random person has disease A and NOT disease B, and vice versa.

Hm? I don't quite follow. The probability that a person has A and not B would be .038, since there is a .002 intersection between the two.
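A quick sketch of that arithmetic, filling in the whole two-by-two table from the three numbers in the problem statement. The variable names are just for illustration.

```python
# The three probabilities given in the problem statement.
p_a, p_b, p_ab = 0.04, 0.05, 0.002

# Every cell of the joint table follows from those three numbers.
joint = {
    ("A", "B"): p_ab,
    ("A", "not B"): p_a - p_ab,
    ("not A", "B"): p_b - p_ab,
    ("not A", "not B"): 1 - p_a - p_b + p_ab,
}

for cell, prob in joint.items():
    print(cell, round(prob, 3))
```

So the probabilities of "A and not B" and "B and not A" are already determined by what the question gives.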

k

statdad · Nov 1, 2008

These are independent.

There are two (equivalent) ways to check for independence of events. If

[tex] \Pr(A \mid B) = \Pr(A)[/tex]

then the events [tex]A, B[/tex] are independent.

This condition is equivalent to the one you give:

[tex] \Pr(A \cap B) = \Pr(A) \cdot \Pr(B)[/tex]

Both your condition and the condition I gave above are satisfied by the numbers the OP gave.
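As a minimal sketch, both conditions can be checked numerically for the numbers in the problem. The comparison uses a floating-point tolerance, and the variable names are only illustrative.

```python
import math

# Probabilities from the problem statement.
p_a, p_b, p_ab = 0.04, 0.05, 0.002

# Condition 1: Pr(A and B) == Pr(A) * Pr(B)
product_rule_holds = math.isclose(p_ab, p_a * p_b)

# Condition 2: Pr(A | B) == Pr(A), where Pr(A | B) = Pr(A and B) / Pr(B)
p_a_given_b = p_ab / p_b
conditional_rule_holds = math.isclose(p_a_given_b, p_a)

print(product_rule_holds, conditional_rule_holds)
```

The two checks agree because, whenever Pr(B) > 0, dividing the product condition by Pr(B) gives the conditional one.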

kenewbie · Nov 1, 2008

Allow me to restate: I know the formulas, I know how to solve this. What I am interested in is the scope in which the formula is valid. I doubt that counting the occurrence of two diseases in a population, finding the intersection of the two, and applying this simple check would be valid medical proof of whether the diseases depend on each other.

k

statdad · Nov 1, 2008

"Allow me to restate: I know the formulas, I know how to solve this. What I am interested in is the scope in which the formula is valid. I have my doubts that if you count the occurrence of two diseases in a population and find the intersection of the two, that the simple check would be valid medical proof of the diseases being dependent on each other or not."

First, I never implied you didn't know how to use the formulas. I did misunderstand the focus of your question: I took it to mean you didn't understand the probability interpretation.

The use of the word "independent" in these problems refers only to the statistical/probabilistic interpretation, which is what the calculations address. I do not know whether this carries over to a medical interpretation of the type you are questioning.
However, in studies, when different phenomena are investigated, if we have proof of statistical independence, as here, that is taken as implying a lack of interaction between the two.

kenewbie · Nov 2, 2008

statdad said:

The use of the word "independent" in these problems refers only to the statistical/probabilistic interpretation

See, THAT is what I should have been asking: what does the word "independent" imply in this context? Thanks for the reply. I'm still sort of thrown that such a simple test has meaning outside of trivial or special cases, but I guess I have to live with this until I get a better understanding of statistics in general.

Thanks again.

k

borgwal · Nov 2, 2008

Avodyne said:

There is not enough information to tell. You would also have to know the probabilities that a random person has disease A and NOT disease B, and vice versa.

That probability IS given of course.

And, the question gives you the probabilities, which strictly speaking have to come from an infinite sample size. There is no way to know the actual probabilities from a finite sample.

Probabilities certainly don't have to come from an infinite sample. I thought that idea was refuted in the 50s or 60s.

borgwal · Nov 2, 2008

If you wanted to show there IS some dependence between having disease A and disease B, then you'd agree you'd need to *violate* the "independence" condition, right?
That's all that is meant.

D H · Nov 2, 2008

kenewbie said:

Allow me to restate: I know the formulas, I know how to solve this. What I am interested in is the scope in which the formula is valid. I have my doubts that if you count the occurrence of two diseases in a population and find the intersection of the two, that the simple check would be valid medical proof of the diseases being dependent on each other or not.

You are correct. This is not what researchers do. The problem in the original post is an artificial problem for what appears to be an introductory statistics class. You were given the true probabilities. On the other hand, when you compute the frequency of some disease in a population of 1,000 (or 100,000, or whatever), you are arriving at an estimate of the true probability. Suppose in a sample of 1,000 people, 43 have disease A, 54 have disease B, and 3 have both disease A and B. Are the diseases independent? The simple test says they are not independent since 0.043*0.054=0.002322 rather than 0.003.

What researchers do instead is use various statistical tests of independence, the most common being the chi-squared test. In my artificial example, one cannot say with a reasonable degree of certainty that the diseases are not statistically independent. Note the double negative. That was intentional. Researchers develop a "null hypothesis" (diseases A and B are statistically independent) and an "alternative hypothesis" (diseases A and B are correlated). The statistical tests indicate whether one should reject the null hypothesis on the basis of it being incredibly unlikely.
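To make the artificial example concrete, here is a rough sketch of a Pearson chi-squared test on the 2x2 table of counts (1,000 people, 43 with A, 54 with B, 3 with both). It uses only the standard library and no continuity correction; the function name `chi_squared_2x2` is made up for this illustration. One caveat: the expected count for "both" is only about 2.3, so the chi-squared approximation is crude here and an exact test would normally be preferred.

```python
import math

def chi_squared_2x2(n, n_a, n_b, n_ab):
    """Pearson chi-squared test of independence for a 2x2 table (no correction)."""
    # Rows: has A / does not have A. Columns: has B / does not have B.
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if A and B were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared tail probability
    # is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_squared_2x2(n=1000, n_a=43, n_b=54, n_ab=3)
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.2f}")
```

The p-value comes out far above the usual 0.05 threshold, so the null hypothesis of independence is not rejected, even though 3/1000 does not equal 0.043 * 0.054 exactly.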

Several problems can arise in doing this kind of analysis. Systematic errors in the collection process may make the gathered statistics suspect. Even after removing these, there is a chance that the researcher rejected the null hypothesis when the null hypothesis was in fact true (a type I error) or accepted the null hypothesis when the null hypothesis was in fact false (a type II error).

kenewbie · Nov 3, 2008

Thanks a lot for clarifying DH.

I think my book is doing a very bad job of qualifying these formulas. In fact, the way it explains them is flat-out misleading. Then again, it is not a statistics book but more of a general math text. I guess I should find a more narrowly focused source to get better information.

k
