Can't wrap my head around conditional probability

berdan · Jan 1, 2013

So, I'm studying statistics (for engineers) now, and this is one of those courses that gives me a headache time after time.
Here, for example, I can't seem to get the difference between [itex]P(A \cap B)[/itex] and [itex]P(A|B)[/itex]. I'll give an example in the form of a question given to us in class.

"70 percent of students know statistics well. The probability that a student who doesn't know statistics well, answers a statistics question on the exam is 0.2.
Probability for a student who knows statistics well and answers a statistics question correctly is 0.95.

What is the probability of a random student who does know statistics well, to answer the question wrong?"

So, when they answered the question, they said: A is the event that a student knows statistics well.
B is the event that the student answered correctly.

So, what we're looking for is [itex]P(A \cap B^c)[/itex], as far as I've understood. And yes, the answer to the question goes:
[itex]P(A \cap B^c) = P(B^c|A) \cdot P(A) = 0.05 \cdot 0.7[/itex]

Why why on Earth why??
And why doesn't [itex]P(A \cap B^c) = P(A) - P(A \cap B)[/itex] work here?

Stephen Tashi · Jan 2, 2013

berdan said:

the difference between [itex]P(A \cap B)[/itex] and [itex]P(A|B)[/itex]

I guess you've learned that probability is defined on some sort of "space" of events. It's helpful to think of [itex]P(A \cap B)[/itex] as the probability of an event in the "original space" of a problem and [itex]P(A|B)[/itex] as a probability in a different event space.

What is the probability of a random student who does know statistics well, to answer the question wrong?"

Did you quote the question exactly? Note that the question does not ask "What is the probability that a random student will both know statistics well and also answer the question wrong?"

So, when they answered the question, they said: A is the event that a student knows statistics well.
B is the event that the student answered correctly.

So, what we're looking for is [itex]P(A \cap B^c)[/itex]

No. [itex]P(A \cap B^c)[/itex] is the probability of an event in the "original" probability space, the one that gives probabilities for picking a student at random out of all students, including both those who know statistics well and those who don't.

The question is asking for [itex]P(B^c | A)[/itex]. The event [itex]B^c | A[/itex] is an event in the space of picking a student at random from those who know statistics well. This is not the same probability space as the original space.
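To put numbers on the two spaces, here is a minimal Python sketch. It assumes, as the worked answer in the thread does, that the 0.95 figure is read as P(B | A); the variable names are only illustrative, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

# Numbers from the question (A = knows statistics well, B = answers correctly).
p_A = Fraction(7, 10)             # P(A)
p_B_given_A = Fraction(95, 100)   # P(B | A), assuming that reading of the 0.95

# Probability in the space of students who know statistics well.
p_Bc_given_A = 1 - p_B_given_A    # P(B^c | A)

# Probability in the original space of all students.
p_A_and_Bc = p_Bc_given_A * p_A   # P(A ∩ B^c)

print(p_Bc_given_A)  # 1/20
print(p_A_and_Bc)    # 7/200
```

The two results differ (0.05 versus 0.035) because they are measured against different populations.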

The laws relating conditional probability to probability in the "original" space are unusual because they contain probabilities from two different probability spaces. By contrast a law like [itex]P(A \cup B) = P(A) + P(B) - P(A \cap B)[/itex] only contains probabilities in one probability space.

In a manner of speaking, all probabilities are conditional, in the sense that their space of events is defined by certain given information. In the context of a given problem, one set of information defines the "unconditional" probability space and we don't bother to annotate its probabilities by [itex]P(A| Info)[/itex] where [itex]Info[/itex] is information stated in the problem. When the problem defines additional probability spaces by adding more information (such as the information that we assume an event [itex]C[/itex] in the original space definitely did happen), it uses the notation [itex]P(A|C)[/itex] to indicate that we have changed the probability space.

Since the "original" probability space is really a conditional probability space, it isn't surprising that laws such as [itex]P(A \cup B) = P(A) + P(B) - P(A \cap B)[/itex] apply to other conditional probability spaces. So, for example, [itex]P(A \cup B | S) = P(A| S) + P(B|S) - P(A \cap B | S)[/itex].
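A quick way to convince yourself of the conditional version of that law is to check it by brute force on a small example. The sketch below uses one roll of a fair die; the events and the helper names `prob` and `cond_prob` are made up for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))          # one fair die, all outcomes equally likely

def prob(event):
    """P(event) in the original space."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(event, given):
    """P(event | given) = P(event ∩ given) / P(given)."""
    return prob(event & given) / prob(given)

A = {2, 4, 6}      # roll is even
B = {4, 5, 6}      # roll is at least 4
S = {2, 3, 4, 5}   # the conditioning event

lhs = cond_prob(A | B, S)
rhs = cond_prob(A, S) + cond_prob(B, S) - cond_prob(A & B, S)
print(lhs, rhs)    # 3/4 3/4
```

This only checks one example, of course; it is not a proof.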

Stephen Tashi · Jan 2, 2013

berdan said:

yes, the answer to the question goes:
[itex]P(A \cap B^c) = P(B^c|A) \cdot P(A) = 0.05 \cdot 0.7[/itex]

That is a correct formula for [itex]P(B^c \cap A)[/itex]. It isn't the correct formula for [itex]P(B^c | A)[/itex].

Why why on Earth why??

Are you asking why the formula is correct for [itex]P(B^c \cap A)[/itex] ? Or are you asking why [itex]P(B^c \cap A)[/itex] is the correct way to translate the question? In the way that you phrased the question, I think it is not the correct way to translate the question into symbols.

And why doesn't [itex]P(A \cap B^c) = P(A) - P(A \cap B)[/itex] work here?

It does work. (.7)(.05) does equal (.7) - (.7)(.95).
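If you want to check that arithmetic exactly, a short Python sketch with fractions does it. It assumes P(A) = 0.7 and P(B | A) = 0.95 as in the worked answer; the variable names are just labels.

```python
from fractions import Fraction

p_A = Fraction(7, 10)            # P(A)
p_B_given_A = Fraction(95, 100)  # P(B | A)

via_conditional = (1 - p_B_given_A) * p_A   # P(B^c | A) * P(A)
via_subtraction = p_A - p_B_given_A * p_A   # P(A) - P(A ∩ B)

print(via_conditional, via_subtraction)     # 7/200 7/200
```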

berdan · Jan 2, 2013

Stephen Tashi said:

Did you quote the question exactly? Note that the question does not ask "What is the probability that a random student will both know statistics well and also answer the question wrong?"

OK, now that you mention it, the question is actually:
"What is the probability that there is a student, who knows statistics well, and answers the question wrong".
OK, now I'm starting to see the difference. In this question, they do ask about a student from the original set, [itex]P(A \cap B^c)[/itex]. OK, now I get what you mean.
They didn't say that the student knows statistics as a given, because in that case I would have to look at a different set, and the probability would be [itex]P(B^c|A)[/itex].

OK, OK, thanks. It looks like half of answering statistics questions is reading comprehension :) Damn, they didn't bother to explain that to us at all :(. Thanks a bunch, man!

Preno · Jan 2, 2013

Well, consider the trivial case: P(A given A) is trivially one, whereas P(A and A) is trivially equal to P(A).

Tac-Tics · Jan 2, 2013

berdan said:

So, I'm studying statistics (for engineers) now, and this is one of those courses that gives me a headache time after time.
Here, for example, I can't seem to get the difference between [itex]P(A \cap B)[/itex] and [itex]P(A|B)[/itex]. I'll give an example in the form of a question given to us in class.

I'm not sure it would be entirely helpful for you, but I'll throw it out here anyway.

Probability admits a certain kind of logic. In logic, you have the operators ⊤ (true) ⊥ (false) ¬ (not) ∧ (and) ∨ (or) and → (implies). These correspond neatly with concepts in probability:

⊤ - An event with 100% certainty
⊥ - An event with 0% certainty.
¬ - The probability of something NOT happening
∧ - The probability of two things happening
∨ - The probability of either thing happening
→ - Conditional probability. (Note the order is reversed with the vertical bar).

The conditional probability law is:

P(B | A) * P(A) = P(A ∩ B)

Which, with a little squinting, corresponds to the logical notion of modus ponens:

(A → B) ∧ A ⇔ A ∧ B
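The logical equivalence itself is easy to verify with a truth table. Here is a small Python sketch that does so; the helper `implies` is my own name for material implication, not something from a library.

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b

rows = list(product([False, True], repeat=2))

# Print the truth table for both sides of the equivalence.
for a, b in rows:
    print(a, b, implies(a, b) and a, a and b)

# True only if the two sides agree on every row.
equivalent = all((implies(a, b) and a) == (a and b) for a, b in rows)
print(equivalent)  # True
```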

It might not help you develop an intuition, but at least it should help you think of it in a new way.

chiro · Jan 2, 2013

Hey berdan and welcome to the forums.

You can think of conditional probability as a probability in which the universal set is replaced by a subset of the original sample space.

This is all that is going on: you are selecting a "slice" of the original universal set and finding the probability of an event relative to that subset.

It's a lot easier to draw a Venn diagram to see this in action, but this is all conditioning is.
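In place of a diagram, here is a minimal counting sketch in Python. The class size of 1000 is a hypothetical number chosen so that the percentages from the question come out as whole students.

```python
from fractions import Fraction

total_students = 1000        # hypothetical class size
know_well = 700              # 70% know statistics well (event A)
know_well_and_wrong = 35     # 5% of those 700 answer wrong (A and B^c)

# Relative to the whole class (the original universal set).
p_joint = Fraction(know_well_and_wrong, total_students)

# Relative to the slice of students who know statistics well.
p_conditional = Fraction(know_well_and_wrong, know_well)

print(p_joint, p_conditional)  # 7/200 1/20
```

The same 35 students are counted both times; only the set they are compared against changes.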

1MileCrash · Jan 3, 2013

I think the way to think of it is that in Pr(A|B), B is the sample space.

It is quite intuitive to see this when looking at the definition:

Pr(A|B) = Pr(A intersect B) / Pr(B)

So, we consider only the instances of A that occur when B has also occurred. We then divide by the probability of B to "adjust" it, because the question being asked is: "B certainly happened; given this, what is the probability that A happened as well?"

You could even write any standard probability as a conditional probability. For example, Pr(A) in some sample space S is exactly Pr(A|S). Using the definition for conditional probability,

Pr(A|S) = Pr(A intersect S)/Pr(S)

We know that A intersect S is just A because A is a subset of S. We also know that Pr(S) = 1, so we just have Pr(A). Conditional probability for some other event B is the same notion: we are "moving" from sample space S to "sample space" B.
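As a sketch of that idea, the Python snippet below treats one roll of a fair die as the sample space S and compares conditioning on S with conditioning on a smaller event B. The events and the helper names `pr` and `pr_given` are chosen only for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))      # sample space: one fair die

def pr(event):
    """Pr(event) in sample space S."""
    return Fraction(len(event & S), len(S))

def pr_given(event, given):
    """Pr(event | given) = Pr(event intersect given) / Pr(given)."""
    return pr(event & given) / pr(given)

A = {1, 2, 3}             # roll is at most 3
B = {2, 4, 6}             # roll is even

print(pr(A))              # 1/2
print(pr_given(A, S))     # 1/2, conditioning on S changes nothing
print(pr_given(A, B))     # 1/3, B is now the "sample space"
```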

You also seem to be trying to relate Pr(A|B) to Pr(A intersect B). There are important relations between the two.

If Pr(A intersect B) = Pr(A|B), then it must be the case that:

Pr(A int B) = Pr(A int B)/Pr(B)

But this can hold only if Pr(B) = 1 or Pr(A int B) = 0. In the latter case, A and B are mutually exclusive (up to outcomes of probability zero).

There is another relation as well.

If Pr(A|B) = Pr(A),

Then

Pr(A intersect B)/Pr(B) = Pr(A)
Pr(A intersect B) = Pr(A)Pr(B)

This is the definition of independent events. If the probability of A is the same whether or not B occurred then the probability of A intersect B is equal to the product of the probabilities of A and B.
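Here is a small Python check of both forms of the independence condition on one roll of a fair die. The two events are picked by me so that they happen to be independent; the helper `pr` is just an illustrative name.

```python
from fractions import Fraction

S = set(range(1, 7))      # sample space: one fair die

def pr(event):
    """Pr(event) in sample space S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}             # roll is even
B = {1, 2, 3, 4}          # roll is at most 4

pr_A_given_B = pr(A & B) / pr(B)

print(pr_A_given_B == pr(A))         # True
print(pr(A & B) == pr(A) * pr(B))    # True
```

Knowing that the roll is at most 4 leaves the chance of an even roll at 1/2, which is why the product rule holds for this pair.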
