Multiplication of conditional probability with several variables

Ronald_Ku · Aug 6, 2014

Dear All,

I am new to machine learning and am currently confused about the following problem:

what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z). But I don't think it is correct since
P(X|Z)= P(X|Y,Z)P(Y|Z)
But clearly P(X|Y)=/= P(X|Y,Z)

Assume that none of the events are independent.

I have simplified the problem in the above equation. The actual equation is
p(w|x,t,α,β) ∝ p(t|x,w,β)p(w|α), from Pattern Recognition and Machine Learning by Christopher M. Bishop.

Any help or ideas would be much appreciated.

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

since
P(X|Z)= P(X|Y,Z)P(Y|Z)

Are you saying the above is given as a special condition in the problem?

Or did you mean [itex]P( \ ( X \cap Y) | Z\ ) = P(X | \ (Y \cap Z)\ )\ P(Y | Z)[/itex] ?

Ronald_Ku · Aug 6, 2014

Stephen Tashi said:

Are you saying the above is given as a special condition in the problem?

Or did you mean [itex]P( \ ( X \cap Y) | Z\ ) = P(X | \ (Y \cap Z)\ )\ P(Y | Z)[/itex] ?

Yes, you are correct.
What I meant is P(x,y|z) = P(x|y,z)P(y|z).

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z)

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.
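
For what it's worth, a quick numerical check seems to support the doubt. The sketch below is only an illustration: the random joint table and the names `joint`, `chain_sums` and `markov` are made up here and have nothing to do with Bishop's notation. It sums over Y (without the sum, P(X|Y)P(Y|Z) still depends on Y, so it cannot equal P(X|Z)) and compares the result with P(X|Z), first for an arbitrary joint distribution and then for a chain Z -> Y -> X in which P(x|y,z) = P(x|y) holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint P(x, y, z) over three binary variables, axes ordered (x, y, z).
joint = rng.random((2, 2, 2))
joint /= joint.sum()

def chain_sums(p):
    """Return P(x|z), sum_y P(x|y,z)P(y|z) and sum_y P(x|y)P(y|z), each indexed [x, z]."""
    p_z = p.sum(axis=(0, 1))                 # P(z)
    p_yz = p.sum(axis=0)                     # P(y, z), indexed [y, z]
    p_xy = p.sum(axis=2)                     # P(x, y), indexed [x, y]
    p_x_given_z = p.sum(axis=1) / p_z        # indexed [x, z]
    p_y_given_z = p_yz / p_z                 # indexed [y, z]
    p_x_given_yz = p / p_yz                  # indexed [x, y, z]
    p_x_given_y = p_xy / p_xy.sum(axis=0)    # indexed [x, y]
    with_z = (p_x_given_yz * p_y_given_z).sum(axis=1)
    without_z = p_x_given_y @ p_y_given_z
    return p_x_given_z, with_z, without_z

target, with_z, without_z = chain_sums(joint)
print(np.allclose(with_z, target))       # product rule plus marginalisation over y
print(np.allclose(without_z, target))    # not expected to hold for an arbitrary joint

# Markov chain Z -> Y -> X, built so that X is independent of Z given Y.
p_z = np.array([0.3, 0.7])
p_y_given_z = np.array([[0.9, 0.2], [0.1, 0.8]])   # indexed [y, z]
p_x_given_y = np.array([[0.6, 0.1], [0.4, 0.9]])   # indexed [x, y]
markov = p_x_given_y[:, :, None] * p_y_given_z[None, :, :] * p_z[None, None, :]
m_target, m_with_z, m_without_z = chain_sums(markov)
print(np.allclose(m_without_z, m_target))
```

So the version with P(x|y) appears to need an extra conditional independence assumption, which may be exactly the context the book leaves implicit.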

Ronald_Ku · Aug 7, 2014

It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X, T refer to the training set, while t refers to the predicted value at position x.
w refers to the set of parameters of the M-th order polynomial
y(x,w) = w0 + w1*x + w2*x^2 + . . . + wM*x^M

The book claims the following equation for the prediction of t, given the training set and the position x:
[itex]p(t|x, X, T) = \int p(t|x,w) p(w|X, T) dw[/itex]

That means p(t|x,w)p(w|X, T) = p(t,w|x,X,T), which is then marginalized over w.
But I believe that p(t|x,w) =/= p(t|x,w,X,T).

If this is not clear enough, I can explain more.

Stephen Tashi · Aug 7, 2014

Ronald_Ku said:

p(t|x, X, T) =[itex]\int[/itex] p(t|x,w)p(w|X, T) dw

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice then I might use the notation
p(X,k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that "k" is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
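
The dice example can be sketched in a few lines. All the numbers below are invented for illustration, as are the names `dice`, `p` and `p_k`. The point is that `k` starts out as a plain index into a table of distributions, and a joint distribution over (face, k) only exists once a rule for picking the die at random is added.

```python
import numpy as np

# Hypothetical loaded dice: row k holds the face probabilities of die k.
dice = np.array([
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
    [1/6, 1/6, 1/6, 1/6, 1/6, 1/6],
])

def p(face, k):
    """Probability of `face` (1..6) when rolling die k (0..2); k is an index, not random."""
    return dice[k, face - 1]

# Each die is its own probability space: the face probabilities sum to 1 for every k.
rows_sum = dice.sum(axis=1)

# Only after adding an experiment that picks a die at random does k become a
# random variable with its own distribution.
p_k = np.array([0.2, 0.3, 0.5])     # assumed selection probabilities
joint = dice * p_k[:, None]         # P(face, k) = P(face | k) P(k), indexed [k, face]
p_face = joint.sum(axis=0)          # marginal P(face)
```

Without `p_k` there is nothing to marginalise over, so expressions such as P(face) have no meaning in the first setting.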

Ronald_Ku · Aug 7, 2014

Let me check that I understand you: in the expression p(x|m,n), m and n need not be random variables. They can be parameters. Whether a quantity is a random variable depends on how the experiment is set up, right?
In your case, k can be a random variable, and p(x,k) then means the probability of picking the k-th die at random and rolling an x with it, if the experiment is set up that way.

I am not sure how this applies to my case.
In my case, the notation p(t|x, X, T) means the probability of finding t, given the training set X, T and the position x. t is obviously a random variable, but x, X, T could also be parameters. The book does not say explicitly whether they are random variables or parameters. The experiment could be predicting t at position x, given a fixed set X, T. Or the experiment could be picking x, X, T at random, predicting t, and then considering p(t|x,X,T). I don't know which experiment the author has in mind.

Stephen Tashi · Aug 7, 2014

The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'll guess the author only has one specific interpretation in mind.

One way to make sense of:

[itex]p(t|x, X, T) = \int p(t|x,w) p(w|X,T) dw[/itex]

is to consider [itex]X,T[/itex] to be ordinary variables, not random variables. So within the equation [itex]X,T[/itex] can be treated as if they have some constant value.

The random variable [itex]t[/itex] is a function only of the random variables [itex]x[/itex] and [itex]w[/itex]
(i.e. [itex]t = w_0 + w_1 x + \dots + w_M x^M[/itex]). So the notation [itex]p(t|x,w)[/itex] means the same thing as [itex]p(t|x,w,X,T)[/itex] because [itex]t[/itex] has no random variation due to [itex]X, T[/itex].

But by that interpretation, the author could have written [itex]p(w | X,T)[/itex] as [itex]p(w)[/itex]. I suppose he needed to mention [itex]X, T[/itex] somewhere on the right hand side.

Leaving [itex]X,T[/itex] unmentioned, it isn't controversial that

[itex]p(t|x) = \int p(t|x,w) p(w) dw[/itex]

or, mentioning them everywhere, that

[itex]p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw[/itex]

Ronald_Ku · Aug 8, 2014

Thanks so much. I will try to proceed in this direction and see whether anything weird occurs again.

Ronald_Ku · Aug 8, 2014

I have another question.
Suppose the above equations have to be considered together with the following equation:
p(w|X, T, α, β) ∝ p(T|X, w, β) p(w|α) ------(a)
where α, β are fixed.

The left-hand side, p(w|X,T), is the posterior probability. On the right-hand side, p(w) is the prior probability.
So X, T are random variables, right?
The book says that the p(w|X,T) in the integral is given by (a).
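
To see how (a) could feed into the integral, here is a minimal numerical sketch under assumptions that are not stated in this thread: a single-parameter model y(x,w) = w*x instead of the full polynomial, Gaussian noise with precision β on the targets, and a zero-mean Gaussian prior with precision α on w. The data values and the helper `gauss` are made up. The posterior is built on a grid from likelihood times prior, the predictive density is obtained by summing over the grid, and the result is compared with the closed form that this particular Gaussian model happens to have.

```python
import numpy as np

def gauss(v, mean, var):
    """Density of a normal distribution with the given mean and variance, evaluated at v."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

alpha, beta = 2.0, 25.0                  # fixed prior and noise precisions (illustrative)
X = np.array([0.1, 0.4, 0.7, 1.0])       # made-up training inputs
T = np.array([0.05, 0.5, 0.6, 1.1])      # made-up training targets

w = np.linspace(-5.0, 5.0, 20001)        # grid over the single parameter w
dw = w[1] - w[0]

# Equation (a): posterior proportional to likelihood times prior, normalised numerically.
likelihood = np.prod(gauss(T[:, None], w[None, :] * X[:, None], 1 / beta), axis=0)
posterior = likelihood * gauss(w, 0.0, 1 / alpha)
posterior /= posterior.sum() * dw

# Predictive density at a new input: integrate p(t|x,w) p(w|X,T) over w.
x_new, t_new = 0.5, 0.55
predictive = (gauss(t_new, w * x_new, 1 / beta) * posterior).sum() * dw

# Closed form for this Gaussian model, used only as a cross-check.
s = alpha + beta * np.sum(X ** 2)        # posterior precision of w
m = beta * np.sum(X * T) / s             # posterior mean of w
closed_form = gauss(t_new, m * x_new, 1 / beta + x_new ** 2 / s)
print(predictive, closed_form)
```

In this sketch X and T enter only as observed numbers that shape the posterior over w. Once w is given, the density of the new t does not use them, which is one way of reading p(t|x,w) in the integral.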

Stephen Tashi · Aug 8, 2014

Ronald_Ku said:

So X,T are random variables. Right?
In the book, it mentions that p(w|X,T) in the integral will be given by (a)

It isn't possible to interpret equations without some context. Establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or whether it is supposed to denote a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
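
A small illustration of that last remark, using a normal distribution with an arbitrarily chosen standard deviation of 0.1 (the helper names are made up): the density at the mean is larger than 1, so it cannot be a probability, while the probability of a small interval around the mean is roughly the density times the interval width.

```python
import math

def normal_pdf(v, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(v, mean, sigma):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1 + math.erf((v - mean) / (sigma * math.sqrt(2))))

sigma = 0.1
density_at_mean = normal_pdf(0.0, 0.0, sigma)    # about 3.99, so not a probability
prob_small_interval = normal_cdf(0.01, 0.0, sigma) - normal_cdf(-0.01, 0.0, sigma)
print(density_at_mean, prob_small_interval, density_at_mean * 0.02)
```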
