Multiplication of conditional probability with several variables

Ronald_Ku · Aug 6, 2014

Dear All,

I am new to machine learning, and I am currently confused about the following problem:

What is the result of P(X|Y)P(Y|Z)?
My book writes it as P(X|Z), but I don't think that is correct, since
P(X|Z)= P(X|Y,Z)P(Y|Z)
and clearly P(X|Y) ≠ P(X|Y,Z) in general.

Assume that none of the events are independent.
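To see concretely why the book's step can't be a general identity, here is a small Python sketch over three binary variables. Both joint distributions are made up for illustration. It compares P(X|Z) with the law-of-total-probability sum Σ_y P(X|y,Z)P(y|Z) and with the shortcut Σ_y P(X|y)P(y|Z). Even the shortcut needs a sum over y, because the bare product P(X|Y)P(Y|Z) still depends on y. The shortcut only agrees with P(X|Z) when X is conditionally independent of Z given Y.

```python
from itertools import product

X, Y, Z = 0, 1, 2  # positions of each variable inside an outcome tuple (x, y, z)

def prob(joint, **fixed):
    """P(named variables take the given values), summing the joint over the rest."""
    idx = {"x": X, "y": Y, "z": Z}
    return sum(p for o, p in joint.items()
               if all(o[idx[k]] == v for k, v in fixed.items()))

def cond(joint, target, given):
    """P(target | given); both arguments are dicts such as {"x": 1}."""
    return prob(joint, **target, **given) / prob(joint, **given)

def compare(joint, x=1, z=0):
    exact = cond(joint, {"x": x}, {"z": z})
    # Law of total probability: sum over y of P(x | y, z) P(y | z).
    total = sum(cond(joint, {"x": x}, {"y": y, "z": z}) * cond(joint, {"y": y}, {"z": z})
                for y in (0, 1))
    # Shortcut that drops z from the first factor.
    shortcut = sum(cond(joint, {"x": x}, {"y": y}) * cond(joint, {"y": y}, {"z": z})
                   for y in (0, 1))
    return exact, total, shortcut

# Hypothetical joint 1: X is a copy of Z, and Y is an independent fair coin.
copy_joint = {o: (0.25 if o[X] == o[Z] else 0.0) for o in product((0, 1), repeat=3)}

# Hypothetical joint 2: a Markov chain Z -> Y -> X, so X is independent of Z given Y.
def chain_p(x, y, z):
    p_y1 = 0.8 if z == 1 else 0.3
    p_x1 = 0.9 if y == 1 else 0.2
    return 0.5 * (p_y1 if y else 1 - p_y1) * (p_x1 if x else 1 - p_x1)

chain_joint = {o: chain_p(*o) for o in product((0, 1), repeat=3)}

print(compare(copy_joint))   # (0.0, 0.0, 0.5): the shortcut is wrong here
print(compare(chain_joint))  # all three agree up to rounding
```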

I have simplified the problem in the equations above. The actual equation is
p(w|x,t,α,β) ∝ p(t|x,w,β)p(w|α), from Pattern Recognition and Machine Learning by Christopher M. Bishop.

Any help and ideas would be much appreciated.

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

since
P(X|Z)= P(X|Y,Z)P(Y|Z)

Are you saying the above is given as a special condition in the problem?

Or did you mean P(X ∩ Y | Z) = P(X | Y ∩ Z) P(Y | Z)?

Ronald_Ku · Aug 6, 2014

Stephen Tashi said:

Are you saying the above is given as a special condition in the problem?

Or did you mean P(X ∩ Y | Z) = P(X | Y ∩ Z) P(Y | Z)?

Yes, you are correct.
What I meant is P(x,y|z) = P(x|y,z)P(y|z).
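A quick numerical check of the corrected identity P(x,y|z) = P(x|y,z)P(y|z), run on an arbitrary made-up joint distribution over three binary variables. The identity is just the definition of conditional probability applied twice, so it holds for every joint. Summing it over y gives P(x|z) = Σ_y P(x|y,z)P(y|z), which is the marginalization the thread keeps returning to.

```python
import random
from itertools import product

rng = random.Random(0)  # fixed seed; the joint below is arbitrary, for illustration only
weights = {o: rng.random() for o in product((0, 1), repeat=3)}
norm = sum(weights.values())
joint = {o: w / norm for o, w in weights.items()}  # P(x, y, z)

def p(x=None, y=None, z=None):
    """Marginal probability of the specified values (None means summed out)."""
    return sum(q for (a, b, c), q in joint.items()
               if (x is None or a == x) and (y is None or b == y) and (z is None or c == z))

for x, y, z in product((0, 1), repeat=3):
    lhs = p(x, y, z) / p(z=z)                                  # P(x, y | z)
    rhs = (p(x, y, z) / p(y=y, z=z)) * (p(y=y, z=z) / p(z=z))  # P(x | y, z) P(y | z)
    assert abs(lhs - rhs) < 1e-12

# Summing the chain rule over y recovers P(x | z).
for x, z in product((0, 1), repeat=2):
    summed = sum(p(x, y, z) / p(z=z) for y in (0, 1))
    assert abs(summed - p(x=x, z=z) / p(z=z)) < 1e-12

print("chain rule and marginalization check out")
```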

Stephen Tashi · Aug 6, 2014

Ronald_Ku said:

what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z)

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.

Ronald_Ku · Aug 7, 2014

It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X and T refer to the training set (inputs and targets), while t is the predicted target value at a new position x.
w is the vector of coefficients of an M-th order polynomial, so that
y(x,w) = w0 + w1*x + w2*x^2 + . . . + wM*x^M
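The polynomial model above, written as a short Python sketch (the weights are hypothetical). The `features` helper makes explicit that y(x,w) is linear in the coefficients w, even though it is nonlinear in x.

```python
def y(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M using Horner's rule."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result

def features(x, M):
    """Basis vector (1, x, x**2, ..., x**M), so that y(x, w) = sum_j w[j] * features(x, M)[j]."""
    return [x ** j for j in range(M + 1)]

w = [1.0, -2.0, 0.5]  # hypothetical weights for an M = 2 polynomial
print(y(3.0, w))      # 1 - 6 + 4.5 = -0.5
```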

It gives the following equation for predicting t, given the training set and the position x:
p(t|x, X, T) =\int p(t|x,w)p(w|X, T) dw

That means p(t|x,w)p(w|X,T) = p(t,w|x,X,T), which is then marginalized over w.
But I believe that p(t|x,w) ≠ p(t|x,w,X,T).
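A sketch of what the integral computes, assuming the Gaussian curve-fitting setup: p(t|x,w) = N(t | y(x,w), 1/β) and a Gaussian posterior p(w|X,T) = N(w | m, S) for a straight line (M = 1). The values of m, S and β are made up. The Monte Carlo estimate draws w from the posterior and averages p(t|x,w). Note that the integrand's first factor never looks at X and T; they influence the prediction only through the distribution of w. That is exactly the modelling assumption p(t|x,w,X,T) = p(t|x,w) in question. For this Gaussian case the integral also has a closed form, N(t | φᵀm, 1/β + φᵀSφ) with φ = (1, x), which the estimate should approach.

```python
import math
import random

def normal_pdf(t, mean, var):
    """Gaussian density N(t | mean, var)."""
    return math.exp(-(t - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical posterior p(w | X, T) = N(w | m, S) for y(x, w) = w0 + w1 * x.
m = [0.5, 1.0]
S = [[0.04, 0.01], [0.01, 0.09]]
beta = 25.0  # assumed noise precision: p(t | x, w) = N(t | y(x, w), 1 / beta)

def predictive_mc(t, x, n=100_000, seed=0):
    """Monte Carlo estimate of the integral of p(t | x, w) p(w | X, T) over w."""
    rng = random.Random(seed)
    # Lower-triangular Cholesky factor L of S (L L^T = S), used to draw correlated w.
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 ** 2)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w0 = m[0] + l11 * z1
        w1 = m[1] + l21 * z1 + l22 * z2
        # p(t | x, w) uses only x and w; X, T enter through the samples of w.
        total += normal_pdf(t, w0 + w1 * x, 1.0 / beta)
    return total / n

def predictive_exact(t, x):
    """Closed form for this Gaussian case: N(t | phi^T m, 1/beta + phi^T S phi), phi = (1, x)."""
    phi = [1.0, x]
    mean = phi[0] * m[0] + phi[1] * m[1]
    var = 1.0 / beta + sum(phi[i] * S[i][j] * phi[j] for i in range(2) for j in range(2))
    return normal_pdf(t, mean, var)

print(predictive_mc(2.5, 2.0), predictive_exact(2.5, 2.0))
```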

If this is not clear enough, I can explain more.

Stephen Tashi · Aug 7, 2014

Ronald_Ku said:

p(t|x, X, T) =\int p(t|x,w)p(w|X, T) dw

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice, then I might use the notation
p(X,k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that k is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
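The dice example can be made concrete with three made-up loaded dice. Under the first reading, p(X,k) is just a table lookup indexed by k. Under the second, the die itself is picked at random (uniformly here, which is an assumption), and the joint probability P(X,k) = P(k)P(X|k) is a different number. Exact fractions keep the arithmetic easy to check.

```python
from fractions import Fraction as F

# Hypothetical loaded dice: dice[k][face - 1] = probability of rolling `face` with die k.
dice = [
    [F(1, 6)] * 6,               # die 0: fair
    [F(1, 10)] * 5 + [F(1, 2)],  # die 1: favours a six
    [F(1, 2)] + [F(1, 10)] * 5,  # die 2: favours a one
]

def p_roll(face, k):
    """Reading 1: k is an ordinary index, so this is a lookup with no randomness in k."""
    return dice[k][face - 1]

def p_joint(face, k, p_k=(F(1, 3),) * 3):
    """Reading 2: the die is picked at random with probabilities p_k, so P(X, k) = P(k) P(X | k)."""
    return p_k[k] * dice[k][face - 1]

print(p_roll(6, 1))                          # 1/2
print(p_joint(6, 1))                         # 1/6
print(sum(p_joint(6, k) for k in range(3)))  # P(X = 6) with a randomly picked die: 23/90
```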

Ronald_Ku · Aug 7, 2014

Let me check that I understand you: in the expression p(x|m,n), m and n need not be random variables. They can be parameters. Whether a variable is random depends on how the experiment is set up, right?
In your case, k could be a random variable, and p(x,k) would then mean picking the k-th die at random and rolling an x with it, if the experiment is set up that way.

I am not sure how this applies to my case.
In my case, the notation p(t|x,X,T) means the probability of finding t, given the training set X,T and the position x. t is clearly a random variable, but x, X and T could also be parameters; the book does not say explicitly whether they are random variables or parameters. The experiment could be predicting t at position x for a fixed training set X,T. Or it could be predicting t while picking x, X and T at random, and then considering p(t|x,X,T). I don't know which experiment the author has in mind.

Stephen Tashi · Aug 7, 2014

The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct under each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'd guess the author has only one specific interpretation in mind.

One way to make sense of:

p(t|x, X, T) = \int p(t|x,w) p(w|X,T) dw

is to consider X,T to be ordinary variables, not random variables. So within the equation X,T can be treated as if they have some constant value.

The random variable t is a function only of the random variables x and w
(i.e. t = w_0 + w_1 x + ... + w_M x^M). So the notation p(t|x,w) means the same thing as p(t|x,w,X,T), because t has no random variation due to X, T.
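The same equality can also be seen without treating X, T as constants. Below is a toy discrete Bayesian model in which all numbers are made up and a single binary variable d stands in for the whole training set X, T. Because the new target t is generated from x and w only, P(t|x,w,d) = P(t|x,w) holds exactly, while P(t|x,d) still depends on the data through w. This is a sketch of one possible reading, not necessarily the one Bishop intends.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical generative model: w ~ P(w), data d ~ P(d | w), new input x ~ P(x), new target t ~ P(t | x, w).
P_w = {0: F(1, 2), 1: F(1, 2)}
P_d1_given_w = {0: F(1, 5), 1: F(9, 10)}
P_x = {0: F(1, 2), 1: F(1, 2)}
P_t1_given_xw = {(0, 0): F(1, 10), (0, 1): F(7, 10), (1, 0): F(2, 5), (1, 1): F(19, 20)}

def bern(p1, v):
    """Probability that a binary variable with P(1) = p1 takes value v."""
    return p1 if v == 1 else 1 - p1

joint = {
    (w, d, x, t): P_w[w] * bern(P_d1_given_w[w], d) * P_x[x] * bern(P_t1_given_xw[(x, w)], t)
    for w, d, x, t in product((0, 1), repeat=4)
}

def prob(**fixed):
    """Marginal probability of the named values, summing the joint over the rest."""
    names = ("w", "d", "x", "t")
    return sum(p for key, p in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

# t depends on the data only through w, so conditioning on d changes nothing once w is known.
for w, d, x in product((0, 1), repeat=3):
    with_data = prob(t=1, x=x, w=w, d=d) / prob(x=x, w=w, d=d)
    without = prob(t=1, x=x, w=w) / prob(x=x, w=w)
    assert with_data == without

# Once w is integrated out, the data do matter: P(t | x, d) differs from P(t | x).
print(prob(t=1, x=1, d=1) / prob(x=1, d=1), prob(t=1, x=1) / prob(x=1))  # 17/20 27/40
```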

But by that interpretation, the author could have written p(w | X,T) as p(w). I suppose he felt he needed to mention X, T somewhere on the right-hand side.

Leaving X,T unmentioned, it isn't controversial that

p(t|x) = \int p(t|x,w) p(w) dw

or, mentioning them everywhere, that

p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw

Ronald_Ku · Aug 8, 2014

Thanks so much. I will try to proceed in this direction and see whether anything strange comes up again.

Ronald_Ku · Aug 8, 2014

I have another question.
How should the equations above be reconciled with the following one?
p(w|X,T,α,β) ∝ p(T|X,w,β) p(w|α)    (a)
where α and β are fixed.

The left-hand side, p(w|X,T) (suppressing α and β), is the posterior probability, and the factor p(w|α) on the right-hand side is the prior probability.
So X and T are random variables, right?
The book says that p(w|X,T) in the integral is given by (a).
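A sketch of what (a) produces under the Gaussian choices of this chapter, as I understand them: p(T|X,w,β) = Π_n N(t_n | y(x_n,w), 1/β) and p(w|α) = N(w | 0, α⁻¹I). The product is then a Gaussian in w, with precision matrix A = αI + βΦᵀΦ and mean m solving A m = βΦᵀT, where Φ is the matrix of powers of the training inputs. The training values (X, T) and the hyperparameters below are made up. In this computation X and T are simply the observed numbers being conditioned on, whatever one's view of their randomness. The posterior mean is also the maximizer of the unnormalized log posterior, i.e. regularized least squares with λ = α/β.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A small and square)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def posterior_mean(xs, ts, order, alpha, beta):
    """Mean (= mode) of p(w | X, T, alpha, beta), proportional to p(T | X, w, beta) p(w | alpha)."""
    phi = [[x ** j for j in range(order + 1)] for x in xs]
    n = order + 1
    A = [[alpha * (i == j) + beta * sum(row[i] * row[j] for row in phi) for j in range(n)]
         for i in range(n)]
    b = [beta * sum(row[i] * t for row, t in zip(phi, ts)) for i in range(n)]
    return solve(A, b)

def log_unnormalized_posterior(w, xs, ts, alpha, beta):
    """log p(T | X, w, beta) + log p(w | alpha), dropping terms that do not depend on w."""
    sq_err = sum((t - sum(wj * x ** j for j, wj in enumerate(w))) ** 2 for x, t in zip(xs, ts))
    return -0.5 * beta * sq_err - 0.5 * alpha * sum(wj ** 2 for wj in w)

# Hypothetical training set (X, T) and fixed hyperparameters.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ts = [0.1, 0.9, 0.2, -0.8, 0.05]
alpha, beta, order = 0.005, 10.0, 3

m = posterior_mean(xs, ts, order, alpha, beta)
print(m)
```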

Stephen Tashi · Aug 8, 2014

Ronald_Ku said:

So X,T are random variables. Right?
In the book, it mentions that p(w|X,T) in the integral will be given by (a)

It isn't possible to interpret equations without some context, and establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or to denote a probability density function evaluated at some point. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
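A small illustration of that last distinction, using a normal distribution with a made-up standard deviation of 0.1. The density at the mean is about 3.99, which cannot be a probability. The probability of a small interval around the mean is the integral of the density over that interval, computed here from the CDF via `math.erf`.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Value of the normal density at x (a density, not a probability)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal random variable."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

sd = 0.1
print(normal_pdf(0.0, sd=sd))                              # about 3.989, larger than 1
print(normal_cdf(0.01, sd=sd) - normal_cdf(-0.01, sd=sd))  # P(-0.01 < X < 0.01), about 0.0797
```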
