# Multiplication of conditional probability with several variables

1. Aug 6, 2014

### Ronald_Ku

Dear All,

I am new to machine learning and I am currently confused by the following problem:

What is the result of P(X|Y)P(Y|Z)?
In my book it is written to be P(X|Z), but I don't think that is correct, since
P(X|Z) = P(X|Y,Z)P(Y|Z)
but clearly P(X|Y) ≠ P(X|Y,Z).

Assume that none of the events are independent.
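The claim can be tested numerically. Below is a minimal Python sketch (the joint probability table is made up purely for illustration) showing that P(X|Y)P(Y|Z) need not equal P(X|Z) when the variables are dependent:

```python
# Hypothetical joint distribution over three binary variables (X, Y, Z);
# the numbers are arbitrary, chosen only so that no independence holds.
joint = {
    (0, 0, 0): 0.1, (0, 0, 1): 0.1, (0, 1, 0): 0.1, (0, 1, 1): 0.2,
    (1, 0, 0): 0.1, (1, 0, 1): 0.2, (1, 1, 0): 0.1, (1, 1, 1): 0.1,
}

def prob(pred):
    """Total probability of the outcomes satisfying the predicate."""
    return sum(p for xyz, p in joint.items() if pred(xyz))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

p_x_given_y = cond(lambda w: w[0] == 1, lambda w: w[1] == 1)  # P(X=1 | Y=1)
p_y_given_z = cond(lambda w: w[1] == 1, lambda w: w[2] == 1)  # P(Y=1 | Z=1)
p_x_given_z = cond(lambda w: w[0] == 1, lambda w: w[2] == 1)  # P(X=1 | Z=1)

print(p_x_given_y * p_y_given_z, p_x_given_z)  # 0.2 vs 0.5: not equal
```

So for this table the product comes out to 0.2 while P(X=1|Z=1) is 0.5, which confirms the equality cannot hold in general.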

I have simplified the problem in the equation above. The actual equation is
p(w|x, t, α, β) ∝ p(t|x, w, β) p(w|α)
from Pattern Recognition and Machine Learning by Christopher M. Bishop.

Any help and ideas will be much appreciated.

2. Aug 6, 2014

### Stephen Tashi

Are you saying the above is given as a special condition in the problem?

Or did you mean $P( \ ( X \cap Y) | Z\ ) = P(X | \ (Y \cap Z)\ )\ P(Y | Z)$ ?

3. Aug 6, 2014

### Ronald_Ku

Yes, you are correct.
What I meant is P(x,y|z) = P(x|y,z)P(y|z)
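For what it's worth, this identity can be confirmed numerically on any joint table. A minimal Python sketch (the joint over three binary variables is made up for illustration):

```python
# A quick numerical check of the chain rule P(x,y|z) = P(x|y,z) P(y|z),
# using an arbitrary made-up joint table over three binary variables.
joint = {
    (0, 0, 0): 0.1, (0, 0, 1): 0.1, (0, 1, 0): 0.1, (0, 1, 1): 0.2,
    (1, 0, 0): 0.1, (1, 0, 1): 0.2, (1, 1, 0): 0.1, (1, 1, 1): 0.1,
}

def prob(pred):
    """Total probability of the outcomes satisfying the predicate."""
    return sum(p for xyz, p in joint.items() if pred(xyz))

max_err = 0.0
for (x, y, z), p_xyz in joint.items():
    p_z = prob(lambda w: w[2] == z)                 # P(Z = z)
    p_yz = prob(lambda w: (w[1], w[2]) == (y, z))   # P(Y = y, Z = z)
    lhs = p_xyz / p_z                               # P(x, y | z)
    rhs = (p_xyz / p_yz) * (p_yz / p_z)             # P(x | y, z) P(y | z)
    max_err = max(max_err, abs(lhs - rhs))

print(max_err)  # essentially zero: the identity holds cell by cell
```

Unlike the original P(X|Y)P(Y|Z) claim, this one is an algebraic identity, so it holds for every cell of any joint table.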

4. Aug 6, 2014

### Stephen Tashi

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.

5. Aug 7, 2014

### Ronald_Ku

It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X, T refer to the training set, while t refers to the predicted value at position x.
w refers to the set of parameters of the M-th order polynomial, so that
$y(x,w) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$

It claims the following equation for the prediction of t with the help of the training set and the position x:
$p(t|x, X, T) = \int p(t|x,w)\, p(w|X, T)\, dw$

That means p(t|x,w) p(w|X, T) = p(t,w|x,X, T) for the later marginalization.
But I believe that p(t|x,w) ≠ p(t|x,w,X, T)

If it is not clear enough, I can explain more.

6. Aug 7, 2014

### Stephen Tashi

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice then I might use the notation
p(X, k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that k is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
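As a sketch in Python (the bias numbers are invented), the index k is just an argument; nothing in the code defines a distribution over k:

```python
# Three loaded dice: k indexes which die is rolled. k is an ordinary
# parameter, not a random variable -- no distribution over k exists here.
dice = [
    {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1},   # die 0, loaded toward 1
    {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},   # die 1, loaded toward 6
    {face: 1 / 6 for face in range(1, 7)},               # die 2, fair
]

def p(result, k):
    """p(X = result, k): probability of `result` when rolling the k-th die."""
    return dice[k][result]

print(p(1, 0), p(6, 1), p(3, 2))
```

Asking for p(X, k) only ever fixes k; there is no experiment anywhere in which a die is picked at random.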

7. Aug 7, 2014

### Ronald_Ku

Let me clarify what you mean: in the expression p(x|m,n), it is not necessary that m and n are random variables; they can be parameters. Whether one is a random variable depends on the setting of the experiment, right?
In your case, k can be a random variable, and p(x,k) means getting a result of x while rolling the k-th die picked at random, if the experiment is set up that way.

I am not sure when it comes to my case.
In my case, the notation p(t|x, X, T) means: given the training set X, T and the position x, the probability of finding t. t is obviously a random variable, but x, X, T can also be parameters. It is not explicitly written whether they are random variables or parameters. The experiment could be predicting t at position x, given a fixed set X, T. Or the experiment could be predicting t while picking x, X, T at random and then considering P(t|x,X,T). I don't know which experiment the author is doing.

8. Aug 7, 2014

### Stephen Tashi

The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'll guess the author only has one specific interpretation in mind.

One way to make sense of:

$p(t|x, X, T) = \int p(t|x,w)\, p(w|X,T)\, dw$

is to consider $X,T$ to be ordinary variables, not random variables. So within the equation $X,T$ can be treated as if they have some constant value.

The random variable $t$ is a function only of the random variables $x$ and $w$
(i.e. $t = w_0 + w_1 x + \dots + w_M x^M$). So the notation $p(t|x,w)$ means the same thing as $p(t|x,w,X,T)$, because $t$ has no random variation due to $X, T$.

But by that interpretation, the author could have written $p(w | X,T)$ as $p(w)$. I suppose he needed to mention $X, T$ somewhere on the right-hand side.

Leaving $X,T$ unmentioned, it isn't controversial that

$p(t|x) = \int p(t|x,w) p(w) dw$

or, mentioning them everywhere, that

$p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw$
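A discrete sketch of the marginalization may help: if $w$ takes only finitely many values, the integral becomes a sum, $p(t|x) = \sum_w p(t|x,w)\, p(w)$. The Python below (two candidate slopes and a symmetric ±1 noise on t, all made up purely for illustration) builds the predictive distribution this way:

```python
# Discrete version of  p(t|x) = sum_w p(t|x,w) p(w).
# The two-value prior and the noise model are invented for illustration.

prior = {0.5: 0.3, 2.0: 0.7}          # p(w) for two candidate slopes

def p_t_given_xw(t, x, w):
    """Noise model: t = w*x + e, where e is +1 or -1, each with prob 0.5."""
    return 0.5 * (t == w * x + 1) + 0.5 * (t == w * x - 1)

def p_t_given_x(t, x):
    """Predictive distribution: marginalize the parameter w out."""
    return sum(p_t_given_xw(t, x, w) * pw for w, pw in prior.items())

x = 2.0
# Possible t values at x=2: w=0.5 gives 1±1 = {0, 2}; w=2.0 gives 4±1 = {3, 5}
for t in (0.0, 2.0, 3.0, 5.0):
    print(t, p_t_given_x(t, x))
total = sum(p_t_given_x(t, x) for t in (0.0, 2.0, 3.0, 5.0))
print(total)  # a proper distribution: the four masses sum to 1
```

The same bookkeeping goes through with $X, T$ carried along in every conditional, since they would just sit as fixed arguments in each factor.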

9. Aug 8, 2014

### Ronald_Ku

Thanks so much. I may try to proceed in this direction and see if anything weird occurs again.

10. Aug 8, 2014

### Ronald_Ku

I have another question.
Suppose the above equations need to be considered together with the following equation:
p(w|X, T, α, β) ∝ p(T|X, w, β) p(w|α) ------ (a)
where α, β are fixed.

The p(w|X, T, α, β) on the left-hand side is the posterior probability. The p(w|α) on the right-hand side is the prior probability.
So X, T are random variables, right?
In the book, it mentions that the p(w|X, T) in the integral will be given by (a)
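Equation (a) can be sketched on a discrete grid of w values, where the proportionality constant drops out on normalization. Everything below (the data, the grid, the hyperparameter values, and the model t_n ~ N(w·x_n, 1/β)) is made up for illustration:

```python
import math

# Posterior ∝ likelihood * prior, evaluated on a grid of w values and then
# normalized. A one-parameter linear model stands in for the polynomial.

xs = [0.0, 1.0, 2.0]          # training inputs X (invented)
ts = [0.1, 0.9, 2.1]          # training targets T (roughly t = w*x with w near 1)
alpha, beta = 1.0, 4.0        # precisions of the prior and of the noise

def log_likelihood(w):
    """log p(T | X, w, beta) for t_n ~ N(w * x_n, 1/beta), up to a constant in w."""
    return sum(-0.5 * beta * (t - w * x) ** 2 for x, t in zip(xs, ts))

def log_prior(w):
    """log p(w | alpha) for a zero-mean Gaussian prior, up to a constant in w."""
    return -0.5 * alpha * w ** 2

grid = [i / 100.0 for i in range(-300, 301)]          # w in [-3, 3]
unnorm = [math.exp(log_likelihood(w) + log_prior(w)) for w in grid]
Z = sum(unnorm)                                       # the dropped constant
posterior = [u / Z for u in unnorm]                   # now sums to 1 on the grid

w_map = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(w_map)  # posterior mode: pulled slightly below 1 by the prior
```

Note that X, T enter only through the likelihood factor, which is the sense in which the data updates the prior p(w|α) into the posterior.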

11. Aug 8, 2014

### Stephen Tashi

It isn't possible to interpret equations without some context, and establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or whether it is supposed to denote a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)