Multiplication of conditional probability with several variables

Ronald_Ku
Dear All,

I am new to machine learning and am currently confused about the following problem:

What is the result of P(X|Y)P(Y|Z)?
In my book, it is written as P(X|Z). But I don't think that is correct, since
P(X|Z)= P(X|Y,Z)P(Y|Z)
But clearly P(X|Y) ≠ P(X|Y,Z) in general.

Assume that none of the events are independent.

I have simplified the problem in the equation above. The actual relation is
p(w|x,t,α,β) ∝ p(t|x,w,β) p(w|α), from Pattern Recognition and Machine Learning by Christopher M. Bishop.

Any help or ideas would be much appreciated.
 
Ronald_Ku said:
since
P(X|Z)= P(X|Y,Z)P(Y|Z)

Are you saying the above is given as a special condition in the problem?

Or did you mean P(X \cap Y | Z) = P(X | Y \cap Z) P(Y | Z)?
 
Stephen Tashi said:
Are you saying the above is given as a special condition in the problem?

Or did you mean P(X \cap Y | Z) = P(X | Y \cap Z) P(Y | Z)?


Yes, you are correct.
What I meant is P(x,y|z) = P(x|y,z)P(y|z).
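
The two candidate identities can be checked numerically on a small table. The following is a minimal sketch, assuming a made-up joint distribution over three binary variables (the numbers are illustrative and not from the book). It confirms the product rule P(x,y|z) = P(x|y,z)P(y|z) and its sum over y, and shows that replacing P(x|y,z) with P(x|y) does not give P(x|z) for this table.

```python
import numpy as np

# Hypothetical joint distribution P(x, y, z) over binary variables,
# indexed as joint[x, y, z]. The values are made up and sum to 1.
joint = np.array([
    [[0.30, 0.05],   # x=0, y=0: z=0, z=1
     [0.05, 0.10]],  # x=0, y=1
    [[0.05, 0.25],   # x=1, y=0
     [0.10, 0.10]],  # x=1, y=1
])

p_z = joint.sum(axis=(0, 1))             # P(z)
p_yz = joint.sum(axis=0)                 # P(y, z)
p_y = joint.sum(axis=(0, 2))             # P(y)

p_xy_given_z = joint / p_z               # P(x, y | z), indexed [x, y, z]
p_x_given_yz = joint / p_yz              # P(x | y, z), indexed [x, y, z]
p_y_given_z = p_yz / p_z                 # P(y | z),    indexed [y, z]
p_x_given_z = joint.sum(axis=1) / p_z    # P(x | z),    indexed [x, z]
p_x_given_y = joint.sum(axis=2) / p_y    # P(x | y),    indexed [x, y]

# Product rule: P(x, y | z) = P(x | y, z) P(y | z)
product_rule_holds = np.allclose(p_xy_given_z, p_x_given_yz * p_y_given_z)

# Summing the product rule over y recovers P(x | z).
marginal_holds = np.allclose(
    (p_x_given_yz * p_y_given_z).sum(axis=1), p_x_given_z
)

# Using P(x | y) in place of P(x | y, z): sum_y P(x | y) P(y | z).
chained = p_x_given_y @ p_y_given_z
chain_matches = np.allclose(chained, p_x_given_z)

print(product_rule_holds, marginal_holds, chain_matches)
```

The last comparison would succeed only for a table in which x is conditionally independent of z given y, which this table was chosen to violate.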
 
Ronald_Ku said:
what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z)

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.
 
It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X, T refer to the training set, while t refers to the predicted value at position x.
w refers to the set of parameters of the M-th order polynomial
y(x,w) = w0 + w1*x + w2*x^2 + . . . + wM*x^M

It claims the following equation for predicting t at position x with the help of the training set:
p(t|x, X, T) = \int p(t|x,w) p(w|X, T) dw

That means p(t|x,w) p(w|X, T) = p(t,w|x, X, T), which is then marginalized over w.
But I believe that p(t|x,w) ≠ p(t|x,w,X, T) in general.

If that is not clear enough, I can explain more.
 
Ronald_Ku said:
p(t|x, X, T) =\int p(t|x,w)p(w|X, T) dw

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice then I might use the notation
p(X,k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that "k" is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
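
A small sketch of the dice example, assuming made-up loadings for the three dice (the table `loaded_dice` and the helper `p` are hypothetical names). For each fixed k the values p(X,k) sum to 1 over the faces, but summed over both X and k they give 3, so under this reading p(X,k) is a family of distributions indexed by k rather than a joint distribution of two random variables.

```python
from fractions import Fraction

# Hypothetical loadings: loaded_dice[k][face - 1] is the probability that
# die k shows `face`. The numbers are made up for illustration.
loaded_dice = {
    1: [Fraction(1, 2)] + [Fraction(1, 10)] * 5,   # die 1 favours face 1
    2: [Fraction(1, 6)] * 6,                       # die 2 is fair
    3: [Fraction(1, 10)] * 5 + [Fraction(1, 2)],   # die 3 favours face 6
}

def p(face, k):
    """p(X, k): probability of `face` when rolling the k-th die (k is an index)."""
    return loaded_dice[k][face - 1]

# For each fixed k, p(., k) is a probability distribution over the faces.
total_per_die = {k: sum(p(face, k) for face in range(1, 7)) for k in loaded_dice}

# Summing over k as well does not give 1, so k is not being treated
# as a random variable here.
total_over_everything = sum(total_per_die.values())

print(total_per_die, total_over_everything)
```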
 
Let me check that I understand: in the expression p(x|m,n), m and n need not be random variables. They can be parameters. Whether a quantity is a random variable depends on the setup of the experiment, right?
In your case, k could also be a random variable, and p(x,k) would then mean the probability of picking die k at random and getting the result x, if the experiment is set up that way.

I am not sure how this applies to my case.
In my case, the notation p(t|x, X, T) means
the probability of finding t, given the training set X, T and the position x. t is obviously a random variable. But x, X, T could also be parameters. It is not stated explicitly whether they are random variables or parameters. The experiment could be predicting t at position x, given a fixed set X, T. Or the experiment could be predicting t while picking x, X, T at random and then considering p(t|x, X, T). I don't know which experiment the author has in mind.
 
The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'll guess the author only has one specific interpretation in mind.

One way to make sense of:

p(t|x, X, T) = \int p(t|x,w) p(w|X,T) dw

is to consider X,T to be ordinary variables, not random variables. So within the equation X,T can be treated as if they have some constant value.

The random variable t is a function only of the random variables x and w
(i.e. t = w_0 + w_1 x + ... + w_n x^n). So the notation p(t|x,w) means the same thing as p(t|x,w,X,T) because t has no random variation due to X, T.

But by that interpretation, the author could have written p(w | X,T) as p(w). I suppose he needed to mention X, T somewhere on the right hand side.

Leaving X,T unmentioned, it isn't controversial that

p(t|x) = \int p(t|x,w) p(w) dw

or, mentioning them everywhere, that

p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw
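
The last integral can be checked numerically under extra assumptions that are not stated in the posts above: Gaussian noise with precision beta on t, a Gaussian prior with precision alpha on w, and a made-up training set. In that conjugate setting p(w|X,T) is Gaussian, so the sketch below estimates the integral by averaging p(t|x,w) over samples of w and compares the estimate with the closed-form Gaussian predictive density. The values of `M`, `alpha`, `beta`, `x_new` and `t_query` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed values (not from the thread): polynomial degree and precisions.
M, alpha, beta = 3, 5e-3, 25.0

# Hypothetical training set (X, T): noisy samples of sin(2*pi*x).
X = np.linspace(0.0, 1.0, 10)
T = np.sin(2 * np.pi * X) + rng.normal(scale=beta ** -0.5, size=X.shape)

def phi(x):
    """Polynomial features (1, x, ..., x^M), one row per input value."""
    return np.vander(np.atleast_1d(x), M + 1, increasing=True)

def gaussian_pdf(t, mean, var):
    """Density of a univariate normal distribution evaluated at t."""
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Gaussian posterior p(w | X, T) = N(w | m_N, S_N) for this conjugate model.
Phi = phi(X)
S_N = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
S_N = 0.5 * (S_N + S_N.T)  # remove round-off asymmetry before sampling
m_N = beta * S_N @ Phi.T @ T

x_new, t_query = 0.3, 0.9
features = phi(x_new)[0]

# Monte Carlo estimate of  \int p(t|x,w) p(w|X,T) dw :
# draw w from the posterior and average the likelihood p(t | x, w).
w_samples = rng.multivariate_normal(m_N, S_N, size=200_000)
mc_density = gaussian_pdf(t_query, w_samples @ features, 1.0 / beta).mean()

# Closed form of the same integral for the Gaussian model.
exact_density = gaussian_pdf(
    t_query, m_N @ features, 1.0 / beta + features @ S_N @ features
)

print(mc_density, exact_density)
```

Here X and T enter only through the posterior over w, while p(t|x,w) does not depend on them once w is given, which is the conditional-independence assumption that lets p(t|x,w,X,T) be written as p(t|x,w).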
 
Thanks so much. I will try to proceed in this direction and see whether anything odd comes up again.
 
I have another question.
Suppose the above equations need to be considered together with the following relation:
p(w|X, T, α, β) ∝ p(T|X,w, β) p(w|α)   (a)
where α, β are fixed.

The left-hand side, p(w|X,T), is the posterior probability. On the right-hand side, p(w) is the prior probability.
So X, T are random variables, right?
The book says that p(w|X,T) in the integral is given by (a).
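
Relation (a) can be illustrated on a grid. This is a minimal sketch under assumptions that go beyond what is quoted here: a single weight with y(x,w) = w*x, Gaussian noise with precision β, a zero-mean Gaussian prior with precision α, and made-up data. It multiplies likelihood and prior, normalizes the product numerically, and compares the resulting posterior mean with the known closed form for this conjugate Gaussian case.

```python
import numpy as np

# Assumed toy setup: y(x, w) = w * x with one weight, fixed alpha and beta.
alpha, beta = 2.0, 25.0
X = np.array([0.1, 0.4, 0.7, 1.0])        # hypothetical inputs
T = np.array([0.12, 0.35, 0.80, 0.95])    # hypothetical targets

w_grid = np.linspace(-3.0, 3.0, 20001)
dw = w_grid[1] - w_grid[0]

# log p(T | X, w, beta) up to an additive constant: Gaussian noise on each target.
residuals = T[None, :] - w_grid[:, None] * X[None, :]
log_likelihood = -0.5 * beta * (residuals ** 2).sum(axis=1)

# log p(w | alpha) up to an additive constant: zero-mean Gaussian prior.
log_prior = -0.5 * alpha * w_grid ** 2

# Relation (a): posterior is proportional to likelihood times prior.
# The missing constant is fixed by requiring the posterior to integrate to 1.
unnormalised = np.exp(log_likelihood + log_prior)
posterior = unnormalised / (unnormalised.sum() * dw)
grid_mean = (w_grid * posterior).sum() * dw

# Closed-form posterior mean for this conjugate Gaussian model.
precision = alpha + beta * (X ** 2).sum()
exact_mean = beta * (X * T).sum() / precision

print(grid_mean, exact_mean)
```

In this sketch T is treated as random given X and w (it has a likelihood), X appears only as a conditioning quantity, and α, β stay fixed throughout.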
 
Ronald_Ku said:
So X,T are random variables. Right?
In the book, it mentions that p(w|X,T) in the integral will be given by (a)

It isn't possible to interpret equations without some context. Establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or whether it is supposed to denote a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
 