Multiplication of conditional probability with several variables

  1. Aug 6, 2014 #1
    Dear All,

    I am a beginner in machine learning, and I am currently confused about the following problem:

    What is the result of P(X|Y)P(Y|Z)?
    My book says it is P(X|Z). But I don't think that is correct, since
    P(X|Z) = P(X|Y,Z)P(Y|Z)
    and clearly P(X|Y) ≠ P(X|Y,Z).

    Assume that the events are not independent of one another.

    The equation above is a simplified version of the problem. The actual equation, from Pattern Recognition and Machine Learning by Christopher M. Bishop, is
    p(w|x,t,α,β) ∝ p(t|x,w,β)p(w|α).

    Any help or ideas would be much appreciated.
  3. Aug 6, 2014 #2

    Stephen Tashi

    Science Advisor

    Are you saying the above is given as a special condition in the problem?

    Or did you mean [itex] P(X \cap Y \mid Z) = P(X \mid Y \cap Z)\, P(Y \mid Z) [/itex]?
  4. Aug 6, 2014 #3

    Yes, you are correct.
    What I mean is P(x,y|z) = P(x|y,z)P(y|z).
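    The identity P(x,y|z) = P(x|y,z)P(y|z) holds for any joint distribution (with P(y,z) > 0), with no independence assumption. A minimal check, assuming a randomly generated joint table over three binary variables (the numbers are arbitrary):

```python
import random
from itertools import product

random.seed(0)
# An arbitrary, randomly generated joint P(x, y, z) over binary x, y, z (illustration only).
weights = {k: random.random() for k in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

def prob(pred):
    """Probability of the event {(x, y, z) : pred(x, y, z)}."""
    return sum(p for key, p in joint.items() if pred(*key))

max_err = 0.0
for x, y, z in product((0, 1), repeat=3):
    p_z = prob(lambda a, b, c: c == z)
    p_yz = prob(lambda a, b, c: b == y and c == z)
    p_xyz = joint[(x, y, z)]
    lhs = p_xyz / p_z                       # P(x, y | z)
    rhs = (p_xyz / p_yz) * (p_yz / p_z)     # P(x | y, z) * P(y | z)
    max_err = max(max_err, abs(lhs - rhs))

print(max_err)  # zero up to floating-point rounding
```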
  5. Aug 6, 2014 #4

    Stephen Tashi

    Science Advisor

    I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.
  6. Aug 7, 2014 #5
    It is in the introductory chapter of the book, in the section on polynomial curve fitting.
    X and T denote the training set (inputs and targets), while t denotes the predicted target value at a new position x.
    w denotes the set of parameters (coefficients) of an M-th order polynomial, so that
    y(x,w) = w0 + w1*x + w2*x^2 + . . . + wM*x^M
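    For concreteness, the polynomial y(x,w) can be evaluated as below. This assumes w is a plain list [w0, w1, ..., wM]; poly_y is a hypothetical helper name, not something from the book. Horner's rule is used simply because it avoids computing powers of x explicitly.

```python
def poly_y(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M using Horner's rule."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result

# Example with M = 2 and w = (1, -2, 3): y(2, w) = 1 - 4 + 12 = 9
print(poly_y(2.0, [1.0, -2.0, 3.0]))  # 9.0
```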

    It gives the following equation for predicting t from the training set and the position x:
    [itex] p(t|x,X,T) = \int p(t|x,w)\, p(w|X,T)\, dw [/itex]

    That means p(t|x,w)p(w|X,T) = p(t,w|x,X,T), which is then marginalized over w.
    But I believe that p(t|x,w) ≠ p(t|x,w,X,T).

    If this is not clear enough, I can explain more.
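    A sketch of how the book's equation follows from the general rules. The last step relies on two conditional-independence assumptions about the model (assumptions, not identities valid for arbitrary variables): the new target t depends on the training data X,T only through w, and the distribution of w given the training data does not depend on the new input x.

    [itex] p(t|x,X,T) = \int p(t,w|x,X,T)\, dw = \int p(t|x,w,X,T)\, p(w|x,X,T)\, dw [/itex]

    [itex] p(t|x,w,X,T) = p(t|x,w), \qquad p(w|x,X,T) = p(w|X,T) [/itex]

    [itex] \Rightarrow\ p(t|x,X,T) = \int p(t|x,w)\, p(w|X,T)\, dw [/itex]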
  7. Aug 7, 2014 #6

    Stephen Tashi

    Science Advisor

    To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T)? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice, then I might use the notation
    p(X,k)
    to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that k is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
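    To make the two readings of p(X,k) concrete, here is a sketch with three hypothetical loaded dice (the face probabilities are invented). In the first reading k only selects a die. In the second, the experiment is assumed to include picking a die uniformly at random, so k becomes a random variable and the same numbers yield a joint probability.

```python
from fractions import Fraction as F

# Hypothetical loaded dice: die_probs[k][face - 1] = P(face | die k). Invented numbers.
die_probs = [
    [F(1, 6)] * 6,               # die 0: fair
    [F(1, 2)] + [F(1, 10)] * 5,  # die 1: favours 1
    [F(1, 10)] * 5 + [F(1, 2)],  # die 2: favours 6
]

def p_face_given_die(face, k):
    """Reading 1: k is an ordinary index (a parameter); no randomness in choosing the die."""
    return die_probs[k][face - 1]

def p_face_and_die(face, k):
    """Reading 2: the die is picked uniformly at random, so k is a random variable."""
    return F(1, len(die_probs)) * die_probs[k][face - 1]

print(p_face_given_die(6, 2))                       # 1/2
print(p_face_and_die(6, 2))                         # 1/6
print(sum(p_face_and_die(6, k) for k in range(3)))  # marginal P(face = 6) = 23/90
```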
  8. Aug 7, 2014 #7
    Let me check that I understand you: in the expression p(x|m,n), m and n need not be random variables. They can be parameters. Whether each one is a random variable depends on how the experiment is set up, right?
    In your case, k could be a random variable: if the experiment is set up that way, p(x,k) would be the probability of picking the k-th die at random and then rolling x with it.

    I am not sure how this applies to my case.
    In my case, the notation p(t|x,X,T) means
    the probability of finding t, given the training set X,T and the position x. t is obviously a random variable, but x, X, T could also be parameters; the book does not say explicitly whether they are random variables or parameters. The experiment could be predicting t at position x for a fixed training set X,T. Or it could be predicting t while x, X, T are themselves picked at random, and then considering p(t|x,X,T). I don't know which experiment the author has in mind.
  9. Aug 7, 2014 #8

    Stephen Tashi

    Science Advisor

    The fact that a p(....) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations is correct by writing one equation. In your case, I'd guess the author has only one specific interpretation in mind.

    One way to make sense of:

    [itex] p(t|x,X,T) = \int p(t|x,w)\, p(w|X,T)\, dw [/itex]

    is to consider [itex] X,T [/itex] to be ordinary variables, not random variables. So within the equation [itex] X,T [/itex] can be treated as if they have some constant value.

    The random variable [itex] t [/itex] is a function only of the random variables [itex] x [/itex] and [itex] w [/itex]
    (i.e. [itex] t = w_0 + w_1 x + \dots + w_M x^M [/itex]). So the notation [itex] p(t|x,w) [/itex] means the same thing as [itex] p(t|x,w,X,T) [/itex], because [itex] t [/itex] has no random variation due to [itex] X, T [/itex].

    But by that interpretation, the author could have written [itex] p(w | X,T) [/itex] as [itex] p(w) [/itex]. I suppose he needed to mention [itex] X, T [/itex] somewhere on the right-hand side.

    Leaving [itex] X,T [/itex] unmentioned, it isn't controversial that

    [itex] p(t|x) = \int p(t|x,w) p(w) dw [/itex]

    or, mentioning them everywhere, that

    [itex] p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw [/itex]
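    The marginalization [itex] p(t|x) = \int p(t|x,w) p(w) dw [/itex] can be checked numerically. The sketch below assumes, beyond what is stated here, a straight-line model y(x,w) = w0 + w1 x, a Gaussian prior w ~ N(0, α^{-1} I), and Gaussian noise p(t|x,w) = N(t | y(x,w), β^{-1}) as in Bishop's curve-fitting setup. Under those assumptions the integral has the closed form N(t | 0, β^{-1} + α^{-1}(1 + x^2)), which the Monte Carlo average over prior samples of w should approach. The helper names and the values of α, β, x, t are arbitrary choices.

```python
import math
import random

def normal_pdf(t, mean, var):
    """Density of N(mean, var) evaluated at t."""
    return math.exp(-(t - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predictive_mc(t, x, alpha, beta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of p(t|x) = ∫ p(t|x,w) p(w) dw for y(x,w) = w0 + w1*x."""
    rng = random.Random(seed)
    sd_w = 1.0 / math.sqrt(alpha)  # prior: w0, w1 ~ N(0, 1/alpha) independently
    total = 0.0
    for _ in range(n_samples):
        w0, w1 = rng.gauss(0.0, sd_w), rng.gauss(0.0, sd_w)
        total += normal_pdf(t, w0 + w1 * x, 1.0 / beta)  # p(t|x,w), noise variance 1/beta
    return total / n_samples

def predictive_exact(t, x, alpha, beta):
    """Closed form under the same assumptions: N(t | 0, 1/beta + (1 + x^2)/alpha)."""
    return normal_pdf(t, 0.0, 1.0 / beta + (1.0 + x * x) / alpha)

alpha, beta, x, t = 2.0, 4.0, 0.5, 0.3
print(predictive_mc(t, x, alpha, beta))
print(predictive_exact(t, x, alpha, beta))
```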
  10. Aug 8, 2014 #9
    Thanks so much. I will try to proceed in this direction and see whether anything weird occurs again.
  11. Aug 8, 2014 #10
    I have another question.
    Suppose the above equations have to be considered together with the following equation:
    p(w|X,T,α,β) ∝ p(T|X,w,β)p(w|α)    (a)
    where α and β are fixed.

    The left-hand side, p(w|X,T), is the posterior probability, and p(w) on the right-hand side is the prior probability.
    So X and T are random variables, right?
    The book says that p(w|X,T) in the integral is given by (a).
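    Relation (a), posterior ∝ likelihood × prior, can be illustrated in a one-parameter case. The sketch assumes (none of this is from the thread) a model t = w·x plus Gaussian noise with precision β, a prior w ~ N(0, α^{-1}), and a small invented data set. It evaluates likelihood × prior on a grid of w values, normalizes it, and compares the posterior mean with the standard closed form β Σ x_n t_n / (α + β Σ x_n^2). Here X and T enter only as fixed observed numbers.

```python
import math

def unnormalized_posterior(w, xs, ts, alpha, beta):
    """p(T|X,w,beta) * p(w|alpha) for the model t = w*x + noise (constants dropped)."""
    log_lik = sum(-0.5 * beta * (t - w * x) ** 2 for x, t in zip(xs, ts))
    log_prior = -0.5 * alpha * w * w
    return math.exp(log_lik + log_prior)

def grid_posterior_mean(xs, ts, alpha, beta, lo=-10.0, hi=10.0, n=20_001):
    """Normalize likelihood * prior on a uniform grid and return the posterior mean of w."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    dens = [unnormalized_posterior(w, xs, ts, alpha, beta) for w in grid]
    z = sum(dens)  # plays the role of the normalizing constant hidden by "∝" in (a)
    return sum(w * d for w, d in zip(grid, dens)) / z

xs = [0.0, 0.5, 1.0, 1.5]  # invented training inputs X
ts = [0.1, 0.9, 2.1, 2.9]  # invented training targets T
alpha, beta = 1.0, 5.0

posterior_mean = grid_posterior_mean(xs, ts, alpha, beta)
closed_form = beta * sum(x * t for x, t in zip(xs, ts)) / (alpha + beta * sum(x * x for x in xs))
print(posterior_mean, closed_form)  # the two should agree closely
```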
  12. Aug 8, 2014 #11

    Stephen Tashi

    Science Advisor

    It isn't possible to interpret equations without some context. Establishing the context requires a verbal explanation.
    A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

    An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
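    To illustrate that distinction, here is a small sketch using a Gaussian density (an arbitrary choice of distribution and standard deviation). With a small standard deviation the density at the mean exceeds 1, while the probability of a narrow interval is the integral of the density, and the probability of any single point is 0.

```python
import math

def normal_pdf(t, mean, sd):
    """Gaussian density N(t | mean, sd^2)."""
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_interval_prob(a, b, mean, sd):
    """P(a <= T <= b) for T ~ N(mean, sd^2), computed from the CDF via math.erf."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

sd = 0.1
print(normal_pdf(0.0, 0.0, sd))                    # about 3.99: a density value, not a probability
print(normal_interval_prob(-0.01, 0.01, 0.0, sd))  # about 0.08: a genuine probability
print(normal_interval_prob(0.0, 0.0, 0.0, sd))     # 0.0: a single point has probability zero
```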