Multiplication of conditional probability with several variables


Discussion Overview

The discussion revolves around the multiplication of conditional probabilities involving multiple variables, particularly in the context of machine learning and statistical modeling. Participants explore the implications of various probability expressions and their interpretations, referencing a specific equation from Christopher M. Bishop's work on pattern recognition and machine learning.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the validity of the equation P(X|Y)P(Y|Z) equating to P(X|Z), suggesting that P(X|Y) does not equal P(X|Y,Z) under the assumption of non-independence.
  • Another participant seeks clarification on whether the equation P(X|Z) = P(X|Y,Z)P(Y|Z) is a special condition or a general rule.
  • There is a discussion about the interpretation of the notation p(t|x, X, T) and whether X, T are random variables or parameters, with one participant noting the ambiguity in the author's notation.
  • Some participants express uncertainty about the context in which the variables are defined, particularly regarding their status as random variables or fixed parameters.
  • One participant suggests that the notation p(...) could imply different meanings depending on the context, which could affect the correctness of the equations presented.
  • Another participant emphasizes the need for a verbal explanation to establish context for interpreting the equations correctly.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the interpretations of the probability expressions and the roles of the variables involved. Multiple competing views remain regarding the nature of the variables and the validity of the equations discussed.

Contextual Notes

Participants highlight the need for clarity in the definitions of random variables versus parameters, as well as the implications of the notation used in the equations. There is an acknowledgment that the context of the problem is crucial for accurate interpretation.

Ronald_Ku
Dear All,

I am a beginner in machine learning and I am currently confused about the following problem:

What is the result of P(X|Y)P(Y|Z)?
In my book, it is written as P(X|Z). But I don't think that is correct, since
P(X|Z) = P(X|Y,Z)P(Y|Z)
and clearly P(X|Y) ≠ P(X|Y,Z).

Assume that none of the events are independent.

I have simplified the problem in the equation above. The true equation, from Pattern Recognition and Machine Learning by Christopher M. Bishop, is
p(w|x,t,α,β) ∝ p(t|x,w,β)p(w|α)

Any help and ideas will be much appreciated.
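The objection can be checked numerically. The quantity that matters for the integral discussed later in the thread is the version summed over Y, so this minimal Python sketch compares Σ_y P(x|y)P(y|z) with P(x|z). It assumes a made-up joint distribution over three binary variables X, Y, Z (the weights are hypothetical, chosen so that X still depends on Z after conditioning on Y). For x=0, z=0 the two sides come out as 1/2 and 3/5, so the identity cannot hold in general.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution P(x, y, z) over three binary variables.
# The integer weights are made up; they are chosen so that X still depends
# on Z after conditioning on Y (no conditional independence).
weights = {
    (0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
    (1, 0, 0): 1, (1, 0, 1): 4, (1, 1, 0): 3, (1, 1, 1): 2,
}
total = sum(weights.values())
joint = {k: Fraction(w, total) for k, w in weights.items()}

def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(x=1, z=0)."""
    return sum(p for (x, y, z), p in joint.items()
               if all({"x": x, "y": y, "z": z}[name] == val
                      for name, val in fixed.items()))

def cond(target, given):
    """P(target | given), where target and given are dicts of assignments."""
    return prob(**target, **given) / prob(**given)

for x, z in product((0, 1), repeat=2):
    lhs = sum(cond({"x": x}, {"y": y}) * cond({"y": y}, {"z": z}) for y in (0, 1))
    rhs = cond({"x": x}, {"z": z})
    print(f"x={x} z={z}: sum_y P(x|y)P(y|z) = {lhs}, P(x|z) = {rhs}")
```

Exact fractions are used so that any mismatch is a real difference and not floating-point noise.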
 
Ronald_Ku said:
since
P(X|Z)= P(X|Y,Z)P(Y|Z)

Are you saying the above is given as a special condition in the problem?

Or did you mean P(X \cap Y | Z) = P(X | Y \cap Z) P(Y | Z) ?
 
Stephen Tashi said:
Are you saying the above is given as a special condition in the problem?

Or did you mean P(X \cap Y | Z) = P(X | Y \cap Z) P(Y | Z) ?


Yes, you are correct.
What I mean is P(x,y|z) = P(x|y,z)P(y|z).
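The corrected identity P(x,y|z) = P(x|y,z)P(y|z) is the product rule and holds for every joint distribution. A minimal Python sketch, assuming a randomly generated (hypothetical) joint over three binary variables, checks it exactly for all eight assignments:

```python
import random
from fractions import Fraction
from itertools import product

# Hypothetical joint P(x, y, z) over three binary variables with random
# positive weights; the chain rule should hold for ANY such joint.
rng = random.Random(0)
weights = {k: rng.randint(1, 9) for k in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: Fraction(w, total) for k, w in weights.items()}

def marginal(pred):
    """Sum P(x, y, z) over all assignments (x, y, z) satisfying pred."""
    return sum(p for (x, y, z), p in joint.items() if pred(x, y, z))

def chain_rule_holds(x, y, z):
    """Check P(x, y | z) == P(x | y, z) * P(y | z) exactly."""
    p_xyz = marginal(lambda a, b, c: (a, b, c) == (x, y, z))
    p_yz = marginal(lambda a, b, c: (b, c) == (y, z))
    p_z = marginal(lambda a, b, c: c == z)
    return p_xyz / p_z == (p_xyz / p_yz) * (p_yz / p_z)

print(all(chain_rule_holds(x, y, z) for x, y, z in product((0, 1), repeat=3)))  # True
```

Note that the conditioning set of the first factor is (y, z), not y alone; dropping z is exactly the step that needs justification.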
 
Ronald_Ku said:
what is the result of P(X|Y)P(Y|Z)?
In my book, it is written to be P(X|Z)

I don't see why that would be correct. Perhaps you need to explain the entire context for it. I don't have a copy of Bishop's book.
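The step does become valid under one extra assumption: X is conditionally independent of Z given Y, so that P(X|Y,Z) = P(X|Y). This minimal sketch assumes a hypothetical Markov chain Z → Y → X built from made-up conditional tables, where that independence holds by construction. It confirms that P(x|y)P(y|z) = P(x,y|z) and that Σ_y P(x|y)P(y|z) = P(x|z):

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical Markov chain Z -> Y -> X: X depends on Z only through Y,
# so X is conditionally independent of Z given Y by construction.
p_z = {0: F(1, 3), 1: F(2, 3)}
p_y_given_z = {(0, 0): F(1, 4), (1, 0): F(3, 4),   # keys are (y, z)
               (0, 1): F(2, 3), (1, 1): F(1, 3)}
p_x_given_y = {(0, 0): F(1, 5), (1, 0): F(4, 5),   # keys are (x, y)
               (0, 1): F(1, 2), (1, 1): F(1, 2)}

# Joint P(x, y, z) = P(z) P(y|z) P(x|y)
joint = {(x, y, z): p_z[z] * p_y_given_z[(y, z)] * p_x_given_y[(x, y)]
         for x, y, z in product((0, 1), repeat=3)}

def marginal(pred):
    """Sum the joint over all (x, y, z) satisfying pred."""
    return sum(p for (x, y, z), p in joint.items() if pred(x, y, z))

def product_equals_joint_conditional(x, y, z):
    """Check P(x|y) P(y|z) == P(x, y | z), the latter computed from the joint."""
    p_xy_given_z = (marginal(lambda a, b, c: (a, b, c) == (x, y, z))
                    / marginal(lambda a, b, c: c == z))
    return p_x_given_y[(x, y)] * p_y_given_z[(y, z)] == p_xy_given_z

def summed_equals_conditional(x, z):
    """Check sum_y P(x|y) P(y|z) == P(x|z), the latter computed from the joint."""
    lhs = sum(p_x_given_y[(x, y)] * p_y_given_z[(y, z)] for y in (0, 1))
    rhs = (marginal(lambda a, b, c: (a, c) == (x, z))
           / marginal(lambda a, b, c: c == z))
    return lhs == rhs

print(all(product_equals_joint_conditional(*k) for k in product((0, 1), repeat=3)))  # True
print(all(summed_equals_conditional(x, z) for x, z in product((0, 1), repeat=2)))    # True
```

So whether the book's step is correct comes down to whether such a conditional independence is part of the model.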
 
It is in the introductory chapter of the book, in the discussion of polynomial curve fitting.
X,T refer to the training set, while t refers to the predicted target at position x.
w refers to the parameters of an M-th order polynomial, such that
y(x,w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M

It gives the following equation for predicting t with the help of the training set and the position x:
p(t|x, X, T) =\int p(t|x,w)p(w|X, T) dw

That means p(t|x,w)p(w|X,T) = p(t,w|x,X,T), which is then marginalized over w.
But I believe that p(t|x,w) ≠ p(t|x,w,X,T).

If this is not clear enough, I can explain more.
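The predictive equation says: average the model's prediction p(t|x,w) over the posterior p(w|X,T). A minimal Python sketch of that marginalization, assuming a hypothetical one-parameter model y(x,w) = w·x instead of the full polynomial, a Gaussian prior with precision alpha, Gaussian noise with precision beta, and made-up training data (alpha and beta held fixed, so they are left out of the notation). It estimates the integral by sampling w from the posterior and compares the result with the closed-form Gaussian that this conjugate toy model happens to have:

```python
import math
import random

# Hypothetical one-parameter model y(x, w) = w * x (a simplification of the
# book's polynomial). alpha (prior precision), beta (noise precision) and
# the training data X, T are made-up illustration values, held fixed.
alpha, beta = 2.0, 25.0
X = [0.1, 0.4, 0.7, 1.0]
T = [0.25, 0.85, 1.30, 2.05]

def gaussian_pdf(t, mean, var):
    return math.exp(-(t - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# For this conjugate model the posterior p(w | X, T) is Gaussian N(m, s2).
s2 = 1.0 / (alpha + beta * sum(xn * xn for xn in X))
m = beta * s2 * sum(xn * tn for xn, tn in zip(X, T))

def p_t_given_x_w(t, x, w):
    """p(t | x, w): Gaussian noise (precision beta) around w * x."""
    return gaussian_pdf(t, w * x, 1.0 / beta)

def predictive_mc(t, x, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the integral of p(t|x,w) p(w|X,T) dw:
    draw w from the posterior and average p(t|x,w)."""
    rng = random.Random(seed)
    sd = math.sqrt(s2)
    return sum(p_t_given_x_w(t, x, rng.gauss(m, sd)) for _ in range(n_samples)) / n_samples

def predictive_exact(t, x):
    """Closed form for this toy model: N(t | m*x, 1/beta + x^2 * s2)."""
    return gaussian_pdf(t, m * x, 1.0 / beta + x * x * s2)

print(predictive_mc(1.0, 0.5), predictive_exact(1.0, 0.5))
```

Inside the loop, p(t|x,w) never looks at X or T: the training data enter only through the distribution that w is sampled from. That is the modelling assumption the question is about.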
 
Ronald_Ku said:
p(t|x, X, T) =\int p(t|x,w)p(w|X, T) dw

To make sense of an expression denoting a probability, we must understand what the "probability space" is. Can you describe the space associated with the notation p(t,x,X,T) ? Is it possible that some of those variables are not random variables, but ordinary variables instead? For example, if I have 3 loaded dice then I might use the notation
p( X,k)
to mean "the probability of getting a result of X when I roll the k-th die". That interpretation doesn't imply that k is a random variable. It doesn't imply that there is an experiment where I pick a die at random.
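The dice example can be made concrete. This minimal sketch assumes three hypothetical loaded dice with made-up face probabilities. In the first reading, k is just an index and p(x; k) is a lookup. In the second reading, a die is picked at random with made-up probabilities p(k), so k becomes a random variable and a joint p(x,k) and a marginal p(x) exist. The conditional p(x|k) recovered from that joint matches the lookup, but the marginal only makes sense in the second experiment:

```python
from fractions import Fraction as F

# Hypothetical loaded dice: p_face[k][x] is the probability of face x
# when die k is rolled. Here k is an ordinary index (a parameter).
p_face = {
    1: {x: F(1, 6) for x in range(1, 7)},                  # fair die
    2: {**{x: F(1, 10) for x in range(1, 6)}, 6: F(1, 2)},   # loaded toward 6
    3: {**{x: F(3, 20) for x in range(2, 7)}, 1: F(1, 4)},   # loaded toward 1
}

def p_given_die(x, k):
    """p(x; k): die k is chosen by the experimenter, not at random."""
    return p_face[k][x]

# A different experiment: pick die k at random with (hypothetical)
# probabilities p_k, then roll it. Now k IS a random variable.
p_k = {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)}

def joint(x, k):
    """p(x, k) = p(k) p(x | k) in the random-die experiment."""
    return p_k[k] * p_face[k][x]

def conditional_from_joint(x, k):
    """p(x | k) = p(x, k) / p(k), computed from the joint."""
    return joint(x, k) / sum(joint(y, k) for y in range(1, 7))

def marginal_face(x):
    """p(x): only defined once k has a distribution."""
    return sum(joint(x, k) for k in p_k)

print(marginal_face(6))  # 59/240
```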
 
Let me check that I understand you: in the expression p(x|m,n), m and n need not be random variables. They can be parameters. Whether something is a random variable depends on how the experiment is set up, right?
In your case, k can be a random variable, and p(x,k) would mean picking the k-th die at random and getting x when rolling it, if the experiment is set up that way.

I am not sure how this applies to my case.
In my case, the notation p(t|x, X, T) means the probability of finding t, given the training set X,T and the position x. t is obviously a random variable, but x, X, T could also be parameters. It is not stated explicitly whether they are random variables or parameters. The experiment could be predicting t at position x given a fixed set X,T, or it could be predicting t while picking x, X, T at random and then considering P(t|x,X,T). I don't know which experiment the author has in mind.
 
The fact that a p(...) notation can be interpreted in various ways doesn't mean that an equation using it will be correct for each possible interpretation. I suppose an author might use ambiguous notation to assert that a whole family of equations are correct by writing one equation. In your case, I'll guess the author only has one specific interpretation in mind.

One way to make sense of:

p(t|x, X, T) = \int p(t|x,w) p(w|X,T) dw

is to consider X,T to be ordinary variables, not random variables. So within the equation X,T can be treated as if they have some constant value.

The random variable t is a function only of the random variables x and w
(i.e. t = w_0 + w_1 x + ... + w_M x^M). So the notation p(t|x,w) means the same thing as p(t|x,w,X,T) because t has no random variation due to X, T.

But by that interpretation, the author could have written p(w | X,T) as p(w). I suppose he needed to mention X, T somewhere on the right-hand side.

Leaving X,T unmentioned, it isn't controversial that

p(t|x) = \int p(t|x,w) p(w) dw

or, mentioning them everywhere, that

p(t|x,X,T) = \int p(t|x,w,X,T) p(w| X,T) dw
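The gap between the uncontroversial form and the book's form can be written out step by step. The sketch below uses only the sum and product rules plus two conditional-independence assumptions, stated explicitly here as assumptions (they are the usual reading of a regression model, not a quote from the book): (i) once w is known, the training data X,T carry no further information about the new target t, so p(t|x,w,X,T) = p(t|x,w); (ii) the new input x on its own says nothing about w, so p(w|x,X,T) = p(w|X,T).

```latex
\begin{align*}
p(t \mid x, X, T)
  &= \int p(t, w \mid x, X, T)\, dw
     && \text{(sum rule: marginalize over } w\text{)} \\
  &= \int p(t \mid x, w, X, T)\, p(w \mid x, X, T)\, dw
     && \text{(product rule)} \\
  &= \int p(t \mid x, w)\, p(w \mid X, T)\, dw
     && \text{(assumptions (i) and (ii))}
\end{align*}
```

Without (i) the first factor cannot be shortened, which is exactly the original objection. The "mentioning them everywhere" form above already uses (ii) in its second factor.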
 
Thanks so much. I will try to proceed in this direction and see whether anything weird occurs again.
 
I have another question.
Suppose the above equations have to be considered together with the following equation:
p(w|X, T, α, β) ∝ p(T|X, w, β) p(w|α)    (a)
where α and β are fixed.

The left-hand side, p(w|X,T), is the posterior probability, and the factor p(w|α) on the right-hand side is the prior probability.
So X,T are random variables, right?
The book says that p(w|X,T) in the integral is given by (a).
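Equation (a) only fixes the posterior up to a normalizing constant. This minimal sketch shows that numerically. It assumes the same kind of hypothetical one-parameter model y(x,w) = w·x, with fixed made-up values for α, β and the data. It multiplies likelihood and prior on a grid of w values, normalizes, and checks the resulting posterior mean against the closed form available for this conjugate Gaussian case:

```python
import math

# Hypothetical one-parameter model y(x, w) = w * x with fixed alpha, beta;
# the data values are made-up illustration values.
alpha, beta = 2.0, 25.0
X = [0.1, 0.4, 0.7, 1.0]
T = [0.25, 0.85, 1.30, 2.05]

def log_likelihood(w):
    """log p(T | X, w, beta): independent Gaussian noise on each target."""
    return sum(0.5 * math.log(beta / (2 * math.pi)) - 0.5 * beta * (tn - w * xn) ** 2
               for xn, tn in zip(X, T))

def log_prior(w):
    """log p(w | alpha): zero-mean Gaussian prior with precision alpha."""
    return 0.5 * math.log(alpha / (2 * math.pi)) - 0.5 * alpha * w * w

# Equation (a): posterior proportional to likelihood x prior.
# Normalize numerically with a Riemann sum on a grid of w in [-1, 4].
dw = 0.001
grid = [-1.0 + i * dw for i in range(5001)]
unnormalized = [math.exp(log_likelihood(w) + log_prior(w)) for w in grid]
evidence = sum(unnormalized) * dw
posterior = [u / evidence for u in unnormalized]
post_mean = sum(w * p for w, p in zip(grid, posterior)) * dw

# Closed-form posterior mean for this conjugate Gaussian toy model.
s2 = 1.0 / (alpha + beta * sum(xn * xn for xn in X))
m = beta * s2 * sum(xn * tn for xn, tn in zip(X, T))

print(post_mean, m)
```

Here X and T are treated as observed, fixed numbers that the posterior is conditioned on. The code does not need to decide whether they were generated at random.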
 
Ronald_Ku said:
So X,T are random variables. Right?
In the book, it mentions that p(w|X,T) in the integral will be given by (a)

It isn't possible to interpret equations without some context. Establishing the context requires a verbal explanation.
A person who is familiar with the type of problem that Bishop is solving might understand his notation, but I haven't read a statement of what these equations are supposed to accomplish.

An elementary question that needs a verbal explanation is whether the p(...) notation is supposed to indicate the probability of an event or whether it is supposed to denote a probability density function evaluated somewhere. (The value of a density function evaluated at a point isn't equal to "the probability of" that point.)
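The density-versus-probability distinction is easy to see with a narrow Gaussian. This minimal sketch uses Python's standard `statistics.NormalDist` with an arbitrary illustrative standard deviation of 0.1. The density at the mean is about 3.99, which is larger than 1, so it cannot be a probability. The probability of a small interval is below 1, and the probability of the single point 0 is zero:

```python
from statistics import NormalDist

# A narrow Gaussian (standard deviation 0.1 is an arbitrary illustrative choice).
dist = NormalDist(mu=0.0, sigma=0.1)

density_at_0 = dist.pdf(0.0)                      # a density value; here > 1
prob_interval = dist.cdf(0.05) - dist.cdf(-0.05)  # P(-0.05 < X < 0.05) <= 1
prob_point = dist.cdf(0.0) - dist.cdf(0.0)        # P(X = 0) = 0 for a continuous X

print(density_at_0, prob_interval, prob_point)
```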
 
