Basic notation (conditional probability delim in linear equation)


Discussion Overview

The discussion revolves around the use of notation in conditional probability within the context of Bayesian prior distributions, specifically as presented in "Pattern Recognition and Machine Learning" by Bishop. Participants explore the implications of using different delimiters in mathematical functions and their interpretations in relation to curve fitting problems.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested

Main Points Raised

  • One participant questions the use of a conditional probability delimiter in a linear function, seeking clarification on its meaning in the context of Bayesian prior distributions.
  • Another participant suggests that delimiters other than commas, such as semicolons and vertical bars, are sometimes used to clarify the roles of variables and parameters in mathematical expressions.
  • A different participant expresses skepticism about the appropriateness of the delimiter in the original equation, proposing an alternative interpretation based on the evaluation of a function.
  • Some participants discuss the Bayesian expression for conditional probability and its derivation, noting the roles of likelihood and prior distributions, as well as the presence of hyperparameters.
  • There is a reiteration of the original equation and its components, indicating a focus on the relationship between the prior and likelihood in Bayesian inference.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of the notation and the appropriateness of the delimiter used in the conditional probability expression. There is no consensus on the correct interpretation or application of the notation.

Contextual Notes

Participants reference specific mathematical expressions and their components, indicating a reliance on definitions and context that may not be universally agreed upon. The discussion includes unresolved interpretations of notation and its implications in Bayesian statistics.

dspiegel
Hey all.

Looking at "Pattern Recognition and Machine Learning" (Bishop, 2006), pp. 28-31, the author appears to be using what would ordinarily be the delimiter for a conditional probability inside an ordinary function. See the first argument of NormPDF below. This is in the context of defining a Bayesian prior distribution over polynomial coefficients in a curve-fitting problem.

[tex]p(\textbf{w} | \alpha) = NormPDF(\textbf{w} | \textbf{0}, \alpha^{-1}\textbf{I}) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left(-\frac{\alpha}{2}\textbf{w}^T\textbf{w}\right)[/tex]

Can anybody shine some light on this for me please?

Many thanks.
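For reference, the closed-form expression above can be sanity-checked numerically: it is just the product of M+1 independent zero-mean univariate Gaussians with precision [itex]\alpha[/itex] (variance [itex]\alpha^{-1}[/itex]). A minimal sketch in Python/NumPy, with M and alpha chosen as arbitrary example values:

```python
import numpy as np

# Check that the closed-form prior density equals a product of
# independent zero-mean Gaussians with precision alpha.
# M (polynomial order) and alpha are arbitrary example values.
M, alpha = 3, 2.0
rng = np.random.default_rng(0)
w = rng.normal(size=M + 1)  # w has M+1 coefficients

# Closed form: (alpha/2pi)^((M+1)/2) * exp(-alpha/2 * w^T w)
closed = (alpha / (2 * np.pi)) ** ((M + 1) / 2) * np.exp(-0.5 * alpha * (w @ w))

# Product of univariate normal pdfs N(w_m | 0, alpha^{-1})
def norm_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

product = np.prod(norm_pdf(w, 0.0, 1.0 / alpha))
print(np.isclose(closed, product))  # the two expressions agree
```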
 
I don't know if this is precisely the case here, but sometimes delimiters other than the comma are used in functions. I have mostly seen semicolons (;) and vertical bars (|).
Often this is done to separate arguments by meaning. For example, an author may write
Consider a normal distribution with mean [itex]\mu[/itex] and standard deviation [itex]\sigma[/itex]. We define the probability of finding a value between a and b as [tex]P(a, b \mid \mu, \sigma)[/tex] ...
You can just as well write [tex]P(a, b, \mu, \sigma)[/tex]. However, writing a separate delimiter hopefully makes it clearer to the reader that a and b are really the variables here and that, though technically [itex]\mu[/itex] and [itex]\sigma[/itex] are variables as well, in this case they are more like parameters that have been fixed in advance (some arbitrary values for some normal distribution we are interested in).
 
CompuChip said:
I don't know if this is precisely the case here, but sometimes delimiters other than the comma are used in functions. I have mostly seen semicolons (;) and vertical bars (|).
Often this is done to separate arguments by meaning. For example, an author may write

You can just as well write [tex]P(a, b, \mu, \sigma)[/tex]. However, writing a separate delimiter hopefully makes it clearer to the reader that a and b are really the variables here and that, though technically [itex]\mu[/itex] and [itex]\sigma[/itex] are variables as well, in this case they are more like parameters that have been fixed in advance (some arbitrary values for some normal distribution we are interested in).

Thanks for your reply.

I agree that, in general, fixed (non-variable) parameters may be written after a semicolon, although I am quite sure that is not what is happening in this particular instance.

I believe it reads as "the value of [tex]t_n[/tex] evaluated for [tex]y(x_n, \textbf{w})[/tex]", as described at http://en.wikipedia.org/wiki/Vertical_bar#Mathematics.

Elsewhere, the likelihood of the parameters [tex]\{\textbf{w},\beta\}[/tex] is written for the i.i.d. data [tex]\{\textbf{x},\textbf{t}\}[/tex], where the function [tex]y(x, \textbf{w})[/tex] computes the predicted value of t.

[tex]p(\textbf{t}|\textbf{x},w,\beta) = \prod_{n=1}^N NormPDF(t_n|y(x_n, \textbf{w}),\beta^{-1})[/tex]



So it seems a reasonable interpretation in this context.
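Under that reading, each [tex]t_n[/tex] is scored under a normal density whose mean is the model prediction [tex]y(x_n, \textbf{w})[/tex] and whose variance is [tex]\beta^{-1}[/tex], and the likelihood is the product over n. A minimal numerical sketch of that reading (the polynomial coefficients, data, and beta below are made-up example values):

```python
import numpy as np

def y(x, w):
    # Polynomial model: sum_j w_j * x**j (np.polyval wants highest degree first).
    return np.polyval(w[::-1], x)

def norm_pdf(t, mean, var):
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

w = np.array([0.5, -1.0, 2.0])   # example coefficients
beta = 4.0                       # precision hyperparameter; variance is 1/beta
x = np.array([0.0, 0.5, 1.0])    # example inputs
t = np.array([0.4, 0.3, 1.6])    # example targets

# p(t | x, w, beta) = prod_n NormPDF(t_n | y(x_n, w), beta^{-1})
likelihood = np.prod(norm_pdf(t, y(x, w), 1.0 / beta))
print(likelihood)
```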
 
dspiegel said:
Hey all.

This is in the context of defining a Bayesian prior distribution over polynomial coefficients in a curve fitting problem.

[tex]p(\textbf{w} | \alpha) = NormPDF(\textbf{w} | \textbf{0}, \alpha^{-1}\textbf{I}) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp\left(-\frac{\alpha}{2}\textbf{w}^T\textbf{w}\right)[/tex]

Can anybody shine some light on this for me please?

Many thanks.

I don't know what this is. The Bayesian expression for the conditional probability p(w|a) is:

p(w|a)=p(a|w)p(w)/p(a).
 
SW VandeCarr said:
I don't know what this is. The Bayesian expression for the conditional probability p(w|a) is:

p(w|a)=p(a|w)p(w)/p(a).

Well, there's a bit more to it. The formula you quoted is just the prior.

The derivation is as follows.

[tex]p(w|x,t,\alpha,\beta) = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}[/tex]

[tex]p(w|x,t,\alpha,\beta) \propto p(t|x,w,\beta) * p(w|\alpha)[/tex]

[tex]\{\alpha,\beta\}[/tex] are hyperparameters.
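Putting the two pieces together, the unnormalized posterior over w is just likelihood times prior. A minimal sketch, evaluated in log space for numerical stability (all data and hyperparameter values below are illustrative):

```python
import numpy as np

def log_norm_pdf(t, mean, var):
    # Log of the univariate normal density.
    return -0.5 * (t - mean) ** 2 / var - 0.5 * np.log(2 * np.pi * var)

def log_posterior_unnorm(w, x, t, alpha, beta):
    pred = np.polyval(w[::-1], x)                          # y(x_n, w)
    log_lik = np.sum(log_norm_pdf(t, pred, 1.0 / beta))    # log p(t|x,w,beta)
    log_prior = np.sum(log_norm_pdf(w, 0.0, 1.0 / alpha))  # log p(w|alpha)
    return log_lik + log_prior                             # ∝ log p(w|x,t,alpha,beta)

x = np.array([0.0, 0.5, 1.0])
t = np.array([0.4, 0.3, 1.6])
val = log_posterior_unnorm(np.array([0.5, -1.0, 2.0]), x, t, alpha=2.0, beta=4.0)
print(val)
```

The marginal likelihood only normalizes the posterior, which is why the proportionality above drops it.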
 
dspiegel said:
Well, there's a bit more to it. The formula you quoted is just the prior.

The derivation is as follows.

[tex]p(w|x,t,\alpha,\beta) = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}[/tex]

[tex]p(w|x,t,\alpha,\beta) \propto p(t|x,w,\beta) * p(w|\alpha)[/tex]

[tex]\{\alpha,\beta\}[/tex] are hyperparameters.

OK. I was going by the original equation, where the left side was simply [tex]p(\textbf{w}|\alpha)[/tex].
 
