GingerCat said:
Thanks. I don't suppose you know of a good reference which explains this in simple terms? I would like to learn to do these calculations myself and understand why it works, but when I Googled the terms you use, the results were a little intimidating.
I'd recommend starting out by discretizing things so that you can see what goes on. The continuous approach follows pretty naturally once you get the discrete case.
Discrete sketch: I use matrix-vector notation here. You can do this in Excel or whatever if you prefer.
prior distribution: ##
\mathbf x \propto
\begin{bmatrix}
1\\
1\\
\vdots\\
1\\
\end{bmatrix}##
that is, a uniform distribution. (Is a uniform distribution reasonable?) Note I used ##\propto## not ##=## because it isn't a valid probability distribution: it doesn't sum to one. For a vector ##\mathbf x \in \mathbb R^n## (i.e. n items in your vector) you could multiply everything by ##\frac{1}{n}## if you wanted to make it a valid probability distribution. What's important is that your final distribution sums to one. Having your prior distribution be improper rather than proper isn't such a big deal.
Let's say we have 9 items in ##\mathbf x## (i.e. ##\mathbf x \in \mathbb R^9## for purposes of an illustrative example). The labels for ##\mathbf x## are that the true 'coin' has a 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% chance of coming up 'heads' on a given 'toss', and 1 minus that for 'tails'. That is, we can act as if there are 9 different possible states of the world, and we want to evaluate and update our probabilities of being in each one as we make observations.
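If you want to follow along in code, here is a minimal sketch of that setup in Python (using NumPy; the library and the variable names are my choices here, not part of the method):

```python
import numpy as np

# the 9 possible states of the world: P(heads) under each candidate coin
p_heads = np.linspace(0.1, 0.9, 9)

# uniform (unnormalized) prior: a 1 for every state
x = np.ones_like(p_heads)
```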
So the likelihood function for heads is given by the diagonal matrix ##
\mathbf D_1 =\begin{bmatrix}
0.1 & 0 & 0& 0& 0& 0 & 0 & 0& 0\\
0 & 0.2 & 0& 0& 0& 0 & 0 & 0& 0\\
0 & 0 & 0.3 & 0& 0& 0 & 0 & 0& 0\\
0 & 0 & 0& 0.4& 0& 0 & 0 & 0& 0\\
0 & 0 & 0& 0& 0.5& 0 & 0 & 0& 0\\
0 & 0 & 0& 0& 0& 0.6 & 0 & 0& 0\\
0 & 0 & 0& 0& 0& 0 & 0.7 & 0& 0\\
0 & 0 & 0& 0& 0& 0 & 0 & 0.8& 0\\
0 & 0 & 0& 0& 0& 0 & 0 & 0& 0.9\\
\end{bmatrix}##
and the likelihood of tails is given by the diagonal matrix
##
\mathbf D_0 =\begin{bmatrix}
0.9 & 0 & 0& 0& 0& 0 & 0 & 0& 0\\
0 & 0.8 & 0& 0& 0& 0 & 0 & 0& 0\\
0 & 0 & 0.7 & 0& 0& 0 & 0 & 0& 0\\
0 & 0 & 0& 0.6& 0& 0 & 0 & 0& 0\\
0 & 0 & 0& 0& 0.5& 0 & 0 & 0& 0\\
0 & 0 & 0& 0& 0& 0.4 & 0 & 0& 0\\
0 & 0 & 0& 0& 0& 0 & 0.3 & 0& 0\\
0 & 0 & 0& 0& 0& 0 & 0 & 0.2& 0\\
0 & 0 & 0& 0& 0& 0 & 0 & 0& 0.1\\
\end{bmatrix}##
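Continuing the sketch above, the two likelihood matrices are just `np.diag` applied to the state probabilities (again, the names are mine):

```python
# likelihood of observing 'heads' in each state, placed on the diagonal
D1 = np.diag(p_heads)

# likelihood of observing 'tails' in each state: 1 minus the heads probability
D0 = np.diag(1.0 - p_heads)
```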
So if you observe 3 heads and then one tail, your final distribution (the posterior) is given by:
##\mathbf {posterior} \propto \mathbf D_0 \cdot \mathbf D_1 \cdot \mathbf D_1 \cdot \mathbf D_1 \cdot \mathbf x = \big(\mathbf D_0\big)^1 \big(\mathbf D_1\big)^3 \mathbf x##
To move from ##\propto## to equals, you just need to add up the values in that final vector (call the sum ##\alpha##) and multiply your calculated vector by ##\frac{1}{\alpha}##; that will make sure your posterior sums to one. Hence ##\frac{1}{\alpha}## is your normalizing constant.
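In code, the 3-heads-then-1-tail update and the normalization look like this (a sketch building on the snippets above):

```python
# unnormalized posterior after observing H, H, H, T
unnormalized = D0 @ D1 @ D1 @ D1 @ x

# alpha is the sum of the entries; dividing by it makes the posterior sum to one
alpha = unnormalized.sum()
posterior = unnormalized / alpha

print(posterior)        # the normalized posterior over the 9 states
print(posterior.sum())  # 1.0
```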
In the problem mentioned by Dale you'd actually have
##\mathbf {posterior} \propto \mathbf D_0 \big(\mathbf D_1\big)^{19} \mathbf x##
Note that I used n terms, where n = 9 for the sake of the example. If you were actually going to do this discretely and use the results, you'd want a much higher n, like n = 1,000. Alternatively, once you understand the discrete case deeply, you can fairly easily generalize to the continuous case.
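For Dale's 19-heads-then-one-tail case on a finer grid, the same sketch scales directly; here n = 1,000 as suggested above (excluding the exact endpoints 0 and 1 is my choice, since a coin that always or never lands heads is ruled out by mixed observations):

```python
from numpy.linalg import matrix_power

# a finer grid of n interior points strictly between 0 and 1
n = 1000
p = np.linspace(0.0, 1.0, n + 2)[1:-1]
x = np.ones(n)
D1 = np.diag(p)        # likelihood of 'heads' in each state
D0 = np.diag(1.0 - p)  # likelihood of 'tails' in each state

# posterior proportional to D0 * D1^19 * x
post = D0 @ matrix_power(D1, 19) @ x
post /= post.sum()

# for example, the posterior mean of P(heads)
print((p * post).sum())
```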
- - - -
Alternatively, if you speak Python, Allen Downey does a great job walking you through discrete Bayes in his book "Think Bayes", which the author makes freely available here:
http://greenteapress.com/wp/think-bayes/