Covariance of Posterior Predictive Distribution

In summary: SL found an error in a paper on Bayesian neural networks regarding the expression for the covariance of the posterior predictive distribution. SL provided their own calculation and asked for a seasoned Bayesian's opinion. The conversation concludes with the expert agreeing with SL's findings and recommending that SL contact the authors of the paper to inform them of the error.
  • #1
TL;DR Summary
Check my calculation
Greetings!

I believe I found an error in a paper on Bayesian neural networks. I think the expression for the covariance of the posterior predictive is wrong, and I wrote down my own calculation. It would be great if a seasoned Bayesian could take a look.

Imagine a regression scenario. We want to learn a function ##f_{\theta}: \mathbb{R}^m \rightarrow \mathbb{R}^n ## such that it fits the data set ##X=\{ ( x_i, y_i)| i=1,...,N \}## as well as possible, where ##x_i## is the model input and ##y_i## the corresponding target. The data points are i.i.d. The function is parametrized by ##\theta \in \mathbb{R}^d##. Given ##\theta##, we expect the target ##y## for a given input ##x## to be normally distributed around ##f_{\theta}(x)##. This gives the likelihood function
$$
p(y|x,\theta) = \mathcal{N}(f_{\theta}(x),\Sigma).
$$
Suppose now that we wrote down the posterior ##p(\theta| X)## and, by whatever means, we obtained samples from it,
$$
S=\{ \theta_i | \theta_i \sim p(\theta| X), i=1,...,M \}.
$$

The posterior predictive distribution (typically applied to an ##x## that was not seen before) is now given by
$$
p(y | x, X) = \int p(y|x, \theta) p(\theta | X) \text{d} \theta.
$$

The final prediction of our model can now be written as an average over the predictions given by the posterior samples, i.e.
$$
\begin{align}
\hat{y} & = \mathbb{E}_{y|x,X}(y) \\
& = \int y p(y | x, X) \text{d}y \\
& = \int y \int p(y|x, \theta) p(\theta | X) \text{d} \theta \text{d}y \\
& = \int \mathbb{E}_{y|x,\theta}(y) p(\theta | X) \text{d} \theta \\
& = \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(y) \Big) \\
& = \mathbb{E}_{\theta | X} \Big( f_{\theta}(x) \Big) \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S} f_{\theta_{i}}(x),
\end{align}
$$
where in the penultimate line we used the likelihood function from above and the last line is the typical Monte Carlo estimator for the true posterior mean.
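The Monte Carlo estimate of the predictive mean can be sketched in a few lines of NumPy. Everything here is a made-up toy example for illustration (the model ##f_\theta(x) = \theta x## and the "posterior" are stand-ins, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x):
    # Toy parametrized model R -> R standing in for f_theta(x)
    return theta * x

# Stand-in for samples from the posterior p(theta | X)
posterior_samples = rng.normal(loc=2.0, scale=0.1, size=1000)
x = 3.0

# Monte Carlo estimate of the predictive mean:
# y_hat ~ (1/|S|) * sum_i f_{theta_i}(x)
y_hat = np.mean([f(th, x) for th in posterior_samples])
```

With the posterior concentrated near ##\theta = 2##, the estimate lands close to ##f_{2}(3) = 6##.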

To quantify the uncertainty of our prediction ##\hat{y}##, one can use the covariance matrix ##\Sigma_{y|x,X}##. In the paper, the authors give the formula (without calculation)
$$
\Sigma_{y|x,X} \approx \frac{1}{|S|-1} \sum_{\theta_{i} \in S} (f_{\theta_{i}}(x) - \hat{y})(f_{\theta_{i}}(x) - \hat{y})^{T}.
$$
I think this is wrong. The covariance is
$$
\Sigma_{y|x,X}=\mathbb{E}_{y|x,X}\Big((y-\hat{y})(y-\hat{y})^{T}\Big).
$$
Repeating the computation of ##\mathbb{E}_{y|x,X}(y)## from above with ##(y-\hat{y})(y-\hat{y})^{T}## in place of ##y##, one arrives at
$$
\Sigma_{y|x,X} = \mathbb{E}_{\theta | X} \Bigg( \mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) \Bigg).
$$
It looks like they just set ##\mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) = (\mathbb{E}_{y|x,\theta}(y)-\hat{y})(\mathbb{E}_{y|x,\theta}(y)-\hat{y})^{T} ##, which would then reduce to their expression. But since the bracketed term is quadratic in ##y##, this step is not allowed: in general ##\big(E(V)\big)^2\neq E(V^2)##.
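A minimal numerical example of why that swap fails: take a scalar variable ##V## that is ##\pm 1## with equal probability, so ##E(V)=0## but ##E(V^2)=1##.

```python
import numpy as np

# V takes the values -1 and +1 with equal probability
V = np.array([-1.0, 1.0])

mean_sq = np.mean(V) ** 2   # (E[V])^2 = 0
sq_mean = np.mean(V ** 2)   # E[V^2]   = 1
```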

I tried to derive the correct expression on my own:
We use the other formula for the covariance,
$$
\begin{align}
\Sigma_{y|x,X} &= \mathbb{E}_{y|x,X} (yy^T) - \hat{y}\hat{y}^{T} \\
&= \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(yy^T) \Big) - \hat{y}\hat{y}^{T} \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S}\mathbb{E}_{y|x,\theta_{i}}(yy^T) - \hat{y}\hat{y}^{T} \\
&= \frac{1}{|S|} \sum_{\theta_{i} \in S}\Big( \Sigma + f_{\theta_{i}}(x)f_{\theta_{i}}(x)^{T}\Big) - \hat{y}\hat{y}^{T},
\end{align}
$$
where the last line comes from the definition of the likelihood ##p(y|x,\theta)## and the property that ##\Sigma = \mathbb{E}_{y|x,\theta}(yy^{T})-\mathbb{E}_{y|x,\theta}(y) \mathbb{E}_{y|x,\theta}(y^T)##.
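The two formulas can be compared numerically. Here is a one-dimensional NumPy sketch with a toy model and a made-up posterior (none of the numbers come from the paper); with a ##1/|S|## normalization, the corrected estimator works out to ##\Sigma## plus the sample covariance of the predictions, so the two estimates should differ by roughly the likelihood variance ##\Sigma##:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(theta, x):
    # Toy model standing in for f_theta(x)
    return theta * x

sigma2 = 0.25  # likelihood (co)variance Sigma, scalar in 1-D
x = 3.0
S = rng.normal(loc=2.0, scale=0.1, size=5000)  # toy posterior samples

preds = np.array([f(th, x) for th in S])
y_hat = preds.mean()

# Paper's formula: sample covariance of the predictions only
paper_cov = np.sum((preds - y_hat) ** 2) / (len(S) - 1)

# Corrected formula: (1/|S|) * sum_i (Sigma + f_i f_i^T) - y_hat y_hat^T
corrected = np.mean(sigma2 + preds ** 2) - y_hat ** 2

# The gap between the two is (up to O(1/|S|)) the likelihood variance
gap = corrected - paper_cov
```

In other words, the paper's expression captures only the spread of ##f_{\theta_i}(x)## over posterior samples and drops the ##\Sigma## contribution from the likelihood.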

Do you agree with all of this?

Cheers!
SL
  • #2
Dear SL,

Thank you for bringing this to our attention. I have reviewed your calculations and it seems that you are correct. The expression for the covariance of the posterior predictive in the paper does appear to be incorrect. Your derivation of the correct expression is also sound.

I would recommend reaching out to the authors of the paper to inform them of this error. It is important for scientific papers to have accurate and precise calculations, and your contribution will help improve the overall quality of the paper.

Thank you for your diligence and attention to detail. As scientists, it is our responsibility to ensure the accuracy of our work and to constantly strive for improvement.