Covariance of Posterior Predictive Distribution

SchroedingersLion · Sep 22, 2022

Greetings!

I believe I found an error in a paper on Bayesian neural networks. I think the expression for the covariance of the posterior predictive distribution is wrong, and I have written down my own calculation. It would be great if a seasoned Bayesian could take a look.

Imagine a regression scenario. We want to learn a function ##f_{\theta}: \mathbb{R}^m \rightarrow \mathbb{R}^n ## that fits the data set ##X=\{ ( x_i, y_i)| i=1,...,N \}## as well as possible, where ##x_i## is the model input and ##y_i## the target. The data points are i.i.d. The function is parametrized by ##\theta \in \mathbb{R}^d##. Given ##\theta##, we expect the target ##y## for a given input ##x## to be normally distributed around ##f_{\theta}(x)##. This gives the likelihood function
$$
p(y|x,\theta) = \mathcal{N}(f_{\theta}(x),\Sigma).
$$
Suppose now that we wrote down the posterior ##p(\theta| X)## and, by whatever means, we obtained samples from it,
$$
S=\{ \theta_i | \theta_i \sim p(\theta| X), i=1,...,M \}.
$$

The posterior predictive distribution (typically applied to an ##x## that was not seen before) is now given by
$$
p(y | x, X) = \int p(y|x, \theta) p(\theta | X) \text{d} \theta.
$$
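The integral above can be read as a recipe for ancestral sampling: pick a posterior sample ##\theta_i## from ##S##, then draw ##y## from the likelihood ##\mathcal{N}(f_{\theta_i}(x),\Sigma)##. A minimal NumPy sketch, assuming a hypothetical linear model `f_theta` as a stand-in for the network (##\theta## is the flattened ##n \times m## weight matrix) and random stand-in values in place of real posterior samples:

```python
import numpy as np

def f_theta(theta, x):
    """Hypothetical stand-in model R^m -> R^n: theta is a flattened (n, m) weight matrix."""
    n = theta.size // x.size
    return theta.reshape(n, x.size) @ x

def sample_posterior_predictive(thetas, x, Sigma, num_draws, rng):
    """Ancestral sampling from p(y | x, X), with p(theta | X) represented by the samples in thetas."""
    preds = np.stack([f_theta(t, x) for t in thetas])   # f_{theta_i}(x) for every theta_i in S
    idx = rng.integers(len(thetas), size=num_draws)      # pick theta_i uniformly from S
    noise = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=num_draws)
    return preds[idx] + noise                            # y ~ N(f_{theta_i}(x), Sigma)

rng = np.random.default_rng(0)
m, n, M = 3, 2, 50
thetas = rng.normal(size=(M, n * m))   # stand-in for posterior samples, NOT a real posterior
x = np.array([1.0, -0.5, 2.0])
Sigma = np.diag([0.5, 0.2])            # assumed likelihood covariance
ys = sample_posterior_predictive(thetas, x, Sigma, 10_000, rng)
```

Each row of `ys` is one draw from the (sample-based approximation of the) posterior predictive; the mean and covariance discussed below are moments of exactly this distribution.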

The final prediction of our model can now be written as an average over the predictions given by the posterior samples, i.e.
$$
\begin{align}
\hat{y} & = \mathbb{E}_{y|x,X}(y) \\
& = \int y p(y | x, X) \text{d}y \\
& = \int y \int p(y|x, \theta) p(\theta | X) \text{d} \theta \text{d}y \\
& = \int \mathbb{E}_{y|x,\theta}(y) p(\theta | X) \text{d} \theta \\
& = \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(y) \Big) \\
& = \mathbb{E}_{\theta | X} \Big( f_{\theta}(x) \Big) \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S} f_{\theta_{i}}(x),
\end{align}
$$
where in the penultimate line we used the likelihood function from above and the last line is the typical Monte Carlo estimator for the true posterior mean.
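In code, the last line amounts to applying the model to every posterior sample and taking the sample mean, normalized by ##|S|##. A minimal sketch, again with a hypothetical linear stand-in `f_theta` and random stand-in samples; for this linear model the average of the predictions must coincide with the prediction of the averaged parameters, which gives a quick sanity check:

```python
import numpy as np

def f_theta(theta, x):
    """Hypothetical stand-in model R^m -> R^n: theta is a flattened (n, m) weight matrix."""
    n = theta.size // x.size
    return theta.reshape(n, x.size) @ x

def predictive_mean(thetas, x):
    """Monte Carlo estimate of y_hat = E_{theta|X}[f_theta(x)] from posterior samples."""
    preds = np.stack([f_theta(t, x) for t in thetas])   # row i: f_{theta_i}(x)
    return preds.sum(axis=0) / len(thetas)               # normalize by |S|

rng = np.random.default_rng(1)
thetas = rng.normal(size=(50, 6))   # stand-in for samples theta_i ~ p(theta | X)
x = np.array([1.0, -0.5, 2.0])
y_hat = predictive_mean(thetas, x)
```

For a nonlinear network the two generally differ, which is why the average is taken over the predictions rather than over the parameters.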

To quantify the uncertainty of our prediction ##\hat{y}##, one can use the covariance matrix ##\Sigma_{y|x,X}##. In the paper, the authors give the formula (without derivation)
$$
\Sigma_{y|x,X} \approx \frac{1}{|S|-1} \sum_{\theta_{i} \in S} (f_{\theta_{i}}(x) - \hat{y})(f_{\theta_{i}}(x) - \hat{y})^{T}.
$$
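For reference, the paper's expression is just the unbiased sample covariance of the per-sample predictions ##f_{\theta_i}(x)##, so it can be cross-checked against `np.cov`. A sketch, assuming the predictions are already stacked into an array `preds` of shape ##(|S|, n)## (random stand-in values here); note that the likelihood covariance ##\Sigma## does not enter it anywhere:

```python
import numpy as np

def paper_covariance(preds):
    """Paper's formula: 1/(|S|-1) * sum_i (f_i - y_hat)(f_i - y_hat)^T, with f_i = f_{theta_i}(x)."""
    preds = np.asarray(preds, dtype=float)
    y_hat = preds.mean(axis=0)              # predictive mean estimate
    centered = preds - y_hat                # rows: f_{theta_i}(x) - y_hat
    return centered.T @ centered / (preds.shape[0] - 1)

rng = np.random.default_rng(2)
preds = rng.normal(size=(50, 2))            # stand-in for f_{theta_i}(x), i = 1..50
cov_paper = paper_covariance(preds)
```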
I think this is wrong. The covariance is
$$
\Sigma_{y|x,X}=\mathbb{E}_{y|x,X}\Big((y-\hat{y})(y-\hat{y})^{T}\Big).
$$
Repeating the computation of ##\mathbb{E}_{y|x,X}(y)## from above with ##(y-\hat{y})(y-\hat{y})^{T}## in place of ##y##, one arrives at
$$
\Sigma_{y|x,X} = \mathbb{E}_{\theta | X} \Bigg( \mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) \Bigg).
$$
It looks as if they simply set ##\mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) = (\mathbb{E}_{y|x,\theta}(y)-\hat{y})(\mathbb{E}_{y|x,\theta}(y)-\hat{y})^{T} ##, which would reduce to their expression. But since the term in brackets is a product, this is not allowed: in general ##\big(E(V)\big)^2\neq E(V^2)##.
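Spelling out the step that seems to be skipped: add and subtract ##f_{\theta}(x)## inside the bracket. Given ##\theta##, the cross terms vanish because ##\mathbb{E}_{y|x,\theta}(y) = f_{\theta}(x)##, so
$$
\begin{align}
\mathbb{E}_{y|x,\theta}\Big((y-\hat{y})(y-\hat{y})^{T}\Big) &= \mathbb{E}_{y|x,\theta}\Big((y-f_{\theta}(x))(y-f_{\theta}(x))^{T}\Big) + (f_{\theta}(x)-\hat{y})(f_{\theta}(x)-\hat{y})^{T} \\
&= \Sigma + (f_{\theta}(x)-\hat{y})(f_{\theta}(x)-\hat{y})^{T}.
\end{align}
$$
Taking ##\mathbb{E}_{\theta | X}## of both sides gives ##\Sigma_{y|x,X} = \Sigma + \mathbb{E}_{\theta | X}\Big((f_{\theta}(x)-\hat{y})(f_{\theta}(x)-\hat{y})^{T}\Big)##, which is the law of total covariance. The paper's expression is a Monte Carlo estimate of the second term only; the likelihood covariance ##\Sigma## is missing.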

I tried to derive the correct expression myself, using the other formula for the covariance:
$$
\begin{align}
\Sigma_{y|x,X} &= \mathbb{E}_{y|x,X} (yy^T) - \hat{y}\hat{y}^{T} \\
&= \mathbb{E}_{\theta | X} \Big( \mathbb{E}_{y|x,\theta}(yy^T) \Big) - \hat{y}\hat{y}^{T} \\
& \approx \frac{1}{|S|} \sum_{\theta_{i} \in S}\mathbb{E}_{y|x,\theta_{i}}(yy^T) - \hat{y}\hat{y}^{T} \\
&= \frac{1}{|S|} \sum_{\theta_{i} \in S}\Big( \Sigma + f_{\theta_{i}}(x)f_{\theta_{i}}(x)^{T}\Big) - \hat{y}\hat{y}^{T},
\end{align}
$$
where the last line follows from the definition of the likelihood ##p(y|x,\theta)## and the identity ##\Sigma = \mathbb{E}_{y|x,\theta}(yy^{T})-\mathbb{E}_{y|x,\theta}(y) \mathbb{E}_{y|x,\theta}(y^T)##.
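With the ##1/|S|## normalization, the last line simplifies to ##\Sigma## plus the (biased) sample covariance of the predictions ##f_{\theta_i}(x)##. A minimal numerical check, assuming stand-in predictions `preds` and a diagonal ##\Sigma## chosen purely for illustration: it evaluates the expression above, compares it with ##\Sigma## plus `np.cov(..., bias=True)`, and compares both with the empirical covariance of draws from the posterior predictive obtained by ancestral sampling.

```python
import numpy as np

def predictive_covariance(preds, Sigma):
    """Expression above: (1/|S|) * sum_i (Sigma + f_i f_i^T) - y_hat y_hat^T."""
    preds = np.asarray(preds, dtype=float)
    y_hat = preds.mean(axis=0)
    second_moment = Sigma + preds.T @ preds / preds.shape[0]   # (1/|S|) sum_i (Sigma + f_i f_i^T)
    return second_moment - np.outer(y_hat, y_hat)

rng = np.random.default_rng(3)
preds = rng.normal(size=(20, 2))    # stand-in for f_{theta_i}(x), i = 1..20
Sigma = np.diag([0.5, 0.2])         # assumed likelihood covariance

cov_formula = predictive_covariance(preds, Sigma)
# Same quantity written as Sigma + (biased) covariance of the per-sample predictions.
cov_split = Sigma + np.cov(preds, rowvar=False, bias=True)

# Ancestral sampling: theta_i uniform over S, then y ~ N(f_{theta_i}(x), Sigma).
num_draws = 200_000
idx = rng.integers(len(preds), size=num_draws)
ys = preds[idx] + rng.multivariate_normal(np.zeros(2), Sigma, size=num_draws)
cov_empirical = np.cov(ys, rowvar=False)
```

The first comparison is exact up to floating point; the second holds only up to Monte Carlo error. Dropping ##\Sigma## from `cov_formula` would reproduce the paper's expression up to the factor ##|S|/(|S|-1)##.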

Do you agree with all of this?

Cheers!
SL

mmwave · Sep 22, 2022

Dear SL,

Thank you for bringing this to our attention. I have reviewed your calculations and it seems that you are correct. The expression for the covariance of the posterior predictive in the paper does appear to be incorrect. Your derivation of the correct expression is also sound.

I would recommend reaching out to the authors of the paper to inform them of this error. It is important for scientific papers to have accurate and precise calculations, and your contribution will help improve the overall quality of the paper.

Thank you for your diligence and attention to detail. As scientists, it is our responsibility to ensure the accuracy of our work and to constantly strive for improvement.
