# Conditional & unconditional MSE (in MMSE estimation)

by kasraa
Tags: kalman filter, mmse, mse
P: 16
Hi,

1. Please explain conditional & unconditional mean square error, and their difference.
2. Which one is the solution for minimum MSE estimation? (that is conditional expectation: $$E \left[ X|Y \right]$$. I meant which one is minimized by selecting the conditional expectation.)
3. What is the relation between these two and covariance matrix in Kalman Filter? IMO, the trace of Kalman's covariance (error covariance matrix) is one of these MSEs, but I don't know which one.
4. Is there any other interpretation of Kalman's covariance matrix than the one I mentioned above? (of course there is. I meant I don't know any other and please help me :))

Thanks a lot.
P: 2,504
I usually don't refer questions to Wikipedia, but it has a fairly comprehensive discussion of the Kalman filter and the associated Bayesian analysis. I suggest you read it and then come back if you have unanswered questions.

You can minimize the MSE by minimizing the trace of the posterior error estimate covariance matrix. The trace is minimized when its matrix derivative with respect to the Kalman gain is zero.
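A small numerical illustration of this statement in the scalar case (a sketch, not taken from the Wikipedia article). It assumes a measurement model z = x + v, an update of the form x_hat + k*(z - x_hat), and made-up values for the prior variance `p_prior` and the noise variance `r`. With those assumptions the posterior error variance for a gain `k` is (1-k)^2 p_prior + k^2 r, and the gain that zeroes its derivative can be compared with a brute-force search:

```python
def posterior_variance(k, p_prior, r):
    """Error variance after the update x_hat + k*(z - x_hat), for z = x + v."""
    return (1.0 - k) ** 2 * p_prior + k ** 2 * r

p_prior, r = 4.0, 1.0  # illustrative values, not from the thread

# Setting d/dk of the variance to zero gives the scalar Kalman gain.
k_opt = p_prior / (p_prior + r)

# Brute-force check: scan gains on a grid and keep the one with the smallest variance.
grid = [i / 1000.0 for i in range(1001)]
k_best = min(grid, key=lambda k: posterior_variance(k, p_prior, r))

print(k_opt, k_best, posterior_variance(k_opt, p_prior, r))
```

In the scalar case the "trace" is just the variance itself, so the grid search and the zero-derivative condition should agree on the same gain.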
P: 16
 Quote by SW VandeCarr I usually don't refer questions to the Wikipedia, but it has a fairly comprehensive discussion of the Kalman filter and associated Bayesian analysis. I suggest you read it and then come back if you have unanswered questions. You can minimize the MSE by minimizing the trace of the posterior error estimate covariance matrix. The trace is minimized when the matrix derivative is zero

Thanks for your reply. Actually I've read it. My question is about MMSE estimation in general (and the Kalman filter only as one of its implementations for a particular case).

Let me explain more. As I asked in (1) and (2), I'm not sure what the conditional and unconditional MSE exactly are (and which one is minimized by the MMSE estimator), but I think they are something like:

$$E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} | Z \right]$$
and
$$E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} \right]$$

(where $$Z$$ is the observation (or sequence of observations as in Kalman) and $$\hat{x}=E \left[ x | Z \right]$$).
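For reference, a standard identity (the law of total expectation) links these two expressions, assuming the expectations exist:

$$E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} \right] = E_{Z} \left[ \, E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} | Z \right] \, \right]$$

So the first (conditional) expression is a function of the observed $$Z$$, and the second (unconditional) one is its average over all possible observations.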

Again, if we look at the Kalman filter as an implementation of the MMSE estimator, some references expand the conditional MSE to reach Kalman's covariances, while others use the unconditional MSE to do so.

(BTW, I won't be surprised if someone shows that they're equal for the linear/Gaussian case, and that both references are right.)
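One way to probe this guess numerically is a Monte Carlo sketch for the simplest linear/Gaussian case. Everything here is an illustrative assumption: a scalar zero-mean $$x$$ with variance `var_x`, an observation Z = x + v with noise variance `var_v`, and a crude estimate of the conditional MSE obtained by keeping only samples whose Z lands in a narrow bin.

```python
import random

random.seed(0)
var_x, var_v = 4.0, 1.0           # illustrative prior and noise variances
gain = var_x / (var_x + var_v)    # E[x|Z] = gain * Z for zero-mean jointly Gaussian x, Z
n = 200_000

samples = []
for _ in range(n):
    x = random.gauss(0.0, var_x ** 0.5)
    z = x + random.gauss(0.0, var_v ** 0.5)
    samples.append((z, (x - gain * z) ** 2))

# Unconditional MSE: average squared error over all (x, Z) pairs.
mse_uncond = sum(e for _, e in samples) / n

# Approximate conditional MSE: average only over samples whose Z falls in a narrow bin.
def mse_given_z(z0, half_width=0.1):
    errs = [e for z, e in samples if abs(z - z0) < half_width]
    return sum(errs) / len(errs)

theory = var_x * var_v / (var_x + var_v)
print(theory, mse_uncond, mse_given_z(-2.0), mse_given_z(0.0), mse_given_z(2.0))
```

Under these assumptions all the printed numbers should be close to `theory`, which is consistent with the conditional MSE not depending on the observed value of Z in the jointly Gaussian case.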

Thanks a lot.

P: 2,504
I think this article may help.

http://cnx.org/content/m11267/latest/

I take it that p(Z) is your unconditional probability density and p(Z|x) is your likelihood function. Then, taking the joint density p(x)p(Z|x), you can use Bayes' theorem to obtain the posterior density, which is the conditional p(x|Z)=p(Z|x)p(x)/p(Z).
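A minimal discrete sketch of this Bayes update. The two-state variable and all the probabilities are made up for illustration; they are not from the linked article.

```python
# Hypothetical two-state example: x in {0, 1}, one observation Z = 1.
prior = {0: 0.7, 1: 0.3}          # p(x), made-up numbers
likelihood = {0: 0.2, 1: 0.9}     # p(Z=1 | x), made-up numbers

# Evidence p(Z) is the joint p(Z|x) p(x) summed over x.
evidence = sum(likelihood[x] * prior[x] for x in prior)

# Bayes' theorem: p(x|Z) = p(Z|x) p(x) / p(Z)
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}

# The MMSE estimate is the posterior mean; its MSE given this Z is the posterior variance.
x_hat = sum(x * p for x, p in posterior.items())
cond_mse = sum((x - x_hat) ** 2 * p for x, p in posterior.items())
print(posterior, x_hat, cond_mse)
```

The last two lines connect the update to the thread's question: once p(x|Z) is available, its mean plays the role of the estimate and its variance plays the role of the conditional MSE for that particular observation.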

I'm not sure why you think the unconditional and conditional probability densities would be equal unless, of course, the prior density and the posterior density were equal. It appears that the MMSE estimate applies to the posterior density p(x|Z).

EDIT: The link is a bit slow, but works as of my testing at the edit time.
P: 16
Part one:

The posterior $$p \left( x|Z \right)$$ has a mean and a (co)variance. Its mean is the MMSE estimator, $$E \left[ x|Z \right]$$, and its variance (or the trace of its covariance matrix, if it's a random vector) is the minimum mean squared error. Am I right?

So the trace of conditional (co)variance ((co)variance of conditional pdf), that is the trace of
$$E \left[ \left( x - E \left[ x|Z \right] \right) \left( x - E \left[ x|Z \right] \right)^{T} | Z \right]$$
is the minimum MSE (and
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 | Z \right]$$
for the case of scalar RV).
Is it correct?

And then what is the trace of
$$E \left[ \left( x - E \left[ x|Z \right] \right) \left( x - E \left[ x|Z \right] \right)^{T}\right]$$
?
(or
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 \right]$$
for the case of a scalar RV).

Part Two:

As far as I know, MMSE estimation is about finding the $$h \left( \cdot \right)$$ that minimizes
$$E \left[ \left( x - h \left( Z \right) \right)^2 \right]$$ (MSE).
And the answer is $$h \left( Z \right) = E \left[ x | Z \right]$$.

So the MMSE is
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 \right]$$.
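A sketch of why the same choice of $$h$$ settles both versions in the scalar case, assuming finite second moments. For any candidate $$h$$, add and subtract $$E \left[ x|Z \right]$$ inside the square; the cross term vanishes after conditioning on $$Z$$, which leaves

$$E \left[ \left( x - h \left( Z \right) \right)^2 | Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right)^2 | Z \right] + \left( E \left[ x|Z \right] - h \left( Z \right) \right)^2$$

The second term is nonnegative and is zero exactly when $$h \left( Z \right) = E \left[ x|Z \right]$$, so the conditional expectation minimizes the conditional MSE for every value of $$Z$$. Taking the expectation over $$Z$$ of both sides shows that it also minimizes the unconditional MSE $$E \left[ \left( x - h \left( Z \right) \right)^2 \right]$$.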

Can you see the problem?

And a new one :D Maybe it's the answer.

The orthogonality principle implies $$E \left[ \left( x - E \left[ x|Z \right] \right)Z \right] = 0$$, which implies
$$E \left[ \left( x - E \left[ x|Z \right] \right)| Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right) \right]$$.

Does it also imply:
$$E \left[ \left( x - E \left[ x|Z \right] \right) ^2 | Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right)^2 \right]$$?
Is it correct?
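A tiny exact example can test this last equality outside the Gaussian case. The joint distribution below is hypothetical and chosen only because it is easy to check by hand: when Z = 0, x is 0 for sure, and when Z = 1, x is -1 or +1 with equal probability.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, Z): if Z = 0 then x = 0 for sure; if Z = 1 then x = -1 or +1.
joint = {(0, 0): F(1, 2), (-1, 1): F(1, 4), (1, 1): F(1, 4)}

def p_z(z):
    return sum(p for (x, zz), p in joint.items() if zz == z)

def cond_mean(z):
    return sum(x * p for (x, zz), p in joint.items() if zz == z) / p_z(z)

def cond_mse(z):
    m = cond_mean(z)
    return sum((x - m) ** 2 * p for (x, zz), p in joint.items() if zz == z) / p_z(z)

# Unconditional MSE of the estimator E[x|Z], computed directly from the joint pmf.
uncond_mse = sum((x - cond_mean(z)) ** 2 * p for (x, z), p in joint.items())

# Law of total expectation: averaging the conditional MSE over Z gives the same number.
averaged = sum(cond_mse(z) * p_z(z) for z in (0, 1))
print(cond_mse(0), cond_mse(1), uncond_mse, averaged)   # 0 1 1/2 1/2
```

In this example the conditional mean of the error is zero for both values of Z, yet the conditional MSE is 0 for one observation and 1 for the other, while the unconditional MSE is 1/2. So the equality does not hold in general; what always holds is that the unconditional MSE is the average of the conditional one over Z.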

Thanks.
P: 2,504
 Quote by kasraa Part one: The posterior $$p \left( x|Z \right)$$, has a mean and a (co)variance. Its mean is the MMSE estimator, $$E \left[ x|Z \right]$$, and its variance (or the trace of its covariance matrix, if it's a random vector) is the minimum mean squared error. Am I right? Thanks.
I don't think so. For a random vector of observations, the MMSE for the posterior estimate is the minimized trace of the covariance matrix. This is consistent with the discussion in the link I provided. As for the rest, I'm not following you. I don't understand why you're double conditioning on Z, for instance. Someone else will have to try to help you.
P: 16
I believe the covariance matrix of $$p \left( x | Z \right)$$, when they're jointly Gaussian, is:
$$R_{XX}-R_{XZ}R_{ZZ}^{-1}R_{ZX}$$
whose trace is the *minimum* MSE.
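A worked instance of this formula, as a sketch under made-up assumptions: a zero-mean two-component $$x$$ with independent components, and a scalar observation Z = x1 + v. Because Z is scalar here, $$R_{ZZ}^{-1}$$ is an ordinary division.

```python
from fractions import Fraction as F

# Hypothetical model: x = (x1, x2) with independent zero-mean components, Z = x1 + v.
var_x1, var_x2, var_v = F(2), F(3), F(1)

R_xx = [[var_x1, F(0)], [F(0), var_x2]]   # covariance of x
R_xz = [var_x1, F(0)]                     # cross-covariance of x with the scalar Z
R_zz = var_x1 + var_v                     # variance of Z

# R_xx - R_xz R_zz^{-1} R_zx, written out for a scalar Z.
P = [[R_xx[i][j] - R_xz[i] * R_xz[j] / R_zz for j in range(2)] for i in range(2)]

mmse = P[0][0] + P[1][1]   # trace = total minimum mean squared error
print(P, mmse)
```

Note that the result contains no value of Z at all, only second-order statistics. The observation reduces the variance of x1 from 2 to 2/3 and leaves x2 untouched, since x2 is uncorrelated with Z in this setup.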

I believe the minimization took place when you selected $$E \left[ x|Z \right]$$ as your estimator.

About double conditioning, that's the part I do not fully understand either. But you can find it in many references. For example: "Estimation with Applications to Tracking and Navigation" by Bar-Shalom.

See the bottom of page 204, for example. (There are plenty of these in this book and in others; I just picked one that is included in Google's preview.)

Thanks again.

Any other ideas?
P: 2,504
Not really. I was thinking of the discussion of the Kalman filter, where the trace is minimized with respect to the Kalman gain $$K_{k}$$ by setting: $$\frac{\partial \, \mathrm{tr}(P_{k|k})}{\partial K_{k}} = 0$$
P: 16
Sorry, but I can't understand your last post (I mean the wording, not the idea of minimizing the trace of the covariance matrix to find the Kalman gain).

What I understand is that Kalman and MMSE are related (in fact, I think Kalman is the MMSE estimator for the case of Gaussian variables (or Linear MMSE estimator without the assumption of Gaussian variables), for associated linear state (process) and observation equations (models)).
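A minimal scalar Kalman filter can make this relation concrete. It is only a sketch under assumed models, x_k = a x_{k-1} + w_k and z_k = x_k + v_k, with illustrative parameters `a`, `q`, `r`, `x0`, `p0`, and it is fed two unrelated, randomly generated measurement sequences.

```python
import random

def kalman_estimates(zs, a, q, r, x0, p0):
    """Scalar Kalman filter for x_k = a*x_{k-1} + w_k (var q), z_k = x_k + v_k (var r).
    Returns a list of (x_hat_{k|k}, P_{k|k}) pairs."""
    x_hat, p, out = x0, p0, []
    for z in zs:
        x_pred, p_pred = a * x_hat, a * a * p + q                  # prediction step
        k = p_pred / (p_pred + r)                                  # Kalman gain
        x_hat, p = x_pred + k * (z - x_pred), (1.0 - k) * p_pred   # update step
        out.append((x_hat, p))
    return out

random.seed(1)
a, q, r, x0, p0 = 0.9, 0.5, 1.0, 0.0, 2.0      # illustrative parameters
zs_a = [random.gauss(0.0, 3.0) for _ in range(5)]
zs_b = [random.gauss(0.0, 3.0) for _ in range(5)]

run_a = kalman_estimates(zs_a, a, q, r, x0, p0)
run_b = kalman_estimates(zs_b, a, q, r, x0, p0)

p_a = [p for _, p in run_a]
p_b = [p for _, p in run_b]
print(p_a == p_b)   # True
```

The state estimates of the two runs differ, but the variances do not, because the recursion for P never touches the measurements. In the linear/Gaussian setting this is one way to see why the filter's covariance can be read either as the error covariance given the observed data or as the error covariance averaged over all data.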

Did you see the book?
P: 2,504
Yes. There's a lot there to look at. Thanks.

If you go back to the wiki article and go down to "Kalman gain derivation" you'll see the equation I wrote. This is how the author suggests minimizing the trace of $$P_{k|k}$$ (posterior estimate covariance matrix).
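In outline, that derivation runs as follows. The symbols $$H_{k}$$ (observation matrix), $$R_{k}$$ (measurement noise covariance) and $$S_{k}$$ (innovation covariance) follow the usual Kalman-filter notation and are assumptions here, since the thread itself does not define them. Starting from the posterior covariance for an arbitrary gain,

$$P_{k|k} = P_{k|k-1} - K_{k} H_{k} P_{k|k-1} - P_{k|k-1} H_{k}^{T} K_{k}^{T} + K_{k} S_{k} K_{k}^{T}, \qquad S_{k} = H_{k} P_{k|k-1} H_{k}^{T} + R_{k}$$

the derivative of the trace is

$$\frac{\partial \, \mathrm{tr}(P_{k|k})}{\partial K_{k}} = -2 \left( H_{k} P_{k|k-1} \right)^{T} + 2 K_{k} S_{k} = 0$$

which gives the gain

$$K_{k} = P_{k|k-1} H_{k}^{T} S_{k}^{-1}$$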

http://en.wikipedia.org/wiki/Kalman_filter

And yes, the Kalman Filter is an MMSE estimator.
P: 16
So you're confused about conditional/unconditional MSE too (just like me), right? :D
P: 2,504
 Quote by kasraa So you're confused about conditional/unconditional MSE too (just like me), right? :D
I didn't think so, but maybe I am. Using your notation, p(Z) is an unconditional probability and p(Z|x) is the likelihood function. The joint probability is p(Z|x)p(x), and the conditional probability is p(x|Z), which we obtain from p(x|Z)=p(Z|x)p(x)/p(Z). What's wrong with this?

EDIT: If you're reading this in your mail, go to the forum. The post has been edited. The calculation in the Wiki link is specific to the Kalman Filter.
P: 16
In my notation, $$X$$ is the RV which we're trying to estimate, so the prior (the unconditional pdf, which in the case of the Kalman filter is our estimate at the previous step) is $$p(x)$$. Actually, nothing is wrong with it (using Bayes to reach the posterior).

I believe I explained my confusion clearly, especially in post #5. What do you think about my statement at the end of that post? Is it true?

Thanks a lot.

BTW, does anyone else have any ideas about our discussion?
P: 2,504
 Quote by kasraa Does it also imply: $$E \left[ \left( x - E \left[ x|Z \right] \right) ^2 | Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right)^2 \right]$$? Is it correct? Thanks.
As I said, I don't know what the double conditional on Z means. I can only guess that it might mean something like $$P_{k|k}$$, which indicates the successor state to $$P_{k|k-1}$$. If so, you need to introduce a system of subscripts.

Also, I don't see any problem with getting the MSE from any sample vector. It's the MMSE that can be a challenge.
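For readers unfamiliar with these subscripts: in the usual Kalman-filter convention (stated here as an assumption about the notation intended, with $$Z^{k}$$ denoting all measurements up to time $$k$$), the first index is the time of the state and the second is the time of the last measurement used:

$$\hat{x}_{k|k-1} = E \left[ x_{k} | Z^{k-1} \right], \qquad P_{k|k-1} = E \left[ \left( x_{k} - \hat{x}_{k|k-1} \right) \left( x_{k} - \hat{x}_{k|k-1} \right)^{T} | Z^{k-1} \right]$$

$$\hat{x}_{k|k} = E \left[ x_{k} | Z^{k} \right], \qquad P_{k|k} = E \left[ \left( x_{k} - \hat{x}_{k|k} \right) \left( x_{k} - \hat{x}_{k|k} \right)^{T} | Z^{k} \right]$$

Read this way, $$Z$$ appears twice in the expression under discussion because the estimate inside the brackets is itself a conditional mean, and the outer expectation is again taken given the same measurements.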
