# Conditional & unconditional MSE (in MMSE estimation)

1. Apr 16, 2010

### kasraa

Hi,

1- Please explain conditional & unconditional mean square error, and their difference.
2- Which one does the minimum MSE estimator minimize? (That estimator is the conditional expectation $$E \left[ X|Y \right]$$; I mean, which of the two MSEs is minimized by selecting the conditional expectation?)
3- What is the relation between these two and covariance matrix in Kalman Filter? IMO, the trace of Kalman's covariance (error covariance matrix) is one of these MSEs, but I don't know which one.
4- Is there any other interpretation of Kalman's covariance matrix than the one I mentioned above? (Of course there is. I meant that I don't know any other, so please help me.)

Thanks a lot.

2. Apr 19, 2010

### SW VandeCarr

I usually don't refer questions to Wikipedia, but it has a fairly comprehensive discussion of the Kalman filter and the associated Bayesian analysis. I suggest you read it and then come back if you have unanswered questions.

You can minimize the MSE by minimizing the trace of the posterior error estimate covariance matrix. The trace is minimized when the matrix derivative is zero.

Last edited: Apr 20, 2010
3. Apr 20, 2010

### kasraa

My question is about MMSE estimation in general (and Kalman filter, only as one of its implementations for some particular case).

Let me explain more. As I've asked in (1) and (2), I'm not sure what conditional/unconditional MSE exactly are (and which one is minimized by MMSE estimator), but I think they are something like:

$$E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} | Z \right]$$
and
$$E \left[ \left( x - \hat{x} \right) \left( x - \hat{x} \right)^{T} \right]$$

(where $$Z$$ is the observation (or sequence of observations as in Kalman) and $$\hat{x}=E \left[ x | Z \right]$$).

Again, if we look at Kalman as an implementation of MMSE estimator, in some references the conditional MSE is expanded to reach Kalman's covariances, and in some others, the unconditional MSE is used to do so.

(BTW, I won't be surprised if someone shows that they're equal for the Gaussian/linear case, and both references are right).

Thanks a lot.
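
A worked identity may help with the two expressions above. It is stated for the scalar case and for an arbitrary estimator $$h \left( Z \right)$$; the symbol $$h$$ and the notation $$\mathrm{Var} \left( x|Z \right)$$ for the posterior variance are introduced here only for illustration. By the law of total expectation, the unconditional MSE is the average of the conditional MSE over the observations:

$$E \left[ \left( x - h \left( Z \right) \right)^2 \right] = E \left[ E \left[ \left( x - h \left( Z \right) \right)^2 | Z \right] \right]$$

For each fixed $$Z$$, the conditional MSE splits into the posterior variance plus a squared offset:

$$E \left[ \left( x - h \left( Z \right) \right)^2 | Z \right] = \mathrm{Var} \left( x|Z \right) + \left( E \left[ x|Z \right] - h \left( Z \right) \right)^2$$

Choosing $$h \left( Z \right) = E \left[ x|Z \right]$$ removes the second term for every value of $$Z$$, so the conditional expectation minimizes the conditional MSE pointwise and therefore the unconditional MSE as well. The resulting unconditional minimum is $$E \left[ \mathrm{Var} \left( x|Z \right) \right]$$. The two MSEs coincide whenever $$\mathrm{Var} \left( x|Z \right)$$ does not depend on $$Z$$, which is the situation in the jointly Gaussian case.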

4. Apr 20, 2010

### SW VandeCarr

http://cnx.org/content/m11267/latest/

I take it that P(Z) is your unconditional probability density and p(Z|x) is your likelihood function. Then taking the joint density p(x)p(Z|x) you can use Bayes Theorem for the posterior density which is the conditional p(x|Z)=p(Z|x)p(x)/p(Z).

I'm not sure why you think the unconditional and conditional probability densities would be equal unless, of course, the prior density and the posterior density were equal. It appears that the MMSE estimate applies to the posterior density p(x|Z).

EDIT: The link is a bit slow, but works as of my testing at the edit time.

Last edited: Apr 20, 2010
5. Apr 20, 2010

### kasraa

Part one:

The posterior $$p \left( x|Z \right)$$, has a mean and a (co)variance. Its mean is the MMSE estimator, $$E \left[ x|Z \right]$$, and its variance (or the trace of its covariance matrix, if it's a random vector) is the minimum mean squared error. Am I right?

So the trace of conditional (co)variance ((co)variance of conditional pdf), that is the trace of
$$E \left[ \left( x - E \left[ x|Z \right] \right) \left( x - E \left[ x|Z \right] \right)^{T} | Z \right]$$
is the minimum MSE (and
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 | Z \right]$$
for the case of scalar RV).
Is it correct?

And then what is the trace of
$$E \left[ \left( x - E \left[ x|Z \right] \right) \left( x - E \left[ x|Z \right] \right)^{T}\right]$$
?
(or
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 \right]$$
for the case of scalar RV).

Part Two:

As far as I know, MMSE estimation is about finding the function $$h \left( . \right)$$ that minimizes
$$E \left[ \left( x - h \left( Z \right) \right)^2 \right]$$ (MSE).
And the answer is $$h \left( Z \right) = E \left[ x | Z \right]$$.

So the MMSE is
$$E \left[ \left(x-E \left[ x|Z \right] \right)^2 \right]$$.

Can you see the problem?

And a new one :D Maybe it's the answer.

Orthogonality principle implies $$E \left[ \left( x - E \left[ x|Z \right] \right)Z \right] = 0$$, which implies
$$E \left[ \left( x - E \left[ x|Z \right] \right)| Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right) \right]$$.

Does it also imply:
$$E \left[ \left( x - E \left[ x|Z \right] \right) ^2 | Z \right] = E \left[ \left( x - E \left[ x|Z \right] \right)^2 \right]$$?
Is it correct?

Thanks.
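
A small simulation can probe the last question in this post. It is only a sketch under assumptions that are not in the thread: $$Z$$ is 1 or 2 with equal probability and $$x = Z W$$ with $$W$$ standard normal and independent of $$Z$$ (the variable $$W$$ and the function name `simulate` are hypothetical). Here $$E \left[ x|Z \right] = 0$$ and the orthogonality condition holds, since $$E \left[ x Z \right] = E \left[ Z^2 W \right] = 0$$, yet the conditional MSE equals $$Z^2$$ (1 or 4) while the unconditional MSE is their average, 2.5.

```python
import random

def simulate(n=200_000, seed=0):
    """Return ({z: conditional MSE given Z=z}, unconditional MSE)."""
    rng = random.Random(seed)
    sums = {1: 0.0, 2: 0.0}
    counts = {1: 0, 2: 0}
    for _ in range(n):
        z = rng.choice((1, 2))          # observation, non-Gaussian on purpose
        x = z * rng.gauss(0.0, 1.0)     # x = Z * W, so Var(x | Z) = Z^2
        x_hat = 0.0                     # E[x | Z] = Z * E[W] = 0
        sums[z] += (x - x_hat) ** 2
        counts[z] += 1
    cond_mse = {z: sums[z] / counts[z] for z in sums}
    uncond_mse = (sums[1] + sums[2]) / n
    return cond_mse, uncond_mse

cond_mse, uncond_mse = simulate()
print(cond_mse)     # one value per observed Z
print(uncond_mse)   # average over both values of Z
```

In this toy model the conditional second moment of the error changes with $$Z$$, so the proposed equality does not hold in general, even though the first-moment statement does.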

Last edited: Apr 20, 2010
6. Apr 20, 2010

### SW VandeCarr

I don't think so. For a random vector of observations, the MMSE for the posterior estimate is the minimized trace of the covariance matrix. This is consistent with the discussion in the link I provided. As for the rest, I'm not following you. I don't understand why you're double conditioning on Z, for instance. Someone else will have to try and help you.

7. Apr 20, 2010

### kasraa

I believe the covariance matrix of $$p \left( x | Z \right)$$ when they're jointly Gaussian is:
$$R_{XX}-R_{XZ}R_{ZZ}^{-1}R_{ZX}$$
whose trace is the *minimum* MSE.

I believe the minimization took place when you selected $$E \left[ x|Z \right]$$ as your estimator.
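
The covariance formula above can be checked numerically in the scalar case. This is a minimal sketch under assumed values that are not from the thread: $$x$$ is zero-mean Gaussian with variance 4 and $$Z = x + v$$ with independent Gaussian noise $$v$$ of variance 1 (the names `gaussian_mmse_demo`, `var_x`, `var_v` are hypothetical). The squared error of $$E \left[ x|Z \right]$$ is averaged separately over small and large $$|Z|$$ to see whether the conditional MSE depends on the observation.

```python
import random

def gaussian_mmse_demo(var_x=4.0, var_v=1.0, n=200_000, seed=1):
    """Compare R_XX - R_XZ R_ZZ^{-1} R_ZX with empirical MSEs (scalar case)."""
    rng = random.Random(seed)
    r_xx = var_x
    r_xz = var_x              # Cov(x, Z) for Z = x + v, v independent of x
    r_zz = var_x + var_v
    gain = r_xz / r_zz        # E[x | Z] = gain * Z for zero-mean Gaussians
    theory = r_xx - r_xz * (1.0 / r_zz) * r_xz
    near, far = [], []        # squared errors for |Z| < 1 and |Z| >= 1
    for _ in range(n):
        x = rng.gauss(0.0, var_x ** 0.5)
        z = x + rng.gauss(0.0, var_v ** 0.5)
        err2 = (x - gain * z) ** 2
        (near if abs(z) < 1.0 else far).append(err2)
    mse_near = sum(near) / len(near)
    mse_far = sum(far) / len(far)
    mse_all = (sum(near) + sum(far)) / n
    return theory, mse_near, mse_far, mse_all

theory, mse_near, mse_far, mse_all = gaussian_mmse_demo()
print(theory, mse_near, mse_far, mse_all)
```

With these assumed variances the formula gives 4 - 16/5 = 0.8, and the empirical averages should be close to that value in both groups, which is consistent with the conditional and unconditional MSE coinciding for jointly Gaussian variables.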

About double conditioning, that's the part I do not fully understand either. But you can find it in many references. For example: "Estimation with Applications to Tracking and Navigation" by Bar-Shalom.

see the bottom of page 204 for example. (There are plenty of these in this book (and also others), I just found one that is included in Google's preview.)

Thanks again.

Any other ideas?

8. Apr 20, 2010

### SW VandeCarr

Not really. I was thinking of the discussion re the Kalman filter where the trace is minimized using the Kalman gain $$K_{k}$$ and setting:

$$\frac{\partial tr(P_{k|k})}{\partial K_{k}}= 0$$
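
The condition above can be illustrated in the scalar case. This is a sketch, assuming the standard form of the posterior variance for an arbitrary gain, $$P_{k|k} = (1 - K_k H)^2 P_{k|k-1} + K_k^2 R$$, where the measurement coefficient $$H$$, the noise variance $$R$$, and all numeric values are assumptions not given in the thread. Setting the derivative with respect to $$K_k$$ to zero gives $$K_k = P_{k|k-1} H / (H^2 P_{k|k-1} + R)$$, and a brute-force search over gains should land on the same value.

```python
def posterior_variance(k, p_prior, h, r):
    """Scalar P_{k|k} for an arbitrary gain k (Joseph form)."""
    return (1.0 - k * h) ** 2 * p_prior + k ** 2 * r

def kalman_gain(p_prior, h, r):
    """Gain obtained from d P_{k|k} / d K = 0 in the scalar case."""
    return p_prior * h / (h * p_prior * h + r)

# Hypothetical numbers chosen only for illustration.
p_prior, h, r = 2.0, 1.0, 0.5

k_star = kalman_gain(p_prior, h, r)

# Brute-force check: evaluate P_{k|k} on a grid of candidate gains.
grid = [i / 1000 for i in range(0, 2001)]
k_best = min(grid, key=lambda k: posterior_variance(k, p_prior, h, r))

print(k_star, k_best, posterior_variance(k_star, p_prior, h, r))
```

For a state vector the same reasoning applies to the trace of the matrix $$P_{k|k}$$, which is the sum of the per-component error variances.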

9. Apr 20, 2010

### kasraa

Sorry, but I can't understand your last post (I mean I don't follow the wording, not the idea of minimizing the trace of the covariance matrix to find the Kalman gain).

What I understand is that Kalman and MMSE are related (in fact, I think Kalman is the MMSE estimator for the case of Gaussian variables (or Linear MMSE estimator without the assumption of Gaussian variables), for associated linear state (process) and observation equations (models)).

Did you see the book?

11. Apr 20, 2010

### SW VandeCarr

Yes. There's a lot there to look at. Thanks

If you go back to the wiki article and go down to "Kalman gain derivation" you'll see the equation I wrote. This is how the author suggests minimizing the trace of $$P_{k|k}$$ (posterior estimate covariance matrix).

http://en.wikipedia.org/wiki/Kalman_filter

And yes, the Kalman Filter is an MMSE estimator.
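
One way to connect this with the conditional/unconditional question is to run a scalar Kalman filter on many simulated trajectories. The sketch below assumes a linear Gaussian model $$x_k = a x_{k-1} + w_k$$, $$z_k = h x_k + v_k$$ with hypothetical parameters (`a`, `q`, `h`, `r`, `p0` and the function `run_trial` are not from the thread). It records the final $$P_{k|k}$$ of each run and the actual squared estimation error.

```python
import random

def run_trial(rng, p0, a, q, h, r, steps):
    """Simulate one trajectory, filter it, return (squared error, P_kk) at the last step."""
    x = rng.gauss(0.0, p0 ** 0.5)       # true initial state, x_0 ~ N(0, p0)
    x_hat, p = 0.0, p0                  # filter starts from the prior mean and variance
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)     # state propagation
        z = h * x + rng.gauss(0.0, r ** 0.5)     # noisy measurement
        x_pred = a * x_hat                       # predicted estimate
        p_pred = a * p * a + q                   # predicted variance P_{k|k-1}
        k = p_pred * h / (h * p_pred * h + r)    # Kalman gain
        x_hat = x_pred + k * (z - h * x_pred)    # measurement update
        p = (1.0 - k * h) * p_pred               # posterior variance P_{k|k}
    return (x - x_hat) ** 2, p

rng = random.Random(2)
params = dict(p0=1.0, a=0.9, q=0.1, h=1.0, r=0.5, steps=10)
results = [run_trial(rng, **params) for _ in range(20_000)]

empirical_mse = sum(e for e, _ in results) / len(results)
p_values = {p for _, p in results}      # distinct P_kk values across all runs
p_final = next(iter(p_values))
print(len(p_values), p_final, empirical_mse)
```

In this sketch the variance recursion never uses the measurements, so every run ends with the same $$P_{k|k}$$, and the squared error averaged over all runs should be close to it. That is the sense in which, for the linear Gaussian model, the filter's covariance can be read as either the conditional or the unconditional MSE.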

Last edited: Apr 20, 2010
12. Apr 20, 2010

### kasraa

So you're confused about conditional/unconditional MSE too (just like me), right? :D

13. Apr 20, 2010

### SW VandeCarr

I didn't think so, but maybe I am. Using your notation P(Z) is an unconditional probability, p(Z|x) is the likelihood function. The joint probability is p(Z|x)p(x) and the conditional probability is p(x|Z) which we obtain from p(x|Z)=p(Z|x)p(x)/p(Z). What's wrong with this?

EDIT: If you're reading this in your mail, go to the forum. The post has been edited. The calculation in the Wiki link is specific to the Kalman Filter.

Last edited: Apr 20, 2010
14. Apr 20, 2010

### kasraa

In my notation, $$X$$ is the RV which we're trying to estimate, so the prior (unconditional pdf, which in case of the Kalman filter, is our estimate at the previous step) is $$p(x)$$.

Actually, nothing is wrong with it (using Bayes to arrive at the posterior). I believe I explained my confusion clearly, especially in post #5.

What do you think about my statement at the end of that post? Is it true?

Thanks a lot.

BTW, does anyone else have any ideas about our discussion?

15. Apr 20, 2010

### SW VandeCarr

As I said, I don't know what the double conditional on Z means. I can only guess that it might mean something like $$P_{k|k}$$, which indicates the successor state to $$P_{k|k-1}$$. If so, you need to introduce a system of subscripts.

Also, I don't see any problem with getting the MSE from any sample vector. It's the MMSE that can be a challenge.

Last edited: Apr 20, 2010