# How to combine correlated vector-valued estimates

1. Mar 20, 2017

### WhiteHaired

I need to combine several vector-valued estimates of a physical quantity in order to obtain a better estimate with lower uncertainty.
As in the scalar case, the weighted mean of multiple estimates provides a maximum-likelihood estimate. For independent estimates we simply replace the variance $\sigma^2$ by the covariance matrix $\Sigma$ and the scalar reciprocal by the matrix inverse (both denoted by the superscript $-1$); the weight matrix then reads (see https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Vector-valued_estimates)
$$W_i = \Sigma_i^{-1}$$
The weighted mean in this case is:
$$\bar x = \Sigma_{\bar x} \left(\sum_{i=1}^n W_i \mathbf{x}_i\right)$$
(note that the matrix–vector products here do not commute, so the order matters).
The covariance of the weighted mean is:
$$\Sigma_{\bar x} = \left(\sum_{i=1}^n W_i\right)^{-1}$$
For example, consider the weighted mean of the points $[1~0]^\top$, with high variance in the second component, and $[0~1]^\top$, with high variance in the first component. Then
$$x_1 := \begin{bmatrix}1\\0\end{bmatrix}, \qquad \Sigma_1 := \begin{bmatrix}1 & 0\\ 0 & 100\end{bmatrix}$$
$$x_2 := \begin{bmatrix}0\\1\end{bmatrix}, \qquad \Sigma_2 := \begin{bmatrix}100 & 0\\ 0 & 1\end{bmatrix}$$
then the weighted mean is:
$$\bar x = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)^{-1} \left(\Sigma_1^{-1} \mathbf{x}_1 + \Sigma_2^{-1} \mathbf{x}_2\right) \\[5pt] =\begin{bmatrix} 0.9901 &0\\ 0& 0.9901\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}0.9901 \\ 0.9901\end{bmatrix}$$
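The example above is easy to check numerically; the following NumPy sketch reproduces it.

```python
import numpy as np

# Two vector-valued estimates with complementary uncertainty,
# combined with inverse-covariance weights W_i = Sigma_i^{-1}.
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
S1 = np.diag([1.0, 100.0])   # precise in component 1, noisy in component 2
S2 = np.diag([100.0, 1.0])   # noisy in component 1, precise in component 2

W1 = np.linalg.inv(S1)
W2 = np.linalg.inv(S2)
S_bar = np.linalg.inv(W1 + W2)        # covariance of the weighted mean
x_bar = S_bar @ (W1 @ x1 + W2 @ x2)   # weighted mean

print(x_bar)   # approximately [0.9901, 0.9901], i.e. 1/1.01 in each component
```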

On the other hand, for scalar quantities it is well known that correlations between estimates can easily be accounted for. In the general case (see https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Accounting_for_correlations), suppose that $X=[x_1,\dots,x_n]^\top$, that $C$ is the covariance matrix relating the quantities $x_i$, that $\bar x$ is the common mean to be estimated, and that $W$ is the design matrix $[1, \dots, 1]^\top$ (of length $n$). The Gauss–Markov theorem states that the minimum-variance estimate of the mean is given by:
$$\bar x = \sigma^2_{\bar x} (W^\top C^{-1} X)$$
with
$$\sigma^2_{\bar x}=(W^\top C^{-1} W)^{-1}$$
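As a numerical illustration of these two scalar formulas, here is a small sketch with three correlated estimates of a common mean; the covariance matrix $C$ is made up for the example, not taken from any referenced source.

```python
import numpy as np

# Three correlated scalar estimates of a common mean, with an
# illustrative (made-up) covariance matrix C between them.
X = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
W = np.ones(3)                       # design matrix [1, ..., 1]^T

Cinv = np.linalg.inv(C)
var_xbar = 1.0 / (W @ Cinv @ W)      # (W^T C^{-1} W)^{-1}
x_bar = var_xbar * (W @ Cinv @ X)    # Gauss–Markov estimate of the mean

# For this C and X the estimate comes out to 2.0 with variance 0.5.
print(x_bar, var_xbar)
```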

The question is, how can correlated vector-valued estimates be combined?
In our case, how to proceed if $x_1$ and $x_2$ are not independent and all the terms in the covariance matrix are known?
In other words, are there analogous expressions to the last two for vector-valued estimates?

Last edited: Mar 20, 2017
2. Mar 20, 2017

### andrewkirk

What does $∑_i^{-1}$ represent?

3. Mar 21, 2017

### WhiteHaired

$\Sigma_i^{-1}$ stands for the inverse of the covariance matrix of the vector-valued quantity $x_i$.

4. Mar 21, 2017

### andrewkirk

The inverse of the covariance matrix is denoted by $\Sigma^{-1}$. There is no $i$ subscript.

If I had to guess, I would guess that the author intended $\Sigma_i^{-1}$ to denote the $i$th row of $\Sigma^{-1}$. But the wiki article does not give sufficient info to either confirm or reject that guess.

5. Mar 21, 2017

### WhiteHaired

It has the subscript $i$ since it represents the covariance matrix of $x_i$, a vector-valued quantity (here with 2 components). Please see the numerical example.

6. Mar 21, 2017

### Stephen Tashi

Let me see if I understand the terminology "correlated vector-valued". If I am sampling a vector valued measurement of a randomly varying vector, there can be two types of dependence.

First, there can be dependence among the components in each measurement of a vector - for example, if the 2nd component of a vector is small then perhaps the 5th component of the same vector tends to be large.

Second, there can be a dependence between the different measurements of vectors - for example, if the third vector I measure has a small 2nd component, perhaps the tenth vector I measure tends to have a large 5th component.

Dependence between random variables is partially described by their covariance. Information about covariance is often sufficient to answer questions about the variance of estimators of functions of the random variables. With further assumptions about their joint distribution, information about covariance may be sufficient to answer questions about maximum likelihood estimators of those functions.

If your problem has both types of dependence then a complete covariance matrix would describe the covariance between each $X_{a,b}$ and $X_{c,d}$, where $X_{a,b}$ denotes the $a$th component of the $b$th measured vector. So if each measured vector had 4 components and you took 100 measurements, the covariance matrix that describes all the dependences would have dimensions 400 by 400.
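To make the bookkeeping concrete, here is a small sketch of that flattening for the 4-component, 100-measurement case. The particular ordering of components within the long stacked vector is an assumption (one conventional choice); any fixed ordering works as long as the covariance matrix uses the same one.

```python
import numpy as np

n_components, n_measurements = 4, 100

def flat_index(a, b):
    """Zero-based position of X_{a,b} (component a of measurement b,
    both 1-based) in the stacked vector, grouping by measurement."""
    return (b - 1) * n_components + (a - 1)

dim = n_components * n_measurements
full_cov = np.eye(dim)   # placeholder; real data would fill in all covariances

print(full_cov.shape)    # (400, 400)
print(flat_index(2, 3))  # component 2 of measurement 3 -> index 9
```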

Are we making the assumption that this large covariance matrix is known?

7. Mar 21, 2017

### WhiteHaired

Stephen, you described my problem very well. Thank you very much.
Yes, we assume that the "large" covariance matrix is known. Actually it is not very large, since each vector has only 2 components and we take 10 measurements.

Last edited: Mar 21, 2017
8. Mar 21, 2017

### Stephen Tashi

We have to decide what we are trying to estimate.

From that point of view, we are dealing with 20 different random variables and you have one sample that consists of the joint measurement of those 20 variables.

That scenario could be simplified in various ways. We could assume different things are really the same or that some of the 20 different random variables have the same mean, etc.

9. Mar 24, 2017

### WhiteHaired

We are looking for the best (unbiased, efficient, etc.) estimate of a vector-valued quantity with two components $[x_1, x_2]$, from several measurements (see the first numerical example above). When the measurements are independent, the best estimate is the weighted mean of the measurements, where each weight is the inverse of the covariance matrix of the corresponding measurement (variances and correlations between the two components of one measurement, as in the first example above). The uncertainty of the best estimate can also be computed (as indicated above).
If the measurements are not independent, and the covariance matrix between the different measurements is known, how would one obtain the best estimate and its uncertainty?
Could the case of non-independent scalar quantities (second example above) be generalized to vector-valued quantities?
Any reference that addresses this topic?

10. Mar 24, 2017

### Stephen Tashi

In the situations you cited from Wikipedia, the goal is to estimate a "common mean". The ways to apply that phrase directly to the vector-valued random variable $[X_1, X_2]$ are to say that we are trying to estimate the mean of $X_1 + X_2$, or that we are assuming $X_1$ and $X_2$ have the same [population] mean $\mu$ and are trying to estimate that "common" mean.

If the random variables $Y_1, Y_2$ have the joint distribution $f(y_1, y_2)$ then how do we define the concept of "vector valued" mean? It seems simplest to define it to be the vector of mean values with each component computed from a marginal distribution. So it would be $(\mu_1,\mu_2)$ with $\mu_1$ the mean of $Y_1$ computed from the marginal distribution of $Y_1$, etc. (There is nothing in such a definition that requires we use a common estimation algorithm that operates on vectors to compute the vector valued mean.)

You have the collection of 20 random variables $\{X_{1,j}\}, \{X_{2,k}\},\ j=1,\dots,10,\ k=1,\dots,10$. Your notation indicates a "vector valued" mean with only two components instead of 20 components.

In regard to the first component of this two-component vector-valued mean, shall we assume the collection of random variables $\{X_{1,j}\},\ j = 1,\dots,10$ has a "common mean" in the sense of $\mu_{X_{1,1}} = \mu_{X_{1,2}} = \dots = \mu_{X_{1,10}}$? Or shall we assume our objective is to estimate $\mu_1 = \frac{1}{10}\sum_{j=1}^{10} \mu_{X_{1,j}}$, which does not assume the random variables involved have equal means?

Last edited: Mar 24, 2017
11. Mar 24, 2017

### WhiteHaired

Hi Stephen. Thank you for your time again.

I am not an expert in statistical terminology, so let me try to clarify things with the practical problem that I have.

We wish to determine the radius $R$ and the length $L$ of a microwave cylindrical resonant cavity from measurements of the resonance frequencies of different resonant modes in the cavity. The result of one experiment (a measurement with two different modes) gives us a value for $R$ and another for $L$, i.e. a pair $[R_1,L_1]$, and the covariance matrix
$$\Sigma_1:= \begin{bmatrix} \sigma^2_{R_{1}} & \operatorname{cov}(R_1,L_1)\\ \operatorname{cov}(R_1,L_1)& \sigma^2_{L_{1}} \end{bmatrix}$$
Unlike the Wikipedia example, $R_1$ and $L_1$ are not independent, so $\Sigma_1$ is not diagonal.

We “repeat” the experiment 10 times, so we obtain 10 pairs $[R_i, L_i]$, and we are also able to obtain the covariances between different pairs, which are not independent. (We know the covariances between the components of the same pair and also between different pairs, i.e. a 20×20 matrix.)
How can we estimate the values of $R$ and $L$ of our cavity and their uncertainties?

Note: Actually there are not 10 repetitions of the whole experiment, but 10 combinations of the resonant frequencies (5 modes taken 2 at a time), giving rise to 10 different pairs $[R_i,L_i]$. That is why the pairs are not independent: some of them share a mode. But in order to keep the problem in focus, let us just say that we have 10 non-independent experiments.
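The scalar correlated-case formulas quoted earlier have a natural candidate generalization: keep the Gauss–Markov (generalized least squares) form $\bar x = \Sigma_{\bar x} W^\top C^{-1} X$, $\Sigma_{\bar x} = (W^\top C^{-1} W)^{-1}$, but let $W$ be 10 stacked $2\times 2$ identity blocks so that every pair has expectation $[R, L]$. The sketch below only illustrates the shape of that computation, with made-up data (a random symmetric positive-definite 20×20 covariance and simulated pairs); it is not an answer confirmed anywhere in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, d = 10, 2

# Made-up 20x20 covariance standing in for the known covariance between
# all components of all pairs (random SPD matrix, illustrative only).
A = rng.normal(size=(n_pairs * d, n_pairs * d))
C = A @ A.T + n_pairs * d * np.eye(n_pairs * d)

# Simulated pairs [R_i, L_i], stacked as [R_1, L_1, R_2, L_2, ...].
X = rng.normal(loc=[5.0, 20.0], scale=0.1, size=(n_pairs, d)).ravel()

# Design matrix: 10 stacked 2x2 identity blocks, so E[X] = W @ [R, L].
W = np.tile(np.eye(d), (n_pairs, 1))

Cinv = np.linalg.inv(C)
Sigma_bar = np.linalg.inv(W.T @ Cinv @ W)   # 2x2 covariance of the estimate
x_bar = Sigma_bar @ (W.T @ Cinv @ X)        # GLS estimate of [R, L]

print(x_bar)      # estimated [R, L]
print(Sigma_bar)  # its 2x2 covariance
```

With all cross-covariances set to zero between different pairs, this reduces to the inverse-covariance weighted mean from the first example.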

12. Mar 26, 2017

### Stephen Tashi

Let me make sure that I understand the process. One (complete) experiment consists of taking 10 sub-experiments. Each sub-experiment produces one pair of estimates for $[R,L]$.

Since you say you have a covariance matrix for the 20 readings taken in the sub-experiments, I would assume you have conducted many complete experiments in order to get this covariance matrix.

While "independence" is a very specific relation between random variables, "not-independent" is unspecific. There are many ways that two random variables can be not-independent.

It's wise not to let "statistics" lobotomize your knowledge of the physics of the problem. For example, if the measurement of $[R_i, L_i]$ is done by a known computation from "raw" measurements like voltages or frequency counts, then perhaps the errors in the raw data can be modeled as independent random variables. Such a model might imply a definite joint distribution for the 20 random variables involved in one complete experiment.