How to combine correlated vector-valued estimates

In summary: if a vector-valued quantity has multiple estimates with different variances, the weighted mean of those estimates can provide a maximum likelihood estimate. In the simplified case discussed below, the question is what is the best way to combine the 20 different estimates into a single estimate with the least uncertainty.
  • #1
WhiteHaired
I'd need to combine several vector-valued estimates of a physical quantity in order to obtain a better estimate with less uncertainty.
As in the scalar case, the weighted mean of multiple estimates can provide a maximum likelihood estimate. For independent estimates we simply replace the variance ##\sigma^2## by the covariance matrix ##\Sigma## and the scalar reciprocal by the matrix inverse (both written with a superscript ##-1##); the weight matrix then reads (see https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Vector-valued_estimates)
$$W_i = \Sigma_i^{-1}$$
The weighted mean in this case is:
$$\bar x = \Sigma_{\bar x} \left(\sum_{i=1}^n W_i \mathbf{x}_i\right)$$
(note that the matrix-vector products here do not commute, so the order of the factors matters).
The covariance of the weighted mean is:
$$\Sigma_{\bar x} = \left(\sum_{i=1}^n W_i\right)^{-1}$$
For example, consider the weighted mean of the point ##[1~0]^\top## with high variance in the second component and ##[0~1]^\top## with high variance in the first component. Then
$$x_1 := \begin{bmatrix}1\\0\end{bmatrix}, \qquad \Sigma_1 := \begin{bmatrix}1 & 0\\ 0 & 100\end{bmatrix}$$
$$x_2 := \begin{bmatrix}0\\1\end{bmatrix}, \qquad \Sigma_2 := \begin{bmatrix}100 & 0\\ 0 & 1\end{bmatrix}$$
and the weighted mean is:
$$ \bar x = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right)^{-1} \left(\Sigma_1^{-1} \mathbf{x}_1 + \Sigma_2^{-1} \mathbf{x}_2\right) \\[5pt] =\begin{bmatrix} 0.9901 &0\\ 0& 0.9901\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}0.9901 \\ 0.9901\end{bmatrix}$$
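For concreteness, here is a minimal NumPy sketch of the independent-estimates formula above, reproducing the numbers in this example (the variable names are illustrative choices, not taken from the Wikipedia article):

```python
import numpy as np

# The two independent vector-valued estimates and their covariance matrices
x = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Sigma = [np.diag([1.0, 100.0]), np.diag([100.0, 1.0])]

# Weight matrices: W_i = Sigma_i^{-1}
W = [np.linalg.inv(S) for S in Sigma]

# Covariance of the weighted mean: Sigma_xbar = (sum_i W_i)^{-1}
Sigma_xbar = np.linalg.inv(sum(W))

# Weighted mean: xbar = Sigma_xbar * (sum_i W_i x_i)
xbar = Sigma_xbar @ sum(Wi @ xi for Wi, xi in zip(W, x))

print(xbar)        # -> [0.99009901 0.99009901]
print(Sigma_xbar)  # -> diag(0.99009901, 0.99009901)
```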

On the other hand, for scalar quantities it is well known that correlations between estimates can easily be accounted for. In the general case (see https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Accounting_for_correlations), suppose that ##X=[x_1,\dots,x_n]^\top##, ##C## is the covariance matrix relating the quantities ##x_i##, ##\bar x## is the common mean to be estimated, and ##W## is the design matrix ##[1, \dots, 1]^\top## (of length ##n##). The Gauss–Markov theorem states that the estimate of the mean having minimum variance is given by:
$$\bar x = \sigma^2_{\bar x} (W^\top C^{-1} X)$$
with
$$\sigma^2_{\bar x}=(W^\top C^{-1} W)^{-1}$$
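As a quick illustration of the scalar Gauss–Markov formula, here is a small NumPy sketch; the three estimates and their covariance matrix are invented purely for illustration:

```python
import numpy as np

# n = 3 correlated scalar estimates of a common mean (invented values)
X = np.array([1.02, 0.97, 1.05])

# Known covariance matrix C between the estimates (invented, positive definite)
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])

W = np.ones((3, 1))              # design matrix [1, ..., 1]^T
Cinv = np.linalg.inv(C)

# Minimum-variance (Gauss-Markov) estimate and its variance
var_xbar = np.linalg.inv(W.T @ Cinv @ W)    # (W^T C^{-1} W)^{-1}
xbar = var_xbar @ (W.T @ Cinv @ X)          # sigma^2_xbar * (W^T C^{-1} X)

print(xbar.item(), var_xbar.item())
```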

The question is, how can correlated vector-valued estimates be combined?
In our case, how to proceed if ##x_1## and ##x_2## are not independent and all the terms in the covariance matrix are known?
In other words, are there analogous expressions to the last two for vector-valued estimates?
Any suggestion or reference, please?
 
  • #2
What does ##\Sigma_i^{-1}## represent?
 
  • #3
##\Sigma_i^{-1}## stands for the inverse of the covariance matrix of the vector-valued quantity ##x_i##.
 
  • #4
The inverse of the covariance matrix is denoted by ##\Sigma^{-1}##. There is no ##i## subscript.

If I had to guess, I would guess that the author intended ##\Sigma_i^{-1}## to denote the ##i##th row of ##\Sigma^{-1}##. But the wiki article does not give sufficient info to either confirm or reject that guess.
 
  • #5
It has the subscript ##i## since it represents the covariance matrix of ##x_i##, a vector-valued (two-component) quantity. Please see the numerical example.
 
  • #6
WhiteHaired said:
The question is, how can correlated vector-valued estimates be combined?

Let me see if I understand the terminology "correlated vector-valued". If I am sampling a vector valued measurement of a randomly varying vector, there can be two types of dependence.

First, there can be dependence among the components in each measurement of a vector - for example, if the 2nd component of a vector is small then perhaps the 5th component of the same vector tends to be large.

Second, there can be a dependence between the different measurements of vectors - for example, if the third vector I measure has a small 2nd component, perhaps the tenth vector I measure tends to have a large 5th component.

Dependence between random variables is partially described by their covariance. Information about covariance is often sufficient to answer questions about the variance of estimators of functions of the random variables. With further assumptions about their joint distribution, information about covariance may be sufficient to answer questions about maximum likelihood estimators of those functions.

If your problem has both types of dependence then a complete covariance matrix would describe the covariance between each ##X_{i,j}## and ##X_{k,l}##, where ##X_{a,b}## denotes the ##a##th component of the ##b##th measured vector. So if each measured vector had 4 components and you took 100 measurements, the covariance matrix that describes all the dependences would have dimensions 400 by 400.

Are we making the assumption that this large covariance matrix is known?
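For concreteness, one possible layout for such a matrix is to stack all the measured vectors into one long vector and index into the full covariance accordingly; a minimal sketch (the ordering convention chosen here is only one option, not something fixed by the problem):

```python
import numpy as np

n_components, n_measurements = 4, 100

def idx(a, b):
    """Position of X_{a,b} (the a-th component of the b-th measured vector)
    in the stacked vector, using 0-based indices and measurement-major order."""
    return b * n_components + a

# Full covariance matrix of all components of all measurements: 400 x 400
Sigma_full = np.zeros((n_components * n_measurements, n_components * n_measurements))

# The covariance between component a of measurement b and component c of
# measurement d would be stored at Sigma_full[idx(a, b), idx(c, d)].
```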
 
  • #7
Stephen, you described my problem very well. Thank you very much.
Yes, we assume that the "large" covariance matrix is known. Actually, it is not very large, since each vector has only 2 components and we take 10 measurements.
 
  • #8
We have to decide what we are trying to estimate.

WhiteHaired said:
Yes, we assume that the "large" covariance matrix is known. Actually, it is not very large, since each vector has only 2 components and we take 10 measurements.

From that point of view, we are dealing with 20 different random variables and you have one sample that consists of the joint measurement of those 20 variables.

That scenario could be simplified in various ways. We could assume different things are really the same or that some of the 20 different random variables have the same mean, etc.
 
  • #9
Stephen Tashi said:
We have to decide what we are trying to estimate.
From that point of view, we are dealing with 20 different random variables and you have one sample that consists of the joint measurement of those 20 variables.

That scenario could be simplified in various ways. We could assume different things are really the same or that some of the 20 different random variables have the same mean, etc.

We are looking for the best (unbiased, efficient, etc.) estimate for a vector-valued quantity of two components ##[x_1, x_2]##, from several measurements (see the first numerical example above). When the measurements are independent, the best estimate is the weighted mean of the measurements, where each weight is the inverse of the covariance matrix of the corresponding measurement (variances and correlations between the two components of a measurement, see the first example above). The uncertainty of the best estimate can also be estimated (as indicated above).
If the measurements are not independent, and the covariance matrix between the different measurements is known, how would one obtain the best estimate and its uncertainty?
Could the case of non-independent scalar quantities (second example above) be generalized to vector-valued quantities?
Any reference that addresses this topic?
 
  • #10
WhiteHaired said:
We are looking for the best (unbiased, efficient, etc.) estimate for a vector-valued quantity of two components ##[x_1, x_2]##, from several measurements (see the first numerical example above).

In the situations you cited from Wikipedia, the goal is to estimate a "common mean". The ways to apply that phrase directly to the vector-valued random variable ##[X_1, X_2]## are to say that we are trying to estimate the mean of ##X_1 + X_2##, or that we are assuming ##X_1## and ##X_2## have the same [population] mean ##\mu## and that we are trying to estimate that "common" mean.

If the random variables ##Y_1, Y_2## have the joint distribution ##f(y_1, y_2)## then how do we define the concept of "vector valued" mean? It seems simplest to define it to be the vector of mean values with each component computed from a marginal distribution. So it would be ##(\mu_1,\mu_2)## with ##\mu_1## the mean of ##Y_1## computed from the marginal distribution of ##Y_1##, etc. (There is nothing in such a definition that requires we use a common estimation algorithm that operates on vectors to compute the vector valued mean.)

You have the collection of 20 random variables ##\{X_{1,j}\}\cup\{X_{2,k}\},\ j=1,\dots,10,\ k=1,\dots,10##. Your notation indicates a "vector valued" mean with only two components instead of 20 components.

In regard to the first component of this two-component vector-valued mean, shall we assume the collection of random variables ##\{X_{1,j}\},\ j = 1,\dots,10## has a "common mean" in the sense of ##\mu_{X_{1,1}} = \mu_{X_{1,2}} = \dots = \mu_{X_{1,10}}##? Or shall we assume our objective is to estimate ##\mu_1= \frac{1}{10}\sum_{j=1}^{10} \mu_{X_{1,j}}##, which does not assume the random variables involved have equal means?
 
  • #11
Hi Stephen. Thank you for your time again.

I am not an expert in statistical terminology, so let me try to clarify the subject with the practical problem that I have.

We wish to determine the radius ##R## and the length ##L## of a microwave cylindrical resonant cavity from the measurement of resonance frequencies of different resonant modes in the cavity. The result of an experiment (measurement with two different modes) gives us a value for ##R## and another one for ##L##, i.e. a pair ##[R_1,L_1]## and the covariance matrix
$$\Sigma_1:= \begin{bmatrix} \sigma^2_{R_{1}} & cov(R_1,L_1)\\ cov(R_1,L_1)& \sigma^2_{L_{1}} \end{bmatrix}$$
Unlike the Wikipedia example, ##R_1## and ##L_1## are not independent, so ##\Sigma_1## is not diagonal.

We “repeat” the experiment 10 times, so we obtain 10 pairs ##[R_i, L_i]##, and we are also able to obtain the covariances between the different pairs, which are not independent. (We know the covariance matrix between the components of the same pair and also between different pairs, i.e. the full 20x20 matrix.)
How can we estimate the values of ##R ## and ##L## of our cavity and their uncertainties?

Note: Actually there are not 10 repetitions of the whole experiment, but 10 combinations of the resonant frequencies (5 modes taken 2 at a time) giving rise to 10 different pairs ##[R_i,L_i]##. That's why the pairs are not independent: some of them share a mode. But in order to keep the problem in focus, we consider that we have 10 not-independent experiments and that's all.
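For what it's worth, one natural candidate is to generalize the scalar Gauss–Markov expression from post #1 by stacking the 10 pairs into a 20-vector and using a design matrix made of stacked 2×2 identity blocks. The sketch below illustrates that idea with invented numbers; whether this estimator is appropriate for the actual cavity problem is exactly what the thread is discussing, so treat it as an assumption rather than an established answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 10, 2

# Invented data: 10 [R_i, L_i] estimates stacked into one 20-vector
# (in the real problem these come from the resonance-frequency measurements)
y = rng.normal([5.0, 20.0], 0.05, size=(n_pairs, dim)).ravel()

# Invented 20x20 covariance of the stacked vector; in the real problem this is
# the known full covariance, including the blocks linking pairs that share a mode
A = rng.normal(size=(n_pairs * dim, n_pairs * dim))
C = 0.01 * (A @ A.T) + 0.05 * np.eye(n_pairs * dim)

# Design matrix: every pair measures the same underlying [R, L],
# so H consists of 10 stacked 2x2 identity blocks (20 x 2)
H = np.tile(np.eye(dim), (n_pairs, 1))

# Generalized least squares (Gauss-Markov) estimate and its covariance
Cinv = np.linalg.inv(C)
Sigma_hat = np.linalg.inv(H.T @ Cinv @ H)   # 2x2 covariance of the combined estimate
x_hat = Sigma_hat @ (H.T @ Cinv @ y)        # combined estimate of [R, L]

print(x_hat)
print(Sigma_hat)
```

If the 20x20 covariance happens to be block-diagonal (independent pairs), this expression reduces to the weighted-mean formula from post #1, so the sketch is at least consistent with the independent case.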
 
  • #12
WhiteHaired said:
Note: Actually there are not 10 repetitions of the whole experiment, but 10 combinations of the resonant frequencies (5 modes taken 2 at a time) giving rise to 10 different pairs ##[R_i,L_i]##.

Let me make sure that I understand the process. One (complete) experiment consists of taking 10 sub-experiments. Each sub-experiment produces one pair of estimates for ##[R,L]##.

Since you say you have a covariance matrix for the 20 readings taken in the sub-experiments, I would assume you have conducted many complete experiments in order to get this covariance matrix.

WhiteHaired said:
in order to keep the problem in focus, we consider that we have 10 not-independent experiments and that's all.

While "independence" is a very specific relation between random variables, "not-independent" is unspecific. There are many ways that two random variables can be not-independent.

It's wise not to let "statistics" lobotomize your knowledge of the physics of the problem. For example, if the measurement of ##[R_i, L_i]## is done by a known computation from "raw" measurements like voltages or frequency counts, then perhaps the errors in the raw data can be modeled as independent random variables. Such a model might imply a definite joint distribution for the 20 random variables involved in one complete experiment.
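To illustrate that last point, here is a hedged sketch of how independent errors on the raw mode frequencies could be propagated to a joint covariance for the derived pairs via a numerical Jacobian. The function `pairs_from_freqs` and all the numbers are placeholders; the real mapping would come from the cavity resonance equations:

```python
import numpy as np
from itertools import combinations

def pairs_from_freqs(f):
    """Placeholder for the computation that maps the 5 mode frequencies to the
    10 derived [R_i, L_i] pairs (one pair per combination of 2 modes), returned
    as a stacked 20-vector. The formula below is invented so the sketch runs;
    the real mapping comes from the cavity resonance equations."""
    out = []
    for fa, fb in combinations(f, 2):
        out.extend([1.0 / fa, 1.0 / fb])   # stand-ins for R_i and L_i
    return np.array(out)

f0 = np.array([2.4e9, 3.1e9, 3.8e9, 4.4e9, 5.0e9])   # measured mode frequencies (Hz)
sigma_f = np.full(5, 1.0e5)                           # independent 1-sigma errors per mode

# Numerical Jacobian of the stacked pairs with respect to the 5 frequencies
J = np.empty((20, 5))
step = 1.0e3
for k in range(5):
    df = np.zeros(5)
    df[k] = step
    J[:, k] = (pairs_from_freqs(f0 + df) - pairs_from_freqs(f0 - df)) / (2.0 * step)

# Linearized error propagation: joint 20x20 covariance of all [R_i, L_i] values
C_pairs = J @ np.diag(sigma_f**2) @ J.T
```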
 

1. How do I combine correlated vector-valued estimates?

To combine correlated vector-valued estimates, you can use multivariate regression or principal component analysis (PCA). Multivariate regression involves fitting a model that includes all of the estimates as predictors, while PCA involves transforming the estimates into uncorrelated components and then combining them.
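As a hedged illustration of the "transform into uncorrelated components" idea mentioned above, the following sketch decorrelates a set of estimates using the eigendecomposition of their sample covariance (the data are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented correlated estimates: 200 samples of a 3-component vector
true_cov = np.array([[1.0, 0.6, 0.2],
                     [0.6, 1.0, 0.4],
                     [0.2, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(3), true_cov, size=200)

# PCA: eigendecomposition of the sample covariance
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)

# Projecting onto the principal axes gives (nearly) uncorrelated components
Z = (X - X.mean(axis=0)) @ eigvecs
print(np.cov(Z, rowvar=False).round(3))   # approximately diagonal
```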

2. How do I know if my vector-valued estimates are correlated?

You can determine if your vector-valued estimates are correlated by calculating the correlation coefficient between each pair of estimates. A correlation coefficient close to 1 or -1 indicates a strong positive or negative correlation, respectively.
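A minimal check along those lines with NumPy (the two series of estimates are invented):

```python
import numpy as np

# Two series of repeated estimates of the same quantity (invented numbers)
est_a = np.array([1.02, 0.98, 1.05, 0.97, 1.01])
est_b = np.array([0.99, 1.01, 1.08, 0.95, 1.00])

r = np.corrcoef(est_a, est_b)[0, 1]   # Pearson correlation coefficient
print(r)   # values near +1 or -1 indicate strong correlation
```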

3. Can I combine correlated vector-valued estimates if they have different units?

Yes, you can still combine correlated vector-valued estimates even if they have different units. However, it is important to standardize the units before combining them to avoid any biases in the results.

4. Is it better to use multivariate regression or PCA to combine correlated vector-valued estimates?

The choice between using multivariate regression or PCA to combine correlated vector-valued estimates depends on the specific research question and data. If the goal is to predict a certain outcome variable, multivariate regression may be more appropriate. If the goal is to reduce the dimensionality of the data, PCA may be a better option.

5. Can I combine more than two correlated vector-valued estimates?

Yes, you can combine more than two correlated vector-valued estimates. Multivariate regression and PCA can handle multiple predictors, which in this case would be the different correlated estimates. However, it is important to consider the sample size and the complexity of the model to avoid overfitting the data.
