Question about sample covariance matrix

In summary, the thread asks why the sample covariance matrix, written in terms of the sample mean [itex]\hat{M}[/itex], can be rewritten in terms of the true mean M. The equality in the last step is exact for any fixed sample: because [itex]\frac{1}{N}\sum_{i=1}^{N}(X_i - M) = \hat{M} - M[/itex], each of the two cross terms averages to [itex]-(\hat{M} - M)(\hat{M} - M)^T[/itex], and combining them with the [itex]+(\hat{M} - M)(\hat{M} - M)^T[/itex] term leaves a single negative term, which accounts for the sign change. No expectation or zero-covariance assumption is needed, and 1/N does not have to be read as a probability.
  • #1
sanctifier
Suppose the vectors [itex]X_1, X_2, \ldots, X_N[/itex] are mutually independent samples of a random vector X. (That is, each component of X is a random variable, and each [itex]X_i[/itex] is a vector of constants holding one observed value of every component.) The sample mean is [itex]\hat{M} = \frac{1}{N}\sum_{i=1}^{N} X_i[/itex], and the true mean of the [itex]X_i[/itex] is M. To estimate the covariance matrix of the [itex]X_i[/itex], we use the following formula:
[itex]\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} (X_i - \hat{M})(X_i - \hat{M})^T[/itex]
[itex]\quad = \frac{1}{N}\sum_{i=1}^{N} \{(X_i - M) - (\hat{M} - M)\}\{(X_i - M) - (\hat{M} - M)\}^T[/itex]
[itex]\quad = \frac{1}{N}\sum_{i=1}^{N} (X_i - M)(X_i - M)^T - (\hat{M} - M)(\hat{M} - M)^T[/itex]
My question is: why does the equality hold in the last step?
I did some work on this question. First, I noted that the transpose is a linear transformation, i.e., for two vectors V and U, [itex](V + U)^T = V^T + U^T[/itex], so the following expansion is valid:
[itex](V - U)(V - U)^T = VV^T - VU^T - UV^T + UU^T[/itex]
Let [itex]V = X_i - M[/itex] and [itex]U = \hat{M} - M[/itex]. The terms missing from the last step of [itex]\hat{\Sigma}[/itex] are [itex]-VU^T[/itex] and [itex]-UV^T[/itex]. I know that the entries of [itex]E[VU^T][/itex] are the covariances between the components of [itex]X_i[/itex] and [itex]\hat{M}[/itex], and I assume they are all zero, so these two terms would drop out after taking the expectation of [itex]\hat{\Sigma}[/itex]. But in the last step they vanished before any expectation was taken. Why?
Finally, I also notice that the sign of [itex]UU^T = (\hat{M} - M)(\hat{M} - M)^T[/itex] has changed from + to -. How does this happen?
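For reference, the last step appears to follow from plain algebra on the fixed sample, with no expectation and no independence assumption. The following is a sketch using the same V and U as in the post; the subscript on [itex]V_i[/itex] is added here as an assumption of notation, because V depends on i while U does not.

```latex
\begin{align*}
\frac{1}{N}\sum_{i=1}^{N} V_i
  &= \frac{1}{N}\sum_{i=1}^{N} X_i - M = \hat{M} - M = U, \\
\frac{1}{N}\sum_{i=1}^{N} (V_i - U)(V_i - U)^T
  &= \frac{1}{N}\sum_{i=1}^{N} V_i V_i^T
     - \Big(\frac{1}{N}\sum_{i=1}^{N} V_i\Big) U^T
     - U \Big(\frac{1}{N}\sum_{i=1}^{N} V_i\Big)^T
     + U U^T \\
  &= \frac{1}{N}\sum_{i=1}^{N} V_i V_i^T - U U^T - U U^T + U U^T \\
  &= \frac{1}{N}\sum_{i=1}^{N} (X_i - M)(X_i - M)^T
     - (\hat{M} - M)(\hat{M} - M)^T .
\end{align*}
```

Read this way, the two cross terms do not vanish: each averages to [itex]-UU^T[/itex], and together with the [itex]+UU^T[/itex] term they leave a single [itex]-UU^T[/itex], which would account for the sign change.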
 
  • #2
OK, if 1/N can be regarded as an approximate probability of each entry of [itex]VU^T[/itex], this could explain why [itex]-VU^T[/itex] and [itex]-UV^T[/itex] vanish without taking an expectation. But how does one explain the sign change of [itex]UU^T[/itex] in the last step?
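One way to test whether an expectation or a probability reading of 1/N is needed at all is to check the last step numerically on a single fixed sample. The sketch below uses NumPy; the sample size, dimension, "true" mean and random seed are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 50, 3                        # hypothetical sample size and dimension
M = np.array([1.0, -2.0, 0.5])      # assumed "true" mean, for illustration only
X = M + rng.normal(size=(N, d))     # each row is one sample X_i

M_hat = X.mean(axis=0)              # sample mean

# Left-hand side: (1/N) * sum_i (X_i - M_hat)(X_i - M_hat)^T
D = X - M_hat
lhs = D.T @ D / N

# Right-hand side: (1/N) * sum_i (X_i - M)(X_i - M)^T - (M_hat - M)(M_hat - M)^T
V = X - M
U = M_hat - M
rhs = V.T @ V / N - np.outer(U, U)

# The two sides agree up to floating-point rounding for this one sample.
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```

If the two sides agree for one arbitrary sample, the step is an algebraic identity rather than a statement about averages over many samples. The same check should also pass when M is replaced by any other fixed vector, which suggests the step does not depend on M being the true mean.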
 

1. What is a sample covariance matrix?

A sample covariance matrix is a square matrix that summarizes the relationship between two or more variables in a sample data set. It represents the covariance, or the measure of how two variables change together, between each pair of variables. It is commonly used in statistics and data analysis to understand the linear relationship between variables.

2. How is a sample covariance matrix calculated?

A sample covariance matrix is calculated by first taking the mean of each variable and then computing the covariance between each pair of variables with the formula Cov(X,Y) = 1/(n-1) * Σ(xi - x̄)(yi - ȳ), where n is the number of observations, xi and yi are individual data points, and x̄ and ȳ are the means of the two variables. The resulting matrix is square, with as many rows and columns as there are variables in the data set. Dividing by n - 1 gives an unbiased estimate; dividing by n instead, as the formula in the question above does, gives a slightly biased one.
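The calculation described above can be sketched in NumPy as follows. The data values are made up for illustration. The code computes the matrix with both the n - 1 divisor and the n divisor (the latter matching the 1/N form in the thread's question) and compares each against `np.cov`.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 2.0],
    [8.0, 7.0],
    [10.0, 7.0],
])

n = data.shape[0]
centered = data - data.mean(axis=0)              # subtract each variable's mean

cov_unbiased = centered.T @ centered / (n - 1)   # divides by n - 1
cov_biased = centered.T @ centered / n           # divides by n

# np.cov treats rows as variables by default, so pass rowvar=False here.
matches_default = np.allclose(cov_unbiased, np.cov(data, rowvar=False))
matches_biased = np.allclose(cov_biased, np.cov(data, rowvar=False, bias=True))
print(matches_default, matches_biased)
```

The two versions differ only by the factor (n - 1)/n, so the difference shrinks as the number of observations grows.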

3. What does the diagonal of a sample covariance matrix represent?

The diagonal of a sample covariance matrix represents the variance of each variable. The variance measures how much a single variable varies from its mean. Therefore, the diagonal of the covariance matrix shows the variability of each variable in the data set.
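As a small illustration of the diagonal property, the sketch below compares the diagonal of `np.cov` with per-variable sample variances. The data is randomly generated and the scale factors are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample: 200 observations of 3 variables with different spreads.
data = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 0.5])

cov = np.cov(data, rowvar=False)          # uses the n - 1 divisor by default
variances = data.var(axis=0, ddof=1)      # per-variable variance, same divisor

diag_matches = np.allclose(np.diag(cov), variances)
symmetric = np.allclose(cov, cov.T)       # Cov(X, Y) equals Cov(Y, X)
print(diag_matches, symmetric)
```

Note that `ddof=1` is needed for the variances to use the same n - 1 divisor as `np.cov`; with the default `ddof=0` the two would differ by the factor (n - 1)/n.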

4. How is a sample covariance matrix used in data analysis?

A sample covariance matrix is used in data analysis to understand the relationship between variables in a data set. It can help identify patterns and trends and can be used to make predictions. It is also used in statistical tests to determine the significance of the relationship between variables.

5. What are the limitations of using a sample covariance matrix?

One limitation of using a sample covariance matrix is that it is affected by the scale and units of measurement of the variables. This can make it difficult to compare covariance matrices from different data sets. Additionally, if there are outliers or extreme values in the data, the covariance matrix may not accurately represent the relationship between variables. Furthermore, it can only capture linear relationships between variables and may not be suitable for non-linear data.
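The scale dependence mentioned above can be seen in a short sketch. The data and the choice of units (metres converted to centimetres) are assumptions for illustration. Rescaling one variable changes the covariance entries, while the correlation matrix, which divides out the standard deviations, stays the same.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pair of linearly related variables.
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)

data_m = np.column_stack([x, y])              # first column assumed to be in metres
data_cm = data_m * np.array([100.0, 1.0])     # same data, first column in centimetres

cov_m = np.cov(data_m, rowvar=False)
cov_cm = np.cov(data_cm, rowvar=False)

# The covariance scales by 100 and the rescaled variable's variance by 100**2.
cov_scaled = np.isclose(cov_cm[0, 1], 100.0 * cov_m[0, 1])
var_scaled = np.isclose(cov_cm[0, 0], 100.0**2 * cov_m[0, 0])

# The correlation matrix is unchanged by the change of units.
corr_unchanged = np.allclose(
    np.corrcoef(data_m, rowvar=False),
    np.corrcoef(data_cm, rowvar=False),
)
print(cov_scaled, var_scaled, corr_unchanged)
```

This is one reason the correlation matrix is often used instead when covariance matrices from differently scaled data sets need to be compared.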
