Calculating a covariance matrix with missing data

SUMMARY

This discussion focuses on calculating a covariance matrix A with missing data, specifically the case where random variables $X_i$ and $X_j$ have different sample sizes. The key question is whether it is acceptable to trim each variable's sample to the smaller size n for the covariance calculation, and if so, whether to trim to the smallest sample size of all variables or only to the smaller of the pair in question. The replies point out that the covariance matrix is defined by the joint distribution, so what is actually wanted is an estimate of it, and that there is no general best method without assumptions about the distribution that generated the data. The discussion recommends the literature on estimating covariance from samples with missing data.

PREREQUISITES
  • Understanding of covariance matrix calculations
  • Familiarity with random variables and their distributions
  • Knowledge of statistical methods for handling missing data
  • Basic principles of non-parametric statistical analysis
NEXT STEPS
  • Research methods for estimating covariance with missing data
  • Explore literature on non-parametric statistical analysis techniques
  • Learn about the implications of sample size trimming in statistical calculations
  • Investigate the joint distribution of random variables in covariance computation
USEFUL FOR

Statisticians, data analysts, and researchers dealing with datasets that have missing values, particularly those focused on covariance matrix estimation.

TravisJay
Consider a covariance matrix $A$ such that each element $a_{i,j} = E(X_i X_j) - E(X_i)E(X_j)$, where $X_i, X_j$ are random variables.

Consider the case that each variable X has a different sample size. Let's say that $X_i$ contains the elements $x_{i,1}, \ldots, x_{i,N}$ and $X_j$ contains the elements $x_{j,1}, \ldots, x_{j,n}$, where the elements are paired up to index $n$ and $N > n$.

In this case, is it acceptable, for each covariance $a_{i,j}$, to trim the samples of both $X_i$ and $X_j$ to size $n$ and continue the calculation? (I'm not sure "trim" is the correct terminology, but it seems to describe what I need.)

If trimming is acceptable, is it necessary to trim to the smallest n across all of the random variables X, or can I trim each pair to the smaller of its two sample sizes?
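A minimal NumPy sketch of the two options, assuming missing values are stored as NaN in an (observations × variables) array and that every pair of variables shares at least two observed rows (the function names are hypothetical). `listwise_cov` trims every variable to the rows where all variables are observed; `pairwise_cov` trims each pair only to the rows both share, which is the "smallest of the pair" option:

```python
import numpy as np

def listwise_cov(data):
    """Trim ALL variables to the rows where every variable is observed."""
    complete = ~np.isnan(data).any(axis=1)
    return np.cov(data[complete], rowvar=False, ddof=1)

def pairwise_cov(data):
    """Entry (i, j) uses every row where both X_i and X_j are observed.

    Assumes each pair shares at least two observed rows.
    """
    p = data.shape[1]
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[both, i], data[both, j]
            # unbiased sample covariance over the rows shared by this pair
            c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - 1)
            cov[i, j] = cov[j, i] = c
    return cov

# Example: three variables with different numbers of observed values
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, np.nan],
                 [3.0, 5.0, 1.0],
                 [4.0, np.nan, 2.0]])
print(listwise_cov(data))
print(pairwise_cov(data))
```

Pairwise trimming uses more of the data, but because different entries come from different subsets, the resulting matrix is not guaranteed to be positive semidefinite. Either form of trimming is generally only justified when values are missing completely at random.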

I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case.
 
Check whether non-parametric statistical analysis might be what you need...
 
TravisJay said:
Consider a covariance matrix $A$ such that each element $a_{i,j} = E(X_i X_j) - E(X_i)E(X_j)$, where $X_i, X_j$ are random variables.
Such a matrix is computed from the joint distribution of $X_i$ and $X_j$, not from sample data.
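For reference, the usual sample estimate replaces each expectation in $a_{i,j}$ with an average over the $n$ paired observations. The means $\bar x_i$ and $\bar x_j$ are taken over the same $n$ pairs, which is exactly where the trimming question arises; the second form rescales to the standard unbiased estimator:

```latex
\hat a_{i,j} = \frac{1}{n}\sum_{k=1}^{n} x_{i,k}\,x_{j,k} - \bar x_i\,\bar x_j
             = \frac{1}{n}\sum_{k=1}^{n} (x_{i,k}-\bar x_i)(x_{j,k}-\bar x_j),
\qquad \bar x_i = \frac{1}{n}\sum_{k=1}^{n} x_{i,k},\quad
       \bar x_j = \frac{1}{n}\sum_{k=1}^{n} x_{j,k}

s_{i,j} = \frac{n}{n-1}\,\hat a_{i,j}
        = \frac{1}{n-1}\sum_{k=1}^{n} (x_{i,k}-\bar x_i)(x_{j,k}-\bar x_j)
```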

Consider the case that each variable X has a different sample size.

Apparently, what you want to do is estimate the covariance matrix.

You should look up methods of estimating covariance from samples that have missing data.
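One problem such methods deal with: estimating each entry from a different subset of the data (pairwise trimming) can produce a matrix that is not positive semidefinite, so it is not a valid covariance matrix. A common repair, sketched here with NumPy under a hypothetical function name, replaces the estimate with the nearest positive semidefinite matrix in Frobenius norm by zeroing its negative eigenvalues:

```python
import numpy as np

def nearest_psd(cov):
    """Project a symmetric matrix onto the positive semidefinite cone
    (nearest in Frobenius norm) by clipping negative eigenvalues to zero."""
    sym = (cov + cov.T) / 2                  # guard against tiny asymmetries
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigh: for symmetric matrices
    clipped = np.clip(eigvals, 0.0, None)
    # V diag(clipped) V^T, scaling each eigenvector column by its eigenvalue
    return (eigvecs * clipped) @ eigvecs.T

# A "covariance" whose off-diagonal entry exceeds what the variances allow
bad = np.array([[1.0, 2.0],
                [2.0, 1.0]])
print(np.linalg.eigvalsh(bad))  # one eigenvalue is negative
print(nearest_psd(bad))
```

For this example the result has every entry equal to 1.5. This is a post-hoc fix applied to an estimate, not a missing-data method in itself.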


I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case.

You haven't given enough information to define the case. There is no general "best" method for doing this unless you make some assumptions - for example, assumptions about what family of distributions generated the data.
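As an illustration of what such an assumption buys: if the rows are assumed to come from a multivariate normal distribution and values are assumed to be missing at random, a standard approach is maximum likelihood via the EM algorithm, which uses every observed value instead of trimming. A minimal NumPy sketch follows; the function name `em_mvn_cov`, the starting values, and the tolerances are hypothetical choices, and it assumes every column has at least one observed value:

```python
import numpy as np

def em_mvn_cov(data, n_iter=200, tol=1e-8):
    """ML mean and covariance of a multivariate normal from data with NaNs,
    via EM. Assumes normality and values missing at random."""
    n, p = data.shape
    mu = np.nanmean(data, axis=0)
    sigma = np.diag(np.nanvar(data, axis=0))     # crude starting covariance
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for row in data:
            obs = ~np.isnan(row)
            mis = ~obs
            x_hat = row.copy()
            cond_cov = np.zeros((p, p))          # Cov(missing | observed)
            if mis.any():
                if obs.any():
                    s_oo = sigma[np.ix_(obs, obs)]
                    s_mo = sigma[np.ix_(mis, obs)]
                    gain = np.linalg.solve(s_oo, s_mo.T).T  # S_mo S_oo^{-1}
                    # E-step: conditional mean of the missing entries
                    x_hat[mis] = mu[mis] + gain @ (row[obs] - mu[obs])
                    cond_cov[np.ix_(mis, mis)] = sigma[np.ix_(mis, mis)] - gain @ s_mo.T
                else:                            # nothing observed in this row
                    x_hat = mu.copy()
                    cond_cov = sigma.copy()
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cond_cov
        # M-step: update mean and covariance from expected sufficient statistics
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        converged = (np.max(np.abs(new_sigma - sigma)) < tol
                     and np.max(np.abs(new_mu - mu)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma
```

With no missing values this reduces to the sample mean and the 1/n sample covariance. If the normality or missing-at-random assumptions do not hold, the estimate can still be biased.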

http://icml.cc/discuss/2012/313.html
 
