Calculating a covariance matrix with missing data

TravisJay
Consider a covariance matrix A such that each element a_{i,j} = E(X_i X_j) - E(X_i) E(X_j), where X_i and X_j are random variables.

Consider the case where each variable has a different sample size. Say X_i has the observations x_{i,1}, ..., x_{i,N} and X_j has the observations x_{j,1}, ..., x_{j,n}, where N > n and the observations are paired up to index n (x_{i,k} goes with x_{j,k} for k = 1, ..., n).

In this case, for each covariance a_{i,j}, is it acceptable to trim both X_i and X_j to the first n paired observations and continue the calculation? (I'm not sure if "trim" is the correct terminology, but it seems to meet my needs.)

If it is acceptable to trim, then is it necessary to trim to the smallest n of all of the random variables X, or can I just trim to the smallest of the pair?

I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case.
 
See if non-parametric statistical analysis might be what you need...
 
TravisJay said:
Consider a covariance matrix A such that each element a_{i,j} = E(X_i X_j) - E(X_i) E(X_j), where X_i and X_j are random variables.
Such a matrix is computed from the joint distribution of X_i and X_j. It isn't computed from sample data.
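To make the distinction concrete, here is the population quantity next to the usual sample estimate, written with the symbols from the original post. The estimator shown (with the n - 1 divisor, applied to the n paired observations) is one common choice and is my assumption, not something the original post specifies.

```latex
% Population covariance: an expectation under the joint distribution of (X_i, X_j)
a_{i,j} = \operatorname{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)\,E(X_j)

% A common sample estimate from n paired observations (x_{i,k}, x_{j,k}), k = 1, \dots, n
\hat{a}_{i,j} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{i,k} - \bar{x}_i)(x_{j,k} - \bar{x}_j),
\qquad
\bar{x}_i = \frac{1}{n} \sum_{k=1}^{n} x_{i,k}
```

The first line needs the distribution; the second only needs data, which is why the question is really about estimation.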

Consider the case that each variable X has a different sample size.

Apparently, what you want to do is estimate the covariance matrix.

You should look up methods of estimating covariance from samples that have missing data.
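The two "trimming" options in the question are usually called pairwise deletion (trim each pair to its own common length) and listwise deletion (trim everything to the rows where all variables are observed). Below is a minimal sketch of both, assuming missing values are coded as NaN in an array with one column per variable. The function names `pairwise_cov` and `listwise_cov` are made up for illustration.

```python
import numpy as np

def pairwise_cov(data):
    """Pairwise deletion: entry (i, j) uses only rows where both columns are observed."""
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    observed = ~np.isnan(data)
    cov = np.full((p, p), np.nan)  # stays NaN where fewer than 2 pairs exist
    for i in range(p):
        for j in range(i, p):
            both = observed[:, i] & observed[:, j]
            m = int(both.sum())
            if m < 2:
                continue
            xi = data[both, i]
            xj = data[both, j]
            c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (m - 1)
            cov[i, j] = cov[j, i] = c
    return cov

def listwise_cov(data):
    """Listwise deletion: keep only rows with no missing value, then the usual estimate."""
    data = np.asarray(data, dtype=float)
    complete = ~np.isnan(data).any(axis=1)
    return np.cov(data[complete], rowvar=False)

# Hypothetical data: 3 variables, the second and third observed for fewer rows.
rng = np.random.default_rng(0)
sample = rng.normal(size=(20, 3))
sample[12:, 1] = np.nan
sample[16:, 2] = np.nan
print(pairwise_cov(sample))
print(listwise_cov(sample))
```

One caveat worth knowing: because each entry of the pairwise matrix comes from a different subset of rows, the result is not guaranteed to be positive semidefinite, whereas the listwise matrix always is. Both approaches also implicitly assume the missingness is unrelated to the values themselves.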


I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case.

You haven't given enough information to define the case. There is no general "best" method for doing this unless you make some assumptions - for example, assumptions about what family of distributions generated the data.
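As an illustration of how an assumption changes the answer: if you are willing to assume that (X_i, X_j) are jointly normal and that the missing values are missing at random, then the pattern in the question (X_i observed N times, X_j observed for only the first n) has a closed-form maximum likelihood estimate, sometimes called the factored-likelihood approach. The sketch below is only valid under those assumptions, and the function name is hypothetical.

```python
import numpy as np

def monotone_bivariate_normal_mle(x, y_partial):
    """ML estimate of mean and covariance for bivariate normal data where
    x has N observations and y is observed only for the first n <= N of them.

    Assumes joint normality and ignorable (missing at random) missingness.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y_partial, dtype=float)
    n = y.size
    if n < 2 or n > x.size:
        raise ValueError("need 2 <= n <= N paired observations")
    x_head = x[:n]  # the part of x that is paired with y

    # Marginal of x: use all N observations (ML divisor N, not N - 1).
    mu_x = x.mean()
    var_x = np.mean((x - mu_x) ** 2)

    # Regression of y on x: use only the n complete pairs.
    dx = x_head - x_head.mean()
    dy = y - y.mean()
    beta = np.mean(dx * dy) / np.mean(dx ** 2)
    resid_var = np.mean((dy - beta * dx) ** 2)

    # Recombine the two pieces into the joint parameters.
    mu_y = y.mean() + beta * (mu_x - x_head.mean())
    cov_xy = beta * var_x
    var_y = resid_var + beta ** 2 * var_x

    mean = np.array([mu_x, mu_y])
    cov = np.array([[var_x, cov_xy], [cov_xy, var_y]])
    return mean, cov
```

Unlike simply trimming to n, this uses the extra N - n observations of X_i, and the resulting 2x2 matrix is positive semidefinite by construction. With a different distributional assumption the estimator would be different, which is the point above.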

http://icml.cc/discuss/2012/313.html
 