Calculating a covariance matrix with missing data

TravisJay · Nov 26, 2013

Consider a covariance matrix A such that each element a_i,j = E(X_i X_j) - E(X_i) E(X_j), where X_i and X_j are random variables.

Consider the case in which the variables have different sample sizes. Let's say that X_i contains the elements x_i,1, ..., x_i,N and X_j contains the elements x_j,1, ..., x_j,n, where the elements are paired up to element n and N > n.

In this case, for each covariance a_i,j, is it acceptable to trim the sample size for each X_i and X_j to n and continue the calculation? (I'm not sure if trim is the correct terminology but it seems to meet my needs).

If it is acceptable to trim, then is it necessary to trim to the smallest n of all of the random variables X, or can I just trim to the smallest of the pair?

I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case.

bahamagreen · Nov 28, 2013

See whether non-parametric statistical analysis is what you need.

Stephen Tashi · Nov 28, 2013

TravisJay said:

"Consider a covariance matrix A such that each element a_i,j = E(X_i X_j) - E(X_i) E(X_j) where X_i,X_j are random variables."

Such a matrix is computed from the joint distribution of X_i and X_j. It isn't computed from sample data.
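To make the distinction concrete, the population quantity and one common sample estimate can be written side by side. The symbols S_{ij} (the set of indices at which both variables are observed), n_{ij} (its size), and the bar notation are introduced here for illustration and are not from the original post. This is just the usual unbiased sample covariance restricted to the paired observations, not a recommendation of a particular method.

```latex
% Population covariance (a property of the joint distribution):
a_{i,j} = \operatorname{E}(X_i X_j) - \operatorname{E}(X_i)\,\operatorname{E}(X_j)

% A sample estimate that uses only the paired observations k \in S_{ij}:
\hat{a}_{i,j} = \frac{1}{n_{ij} - 1} \sum_{k \in S_{ij}}
    \left(x_{i,k} - \bar{x}_i\right)\left(x_{j,k} - \bar{x}_j\right),
\qquad
\bar{x}_i = \frac{1}{n_{ij}} \sum_{k \in S_{ij}} x_{i,k},
\quad
\bar{x}_j = \frac{1}{n_{ij}} \sum_{k \in S_{ij}} x_{j,k}
```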

"Consider the case that each variable X has a different sample size."

Apparently, what you want to do is estimate the covariance matrix.

You should look up methods of estimating covariance from samples that have missing data.
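In the missing-data literature, the two trimming options raised in the question are usually called listwise (complete-case) deletion and pairwise (available-case) deletion. The sketch below shows both with NumPy, assuming missing values are encoded as NaN. The function names and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def listwise_cov(data):
    """Covariance using only rows where every variable is observed."""
    complete = data[~np.isnan(data).any(axis=1)]
    return np.cov(complete, rowvar=False)

def pairwise_cov(data):
    """Covariance where each entry uses the rows observed for that pair."""
    p = data.shape[1]
    out = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[both, i], data[both, j]
            # Unbiased (n - 1) estimate from the paired observations only.
            out[i, j] = out[j, i] = (
                np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - 1)
            )
    return out

# Hypothetical data: column 0 has 6 observations, column 1 has 4, column 2 has 5.
data = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 2.5],
    [5.0, np.nan, 2.0],
    [6.0, np.nan, np.nan],
])

cov_listwise = listwise_cov(data)   # every entry uses the 4 complete rows
cov_pairwise = pairwise_cov(data)   # each entry uses as many rows as the pair allows
```

Listwise deletion corresponds to trimming every variable to the smallest common sample, while pairwise deletion trims only to the smaller of each pair. The two generally give different matrices, and which one is appropriate depends on why the data are missing.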

"I'd appreciate it if anyone can point me in the direction of some literature that explains this in detail. I've been struggling to find something that is specific to this case."

You haven't given enough information to define the case. There is no general "best" method for doing this unless you make some assumptions - for example, assumptions about what family of distributions generated the data.
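One reason no method is automatically safe: trimming pair by pair uses the most data, but the resulting matrix is not guaranteed to be a valid covariance matrix (it can fail to be positive semidefinite). The sketch below is a deliberately extreme, made-up data set in which each pair of variables is observed on a different set of rows. The helper name `pairwise_cov` and the data are hypothetical.

```python
import numpy as np

def pairwise_cov(data):
    """Each entry is estimated from the rows where both variables are observed."""
    p = data.shape[1]
    out = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[both, i], data[both, j]
            out[i, j] = out[j, i] = (
                np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - 1)
            )
    return out

nan = np.nan
# Made-up data: no row is complete, so trimming to a common sample is impossible,
# and the three pairs imply mutually inconsistent relationships.
data = np.array([
    [1, 1, nan], [2, 2, nan], [3, 3, nan],   # X_0 and X_1 move together
    [1, nan, 1], [2, nan, 2], [3, nan, 3],   # X_0 and X_2 move together
    [nan, 1, 3], [nan, 2, 2], [nan, 3, 1],   # X_1 and X_2 move oppositely
], dtype=float)

cov = pairwise_cov(data)
eigenvalues = np.linalg.eigvalsh(cov)        # ascending order, symmetric input
is_psd = bool(eigenvalues.min() >= -1e-12)   # False here: smallest eigenvalue is about -1.2
```

A negative eigenvalue means some linear combination of the variables would have negative estimated variance, so a pairwise-trimmed matrix should be checked before it is used downstream.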

http://icml.cc/discuss/2012/313.html
