MATLAB Can I calculate the covariance matrix of a large set of data?

AI Thread Summary
Calculating the covariance matrix of a stochastic process in MATLAB with dimensions of 211302x50 leads to memory issues, as the requested 211302x211302 matrix exceeds MATLAB's maximum array size preference, resulting in significant resource demands. The discussion highlights that the user may be misinterpreting the dimensions needed for the covariance matrix. Instead of a large matrix, a 50x50 covariance matrix is more appropriate, given that there are 50 realizations of a stochastic process with 200k observations each. Suggestions include estimating specific elements of the covariance matrix directly or limiting the region or resolution of the data to manage memory usage better. The conversation emphasizes the importance of correctly understanding the dimensions and purpose behind calculating the covariance matrix to find a more efficient approach.
Frank Einstein
TL;DR Summary
I want to calculate the covariance matrix of a large set of data. However, I get an error telling me that the matrix would be too big, so it cannot be done.
Hello everyone. I want to calculate the covariance matrix of a stochastic process using MATLAB as

[CODE lang="matlab" title="Covariance matrix"]cov(listOfUVValues)

[/CODE]

where the dimensions of listOfUVValues are 211302x50. I get the following error:

[CODE title="Error"]Requested 211302x211302 (332.7GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become
unresponsive. See array size limit or preference panel for more information.

Error in cov (line 156)
c = (xc' * xc) ./ denom;
[/CODE]

Is there a way to get around this limitation, or is it impossible to do?

Any answer is appreciated.

Best regards.
 
There is no getting around the fact that you are asking for a matrix that will need a ton of memory. What do you need the covariance matrix for?

If there is some other end goal then there may be a better approach that bypasses the need to compute the covariance matrix at all. I'm not sure how much information you would even get from a 211302x211302 matrix that has a rank of at most 50.
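
The rank remark above suggests one way to work with the large covariance matrix without ever storing it. The sketch below is only an illustration on synthetic data: the sizes m and p, the random matrix X, and the choice of an economy-size SVD are assumptions, not something proposed in the thread. It assumes rows are observations and columns are variables, which is the convention cov uses, and it needs R2016b or later for the implicit expansion in X - mean(X, 1).

[CODE lang="matlab" title="Low-rank factors instead of the full matrix (sketch)"]
rng(0);                          % reproducible synthetic data (assumption)
m = 50;                          % number of observations (rows)
p = 2000;                        % small stand-in for the 211302 columns
X = randn(m, p);

Xc = X - mean(X, 1);             % remove the mean of every column
[~, S, V] = svd(Xc, 'econ');     % S is m-by-m, V is p-by-m

% Eigenvalues of cov(X), largest first. The matrix itself would be
% V*diag(eigvals)*V', but it is never formed here.
eigvals = diag(S).^2 / (m - 1);
[/CODE]

Only V (p-by-m) and eigvals (m-by-1) are stored, so memory grows linearly with the number of columns rather than quadratically. After the mean is removed, at most m - 1 of the eigenvalues are nonzero.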

If you just need a few elements of the covariance matrix, then you can estimate those directly.
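
A minimal sketch of estimating individual entries directly, assuming rows are observations and columns are variables (the convention cov uses). The data, the sizes, and the index names ia, ib, and rows are hypothetical and chosen only for illustration.

[CODE lang="matlab" title="Selected covariance entries (sketch)"]
rng(1);                          % reproducible synthetic data (assumption)
m = 50;                          % number of observations (rows)
p = 2000;                        % small stand-in for the 211302 columns
X = randn(m, p);

% One entry: sample covariance between columns ia and ib.
ia = 17;                         % hypothetical indices of interest
ib = 1203;
xa = X(:, ia) - mean(X(:, ia));
xb = X(:, ib) - mean(X(:, ib));
cab = (xa' * xb) / (m - 1);

% A few rows of the covariance matrix at once (5-by-p here).
rows = 1:5;
Xc = X - mean(X, 1);             % implicit expansion, R2016b or later
Cblock = (Xc(:, rows)' * Xc) / (m - 1);
[/CODE]

Each single entry costs one dot product of length m, and a block of k rows needs only k-by-p storage.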

jason
 
Are you using the correct dimensions to represent each realisation? A 50x50 covariance matrix for 211k realisations sounds much more reasonable than vice versa.
 
Yes, I have 50 realizations of a stochastic process, 50 variables and 200k observations of each. I am trying to calculate the covariance between the wind speed in the X and Y directions using data from the ECMWF. I guess I will have to limit the region or the resolution.

Thanks anyway for your comments.
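
To put numbers on limiting the resolution, here is a rough sizing sketch. The 8 GiB budget and the idea of keeping every step-th place are assumptions for illustration only; striding through a flattened list of places is not necessarily a sensible spatial thinning of the grid.

[CODE lang="matlab" title="How many places fit in memory (sketch)"]
nFull = 211302;                  % number of places in the thread
bytesPerDouble = 8;

% Size of the full nFull-by-nFull matrix in GiB (about 332.7,
% matching the figure in the error message).
gibFull = nFull^2 * bytesPerDouble / 2^30;

budgetGiB = 8;                   % hypothetical memory budget
nMax = floor(sqrt(budgetGiB * 2^30 / bytesPerDouble));

step = ceil(nFull / nMax);       % keep every step-th place
keep = 1:step:nFull;             % indices of the places that remain
[/CODE]

Because the storage grows with the square of the number of places, halving the number of places cuts the memory needed by a factor of four.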
 
IMO, "50 realizations" is a misleading phrase. I interpret that phrase as 50 observations, each with a certain number of attributes (variables) recorded.
I think that you have your dimensions switched and, as @Orodruin suggested, your covariance matrix should be 50x50.
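
For reference, MATLAB's cov treats each row as an observation and each column as a variable, so the size of the result follows the number of columns. The line quoted in the error, c = (xc' * xc) ./ denom, produces a square matrix with one row per column of the input, so a 211302x211302 request means the array passed to cov had 211302 columns. A small sketch with synthetic data (the sizes and variable names are assumptions):

[CODE lang="matlab" title="Orientation decides the size of cov (sketch)"]
rng(2);                          % reproducible synthetic data (assumption)
nPlaces = 2000;                  % small stand-in for 211302
nForecasts = 50;

X = randn(nPlaces, nForecasts);  % rows: places, columns: forecasts
C50 = cov(X);                    % nForecasts-by-nForecasts
disp(size(C50))

% cov(X.') would instead be nPlaces-by-nPlaces, which is the shape
% that runs out of memory at full size.
[/CODE]

Which of the two matrices is wanted depends on what is being treated as the variable: the forecasts or the places.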
 
I am following this thread. I have 50 wind predictions, each measured at 200k places; thus, each wind prediction is a realization of a random variable. I don't know if that helps.
 
