Diagonalization of large non-sparse matrices

  • #1
jsschreck
Dear physics friends:

I am using a Potts model to study protein folding. In short, the partition function is the sum of the eigenvalues of the transfer matrix, each raised to the Nth power: Z = \sum_i \lambda_i^N. (The transfer matrix factors the expression \exp(H / T), where H is the Hamiltonian of the problem.) The dimension of the transfer matrix is 3^(odd number) x 3^(odd number).

I wrote a small code in Python, and I can diagonalize up to the 3^5 x 3^5 case; plotting 500 points now takes about 3 hours on a 64-bit machine. I am looking for ways to speed up the calculation, as I need to get at least up to the 3^7 x 3^7 case. The matrix in question has all elements greater than zero, each of the form \exp(stuff / temperature).

What I am doing is diagonalizing the matrix at various temperatures (e.g., compute the eigenvalues, shift the temperature slightly, compute again, and so on). Most of the problems come from pushing T -> 0: I can get down to about T = 0.05 for the 27 x 27 case before eigenvalue overflow problems appear. I am using the routine eig() in Python. Getting close to zero isn't that essential; it's the computation time that is stalling me. Any suggestions on making this computation faster, or on an overall strategy of attack?

John Schreck
Drexel University
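
For concreteness, here is a minimal sketch of the computation described above, assuming a NumPy setup. The random energy matrix E, the function name, and the chain length N = 100 are illustrative stand-ins of mine, not the actual Potts Hamiltonian:

```python
import numpy as np

def partition_function(E, T, N):
    """Z = Tr(M^N) = sum of lambda_i^N for the transfer matrix M = exp(E / T)."""
    M = np.exp(E / T)                 # all elements > 0, as described above
    eigvals = np.linalg.eigvals(M)    # eigenvalues only; no eigenvectors needed
    return np.sum(eigvals ** N).real  # imaginary parts cancel in the trace

rng = np.random.default_rng(0)
dim = 3 ** 3                          # the 27 x 27 case from the post
E = rng.uniform(-1.0, 1.0, (dim, dim))  # stand-in energies, NOT the Potts H

for T in (2.0, 1.0, 0.5):             # sweep temperature downward, as in the post
    print(T, partition_function(E, T, N=100))
```

Pushing T much lower with this naive version overflows exp(E / T), which is exactly the failure the post reports.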
 
  • #2
Is there any particular reason you are using Python?

I don't have any experience with the linalg module that implements this, but I was under the impression that MATLAB or C++ is pretty fast with eigenvalue/eigenvector solutions and diagonalization... you might want to give that a shot, though I'm not sure what other dependencies you have here with the actual folding model.
 
  • #3
3^7 x 3^7 is a VERY small matrix. Pretty much any standard diagonalization technique implemented in something like C++ or Fortran should be really fast.
 
  • #4
maverick_starstrider said:
3^7 x 3^7 is a VERY small matrix. Pretty much any standard diagonalization technique implemented in something like C++ or Fortran should be really fast.

I would not think of using Python to handle a matrix that large; Python has been woefully slow in my experience with atomic-scale simulations.

Once you get above 5000x5000 unknowns you may want to look into more intelligent eigenvalue solvers. A direct method is usually O(N^3), but a better solver can approach O(N log N). I was just reading a paper that had to solve eigenvalue problems for large matrices, and it referenced:

Z. Zhu, B. Song, and J. White, in Proceedings of the Design Automation Conference (Anaheim, CA, 2003), pp. 712-717.
V. Rokhlin, Journal of Computational Physics 60, 187 (1985).
W. Hackbusch and Z. P. Nowak, Numerische Mathematik 54, 463 (1989).

Personally, I have been using the Intel MKL libraries and have solved 7000x7000 eigenvalue problems in about a minute. These are just optimized versions of the LAPACK routines. So I do not think you need to move up to better solvers just yet, but I do think you should try a faster language and look at the LAPACK library.

Though the question remains: why are you diagonalizing the matrix at all? Do you want the eigenvalues themselves, or some other end product? If you are diagonalizing only as an intermediate step, it is generally better to look for algorithms designed for your desired end product.
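
A concrete version of that last point, under the assumption that the end product is the partition function: all matrix elements here are positive, so by the Perron-Frobenius theorem there is a unique real dominant eigenvalue, and for large N the sum Z = \sum_i \lambda_i^N is dominated by the top few terms. An iterative solver can then extract just those, far more cheaply than full diagonalization. A sketch using SciPy's ARPACK wrapper, assuming SciPy is available (the random positive matrix is a stand-in for the real transfer matrix):

```python
import numpy as np
from scipy.sparse.linalg import eigs

# If only the dominant eigenvalues are needed, an iterative Arnoldi
# solver is much cheaper than full O(n^3) diagonalization.
rng = np.random.default_rng(1)
n = 3 ** 7                                 # 2187 x 2187, the target case
M = rng.uniform(0.1, 1.0, (n, n))          # all elements > 0, as in the problem

# k largest-magnitude eigenvalues via ARPACK; for a positive matrix,
# Perron-Frobenius guarantees the top one is real and positive.
vals = eigs(M, k=6, which='LM', return_eigenvectors=False)
print(np.sort(vals.real)[::-1])
```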
 
  • #5
jsschreck said:
Dear physics friends:

Most of the problems come from pushing T -> 0: I can get down to about T = 0.05 for the 27 x 27 case before eigenvalue overflow problems appear. I am using the routine eig() in Python. Getting close to zero isn't that essential; it's the computation time that is stalling me. Any suggestions on making this computation faster, or on an overall strategy of attack?

John Schreck
Drexel University

What do you mean by pushing temperature Temp -> 0? Of course you are going to have problems, since I doubt you mean absolute zero. Try using kelvins (0 °C = 273.15 K).
 
  • #6
SW VandeCarr said:
What do you mean by pushing temperature Temp -> 0? Of course you are going to have problems, since I doubt you mean absolute zero. Try using kelvins (0 °C = 273.15 K).

No, they mean absolute zero. It's a Boltzmann distribution like the one used in Monte Carlo. Anywho, now that I've actually READ your post (I only glanced at it before), it doesn't sound like this has anything to do with processing time at all. Your eigenvalues sound like they're blowing up because your Boltzmann factors exp(stuff/temp) are blowing up. I assume you're using double precision. How low are you hoping to go in terms of temperature?
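
For what it's worth, there is a standard trick for exactly this blow-up (a sketch of mine, not something proposed in the thread): eigenvalues are linear in an overall scale, so the largest exponent can be factored out of the matrix before exponentiating, and the dominant eigenvalue tracked in log form. Assuming NumPy and a stand-in energy matrix E:

```python
import numpy as np

def log_dominant_eigval(E, T):
    """log of the largest eigenvalue of M = exp(E / T), computed without
    ever forming the overflowing matrix itself."""
    Emax = E.max()
    Mtilde = np.exp((E - Emax) / T)   # all exponents <= 0, so entries in (0, 1]
    mu = np.linalg.eigvals(Mtilde)
    # Since M = exp(Emax / T) * Mtilde, the eigenvalues scale the same way:
    # log(lambda_max) = Emax / T + log(mu_max).
    return Emax / T + np.log(np.abs(mu).max())

rng = np.random.default_rng(2)
E = rng.uniform(-1.0, 1.0, (27, 27))      # stand-in energies
for T in (1.0, 0.1, 0.01):                # 0.01 is far below where exp(E/T) overflows
    print(T, log_dominant_eigval(E, T))
```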
 
  • #8
SW VandeCarr said:
Well, if he's studying protein folding, I don't think he's going to see much action below 268K.

jsschreck:

The lowest temperature I've found in studies of protein folding is -10 °C (263 K). I could see temperatures as low as -20 °C, but this is still a long way from absolute zero. Perhaps you are trying something new; I've been away from this for a while. If there's something to be gained from ultra-cold experiments, I'd like to hear about it. I haven't turned up anything on a web search, but I may not be using the right search terms.

http://www.whatislife.com/reader/motion/motion.html (see Experimental Procedure under 'Protein Folding', 2nd paragraph)

EDIT: In any case, the Boltzmann factor \exp(E/kT) blows up as T goes to absolute zero, so the numerics will fail there regardless.
 
  • #9
Sorry it took me forever to respond; I forgot to ask for email notifications when people post replies!

Anywho, some folks on here have suggested that not much happens below ~270 K, and this is correct to my knowledge. I am more concerned about the numerical problems that result from having all matrix elements of the form \exp(E / kT) as T -> 0 (that is, with k_Boltzmann = 1; the troublesome region sits somewhere in the 200s if I plug in k or use the universal gas constant R instead; obviously you have to rescale the energy, which experimentally we found to be a monotonically increasing function of T). I'll spare the details, but numerically I am dealing with a matrix whose elements are products of factors of the form exp((a + bT) / kT), where I have 3 sets of constants (a, b). The problem I still have, to some degree, is that if I push T below some limit in Python (while decreasing T), the eigenvalue routine will not converge. For certain cases it fails near temperatures of interest.

I have been able to do the 3^7 x 3^7 case on our cluster; it took about 4 hours to run the eigenvalue routine 100 times, iterating over T (my matrix size is 3^odd x 3^odd, where odd means an odd number). I would like to diagonalize up to a 3^15 x 3^15 matrix. I live in PA and so have access to a supercomputer center in Pittsburgh, where on some machines the time is free (provided you are accepted). My advisor was just using the machine recently, so it wouldn't take long to get on, but this may be unnecessary.

That said, I think I should take the advice of those of you who said to try C or C++. A lot of my friends (grad students) have suggested this as well. I started using Python mainly because other people in my group use it and the syntax is very easy (also, please note that I am already using the LAPACK tools that you can import into Python, as well as double precision). I doubt converting to C++ is hard; does anyone have specific suggestions on going about this?
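
Before porting, two cheap changes within NumPy may be worth timing (these are suggestions of mine, not something confirmed in the thread): request eigenvalues only, and rescale the matrix by its largest element, which can help convergence at low T since eigenvalues scale linearly. A sketch with a random stand-in matrix:

```python
import numpy as np
import time

# 1) eigvals() asks LAPACK for eigenvalues only, skipping the eigenvector
#    back-transformation that eig() pays for.
# 2) Dividing out the largest element keeps entries in [0, 1]; eigenvalues
#    are linear in the matrix, so lambda(M) = s * lambda(M / s).

def eigvals_scaled(M):
    s = np.abs(M).max()
    return s * np.linalg.eigvals(M / s)

rng = np.random.default_rng(3)
n = 3 ** 7                            # the 2187 x 2187 case from the post
M = rng.uniform(0.0, 1.0, (n, n))     # stand-in for the real transfer matrix

t0 = time.perf_counter()
np.linalg.eig(M)                      # eigenvalues + eigenvectors
t1 = time.perf_counter()
eigvals_scaled(M)                     # eigenvalues only, rescaled
t2 = time.perf_counter()
print(f"eig: {t1 - t0:.1f} s   eigvals: {t2 - t1:.1f} s")
```

Note that rescaling only helps while the matrix entries are still finite; once exp((a + bT)/kT) itself overflows, the log-domain factoring shown in the earlier sketch is the robust fix.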
 
  • #10
Python seems to be the standard in modeling protein folding, as far as I have seen.
 

1. What is diagonalization of a matrix?

Diagonalization is the process of transforming a matrix into a diagonal matrix, in which all the nonzero elements lie on the main diagonal and every other element is zero. It is widely used in linear algebra to simplify calculations, because powers and other functions of a diagonal matrix act entry-by-entry on the diagonal.

2. Why is diagonalization important for large non-sparse matrices?

Diagonalization is important for large non-sparse (dense) matrices because it allows efficient computation and manipulation: once the matrix is written as M = P D P^{-1}, any power M^N costs just two matrix multiplications, and the spectrum itself often carries the quantity of interest (as with the transfer-matrix eigenvalues in the thread above). Dense matrices cannot exploit sparsity shortcuts, so computing this factorization once and reusing it is especially valuable.
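
A short NumPy illustration of that point (the symmetric test matrix is arbitrary):

```python
import numpy as np

# Once M = P D P^(-1) is known, powers of M reduce to element-wise powers
# of the diagonal. For a symmetric M the eigenvector matrix P is
# orthogonal, so P^(-1) = P.T and eigh() applies.

rng = np.random.default_rng(4)
M = rng.uniform(0.0, 1.0, (50, 50))
M = (M + M.T) / 2                          # symmetrize

vals, P = np.linalg.eigh(M)
N = 5
M_pow = P @ np.diag(vals ** N) @ P.T       # M^N via the diagonal form
print(np.allclose(M_pow, np.linalg.matrix_power(M, N)))   # True
```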

3. How is diagonalization performed on a large non-sparse matrix?

Diagonalizing a large non-sparse matrix means computing its eigenvalues and eigenvectors. Collecting the eigenvectors as the columns of a matrix P and the eigenvalues on the diagonal of a matrix D, the similarity transformation M = P D P^{-1} expresses the original matrix in diagonal form. In practice this is done with library routines (e.g., LAPACK) rather than by hand; see the example below.
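
A minimal NumPy illustration of the similarity transformation (the 2 x 2 matrix is arbitrary):

```python
import numpy as np

# P holds the eigenvectors as columns, D the eigenvalues on its diagonal,
# and P D P^(-1) reconstructs the original matrix.

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True
```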

4. What are the benefits of diagonalization for large non-sparse matrices?

There are several benefits. First, diagonalization simplifies the matrix, making it easier to analyze and manipulate; powers, exponentials, and other matrix functions become cheap. It also exposes the spectrum, which reveals structure in the data the matrix represents. Additionally, truncating the decomposition to the dominant eigenvalues yields low-rank approximations, which is useful for data compression and storage.

5. Are there any limitations to diagonalization of large non-sparse matrices?

Yes. One limitation is that not every matrix can be diagonalized: a defective matrix, which has a repeated eigenvalue without a full set of linearly independent eigenvectors, admits no diagonal form (see the example below). Another is cost: direct diagonalization of a dense n x n matrix scales as O(n^3) in time and O(n^2) in memory, so it may not be practical for very large problems.
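
A concrete instance of the first limitation, using the standard 2 x 2 Jordan block: it has eigenvalue 1 twice but only one independent eigenvector, so no similarity transform can diagonalize it:

```python
import numpy as np

# A defective matrix: repeated eigenvalue 1, but only one linearly
# independent eigenvector, so it cannot be diagonalized.

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
vals, vecs = np.linalg.eig(A)
print(vals)                          # [1., 1.]
# The two returned "eigenvectors" are numerically parallel: rank < 2.
print(np.linalg.matrix_rank(vecs))   # 1
```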
