# Diagonalization of large non-sparse matrices

1. Jul 27, 2009

### jsschreck

Dear physics friends:

I am using a Potts model to study protein folding. In short, the partition function of the problem is written as the sum of the eigenvalues of the transfer matrix each to the Nth power (the transfer matrix factors the expression exp(H/Temperature) where H is the Hamiltonian of the problem). The dimension of the transfer matrix is 3^(odd number) x 3^(odd number). I wrote a small code in Python, and I can diagonalize up to 3^5 x 3^5 (plotting 500 points now takes about 3 hours on a 64 bit machine). I am looking for ways to speed up the calculation, as I need to get at least up to the 3^7 x 3^7 case. The matrix in question has all elements greater than zero, and of the form exp(stuff / temperature). What I am doing is diagonalizing the matrix for various temperatures (eg, compute e-values, shift the temp. slightly, compute, shift, compute, etc). Most of the problems come from pushing Temp -> 0. I can get down to about 0.05 for the 27 x 27 case before eigenvalue overflow problems. I am using the routine eig() in python. Getting close to zero isnt that essential, its the compilation time that is stalling me. Any suggestions on making computation of this problem faster/overall strategy of attack??

John Schreck
Drexel University

2. Jul 27, 2009

### lstellyl

Is there any particular reason for you using python?

I don't have any experience with the linalg module that implements this, but i was under the impression that matlab or c++ is pretty fast with eigenvalue/vector solutions and diagonalization... you might want to give that a shot, but i'm not sure what other dependencies you have here with the actual folding model

3. Jul 28, 2009

### maverick_starstrider

3^7 x 3^7 is a VERY small matrix. Pretty much any standard diagonalization technique implemented in something like c++ or fortran should be really fast.

4. Jul 28, 2009

### Born2bwire

I would not think of using Python to handle that large of a matrix, Python is woefully slow in my experience with it for atomic scale simulations.

Once you get above 5000x5000 unknowns you may want to look into more intelligent eigenvalue solvers. A direct way is usually order N3, but a better solver will be N log N. I was just reading a paper that had to solve eigenvalue problems for a large matrices and they referenced:

Z. Zhu, B. Song, and J. White in Proceedings of the Deign Automation Conference (Anaheim, CA, 2003), pp. 712-717.
V. Rokhlin, Journal of Computational Physics 60, 187 (1985).
W. Hackbusch and Z. P. Nowak, Numerische Mathematik 54, 463 (1989)

Personally, I have been using the Intel MKL libraries and have solved 7000x7000 eigenvalue problems in less than a minute or so I think. These are just optimized versions of the Lapack routines. So I do not think you need to move up to better solvers just yet, but I do think you should try using a faster language and look at the Lapack library.

Though the question remains, why are you diagonalizing the matrix? Do you want the eigenvalues or some other end product? Because if you are diagonalizing as an intermediate step, it is generally better to look for algorithms that are designed for your desired end product.

5. Jul 28, 2009

### SW VandeCarr

What do you mean by pushing temperature: Temp -> 0? Of course you are going to have problems since I doubt you mean absolute zero. Try using degrees Kelvin (0 C = 273 K).

6. Jul 28, 2009

### maverick_starstrider

No they mean absolute zero. It's a boltzmann distribution like that used in Monte-Carlo. Anywho, now that I've actually READ your post (I only glanced at it before) it doesn't actually sound like it has anything to do with processing time. Your eigenvalues sound like they're blowing up because your boltzman factors exp(stuff/temp) are blowing up. I assume you're using double precision. How low are you hoping to go in terms of temperature?

7. Jul 28, 2009

### SW VandeCarr

Last edited: Jul 28, 2009
8. Jul 28, 2009

### SW VandeCarr

jsstreck

The lowest temperatures I've found in studies of protein folding is -10C (263K). I could see temperatures as low as -20C; but this is still a long way from absolute zero. Perhaps you are trying something new. I've been away from this for a while. If there's something to be gained from ultra-cold experiments, I'd like to hear about it. I haven't turned up anything on a web search, but I may not be using the right search terms.

http://www.whatislife.com/reader/motion/motion.html see Experimental Procedure under 'Protein Folding' 2nd, paragraph

EDIT: In any case, thermodynamic entropy goes to infinity at absolute zero.

Last edited: Jul 28, 2009
9. Sep 25, 2009

### jschreck

Sorry it took me forever to respond, I forgot to ask for emails when people post replies!

Anywho, some folks on here have suggested not seeing much action below ~270K, this is correct (to my knowledge). I am concerned more about numerical problems that result from having all matrix elements of the form \exp(E / kT ) .. where T -> 0 (when you set k_Boltzmann = 1), some other number in the 200s if I plug in k or use the universal gas constant, R (obviously you have to rescale the energy, which experimentally we found is a monotonically increasing function of T). Ill spare the details, but numerically I am dealing with a matrix full of products elements of the form exp( (a+bT) / kT) where I have 3 sets of constants (a,b). The problem I still have to some degree is that if I push the T past some limit in Python (while decreasing T), the eigenvalue routine will not converge. For certain cases it fails near temperatures of interest.

I have been able to do this for the case 3^7 x 3^7 on our cluster, took about 4 hours to run the eigenvalue routine 100 times, iterating over T (my matrix size is 3^odd x 3^odd, where odd means an odd number). I would like to diagonalize up to a 3^15 x 3^15. I live in PA, and so have access to a supercomputer center in Pittsburgh where on some machines the time is free (provided you are accepted). My advisor was just using the machine recently, so it wouldn't take long to get on, but this could be unnecessary?

That said, I think I should take those of you have said to try c or c++. A lot of my friends (grad students) have suggested this as well. I started using Python mainly because other people in my group use it, and the syntax is very easy (also please note that I am already using the LAPACK tools that you can import into Python as well as using double precision). Converting to c++ I doubt is hard...does anyone have any specific suggestions on going about this?

10. Sep 25, 2009

### Phyisab****

python seems to be the standard in modeling protein folding as far as I have seen.