Diagonalization of large non-sparse matrices


Discussion Overview

The discussion revolves around the diagonalization of large non-sparse matrices in the context of a Potts model used for studying protein folding. Participants explore computational challenges, particularly related to the efficiency of eigenvalue calculations as the temperature approaches zero.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • John Schreck describes the need to diagonalize a transfer matrix of dimensions 3^(odd number) x 3^(odd number) for various temperatures, facing significant computation time and eigenvalue overflow issues as temperature approaches zero.
  • Some participants suggest using faster programming languages like C++ or MATLAB instead of Python for better performance in diagonalization tasks.
  • Several participants argue that 3^7 x 3^7 matrices are relatively small and should be manageable with standard diagonalization techniques in more efficient programming environments.
  • Concerns are raised about the numerical stability of eigenvalue calculations as temperature approaches absolute zero, particularly due to the nature of the Boltzmann factors involved.
  • Participants discuss the implications of pushing temperature towards zero, with some questioning the physical relevance of such low temperatures in protein folding studies.
  • John mentions the potential for using supercomputing resources to handle larger matrices, expressing interest in transitioning to C or C++ for improved performance.
  • There are references to specific libraries and techniques that could enhance computational efficiency, including LAPACK and optimized routines.

Areas of Agreement / Disagreement

Participants generally agree that Python may not be the most efficient choice for this type of computation, and there is a shared concern about the numerical issues arising from low temperatures. However, there is no consensus on the best approach to take moving forward, as suggestions vary widely.

Contextual Notes

Limitations include the dependence on the choice of programming language and libraries, as well as unresolved issues related to the convergence of eigenvalue routines at low temperatures. The discussion also highlights the complexity of the mathematical expressions involved in the calculations.

Who May Find This Useful

This discussion may be useful for researchers and students involved in computational physics, particularly those working on protein folding models or similar eigenvalue problems in large matrices.

jsschreck
Dear physics friends:

I am using a Potts model to study protein folding. In short, the partition function of the problem is written as the sum of the eigenvalues of the transfer matrix, each raised to the Nth power (the transfer matrix factors the expression exp(H/Temperature), where H is the Hamiltonian of the problem). The dimension of the transfer matrix is 3^(odd number) x 3^(odd number). I wrote a small code in Python, and I can diagonalize up to 3^5 x 3^5 (plotting 500 points now takes about 3 hours on a 64-bit machine). I am looking for ways to speed up the calculation, as I need to get at least up to the 3^7 x 3^7 case.

The matrix in question has all elements greater than zero, of the form exp(stuff / temperature). What I am doing is diagonalizing the matrix for various temperatures (e.g., compute eigenvalues, shift the temperature slightly, compute, shift, compute, etc.). Most of the problems come from pushing T -> 0: I can get down to about T = 0.05 for the 27 x 27 case before eigenvalue overflow problems. I am using the routine eig() in Python. Getting close to zero isn't that essential; it's the computation time that is stalling me. Any suggestions on making this problem faster, or on an overall strategy of attack?

John Schreck
Drexel University
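The setup described above can be sketched in a few lines of NumPy. The energy matrix here is a random stand-in (the thread does not spell out the actual Potts couplings), with k_B = 1 as the poster assumes:

```python
import numpy as np

def partition_function(E, T, N):
    """Z = sum_i lambda_i^N, with lambda_i the eigenvalues of the
    transfer matrix M_ij = exp(E_ij / T) (k_B = 1).

    E is a hypothetical (m, m) energy matrix standing in for the
    Potts-model couplings, which the post does not give explicitly.
    """
    M = np.exp(E / T)            # all elements > 0, as described
    lam = np.linalg.eigvals(M)   # dense non-symmetric eigensolver
    return np.sum(lam**N).real   # imaginary parts cancel for real Z

# toy example: a random 27 x 27 (= 3^3 x 3^3) energy matrix
rng = np.random.default_rng(0)
E = rng.uniform(-1.0, 1.0, size=(27, 27))
print(partition_function(E, T=1.0, N=10))
```

This is only a sketch of the structure of the calculation; the overflow and speed issues discussed below appear exactly in the `np.exp(E / T)` and `eigvals` steps.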
 
Is there any particular reason you're using Python?

I don't have any experience with the linalg module that implements this, but I was under the impression that MATLAB or C++ is pretty fast with eigenvalue/eigenvector solutions and diagonalization... you might want to give that a shot, but I'm not sure what other dependencies you have here with the actual folding model.
 
3^7 x 3^7 is a VERY small matrix. Pretty much any standard diagonalization technique implemented in something like C++ or Fortran should be really fast.
 
maverick_starstrider said:
3^7 x 3^7 is a VERY small matrix. Pretty much any standard diagonalization technique implemented in something like C++ or Fortran should be really fast.

I would not think of using Python to handle that large a matrix; Python is woefully slow, in my experience with it, for atomic-scale simulations.

Once you get above 5000 x 5000 unknowns, you may want to look into more intelligent eigenvalue solvers. A direct method is usually O(N^3), but a better solver can be O(N log N). I was just reading a paper that had to solve eigenvalue problems for large matrices, and they referenced:

Z. Zhu, B. Song, and J. White, in Proceedings of the Design Automation Conference (Anaheim, CA, 2003), pp. 712-717.
V. Rokhlin, Journal of Computational Physics 60, 187 (1985).
W. Hackbusch and Z. P. Nowak, Numerische Mathematik 54, 463 (1989).

Personally, I have been using the Intel MKL libraries and have solved 7000 x 7000 eigenvalue problems in less than a minute or so, I think. These are just optimized versions of the LAPACK routines. So I do not think you need to move up to better solvers just yet, but I do think you should try a faster language and look at the LAPACK library.
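If only the eigenvalues are needed, as they are for the partition function here, skipping the eigenvector computation already helps. A minimal sketch using SciPy's LAPACK-backed eigvals on a random positive stand-in matrix (the sizes are the thread's, the matrix is not):

```python
import numpy as np
from scipy import linalg

n = 3**5                    # 243 here; the 3^7 = 2187 case works the same way
A = np.random.default_rng(1).random((n, n))   # positive stand-in matrix

# eigvals() asks LAPACK's dgeev for eigenvalues only, so the
# eigenvector computation and back-transformation are skipped
lam = linalg.eigvals(A, check_finite=False)

# for an all-positive matrix the dominant eigenvalue is real and
# positive (Perron-Frobenius), a useful sanity check
lam_max = lam[np.abs(lam).argmax()]
print(lam_max)
```

NumPy's `eig()` already calls LAPACK under the hood, so the gain from switching languages is in the surrounding loop and memory handling, not in the factorization itself.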

Though the question remains: why are you diagonalizing the matrix? Do you want the eigenvalues or some other end product? If you are diagonalizing only as an intermediate step, it is generally better to look for algorithms designed for your desired end product.
 
jsschreck said:
Dear physics friends:

Most of the problems come from pushing T -> 0: I can get down to about T = 0.05 for the 27 x 27 case before eigenvalue overflow problems. I am using the routine eig() in Python. Getting close to zero isn't that essential; it's the computation time that is stalling me. Any suggestions on making this problem faster, or on an overall strategy of attack?

John Schreck
Drexel University

What do you mean by pushing temperature Temp -> 0? Of course you are going to have problems, since I doubt you mean absolute zero. Try using kelvin (0 °C = 273 K).
 
SW VandeCarr said:
What do you mean by pushing temperature Temp -> 0? Of course you are going to have problems, since I doubt you mean absolute zero. Try using kelvin (0 °C = 273 K).

No, they mean absolute zero. It's a Boltzmann distribution like that used in Monte Carlo. Anywho, now that I've actually READ your post (I only glanced at it before), it doesn't actually sound like this has anything to do with processing time. Your eigenvalues sound like they're blowing up because your Boltzmann factors exp(stuff/temp) are blowing up. I assume you're using double precision. How low are you hoping to go in terms of temperature?
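One common workaround for the overflow (a standard trick, not something proposed in the thread) is to factor the largest exponent out of the matrix before diagonalizing: every eigenvalue rescales by the same factor, so only logarithms ever need to be stored in double precision. A sketch with k_B = 1 and a hypothetical energy matrix E:

```python
import numpy as np

def log_lambda_max(E, T):
    """Overflow-safe log of the dominant transfer-matrix eigenvalue.

    M = exp(E/T) = exp(Emax/T) * exp((E - Emax)/T), so
    log(lambda_max) = Emax/T + log(mu_max), where mu_max is the
    dominant eigenvalue of the rescaled matrix, whose entries all
    lie in (0, 1] and cannot overflow.

    E is a hypothetical energy matrix; k_B = 1 as in the thread.
    """
    Emax = E.max()
    mu = np.abs(np.linalg.eigvals(np.exp((E - Emax) / T))).max()
    # Perron-Frobenius: the dominant eigenvalue of a positive
    # matrix is real and positive, so taking |.| is safe here
    return Emax / T + np.log(mu)
```

As T -> 0 the rescaled entries with E < Emax underflow to zero; that is usually harmless for the dominant eigenvalue, though it can misbehave if underflow leaves the matrix effectively reducible.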
 
SW VandeCarr said:
Well, if he's studying protein folding, I don't think he's going to see much action below 268K.

jsschreck

The lowest temperatures I've found in studies of protein folding are around -10 C (263 K). I could see temperatures as low as -20 C, but this is still a long way from absolute zero. Perhaps you are trying something new. I've been away from this for a while. If there's something to be gained from ultra-cold experiments, I'd like to hear about it. I haven't turned up anything on a web search, but I may not be using the right search terms.

http://www.whatislife.com/reader/motion/motion.html (see Experimental Procedure under 'Protein Folding', 2nd paragraph)

EDIT: In any case, thermodynamic entropy goes to infinity at absolute zero.
 
Sorry it took me forever to respond; I forgot to ask for email notifications when people post replies!

Anywho, some folks on here have suggested not seeing much action below ~270 K; this is correct, to my knowledge. I am concerned more about the numerical problems that result from having all matrix elements of the form exp(E / kT), where T -> 0 if you set k_Boltzmann = 1 (or T approaching some number in the 200s if I plug in k or use the universal gas constant R; obviously you have to rescale the energy, which experimentally we found is a monotonically increasing function of T). I'll spare the details, but numerically I am dealing with a matrix whose elements are products of factors of the form exp((a + bT) / kT), where I have 3 sets of constants (a, b). The problem I still have to some degree is that if I push T past some limit in Python (while decreasing T), the eigenvalue routine will not converge. For certain cases it fails near temperatures of interest.

I have been able to do this for the 3^7 x 3^7 case on our cluster; it took about 4 hours to run the eigenvalue routine 100 times, iterating over T (my matrix size is 3^odd x 3^odd, where odd means an odd number). I would like to diagonalize up to 3^15 x 3^15. I live in PA, and so have access to a supercomputer center in Pittsburgh where on some machines the time is free (provided you are accepted). My advisor was just using the machine recently, so it wouldn't take long to get on, but this could be unnecessary?

That said, I think I should take the advice of those of you who have said to try C or C++. A lot of my friends (grad students) have suggested this as well. I started using Python mainly because other people in my group use it and the syntax is very easy (also, please note that I am already using the LAPACK tools that you can import into Python, as well as double precision). I doubt converting to C++ is hard... does anyone have any specific suggestions on going about this?
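The temperature sweep described above, with elements of the exp((a + bT)/kT) form, can be combined with the same kind of largest-exponent rescaling (again with k = 1; the (a, b) constants below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 27                                   # the 3^3 case, for illustration
a = rng.uniform(-1.0, 1.0, (m, m))       # hypothetical stand-ins for the
b = rng.uniform(-1.0, 1.0, (m, m))       # three (a, b) constant sets

results = []
for T in np.linspace(2.0, 0.05, 5):      # sweep T downward, as in the post
    expo = (a + b * T) / T               # exponent of each matrix element
    shift = expo.max()                   # factor out the largest exponent
    mu = np.abs(np.linalg.eigvals(np.exp(expo - shift))).max()
    results.append((T, shift + np.log(mu)))  # log of the dominant eigenvalue

for T, loglam in results:
    print(f"T = {T:.2f}   log(lambda_max) = {loglam:.4f}")
```

Since only exponent differences enter the diagonalization, this keeps the rescaled matrix entries in (0, 1] for every T in the sweep, which is the failure mode the convergence problems above point at.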
 
Python seems to be the standard in modeling protein folding, as far as I have seen.
 
