I want to solve a generalized symmetric eigenproblem A(N,N)x(N,N) = v(N)B(N,N)x(N,N), N > 500 I try a call in Lapack (see http://www.netlib.org/lapack/double/dsygvd.f) DSYGVD(1,'V','U',N,A,N,B,N,V,WORK,1+6*N+2*N**2,IWORK,3+5*N,INFO) define: WORK(1+6*N+2*N**2), IWORK(3+5*N) when N<100, I can solve the equation successfully. When N>200, 300, or more, the program will fail with stack overflow. I try the linux system with large memory (>8GB), and still fail I guess that 1+6*N+2*N**2 is too large. How can I solve this problem? Best regards. sandf
Pls add OS/Kernel and compiler versions to your desc. No idea about other platform issues, but for Linux I'd guess it's a combination of -fast (or -O3) optimizations implicitly trying to put everything stack-side, and a low default user stack limit. What's the output of Code (Text): ulimit -s (issued from within bash) on your Linux box? Regards, Solkar
Solkar Thanks for you reply My OS/Kernel and compiler versions are as follows: gcc version 4.1.1 20070105 (Red Hat 4.1.1-52) x86_64-redhat-linux lapack-3.3.1 [root@xxxx ~]# ulimit -s unlimited [root@xxxx ~]# gfortran -o runHeee He_m.F90 He.F90 liblapack.a libblas.a libtmglib.a output the runHeee program
Check if the problem is really caused by a badly conditioned eigenproblem. (That shouldn't cause a problem with a library like Lapack, but all software has bugs!) Try setting either A or B to the identity matrix and what problem size you can handle. If the problem goes away, finding the eigenvalues and vectors of the A and B matrices individually might give you a clue where the problem is. I would expect you should be OK up to N = 500, maybe N = 1000. Bigger than that, you probably need a different algorithm - and you probably only want to solve for some eigenvalues and vectors, not all of them.
So I understand you get the also if you run the prog as root, thus with unlimited stack size. That could well be an important issue for LAPACK and Fortran in general. That's also important for a project of mine, I will look deeper into that. For the first, an attempted hotfix: Instead of e.g. Code (Text): integer, parameter :: M = 500 ! [...] real, dimension(M,M) :: A try Code (Text): integer, parameter :: M = 500 ! [...] real, dimension(:,:), allocatable :: A ! [...] allocate(A(M,M), stat=mstat) for all the large beasts. Addendum: Trying to track down the bug systematically is also an important aspect and good advice.
Dear Solkar and AlephZero, Great thanks for your help. using 'allocatable', I solve the issue of allocation of a large array because the mstate has a return value of zero. But still fail to solve the Ax=vBx, and the DSYGVD return a nozero value of INFO. The possible reason is that my B matrix is non-positive definite. So, as mentioned by AlephZero, maybe I should use a different algorithm.
You're welcome. Middle of next week I will find some time for trying to reproduce that stack error, and if I "succeed" I would propose filing a bug report with both the gfortran and the netlib/LAPACK team. Do I get you right, that now, using allocatable/allocate, you get a "proper" DSYGVD error delivered instead of the previous stop with stack error? Or are that two different topics? Best regards, Solkar
I can think it as two different topics now. If I do not use the allocatable/allocate, it will stop with stack overflow when meeting the declared line [e.g. real*8 :: WORK(1+6*N+2*N**2)] when N is too large. If I use the allocatable, it will pass the declared line with a 'mstat' value of zero for a large N But DSYGVD will fail with a INFO value of non-zero. The program is very simple, I upload it. If you are interested in it. You can download it and compile it with your own lapack. (lapack is too large, I can not upload) Notes: I use Windows version gfortran obtained from http://gcc.gnu.org/wiki/GFortranBinaries, (http://users.humboldt.edu/finneyb/gfortran-windows-20121210.exe, now is 20130301.exe) and in Linux system, I have the same issue. Bests, sandf
spooky... Even if I revert your He_m.F90 back to Code (Text): integer, parameter :: N = 500 ! 120 ! real*8, allocatable :: ECGWORK(:) ! real*8, allocatable :: ECGIWORK(:) real*8 :: ECGWORK(1+6*N+2*N**2) real*8 :: ECGIWORK(3+5*N) and increase the N param as shown (and comment out the alloc/dealloc in the main module, of course), I neither get a stack error with current lapack-3.4.2 nor your version lapack-3.3.1, but only an Code (Text): INFO = 586 Did you customize the LAPACK CMake-build by any means? I simply untgz'ed, changed to to the lapack source dir, md'ed a "build" dir, changed there and issued Code (Text): cmake .. make and copied the generated libs to the source dir of your package. What I've not yet done is adopting your compiler versions, mine are Code (Text): gcc (Debian 4.4.5-8) 4.4.5 GNU Fortran (Debian 4.4.5-8) 4.4.5 I'll get access to a different box for trying wilh older gcc & gfortran middle of next week and try again. Regards, Solkar
Dear Solkar, Thanks for your try. I have not customized the LAPACK CMake-build by any means. Windows version LAPACK, I downloaded it from http://icl.cs.utk.edu/lapack-for-windows/lapack/ based on CMake build. Stack overflow indeed puzzles me. Please let it alone. Main issue is that INFO !=0 when N is large. and DSYGVD does not return any other information which will help me to solve the problem. Best regards. sandf
That's the main topic for me. Will you pls drop me a note when your reproduced it successfully and/or tracked down the bug? http://www.netlib.org/lapack/double/dsygvd.f is applicable, so if we get e.g. INFO = 586 for N = 500 the left upper 85x85 submatrix seems to be o.k. and row or column #86 (counting from 1) seems to be problematic. Is there any voodoo in your code about these certain numbers relative to that certain choice of N?
Dear Solkar, I try to reproduce it. It possibly has something to do with the use of 'parameter'. not 'allocatable' But even if I do not use "parameter", and N<100, the program will also work fine. I reproduce under the Intel.Visual.Fortran.Compiler.v11 (Evaluation from intel) and MS visual studio 2008 (student https://www.dreamspark.com/) I remember I also have the same issure for windows gfortran. but unfortunately I cannot reproduce it now. I am very sorry if this issue only arises from my erroneous use of parameter or difference of compiler. 1, do not use 'parameter' In He.F90 Code (Text): program He use He_m implicit none N = 300 call sVM end program He subroutine sVM use He_m implicit none integer :: INFO,I,J,L,mstat real*8 :: X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2 real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3) real*8 :: SSSS(N,N) allocate(ECGWORK(1+6*N+2*N**2),stat=mstat) allocate(ECGIWORK(3+5*N)) and In He_m.F90 Code (Text): module He_m ! integer*4 :: N ! Number of terms of basis set integer :: N ! integer, parameter :: N = 129 real*8, allocatable :: ECGWORK(:) real*8, allocatable :: ECGIWORK(:) end module He_m 2, use 'parameter' In He.F90 Code (Text): program He use He_m implicit none ! N = 100 call sVM end program He subroutine sVM use He_m implicit none integer :: INFO,I,J,L,mstat real*8 :: X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2 real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3) real*8 :: SSSS(N,N) allocate(ECGWORK(1+6*N+2*N**2),stat=mstat) allocate(ECGIWORK(3+5*N)) and In He_m.F90 Code (Text): module He_m ! integer*4 :: N ! Number of terms of basis set integer, parameter :: N = 300 real*8, allocatable :: ECGWORK(:) real*8, allocatable :: ECGIWORK(:) end module He_m Thanks for your detail. I will do much work on it. Best regards. sandf