How to solve the larger order eigenproblem

sandf · Apr 20, 2013

I want to solve a generalized symmetric eigenproblem A(N,N)x(N,N) = v(N)B(N,N)x(N,N), N > 500
I try a call in Lapack (see http://www.netlib.org/lapack/double/dsygvd.f)
DSYGVD(1,'V','U',N,A,N,B,N,V,WORK,1+6*N+2*N**2,IWORK,3+5*N,INFO)
define: WORK(1+6*N+2*N**2), IWORK(3+5*N)
when N<100, I can solve the equation successfully.
When N>200, 300, or more, the program will fail with stack overflow.
I try the linux system with large memory (>8GB), and still fail
I guess that 1+6*N+2*N**2 is too large.

How can I solve this problem?

Best regards.
sandf

Solkar · Apr 20, 2013

sandf said:

[...] stack overflow.
I try the linux system with large memory (>8GB), and still fail
I guess that 1+6*N+2*N**2 is too large.

Pls add OS/Kernel and compiler versions to your desc.

No idea about other platform issues, but for Linux I'd guess it's a combination of
-fast (or -O3) optimizations
implicitly trying to put everything stack-side,
and a low default user stack limit.

What's the output of

Code:

ulimit -s

(issued from within bash) on your Linux box?

Regards, Solkar

sandf · Apr 20, 2013

Solkar said:
Pls add OS/Kernel and compiler versions to your desc.
No idea about other platform issues, but for Linux I'd guess it's a combination of
-fast (or -O3) optimizations
implicitly trying to put everything stack-side,
and a low default user stack limit.
What's the output of
Code:
ulimit -s
(issued from within bash) on your Linux box?
Regards, Solkar

Solkar
Thanks for you reply
My OS/Kernel and compiler versions are as follows:
gcc version 4.1.1 20070105 (Red Hat 4.1.1-52) x86_64-redhat-linux
lapack-3.3.1
[root@xxxx ~]# ulimit -s
unlimited
[root@xxxx ~]#
gfortran -o runHeee He_m.F90 He.F90 liblapack.a libblas.a libtmglib.a

output the runHeee program

AlephZero · Apr 20, 2013

Check if the problem is really caused by a badly conditioned eigenproblem. (That shouldn't cause a problem with a library like Lapack, but all software has bugs!)

Try setting either A or B to the identity matrix and what problem size you can handle. If the problem goes away, finding the eigenvalues and vectors of the A and B matrices individually might give you a clue where the problem is.

I would expect you should be OK up to N = 500, maybe N = 1000. Bigger than that, you probably need a different algorithm - and you probably only want to solve for some eigenvalues and vectors, not all of them.

Solkar · Apr 20, 2013

sandf said:

[root@xxxx ~]# ulimit -s
unlimited

So I understand you get the

stack overflow.

also if you run the prog as root, thus with unlimited stack size.

That could well be an important issue for LAPACK and Fortran in general.
That's also important for a project of mine, I will look deeper into that.

For the first, an attempted hotfix:

Instead of e.g.

Code:

integer, parameter :: M = 500
! [...]
real, dimension(M,M) :: A

try

Code:

integer, parameter :: M = 500
! [...]
real, dimension(:,:), allocatable :: A
! [...]
allocate(A(M,M), stat=mstat)

for all the large beasts.

Addendum:
Trying to track down the bug systematically

AlephZero said:

Check if the problem is really caused by a badly conditioned eigenproblem. (That shouldn't cause a problem with a library like Lapack, but all software has bugs!)

Try setting either A or B to the identity matrix and what problem size you can handle. If the problem goes away, finding the eigenvalues and vectors of the A and B matrices individually might give you a clue where the problem is.

is also an important aspect and good advice.

sandf · Apr 20, 2013

Dear Solkar and AlephZero,
Great thanks for your help.
using 'allocatable', I solve the issue of allocation of a large array because the mstate has a return value of zero.
But still fail to solve the Ax=vBx, and the DSYGVD return a nozero value of INFO.
The possible reason is that my B matrix is non-positive definite.
So, as mentioned by AlephZero, maybe I should use a different algorithm.

Solkar · Apr 20, 2013

sandf said:

Dear Solkar and AlephZero,
Great thanks for your help.

You're welcome.

sandf said:

using 'allocatable', I solve the issue of allocation of a large array because the mstate has a return value of zero. But still fail to solve the Ax=vBx, and the DSYGVD return a nozero value of INFO.
The possible reason is that my B matrix is non-positive definite.

Middle of next week I will find some time for trying to reproduce that stack error, and if I "succeed" I would propose filing a bug report with both the gfortran and the netlib/LAPACK team.

Do I get you right, that now, using allocatable/allocate, you get a "proper" DSYGVD error delivered instead of the previous stop with stack error?
Or are that two different topics?

Solkar

sandf · Apr 20, 2013

Solkar said:

Do I get you right, that now, using allocatable/allocate, you get a "proper" DSYGVD error delivered instead of the previous stop with stack error?
Or are that two different topics?
Solkar

I can think it as two different topics now.
If I do not use the allocatable/allocate, it will stop with stack overflow when meeting the declared line [e.g. real*8 :: WORK(1+6*N+2*N**2)] when N is too large.
If I use the allocatable, it will pass the declared line with a 'mstat' value of zero for a large N
But DSYGVD will fail with a INFO value of non-zero.

The program is very simple, I upload it. If you are interested in it. You can download it and compile it with your own lapack. (lapack is too large, I can not upload)
Notes: I use Windows version gfortran obtained from http://gcc.gnu.org/wiki/GFortranBinaries, (http://users.humboldt.edu/finneyb/gfortran-windows-20121210.exe , now is 20130301.exe)
and in Linux system, I have the same issue.
Bests,
sandf

Solkar · Apr 21, 2013

spooky...

Even if I revert your He_m.F90 back to

Code:

        integer, parameter :: N = 500 ! 120
        ! real*8, allocatable :: ECGWORK(:)
        ! real*8, allocatable :: ECGIWORK(:)
        real*8 :: ECGWORK(1+6*N+2*N**2)
        real*8 :: ECGIWORK(3+5*N)

and increase the N param as shown (and comment out the alloc/dealloc in the main module, of course), I neither get a stack error with current lapack-3.4.2 nor your version lapack-3.3.1, but only an

Code:

 INFO =         586

Did you customize the LAPACK CMake-build by any means?

I simply untgz'ed, changed to to the lapack source dir, md'ed a "build" dir, changed there and issued

Code:

cmake ..
make

and copied the generated libs to the source dir of your package.

What I've not yet done is adopting your compiler versions, mine are

Code:

gcc (Debian 4.4.5-8) 4.4.5
GNU Fortran (Debian 4.4.5-8) 4.4.5

I'll get access to a different box for trying wilh older gcc & gfortran middle of next week and try again.

Regards, Solkar

sandf · Apr 22, 2013

Dear Solkar,
Thanks for your try.
I have not customized the LAPACK CMake-build by any means.
Windows version LAPACK, I downloaded it from http://icl.cs.utk.edu/lapack-for-windows/lapack/ based on CMake build.

Stack overflow indeed puzzles me. Please let it alone.
Main issue is that INFO !=0 when N is large.
and DSYGVD does not return any other information which will help me to solve the problem.

Best regards.
sandf

Solkar · Apr 22, 2013

sandf said:

Stack overflow indeed puzzles me.

That's the main topic for me.

sandf said:

Please let it alone.

Will you pls drop me a note when your reproduced it successfully and/or tracked down the bug?

sandf said:

Main issue is that INFO !=0 when N is large.and DSYGVD does not return any other information which will help me to solve the problem.

http://www.netlib.org/lapack/double/dsygvd.f

[INFO] > N: if INFO = N + i, for 1 <= i <= N, then the leading minor of order i of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

is applicable, so if we get e.g. INFO = 586 for N = 500 the left upper 85x85 submatrix seems to be o.k. and row or column #86 (counting from 1) seems to be problematic. Is there any voodoo in your code about these certain numbers relative to that certain choice of N?

sandf · Apr 22, 2013

Dear Solkar,

Solkar said:

Will you pls drop me a note when your reproduced it successfully and/or tracked down the bug?

I try to reproduce it. It possibly has something to do with the use of 'parameter'. not 'allocatable'
But even if I do not use "parameter", and N<100, the program will also work fine.

I reproduce under the Intel.Visual.Fortran.Compiler.v11 (Evaluation from intel) and MS visual studio 2008 (student https://www.dreamspark.com/)
I remember I also have the same issure for windows gfortran. but unfortunately I cannot reproduce it now.

I am very sorry if this issue only arises from my erroneous use of parameter or difference of compiler.

1, do not use 'parameter'
In He.F90

Code:

      program He
        use He_m
        implicit none
        N = 300
        call sVM
      end program He
      
      subroutine sVM
        use He_m
        implicit none
        integer :: INFO,I,J,L,mstat
        real*8 ::  X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2
        real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3)
        real*8 :: SSSS(N,N)
        
        allocate(ECGWORK(1+6*N+2*N**2),stat=mstat)
        allocate(ECGIWORK(3+5*N))

and In He_m.F90

Code:

module He_m
        ! integer*4 :: N  ! Number of terms of basis set
        integer :: N
        ! integer, parameter :: N = 129
        real*8, allocatable :: ECGWORK(:)
        real*8, allocatable :: ECGIWORK(:)
        
      end module He_m

2, use 'parameter'
In He.F90

Code:

      program He
        use He_m
        implicit none
        ! N = 100
        call sVM
      end program He
      
      subroutine sVM
        use He_m
        implicit none
        integer :: INFO,I,J,L,mstat
        real*8 ::  X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2
        real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3)
        real*8 :: SSSS(N,N)
        
        allocate(ECGWORK(1+6*N+2*N**2),stat=mstat)
        allocate(ECGIWORK(3+5*N))

and In He_m.F90

Code:

      module He_m
        ! integer*4 :: N  ! Number of terms of basis set
        integer, parameter :: N = 300
        real*8, allocatable :: ECGWORK(:)
        real*8, allocatable :: ECGIWORK(:)
      end module He_m

http://www.netlib.org/lapack/double/dsygvd.f
is applicable, so if we get e.g. INFO = 586 for N = 500 the left upper 85x85 submatrix seems to be o.k. and row or column #86 (counting from 1) seems to be problematic. Is there any voodoo in your code about these certain numbers relative to that certain choice of N?

Thanks for your detail. I will do much work on it.

Best regards.
sandf

How to solve the larger order eigenproblem

Attachments

1. What is the larger order eigenproblem?

2. Why is solving the larger order eigenproblem difficult?

3. What methods are commonly used to solve the larger order eigenproblem?

4. How can I verify the accuracy of my solutions to the larger order eigenproblem?

5. What are some practical applications of solving the larger order eigenproblem?

Similar threads

Hot Threads

Recent Insights