How to solve the larger order eigenproblem

  1. I want to solve a generalized symmetric eigenproblem A(N,N)x(N,N) = v(N)B(N,N)x(N,N), N > 500
    I try a call in Lapack (see http://www.netlib.org/lapack/double/dsygvd.f)
    DSYGVD(1,'V','U',N,A,N,B,N,V,WORK,1+6*N+2*N**2,IWORK,3+5*N,INFO)
    define: WORK(1+6*N+2*N**2), IWORK(3+5*N)
    when N<100, I can solve the equation successfully.
    When N>200, 300, or more, the program will fail with stack overflow.
    I try the linux system with large memory (>8GB), and still fail
    I guess that 1+6*N+2*N**2 is too large.

    How can I solve this problem?

    Best regards.
    sandf
     
  2. jcsd
  3. Pls add OS/Kernel and compiler versions to your desc.

    No idea about other platform issues, but for Linux I'd guess it's a combination of
    -fast (or -O3) optimizations
    implicitly trying to put everything stack-side,
    and a low default user stack limit.

    What's the output of
    Code (Text):
    ulimit -s
    (issued from within bash) on your Linux box?

    Regards, Solkar
     
  4. Solkar
    Thanks for you reply
    My OS/Kernel and compiler versions are as follows:
    gcc version 4.1.1 20070105 (Red Hat 4.1.1-52) x86_64-redhat-linux
    lapack-3.3.1
    [root@xxxx ~]# ulimit -s
    unlimited
    [root@xxxx ~]#
    gfortran -o runHeee He_m.F90 He.F90 liblapack.a libblas.a libtmglib.a

    output the runHeee program
     
  5. AlephZero

    AlephZero 7,300
    Science Advisor
    Homework Helper

    Check if the problem is really caused by a badly conditioned eigenproblem. (That shouldn't cause a problem with a library like Lapack, but all software has bugs!)

    Try setting either A or B to the identity matrix and what problem size you can handle. If the problem goes away, finding the eigenvalues and vectors of the A and B matrices individually might give you a clue where the problem is.

    I would expect you should be OK up to N = 500, maybe N = 1000. Bigger than that, you probably need a different algorithm - and you probably only want to solve for some eigenvalues and vectors, not all of them.
     
  6. So I understand you get the
    also if you run the prog as root, thus with unlimited stack size.

    That could well be an important issue for LAPACK and Fortran in general.
    That's also important for a project of mine, I will look deeper into that.

    For the first, an attempted hotfix:

    Instead of e.g.
    Code (Text):
    integer, parameter :: M = 500
    ! [...]
    real, dimension(M,M) :: A
    try
    Code (Text):
    integer, parameter :: M = 500
    ! [...]
    real, dimension(:,:), allocatable :: A
    ! [...]
    allocate(A(M,M), stat=mstat)
    for all the large beasts.

    Addendum:
    Trying to track down the bug systematically
    is also an important aspect and good advice.
     
  7. Dear Solkar and AlephZero,
    Great thanks for your help.
    using 'allocatable', I solve the issue of allocation of a large array because the mstate has a return value of zero.
    But still fail to solve the Ax=vBx, and the DSYGVD return a nozero value of INFO.
    The possible reason is that my B matrix is non-positive definite.
    So, as mentioned by AlephZero, maybe I should use a different algorithm.
     
  8. You're welcome.

    Middle of next week I will find some time for trying to reproduce that stack error, and if I "succeed" I would propose filing a bug report with both the gfortran and the netlib/LAPACK team.

    Do I get you right, that now, using allocatable/allocate, you get a "proper" DSYGVD error delivered instead of the previous stop with stack error?
    Or are that two different topics?


    Best regards, Solkar
     
  9. I can think it as two different topics now.
    If I do not use the allocatable/allocate, it will stop with stack overflow when meeting the declared line [e.g. real*8 :: WORK(1+6*N+2*N**2)] when N is too large.
    If I use the allocatable, it will pass the declared line with a 'mstat' value of zero for a large N
    But DSYGVD will fail with a INFO value of non-zero.

    The program is very simple, I upload it. If you are interested in it. You can download it and compile it with your own lapack. (lapack is too large, I can not upload)
    Notes: I use Windows version gfortran obtained from http://gcc.gnu.org/wiki/GFortranBinaries, (http://users.humboldt.edu/finneyb/gfortran-windows-20121210.exe, now is 20130301.exe)
    and in Linux system, I have the same issue.
    Bests,
    sandf
     

    Attached Files:

  10. spooky...

    Even if I revert your He_m.F90 back to
    Code (Text):
            integer, parameter :: N = 500 ! 120
            ! real*8, allocatable :: ECGWORK(:)
            ! real*8, allocatable :: ECGIWORK(:)
            real*8 :: ECGWORK(1+6*N+2*N**2)
            real*8 :: ECGIWORK(3+5*N)
    and increase the N param as shown (and comment out the alloc/dealloc in the main module, of course), I neither get a stack error with current lapack-3.4.2 nor your version lapack-3.3.1, but only an

    Code (Text):
     INFO =         586
    Did you customize the LAPACK CMake-build by any means?

    I simply untgz'ed, changed to to the lapack source dir, md'ed a "build" dir, changed there and issued
    Code (Text):
    cmake ..
    make
    and copied the generated libs to the source dir of your package.

    What I've not yet done is adopting your compiler versions, mine are
    Code (Text):
    gcc (Debian 4.4.5-8) 4.4.5
    GNU Fortran (Debian 4.4.5-8) 4.4.5
    I'll get access to a different box for trying wilh older gcc & gfortran middle of next week and try again.

    Regards, Solkar
     
  11. Dear Solkar,
    Thanks for your try.
    I have not customized the LAPACK CMake-build by any means.
    Windows version LAPACK, I downloaded it from http://icl.cs.utk.edu/lapack-for-windows/lapack/ based on CMake build.

    Stack overflow indeed puzzles me. Please let it alone.
    Main issue is that INFO !=0 when N is large.
    and DSYGVD does not return any other information which will help me to solve the problem.

    Best regards.
    sandf
     
  12. That's the main topic for me.

    Will you pls drop me a note when your reproduced it successfully and/or tracked down the bug?


    http://www.netlib.org/lapack/double/dsygvd.f
    is applicable, so if we get e.g. INFO = 586 for N = 500 the left upper 85x85 submatrix seems to be o.k. and row or column #86 (counting from 1) seems to be problematic. Is there any voodoo in your code about these certain numbers relative to that certain choice of N?
     
  13. Dear Solkar,
    I try to reproduce it. It possibly has something to do with the use of 'parameter'. not 'allocatable'
    But even if I do not use "parameter", and N<100, the program will also work fine.

    I reproduce under the Intel.Visual.Fortran.Compiler.v11 (Evaluation from intel) and MS visual studio 2008 (student https://www.dreamspark.com/)
    I remember I also have the same issure for windows gfortran. but unfortunately I cannot reproduce it now.

    I am very sorry if this issue only arises from my erroneous use of parameter or difference of compiler.

    1, do not use 'parameter'
    In He.F90
    Code (Text):
         
          program He
            use He_m
            implicit none
            N = 300
            call sVM
          end program He
         
          subroutine sVM
            use He_m
            implicit none
            integer :: INFO,I,J,L,mstat
            real*8 ::  X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2
            real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3)
            real*8 :: SSSS(N,N)
           
            allocate(ECGWORK(1+6*N+2*N**2),stat=mstat)
            allocate(ECGIWORK(3+5*N))
    and In He_m.F90
    Code (Text):
    module He_m
            ! integer*4 :: N  ! Number of terms of basis set
            integer :: N
            ! integer, parameter :: N = 129
            real*8, allocatable :: ECGWORK(:)
            real*8, allocatable :: ECGIWORK(:)
           
          end module He_m
    2, use 'parameter'
    In He.F90
    Code (Text):
          program He
            use He_m
            implicit none
            ! N = 100
            call sVM
          end program He
         
          subroutine sVM
            use He_m
            implicit none
            integer :: INFO,I,J,L,mstat
            real*8 ::  X(12),SumH,SumS,SumSS,A1,A2,B1,B2,C1,C2
            real*8 :: ECGS(N,N),STMP(N,N),ECGH(N,N),ENVal(N),ENVec(N,N),PHI(N,3)
            real*8 :: SSSS(N,N)
           
            allocate(ECGWORK(1+6*N+2*N**2),stat=mstat)
            allocate(ECGIWORK(3+5*N))
    and In He_m.F90
    Code (Text):

          module He_m
            ! integer*4 :: N  ! Number of terms of basis set
            integer, parameter :: N = 300
            real*8, allocatable :: ECGWORK(:)
            real*8, allocatable :: ECGIWORK(:)
          end module He_m
    Thanks for your detail. I will do much work on it.

    Best regards.
    sandf
     
Know someone interested in this topic? Share a link to this question via email, Google+, Twitter, or Facebook

Have something to add?

0
Draft saved Draft deleted