Localizing a Fortran Error - Get the Position of the Error

In summary: A newcomer to Fortran was modifying a large existing code and hit the run-time message "forrtl: error (73): floating divide by zero" without knowing where it came from. The (73) is an error code, not a line number. With gdb and print-statement "flags" the failure was narrowed down to a loop that repeatedly halves a value while probing the machine's floating-point limits. Lowering the loop count only moved the error elsewhere; the problem disappeared when the code was compiled with g77 instead of ifort.
  • #1

EL

Being a newbie to Fortran, I need some help with localizing an error.
When I run my executable I get the message:
forrtl: error (73): floating divide by zero
I get that somewhere in my code there's a division by zero, which obviously will not be accepted, but my question is where?
Is there a way to get the position of the error pointed out?
 
  • #2
Are you getting a core file? If so, open it up with gdb/other debugger; the state of the stack might give you some idea where it's barfing.

Is the error always in the same place? If it's not a huge run, maybe you can run under gdb or another debugger until it fails.

There's always printlining.

These will help you locate where the div by 0 happens; finding where the zero gets injected is tougher.

If you're running IEEE 754 shouldn't you be able to register your own error handlers? It's not something I've ever done so I can't give you details, but it might be doable.
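
One way to act on the IEEE 754 idea without writing a signal handler is the intrinsic ieee_exceptions module from Fortran 2003. The sketch below is only an illustration and not part of the code discussed in this thread: the program and variable names are made up, and it assumes a compiler that supports the module (g77 does not). It clears the divide-by-zero flag, runs a suspect computation, and then asks whether the flag was raised.

```fortran
program check_div_flag
  use, intrinsic :: ieee_exceptions
  implicit none
  ! volatile keeps the compiler from folding the division at compile time
  double precision, volatile :: num, den
  double precision :: q
  logical :: raised

  ! Ask for "continue and set a flag" instead of "abort", where supported.
  if (ieee_support_halting(ieee_divide_by_zero)) then
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
  end if
  call ieee_set_flag(ieee_divide_by_zero, .false.)

  ! --- suspect block (hypothetical stand-in for the real code) ---
  num = 1.0d0
  den = 0.0d0
  q = num / den
  ! ---------------------------------------------------------------

  call ieee_get_flag(ieee_divide_by_zero, raised)
  if (raised) then
    print *, 'divide-by-zero flag raised in suspect block, q =', q
  else
    print *, 'suspect block is clean'
  end if
end program check_div_flag
```

Going the other way, most compilers can be told to trap at the faulting instruction so that a debugger or traceback shows the line: gfortran has -ffpe-trap=zero together with -g -fbacktrace, and ifort has -fpe0 together with -g -traceback.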
 
  • #3
EL said:
Being a newbie to Fortran, I need some help with localizing an error.
When I run my executable I get the message:

I get that somewhere in my code there's a division by zero, which obviously will not be accepted, but my question is where?
Is there a way to get the position of the error pointed out?
It's not on line 73, is it?
 
  • #4
nmtim said:
Are you getting a core file? If so, open it up with gdb/other debugger; the state of the stack might give you some idea where it's barfing.

Thanks, I got the same advice from a colleague earlier today, and it pointed out the command where it all went wrong. I'm just about to try to locate in which file this "command" is.
Btw I really hate Linux :devil: :biggrin: (For example, how logical is it to use a command called "grep" to do a search?...)
 
  • #5
0rthodontist said:
It's not on line 73, is it?
No, at first I thought something like that too. But I think (73) is just a code for "dividing by zero".
 
  • #6
0rthodontist said:
It's not on line 73, is it?

That seems the most likely candidate. But EL, if it's not 73, then are there any obvious candidates for a math divide by zero? Especially if it's a floating point complaint, it sounds like it's in a math operation. How many places are you doing floating point math divides? It could be from some other error, but the math sections would be the first to look at, it would seem. Also, can you control the execution with different data sets? What are the input and output data sets for this program?
 
  • #7
Here is a good reason for sub-programs -- it makes it easier to cut it down. Can you put your calculations into some modules and then see which sub causes the error?
 
  • #8
It might not always be as obvious as it may seem. For example, raising 0 to a negative power almost seems innocent.
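
A small illustration of that point, with made-up names: neither assignment below contains a "/", yet mathematically both divide by zero. What happens at run time depends on the compiler's floating-point exception settings. A build that traps will abort on the first power; otherwise the results are typically Infinity.

```fortran
program hidden_division
  implicit none
  ! volatile keeps the compiler from evaluating the powers at compile time
  double precision, volatile :: x
  double precision :: a, b

  x = 0.0d0
  a = x**(-2)        ! integer exponent: mathematically 1/(x*x)
  b = x**(-0.5d0)    ! real exponent: mathematically 1/sqrt(x)
  print *, 'x**(-2)     =', a
  print *, 'x**(-0.5d0) =', b
end program hidden_division
```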
 
  • #9
berkeman said:
That seems the most likely candidate. But EL, if it's not 73, then are there any obvious candidates for a math divide by zero? Especially if it's a floating point complaint, it sounds like it's in a math operation. How many places are you doing floating point math divides? It could be from some other error, but the math sections would be the first to look at, it would seem. Also, can you control the execution with different data sets? What are the input and output data sets for this program?
No, as said, 73 has nothing to do with any line number.
I'm modifying existing code, so I'm really not in full control of what's happening all the way. At least I've checked that there's no division by zero directly in the parts I've added, but the code is quite complex, with a lot of going back and forth between different files doing different stuff. Probably there's something I've forgotten to assign a value to, or something similar.
Anyway, I'm quite sure nmtim's advice is the right way to go. It gave me the name of the object causing the problem, so now I just have to find where (and what) it is ...
However, I'll have to wait for tomorrow since I discovered I couldn't access my files from home.
 
  • #10
If this is a run-time error, the best thing you can do is identify the position of the error. Narrow down the region where the division by zero happens by outputting a flag at several points in your program. See which flag is the last one printed, then place several more flags between that one and the next. Keep repeating this until you have isolated the position of the zero division; then perhaps you can see which expression(s) cause the error.
 
  • #11
I have now found the subroutine "ffset" which gdb pointed out as the source of the error:
**********************************************************

* This silly subroutine is called from ffini while determining
* the working precision of the machine we're running on.
* It works around the optimizer to guarantee that we're not in
* fact determining the precision of the FPU registers.

      subroutine ffset(res, x)
      implicit none
      DOUBLE PRECISION res, x

      res = x
      end

**********************************************************
I cannot see where there could be any division by zero?
 
  • #12
Btw, what does "implicit none" do?
 
  • #13
Could it be that the "source" of the error is where the variables involved were initialised? In other words, the failure itself would come after this call, wherever res is used.

Output each value before and after the call to ffset takes place. See what's happening.

"implicit none" is an extension to Fortran 77 that most compilers accept (it is standard from Fortran 90 on). It requires every variable to be declared explicitly: if the compiler finds a name that has not been declared, it reports an error. Without it, an undeclared name is implicitly typed from its first letter (I through N integer, anything else real), so a misspelled variable silently becomes a new one. It's there for your benefit.
 
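To make the benefit concrete, here is a deliberately sloppy toy program (the names are invented for illustration, and it is written in free-form source). It leaves out "implicit none", so the misspelled name compiles without complaint and the intended update never happens.

```fortran
program implicit_demo
  ! Deliberately no "implicit none" here, to show what it protects against.
  total = 10.0
  totl = total / 4.0          ! typo for "total": silently creates a new REAL variable
  print *, 'total =', total   ! total still holds 10.0; the quotient went into "totl"
end program implicit_demo
```

Adding implicit none as the first statement after the program line turns both undeclared names into compile-time errors, which is exactly where you want to hear about the typo.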
  • #14
Sane said:
Could it be that the "source" of the error is where the variables involved were initialised? In other words, the failure itself would come after this call, wherever res is used.
I appreciate your help Sane.
Here's the part of the code where ffset is called:
*
* the precision to which real calculations are done is
*
      precx = 1
      sold = 0
      do 1 i=1,1000
          precx = precx/2
          call ffset(s, 1 + precx)
          s = exp(log(s))
          if ( s .eq. sold ) goto 2
          sold = s
    1 continue
    2 continue
      precx = precx*8
* (take three bits for safety)

*
* the precision to which complex calculations are done is
*
      precc = 1
      sold = 0
      do 3 i=1,1000
          precc = precc/2
          call ffset(s, 1 + precc)
          cs = exp(log(DCMPLX(s)))
          if ( DBLE(cs) .eq. sold ) goto 4
          sold = DBLE(cs)
    3 continue
    4 continue
      precc = precc*8
* (take three bits for safety)

*
* for efficiency take them equal if they are not too different
*
      if ( precx/precc .lt. 4 .and. precx/precc .gt. .25 ) then
          precx = max(precc,precx)
          precc = max(precc,precx)
      endif
*
* and the minimum value the logarithm accepts without complaining
* about arguments zero is (DOUBLE PRECISION cq DOUBLE COMPLEX)
*
      s = 1
      xalogm = 1
      do 5 i=1,10000
          call ffset(s, s/2)
          if ( 2*s .ne. xalogm ) goto 6
          xalogm = s
    5 continue
    6 continue
      if ( xalogm.eq.0 ) xalogm = 1d-307

      s = 1
      xclogm = abs(DCMPLX(s))
      do 7 i=1,10000
          call ffset(s, s/2)
          if ( 2*abs(DCMPLX(s)) .ne. xclogm ) goto 8
          xclogm = abs(DCMPLX(s))
    7 continue
    8 continue
      if ( xclogm.eq.0 ) xclogm = 1d-307
Can you spot any potential "zeros"?
 
  • #15
I'm almost positive it has to do with what's happening right after the call to ffset. There seems to be some funny stuff going on. I can't exactly see where the problem is, but if "DCMPLX" divides by s at any given time, it will be dividing by zero.

Are you able to isolate the problem with flags, as I had suggested earlier?
 
  • #16
Sane said:
I'm almost positive it has to do with what's happening right after the call to ffset. There seems to be some funny stuff going on. I can't exactly see where the problem is, but if "DCMPLX" divides by s at any given time, it will be dividing by zero.

Are you able to isolate the problem with flags, as I had suggested earlier?

Well, I guess my biggest problem here is that I really don't know how to program. I simply don't know what a "flag" is...:blushing: ...nor how to use one...
But you're right I should try to check DCMPLX. (And get help to do some "flagging".)
 
  • #17
EL said:
Well, I guess my biggest problem here is that I really don't know how to program. I simply don't know what a "flag" is...:blushing: ...nor how to use one...
But you're right I should try to check DCMPLX. (And get help to do some "flagging".)

Oh, it's really nothing special at all.

Sane said:
If this is a run-time error, the best thing you can do is identify the position of the error. Narrow down the region where the division by zero happens by outputting a flag at several points in your program. See which flag is the last one printed, then place several more flags between that one and the next. Keep repeating this until you have isolated the position of the zero division; then perhaps you can see which expression(s) cause the error.

When I say "output a flag", all I mean is print some unique information out, so you know where it currently is inside the program. For example, you could do the following...

Code:
*
* the precision to which real calculations are done is
*
      print *, "Reached Block 1"

      precx = 1
      sold = 0
      do 1 i=1,1000
          precx = precx/2
          call ffset(s, 1 + precx)
          s = exp(log(s))
          if ( s .eq. sold ) goto 2
          sold = s
    1 continue
    2 continue
      precx = precx*8
* (take three bits for safety)

*
* the precision to which complex calculations are done is
*
      print *, "Reached Block 2"

      precc = 1
      sold = 0
      do 3 i=1,1000
          precc = precc/2
          call ffset(s, 1 + precc)
          cs = exp(log(DCMPLX(s)))
          if ( DBLE(cs) .eq. sold ) goto 4
          sold = DBLE(cs)
    3 continue
    4 continue
      precc = precc*8
* (take three bits for safety)

*
* for efficiency take them equal if they are not too different
*
      print *, "Reached Block 3"

      if ( precx/precc .lt. 4 .and. precx/precc .gt. .25 ) then
          precx = max(precc,precx)
          precc = max(precc,precx)
      endif
*
* and the minimum value the logarithm accepts without complaining
* about arguments zero is (DOUBLE PRECISION cq DOUBLE COMPLEX)
*
      print *, "Reached Block 4"

      s = 1
      xalogm = 1
      do 5 i=1,10000
          call ffset(s, s/2)
          if ( 2*s .ne. xalogm ) goto 6
          xalogm = s
    5 continue
    6 continue
      if ( xalogm.eq.0 ) xalogm = 1d-307

      s = 1
      xclogm = abs(DCMPLX(s))
      do 7 i=1,10000
          call ffset(s, s/2)
          if ( 2*abs(DCMPLX(s)) .ne. xclogm ) goto 8
          xclogm = abs(DCMPLX(s))
    7 continue
    8 continue
      if ( xclogm.eq.0 ) xclogm = 1d-307

You see which print statement was the last to execute. Then place more print statements after the location of the last one that produced output. Keep repeating until you've isolated the area where execution is terminated. Simple. Right?

Basically, this helps pin-point the problematic area. Then all attention can be focused on the potential of one expression being invalid, rather than the entire program.
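
A slightly more informative variant of the same idea prints a value along with each flag and flushes the output, so the last line is not lost in a buffer if the program aborts right afterwards. This is only a sketch under assumptions: the module and subroutine names (trace_mod, trace) are invented here, the loop is a toy stand-in for the real code, and the flush statement and iso_fortran_env module need a Fortran 2003 compiler.

```fortran
module trace_mod
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
contains
  ! Print a tag plus one value, then flush so the line is visible
  ! even if the program aborts immediately afterwards.
  subroutine trace(tag, val)
    character(len=*), intent(in) :: tag
    double precision, intent(in) :: val
    write (output_unit, '(a,1x,es24.16)') tag, val
    flush (output_unit)
  end subroutine trace
end module trace_mod

program trace_demo
  use trace_mod
  implicit none
  double precision :: s
  integer :: i

  s = 1.0d0
  call trace('before loop, s =', s)
  do i = 1, 5
    s = s / 2
    call trace('inside loop, s =', s)
  end do
  call trace('after loop,  s =', s)
end program trace_demo
```

Printing the value next to the flag answers two questions at once: how far execution got, and what the variable held when it got there.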
 
  • #18
Sane said:
When I say "output a flag", all I mean is print some unique information out, so you know where it currently is inside the program. For example, you could do the following...

Thanks again Sane, I'll go for it directly!
 
  • #19
:frown: No, nothing happened. Just the same old
forrtl: error (73): floating divide by zero
It didn't print anything out...
 
  • #20
Ok, learned I had to recompile a lot of stuff first to make it work. It's looking better...
 
  • #21
I placed some "flags" and it went through the whole part of the code I quoted above without problem. But still there's the same error showing up...:frown:
 
  • #22
/a tad wee bitty off topic
command called "grep" to do a search?
It's actually a command-line program, which you invoke using the command grep :wink:
 
  • #23
Anttech said:
/a tad wee bitty off topic
It's actually a command-line program, which you invoke using the command grep :wink:
no difference to me...:tongue2:
 
  • #24
Moreover, I've noticed that when I run gdb, the "flagging" works and I can pinpoint the error. It seems it's somewhere between "check 71" and "check72":
      print *,"check7"
      s = 1
      xalogm = 1
      print *,"check71"
      do 5 i=1,10000
          call ffset(s, s/2)
          if ( 2*s .ne. xalogm ) goto 6
          xalogm = s
    5 continue
      print *,"check72"
    6 continue
      print *,"check73"
      if ( xalogm.eq.0 ) xalogm = 1d-307
 
  • #25
When I lower the number of loops to about 1000 the problem suddenly goes away!...:bugeye: :rolleyes:
However, a new "error (73)" materialises instead. This time gdb points to the ifortran directory as the source of evil.:grumpy:
 
  • #26
EL said:
Moreover, I've noticed that when I run gdb, the "flagging" works and I can pinpoint the error. It seems it's somewhere between "check 71" and "check72":

There we go! That's what I was waiting for. :biggrin:

It helps, no?

Each time the loop iterates, it appears that s is being divided by 2. If it loops 10 000 times, it probably at one point rounds down to zero.

10
5
2.5
1.25
0.625
0.3125
... 10,000 loops later ...
0.0

That might be causing a problem. Printing out some values, of the relevant variables, as you go usually helps too, in order to make sure everything is what you want it to be.
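
To see how many halvings it actually takes, a small stand-alone counter helps. The sketch below is not from the original code: the program name and counters are invented, and it starts at 1 as the quoted loop does (the list above starts at 10 only for illustration). It reports the first halving that drops below the smallest normal double and the first one that gives exactly zero. The counts depend on whether the compiler and hardware use gradual underflow or flush tiny values to zero, so no output is quoted here.

```fortran
program halve_to_zero
  implicit none
  ! volatile forces each halved value to be stored as a 64-bit double
  double precision, volatile :: s
  integer :: i, first_subnormal, first_zero

  s = 1.0d0
  first_subnormal = 0
  first_zero = 0
  do i = 1, 10000
    s = s / 2
    ! tiny(s) is the smallest positive normal number of this kind
    if (first_subnormal == 0 .and. s < tiny(s)) first_subnormal = i
    if (s == 0.0d0) then
      first_zero = i
      exit
    end if
  end do

  print *, 'smallest normal double (tiny)  :', tiny(s)
  print *, 'first halving below tiny(s)    :', first_subnormal
  print *, 'first halving that gives zero  :', first_zero
end program halve_to_zero
```

Running it once per compiler (and per set of floating-point options) shows directly whether the two builds treat very small numbers differently.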
 
  • #27
What is ffset? Did it come from some library or did you define it somewhere? Are you using ffset(s, s/2) as the assignment s=s/2? If so, s would go to 0 but doesn't explain the division by 0 error. If you have the source for ffset it would be nice to look at it.
 
  • #28
-Job- said:
What is ffset? Did it come from some library or did you define it somewhere? Are you using ffset(s, s/2) as the assignment s=s/2? If so, s would go to 0 but doesn't explain the division by 0 error. If you have the source for ffset it would be nice to look at it.
Yes, it's in post #11!
And I also cannot see why s going to 0 should cause the error "division by zero". In fact, it should be something like "0 divided by 2 too many times" instead...
 
  • #29
Sane said:
There we go! That's what I was waiting for. :biggrin:

It helps, no?

Each time the loop iterates, it appears that s is being divided by 2. If it loops 10 000 times, it probably at one point rounds down to zero.

10
5
2.5
1.25
0.625
0.3125
... 10,000 loops later ...
0.0

That might be causing a problem. Printing out some values, of the relevant variables, as you go usually helps too, in order to make sure everything is what you want it to be.
Yep, that's what I did. And it looks like as soon as s goes below 1d-307 the error arises. (1d-307 is the value the code assigns if s=0 after the loops, see the quoted code.) I really cannot see why it should be sensitive to exactly that number?
And I can't see why the error becomes "division by zero"? I mean, I am at most dividing zero by something, not the other way around...
And anyway, taking down the number of loops to 1015 only solves the problem temporarily, since now it points out a new "error (73)" arising from the "ifortran" directory...
 
  • #30
If you possibly have multiple errors, then you should isolate some code and make sure it works as it should without the rest of the program getting in your way.

Isolate the following code and run it. It should give you a division by 0 error.
Then replace ffset(s, s/2) with s = s/2 and run that. Let us know what happens.
Code:
      s = 1
      xalogm = 1
      do 5 i=1,10000
          call ffset(s, s/2)
          if ( 2*s .ne. xalogm ) goto 6
          xalogm = s
    5 continue
    6 continue
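
For anyone who wants to try this, the fragment above can be wrapped into a complete program roughly as follows. This is a sketch, not the original source file: it is written in free-form Fortran for brevity, ffset is copied from post #11, the goto is replaced by an equivalent exit, and the program name and the final print statements are additions for illustration.

```fortran
! Stand-alone reproduction of the loop (free-form source).
subroutine ffset(res, x)
  implicit none
  double precision res, x
  res = x
end subroutine ffset

program isolate_loop
  implicit none
  double precision s, xalogm
  integer i
  external ffset

  s = 1
  xalogm = 1
  do i = 1, 10000
    call ffset(s, s/2)      ! variant A: as in the original code
    ! s = s/2               ! variant B: swap in for the call above
    if (2*s .ne. xalogm) exit
    xalogm = s
  end do
  if (xalogm .eq. 0) xalogm = 1d-307

  print *, 'loop left at i =', i
  print *, 's =', s, ' xalogm =', xalogm
end program isolate_loop
```

Building this one file with each compiler in turn, first with variant A and then with variant B, separates the behaviour of the loop itself from anything else the full program does.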
 
  • #31
The problem disappeared when I switched compiler from iFort to g77!
Many thanks for your help anyway!
 
  • #32
The mighty ifort "falls" to g77 ... oh boy :tongue: .
 
  • #33
Putting on someone else's shoes is almost never a good idea. Unless they fit, that is.

That must have been a pretty painful fit, to be worth 3 pages of frantic posts.

Glad you got that sorted out. :biggrin:
 
  • #34
Thing is, I didn't choose the compiler to begin with. There's a ranking list built into the code such that ifort is automatically chosen first. Only if ifort isn't installed does it go on to the next compiler on the list...
When I removed ifort from PATH, g77 was chosen and everything (well, at least this specific problem) worked out well.
 
