Localizing a Fortran Error - Get the Position of the Error

EL · Nov 30, 2006

Being a Fortran newbie, I need some help localizing an error.
When I run my executable I get the message:

forrtl: error (73): floating divide by zero

I get that somewhere in my code there's a division by zero, which obviously will not be accepted, but my question is where?
Is there a way to get the position of the error pointed out?

nmtim · Nov 30, 2006

Are you getting a core file? If so, open it up with gdb/other debugger; the state of the stack might give you some idea where it's barfing.

Is the error always in the same place? If it's not a huge run, maybe you can run under gdb or another debugger until it fails.

There's always printlining.

These will help you locate where the div by 0 happens; finding where the zero gets injected is tougher.

If you're running IEEE 754 shouldn't you be able to register your own error handlers? It's not something I've ever done so I can't give you details, but it might be doable.
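On nmtim's IEEE 754 point: standard Fortran has no way to register an exception handler routine. However, Fortran 2003 added the intrinsic IEEE_EXCEPTIONS module, which lets a program turn off halting, clear the sticky exception flags, and query them after a suspect operation. Below is a minimal sketch that assumes a compiler with Fortran 2003 IEEE support (recent gfortran or ifort). The module `div_check` and the function `divides_by_zero` are hypothetical names and are not part of the code discussed in this thread.

```fortran
! Sketch (free-form Fortran 2003): module and function names are hypothetical.
module div_check
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns .true. if evaluating num/den signals the IEEE divide-by-zero flag.
  logical function divides_by_zero(num, den)
    real(dp), intent(in) :: num, den
    real(dp), volatile :: a, b, q   ! volatile: stop the compiler folding or dropping a/b
    logical :: raised

    ! Record the exception instead of stopping the program (if supported).
    if (ieee_support_halting(ieee_divide_by_zero)) then
      call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    end if

    call ieee_set_flag(ieee_divide_by_zero, .false.)   ! clear the sticky flag
    a = num
    b = den
    q = a / b                                          ! the suspect division
    call ieee_get_flag(ieee_divide_by_zero, raised)    ! did it set the flag?
    divides_by_zero = raised
  end function divides_by_zero
end module div_check
```

You can wrap the same clear/compute/query pattern around any suspect expression. Only a nonzero value divided by zero raises this flag. Zero divided by zero signals IEEE "invalid" instead.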

0rthodontist · Nov 30, 2006

EL said:

Being a Fortran newbie, I need some help localizing an error.
When I run my executable I get the message:

I get that somewhere in my code there's a division by zero, which obviously will not be accepted, but my question is where?
Is there a way to get the position of the error pointed out?

It's not on line 73, is it?

EL · Nov 30, 2006

nmtim said:

Are you getting a core file? If so, open it up with gdb/other debugger; the state of the stack might give you some idea where it's barfing.

Thanks, I got the same advice from a colleague earlier today, and it pointed out the command where it all went wrong. I'm just about to try to find which file this "command" is in.
Btw I really hate Linux :devil:

(For example, how logical is it to use a command called "grep" to do a search?...)

EL · Nov 30, 2006

0rthodontist said:

It's not on line 73, is it?

No, at first I thought something like that too. But I think (73) is just a code for "dividing by zero".

berkeman · Nov 30, 2006

0rthodontist said:

It's not on line 73, is it?

That seems the most likely candidate. But EL, if it's not 73, then are there any obvious candidates for a math divide by zero? Especially if it's a floating point complaint, it sounds like it's in a math operation. How many places are you doing floating point math divides? It could be from some other error, but the math sections would be the first to look at, it would seem. Also, can you control the execution with different data sets? What are the input and output data sets for this program?

wxrocks · Nov 30, 2006

Here is a good reason for sub-programs -- it makes it easier to cut it down. Can you put your calculations into some modules and then see which sub causes the error?

Sane · Nov 30, 2006

It might not always be as obvious as it may seem. For example, raising 0 to a negative power almost seems innocent.
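Sane's point can be checked directly. For a negative integer exponent, x**n is (conceptually) evaluated as 1/(x**abs(n)), so a zero base hides a division. Below is a minimal sketch that assumes a compiler supporting the Fortran 2003 IEEE_EXCEPTIONS module. The module `pow_check` and the function `power_divides_by_zero` are hypothetical names.

```fortran
! Sketch (free-form Fortran 2003): module and function names are hypothetical.
module pow_check
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns .true. if evaluating x**n signals the IEEE divide-by-zero flag.
  logical function power_divides_by_zero(x, n)
    real(dp), intent(in) :: x
    integer, intent(in) :: n
    real(dp), volatile :: base, r   ! volatile: keep the power from being folded away
    integer, volatile :: expo
    logical :: raised

    if (ieee_support_halting(ieee_divide_by_zero)) then
      call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    end if

    call ieee_set_flag(ieee_divide_by_zero, .false.)
    base = x
    expo = n
    ! For n < 0 this is (conceptually) 1/(base**abs(n)): a hidden division.
    r = base ** expo
    call ieee_get_flag(ieee_divide_by_zero, raised)
    power_divides_by_zero = raised
  end function power_divides_by_zero
end module pow_check
```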

EL · Nov 30, 2006

berkeman said:

That seems the most likely candidate. But EL, if it's not 73, then are there any obvious candidates for a math divide by zero? Especially if it's a floating point complaint, it sounds like it's in a math operation. How many places are you doing floating point math divides? It could be from some other error, but the math sections would be the first to look at, it would seem. Also, can you control the execution with different data sets? What are the input and output data sets for this program?

No, as said, 73 has nothing to do with any line number.
I'm modifying an already existing code, so I'm really not in full control of what's happening all the way. At least I've checked that there's no division by zero directly in the parts I've added, but the code is quite complex, with a lot of going back and forth between different files doing different things. Probably there's something I've forgotten to assign a value to, or something similar.
Anyway, I'm quite sure nmtim's advice is the right way to go. It gave me the name of the object causing the problem, so now I just have to find where (and what) it is ...
However, I'll have to wait for tomorrow since I discovered I couldn't access my files from home.

Sane · Nov 30, 2006

If this is a run-time error, the best thing you can do is identify the position of the error. Narrow down the area where the division by zero happens by printing a flag at several points in your program. See which flag is printed last, and then place several more flags between that one and the next. Keep repeating this until you have isolated the position of the division by zero. Then perhaps you can see which expression(s) cause the error.

EL · Dec 1, 2006

I have now found the subroutine "ffset" which gdb pointed out as the source of the error:

**********************************************************

* This silly subroutine is called from ffini while determining
* the working precision of the machine we're running on.
* It works around the optimizer to guarantee that we're not in
* fact determining the precision of the FPU registers.

      subroutine ffset(res, x)
      implicit none
      DOUBLE PRECISION res, x

      res = x
      end

**********************************************************

I can't see anywhere there could be a division by zero.

EL · Dec 1, 2006

Btw, what does "implicit none" do?

Sane · Dec 1, 2006

Could it be that the "source" of the error is where the variables involved were initialized? In other words, the actual error would come after this call, wherever res is used.

Output each value before and after the call to ffset takes place. See what's happening.

"implicit none" is an extension to Fortran 77 that most compilers accept (it became standard in Fortran 90). It requires every variable to be declared explicitly. In other words, if the compiler finds a variable name that has not been declared, it will report an error. Without it, undeclared names are typed implicitly: names beginning with I through N are INTEGER, and all others are REAL. A misspelled variable name therefore silently becomes a new variable. It's there for your benefit.
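The sketch below shows what default implicit typing does when IMPLICIT NONE is absent. The module, function, and variable names are hypothetical. The module deliberately omits IMPLICIT NONE, so `nval` (which starts with N) is silently an INTEGER and truncates the value assigned to it.

```fortran
! Sketch: deliberately written WITHOUT "implicit none" (names are hypothetical).
! Default implicit typing: undeclared names starting with I..N are INTEGER,
! all other undeclared names are REAL.
module implicit_demo
contains
  real function implicit_mean()
    total = 7.0     ! never declared, starts with T -> REAL
    nval  = 2.9     ! never declared, starts with N -> INTEGER, so it stores 2
    implicit_mean = total / nval   ! 7.0 / 2 = 3.5, not 7.0 / 2.9
  end function implicit_mean
end module implicit_demo
```

Adding IMPLICIT NONE to this module turns both undeclared names into compile-time errors, which is the protection described above.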

EL · Dec 1, 2006

Sane said:

Could it be that the "source" of the error is where the variables involved were initialized? In other words, the actual error would come after this call, wherever res is used.

I appreciate your help Sane.
Here's the part of the code where ffset is called:

*
* the precision to which real calculations are done is
*
      precx = 1
      sold = 0
      do 1 i=1,1000
        precx = precx/2
        call ffset(s, 1 + precx)
        s = exp(log(s))
        if ( s .eq. sold ) goto 2
        sold = s
    1 continue
    2 continue
      precx = precx*8
* (take three bits for safety)

*
* the precision to which complex calculations are done is
*
      precc = 1
      sold = 0
      do 3 i=1,1000
        precc = precc/2
        call ffset(s, 1 + precc)
        cs = exp(log(DCMPLX(s)))
        if ( DBLE(cs) .eq. sold ) goto 4
        sold = DBLE(cs)
    3 continue
    4 continue
      precc = precc*8
* (take three bits for safety)

*
* for efficiency take them equal if they are not too different
*
      if ( precx/precc .lt. 4 .and. precx/precc .gt. .25 ) then
        precx = max(precc,precx)
        precc = max(precc,precx)
      endif
*
* and the minimum value the logarithm accepts without complaining
* about arguments zero is (DOUBLE PRECISION cq DOUBLE COMPLEX)
*
      s = 1
      xalogm = 1
      do 5 i=1,10000
        call ffset(s, s/2)
        if ( 2*s .ne. xalogm ) goto 6
        xalogm = s
    5 continue
    6 continue
      if ( xalogm.eq.0 ) xalogm = 1d-307

      s = 1
      xclogm = abs(DCMPLX(s))
      do 7 i=1,10000
        call ffset(s, s/2)
        if ( 2*abs(DCMPLX(s)) .ne. xclogm ) goto 8
        xclogm = abs(DCMPLX(s))
    7 continue
    8 continue
      if ( xclogm.eq.0 ) xclogm = 1d-307

Can you spot any potential "zeros"?
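The first loop in the code above estimates the working precision by halving a number until adding it to 1 no longer changes anything. Below is a simplified sketch of the same idea. It assumes IEEE double precision and leaves out the exp(log(...)) round trip and the "three bits for safety" factor. It can be checked against the Fortran 90 intrinsic EPSILON. The module and function names are hypothetical.

```fortran
! Sketch: module and function names are hypothetical; assumes IEEE double precision.
module precision_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns the smallest power of two p for which 1 + p /= 1,
  ! found by halving p until 1 + p/2 rounds back to exactly 1.
  real(dp) function halving_precision()
    real(dp), volatile :: p, onep   ! volatile: force every sum to be stored as a double
    p = 1.0_dp
    do
      onep = 1.0_dp + p / 2
      if (onep == 1.0_dp) exit
      p = p / 2
    end do
    halving_precision = p
  end function halving_precision
end module precision_demo
```

With IEEE doubles the result is 2**(-52), about 2.2e-16, which equals EPSILON(1.0d0). The quoted code then multiplies its own estimate by 8 for safety.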

Sane · Dec 1, 2006

I'm almost positive it has to do with what's happening right after the call to ffset. There seems to be some funny stuff going on. I can't exactly see where the problem is, but if "DCMPLX" divides by s at any given time, it will be dividing by zero.

Are you able to isolate the problem with flags, as I had suggested earlier?

EL · Dec 1, 2006

Sane said:

I'm almost positive it has to do with what's happening right after the call to ffset. There seems to be some funny stuff going on. I can't exactly see where the problem is, but if "DCMPLX" divides by s at any given time, it will be dividing by zero.

Are you able to isolate the problem with flags, as I had suggested earlier?

Well, I guess my biggest problem here is that I really don't know how to program. I simply don't know what a "flag" is... :blushing:

...nor how to use them...
But you're right I should try to check DCMPLX. (And get help to do some "flagging".)

Sane · Dec 1, 2006

EL said:

Well, I guess my biggest problem here is that I really don't know how to program. I simply don't know what a "flag" is... ...nor how to use them...
But you're right I should try to check DCMPLX. (And get help to do some "flagging".)

Oh, it's really nothing special at all.

Sane said:

If this is a run-time error, the best thing you can do is identify the position of the error. Narrow down the area where the division by zero happens by printing a flag at several points in your program. See which flag is printed last, and then place several more flags between that one and the next. Keep repeating this until you have isolated the position of the division by zero. Then perhaps you can see which expression(s) cause the error.

When I say "output a flag", all I mean is print some unique information out, so you know where it currently is inside the program. For example, you could do the following...

Code:

*
* the precision to which real calculations are done is
*
      print *, "Reached Block 1"

      precx = 1
      sold = 0
      do 1 i=1,1000
        precx = precx/2
        call ffset(s, 1 + precx)
        s = exp(log(s))
        if ( s .eq. sold ) goto 2
        sold = s
    1 continue
    2 continue
      precx = precx*8
* (take three bits for safety)

*
* the precision to which complex calculations are done is
*
      print *, "Reached Block 2"

      precc = 1
      sold = 0
      do 3 i=1,1000
        precc = precc/2
        call ffset(s, 1 + precc)
        cs = exp(log(DCMPLX(s)))
        if ( DBLE(cs) .eq. sold ) goto 4
        sold = DBLE(cs)
    3 continue
    4 continue
      precc = precc*8
* (take three bits for safety)

*
* for efficiency take them equal if they are not too different
*
      print *, "Reached Block 3"

      if ( precx/precc .lt. 4 .and. precx/precc .gt. .25 ) then
        precx = max(precc,precx)
        precc = max(precc,precx)
      endif
*
* and the minimum value the logarithm accepts without complaining
* about arguments zero is (DOUBLE PRECISION cq DOUBLE COMPLEX)
*
      print *, "Reached Block 4"

      s = 1
      xalogm = 1
      do 5 i=1,10000
        call ffset(s, s/2)
        if ( 2*s .ne. xalogm ) goto 6
        xalogm = s
    5 continue
    6 continue
      if ( xalogm.eq.0 ) xalogm = 1d-307

      s = 1
      xclogm = abs(DCMPLX(s))
      do 7 i=1,10000
        call ffset(s, s/2)
        if ( 2*abs(DCMPLX(s)) .ne. xclogm ) goto 8
        xclogm = abs(DCMPLX(s))
    7 continue
    8 continue
      if ( xclogm.eq.0 ) xclogm = 1d-307

Note which print statement was the last to be printed. Then place more print statements after that one. Keep repeating until you've isolated the area where execution terminates. Simple, right?

Basically, this helps pin-point the problematic area. Then all attention can be focused on the potential of one expression being invalid, rather than the entire program.
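Here is a variation on the print-flag approach, written as a hypothetical helper module (`checkpoints` and `checkpoint` are made-up names). It writes each label to standard error and flushes it at once. That makes the last label reached less likely to be lost in an output buffer if the program is killed by a floating-point trap. It assumes Fortran 2003 for `error_unit` and the FLUSH statement.

```fortran
! Sketch (Fortran 2003): module and subroutine names are hypothetical.
module checkpoints
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  character(len=64) :: last_checkpoint = ''   ! most recent label reached
contains
  subroutine checkpoint(label)
    character(len=*), intent(in) :: label
    last_checkpoint = label
    write (error_unit, '(2a)') 'Reached ', trim(label)
    flush (error_unit)   ! push the line out now, before any later crash
  end subroutine checkpoint
end module checkpoints
```

A call such as `call checkpoint("Block 3")` would replace a plain print statement. As with any edit, the module and the code that calls it have to be recompiled and relinked before the new output appears.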

EL · Dec 4, 2006

Sane said:

When I say "output a flag", all I mean is print some unique information out, so you know where it currently is inside the program. For example, you could do the following...

Thanks again Sane, I'll go for it directly!

EL · Dec 4, 2006

No, nothing happened. Just the same old

forrtl: error (73): floating divide by zero

It didn't print anything out...

EL · Dec 4, 2006

OK, I learned I had to recompile a lot of stuff first to make it work. It's looking better...

EL · Dec 4, 2006

I placed some "flags" and it went through the whole part of the code I quoted above without a problem. But the same error still shows up...

Anttech · Dec 4, 2006

/a tad wee bitty off topic

command called "grep" to do a search?

It's actually a command-line program, which you invoke with the command grep

EL · Dec 4, 2006

Anttech said:

/a tad wee bitty off topic
It's actually a command-line program, which you invoke with the command grep

no difference to me...:tongue2:

EL · Dec 4, 2006

Moreover, I've noticed that when I run gdb, the "flagging" works and I can pinpoint the error. It seems to be somewhere between "check71" and "check72":

      print *,"check7"
      s = 1
      xalogm = 1
      print *,"check71"
      do 5 i=1,10000
        call ffset(s, s/2)
        if ( 2*s .ne. xalogm ) goto 6
        xalogm = s
    5 continue
      print *,"check72"
    6 continue
      print *,"check73"
      if ( xalogm.eq.0 ) xalogm = 1d-307

EL · Dec 4, 2006

When I lower the number of loops to about 1000 the problem suddenly goes away!... :bugeye:

However, a new "error (73)" materialises instead. This time gdb points to the "ifortran" directory as the source of evil. :grumpy:

Sane · Dec 4, 2006

EL said:

Moreover, I've noticed that when I run gdb, the "flagging" works and I can pinpoint the error. It seems to be somewhere between "check71" and "check72":

There we go! That's what I was waiting for.

It helps, no?

Each time the loop iterates, it appears that s is being divided by 2. If it loops 10 000 times, it probably at one point rounds down to zero.

10
5
2.5
1.25
0.625
0.3125
... 10,000 loops later ...
0.0

That might be causing a problem. Printing the values of the relevant variables as you go usually helps too, to make sure everything is what you want it to be.
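A short sketch can check the arithmetic behind this (the module and routine names are hypothetical). Start from 1.0d0 and assume IEEE double precision with gradual underflow. Then s = s/2 is exact all the way down to the smallest subnormal number, 2**(-1074), about 4.9e-324. The next halving rounds to exactly zero, so the loop stops after 1075 halvings, well short of 10000. In a flush-to-zero mode (for example ifort's -ftz or gfortran's -Ofast), values below TINY(1d0), about 2.2e-308, become zero instead, and the count drops to 1023.

```fortran
! Sketch: module and routine names are hypothetical; assumes IEEE double precision.
module halving_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Halve s, starting from 1, until it becomes exactly zero.
  !   count        = number of halvings performed
  !   last_nonzero = the last value of s before it reached zero
  subroutine halve_until_zero(count, last_nonzero)
    integer, intent(out) :: count
    real(dp), intent(out) :: last_nonzero
    real(dp), volatile :: s   ! volatile: keep every step in 64-bit double
    s = 1.0_dp
    count = 0
    last_nonzero = s
    do while (s /= 0.0_dp)
      last_nonzero = s
      s = s / 2
      count = count + 1
    end do
  end subroutine halve_until_zero
end module halving_demo
```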

-Job- · Dec 5, 2006

What is ffset? Did it come from some library or did you define it somewhere? Are you using ffset(s, s/2) as the assignment s=s/2? If so, s would go to 0 but doesn't explain the division by 0 error. If you have the source for ffset it would be nice to look at it.

EL · Dec 5, 2006

-Job- said:

What is ffset? Did it come from some library or did you define it somewhere? Are you using ffset(s, s/2) as the assignment s=s/2? If so, s would go to 0 but doesn't explain the division by 0 error. If you have the source for ffset it would be nice to look at it.

Yes, it's in post #11!
And I also cannot see why s going to 0 should cause the error "division by zero". If anything, it should be something like "0 divided by 2 too many times"...

EL · Dec 5, 2006

Sane said:

There we go! That's what I was waiting for.

It helps, no?

Each time the loop iterates, it appears that s is being divided by 2. If it loops 10 000 times, it probably at one point rounds down to zero.

10
5
2.5
1.25
0.625
0.3125
... 10,000 loops later ...
0.0

That might be causing a problem. Printing the values of the relevant variables as you go usually helps too, to make sure everything is what you want it to be.

Yep, that's what I did. And it looks like as soon as s goes below 1d-307 the error arises. (1d-307 is the value the code assigns if s=0 after the loops, see the quoted code.) I really cannot see why it should be sensitive to exactly that number?
And I can't see why the error becomes "division by zero"? I mean, I am at most dividing zero by something, not the other way around...
And anyway, reducing the number of loops to 1015 only solves the problem temporarily, since gdb now points out a new "error (73)" arising in the "ifortran" directory...
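IEEE flags can check EL's observation. Repeatedly halving s can signal IEEE "underflow", because the final step to zero is both tiny and inexact. But the loop only ever divides by 2, so by itself it should not raise the "divide by zero" flag. If error 73 is reported, the division by zero is presumably happening elsewhere, for example where the tiny or zero value is used later. Below is a sketch that assumes Fortran 2003 IEEE_EXCEPTIONS support. The module and routine names are hypothetical.

```fortran
! Sketch (Fortran 2003): module and routine names are hypothetical.
module halving_flags
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Run the halving loop down to zero and report which IEEE flags it raised.
  subroutine halving_loop_flags(underflow, div_by_zero)
    logical, intent(out) :: underflow, div_by_zero
    real(dp), volatile :: s

    if (ieee_support_halting(ieee_underflow)) then
      call ieee_set_halting_mode(ieee_underflow, .false.)
    end if
    call ieee_set_flag(ieee_underflow, .false.)
    call ieee_set_flag(ieee_divide_by_zero, .false.)

    s = 1.0_dp
    do while (s /= 0.0_dp)
      s = s / 2          ! the divisor is always 2, never zero
    end do

    call ieee_get_flag(ieee_underflow, underflow)
    call ieee_get_flag(ieee_divide_by_zero, div_by_zero)
  end subroutine halving_loop_flags
end module halving_flags
```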

-Job- · Dec 5, 2006

If you have possible multiple errors then you should isolate some code and make sure it works as it should without the rest of the program getting in your way.

Isolate the following code and run it. It should give you a division by 0 error.
Then replace ffset(s, s/2) with s = s/2 and run that. Let us know what happens.

Code:

      s = 1
      xalogm = 1
      do 5 i=1,10000
        call ffset(s, s/2)
        if ( 2*s .ne. xalogm ) goto 6
        xalogm = s
    5 continue
    6 continue
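-Job-'s experiment can be written as a self-contained sketch. It runs the xalogm search from the quoted code twice: once with the original `ffset` call and once with a plain `s = s/2`. The subroutine `ffset` is copied from the source posted earlier in the thread, and the module and function names are hypothetical. On hardware that does double arithmetic in 64-bit registers (SSE2 on x86-64), both variants are expected to return the same value. According to its own comment, `ffset` exists only to stop the optimizer from measuring the wider precision of FPU registers instead.

```fortran
! Sketch: module and function names are hypothetical; ffset is from the thread.
module xalogm_demo
  implicit none
contains
  subroutine ffset(res, x)
    double precision, intent(out) :: res
    double precision, intent(in) :: x
    res = x
  end subroutine ffset

  ! Smallest s reachable from 1 by exact halvings (the quoted xalogm loop).
  double precision function find_xalogm(use_ffset)
    logical, intent(in) :: use_ffset
    double precision :: s, xalogm
    integer :: i
    s = 1
    xalogm = 1
    do i = 1, 10000
      if (use_ffset) then
        call ffset(s, s/2)
      else
        s = s/2
      end if
      if (2*s /= xalogm) exit      ! halving stopped being exact
      xalogm = s
    end do
    find_xalogm = xalogm
  end function find_xalogm
end module xalogm_demo
```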

EL · Dec 7, 2006

The problem disappeared when I switched compiler from ifort to g77!
Many thanks for your help anyway!

PerennialII · Dec 7, 2006

The mighty ifort "falls" to g77 ... oh boy :tongue: .

Sane · Dec 11, 2006

Putting on someone else's shoes is almost never a good idea. Unless they fit, that is.

That must have been a pretty painful fit, to be worth 3 pages of frantic posts.

Glad you got that sorted out.

EL · Dec 11, 2006

Thing is, I didn't choose the compiler to begin with. There's a ranking list built into the code such that ifort is automatically chosen first. Only if ifort isn't installed does it go on to the next compiler on the list...
When I took away ifort from PATH, g77 was chosen and everything (well, at least this specific problem) worked out well.
