Double precision computation time in Fortran

In summary, the calculation takes 1.2 times as long to complete in double precision as it does in single precision.
  • #1
JoAuSc
I've written a program that repeats a calculation a certain number of times in single precision, then in double precision. The times are recorded by the cpu_time(real t1) subroutine. (Specifically, the calculation is aReal/(aReal+0.01) .) It seems that the time for the double calculation is 1.20 times the time for the single precision calculation. (There was little difference in the times for ~1 second times, but over many minutes there's a clear 1.2 factor.) Why isn't it larger than that? (Not that I'm complaining.) I'd figure that for addition and subtraction, dealing with double variables would take twice as long, and it would be four times as long for multiplication (and somewhere around there for division).
 
  • #2
JoAuSc said:
I've written a program that repeats a calculation a certain number of times in single precision, then in double precision. The times are recorded by the cpu_time(real t1) subroutine. (Specifically, the calculation is aReal/(aReal+0.01) .) It seems that the time for the double calculation is 1.20 times the time for the single precision calculation. (There was little difference in the times for ~1 second times, but over many minutes there's a clear 1.2 factor.) Why isn't it larger than that? (Not that I'm complaining.) I'd figure that for addition and subtraction, dealing with double variables would take twice as long, and it would be four times as long for multiplication (and somewhere around there for division).
Depends on the machine, the internal precision on a PC can be 80 bits if floating point is used. If the SSE instructions are used, it can be single or double precision.
 
  • #3
The days when "double precision costs twice or four times as much as single" are long gone. That rule of thumb applied when hardware was expensive, and double precision calculations were done by splitting each number into two parts and using the single-precision add and multiply hardware several times.

Back in those days, floating-point arithmetic cost an order of magnitude more time than fixed-point arithmetic for the same general reason. Now, the costs are more or less identical.

The main difference now is that you are accessing twice as much memory, but whether that costs more time depends on your problem.

In fact for languages where the standard specifies that all arithmetic is done internally in double precision (e.g. ANSI C), single precision can run slower than double because of the extra conversions - though there's usually a compiler switch to disable this if you want speed rather than absolute conformance to the standard.
 
  • #4
As Jeff and AlephZero point out, performance depends on hardware, compiler, code, and the algorithm.

If you write your program to run on a relatively recent x86/x86-64 machine using vector (packed) SSE, you might see a sharp difference between 32-bit and 64-bit FP. This is because an SSE vector holds two DP floats or four SP floats, and also because the throughputs for division are quite a bit different (on some models SP throughput is about 4x better than DP; it's not as bad on others).

If you're running x87 FPDIV instructions, the ratio of instruction throughput DP/SP is 40/30, at least for some models: the CPU waits 40 cycles before it can issue the next DP divide, it waits 30 cycles before issuing another SP divide. Memory latency would tend to diminish this ratio.

By the way, is your code of the form
Code:
y[i] = y[i-1]/(y[i-1]+c)
or
Code:
y[i] = x[i]/(x[i] + c)
 
  • #5
Thanks for the answers.

nmtim said:
By the way, is your code of the form
Code:
y[i] = y[i-1]/(y[i-1]+c)
or
Code:
y[i] = x[i]/(x[i] + c)
It's the second one, but I just redid the same calculation over and over, rather than going through an array. This was a test to see if it would drastically slow down a molecular dynamics simulation I'm doing.
 
  • #6
There is also overhead in the loop used to perform the calculation over and over again. You will get more accurate timing results if you fold the loop. Example of an unfolded loop (written in C rather than Fortran):
Code:
for (i = 0; i < limit; i++) {
   y[i] = x[i]/(x[i] + c);
}
Folding the loop four times over (assuming limit is divisible by four):
Code:
for (i = 0; i < limit; i += 4) {
   y[i]   = x[i]/(x[i] + c);
   y[i+1] = x[i+1]/(x[i+1] + c);
   y[i+2] = x[i+2]/(x[i+2] + c);
   y[i+3] = x[i+3]/(x[i+3] + c);
}
Even more folding will lessen the burden of the test infrastructure.
 
  • #7
I just tried folding the loop so that the calculation is repeated ten times per iteration. I didn't notice a significant difference in speed, but granted, this run was only 18 seconds (and the difference I mentioned earlier only seems to be consistent if you let it run for at least a few minutes), there were about 1,000,000,000 iterations in total (in two nested loops), and this was run on a different terminal.
 
  • #8
I ran a few tests on a Mac (10e6 iterations) and a Linux box (60e6 iterations). Times are in seconds. I programmed in C and used gettimeofday() for timing.

Code:
Loop          Mac          Linux
Folding   float double  float double
 1         0.74  1.05    0.91  0.91
 2         0.65  0.90    0.91  0.91
 4         0.54  0.84    0.91  0.91
 8         0.49  0.82    0.91  0.91
16         0.49  0.82    0.91  0.91
32         0.48  0.81    0.91  0.91

The code was unoptimized (an optimizer would have optimized the silly calculation away). The Linux box is completely indifferent to loop folding and to numerical precision.
 
  • #9
AlephZero said:
In fact for languages where the standard specifies that all arithmetic is done internally in double precision (e.g. ANSI C), single precision can run slower than double because of the extra conversions - though there's usually a compiler switch to disable this if you want speed rather than absolute conformance to the standard.

I have a copy of the ISO C99 standard and I cannot find anything like the above.
Can you say where you got your information, please? If what you said is correct, then with the ANSI compliance switch on, I should be able to see this in the .s (asm source) file?

Or are you talking about promotion?
 
  • #10
jim mcnamara said:
I have a copy of ISO C99 standards and I cannot manage to find anything like the above.
Can you verify where you got your information, please. If what you said is correct, with the ANSI compliance switch on, then I should be able to see this in the .s (asm source) file?

Or are you talking about promotion?

Sorry, my memory lapsed here.

Ref: "The Standard C Library", P.J. Plauger, 1992, ISBN 0-13-131509-9.

Page 57:

"[Because of the architecture of the PDP11 floating point processor where C was first developed] It is no surprise that C for many years promised to produce a double result for any operator involving floating point operands, even one with two float operands."

And after some discussion of floating point hardware that did not support this principle, on page 59:

"If the C standard had tried to outlaw this [less precise floating point] behaviour, it would never have been approved".

So by implication, it's not part of the standard (though it's pretty clear Plauger would have preferred it to be).

Promotion is another potential source of confusion here - e.g. code like

Code:
float x,y;
x = y + 0.1;

might not be doing exactly what you thought it was doing!
 
  • #11
D H said:
The code was unoptimized (an optimizer would have optimized the silly calculation away).

You can prevent that by using the result of the calculation in a way that the optimizer must honor. The easiest way to do that is to print it; the second easiest, though less certain as inter-procedural optimizers improve, is to insert the loop into a function.
 
  • #12
Printing or embedding in a function will not help here. I am trying to minimize the overhead not associated with performing a floating point calculation, not maximize it. That is the reason for folding the loop. Printing is one of the most expensive operations available in any language. The cost of a function call overwhelms the simple increment/test/branch cost of a loop.
 
  • #13
D H said:
The code was unoptimized (an optimizer would have optimized the silly calculation away). The Linux box is completely indifferent to loop folding and to numerical precision.

Did you look at the assembler (.s) file to see what you were actually executing?

An intelligent (unix?) compiler may have deleted redundant code (i.e. everything inside your loop!) even in non-optimized mode, even if it kept the loop, and gave you constant run speed just counting your loop variable up to a large number.

In C, declaring variables volatile helps, because it says to the compiler something else may change the value at any time so optimisations based on "logic" are banned.

E.g.
Code:
volatile float x;
float y;

y = x + x;
is not equivalent to
Code:
y = 2*x;

because the two values of "x" in the first statement, obtained by two separate reads of memory, may have different values.
 
  • #14
D H said:
Printing or embedding in a function will not help here. I am trying to minimize the overhead not associated with performing a floating point calculation, not maximize it. That is the reason for folding the loop. Printing is one of the most expensive operations available in any language. The cost of a function call overwhelms the simple increment/test/branch cost of a loop.

Code:
...
double result = init_val;
gettimeofday( &t_begin, &tz);
for(size_t i = 0; i < maxit; ++i)
{
    result = result/(result+c);
}
gettimeofday( &t_end, &tz);

// The next line prevents the compiler from eliminating the above loop.
// It does not contribute to the measured time.
std::cout << "result = " << result << "\n";
 
  • #15
D H said:
The print statement does not prevent the compiler from eliminating the loop. Any good optimizing compiler will move the result = result/(result+c) statement outside the loop, see that the loop is empty, and eliminate the loop.

Are you saying that the compiler would detect whether the loop runs an odd or even number of times and compute the appropriate result for that case? That seems a little much to expect of a compiler, to me.
 
  • #16
No. After looking at this particular loop again I realized I was wrong. I deleted my post, but you got to it too quickly. This particular expression is not a constant expression. I thought nmtim had chosen an expression suitable for timing studies. He didn't.

The expression result = result/(result+c) is not constant: it converges to zero or to 1-c, depending on the initial value of result. That convergent behavior makes it a poor candidate for timing studies.
 

What is double precision computation time in Fortran?

Double precision computation time refers to the amount of time it takes a computer program written in Fortran to perform calculations using double precision floating-point numbers. These numbers offer higher precision and a wider range of values than single precision floating-point numbers.

Why is double precision computation time important?

Double precision computation time is important because it affects the accuracy and reliability of numerical calculations. It is especially crucial in scientific and engineering applications where small errors can have significant impacts on the results.

How is double precision computation time measured?

Double precision computation time is typically measured in seconds or milliseconds. It can be calculated by timing the execution of a specific code segment or by using profiling tools that track the time taken by each function or subroutine.

What factors affect double precision computation time in Fortran?

The main factors that affect double precision computation time in Fortran are the complexity of the calculations, the efficiency of the algorithm used, and the hardware and software environment in which the program is executed. Other factors such as compiler optimizations and input data size can also play a role.

How can double precision computation time be optimized in Fortran?

There are several ways to optimize double precision computation time in Fortran, such as using efficient algorithms, minimizing unnecessary calculations, and optimizing memory usage. Additionally, using compiler optimizations, parallelization techniques, and optimizing the hardware environment can also improve computation time.
