Data type: Float Point or Double?

Discussion Overview

The discussion revolves around the choice between using float and double data types in programming, particularly in the context of computational physics. Participants explore the implications of precision, performance, and compiler behavior associated with these data types.

Discussion Character

  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants suggest that double precision is preferred for accuracy in computational physics, as it minimizes rounding errors compared to float.
  • Others argue that float may be sufficient for simpler tasks, such as homework problems, but caution against its use in critical calculations, like those for space missions.
  • One participant highlights that different programming languages and compilers have varying implementations of float and double, with specific byte sizes assigned to each type.
  • Another participant mentions that while double-precision arithmetic is often considered slower, there are cases where certain architectures may optimize double operations better than single operations.
  • A later reply provides a detailed assembler code example to illustrate how different data types are handled in terms of efficiency and conversion during operations.

Areas of Agreement / Disagreement

Participants express differing views on the performance and accuracy of float versus double, indicating that there is no consensus on which data type is universally superior. The discussion remains unresolved with multiple competing perspectives.

Contextual Notes

Limitations include the dependence on specific compiler implementations and hardware architectures, which may affect the performance and efficiency of float and double operations.

dimensionless
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?
 
dimensionless said:
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

They use double for accuracy: double-precision floating-point values carry more significant digits, so each calculation loses as little precision to rounding as possible.

Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.
 
Languages have limits on their implementations of data types.
In C, "#include <float.h>" brings in defined constants that tell you how much precision a floating-point data type has ("<limits.h>" covers the integer types).

Number of accurate digits in floating point from HP UX (UNIX) C:

FLT_DIG 6 digits of precision
DBL_DIG 15 digits of precision

Which would you rather have when it takes no more CPU (with a floating point processor) to do "double" math or "float" math operations?
 
Jim,

Different compilers, in fact, regard the types "float" and "double" differently. On most compilers, floats are 4 bytes and doubles are 8 bytes.

Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (MMX, SSE, etc.), the use of which is scheduled by your compiler. You can do twice as many single-precision operations per unit time as double-precision operations with these vector-math units.

- Warren
 
chroot -

I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

HPUX V-class PA-RISC boxes preferentially use double precision FP
operations, because they are more efficient. I've been on other platforms
where this is also true. Here is a concrete HPUX 11.00 example.

Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

Code:
#include <math.h>
/* TYPE defined by -DTYPE=float or -DTYPE=double */
TYPE process(TYPE a, TYPE b)
{
    TYPE tmp=a;
    tmp -=.05;
    tmp*=.5;
    tmp+=b;
    if ( fabs(a+b)>0.) tmp/=(a+b);
    return tmp;
}
On HPUX V-class PA-RISC boxes, this is the assembler produced when the code is compiled with -DTYPE=double.
I put some *'s in front of one area of interest. Note that there are no FCNV calls,
and all FP operations (FMPY, FSUB, etc.) are on double-precision FP numbers.
Code:
process
        .PROC
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        STW     %r2,-20(%r30)   ;offset 0x0
        LDO     64(%r30),%r30   ;offset 0x4
*       FSTD    %fr5,-104(%r30) ;offset 0x8
*       FSTD    %fr7,-112(%r30) ;offset 0xc
*       FLDD    -104(%r30),%fr4 ;offset 0x10
*       FSTD    %fr4,-56(%r30)  ;offset 0x14
*       FLDD    -56(%r30),%fr5  ;offset 0x18
*       LDIL    LR'S$6$process,%r1      ;offset 0x1c
*       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
*       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
        FSTD    %fr7,-56(%r30)  ;offset 0x28
        FLDD    -56(%r30),%fr8  ;offset 0x2c
        LDIL    LR'S$6$process,%r31     ;offset 0x30
        FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
        FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
        FSTD    %fr10,-56(%r30) ;offset 0x3c
        FLDD    -56(%r30),%fr11 ;offset 0x40
        FLDD    -112(%r30),%fr22        ;offset 0x44
        FADD,DBL        %fr11,%fr22,%fr23       ;offset 0x48
        FSTD    %fr23,-56(%r30) ;offset 0x4c
        FLDD    -104(%r30),%fr24        ;offset 0x50
        FLDD    -112(%r30),%fr25        ;offset 0x54
        FADD,DBL        %fr24,%fr25,%fr5        ;offset 0x58
        LDIL    L'fabs,%r31     ;offset 0x5c
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
        COPY    %r31,%r2        ;offset 0x64
        FCPY,DBL        %fr0,%fr26      ;offset 0x68
        FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
        FTEST           ;offset 0x70
        B,N     $00000001       ;offset 0x74
        FLDD    -104(%r30),%fr27        ;offset 0x78
        FLDD    -112(%r30),%fr28        ;offset 0x7c
        FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
        FLDD    -56(%r30),%fr30 ;offset 0x84
        FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
        ..... code omitted to save space.

When the same code is compiled with -DTYPE=float, note that FCNV is used to convert float to double, and that FMPY and FSUB operate in double precision. FADD does not.
Code:
process                                                             
        .PROC                                                       
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE 
        .ENTRY                                                      
        STW     %r2,-20(%r30)   ;offset 0x0                         
        LDO     64(%r30),%r30   ;offset 0x4                         
*       FSTW    %fr4L,-100(%r30)        ;offset 0x8                 
*       FSTW    %fr5L,-104(%r30)        ;offset 0xc                 
*       FLDW    -100(%r30),%fr4L        ;offset 0x10                
*       FSTW    %fr4L,-56(%r30) ;offset 0x14                        
*       FLDW    -56(%r30),%fr4R ;offset 0x18                        
*       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c                
*       LDIL    LR'S$6$process,%r1      ;offset 0x20                
*       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24        
*       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28                
*       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c                
        FSTW    %fr5L,-56(%r30) ;offset 0x30                        
        FLDW    -56(%r30),%fr5R ;offset 0x34                        
        FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38                
        LDIL    LR'S$6$process,%r31     ;offset 0x3c                
        FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40        
        FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44                
        FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48                
        FSTW    %fr6L,-56(%r30) ;offset 0x4c                        
        FLDW    -56(%r30),%fr6R ;offset 0x50                        
        FLDW    -104(%r30),%fr7L        ;offset 0x54                
        FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58        
        FSTW    %fr7R,-56(%r30) ;offset 0x5c                        
        FLDW    -100(%r30),%fr8L        ;offset 0x60                
        FLDW    -104(%r30),%fr8R        ;offset 0x64                
        FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68        
        FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c                
        LDIL    L'fabs,%r31     ;offset 0x70                        
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74                
        COPY    %r31,%r2        ;offset 0x78                        
        FCPY,DBL        %fr0,%fr10      ;offset 0x7c                
        FCMP,DBL,>      %fr4,%fr10      ;offset 0x80                
        FTEST           ;offset 0x84                                
        B,N     $00000001       ;offset 0x88   
        .... code omitted
Please notice that floats are converted to doubles (FCNV mnemonic is float convert)
before most arithmetic operations on floats.
 
Jim,

I never said double-precision is always slower than single-precision. I just took issue with your statement that single-precision is never faster than double-precision. I don't have much experience with the PA-RISC instruction set, but on many processors, under many conditions, single-precision is indeed faster than double-precision.

- Warren
 
