Data type: Float Point or Double?

dimensionless · Dec 23, 2005

I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

ComputerGeek · Dec 23, 2005

dimensionless said:

I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

They use double for accuracy: double-precision floating-point values carry more significant digits, so a calculation loses as little precision to rounding as possible.

Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.

jim mcnamara · Dec 28, 2005

Languages have limits on their implementations of data types.
In C, "#include <float.h>" brings in defined values that tell you how much precision a floating-point datatype has (<limits.h> does the same job for the integer types).

Number of accurate digits in floating point from HP UX (UNIX) C:

FLT_DIG 6 digits of precision
DBL_DIG 15 digits of precision

Which would you rather have when it takes no more CPU (with a floating point processor) to do "double" math or "float" math operations?

chroot · Dec 28, 2005

Jim,

Different compilers, in fact, regard the types "float" and "double" differently. On most compilers, floats are 4 bytes and doubles are 8 bytes.

Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (MMX, SSE, etc.), the use of which is scheduled by your compiler. You can do twice as many single-precision operations per unit time as double-precision operations with these vector-math units.

- Warren

jim mcnamara · Dec 30, 2005

chroot -

I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

HPUX V-class PA-RISC boxes preferentially use double precision FP
operations, because they are more efficient. I've been on other platforms
where this is also true. Here is a concrete HPUX 11.00 example.

Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

Code:

#include <math.h>
/* TYPE defined by -DTYPE=float or -DTYPE=double */
TYPE process(TYPE a, TYPE b)
{
    TYPE tmp=a;
    tmp -=.05;
    tmp*=.5;
    tmp+=b;
    if ( fabs(a+b)>0.) tmp/=(a+b);
    return tmp;
}

On HPUX V-class PA-RISC boxes, this is the assembler produced when the code is
compiled with -DTYPE=double, so the compiler is dealing with double datatypes.
I put some *'s in front of one area of interest. Note there are no FCNV instructions,
and all FP operations (FMPY, FSUB, etc.) are on double precision FP numbers.

Code:

process
        .PROC
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        STW     %r2,-20(%r30)   ;offset 0x0
        LDO     64(%r30),%r30   ;offset 0x4
*       FSTD    %fr5,-104(%r30) ;offset 0x8
*       FSTD    %fr7,-112(%r30) ;offset 0xc
*       FLDD    -104(%r30),%fr4 ;offset 0x10
*       FSTD    %fr4,-56(%r30)  ;offset 0x14
*       FLDD    -56(%r30),%fr5  ;offset 0x18
*       LDIL    LR'S$6$process,%r1      ;offset 0x1c
*       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
*       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
        FSTD    %fr7,-56(%r30)  ;offset 0x28
        FLDD    -56(%r30),%fr8  ;offset 0x2c
        LDIL    LR'S$6$process,%r31     ;offset 0x30
        FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
        FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
        FSTD    %fr10,-56(%r30) ;offset 0x3c
        FLDD    -56(%r30),%fr11 ;offset 0x40
        FLDD    -112(%r30),%fr22        ;offset 0x44
        FADD,DBL        %fr11,%fr22,%fr23       ;offset 0x48
        FSTD    %fr23,-56(%r30) ;offset 0x4c
        FLDD    -104(%r30),%fr24        ;offset 0x50
        FLDD    -112(%r30),%fr25        ;offset 0x54
        FADD,DBL        %fr24,%fr25,%fr5        ;offset 0x58
        LDIL    L'fabs,%r31     ;offset 0x5c
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
        COPY    %r31,%r2        ;offset 0x64
        FCPY,DBL        %fr0,%fr26      ;offset 0x68
        FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
        FTEST           ;offset 0x70
        B,N     $00000001       ;offset 0x74
        FLDD    -104(%r30),%fr27        ;offset 0x78
        FLDD    -112(%r30),%fr28        ;offset 0x7c
        FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
        FLDD    -56(%r30),%fr30 ;offset 0x84
        FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
        ..... code omitted to save space.

When the same code is compiled with -DTYPE=float, note that FCNV is used to convert float to double and back, and that FMPY and FSUB operate in double precision. FADD does not.

Code:

process                                                             
        .PROC                                                       
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE 
        .ENTRY                                                      
        STW     %r2,-20(%r30)   ;offset 0x0                         
        LDO     64(%r30),%r30   ;offset 0x4                         
*       FSTW    %fr4L,-100(%r30)        ;offset 0x8                 
*       FSTW    %fr5L,-104(%r30)        ;offset 0xc                 
*       FLDW    -100(%r30),%fr4L        ;offset 0x10                
*       FSTW    %fr4L,-56(%r30) ;offset 0x14                        
*       FLDW    -56(%r30),%fr4R ;offset 0x18                        
*       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c                
*       LDIL    LR'S$6$process,%r1      ;offset 0x20                
*       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24        
*       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28                
*       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c                
        FSTW    %fr5L,-56(%r30) ;offset 0x30                        
        FLDW    -56(%r30),%fr5R ;offset 0x34                        
        FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38                
        LDIL    LR'S$6$process,%r31     ;offset 0x3c                
        FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40        
        FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44                
        FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48                
        FSTW    %fr6L,-56(%r30) ;offset 0x4c                        
        FLDW    -56(%r30),%fr6R ;offset 0x50                        
        FLDW    -104(%r30),%fr7L        ;offset 0x54                
        FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58        
        FSTW    %fr7R,-56(%r30) ;offset 0x5c                        
        FLDW    -100(%r30),%fr8L        ;offset 0x60                
        FLDW    -104(%r30),%fr8R        ;offset 0x64                
        FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68        
        FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c                
        LDIL    L'fabs,%r31     ;offset 0x70                        
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74                
        COPY    %r31,%r2        ;offset 0x78                        
        FCPY,DBL        %fr0,%fr10      ;offset 0x7c                
        FCMP,DBL,>      %fr4,%fr10      ;offset 0x80                
        FTEST           ;offset 0x84                                
        B,N     $00000001       ;offset 0x88   
        .... code omitted

Please notice that floats are converted to doubles (FCNV mnemonic is float convert)
before most arithmetic operations on floats.

chroot · Dec 30, 2005

Jim,

I never said double-precision is always slower than single-precision. I just took offense to your statement that single-precision is never faster than double-precision. I don't have much experience with the PA-RISC instruction set, but, on many processors, under many conditions, single-precision is indeed faster than double-precision.

- Warren
