# Data type: Float Point or Double?

I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

They use double because double-precision floating-point values give the best accuracy: calculations carry more significant digits and lose as little precision to rounding as possible.

Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.
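To make the rounding point concrete, here is a minimal sketch that adds 0.1 to an accumulator many times in each type. The function names (sum_float, sum_double, report_drift) are made up for this illustration and do not come from the book or the thread.

```c
#include <stdio.h>

/* Add `step` to a float accumulator n times and return the total. */
float sum_float(float step, long n)
{
    float acc = 0.0f;
    for (long i = 0; i < n; ++i)
        acc += step;
    return acc;
}

/* Same loop with a double accumulator. */
double sum_double(double step, long n)
{
    double acc = 0.0;
    for (long i = 0; i < n; ++i)
        acc += step;
    return acc;
}

/* Print both totals next to a reference value computed with one multiply. */
void report_drift(long n)
{
    double exact = 0.1 * (double)n;
    double f = sum_float(0.1f, n);
    double d = sum_double(0.1, n);
    printf("float  total: %.6f (differs by %g)\n", f, f - exact);
    printf("double total: %.6f (differs by %g)\n", d, d - exact);
}
```

Calling report_drift(1000000) from main should show the float total drifting much further from the reference than the double total; the exact figures depend on the compiler and hardware.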

Languages have limits on their implementations of data types.
In C, "#include <float.h>" brings in defined values that tell you how much precision each floating-point data type has (<limits.h> covers only the integer types).

Number of accurate digits in floating point, from HP-UX (UNIX) C:

FLT_DIG 6 digits of precision
DBL_DIG 15 digits of precision
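If you want to check these values on your own system, a small sketch like the following prints them. FLT_DIG, DBL_DIG, FLT_EPSILON and DBL_EPSILON are standard macros from <float.h>; the function name print_fp_limits is just a placeholder for this example.

```c
#include <float.h>
#include <stdio.h>

/* Print the guaranteed decimal digits and the machine epsilon of each type. */
void print_fp_limits(void)
{
    printf("float:  FLT_DIG = %d, FLT_EPSILON = %g\n", FLT_DIG, FLT_EPSILON);
    printf("double: DBL_DIG = %d, DBL_EPSILON = %g\n", DBL_DIG, DBL_EPSILON);
}
```

The C standard only guarantees minimums (at least 6 digits for float and 10 for double), so the numbers printed can differ from one platform to another.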

Which would you rather have when it takes no more CPU (with a floating point processor) to do "double" math or "float" math operations?

Different compilers, in fact, regard the types "float" and "double" differently. On most compilers, floats are 4 bytes and doubles are 8 bytes.
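Since the sizes are up to the compiler, it is easy enough to ask it directly. This is only a sketch; print_fp_sizes is a name invented for the example, while sizeof, FLT_MANT_DIG and DBL_MANT_DIG are standard C.

```c
#include <float.h>
#include <stdio.h>

/* Print the storage size and significand length the compiler uses. */
void print_fp_sizes(void)
{
    /* *_MANT_DIG counts significand digits in base FLT_RADIX (usually 2). */
    printf("float:  %zu bytes, %d significand digits\n",
           sizeof(float), FLT_MANT_DIG);
    printf("double: %zu bytes, %d significand digits\n",
           sizeof(double), DBL_MANT_DIG);
}
```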

Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (SSE, etc.), the use of which is scheduled by your compiler. You can do twice as many single-precision operations per unit time as double-precision operations with these vector-math units.
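Speed claims like this are best measured on the machine in question. Below is a rough timing sketch, not a rigorous benchmark: it times the same multiply-add loop in both types with clock(). The array length N, the function names and the loop body are all assumptions chosen for the illustration, and the result depends heavily on compiler flags and hardware.

```c
#include <stdio.h>
#include <time.h>

#define N 4096  /* array length; an arbitrary choice for this sketch */

static float  xf[N], yf[N];
static double xd[N], yd[N];

/* Time `repeats` passes of y = 0.5*x + y in single precision; returns seconds. */
double time_axpy_float(int repeats)
{
    volatile float sink;  /* keeps the result observable to the optimizer */
    for (int i = 0; i < N; ++i) { xf[i] = 1.0f; yf[i] = 0.0f; }
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        for (int i = 0; i < N; ++i)
            yf[i] = 0.5f * xf[i] + yf[i];
    sink = yf[N / 2];
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop in double precision. */
double time_axpy_double(int repeats)
{
    volatile double sink;
    for (int i = 0; i < N; ++i) { xd[i] = 1.0; yd[i] = 0.0; }
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        for (int i = 0; i < N; ++i)
            yd[i] = 0.5 * xd[i] + yd[i];
    sink = yd[N / 2];
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print both timings side by side. */
void compare_axpy(int repeats)
{
    printf("float:  %.3f s\n", time_axpy_float(repeats));
    printf("double: %.3f s\n", time_axpy_double(repeats));
}
```

Try compare_axpy(100000) with and without optimization; whether float comes out ahead is exactly the thing that varies between platforms.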

I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

HPUX V-class PA-RISC boxes preferentially use double precision FP
operations, because they are more efficient. I've been on other platforms
where this is also true. Here is a concrete HPUX 11.00 example.

Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

Code:
#include <math.h>
/* TYPE defined by -DTYPE=float or -DTYPE=double */
TYPE process(TYPE a, TYPE b)
{
    TYPE tmp = a;
    tmp -= .05;
    tmp *= .5;
    tmp += b;
    if (fabs(a + b) > 0.) tmp /= (a + b);
    return tmp;
}
On HPUX V-class PA-RISC boxes, this is the assembler the compiler produces when compiled with -DTYPE=double.
I put some *'s in front of one area of interest. Note there are no FCNV calls,
and all FP operations (FMPY, FSUB, etc.) are on double-precision FP numbers.
Code:
process
.PROC
.CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
.ENTRY
STW     %r2,-20(%r30)   ;offset 0x0
LDO     64(%r30),%r30   ;offset 0x4
*       FSTD    %fr5,-104(%r30) ;offset 0x8
*       FSTD    %fr7,-112(%r30) ;offset 0xc
*       FLDD    -104(%r30),%fr4 ;offset 0x10
*       FSTD    %fr4,-56(%r30)  ;offset 0x14
*       FLDD    -56(%r30),%fr5  ;offset 0x18
*       LDIL    LR'S$6$process,%r1      ;offset 0x1c
*       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
*       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
FSTD    %fr7,-56(%r30)  ;offset 0x28
FLDD    -56(%r30),%fr8  ;offset 0x2c
LDIL    LR'S$6$process,%r31     ;offset 0x30
FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
FSTD    %fr10,-56(%r30) ;offset 0x3c
FLDD    -56(%r30),%fr11 ;offset 0x40
FLDD    -112(%r30),%fr22        ;offset 0x44
FSTD    %fr23,-56(%r30) ;offset 0x4c
FLDD    -104(%r30),%fr24        ;offset 0x50
FLDD    -112(%r30),%fr25        ;offset 0x54
LDIL    L'fabs,%r31     ;offset 0x5c
.CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
COPY    %r31,%r2        ;offset 0x64
FCPY,DBL        %fr0,%fr26      ;offset 0x68
FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
FTEST           ;offset 0x70
B,N     $00000001       ;offset 0x74
FLDD    -104(%r30),%fr27        ;offset 0x78
FLDD    -112(%r30),%fr28        ;offset 0x7c
FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
FLDD    -56(%r30),%fr30 ;offset 0x84
FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
.............. code omitted to save space.
When the same code is compiled with -DTYPE=float, note that FCNV is called to convert
float to double, and FMPY and FSUB use double precision. FADD does not.
Code:
process
.PROC
.CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
.ENTRY
STW     %r2,-20(%r30)   ;offset 0x0
LDO     64(%r30),%r30   ;offset 0x4
*       FSTW    %fr4L,-100(%r30)        ;offset 0x8
*       FSTW    %fr5L,-104(%r30)        ;offset 0xc
*       FLDW    -100(%r30),%fr4L        ;offset 0x10
*       FSTW    %fr4L,-56(%r30) ;offset 0x14
*       FLDW    -56(%r30),%fr4R ;offset 0x18
*       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c
*       LDIL    LR'S$6$process,%r1      ;offset 0x20
*       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24
*       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28
*       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c
FSTW    %fr5L,-56(%r30) ;offset 0x30
FLDW    -56(%r30),%fr5R ;offset 0x34
FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38
LDIL    LR'S$6$process,%r31     ;offset 0x3c
FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40
FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44
FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48
FSTW    %fr6L,-56(%r30) ;offset 0x4c
FLDW    -56(%r30),%fr6R ;offset 0x50
FLDW    -104(%r30),%fr7L        ;offset 0x54
FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58
FSTW    %fr7R,-56(%r30) ;offset 0x5c
FLDW    -100(%r30),%fr8L        ;offset 0x60
FLDW    -104(%r30),%fr8R        ;offset 0x64
FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68
FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c
LDIL    L'fabs,%r31     ;offset 0x70
.CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74
COPY    %r31,%r2        ;offset 0x78
FCPY,DBL        %fr0,%fr10      ;offset 0x7c
FCMP,DBL,>      %fr4,%fr10      ;offset 0x80
FTEST           ;offset 0x84
B,N     $00000001       ;offset 0x88
............. code omitted
Please notice that floats are converted to doubles (FCNV mnemonic is float convert)
before most arithmetic operations on floats.
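One thing that may contribute to those conversions, offered as an assumption rather than a reading of the HP compiler: in C an unsuffixed constant such as .05 has type double, so "tmp -= .05" with a float tmp is evaluated in double and converted back, and fabs() takes a double as well. The sketch below contrasts the two spellings; the function names are hypothetical and the listings above were not regenerated with it.

```c
#include <stdio.h>

/* Constants written as in the thread: .05 and .5 are double literals,
   so each operation promotes tmp to double and converts the result back. */
float scale_with_double_literals(float a)
{
    float tmp = a;
    tmp -= .05;
    tmp *= .5;
    return tmp;
}

/* Same arithmetic with an f suffix: the constants are float,
   so the language does not require any promotion to double. */
float scale_with_float_literals(float a)
{
    float tmp = a;
    tmp -= .05f;
    tmp *= .5f;
    return tmp;
}

/* Show the storage size the compiler gives each kind of literal. */
void print_literal_sizes(void)
{
    printf("sizeof(.05)  = %zu\n", sizeof(.05));
    printf("sizeof(.05f) = %zu\n", sizeof(.05f));
}
```

Comparing the assembler for the two functions (for example with cc -S) would show whether the conversions come from the source or from the hardware preferring double.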
