
Data type: Float Point or Double?

  1. Dec 23, 2005 #1
    I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book that makes wide use of the type double. Could float be used as an alternative?
  3. Dec 23, 2005 #2
    They use double because it gives the best accuracy: double-precision floating-point values carry more significant digits, so calculations lose as little precision to rounding as possible.

    Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.
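To make the rounding point concrete, here is a minimal C sketch. The function names are hypothetical, chosen for this illustration. It adds 0.1 to a running total n times, once in float and once in double, and measures how far each total drifts from the exact value n/10. It assumes IEEE 754 single and double precision with ordinary round-to-nearest arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: add 0.1 to a float total n times.
   Every addition is rounded to float (about 7 significant digits). */
float sum_tenths_float(size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += 0.1f;
    return sum;
}

/* Same loop in double (about 16 significant digits). */
double sum_tenths_double(size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += 0.1;
    return sum;
}

/* Absolute difference between x and y, without needing libm. */
static double abs_diff(double x, double y)
{
    double d = x - y;
    return d < 0 ? -d : d;
}

/* Distance of each running total from the exact answer n/10. */
double float_error(size_t n)  { return abs_diff(sum_tenths_float(n), n / 10.0); }
double double_error(size_t n) { return abs_diff(sum_tenths_double(n), n / 10.0); }
```

With n = 1000000, float_error(n) should come out far larger than double_error(n). The exact figures depend on the platform and compiler flags.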
  4. Dec 28, 2005 #3

    jim mcnamara

    Science Advisor
    Gold Member

    Languages have limits on their implementations of data types.
    In C, "#include <float.h>" brings in defined values that tell you how much precision each floating-point type has.

    Number of accurate decimal digits for the floating-point types, from HP-UX (UNIX) C:

    FLT_DIG 6 digits of precision
    DBL_DIG 15 digits of precision

    Which would you rather have when it takes no more CPU (with a floating point processor) to do "double" math or "float" math operations?
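Here is a short C sketch for checking these limits on your own platform. It assumes a hosted C99 compiler, and the function names are hypothetical. On IEEE 754 systems FLT_DIG and DBL_DIG are 6 and 15. The C standard itself only guarantees at least 6 and 10.

```c
#include <float.h>
#include <stdio.h>

/* Print the sizes and limits the implementation advertises in <float.h>. */
void print_fp_limits(void)
{
    printf("float : %zu bytes, FLT_DIG = %d, FLT_EPSILON = %g\n",
           sizeof(float), FLT_DIG, (double)FLT_EPSILON);
    printf("double: %zu bytes, DBL_DIG = %d, DBL_EPSILON = %g\n",
           sizeof(double), DBL_DIG, DBL_EPSILON);
}

/* Store 1/3 in each type and print 20 decimals: the digits beyond
   roughly the FLT_DIG / DBL_DIG position are rounding noise. */
void show_one_third(void)
{
    float  f = 1.0f / 3.0f;
    double d = 1.0 / 3.0;
    printf("float : %.20f\n", f);   /* f is promoted to double by printf */
    printf("double: %.20f\n", d);
}
```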
  5. Dec 28, 2005 #4


    Staff Emeritus
    Science Advisor
    Gold Member


    Different compilers do, in fact, implement the types "float" and "double" differently, since the C standard sets only minimum precision requirements for them. On most compilers, floats are 4 bytes and doubles are 8 bytes.

    Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (SSE, SSE2, etc.), and your compiler decides when to use them. Because each vector register holds twice as many floats as doubles, these units can do twice as many single-precision operations per unit time as double-precision operations.

    - Warren
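The "twice as many" figure can be sketched with the GCC/Clang vector extension. This is a compiler extension, not ISO C, and the type and function names are hypothetical. A 16-byte vector, the width of an SSE register, holds four floats but only two doubles, so one vector add covers twice as many single-precision values. Whether the compiler actually emits SSE instructions for it depends on the target and flags.

```c
#include <stddef.h>

/* 16-byte (128-bit) vectors, the width of one SSE register.
   vector_size is a GCC/Clang extension, not ISO C. */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Number of elements of each type that fit in one vector. */
static const size_t float_lanes  = sizeof(v4sf) / sizeof(float);   /* 4 */
static const size_t double_lanes = sizeof(v2df) / sizeof(double);  /* 2 */

/* Element-wise addition: one vector operation covers every lane. */
v4sf add_floats(v4sf a, v4sf b)  { return a + b; }   /* 4 float adds  */
v2df add_doubles(v2df a, v2df b) { return a + b; }   /* 2 double adds */
```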
  6. Dec 30, 2005 #5

    jim mcnamara

    Science Advisor
    Gold Member

    chroot -

    I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

    HP-UX V-class PA-RISC boxes preferentially use double-precision FP
    operations, because they are more efficient. I've been on other platforms
    where this is also true. Here is a concrete HP-UX 11.00 example.

    Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

    Code (Text):

    #include <math.h>
    /* TYPE defined by -DTYPE=float or -DTYPE=double */
    #ifndef TYPE
    #define TYPE double  /* default when no -DTYPE is given */
    #endif
    TYPE process(TYPE a, TYPE b)
    {
        TYPE tmp = a;
        tmp -= .05;
        if (fabs(a + b) > 0.) tmp /= (a + b);
        return tmp;
    }

    On HP-UX V-class PA-RISC boxes, compiling with -DTYPE=double produces the assembler below.
    I put some *'s in front of one area of interest. Note there are no FCNV instructions,
    and all FP operations (FMPY, FSUB, etc.) are on double-precision FP numbers.
    Code (Text):

            STW     %r2,-20(%r30)   ;offset 0x0
            LDO     64(%r30),%r30   ;offset 0x4
    *       FSTD    %fr5,-104(%r30) ;offset 0x8
    *       FSTD    %fr7,-112(%r30) ;offset 0xc
    *       FLDD    -104(%r30),%fr4 ;offset 0x10
    *       FSTD    %fr4,-56(%r30)  ;offset 0x14
    *       FLDD    -56(%r30),%fr5  ;offset 0x18
    *       LDIL    LR'S$6$process,%r1      ;offset 0x1c
    *       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
    *       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
            FSTD    %fr7,-56(%r30)  ;offset 0x28
            FLDD    -56(%r30),%fr8  ;offset 0x2c
            LDIL    LR'S$6$process,%r31     ;offset 0x30
            FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
            FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
            FSTD    %fr10,-56(%r30) ;offset 0x3c
            FLDD    -56(%r30),%fr11 ;offset 0x40
            FLDD    -112(%r30),%fr22        ;offset 0x44
            FADD,DBL        %fr11,%fr22,%fr23       ;offset 0x48
            FSTD    %fr23,-56(%r30) ;offset 0x4c
            FLDD    -104(%r30),%fr24        ;offset 0x50
            FLDD    -112(%r30),%fr25        ;offset 0x54
            FADD,DBL        %fr24,%fr25,%fr5        ;offset 0x58
            LDIL    L'fabs,%r31     ;offset 0x5c
            .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
            BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
            COPY    %r31,%r2        ;offset 0x64
            FCPY,DBL        %fr0,%fr26      ;offset 0x68
            FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
            FTEST           ;offset 0x70
            B,N     $00000001       ;offset 0x74
            FLDD    -104(%r30),%fr27        ;offset 0x78
            FLDD    -112(%r30),%fr28        ;offset 0x7c
            FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
            FLDD    -56(%r30),%fr30 ;offset 0x84
            FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
            .............. code omitted to save space.
    When the same code is compiled with -DTYPE=float, note that FCNV instructions convert between float and double, FMPY and FSUB operate in double precision, and FADD does not.
    Code (Text):

            STW     %r2,-20(%r30)   ;offset 0x0                        
            LDO     64(%r30),%r30   ;offset 0x4                        
    *       FSTW    %fr4L,-100(%r30)        ;offset 0x8                
    *       FSTW    %fr5L,-104(%r30)        ;offset 0xc                
    *       FLDW    -100(%r30),%fr4L        ;offset 0x10                
    *       FSTW    %fr4L,-56(%r30) ;offset 0x14                        
    *       FLDW    -56(%r30),%fr4R ;offset 0x18                        
    *       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c                
    *       LDIL    LR'S$6$process,%r1      ;offset 0x20                
    *       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24        
    *       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28                
    *       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c                
            FSTW    %fr5L,-56(%r30) ;offset 0x30                        
            FLDW    -56(%r30),%fr5R ;offset 0x34                        
            FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38                
            LDIL    LR'S$6$process,%r31     ;offset 0x3c                
            FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40        
            FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44                
            FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48                
            FSTW    %fr6L,-56(%r30) ;offset 0x4c                        
            FLDW    -56(%r30),%fr6R ;offset 0x50                        
            FLDW    -104(%r30),%fr7L        ;offset 0x54                
            FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58        
            FSTW    %fr7R,-56(%r30) ;offset 0x5c                        
            FLDW    -100(%r30),%fr8L        ;offset 0x60                
            FLDW    -104(%r30),%fr8R        ;offset 0x64                
            FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68        
            FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c                
            LDIL    L'fabs,%r31     ;offset 0x70                        
            .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
            BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74                
            COPY    %r31,%r2        ;offset 0x78                        
            FCPY,DBL        %fr0,%fr10      ;offset 0x7c                
            FCMP,DBL,>      %fr4,%fr10      ;offset 0x80                
            FTEST           ;offset 0x84                                
            B,N     $00000001       ;offset 0x88  
            ............. code omitted
    Please notice that floats are converted to doubles (FCNV is the floating-point convert mnemonic)
    before most arithmetic operations on floats.
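One possible reading of the float listing, assuming standard C semantics: `.05` is a double constant, so `tmp -= .05` is evaluated as `tmp = tmp - .05` in double and the result is converted back to float. `fabs()` also takes a double argument. That appears consistent with the FCNV,SGL,DBL / FSUB,DBL / FCNV,DBL,SGL sequence and with the conversion just before the fabs call. `a+b`, by contrast, has two float operands and shows up as FADD,SGL. The minimal sketch below is a hypothetical all-float variant (`process_f` is not from the thread). It uses float constants and `fabsf()`, so the source no longer asks for any double arithmetic. What the HP-UX compiler would emit for it is not shown here.

```c
#include <math.h>

/* Hypothetical single-precision variant of process(): the constants
   carry an f suffix and fabsf() replaces fabs(), so every operand in
   the source is a float and the source requests no double arithmetic. */
float process_f(float a, float b)
{
    float tmp = a;
    tmp -= .05f;                          /* float - float */
    if (fabsf(a + b) > 0.f) tmp /= (a + b);
    return tmp;
}
```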
  7. Dec 30, 2005 #6


    Staff Emeritus
    Science Advisor
    Gold Member


    I never said double-precision is always slower than single-precision. I just took issue with your statement that single-precision is never faster than double-precision. I don't have much experience with the PA-RISC instruction set, but on many processors, under many conditions, single-precision is indeed faster than double-precision.

    - Warren