Data type: Float Point or Double?

AI Thread Summary
The discussion centers on the use of different data types in programming, particularly the comparison between float and double types. Double precision is favored in computational physics for its accuracy, allowing for significant digit calculations with minimal rounding errors. While float can suffice for simpler tasks, it is deemed inadequate for critical calculations, such as those needed for space missions. The conversation highlights that the size of float and double types varies across compilers, with floats typically being 4 bytes and doubles 8 bytes. Although double precision arithmetic is generally slower, some processors may optimize double operations, making them more efficient in specific contexts. The debate also touches on how different compilers handle these data types, with examples from assembly code illustrating the performance implications of using float versus double. Ultimately, the choice between these data types depends on the required precision and the computational context.
dimensionless
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?
 
dimensionless said:
I'm sure one could find a reason to use any data type. There is a lot you can do with just integers. I do, however, have a computational physics book, and it makes wide use of the type double. Could float be used as an alternative?

They use double because it gives the best accuracy: double-precision floating-point values let them carry more significant digits through a calculation, with the smallest possible loss of precision due to rounding.

Float would be fine for a homework problem, but I would not use it when calculating vectors on a trip to Mars.
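
To make the rounding argument concrete, here is a minimal sketch (not from the original posts) that adds 0.1 to an accumulator many times in each precision. The function names are made up for illustration. On typical IEEE 754 hardware the float total drifts visibly from the exact answer after a million additions, while the double total stays very close to it.

```c
#include <stddef.h>

/* Hypothetical helper: add 0.1 to a single-precision accumulator n times. */
float sum_tenths_float(size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += 0.1f;          /* each addition rounds to float precision */
    return total;
}

/* Hypothetical helper: the same loop with a double-precision accumulator. */
double sum_tenths_double(size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += 0.1;           /* each addition rounds to double precision */
    return total;
}
```

Calling both with n = 1000000 and printing the results next to the exact value 100000 shows how much more rounding error the float version accumulates.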
 
Languages have limits on their implementations of data types.
In C, "#include <float.h>" brings in defined values that tell you how much precision a floating-point data type has ("<limits.h>" covers the integer types).

Number of accurate digits in floating point from HP UX (UNIX) C:

FLT_DIG 6 digits of precision
DBL_DIG 15 digits of precision

Which would you rather have when, with a floating-point processor, "double" math operations take no more CPU than "float" math operations?
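
For anyone who wants to check these values on their own system, here is a small sketch. In standard C the macros FLT_DIG and DBL_DIG come from <float.h>. The function name precision_report is just an illustrative choice.

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the decimal precision of float and double
   into buf. Returns whatever snprintf returns (characters needed). */
int precision_report(char *buf, size_t size)
{
    return snprintf(buf, size, "FLT_DIG=%d DBL_DIG=%d", FLT_DIG, DBL_DIG);
}
```

Passing a 64-byte buffer and printing it with puts() is enough to see the numbers. The C standard only guarantees minimums (at least 6 for FLT_DIG and 10 for DBL_DIG), so other platforms may report different values than the HP-UX ones quoted above.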
 
Jim,

Different compilers, in fact, regard the types "float" and "double" differently. On most compilers, floats are 4 bytes and doubles are 8 bytes.
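
Since the sizes are up to the implementation, it is easy to check rather than assume. A minimal sketch follows (the helper names are hypothetical, not from any library):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if this compiler uses the common layout of
   4-byte float and 8-byte double. */
int uses_typical_sizes(void)
{
    return sizeof(float) == 4 && sizeof(double) == 8;
}

/* Hypothetical helper: write the actual sizes into buf for printing. */
int size_report(char *buf, size_t size)
{
    return snprintf(buf, size, "sizeof(float)=%zu sizeof(double)=%zu",
                    sizeof(float), sizeof(double));
}
```

Code that depends on a particular size can test uses_typical_sizes() once at startup instead of relying on the compiler's defaults.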

Also, double-precision arithmetic is certainly slower than single-precision arithmetic. Modern processors have vector-math units (MMX, SSE, etc.), the use of which is scheduled by your compiler. You can do twice as many single-precision operations per unit time as double-precision operations with these vector-math units.
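
The "twice as many" figure follows from register width: a 128-bit SSE register holds four 4-byte floats but only two 8-byte doubles. The sketch below spells out that arithmetic and shows the kind of loop a compiler can often auto-vectorize. The names lanes, axpy_float and axpy_double are illustrative, and whether a given compiler actually vectorizes these loops depends on its flags and target.

```c
#include <stddef.h>

/* Number of elements of elem_bytes each that fit in one vector register
   of register_bytes. */
size_t lanes(size_t register_bytes, size_t elem_bytes)
{
    return register_bytes / elem_bytes;
}

/* y[i] += a * x[i] in single precision. */
void axpy_float(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* The same loop in double precision: half as many elements per register. */
void axpy_double(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

With 16-byte registers, lanes(16, sizeof(float)) is 4 and lanes(16, sizeof(double)) is 2 on a platform with 4-byte floats and 8-byte doubles.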

- Warren
 
chroot -

I disagree with your statement that double precision is always slower than single. If your model were correct everywhere, the following assembler sequences would never occur.

HPUX V-class PA-RISC boxes preferentially use double precision FP
operations, because they are more efficient. I've been on other platforms
where this is also true. Here is a concrete HPUX 11.00 example.

Consider some C code compiled with cc -S myfile.c -DTYPE=<float or double, see below> to create ASM:

Code:
#include <math.h>
/* TYPE defined by -DTYPE=float or -DTYPE=double */
TYPE process(TYPE a, TYPE b)
{
    TYPE tmp=a;
    tmp -=.05;
    tmp*=.5;
    tmp+=b;
    if ( fabs(a+b)>0.) tmp/=(a+b);
    return tmp;
}
On HPUX V-class PA-RISC boxes, this is the assembler produced when the code is
compiled with -DTYPE=double.
I put some *'s in front of one area of interest. Note there are no FCNV instructions,
and all FP operations (FMPY, FSUB, etc.) are on double-precision FP numbers.
Code:
process
        .PROC
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE
        .ENTRY
        STW     %r2,-20(%r30)   ;offset 0x0
        LDO     64(%r30),%r30   ;offset 0x4
*       FSTD    %fr5,-104(%r30) ;offset 0x8
*       FSTD    %fr7,-112(%r30) ;offset 0xc
*       FLDD    -104(%r30),%fr4 ;offset 0x10
*       FSTD    %fr4,-56(%r30)  ;offset 0x14
*       FLDD    -56(%r30),%fr5  ;offset 0x18
*       LDIL    LR'S$6$process,%r1      ;offset 0x1c
*       FLDD    RR'S$6$process(%r1),%fr6        ;offset 0x20
*       FSUB,DBL        %fr5,%fr6,%fr7  ;offset 0x24
        FSTD    %fr7,-56(%r30)  ;offset 0x28
        FLDD    -56(%r30),%fr8  ;offset 0x2c
        LDIL    LR'S$6$process,%r31     ;offset 0x30
        FLDD    RR'S$6$process+8(%r31),%fr9     ;offset 0x34
        FMPY,DBL        %fr8,%fr9,%fr10 ;offset 0x38
        FSTD    %fr10,-56(%r30) ;offset 0x3c
        FLDD    -56(%r30),%fr11 ;offset 0x40
        FLDD    -112(%r30),%fr22        ;offset 0x44
        FADD,DBL        %fr11,%fr22,%fr23       ;offset 0x48
        FSTD    %fr23,-56(%r30) ;offset 0x4c
        FLDD    -104(%r30),%fr24        ;offset 0x50
        FLDD    -112(%r30),%fr25        ;offset 0x54
        FADD,DBL        %fr24,%fr25,%fr5        ;offset 0x58
        LDIL    L'fabs,%r31     ;offset 0x5c
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x60
        COPY    %r31,%r2        ;offset 0x64
        FCPY,DBL        %fr0,%fr26      ;offset 0x68
        FCMP,DBL,>      %fr4,%fr26      ;offset 0x6c
        FTEST           ;offset 0x70
        B,N     $00000001       ;offset 0x74
        FLDD    -104(%r30),%fr27        ;offset 0x78
        FLDD    -112(%r30),%fr28        ;offset 0x7c
        FADD,DBL        %fr27,%fr28,%fr29       ;offset 0x80
        FLDD    -56(%r30),%fr30 ;offset 0x84
        FDIV,DBL        %fr30,%fr29,%fr31       ;offset 0x88
        ..... code omitted to save space.

When the same code is compiled with -DTYPE=float, note that FCNV instructions convert float to double, and that FSUB and FMPY operate in double precision. FADD does not.
Code:
process                                                             
        .PROC                                                       
        .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED,ORDERING_AWARE 
        .ENTRY                                                      
        STW     %r2,-20(%r30)   ;offset 0x0                         
        LDO     64(%r30),%r30   ;offset 0x4                         
*       FSTW    %fr4L,-100(%r30)        ;offset 0x8                 
*       FSTW    %fr5L,-104(%r30)        ;offset 0xc                 
*       FLDW    -100(%r30),%fr4L        ;offset 0x10                
*       FSTW    %fr4L,-56(%r30) ;offset 0x14                        
*       FLDW    -56(%r30),%fr4R ;offset 0x18                        
*       FCNV,SGL,DBL    %fr4R,%fr4      ;offset 0x1c                
*       LDIL    LR'S$6$process,%r1      ;offset 0x20                
*       FLDD    RR'S$6$process(%r1),%fr5        ;offset 0x24        
*       FSUB,DBL        %fr4,%fr5,%fr6  ;offset 0x28                
*       FCNV,DBL,SGL    %fr6,%fr5L      ;offset 0x2c                
        FSTW    %fr5L,-56(%r30) ;offset 0x30                        
        FLDW    -56(%r30),%fr5R ;offset 0x34                        
        FCNV,SGL,DBL    %fr5R,%fr7      ;offset 0x38                
        LDIL    LR'S$6$process,%r31     ;offset 0x3c                
        FLDD    RR'S$6$process+8(%r31),%fr8     ;offset 0x40        
        FMPY,DBL        %fr7,%fr8,%fr9  ;offset 0x44                
        FCNV,DBL,SGL    %fr9,%fr6L      ;offset 0x48                
        FSTW    %fr6L,-56(%r30) ;offset 0x4c                        
        FLDW    -56(%r30),%fr6R ;offset 0x50                        
        FLDW    -104(%r30),%fr7L        ;offset 0x54                
        FADD,SGL        %fr6R,%fr7L,%fr7R       ;offset 0x58        
        FSTW    %fr7R,-56(%r30) ;offset 0x5c                        
        FLDW    -100(%r30),%fr8L        ;offset 0x60                
        FLDW    -104(%r30),%fr8R        ;offset 0x64                
        FADD,SGL        %fr8L,%fr8R,%fr9L       ;offset 0x68        
        FCNV,SGL,DBL    %fr9L,%fr5      ;offset 0x6c                
        LDIL    L'fabs,%r31     ;offset 0x70                        
        .CALL   ARGW0=FR,ARGW1=FU,RTNVAL=FU     ;fpin=105;fpout=104;
        BE,L    R'fabs(%sr4,%r31),%r31  ;offset 0x74                
        COPY    %r31,%r2        ;offset 0x78                        
        FCPY,DBL        %fr0,%fr10      ;offset 0x7c                
        FCMP,DBL,>      %fr4,%fr10      ;offset 0x80                
        FTEST           ;offset 0x84                                
        B,N     $00000001       ;offset 0x88   
        .... code omitted
Please notice that floats are converted to doubles (FCNV mnemonic is float convert)
before most arithmetic operations on floats.
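
One thing worth keeping in mind when reading the float listing: in C, an unsuffixed constant such as .05 has type double, so "tmp -= .05" with a float tmp is evaluated in double and then converted back. That matches the FCNV / FSUB,DBL / FCNV pattern above, while "tmp += b" involves only floats and shows up as FADD,SGL. The sketch below illustrates the language rule only. The function names are hypothetical, and the thread does not show what the HP compiler emits when the constants carry an f suffix.

```c
#include <stddef.h>

/* Size of the type in which "tmp - .05" is computed: .05 is a double
   constant, so the float operand is converted to double first. */
size_t width_with_double_constant(float tmp)
{
    return sizeof(tmp - .05);
}

/* Size of the type in which "tmp - .05f" is computed: both operands are
   float, so the expression has type float. */
size_t width_with_float_constant(float tmp)
{
    return sizeof(tmp - .05f);
}

/* The first step of process() as posted, with a double constant. */
float sub_double_constant(float tmp)
{
    tmp -= .05;     /* float -> double, subtract, double -> float */
    return tmp;
}

/* The same step with a float constant; no conversion is required by C. */
float sub_float_constant(float tmp)
{
    tmp -= .05f;
    return tmp;
}
```

Comparing the assembler for the last two functions on a given platform would show how much of the conversion traffic comes from the constants rather than from the hardware.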
 
Jim,

I never said double-precision is always slower than single-precision. I just took offense to your statement that single-precision is never faster than double-precision. I don't have much experience with the PA-RISC instruction set, but, on many processors, under many conditions, single-precision is indeed faster than double-precision.

- Warren
 