C/++/# Precision double in C/C++?

  1. Oct 25, 2016 #1

    ORF

    Hello

    I am currently using the double type for numbers in the format 6f8 (6 digits before the decimal point, 8 digits after it).

    Is a double enough for this format, or should I use int64_t instead?

    Thank you in advance :)

    Greetings.
    PS: I tried this example code, but the result is a bit strange to me... (is precision lost for a 4-digit double?)
    Code (C):

    #include <iostream>
    #include <stdint.h>

    int main()
    {
      std::cout << "check precision of Double vs uint64_t\n";
      uint64_t k(1);
      while( k < 1e14 )
      {
        k = k*3 + 1;
        //-- check precision of myDouble
        double myDouble( k/1e8 );
        uint64_t test( 1e8 * myDouble );
        if( k != test ) std::cout << k << "\t" << test << std::endl;
      }
      return 0;
    }
     
    http://cpp.sh/8zqfx
    The output I got is:
    check precision of Double vs uint64_t
    3280 3279
    3812798742493 3812798742492
     
  3. Oct 25, 2016 #2

    rbelli1

    Gold Member

    double is not a structure. It is an IEEE754 floating point type. Converting a double to an integer type in C++ truncates, so if your actual number comes out to be 3812798742492.999 then you will still get 3812798742492 when it is converted to uint64_t. Try printing out the double value too and see what happens.

    BoB
     
  4. Oct 25, 2016 #3

    ORF

    Hello

    You were right: without conversion (truncation) it seems that the number is the same
    Code (C):
    #include <iostream>
    #include <iomanip>
    #include <stdint.h>

    int main()
    {
      std::cout << "check precision of Double vs uint64_t\n";
      uint64_t k(1);
      while( k < 1e14 )
      {
        k = k*3 + 1;
        //-- check precision of myDouble
        double myDouble( k/1e8 );
        uint64_t test( 1e8 * myDouble );
        if( k != test ) std::cout << std::setprecision(16) << k << "\t" << myDouble*1e8 << std::endl;
      }
      return 0;
    }
     
    So, does this mean that no digit is lost with the 6f8 format when using doubles? Where is the limit? (When k > 1e15, the last digits are lost...)

    Thank you for your time :)

    Greetings
     
  5. Oct 25, 2016 #4

    rcgldr

    Homework Helper

    An IEEE754 double has a 52 bit stored mantissa (53 bits of precision counting the implicit leading bit), good enough for 15 digits (log10(2^52) = 15.6...), so 14 digits (6f8) shouldn't be an issue if you round instead of truncate.

    Code (Text):

    #include <iostream>

    #include <stdint.h>    // uint64_t; a hand-written typedef can clash with the library's own definition

    int main()
    {
        std::cout << "check precision of Double vs uint64_t\n";
        uint64_t k(1);
        while( k < 1e14 )
        {
            k=k*3+1;
            //-- check precision of myDouble
            double myDouble = k/1e8;    // 1e8 is a double
            uint64_t test = (uint64_t)( 1e8 * myDouble + 0.5);  // + 0.5 for round
            if( k != test ) std::cout << k << "\t" << test << std::endl;
        }
        return 0;
    }
     
     
    Last edited: Oct 27, 2016
  6. Oct 27, 2016 #5
    Wait, your types are all wrong.

    You can't just mix types like that; you have to cast them when you do any calculation, or else division or multiplication with an integer will result in another integer THEN get cast as a double. Your code is ambiguous at best.

    Also, have you checked whether long double is present in your compiler? It's not standard (it might be after C++11), but most compilers support it as either a 96 or 128 bit double. Use sizeof(long double) to check; it'll give you the size in bytes.
     
  7. Oct 27, 2016 #6

    Mark44

    Staff: Mentor

    Which post are you replying to?
    I don't think this is correct. Long ago, the Borland C/C++ compiler I had distinguished between double and long double as 64 bits and 80 bits, respectively. The Microsoft compiler I have now (VS 2015) supports both, but both are the same size - 64 bits.

    Microsoft (and possibly others) have a __m128 data type that can be used with Streaming SIMD Extensions 2 (SSE2) intrinsics, but this is different from float, double, and long double.
     
  8. Oct 27, 2016 #7
    Long double should be available in any C89 (or C90, C95, C99) compatible compiler. However, technically long double could be equivalent to the float data type (it must not be worse than a double, which must not be worse than a float). The C standards for floating point datatypes are very forgiving for double and long double, so a compiler writer can implement a wide variety of solutions.

    The range requirements for double/long double are, for instance, the same as for float, and the precision requirements for double/long double are only specified as 10 decimal digits (versus 6 for float), which is in between the commonly used IEEE-754 binary32 (6 digits) and binary64 (15 digits) datatypes. I did some quick calculations and it looks like even 56 bits is actually overkill to satisfy the long double specification. So, usually float and double are IEEE-754 binary32 and binary64 respectively, but long double could be anything (but usually not worse than binary64).

    This reminds me of an issue we had at work where some code failed because char turned out to be a 32 bit integer datatype on a DSP platform. It is an unusual but valid implementation ... the joys of the C standard.

    If you want to marvel at it the C99 standards can be found here: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf and the IEEE-754-2008 standard can be found here: http://www.csee.umbc.edu/~tsimo1/CMSC455/IEEE-754-2008.pdf [Broken]
     
    Last edited by a moderator: May 8, 2017
  9. Oct 27, 2016 #8

    rcgldr

    Homework Helper

    1e8 is a double (at least with Visual Studio, and an unsuffixed floating literal is a double in standard C++ as well), so in the case of k / 1e8, k gets promoted to a double. I updated post #4 with a comment to note this.

    The older 16 bit Microsoft compilers used an 80 bit floating point format for long doubles, but the 32/64 bit Microsoft/Visual Studio compilers use 64 bit floating point for long doubles, the same as regular doubles.
     