What is the difference between double and int64_t in C/C++ for precision?

ORF · Oct 25, 2016

Hello

I am currently using the double type for numbers with the format 6f8 (6 digits before the point, 8 digits after the point).

Is the double structure enough for this format, or should I use int64_t instead?

Thank you in advance :)

Greetings.
PS: I tried this example code, but the result looks a bit strange to me... (is precision lost for a 4-digit double?)

Code:

#include <iostream>
#include <cstdint>

int main()
{
  std::cout << "check precision of Double vs uint64_t\n";
  std::uint64_t k(1);
  while( k < 1e14 )
  {
    k=k*3+1;
    //-- check precision of myDouble
    double myDouble( k/1e8 );
    std::uint64_t test( 1e8 * myDouble );
    if( k != test ) std::cout << k << "\t" << test << std::endl;
  }
  return 0;
}

http://cpp.sh/8zqfx
The output I got is:
check precision of Double vs uint64_t
3280 3279
3812798742493 3812798742492

rbelli1 · Oct 25, 2016

double is not a structure. It is an IEEE754 floating point type. Converting a floating point value to an integer type in C++ truncates (the fractional part is discarded), so if your actual number comes out to be 3812798742492.999, you will still get 3812798742492 when it is converted to uint64_t. Try printing out the double value too and see what happens.

BoB

ORF · Oct 25, 2016

Hello

You were right: without the conversion (truncation), the number seems to be the same:

Code:

#include <iostream>
#include <cstdint>
#include <iomanip>

int main()
{
  std::cout << "check precision of Double vs uint64_t\n";
  std::uint64_t k(1);
  while( k < 1e14 )
  {
    k=k*3+1;
    //-- check precision of myDouble
    double myDouble( k/1e8 );
    std::uint64_t test( 1e8 * myDouble );
    if( k != test ) std::cout << std::setprecision(16) << k << "\t" << myDouble*1e8 << std::endl;
  }
  return 0;
}

So, does this mean that no digit is lost with the 6f8 format using doubles? Where is the limit? (When k > 1e15, the last digits are lost...)

Thank you for your time :)

Greetings

rcgldr · Oct 25, 2016

An IEEE754 double has a 52-bit stored mantissa (53 bits of precision counting the implicit leading 1), good enough for 15 significant decimal digits (log10(2^52) = 15.6...), so 14 digits (6f8) shouldn't be an issue if you round instead of truncating.

Code:

#include <cstdint>
#include <iostream>

int main()
{
    std::cout << "check precision of Double vs uint64_t\n";
    std::uint64_t k(1);
    while( k < 1e14 )
    {
        k=k*3+1;
        //-- check precision of myDouble
        double myDouble = k/1e8;    // 1e8 is a double
        // + 0.5 for round (valid for non-negative values)
        std::uint64_t test = static_cast<std::uint64_t>( 1e8 * myDouble + 0.5 );
        if( k != test ) std::cout << k << "\t" << test << std::endl;
    }
    return 0;
}

newjerseyrunner · Oct 27, 2016

Wait, your types are all wrong.

You can't just mix types like that; you have to cast them when you do any calculation, or else division or multiplication with an integer will result in another integer THEN get cast to a double. Your code is ambiguous at best.

Also, have you checked whether long double is present in your compiler? It's not standard (it might be after C++11), but most compilers support it as either a 96- or 128-bit double. Use sizeof(long double) to check; it gives the size in bytes.

Mark44 · Oct 27, 2016

newjerseyrunner said:

Wait, your types are all wrong.

Which post are you replying to?

newjerseyrunner said:

You can't just mix types like that; you have to cast them when you do any calculation, or else division or multiplication with an integer will result in another integer THEN get cast to a double. Your code is ambiguous at best.

Also, have you checked whether long double is present in your compiler? It's not standard (it might be after C++11), but most compilers support it as either a 96- or 128-bit double.

I don't think this is correct. Long ago, the Borland C/C++ compiler I had distinguished between double and long double as 64 bits and 80 bits, respectively. The Microsoft compiler I have now (VS 2015) supports both, but both are the same size: 64 bits.

Microsoft (and possibly others) have a __m128 data type that can be used with Streaming SIMD Extensions 2 (SSE2) intrinsics, but this is different from float, double, and long double.

newjerseyrunner said:

Use sizeof(long double) to check, it'll give you bytes.

glappkaeft · Oct 27, 2016

newjerseyrunner said:

Also, have you checked whether long double is present in your compiler? It's not standard (it might be after C++11), but most compilers support it as either a 96- or 128-bit double. Use sizeof(long double) to check; it gives the size in bytes.

long double should be available in any C89 (or C90, C95, C99) compatible compiler. However, technically long double could be equivalent to the float data type, provided float already meets the minimum requirements for double (long double must not be worse than double, which must not be worse than float). The C standards' requirements for floating point datatypes are very forgiving for double and long double, so a compiler writer can implement a wide variety of solutions.

The range requirements for double/long double are, for instance, the same as for float (up to at least 1E+37), and the precision requirements for double/long double are only 10 decimal digits (versus 6 for float), which is in between the commonly used IEEE-754 binary32 (6 digits) and binary64 (15 digits) datatypes. I did some quick calculations, and it looks like even 56 bits is actually overkill to satisfy the long double specification. So usually float and double are IEEE-754 binary32 and binary64 respectively, but long double could be anything (though usually not worse than binary64).

This reminds me of an issue we had at work where some code failed because char turned out to be a 32-bit integer datatype on a DSP platform. It is an unusual but valid implementation... the joys of the C standard.

If you want to marvel at it, the C99 standard (committee draft N1256) can be found here: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf and the IEEE-754-2008 standard can be found here: http://www.csee.umbc.edu/~tsimo1/CMSC455/IEEE-754-2008.pdf

rcgldr · Oct 27, 2016

newjerseyrunner said:

Wait, your types are all wrong.

1e8 is a double literal (an unsuffixed floating literal has type double in standard C and C++, not just in Visual Studio), so in the case of k / 1e8, k gets converted to a double. I updated post #4 with a comment to note this.

long double

The older 16-bit Microsoft compilers use the 80-bit floating point format for long double, but the 32/64-bit Microsoft/Visual Studio compilers use 64-bit floating point for long double, the same as for regular double.
