Summation function to minimize rounding issues

SUMMARY

The discussion focuses on a C++ class designed to minimize rounding errors when summing double-precision floating point numbers. The class, named SUM, uses an array of 2048 doubles indexed by exponent so that only numbers with the same exponent are added together. Key functions are SUM::clear() for resetting the array, SUM::addnum() for adding a number, and SUM::getsum() for retrieving the current sum. A source file with example test code is provided; it compares normal addition with the class on a sum of 2^32-1 terms.

PREREQUISITES
  • Understanding of C++ programming language
  • Familiarity with floating point arithmetic and its limitations
  • Knowledge of data structures, specifically arrays
  • Experience with numerical methods for error minimization
NEXT STEPS
  • Explore advanced floating point precision techniques in C++
  • Research the implications of floating point errors in financial calculations
  • Learn about the performance characteristics of the SPARC64 VIIIfx CPU
  • Examine the white paper on variable floating point precision and rounding
USEFUL FOR

Software developers, particularly those working with numerical computations, financial analysts dealing with precision issues, and researchers interested in advanced floating point processing techniques.

rcgldr
This is a C++ class to be used for summation of doubles (floating point). It uses an array of 2048 doubles indexed by exponent to minimize rounding errors by only adding numbers that have the same exponent.
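
The indexing described here relies on the 11-bit biased exponent field of an IEEE 754 double, which takes values 0 to 2047 and so matches the 2048-slot array. A minimal sketch of reading that field, assuming a 64-bit IEEE 754 double; the helper name exponent_index is hypothetical and not part of the posted class:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the post): returns the biased 11-bit
// exponent of d, usable as an array index in the range 0..2047.
std::size_t exponent_index(double d)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);                    // copy out the bit pattern
    return static_cast<std::size_t>((bits >> 52) & 0x7ff);  // mask off the sign bit
}
```

For example, 1.0 and -1.0 both map to slot 1023 because the sign bit is masked off, and 1./5. maps to slot 1020.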

SUM::SUM - array is cleared out when an instance of SUM is created
SUM::clear() - clears out the array
SUM::addnum() - adds a number to the array
SUM::getsum() - returns the current sum of the array

Link to a zip of the source file with example test code that adds 1./5. a total of 4,294,967,295 (2^32-1) times and displays the sum computed using normal addition and using the SUM class:

http://rcgldr.net/misc/sum.zip

This is the key part of the SUM class, the addnum() function:

Code:
#include <cstring>              // std::memcpy

void SUM::addnum(double d)      // add a number into the array
{
    size_t i;
    unsigned long long bits;

    while(1){
        // i = biased exponent of d; memcpy reads the bit pattern without
        // the strict-aliasing violation of casting &d to an integer pointer
        std::memcpy(&bits, &d, sizeof bits);
        i = ((size_t)(bits >> 52)) & 0x7ff;
        if(i == 0x7ff){         // max exponent (inf or NaN), could be overflow
            asum[i] += d;
            return;
        }
        if(asum[i] == 0.){      // if empty slot store d
            asum[i] = d;
            return;
        }
        d += asum[i];           // else add slot to d, clear slot
        asum[i] = 0.;           // and continue until empty slot
    }
}
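
A scaled-down version of the comparison between normal addition and the SUM class can be sketched as follows. This is a minimal sketch, not the code from the zip: the getsum() body (adding the slots from the smallest exponent up), the memcpy-based exponent read, and the Comparison struct and compare_sums() helper are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

class SUM {
    double asum[2048];                    // one slot per biased exponent
public:
    SUM() { clear(); }
    void clear() { for (double &s : asum) s = 0.0; }
    void addnum(double d);
    double getsum() const;
};

void SUM::addnum(double d)
{
    for (;;) {
        std::uint64_t bits;
        std::memcpy(&bits, &d, sizeof bits);
        std::size_t i = static_cast<std::size_t>((bits >> 52) & 0x7ff);
        if (i == 0x7ff) { asum[i] += d; return; }     // inf or NaN
        if (asum[i] == 0.0) { asum[i] = d; return; }  // empty slot: store
        d += asum[i];                                 // merge with the slot
        asum[i] = 0.0;                                // clear it and retry
    }
}

// Assumption: add the slots in order of increasing exponent.
double SUM::getsum() const
{
    double sum = 0.0;
    for (double s : asum) sum += s;
    return sum;
}

// Hypothetical helper: add x to both accumulators n times.
struct Comparison { double naive; double binned; };

Comparison compare_sums(double x, std::size_t n)
{
    SUM s;
    double naive = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        naive += x;        // normal addition
        s.addnum(x);       // exponent-indexed addition
    }
    return Comparison{naive, s.getsum()};
}
```

When the same value is added n times and n is a power of two, such as compare_sums(1./5., 1u << 20), every merge in addnum() is an exact doubling, so the binned total equals x * n exactly, while the naive running total is typically rounded along the way. For other inputs getsum() can still round when it combines slots, so the class reduces rounding error rather than eliminating it.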
 
https://www.extremetech.com/computing/272558-japan-tests-silicon-for-exascale-computing-in-2021

Fujitsu has a line of CPUs (a historical SPARC example is the SPARC64 VIIIfx); the newest ones are mentioned in the article above. The CPU natively supports variable floating point precision with correct rounding. It would be interesting to read the white paper behind this effort.

While what you provide is very useful, I would never put it to work with production code without really extensive testing. Which means, in practice, I could not deploy it. Floating point is the bane of financial calculations, as you obviously know.
 
