Why Are There Floating Point Errors When Parsing Data in Pandas?

Click For Summary
Floating point errors occur when parsing data from Excel into Pandas, leading to discrepancies like 25.15 being represented as 25.149999999999977. This issue arises because certain decimal numbers cannot be precisely represented in binary, a common problem in programming. Users expressed concerns about data integrity, emphasizing the need for accurate representation in databases. A proposed solution involves using Python's formatting functions to display floating point numbers correctly, although some argue this merely masks the underlying precision issue. Ultimately, while there are methods to mitigate the visibility of these errors, the fundamental challenge of floating point representation remains.
  • #31
Zap said:
I noticed that when reading the file, there are floating point errors in the results. For example, a cell that contains 25.15 will be 25.149999999999977 in the pandas data frame.
For double precision floating point; 25.15 = 25.149999999999977;
That is NOT an error. It is a failure to format and round the output.

If you look at the excel formatting information for numbers in the column that shows 25.15, you will probably see it is specified to display as "xxx.xx"; which will show the number rounded to 2 decimal places. Excel is storing the double precision number as 25.149999999999977 internally.

If you want to control the way a double precision number appears in pandas, you will need to read the pandas manual on display formatting, which probably uses the Round(,) format function. Like Excel, pandas is also storing the double precision number as 25.149999999999977 internally.
https://www.xspdf.com/resolution/51898492.html
 
  • Like
Likes lomidrevo, Vanadium 50 and BvU
Technology news on Phys.org
  • #32
Zap said:
Are you reading this thread?

Are you?

Lots of good stuff here.
 
  • #33
Zap said:
Are you reading this thread? Do you seriously think I meant an infinitely repeating number on a computer? I am not interested in increasing precision. Please read the thread before responding.
I have read the entire thread. No, I don't believe you meant an infinitely repeating number, but that's what you wrote:
Zap said:
even though the true value is something like 25.1499999999999 ...

As I recall, you have a degree in physics, so you should know exactly what the ellipsis means in 25.1499999999999 ...
 
  • #34
Strange.

There is a way to coerce Python to represent numbers with floating point errors as exact values, even though internally they are being stored as a binary approximation, as Baluncore has said.

It is not the same as rounding, because you cannot round to a number that is impossible to calculate in binary.

So for example, if you have a floating point number of 25.149999999999977, you cannot round it to 25.15, because 25.15 cannot be calculated exactly in binary. If you attempt to round it to 25.15, Python will display the value as 25.15, even though internally it is still something like 25.149999999999977. So, it's not really rounding but coercing Python to represent the binary approximation as an exact value.

However, as stated earlier, I don't want to use the round() function, because it's impossible to tell the computer in a simple way at which decimal place to round or which numbers have floating point errors and which do not, and I don't actually want to round numbers.

So, instead, there is a way to coerce Python to display certain numbers that contain floating point errors as exact results while keeping the original value intact, which is not rounding.
 
Last edited:
  • #35
Zap said:
So for example, if you have a floating point number of 25.149999999999977, you cannot round it to 25.15, because 25.15 cannot be calculated exactly in binary.
You're confusing the internal representation of a number (e.g., 25.149999999999977) with its displayed value, which in your example is the displayed value of 25.15.
The value that is displayed is a string of numerical digit characters together with a decimal point.
Zap said:
If you attempt to round it to 25.15, Python will display the value as 25.15, even though internally it is still something like 25.149999999999977. So, it's not really rounding but coercing Python to represent the binary approximation as an exact value.
The number 25.149999999999977 does not exist in memory. Instead, there is the binary representation, most likely in 64 bits. This 64-bit value is rounded or truncated to the desired precision and converted to a string of characters by print() or format(), but the underlying number is not changed.

The Python code I showed a while back shows a sequence of displayed values for the variable x, with varying numbers of decimal digits. At no time is the underlying value of x changed due to rounding or truncation. The only things that change are the (string) representations of this number according to how it's being formatted.
 
  • #36
Couple of things:

1) While Python (or rather 64-bit IEEE floats) can represent neither the numbers 25.15 nor 25.149999999999977 perfectly precisely, it can represent them both closely enough to distinguish them:
Python:
>>> x = 25.15
>>> y = 25.149999999999977
>>> x == y
False
>>> x - y
2.1316282072803006e-14
The closest floating-point approximations of these numbers are not the same.

2) Whatever that :n format directive is doing, it is clearly not trying to represent floating-point numbers to anywhere near machine precision:
Python:
>>> '{:n}'.format(1.23456789)
'1.23457'
Checking the documentation seems to confirm this:
'n'Number. This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters.
and, just above:
'g'[...] With no precision given, uses a precision of 6 significant digits for float. [...]


 
  • Like
Likes berkeman, Jarvis323 and Mark44
  • #37
I used the :n formatting on the data set, and it appeared to totally fix the problem I was having. I don't know precisely what it is doing, but since it gave me the desired result, I went ahead and used it.

Mark44 said:
The value that is displayed is a string of numerical digit characters together with a decimal point.
The number 25.149999999999977 does not exist in memory. Instead, there is the binary representation, most likely in 64 bits. This 64-bit value is rounded or truncated to the desired precision and converted to a string of characters by print() or format(), but the underlying number is not changed.

The Python code I showed a while back shows a sequence of displayed values for the variable x, with varying numbers of decimal digits. At no time is the underlying value of x changed due to rounding or truncation. The only things that change are the (string) representations of this number according to how it's being formatted.

I thought that's what I was saying. I think the term for this is round tripping.
 
Last edited:
  • #38
Zap said:
You can't calculate 25.15 in binary.

This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

Zap said:
However, as stated earlier, I don't want to use the round() function, because it's impossible to tell the computer in a simple way at which decimal place to round

https://docs.python.org/3/library/functions.html#round
round(number[, ndigits])

This rounds towards even rather then up for the halfway values so may not give the desired results. EDIT: This is the behavior of formatting and printing of text.

Zap said:
or which numbers have floating point errors and which do not, and I don't actually want to round numbers.

You can leave the values in the database with full precision and only alter the displayed information.

BoB

1 If it walks like a rounding...
 
  • #39
rbelli1 said:
This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

I'm so confused lol. Is this not what I have been saying the entire time?
 
  • #40
Zap said:
I'm so confused lol. Is this not what I have been saying the entire time?
Not from my perspective. It seems to me you have consistently misinterpreted or misunderstood the difference between what is displayed versus the underlying number itself. At least that's how I see it. If that's not it, why have we spent 40 posts on this question?
 
  • Like
Likes rbelli1 and Baluncore
  • #41
I repeatedly said that I wanted the exact representation of the floating point number in Python, and repeatedly said that by exact representation I mean that the floating point number will be displayed as exact even though the actual number was a binary approximation. I think I've said that about 6 times by now, but everyone continued to argue with me. I have just been repeating this over and over again. I cannot comprehend what the argument is. I seems as though you guys are messing with me, at this point.

I solved my problem in like post #5. Even though it's not a perfect solution, it's better than using round() or simply leaving it as is.

And by not a perfect solution, I mean that I don't totally understand how the solution is working, but it seems to be doing what I want it to do 100% of the time so far. The issue is parsing Excel data into Python, and the floating point precision and also the display of the floating point number is not being transferred correctly between programs, which I consider an error. Using the aforementioned method, this error is being fixed.

What is the argument here?
 
Last edited:
  • #42
rbelli1 said:
If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.
Zap said:
Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ...

Maybe, you can point out in these two quotes where I was wrong. And by 25.1499999999999 ... , I don't mean an infinitely repeating number ... I mean some kind of approximation to some number of digits ... Almost very post I've made in this thread has just been repeating what I said above over and over.
 
  • #43
I would say its time to close this thread with the understanding that:
  1. @Zap has found a workable solution to his problem,
  2. that there is a clear rift of understanding the meaning of mathematical real numbers and how they are represented in IEEE floating point format in memory and how computers can coax them into more human readable formats.

Thanks to all who have contributed here.

I leave these videos on floating point number for future generations of readers of this thread:





and here's a discussion of using Pandas for currency arithmetic:

https://beepscore.com/website/2018/10/12/using-pandas-with-python-decimal.html

so those pennies won't be vacuumed by by Richard Prior (Superman III)

 
  • Like
Likes jim mcnamara

Similar threads

Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
6K
  • · Replies 19 ·
Replies
19
Views
14K
  • · Replies 13 ·
Replies
13
Views
3K
  • · Replies 23 ·
Replies
23
Views
6K
  • · Replies 25 ·
Replies
25
Views
6K
  • · Replies 1 ·
Replies
1
Views
4K