Why Are There Floating Point Errors When Parsing Data in Pandas?

AI Thread Summary
Floating point errors occur when parsing data from Excel into Pandas, leading to discrepancies like 25.15 being represented as 25.149999999999977. This issue arises because certain decimal numbers cannot be precisely represented in binary, a common problem in programming. Users expressed concerns about data integrity, emphasizing the need for accurate representation in databases. A proposed solution involves using Python's formatting functions to display floating point numbers correctly, although some argue this merely masks the underlying precision issue. Ultimately, while there are methods to mitigate the visibility of these errors, the fundamental challenge of floating point representation remains.
Zap
Messages
406
Reaction score
120
TL;DR Summary
25.15 = 25.149999999999977??
I am parsing data from an excel sheet using pandas ExcelFile class and read_excel function. I noticed that when reading the file, there are floating point errors in the results. For example, a cell that contains 25.15 will be 25.149999999999977 in the pandas data frame.

The value 25.15 in Excel is not rounded. It is the result of adding 298.15 minus 273.

This link describes the problem, but there is not solution : https://github.com/pandas-dev/pandas/issues/21885

I'm surprised ExcelFile and read_excel has an error like this. It pretty much makes them unusable, in my opinion.
 
Technology news on Phys.org
Do you need 13 digits of precision? And if you do, why are you using floats?
 
Zap said:
It pretty much makes them unusable
Is that so ?
What accuracy is required by your application ?
 
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
 
I need valid data. It's not about accuracy. I don't want to upload data that is clearly an error like that. It's a data quality issue, not a precision issue.

I've tried parsing them as strings, but the same issue arises. It's a string '25.149999999999977 ' instead of the expected '25.15'.

I'm wondering if there is a way to correct this floating point error without doing something like round(), because it will not be easy to tell the computer at which decimal place to round.
 
BvU said:
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
Context is key here. I'm not doing calculations. I'm uploading cited data that needs to have a certain level of integrity.
 
Excel has about 16 digits of accuracy. What kind of data has its integrity damaged by that ? and, again:
BvU said:
What accuracy is required by your application ?
Zap said:
Context is key here.
I'm really curious about the context !

Because there are solutions for many applications...
 
It's a floating point error. I don't want 16 digits.
 
Yes No Yes No

You still haven't explained why
 
  • #10
I don't want to upload data into a data base that is clearly an error such as a floating point error, because data quality is of upmost importance for this database, and it will be available publicly. Would you want to access a website and see that floating point error as a cited experimental measurement?
 
  • #11
I am asking if you know of a way to correct for this error. If not, then thank you for your response.
 
  • #12
Zap said:
Would you want to access a website and see that floating point error as a cited experimental measurement?
A decent website doesn't present all digits. For empirical data 16 digits is more than enough

Zap said:
I am asking if you know of a way to correct for this error.
It's not an error.

##\ ##
 
  • Like
Likes anorlunda
  • #13
It's called a floating point error because you can't represent 25.15 exactly in binary.
 
  • #14
Okay. So, in Python, '{:n}'.format( x ), where x = 25.149999999999977 evaluates to '25.15,' which solves my problem.

Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ... I'm assuming SQL has something similar, but if I upload the data as 25.1499999999999, I will not get the clean display of 25.15. This was the original issue, and I have fixed it myself.

In a Pandas dataframe, you can apply the function lambda x : float( '{:n}'.format( float( x ) ) ) to remove floating point error.

Thanks,
Zap.
 
Last edited:
  • Like
Likes BvU
  • #15
Zap said:
It's called a floating point error because you can't represent 25.15 exactly in binary.
There are very many decimal numbers that can't be represented exactly in a finite number of binary digits. Excel is almost certainly using double precision (64 bits) to represent real numbers. Some decimal numbers, such as 0.2 or 0.1 have terminating decimal representations, but their binary representations don't terminate. As a result they get rounded and truncated to fit in 64 bits.

This is a common problem in virtually all programming languages, particularly Fortran, C, and C++. Some languages, such as C#, have a Decimal class that can maintain higher precision for use by financial applications. Java has a DecimalFormat that can be used to deal with the same kind of problem. I'm not an expert in Java, so it could be that some other class or package can be used.
 
  • Like
Likes BvU and lomidrevo
  • #16
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )
 
  • #17
Zap said:
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )

Isn't that just rounding the numbers like like you suggested doing yourself. It is possible it will erroniously truncate some numbers. And converting the formatted string of the rounded float back to float won't help.

Python:
>>> x = 10.15
>>> print(x)
10.15
>>> format(x,'.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> x = float( '{:n}'.format( x ) )
>>> format(x, '.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> y = 10.00000000000000001
>>> print(y)
10.0
>>> '{:n}'.format( y )
'10'

Do you have access to the correct numbers? If so, why not store them as strings?
 
Last edited:
  • #18
If I simply convert it to a string, or try to parse the data as strings, the pandas Excel parser gives '25.149999999999977' instead of ' 25.15.' Additionally, I want to store these values as double data types in SQL. So, strings will not work. SQL has some way of representing 25.15 exactly, just as Python does, but it will not represent the number as 25.15 if I upload it as 25.149999999999977.

I don't think float( '{:n}'.format( x ) ) is the same as rounding. The documentation is here. From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary. Python will display the number as 25.15, even though it is really something like 25.149999999999977. SQL also has some way of displaying the exact value of floating point numbers, when the real value is an approximation in binary. That's why you can write x = 25.15 in Python, and it will display as 25.15. The same is true with SQL and Excel.

The problem has something to do with converting Excel data to Python data, and this built-in feature of representing floating point numbers as exact values in both systems is somehow lost in translation.

I don't want to put 25.149999999999977 into a database, because this data will be part of a professional website, and that's not what a user should see. It's an obvious floating point error. No one wants to see that.
 
Last edited:
  • #19
Zap said:
SQL has some way of representing 25.15 exactly, just as Python does,
Yes, but both languages will just round and/or truncate a number to a desired number of places. SQL has a Decimal data type that allows you to specify the precision and scale. In SQL they define precision as the total number of digits displayed, and scale as the number of digits to the right of the decimal point. Some more info: Understanding the SQL Decimal data type (sqlshack.com)
Zap said:
From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary.
You can do something simpler to force Python to display a floating point number to a desired precision.
Python:
x = 20.15

for i in range(18):
    print(format(x, str( '.'+ str(i) + 'f')))
In the print statement, the 2nd argument of format is a string that I'm constructing. The first few format specifiers will be .0f, .1f, .2f, and so on, up to .17f.
Here's the output of the above:
Code:
C:\Users\Mark\Documents\Python3.4.2>python test2.py
20
20.1
20.15
20.150
20.1500
20.15000
20.150000
20.1500000
20.15000000
20.150000000
20.1500000000
20.15000000000
20.150000000000
20.1500000000000
20.15000000000000
20.149999999999999
20.1499999999999986
20.14999999999999858
Of course, a number such as 20.125 will display the exact result with any number of decimal places, because the bit pattern for 20.125 terminates.
 
  • Like
Likes lomidrevo
  • #20
Zap said:
SQL has some way of representing 25.15 exactly, just as Python does
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.

Zap said:
I don't think float( '{:n}'.format( x ) ) is the same as rounding... float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly

What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.
 
  • #21
That's displaying a string value, no?

Python does represent some floating point numbers as exact results, even though behind the scenes they are approximations in binary. The same is true for SQL and Excel. Otherwise, you could not assign a floating point value of 25.15 to a cell or variable, but you can.

This documentation alludes to '{:n}'.format as a way of enforcing the false representation of approximated floating point numbers as exact values. From what I can tell, it is doing that. The floating point errors encountered after reading the Excel sheets disappeared after applying float( '{:n}'.format( x ) ), so I'm not in a rush to try another method.

I will double check to see if there were any truncation errors, but it seemed to have worked beautifully.
 
  • #22
Jarvis323 said:
Anyways, I think that rounding the numbers this way is fine. But note that lambda x : float( '{:n}'.format( float( x ) ) ) is pointless. In the case of 25.15, stored as 25.149999999999977, it will just round and convert it to a string, and then convert it back to a float with the same value of 25.149999999999977. It doesn't change anything at all. In some cases, as in the case of y in my example (where the rounded value is less correct than the stored value), it will actually just cause you to lose some precision.

It changes it into a floating representation of 25.15, which is a display of 25.15 for a value of 25.149999999999977.
 
  • #23
rbelli1 said:
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.
What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.

You can't round to 25.15 , because you can't calculate 25.15 in binary. If you see a "rounded" 25.15 floating point number, the program has hid the nasty floating point error and is displaying 25.15 to you instead, even though the true value is something like 25.1499999999999 ...
 
  • #24
Zap said:
You can't round to 25.15 , because you can't calculate 25.15 in binary.

When we round the "ugly" floating point value to the string value "25.15" we certainly can do that exactly in binary.

Python:
>>> print('{:.2f}'.format( 25.14999999999999857891452847979962825775146484375 ))
25.15

double 0x4039266666666666 gets converted to the series of bytes 0x32, 0x35, 0x2E, 0x31, 0x35.

BoB
 
  • #25
You can round to a floating point number of 25.15, but it's not really 25.15, it is displayed as 25.15, and it's not a string. It's a floating point number. Try it for yourself, or read the documentation. You can display the floating point number as 25.15, even though it's something else, for the hundredth time!
 
  • #26
Zap said:
If you see a "rounded" 25.15 floating point number, the program has hid the nasty floating point error and is displaying 25.15 to you instead, even though the true value is something like 25.1499999999999 ...
You do know that 25.1499999999999... is exactly equal to 25.15, right? The ellipsis (...) means that the 9 digits repeat that pattern forever.

As someone already asked, do you really need more than 13 decimal digits of precision in your application? I can't think of any measurements that would be that precise.
 
  • #27
Are you reading this thread? Do you seriously think I meant an infinitely repeating number on a computer? I am not interested in increasing precision. Please read the thread before responding.
 
  • #28
Python:
>>> x = 25.149999999999977
>>> x
25.149999999999977
>>> float( '{:n}'.format( x ) )
25.15
>>> x = float( '{:n}'.format( x ) )
>>> type( x )
<class 'float'>
>>> x
25.15
>>> x = 0.3 - 0.2
>>> x
0.09999999999999998
>>> float( '{:n}'.format( x ) )
0.1
>>> x = float( '{:n}'.format( x ) )
>>> x
0.1
>>> type( x )
<class 'float'>
>>>
 
  • #29
The whole point was representing the float as 25.15. The point I'm trying to make is that the floating point number is being represented exactly as 25.15, even though in the background it is still a binary approximation, and float( '{:n}'.format( x ) ) is coercing Python to represent the binary approximation as an exact value. How many times do I have to say this?
 
  • #30
You can't round to 25.15 ...

The number is being displayed as 25.15, it isn't being rounded. You can't calculate 25.15 in binary. So, how is Python giving me a floating point number of 25.15? Because it's being DISPLAYED AS 25.15 AND NOT ACTUALLY 25.15!
 
  • #31
Zap said:
I noticed that when reading the file, there are floating point errors in the results. For example, a cell that contains 25.15 will be 25.149999999999977 in the pandas data frame.
For double precision floating point; 25.15 = 25.149999999999977;
That is NOT an error. It is a failure to format and round the output.

If you look at the excel formatting information for numbers in the column that shows 25.15, you will probably see it is specified to display as "xxx.xx"; which will show the number rounded to 2 decimal places. Excel is storing the double precision number as 25.149999999999977 internally.

If you want to control the way a double precision number appears in pandas, you will need to read the pandas manual on display formatting, which probably uses the Round(,) format function. Like Excel, pandas is also storing the double precision number as 25.149999999999977 internally.
https://www.xspdf.com/resolution/51898492.html
 
  • Like
Likes lomidrevo, Vanadium 50 and BvU
  • #32
Zap said:
Are you reading this thread?

Are you?

Lots of good stuff here.
 
  • #33
Zap said:
Are you reading this thread? Do you seriously think I meant an infinitely repeating number on a computer? I am not interested in increasing precision. Please read the thread before responding.
I have read the entire thread. No, I don't believe you meant an infinitely repeating number, but that's what you wrote:
Zap said:
even though the true value is something like 25.1499999999999 ...

As I recall, you have a degree in physics, so you should know exactly what the ellipsis means in 25.1499999999999 ...
 
  • #34
Strange.

There is a way to coerce Python to represent numbers with floating point errors as exact values, even though internally they are being stored as a binary approximation, as Baluncore has said.

It is not the same as rounding, because you cannot round to a number that is impossible to calculate in binary.

So for example, if you have a floating point number of 25.149999999999977, you cannot round it to 25.15, because 25.15 cannot be calculated exactly in binary. If you attempt to round it to 25.15, Python will display the value as 25.15, even though internally it is still something like 25.149999999999977. So, it's not really rounding but coercing Python to represent the binary approximation as an exact value.

However, as stated earlier, I don't want to use the round() function, because it's impossible to tell the computer in a simple way at which decimal place to round or which numbers have floating point errors and which do not, and I don't actually want to round numbers.

So, instead, there is a way to coerce Python to display certain numbers that contain floating point errors as exact results while keeping the original value intact, which is not rounding.
 
Last edited:
  • #35
Zap said:
So for example, if you have a floating point number of 25.149999999999977, you cannot round it to 25.15, because 25.15 cannot be calculated exactly in binary.
You're confusing the internal representation of a number (e.g., 25.149999999999977) with its displayed value, which in your example is the displayed value of 25.15.
The value that is displayed is a string of numerical digit characters together with a decimal point.
Zap said:
If you attempt to round it to 25.15, Python will display the value as 25.15, even though internally it is still something like 25.149999999999977. So, it's not really rounding but coercing Python to represent the binary approximation as an exact value.
The number 25.149999999999977 does not exist in memory. Instead, there is the binary representation, most likely in 64 bits. This 64-bit value is rounded or truncated to the desired precision and converted to a string of characters by print() or format(), but the underlying number is not changed.

The Python code I showed a while back shows a sequence of displayed values for the variable x, with varying numbers of decimal digits. At no time is the underlying value of x changed due to rounding or truncation. The only things that change are the (string) representations of this number according to how it's being formatted.
 
  • #36
Couple of things:

1) While Python (or rather 64-bit IEEE floats) can represent neither the numbers 25.15 nor 25.149999999999977 perfectly precisely, it can represent them both closely enough to distinguish them:
Python:
>>> x = 25.15
>>> y = 25.149999999999977
>>> x == y
False
>>> x - y
2.1316282072803006e-14
The closest floating-point approximations of these numbers are not the same.

2) Whatever that :n format directive is doing, it is clearly not trying to represent floating-point numbers to anywhere near machine precision:
Python:
>>> '{:n}'.format(1.23456789)
'1.23457'
Checking the documentation seems to confirm this:
'n'Number. This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters.
and, just above:
'g'[...] With no precision given, uses a precision of 6 significant digits for float. [...]


 
  • Like
Likes berkeman, Jarvis323 and Mark44
  • #37
I used the :n formatting on the data set, and it appeared to totally fix the problem I was having. I don't know precisely what it is doing, but since it gave me the desired result, I went ahead and used it.

Mark44 said:
The value that is displayed is a string of numerical digit characters together with a decimal point.
The number 25.149999999999977 does not exist in memory. Instead, there is the binary representation, most likely in 64 bits. This 64-bit value is rounded or truncated to the desired precision and converted to a string of characters by print() or format(), but the underlying number is not changed.

The Python code I showed a while back shows a sequence of displayed values for the variable x, with varying numbers of decimal digits. At no time is the underlying value of x changed due to rounding or truncation. The only things that change are the (string) representations of this number according to how it's being formatted.

I thought that's what I was saying. I think the term for this is round tripping.
 
Last edited:
  • #38
Zap said:
You can't calculate 25.15 in binary.

This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

Zap said:
However, as stated earlier, I don't want to use the round() function, because it's impossible to tell the computer in a simple way at which decimal place to round

https://docs.python.org/3/library/functions.html#round
round(number[, ndigits])

This rounds towards even rather then up for the halfway values so may not give the desired results. EDIT: This is the behavior of formatting and printing of text.

Zap said:
or which numbers have floating point errors and which do not, and I don't actually want to round numbers.

You can leave the values in the database with full precision and only alter the displayed information.

BoB

1 If it walks like a rounding...
 
  • #39
rbelli1 said:
This is untrue.

It is true that 25.15 cannot be represented exactly in a IEEE754 value.

If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.

I'm so confused lol. Is this not what I have been saying the entire time?
 
  • #40
Zap said:
I'm so confused lol. Is this not what I have been saying the entire time?
Not from my perspective. It seems to me you have consistently misinterpreted or misunderstood the difference between what is displayed versus the underlying number itself. At least that's how I see it. If that's not it, why have we spent 40 posts on this question?
 
  • Like
Likes rbelli1 and Baluncore
  • #41
I repeatedly said that I wanted the exact representation of the floating point number in Python, and repeatedly said that by exact representation I mean that the floating point number will be displayed as exact even though the actual number was a binary approximation. I think I've said that about 6 times by now, but everyone continued to argue with me. I have just been repeating this over and over again. I cannot comprehend what the argument is. I seems as though you guys are messing with me, at this point.

I solved my problem in like post #5. Even though it's not a perfect solution, it's better than using round() or simply leaving it as is.

And by not a perfect solution, I mean that I don't totally understand how the solution is working, but it seems to be doing what I want it to do 100% of the time so far. The issue is parsing Excel data into Python, and the floating point precision and also the display of the floating point number is not being transferred correctly between programs, which I consider an error. Using the aforementioned method, this error is being fixed.

What is the argument here?
 
Last edited:
  • #42
rbelli1 said:
If it was absolutely forbidden to calculate 25.15 in any way in binary then the Python examples in this thread would not have displayed 25.15 in them. However, there is an algorithm in python that knows how to quack1 the value from 25.14999999999999857891452847979962825775146484375 in a double precision type to 25.15 in textual format when you ask for two decimal places.
Zap said:
Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ...

Maybe, you can point out in these two quotes where I was wrong. And by 25.1499999999999 ... , I don't mean an infinitely repeating number ... I mean some kind of approximation to some number of digits ... Almost very post I've made in this thread has just been repeating what I said above over and over.
 
  • #43
I would say its time to close this thread with the understanding that:
  1. @Zap has found a workable solution to his problem,
  2. that there is a clear rift of understanding the meaning of mathematical real numbers and how they are represented in IEEE floating point format in memory and how computers can coax them into more human readable formats.

Thanks to all who have contributed here.

I leave these videos on floating point number for future generations of readers of this thread:





and here's a discussion of using Pandas for currency arithmetic:

https://beepscore.com/website/2018/10/12/using-pandas-with-python-decimal.html

so those pennies won't be vacuumed by by Richard Prior (Superman III)

 
  • Like
Likes jim mcnamara

Similar threads

Replies
8
Views
6K
Replies
13
Views
3K
Replies
23
Views
6K
Replies
25
Views
6K
Replies
1
Views
4K
Back
Top