Why Are There Floating Point Errors When Parsing Data in Pandas?

Click For Summary
Floating point errors occur when parsing data from Excel into Pandas, leading to discrepancies like 25.15 being represented as 25.149999999999977. This issue arises because certain decimal numbers cannot be precisely represented in binary, a common problem in programming. Users expressed concerns about data integrity, emphasizing the need for accurate representation in databases. A proposed solution involves using Python's formatting functions to display floating point numbers correctly, although some argue this merely masks the underlying precision issue. Ultimately, while there are methods to mitigate the visibility of these errors, the fundamental challenge of floating point representation remains.
Zap
Messages
406
Reaction score
120
TL;DR
25.15 = 25.149999999999977??
I am parsing data from an excel sheet using pandas ExcelFile class and read_excel function. I noticed that when reading the file, there are floating point errors in the results. For example, a cell that contains 25.15 will be 25.149999999999977 in the pandas data frame.

The value 25.15 in Excel is not rounded. It is the result of adding 298.15 minus 273.

This link describes the problem, but there is not solution : https://github.com/pandas-dev/pandas/issues/21885

I'm surprised ExcelFile and read_excel has an error like this. It pretty much makes them unusable, in my opinion.
 
Technology news on Phys.org
Do you need 13 digits of precision? And if you do, why are you using floats?
 
Zap said:
It pretty much makes them unusable
Is that so ?
What accuracy is required by your application ?
 
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
 
I need valid data. It's not about accuracy. I don't want to upload data that is clearly an error like that. It's a data quality issue, not a precision issue.

I've tried parsing them as strings, but the same issue arises. It's a string '25.149999999999977 ' instead of the expected '25.15'.

I'm wondering if there is a way to correct this floating point error without doing something like round(), because it will not be easy to tell the computer at which decimal place to round.
 
BvU said:
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
Context is key here. I'm not doing calculations. I'm uploading cited data that needs to have a certain level of integrity.
 
Excel has about 16 digits of accuracy. What kind of data has its integrity damaged by that ? and, again:
BvU said:
What accuracy is required by your application ?
Zap said:
Context is key here.
I'm really curious about the context !

Because there are solutions for many applications...
 
It's a floating point error. I don't want 16 digits.
 
Yes No Yes No

You still haven't explained why
 
  • #10
I don't want to upload data into a data base that is clearly an error such as a floating point error, because data quality is of upmost importance for this database, and it will be available publicly. Would you want to access a website and see that floating point error as a cited experimental measurement?
 
  • #11
I am asking if you know of a way to correct for this error. If not, then thank you for your response.
 
  • #12
Zap said:
Would you want to access a website and see that floating point error as a cited experimental measurement?
A decent website doesn't present all digits. For empirical data 16 digits is more than enough

Zap said:
I am asking if you know of a way to correct for this error.
It's not an error.

##\ ##
 
  • Like
Likes anorlunda
  • #13
It's called a floating point error because you can't represent 25.15 exactly in binary.
 
  • #14
Okay. So, in Python, '{:n}'.format( x ), where x = 25.149999999999977 evaluates to '25.15,' which solves my problem.

Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ... I'm assuming SQL has something similar, but if I upload the data as 25.1499999999999, I will not get the clean display of 25.15. This was the original issue, and I have fixed it myself.

In a Pandas dataframe, you can apply the function lambda x : float( '{:n}'.format( float( x ) ) ) to remove floating point error.

Thanks,
Zap.
 
Last edited:
  • Like
Likes BvU
  • #15
Zap said:
It's called a floating point error because you can't represent 25.15 exactly in binary.
There are very many decimal numbers that can't be represented exactly in a finite number of binary digits. Excel is almost certainly using double precision (64 bits) to represent real numbers. Some decimal numbers, such as 0.2 or 0.1 have terminating decimal representations, but their binary representations don't terminate. As a result they get rounded and truncated to fit in 64 bits.

This is a common problem in virtually all programming languages, particularly Fortran, C, and C++. Some languages, such as C#, have a Decimal class that can maintain higher precision for use by financial applications. Java has a DecimalFormat that can be used to deal with the same kind of problem. I'm not an expert in Java, so it could be that some other class or package can be used.
 
  • Like
Likes BvU and lomidrevo
  • #16
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )
 
  • #17
Zap said:
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )

Isn't that just rounding the numbers like like you suggested doing yourself. It is possible it will erroniously truncate some numbers. And converting the formatted string of the rounded float back to float won't help.

Python:
>>> x = 10.15
>>> print(x)
10.15
>>> format(x,'.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> x = float( '{:n}'.format( x ) )
>>> format(x, '.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> y = 10.00000000000000001
>>> print(y)
10.0
>>> '{:n}'.format( y )
'10'

Do you have access to the correct numbers? If so, why not store them as strings?
 
Last edited:
  • #18
If I simply convert it to a string, or try to parse the data as strings, the pandas Excel parser gives '25.149999999999977' instead of ' 25.15.' Additionally, I want to store these values as double data types in SQL. So, strings will not work. SQL has some way of representing 25.15 exactly, just as Python does, but it will not represent the number as 25.15 if I upload it as 25.149999999999977.

I don't think float( '{:n}'.format( x ) ) is the same as rounding. The documentation is here. From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary. Python will display the number as 25.15, even though it is really something like 25.149999999999977. SQL also has some way of displaying the exact value of floating point numbers, when the real value is an approximation in binary. That's why you can write x = 25.15 in Python, and it will display as 25.15. The same is true with SQL and Excel.

The problem has something to do with converting Excel data to Python data, and this built-in feature of representing floating point numbers as exact values in both systems is somehow lost in translation.

I don't want to put 25.149999999999977 into a database, because this data will be part of a professional website, and that's not what a user should see. It's an obvious floating point error. No one wants to see that.
 
Last edited:
  • #19
Zap said:
SQL has some way of representing 25.15 exactly, just as Python does,
Yes, but both languages will just round and/or truncate a number to a desired number of places. SQL has a Decimal data type that allows you to specify the precision and scale. In SQL they define precision as the total number of digits displayed, and scale as the number of digits to the right of the decimal point. Some more info: Understanding the SQL Decimal data type (sqlshack.com)
Zap said:
From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary.
You can do something simpler to force Python to display a floating point number to a desired precision.
Python:
x = 20.15

for i in range(18):
    print(format(x, str( '.'+ str(i) + 'f')))
In the print statement, the 2nd argument of format is a string that I'm constructing. The first few format specifiers will be .0f, .1f, .2f, and so on, up to .17f.
Here's the output of the above:
Code:
C:\Users\Mark\Documents\Python3.4.2>python test2.py
20
20.1
20.15
20.150
20.1500
20.15000
20.150000
20.1500000
20.15000000
20.150000000
20.1500000000
20.15000000000
20.150000000000
20.1500000000000
20.15000000000000
20.149999999999999
20.1499999999999986
20.14999999999999858
Of course, a number such as 20.125 will display the exact result with any number of decimal places, because the bit pattern for 20.125 terminates.
 
  • Like
Likes lomidrevo
  • #20
Zap said:
SQL has some way of representing 25.15 exactly, just as Python does
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.

Zap said:
I don't think float( '{:n}'.format( x ) ) is the same as rounding... float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly

What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.
 
  • #21
That's displaying a string value, no?

Python does represent some floating point numbers as exact results, even though behind the scenes they are approximations in binary. The same is true for SQL and Excel. Otherwise, you could not assign a floating point value of 25.15 to a cell or variable, but you can.

This documentation alludes to '{:n}'.format as a way of enforcing the false representation of approximated floating point numbers as exact values. From what I can tell, it is doing that. The floating point errors encountered after reading the Excel sheets disappeared after applying float( '{:n}'.format( x ) ), so I'm not in a rush to try another method.

I will double check to see if there were any truncation errors, but it seemed to have worked beautifully.
 
  • #22
Jarvis323 said:
Anyways, I think that rounding the numbers this way is fine. But note that lambda x : float( '{:n}'.format( float( x ) ) ) is pointless. In the case of 25.15, stored as 25.149999999999977, it will just round and convert it to a string, and then convert it back to a float with the same value of 25.149999999999977. It doesn't change anything at all. In some cases, as in the case of y in my example (where the rounded value is less correct than the stored value), it will actually just cause you to lose some precision.

It changes it into a floating representation of 25.15, which is a display of 25.15 for a value of 25.149999999999977.
 
  • #23
rbelli1 said:
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.
What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.

You can't round to 25.15 , because you can't calculate 25.15 in binary. If you see a "rounded" 25.15 floating point number, the program has hid the nasty floating point error and is displaying 25.15 to you instead, even though the true value is something like 25.1499999999999 ...
 
  • #24
Zap said:
You can't round to 25.15 , because you can't calculate 25.15 in binary.

When we round the "ugly" floating point value to the string value "25.15" we certainly can do that exactly in binary.

Python:
>>> print('{:.2f}'.format( 25.14999999999999857891452847979962825775146484375 ))
25.15

double 0x4039266666666666 gets converted to the series of bytes 0x32, 0x35, 0x2E, 0x31, 0x35.

BoB
 
  • #25
You can round to a floating point number of 25.15, but it's not really 25.15, it is displayed as 25.15, and it's not a string. It's a floating point number. Try it for yourself, or read the documentation. You can display the floating point number as 25.15, even though it's something else, for the hundredth time!
 
  • #26
Zap said:
If you see a "rounded" 25.15 floating point number, the program has hid the nasty floating point error and is displaying 25.15 to you instead, even though the true value is something like 25.1499999999999 ...
You do know that 25.1499999999999... is exactly equal to 25.15, right? The ellipsis (...) means that the 9 digits repeat that pattern forever.

As someone already asked, do you really need more than 13 decimal digits of precision in your application? I can't think of any measurements that would be that precise.
 
  • #27
Are you reading this thread? Do you seriously think I meant an infinitely repeating number on a computer? I am not interested in increasing precision. Please read the thread before responding.
 
  • #28
Python:
>>> x = 25.149999999999977
>>> x
25.149999999999977
>>> float( '{:n}'.format( x ) )
25.15
>>> x = float( '{:n}'.format( x ) )
>>> type( x )
<class 'float'>
>>> x
25.15
>>> x = 0.3 - 0.2
>>> x
0.09999999999999998
>>> float( '{:n}'.format( x ) )
0.1
>>> x = float( '{:n}'.format( x ) )
>>> x
0.1
>>> type( x )
<class 'float'>
>>>
 
  • #29
The whole point was representing the float as 25.15. The point I'm trying to make is that the floating point number is being represented exactly as 25.15, even though in the background it is still a binary approximation, and float( '{:n}'.format( x ) ) is coercing Python to represent the binary approximation as an exact value. How many times do I have to say this?
 
  • #30
You can't round to 25.15 ...

The number is being displayed as 25.15, it isn't being rounded. You can't calculate 25.15 in binary. So, how is Python giving me a floating point number of 25.15? Because it's being DISPLAYED AS 25.15 AND NOT ACTUALLY 25.15!
 

Similar threads

Replies
2
Views
2K
  • · Replies 8 ·
Replies
8
Views
6K
  • · Replies 19 ·
Replies
19
Views
14K
  • · Replies 13 ·
Replies
13
Views
3K
  • · Replies 23 ·
Replies
23
Views
6K
  • · Replies 25 ·
Replies
25
Views
6K
  • · Replies 1 ·
Replies
1
Views
4K