Pandas Floating Point Error

  • Thread starter Zap
  • Start date
  • #1
Zap
387
99
Summary:
25.15 = 25.149999999999977??
I am parsing data from an excel sheet using pandas ExcelFile class and read_excel function. I noticed that when reading the file, there are floating point errors in the results. For example, a cell that contains 25.15 will be 25.149999999999977 in the pandas data frame.

The value 25.15 in Excel is not rounded. It is the result of adding 298.15 minus 273.

This link describes the problem, but there is not solution : https://github.com/pandas-dev/pandas/issues/21885

I'm surprised ExcelFile and read_excel has an error like this. It pretty much makes them unusable, in my opinion.
 

Answers and Replies

  • #2
Vanadium 50
Staff Emeritus
Science Advisor
Education Advisor
27,001
10,835
Do you need 13 digits of precision? And if you do, why are you using floats?
 
  • #3
BvU
Science Advisor
Homework Helper
14,405
3,717
It pretty much makes them unusable
Is that so ?
What accuracy is required by your application ?
 
  • #4
BvU
Science Advisor
Homework Helper
14,405
3,717
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
 
  • #5
Zap
387
99
I need valid data. It's not about accuracy. I don't want to upload data that is clearly an error like that. It's a data quality issue, not a precision issue.

I've tried parsing them as strings, but the same issue arises. It's a string '25.149999999999977 ' instead of the expected '25.15'.

I'm wondering if there is a way to correct this floating point error without doing something like round(), because it will not be easy to tell the computer at which decimal place to round.
 
  • #6
Zap
387
99
By the way, doesn't the fact that 1 + 1e-15 - 1 =0 in Excel make the whole spreadsheet program Excel unusable in the first place ?
a
1​
b
1.00E-15​
c = a+ b
1.000000000000000000​
c-1
0.000000000000000000​
Context is key here. I'm not doing calculations. I'm uploading cited data that needs to have a certain level of integrity.
 
  • #7
BvU
Science Advisor
Homework Helper
14,405
3,717
Excel has about 16 digits of accuracy. What kind of data has its integrity damaged by that ? and, again:
What accuracy is required by your application ?
Context is key here.
I'm really curious about the context !

Because there are solutions for many applications...
 
  • #8
Zap
387
99
It's a floating point error. I don't want 16 digits.
 
  • #9
BvU
Science Advisor
Homework Helper
14,405
3,717
Yes No Yes No

You still haven't explained why
 
  • #10
Zap
387
99
I don't want to upload data into a data base that is clearly an error such as a floating point error, because data quality is of upmost importance for this database, and it will be available publicly. Would you want to access a website and see that floating point error as a cited experimental measurement?
 
  • #11
Zap
387
99
I am asking if you know of a way to correct for this error. If not, then thank you for your response.
 
  • #12
BvU
Science Advisor
Homework Helper
14,405
3,717
Would you want to access a website and see that floating point error as a cited experimental measurement?
A decent website doesn't present all digits. For empirical data 16 digits is more than enough

I am asking if you know of a way to correct for this error.
It's not an error.

##\ ##
 
  • Like
Likes anorlunda
  • #13
Zap
387
99
It's called a floating point error because you can't represent 25.15 exactly in binary.
 
  • #14
Zap
387
99
Okay. So, in Python, '{:n}'.format( x ), where x = 25.149999999999977 evaluates to '25.15,' which solves my problem.

Interestingly, if you do float( '{:n}'.format( x ) ), the result is 25.15, even though x = 25.149999999999977. Python has some built-in way of outputting floats exactly, even though behind the scenes the displayed value 25.15 may be stored as something like 25.1499999999999 ... I'm assuming SQL has something similar, but if I upload the data as 25.1499999999999, I will not get the clean display of 25.15. This was the original issue, and I have fixed it myself.

In a Pandas dataframe, you can apply the function lambda x : float( '{:n}'.format( float( x ) ) ) to remove floating point error.

Thanks,
Zap.
 
Last edited:
  • Like
Likes BvU
  • #15
35,052
6,791
It's called a floating point error because you can't represent 25.15 exactly in binary.
There are very many decimal numbers that can't be represented exactly in a finite number of binary digits. Excel is almost certainly using double precision (64 bits) to represent real numbers. Some decimal numbers, such as 0.2 or 0.1 have terminating decimal representations, but their binary representations don't terminate. As a result they get rounded and truncated to fit in 64 bits.

This is a common problem in virtually all programming languages, particularly Fortran, C, and C++. Some languages, such as C#, have a Decimal class that can maintain higher precision for use by financial applications. Java has a DecimalFormat that can be used to deal with the same kind of problem. I'm not an expert in Java, so it could be that some other class or package can be used.
 
  • Like
Likes BvU and lomidrevo
  • #16
Zap
387
99
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )
 
  • #17
659
538
I understand what is a floating point error. I was looking for a fix. Typically, Python auto corrects the display of floating point numbers, unless they are the result of a calculation or are otherwise not displayed exactly. To display a floating point number exactly, you can simply apply the function below, which will display 25.149999999999977 as 25.15 and totally solves my problem.

Python:
fix_float_error = lambda x : float( '{:n}'.format( x ) )

Isn't that just rounding the numbers like like you suggested doing yourself. It is possible it will erroniously truncate some numbers. And converting the formatted string of the rounded float back to float won't help.

Python:
>>> x = 10.15
>>> print(x)
10.15
>>> format(x,'.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> x = float( '{:n}'.format( x ) )
>>> format(x, '.60g')
'10.1500000000000003552713678800500929355621337890625'
>>> y = 10.00000000000000001
>>> print(y)
10.0
>>> '{:n}'.format( y )
'10'

Do you have access to the correct numbers? If so, why not store them as strings?
 
Last edited:
  • #18
Zap
387
99
If I simply convert it to a string, or try to parse the data as strings, the pandas Excel parser gives '25.149999999999977' instead of ' 25.15.' Additionally, I want to store these values as double data types in SQL. So, strings will not work. SQL has some way of representing 25.15 exactly, just as Python does, but it will not represent the number as 25.15 if I upload it as 25.149999999999977.

I don't think float( '{:n}'.format( x ) ) is the same as rounding. The documentation is here. From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary. Python will display the number as 25.15, even though it is really something like 25.149999999999977. SQL also has some way of displaying the exact value of floating point numbers, when the real value is an approximation in binary. That's why you can write x = 25.15 in Python, and it will display as 25.15. The same is true with SQL and Excel.

The problem has something to do with converting Excel data to Python data, and this built-in feature of representing floating point numbers as exact values in both systems is somehow lost in translation.

I don't want to put 25.149999999999977 into a database, because this data will be part of a professional website, and that's not what a user should see. It's an obvious floating point error. No one wants to see that.
 
Last edited:
  • #19
35,052
6,791
SQL has some way of representing 25.15 exactly, just as Python does,
Yes, but both languages will just round and/or truncate a number to a desired number of places. SQL has a Decimal data type that allows you to specify the precision and scale. In SQL they define precision as the total number of digits displayed, and scale as the number of digits to the right of the decimal point. Some more info: Understanding the SQL Decimal data type (sqlshack.com)
From what I understand, float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly, although it will still be stored as something like 25.149999999999977, because storing the exact value 25.15 is not possible in binary.
You can do something simpler to force Python to display a floating point number to a desired precision.
Python:
x = 20.15

for i in range(18):
    print(format(x, str( '.'+ str(i) + 'f')))
In the print statement, the 2nd argument of format is a string that I'm constructing. The first few format specifiers will be .0f, .1f, .2f, and so on, up to .17f.
Here's the output of the above:
Code:
C:\Users\Mark\Documents\Python3.4.2>python test2.py
20
20.1
20.15
20.150
20.1500
20.15000
20.150000
20.1500000
20.15000000
20.150000000
20.1500000000
20.15000000000
20.150000000000
20.1500000000000
20.15000000000000
20.149999999999999
20.1499999999999986
20.14999999999999858
Of course, a number such as 20.125 will display the exact result with any number of decimal places, because the bit pattern for 20.125 terminates.
 
  • Like
Likes lomidrevo
  • #20
rbelli1
Gold Member
997
379
SQL has some way of representing 25.15 exactly, just as Python does
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.

I don't think float( '{:n}'.format( x ) ) is the same as rounding... float( '{:n}'.format( x ) ) will force Python to represent the floating point number exactly

What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.
 
  • #21
Zap
387
99
That's displaying a string value, no?

Python does represent some floating point numbers as exact results, even though behind the scenes they are approximations in binary. The same is true for SQL and Excel. Otherwise, you could not assign a floating point value of 25.15 to a cell or variable, but you can.

This documentation alludes to '{:n}'.format as a way of enforcing the false representation of approximated floating point numbers as exact values. From what I can tell, it is doing that. The floating point errors encountered after reading the Excel sheets disappeared after applying float( '{:n}'.format( x ) ), so I'm not in a rush to try another method.

I will double check to see if there were any truncation errors, but it seemed to have worked beautifully.
 
  • #22
Zap
387
99
Anyways, I think that rounding the numbers this way is fine. But note that lambda x : float( '{:n}'.format( float( x ) ) ) is pointless. In the case of 25.15, stored as 25.149999999999977, it will just round and convert it to a string, and then convert it back to a float with the same value of 25.149999999999977. It doesn't change anything at all. In some cases, as in the case of y in my example (where the rounded value is less correct than the stored value), it will actually just cause you to lose some precision.

It changes it into a floating representation of 25.15, which is a display of 25.15 for a value of 25.149999999999977.
 
  • #23
Zap
387
99
Python and some SQL databases have the decimal type. This will avoid the floating point errors. A quick look at pandas indicates that you may be able to use boxed decimals but it doesn't support them natively.



What else would it be doing?

The closest value that a double precision IEEE754 floating point number can get to 25.15 is:
25.14999999999999857891452847979962825775146484375

The exact value of this number is 25.14999999999999857891452847979962825775146484375

Any program that uses IEEE754 floating point values internally that tells you the number is 25.15 is rounding. Any claim to the contrary is a lie.

If you tell python to round 25.15 to one decimal place you get 25.1 since you are actually telling it to round 25.14999999999999857891452847979962825775146484375
Python:
>>> print('{:.1f}'.format( 25.15 ))
25.1

BoB

EDIT> Mark44 and Jarvis323 posted while I was typing so most of this is redundant.

You can't round to 25.15 , because you can't calculate 25.15 in binary. If you see a "rounded" 25.15 floating point number, the program has hid the nasty floating point error and is displaying 25.15 to you instead, even though the true value is something like 25.1499999999999 ...
 
  • #24
rbelli1
Gold Member
997
379
You can't round to 25.15 , because you can't calculate 25.15 in binary.

When we round the "ugly" floating point value to the string value "25.15" we certainly can do that exactly in binary.

Python:
>>> print('{:.2f}'.format( 25.14999999999999857891452847979962825775146484375 ))
25.15

double 0x4039266666666666 gets converted to the series of bytes 0x32, 0x35, 0x2E, 0x31, 0x35.

BoB
 
  • #25
Zap
387
99
You can round to a floating point number of 25.15, but it's not really 25.15, it is displayed as 25.15, and it's not a string. It's a floating point number. Try it for yourself, or read the documentation. You can display the floating point number as 25.15, even though it's something else, for the hundredth time!
 

Related Threads on Pandas Floating Point Error

  • Last Post
Replies
8
Views
3K
Replies
8
Views
4K
  • Last Post
Replies
18
Views
6K
  • Last Post
Replies
8
Views
969
Replies
2
Views
1K
Replies
7
Views
1K
  • Last Post
Replies
2
Views
3K
Replies
1
Views
2K
Replies
2
Views
7K
Replies
6
Views
877
Top