Computer Arithmetic for Double Precision Numbers

SUMMARY

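The discussion focuses on evaluating the expression (1-a)(1+a) in double precision, specifically the binary64 format of IEEE 754. The original poster suggests that the expression evaluates to 1 when a equals 0 or 1/n for a positive integer n. In exact arithmetic, however, (1-a)(1+a) = 1 - a², which equals 1 only for a = 0 (for a = 1/n it gives 1 - 1/n²). In double precision, the subtraction, addition and multiplication are each rounded, so the computed result can also be exactly 1 for sufficiently small nonzero a. The conversation also clarifies that the problem concerns floating-point rather than fixed-point representation, and a reply suggests first working out for which a the simpler expression 1 - a evaluates to 1.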
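The gap between exact and double-precision evaluation can be checked directly. This Python sketch (the helper names are hypothetical; it assumes Python floats are IEEE 754 binary64, as they are on mainstream platforms) compares exact rational arithmetic from the standard `fractions` module with ordinary float arithmetic.

```python
from fractions import Fraction


def product_exact(a: Fraction) -> Fraction:
    """(1 - a)(1 + a) in exact rational arithmetic; always equals 1 - a**2."""
    return (1 - a) * (1 + a)


def product_double(a: float) -> float:
    """(1 - a)(1 + a) in binary64: the subtraction, addition and multiplication each round."""
    return (1.0 - a) * (1.0 + a)


# a = 1/n: the exact product is 1 - 1/n**2, which is never 1 for a positive integer n.
for n in (1, 2, 3, 10):
    exact = product_exact(Fraction(1, n))
    print(f"a = 1/{n}: exact = {exact}, double = {product_double(1 / n)!r}")

# a = 0 gives exactly 1; a tiny nonzero a also gives 1.0 because rounding absorbs it.
print(product_double(0.0))    # 1.0
print(product_double(1e-20))  # 1.0
```

The loop prints the exact and computed values side by side for a few a = 1/n; the last two lines show that a = 0 and a tiny nonzero a both produce exactly 1.0 in double precision.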

PREREQUISITES
  • Understanding of IEEE 754 double precision (binary64) representation
  • Familiarity with floating-point arithmetic
  • Basic knowledge of mathematical expressions and their evaluations
  • Concept of fixed-point vs. floating-point data representation
NEXT STEPS
  • Research IEEE 754 double precision floating-point format
  • Learn about the implications of floating-point precision in numerical computations
  • Explore fixed-point vs. floating-point arithmetic in programming
  • Investigate common pitfalls in floating-point arithmetic and how to avoid them
USEFUL FOR

Students studying computer science, software developers working with numerical methods, and anyone interested in understanding the intricacies of double precision arithmetic in programming.

ver_mathstats
Homework Statement
In double precision for what values of a does this expression evaluate to 1?

The expression is (1-a)(1+a).
Relevant Equations
(1-a)(1+a)
I know that this expression evaluates to 1 when a is equal to 0. Also for when a is equal to 1/n when n is a positive number, but I'm confused about how to go about this in double precision?
 
ver_mathstats said:
Homework Statement:: In double precision for what values of a does this expression evaluate to 1?

The expression is (1-a)(1+a).
Relevant Equations:: (1-a)(1+a)

I know that this expression evaluates to 1 when a is equal to 0. Also for when a is equal to 1/n when n is a positive number, but I'm confused about how to go about this in double precision?
Assuming you mean Fixed Point and not Floating Point, what data representation are you supposed to use for this problem?

https://www.mathworks.com/help/fixedpoint/ug/fixed-point-data-types_btb4ld0-1.html

 
berkeman said:
Assuming you mean Fixed Point and not Floating Point, what data representation are you supposed to use for this problem?
The problem states "double precision", which is floating point; specifically, it is the binary64 format of IEEE 754: https://en.wikipedia.org/wiki/Double-precision_floating-point_format.
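To make the binary64 layout concrete, the following Python 3.9+ sketch (the helper name `binary64_fields` is hypothetical) unpacks a float into its 1 sign bit, 11-bit biased exponent and 52-bit stored fraction with the standard `struct` module, then checks the gaps between 1.0 and its neighbouring doubles using `math.ulp` and `math.nextafter`.

```python
import math
import struct


def binary64_fields(x: float) -> tuple[int, int, int]:
    """Split a float into (sign, biased exponent, stored fraction) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64-bit pattern
    sign = bits >> 63                      # 1 sign bit
    exponent = (bits >> 52) & 0x7FF        # 11 exponent bits, bias 1023
    fraction = bits & ((1 << 52) - 1)      # 52 stored fraction bits
    return sign, exponent, fraction


print(binary64_fields(1.0))                          # (0, 1023, 0)
print(math.ulp(1.0) == 2.0 ** -52)                   # True: gap from 1.0 up to the next double
print(1.0 - math.nextafter(1.0, 0.0) == 2.0 ** -53)  # True: gap from 1.0 down to the previous double
```

The asymmetry in the last two lines (a gap of 2**-52 above 1.0 but 2**-53 below it) is why 1 + a and 1 - a stop changing at different sizes of a.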

@ver_mathstats can you answer the similar question "In double precision for what values of a does ## 1 - a ## evaluate to 1?"
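One way to experiment with this hint is a brute-force scan. The Python sketch below (hypothetical helper names; it assumes Python floats are binary64 with the default round-to-nearest-even mode) tries a = 2**-k for k = 1 to 99 and reports the first k, i.e. the largest power of two, for which each expression evaluates to exactly 1.0. It only scans positive powers of two, so it is a starting point for the general question rather than a full answer.

```python
from collections.abc import Callable

# Each candidate expression, evaluated in binary64 (Python float).
EXPRESSIONS: dict[str, Callable[[float], float]] = {
    "1 - a": lambda a: 1.0 - a,
    "1 + a": lambda a: 1.0 + a,
    "(1 - a)(1 + a)": lambda a: (1.0 - a) * (1.0 + a),
}


def first_exponent_giving_one(expr: Callable[[float], float]) -> int:
    """Return the smallest k in 1..99 (largest a = 2**-k) with expr(a) == 1.0 exactly."""
    for k in range(1, 100):
        if expr(2.0 ** -k) == 1.0:
            return k
    raise ValueError("no power of two in the scanned range gives exactly 1.0")


for name, expr in EXPRESSIONS.items():
    k = first_exponent_giving_one(expr)
    print(f"{name}: largest power of two giving 1.0 is a = 2**-{k}")
```

Under those assumptions the scan should report k = 54, 53 and 27 respectively. Because each operation rounds separately, the set of a for which the full product equals 1 need not be a single interval: a = 2**-53 gives 1 - 2**-53 rather than 1, while a = 2**-52 and a = 2**-54 both give 1.0.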
 
