Numerical analysis, floating-point arithmetic

SUMMARY

The discussion focuses on the claim that if two floating-point numbers x and y with the same sign differ by a factor of at most the base B (1/B ≤ x/y ≤ B), then their difference x − y is exactly representable in the floating-point system. The claim holds for B = 2. For bases greater than 2 a counterexample is required, because the difference can need more significand digits than the system provides. Participants suggest working with specific numbers in a base greater than 2, such as base 3, rather than symbolic digits, to construct a case where representability fails.

PREREQUISITES
  • Understanding of floating-point arithmetic
  • Familiarity with numerical analysis concepts
  • Knowledge of base systems in mathematics
  • Ability to manipulate and analyze mathematical equations
NEXT STEPS
  • Research the properties of floating-point representation in base 3 and higher
  • Learn about the implications of floating-point precision errors
  • Explore numerical analysis techniques for error analysis in computations
  • Investigate specific examples of floating-point numbers in various bases
USEFUL FOR

Students and professionals in numerical analysis, mathematicians, and software engineers dealing with floating-point arithmetic and precision issues in computational applications.

Chasing_Time
Hi all, this (probably easy) problem from numerical analysis is giving me trouble. I can't seem to get started and need some poking in the right direction.

Homework Statement

Consider the following claim: if two floating-point numbers x and y with the same sign differ by a factor of at most the base B (1/B ≤ x/y ≤ B), then their difference x − y is exactly representable in the floating-point system. Show that this claim is true for B = 2, but give a counterexample for B > 2.
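The claim can be sanity-checked (not proved) by brute force on a tiny toy system. The sketch below is an illustration under stated assumptions: positive numbers only (the negative case is symmetric), t-digit normalized significands, a small exponent range chosen arbitrarily, and no exponent limits when testing the difference. The function names `toy_floats`, `is_representable`, and `counterexamples` are made up for this example.

```python
from fractions import Fraction
from itertools import product


def toy_floats(base, t, exponents):
    """All positive normalized values (d_0.d_1...d_{t-1})_base * base**e."""
    values = set()
    for digits in product(range(base), repeat=t):
        if digits[0] == 0:
            continue  # normalized: the leading digit d_0 must be nonzero
        significand = sum(Fraction(d, base ** i) for i, d in enumerate(digits))
        for e in exponents:
            values.add(significand * Fraction(base) ** e)
    return sorted(values)


def is_representable(v, base, t):
    """True if |v| fits in t significant digits in the given base.
    Assumes an unbounded exponent range (no overflow/underflow)."""
    v = abs(Fraction(v))
    if v == 0:
        return True
    while v >= base:  # scale into [1, base)
        v /= base
    while v < 1:
        v *= base
    return (v * base ** (t - 1)).denominator == 1


def counterexamples(base, t, exponents=range(-1, 2)):
    """Pairs (x, y) with 1/base <= x/y <= base whose difference is not representable."""
    xs = toy_floats(base, t, exponents)
    return [
        (x, y)
        for x in xs
        for y in xs
        if Fraction(1, base) <= x / y <= base and not is_representable(x - y, base, t)
    ]


print(counterexamples(2, 4))       # []
print(len(counterexamples(3, 2)))  # nonzero: base 3 already fails
```

For B = 2 the search comes back empty, while base 3 with t = 2 already yields failing pairs such as x = 6, y = 7/3. An empty search is only evidence for the toy parameters, not a proof.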


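Homework Equations

The general form of a floating-point number in base B with t significand digits:

x = \pm (d_0.d_1 \ldots d_{t-1})_B \times B^e, \quad 0 \le d_i \le B - 1, \quad e \in \mathbb{Z}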
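To make the general form concrete, here is a minimal sketch that lists every positive normalized value of that form. The function name `toy_floats` and the toy parameters (B = 2, t = 3, e ∈ {0, 1}) are illustrative assumptions. Within one exponent the gap between neighbouring values is B^{e-t+1}, so it grows by a factor of B each time the exponent increases.

```python
from fractions import Fraction
from itertools import product


def toy_floats(base, t, exponents):
    """Sorted positive values (d_0.d_1...d_{t-1})_base * base**e with d_0 != 0."""
    values = set()
    for digits in product(range(base), repeat=t):
        if digits[0] == 0:
            continue  # skip non-normalized significands
        significand = sum(Fraction(d, base ** i) for i, d in enumerate(digits))
        for e in exponents:
            values.add(significand * Fraction(base) ** e)
    return sorted(values)


values = toy_floats(2, 3, [0, 1])
print([float(v) for v in values])
# [1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]
```

The spacing is 0.25 for e = 0 and 0.5 for e = 1, which is why a difference that lands in a higher binade can lose low-order digits.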


The Attempt at a Solution

I have tried exploring the binary case, noting that d_0 must be 1 in base B = 2 (normalized form), and taking y = x/2, i.e. the same digits with exponent e − 1:

x = \left(1 + \frac{d_1}{2} + \cdots + \frac{d_{t-1}}{2^{t-1}}\right) \times 2^e
y = \left(1 + \frac{d_1}{2} + \cdots + \frac{d_{t-1}}{2^{t-1}}\right) \times 2^{e-1} = \left(\frac{1}{2} + \frac{d_1}{4} + \cdots + \frac{d_{t-1}}{2^t}\right) \times 2^e
x - y = \left(1 + \frac{d_1 - 1}{2} + \frac{d_2 - d_1}{4} + \cdots + \frac{d_{t-1} - d_{t-2}}{2^{t-1}} - \frac{d_{t-1}}{2^t}\right) \times 2^e

Is this "exactly representable" in the floating-point system? I don't know what else to do or what to use as a counterexample. Am I even on the right track? Thanks for any help.
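One way to test an expression like the x − y line in the attempt is exact rational arithmetic. The sketch below is a hypothetical helper (`check_binary_attempt` is not from the thread; Python's `fractions.Fraction` keeps every value exact) that enumerates every binary significand 1.d_1 ... d_{t-1}, sets y = x/2 as in the attempt, and compares x − y, y, and the digit-by-digit expression. The common factor 2^e is dropped because it scales all three equally.

```python
from fractions import Fraction
from itertools import product


def check_binary_attempt(t):
    """Compare x - y, y, and the attempt's digit-by-digit expression for x - y,
    over every t-bit significand with d_0 = 1 and y = x / 2 (factor 2**e omitted)."""
    if t < 2:
        raise ValueError("need at least two significand bits")
    for tail in product((0, 1), repeat=t - 1):
        d = (1,) + tail  # d[i] is the digit d_i; d_0 = 1 when normalized
        x = sum(Fraction(d[i], 2 ** i) for i in range(t))
        y = x / 2  # same digits, exponent lowered by one
        # 1 + (d_1 - 1)/2 + sum_{i=2}^{t-1} (d_i - d_{i-1})/2^i - d_{t-1}/2^t
        expr = 1 + Fraction(d[1] - 1, 2)
        expr += sum(Fraction(d[i] - d[i - 1], 2 ** i) for i in range(2, t))
        expr -= Fraction(d[t - 1], 2 ** t)
        if not (x - y == y == expr):
            return False
    return True


print(check_binary_attempt(5))  # True
```

In exact arithmetic the three values agree, so the expression is y written in a non-normalized digit form, consistent with the point that x − y must equal y when x = 2y.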
 
Chasing_Time said:
x - y = \left(1 + \frac{d_1 - 1}{2} + \frac{d_2 - d_1}{4} + \cdots + \frac{d_{t-1} - d_{t-2}}{2^{t-1}} - \frac{d_{t-1}}{2^t}\right) \times 2^e
Your arithmetic is off here. Since you have set this up with x being two times y, the difference x - y better be equal to y.
Chasing_Time said:
Is this "exactly representable" in the floating-point system? I don't know what else to do or what to use as a counterexample. Am I even on the right track? Thanks for any help.
In your setup x = 2y, so x - y = y, which is certainly exactly representable in a base-2 floating-point system, as long as x and y are.
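For reference, a sketch of the standard argument for general x and y when B = 2 (the result is often cited as Sterbenz's lemma), assuming positive operands, t-bit normalized significands, and an exponent range wide enough that the result does not underflow. The symbols m_x, m_y, e_x, e_y, u, and k are introduced here for the sketch.

```latex
% Sketch for B = 2. If both numbers are negative, negate them; if x < y, swap them
% (the condition 1/2 <= x/y <= 2 is symmetric and y - x = -(x - y)).
\begin{aligned}
&\text{Assume } y \le x \le 2y \text{ and write } y = m_y\, 2^{e_y - t + 1},\quad m_y \in \mathbb{Z},\ 2^{t-1} \le m_y < 2^t.\\
&x \ge y \implies e_x \ge e_y, \text{ so } x = m_x\, 2^{e_x - t + 1} \text{ is an integer multiple of } u = 2^{e_y - t + 1}.\\
&\text{Hence } x - y = k\,u \text{ for some integer } k \ge 0.\\
&x \le 2y \implies x - y \le y \implies k \le m_y < 2^t.\\
&\text{So } x - y = k \cdot 2^{e_y - t + 1} \text{ with } 0 \le k < 2^t, \text{ which fits in } t \text{ bits.}
\end{aligned}
```

For B > 2 the same steps only give x − y ≤ (B − 1)y, so k can be as large as (B − 1)m_y, which may reach B^t or more. Then x − y needs t + 1 digits at spacing u and is representable only if the trailing digit of k happens to be zero.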

I don't have any examples in mind that would serve as counterexamples, but if you work with some specific numbers in base 3 or higher bases, you might be able to come up with one. By "specific numbers" I mean that you should work with numbers like 2.0121 × 3^2 (base 3), rather than representing the digits symbolically with d_1, d_2, etc. That's where I would start.
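Following that suggestion, here is a minimal check with concrete base-3 numbers, assuming a toy system with t = 2 significand digits and no exponent limits (these parameters and the helper names `value` and `is_representable` are illustrative, not from the thread). x = 2.0 × 3^1 = 6 and y = 2.1 × 3^0 = 7/3 satisfy 1/3 ≤ x/y ≤ 3, but x − y = 11/3 = 1.02 × 3^1 (base 3) needs three digits.

```python
from fractions import Fraction

# Assumed toy system (not from the thread): base 3, t = 2 significand digits,
# unbounded exponent range.
BASE, T = 3, 2


def value(digits, e, base=BASE):
    """Exact value of (d_0.d_1...)_base * base**e."""
    significand = sum(Fraction(d, base ** i) for i, d in enumerate(digits))
    return significand * Fraction(base) ** e


def is_representable(v, base=BASE, t=T):
    """True if |v| fits in t significant base-`base` digits (no exponent limits)."""
    v = abs(Fraction(v))
    if v == 0:
        return True
    while v >= base:  # normalize into [1, base)
        v /= base
    while v < 1:
        v *= base
    return (v * base ** (t - 1)).denominator == 1


x = value((2, 0), 1)  # 2.0 (base 3) * 3^1 = 6
y = value((2, 1), 0)  # 2.1 (base 3) * 3^0 = 7/3
ratio = x / y         # 18/7, inside [1/3, 3]
diff = x - y          # 11/3 = 10.2 (base 3): three significant digits
print(ratio, diff, is_representable(x), is_representable(y), is_representable(diff))
# 18/7 11/3 True True False
```

The same pattern, a larger exponent for x plus a nonzero last digit in y, also breaks base 10 with t = 2: x = 20 and y = 2.1 give x/y ≈ 9.52, but x − y = 17.9. In base 2 it cannot happen, because x ≤ 2y forces x − y ≤ y.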
 
