A Floating-Point Notation Exercise

Discussion Overview

The discussion revolves around a floating point binary notation exercise involving 16 bits, focusing on calculating the minimum exponent required to represent two specific numbers, r = -8147.31 and s = 0.103 x 10^{-6}. Participants explore the order of magnitude for both numbers and the implications for the exponent in the notation.

Discussion Character

  • Exploratory
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant outlines a method to calculate the minimum exponent, e_min, by determining the order of magnitude for both r and s and using proportions based on powers of 2.
  • Another participant suggests calculating the steps for each number separately and selecting the larger exponent, leading to a conclusion that e_min should be 6.
  • Some participants note the common practice in floating point formats where the first bit of the mantissa is assumed to be one, which may affect the representation of numbers.
  • Concerns are raised about the ability to accurately represent the number 8147.31 within the 16-bit limit, with one participant asserting that it requires more than 16 bits to represent accurately.

Areas of Agreement / Disagreement

Participants express differing views on the calculations and implications of the floating point representation, particularly regarding the minimum exponent needed and the ability to represent certain values accurately. No consensus is reached on the correct approach or final values.

Contextual Notes

Participants mention the limitations of the 16-bit representation and the assumptions made regarding the mantissa and exponent, highlighting unresolved issues about the accuracy of representing specific numbers.

hastings
Consider a 16-bit floating-point binary notation. From left to right, it consists of 1 bit for the sign (0 = "+"), e bits for the exponent represented in excess [tex]2^{e-1}[/tex], and the remaining bits for the fractional part of the mantissa, normalized between 1 and 2 ([tex]1 \leq m < 2[/tex]).

a) Calculate the minimum number of exponent bits [tex]e_{min}[/tex] that allows us to write, in the above notation, both the numbers r = -8147.31 and s = [tex]0.103 \cdot 10^{-6}[/tex].

This is what I would do.
1. Calculate the order of magnitude of both r and s
2. Write a proportion knowing that [tex]2^{10} \approx 10^3[/tex] (for example 10 : 3 = x : 4, where 4 is the result of step 1).
3. Find x from the above proportion and find the smallest power of 2 that is at least x (for example, if x = 15, then [tex]2^3 \leq 15 \leq 2^4[/tex], so I'd take [tex]2^4[/tex]).
4. Calculate [tex]e_{min}[/tex]: since the exponent is in excess [tex]2^{e-1}[/tex], I solve the equation [tex]2^4=2^{e-1} \Rightarrow e=e_{min}=4+1=5[/tex], where [tex]2^4[/tex] is the result of step 3.

Is this reasoning right?
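
As a cross-check on the steps above, here is a minimal Python sketch that reads the binary exponents of r and s directly instead of going through decimal orders of magnitude. It is not the poster's method. The helper names `normalized_exponent` and `min_exponent_bits` are made up for illustration, and the sketch assumes that every one of the 2^e exponent codes is usable (no codes reserved for zero or other special cases).

```python
import math

def normalized_exponent(x):
    """Exponent E such that |x| = m * 2**E with 1 <= m < 2."""
    # math.frexp returns a mantissa in [0.5, 1), so shift the exponent by one.
    _, exp = math.frexp(abs(x))
    return exp - 1

def min_exponent_bits(values):
    """Smallest e such that every exponent fits in excess 2**(e-1).

    Assumption: stored codes 0 .. 2**e - 1 are all usable, so the
    representable exponents are -2**(e-1) .. 2**(e-1) - 1.
    """
    exps = [normalized_exponent(v) for v in values]
    e = 1
    while not all(-(2 ** (e - 1)) <= E <= 2 ** (e - 1) - 1 for E in exps):
        e += 1
    return e

r = -8147.31
s = 0.103e-6
print(normalized_exponent(r), normalized_exponent(s), min_exponent_bits([r, s]))
# prints: 12 -24 6
```

Under these assumptions the small number s is the one that drives the result: its exponent of -24 does not fit in the range -16 .. 15 that e = 5 would give.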

Now, when I calculated the order of magnitude of r and s, I got:
Order of magnitude of r = [tex]10^4[/tex], or simply 4.
Order of magnitude of s = [tex]10^{-5}[/tex], or simply -5.
Which should I consider as a starting point, [tex]10^4 \mbox{ or } 10^{-5}[/tex]?
 
Hey, I really, REALLY need help with this exercise; I've got an exam the day after tomorrow.


I've got an idea: suppose I do steps 1 to 4 for each of the numbers separately? In the end I see which result "includes" the other and I choose that one.

Let's do it.

Let's take r: its order of magnitude is 4.

[tex]10^4 \approx 2^{10} \Rightarrow 2^2 \leq 10 \leq \underbrace{2^3} \Rightarrow 2^3=2^{e-1} \Rightarrow e=3+1=4[/tex]

Then take s: its order of magnitude is -5, but since what we get in the end is a number of bits, which cannot be negative, we shall simply consider it as 5.
This time we need to write a proportion for the exponents:
since [tex]2^{10} \approx 10^3[/tex], we want to find the exponent x such that [tex]10^5=2^x[/tex]:

[tex]10:3=x:5 \Rightarrow x=\frac{10 \cdot 5}{3} \approx 17 \Rightarrow 2^4 \leq 17 \leq \underbrace{2^5} \Rightarrow 2^5=2^{e-1} \Rightarrow e=5+1=6[/tex]

Since e = 6 includes e = 4, e_min = 6.
So I suppose I should have taken 5 as the common order of magnitude for both numbers in the decimal system.
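
The proportion 10 : 3 = x : d used above is a rule of thumb for converting a decimal exponent d into a binary exponent x. A small Python sketch, with hypothetical helper names, compares it with the exact conversion factor log2(10) for the two decimal exponents used in this post. It only checks the proportion itself, not whether 4 and 5 are the right orders of magnitude to start from.

```python
import math

def rule_of_thumb(d):
    """Binary exponent from the proportion 10 : 3 = x : d (uses 2**10 ~ 10**3)."""
    return d * 10 / 3

def exact_binary_exponent(d):
    """Binary exponent x with 2**x == 10**d, using the exact factor log2(10)."""
    return d * math.log2(10)

for d in (4, 5):
    approx = rule_of_thumb(d)
    exact = exact_binary_exponent(d)
    # Both values round up to the same integer for these two inputs.
    print(d, round(approx, 2), round(exact, 2), math.ceil(approx), math.ceil(exact))
```

For d = 5 both versions round up to 17, so the approximation is not what limits the accuracy of the estimate here.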

Is what I just said a big bunch of nonsense?
Please reply asap!
 
Don't forget the common "cheat" used by most formats, where the first bit of the mantissa is assumed to be one and is not stored among the bits of the floating-point number. You may have covered this case already, since you stated that the mantissa represents a number between 1 and 2. In many floating-point formats, the mantissa instead represents a number greater than 0 and less than 1. Reserved combinations of values are used for special cases, such as all zero bits for zero.

However, you've got a problem: 8147.31 takes more than 16 bits to represent to the nearest 1/100th.
 
Jeff Reid said:
However, you've got a problem: 8147.31 takes more than 16 bits to represent to the nearest 1/100th.

Sorry, I didn't get you.

The notation is:

sign / exponent in excess 2^(e-1) / mantissa normalized between 1 and 2
1 bit / e bits / (15-e) bits = 16 bits total

Example:
Suppose we find out that e = 6 and we want to represent +1.0101 * 2^3 (its value doesn't matter).
Exponent: since it's in excess 2^(e-1), we store 3 + 32 = 35 (the 32 comes from 2^(e-1) = 2^5 = 32).
With 6 bits, 35 is 100011.
Mantissa: 9 bits (= 15 - e = 15 - 6): 010100000

So ultimately the number in the above notation is
0 100011 010100000 (from left to right: 0 means it's positive, the following 6 bits are the exponent and the remaining 9 are the mantissa)
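
The packing in the example above can be sketched in a few lines of Python. The function name `encode` and its parameters are made up for illustration; the sketch assumes the leading 1 of the mantissa is not stored and that the fraction is simply padded with zeros on the right, as in the example.

```python
def encode(sign, exponent, fraction_bits, e, total=16):
    """Pack sign | excess-2**(e-1) exponent | mantissa fraction into a bit string.

    fraction_bits is the binary string after the leading '1.' of the mantissa.
    """
    bias = 2 ** (e - 1)
    stored = exponent + bias
    if not 0 <= stored < 2 ** e:
        raise ValueError("exponent does not fit in e bits")
    width = total - 1 - e  # bits left for the fraction
    if len(fraction_bits) > width:
        raise ValueError("fraction needs more bits than are available")
    # Exponent is zero-padded to e bits; fraction is zero-padded on the right.
    return f"{sign} {stored:0{e}b} {fraction_bits.ljust(width, '0')}"

print(encode(0, 3, "0101", e=6))  # 0 100011 010100000
```

The spaces in the output are only there to separate the three fields; the stored word is the 16 bits without them.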
 
My point is that it takes 20 bits (or at least more than 19 bits) to represent 814731 x 10^? accurately.
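
A quick way to check this point, sketched in Python: count the bits of the integer 814731, then see what value actually gets stored if only 9 fraction bits are available (the e = 6 case from the example earlier in the thread). The helper `quantize` is hypothetical and uses round-to-nearest, which the exercise does not specify.

```python
import math

def quantize(x, fraction_bits):
    """Round x to the nearest m * 2**E with 1 <= m < 2 and fraction_bits bits in m.

    Returns the rounded value and the spacing between neighbouring values.
    """
    _, exp = math.frexp(abs(x))
    E = exp - 1                         # exponent for a mantissa in [1, 2)
    step = 2.0 ** (E - fraction_bits)   # weight of the last mantissa bit
    return math.copysign(round(abs(x) / step) * step, x), step

print((814731).bit_length())   # 20
print(quantize(-8147.31, 9))   # (-8144.0, 8.0)
```

With 9 fraction bits the representable values near r are 8 apart, so r can be written in the notation but only approximately, which is consistent with the remark that exact hundredths would need about 20 bits.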
 
