IEEE representation

by Maylis
Tags: ieee, representation
PF Gold P: 571 Hello, I wanted to ask something about the IEEE representation of a number, for clarification. For single precision, you have 3 parts: S, Exponent, and Fraction. The S takes 1 bit (1 slot), the Exponent is 8 bits (8 slots), and the Fraction is 23 bits (23 slots). I was watching a video https://www.youtube.com/watch?v=H79PNQ4Z9HE and it helped me clear up how to do this, with one tiny caveat. Say you divide a number by powers of two until it is under two and get 1.35703125. The power of two is ##2^7##, so the exponent is 7, and with a bias of 127 you get 134, which is 10000110 in binary. Now for the fraction part: since the number 1.35703125 is normalized, does that mean that the first '1' is implied, and therefore does not take up one of the 23 slots permitted for the fraction? From the video it seems like that is what was done, but I got a little bit murky on that point. Thanks
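One way to check a hand calculation like this is to let the machine do the packing and then pull the three fields back out. The sketch below assumes Python's standard `struct` module; `float32_fields` is an illustrative helper, not a library function. The input 1.35703125 × 2^7 = 173.7 is only an example, since the post doesn't say which number was originally divided down.

```python
import struct

def float32_fields(x):
    """Illustrative helper: round x to single precision, return (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits; the leading 1 is NOT stored
    return sign, exponent, fraction

s, e, f = float32_fields(1.35703125 * 2**7)
print(s, e, format(e, "08b"))         # 0 134 10000110

# Putting the implied leading 1 back in front of the 23 stored bits
# recovers the single-precision value exactly.
rebuilt = (1 + f / 2**23) * 2.0 ** (e - 127)
print(rebuilt == struct.unpack(">f", struct.pack(">f", 173.7))[0])  # True
```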
Mentor P: 15,196 That's exactly how normalized numbers are represented. When you represent a non-zero number in scientific notation (e.g., Avogadro's number, ##6.0221413\times10^{23}##), the leading digit can be any digit between 1 and 9. In the base 2 equivalent of scientific notation, the leading digit of a non-zero number can only be 1. So why store that leading digit? Not storing it means you get an extra binary digit of precision at no cost.
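The "extra binary digit at no cost" can be seen directly. This is a small sketch, again assuming Python's `struct` module for rounding to single precision (`to_float32` is an illustrative helper). 1.0 is stored with an all-zero fraction field, yet single precision still resolves steps of ##2^{-23}## just above 1, which means 24 significant bits are in play even though only 23 are stored.

```python
import struct

def to_float32(x):
    """Illustrative helper: round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 1.0 is stored with a fraction field of all zeros: only the exponent says "1".
(bits,) = struct.unpack(">I", struct.pack(">f", 1.0))
print(hex(bits))                               # 0x3f800000

# 23 stored bits + 1 implied bit = 24 significant bits, so the step above 1.0 is 2**-23 ...
print(to_float32(1 + 2**-23) == 1 + 2**-23)    # True
# ... and half that step is too fine to keep: it rounds back to 1.0.
print(to_float32(1 + 2**-24) == 1.0)           # True
```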
 PF Gold P: 571 Okay, now in the case of non-normalized numbers, does the 0 take up one of the 23 slots, or is it also implied and follows the same as a normalized number?
Mentor P: 15,196 It's an implied leading zero for the denormalized numbers rather than one. There's another special rule for the denormalized numbers: an exponent field of zero means a factor of ##2^{-126}## rather than the ##2^{-127}## that an exponent of zero would imply using the bias 127 notation.
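Both denormal rules can be checked against the hardware's own reading of a bit pattern. The sketch below assumes Python's `struct` module and picks the pattern 0x00400000 (exponent field 0, only the top fraction bit set) purely as an example; `float32_from_bits` is an illustrative helper.

```python
import struct

def float32_from_bits(bits):
    """Illustrative helper: interpret a 32-bit integer as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Exponent field 0, fraction field 0b100...0 (only the top fraction bit set).
value = float32_from_bits(0x00400000)

# Denormal rule: implied leading 0 and a scale of 2**-126.
print(value == 0.5 * 2.0**-126)   # True
# Naively applying the bias (implied 1, scale 2**-127) gives a different number.
print(value == 1.5 * 2.0**-127)   # False
```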
PF Gold P: 571 And I would assume the same goes for double, where the effective bias for denormalized numbers is 1022 instead of 1023.
 Mentor P: 15,196 Exactly.
PF Gold P: 571 Brilliant. That raises the question: why would one wish to represent a number either normalized or denormalized? Why are there two different ways, and are there certain numbers that can only be represented one way or the other? Outliers such as Inf or NaN come to mind here. Why would the idea to 'normalize' a number come about? And what is so normal about it??
Mentor P: 15,196 Numbers between ##2^{-126}## (about ##1.175\times10^{-38}##) and ##2^{128}-2^{104}## (about ##3.4\times10^{38}##) are represented as normalized numbers. If the format treated exponent bits = 0 as it did everything else (i.e., no denormalized numbers), the smallest representable non-zero number would be ##2^{-127}##. Adding the concept of denormalized numbers extends that lower range down to ##2^{-149}##, but at the expense of a loss of a bit of precision for numbers between ##2^{-127}## and ##2^{-126}##. Regarding infinity and NaNs: The ability to represent those is a "feature" that I always turn off. My experience is that infinities and NaNs almost always represent a bug in the underlying code. I want the program to blow up the instant one of those beasts appears. That gives a nice handle for chasing down the bug. Let them persist and you'll have a much harder time finding the bug, because those infinities and NaNs poison every calculation in which they appear.
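The three single-precision landmarks quoted above can be read straight off their bit patterns. A sketch assuming Python's `struct` module (and that Python's float is an IEEE double, so all three values are exact in it); `float32_from_bits` is an illustrative helper.

```python
import struct

def float32_from_bits(bits):
    """Illustrative helper: interpret a 32-bit integer as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

largest = float32_from_bits(0x7F7FFFFF)            # exponent field 254, fraction all ones
smallest_normal = float32_from_bits(0x00800000)    # exponent field 1, fraction 0
smallest_denormal = float32_from_bits(0x00000001)  # exponent field 0, only the LSB set

print(largest == 2.0**128 - 2.0**104)   # True
print(smallest_normal == 2.0**-126)     # True
print(smallest_denormal == 2.0**-149)   # True
```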
Engineering HW Helper P: 7,274
 Quote by Maylis Why would the idea to 'normalize' a number come about? And what is so normal about it??
In most numerical computing, you are using a finite-precision floating point representation (like IEEE) as an approximate model of the mathematical real numbers.

Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.

The fact that the denormalized number has lower precision is irrelevant, because subtraction of any two nearly-equal floating point numbers will lose precision, even if the result can be normalized.
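The loss of precision from subtracting nearly equal numbers shows up even when every value stays normalized. A minimal sketch in double precision (Python's float on mainstream platforms, which is an assumption about the platform): the true difference is ##10^{-15}##, but ##1+10^{-15}## first has to be rounded to a multiple of ##2^{-52}##.

```python
# 1 + 1e-15 is rounded to the nearest double (a multiple of 2**-52 near 1),
# so the subtraction returns that rounding error rather than 1e-15.
x = (1.0 + 1e-15) - 1.0
print(x == 5 * 2.0**-52)      # True
print(x)                      # 1.1102230246251565e-15, an 11% error
```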
PF Gold P: 571 How do you remember what the limits are for the normalized and denormalized values for both single and double precision? For example, normalized single precision has a bias of 127, and normalized double precision has a bias of 1023. Denormalized single precision has a bias of 126, and denormalized double precision has a bias of 1022. It appears you take the bias and add one for its upper bound, then subtract one and multiply by negative 1 for its lower bound. Is the range for normalized single precision ##2^{-126}## to ##2^{128}##? Normalized double precision would be ##2^{-1022}## to ##2^{1024}##. Now that seems to fall apart for denormalized numbers. Apparently their range is brought down to ##2^{-149}##. Is that just a fact to remember? Where does the factor of 149 come in? Is there an increase in the upper bound? I am trying to keep all of this information straight, because you have to remember both normalized and denormalized, single and double precision. My lecture notes don't suffice or have good information, and I haven't been able to find any concise information on the web.
Engineering Sci Advisor HW Helper P: 7,274 Why do you need to remember the exact values? Humans invented writing so they didn't have to remember everything. I just remember that IEEE single precision is about 6 or 7 decimal digits with exponents up to about ##10^{\pm 38}##, and double precision is about 16 decimal digits with exponents up to about ##10^{\pm 300}## (actually it's a bit more than 300, but I can never remember exactly how much more).
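In the spirit of looking things up rather than memorizing them: Python's `sys.float_info` reports the double-precision limits directly. This assumes Python's float is an IEEE double, which holds on essentially all mainstream platforms.

```python
import sys

info = sys.float_info
print(info.mant_dig)       # 53    significant bits, counting the implied one
print(info.dig)            # 15    decimal digits a float can always hold faithfully
print(info.max)            # 1.7976931348623157e+308
print(info.max_10_exp)     # 308   ("a bit more than 300")
print(info.min_10_exp)     # -307  (smallest normalized value is about 2.2e-308)
```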
Mentor P: 15,196
 Quote by AlephZero Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.
The sole purpose of denormalization is to extend the range of numbers that are representable by the floating point standard. That's it. Denormalization certainly does not help with the problem you mentioned. That problem is inherent in using a fixed width representation to represent the reals. It's the reason behind having a concept of "machine epsilon", the gap between 1.0 and the next larger representable number: add half of that or less to 1.0 and you get 1.0 back, so (1.0+δ)-1.0 == 0.0. There are a number of properties of the reals that don't hold with the IEEE floating point representation. Most importantly, associativity is gone. You can no longer trust that (a+b)+c is equal to a+(b+c).
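Both points are easy to demonstrate in double precision. A sketch assuming Python's float is an IEEE double and the default round-to-nearest-even mode:

```python
import sys

eps = sys.float_info.epsilon          # gap between 1.0 and the next larger double
print(eps == 2.0**-52)                # True
print(1.0 + eps > 1.0)                # True
print(1.0 + eps / 2 == 1.0)           # True: the halfway case rounds back to 1.0

# Associativity does not survive rounding:
print((0.1 + 0.2) + 0.3)              # 0.6000000000000001
print(0.1 + (0.2 + 0.3))              # 0.6
```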

 Quote by Maylis How do you remember what the limits are for the normalized and denormalized values for both single and double precision?
In a real world setting? You don't. You look them up. You should know those concepts exist, but knowing the specific values is asking too much of our lousy human memory. I would assume you're in a college setting. Understand the concepts inside and out, remember that single precision is 32 bits wide and double is 64, and then remember the exponent biases for each format. That will tell you how big the exponent field is, which will in turn tell you how big the mantissa is.

 Now that seems to fall apart for denormalized numbers. Apparently its range is brought down to ##2^{-149}##. Is that just a fact to remember? Where does the factor of 149 come in?
It's easy. There are three additional concepts to remember for the denormalized numbers, and they all make sense.
1. The exponent bits for a denormalized number are all zero.
2. The implied leading binary digit is zero for the denormalized numbers rather than one.
3. The scale factor is the same as that for an exponent field of 1. For single precision, the factor is ##2^{-126}## rather than the ##2^{-127}## that would apply if you used the bias concept. For double precision, it is ##2^{-1022}## rather than ##2^{-1023}##.
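The three rules fit in a short decoder that works for either format once you give it the field widths. This is a sketch for finite values only, with `decode` as an illustrative name; it cross-checks itself against Python's `struct` module, which hands the same bits to the machine's own floating-point interpretation.

```python
import struct

def decode(bits, exp_bits, frac_bits):
    """Illustrative decoder for an IEEE binary format (finite values only)."""
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if bits >> (exp_bits + frac_bits) else 1.0
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent == (1 << exp_bits) - 1:
        raise ValueError("infinity or NaN")
    if exponent == 0:
        # Rules 1-3: exponent bits all zero, implied leading 0,
        # and the same scale factor as an exponent field of 1.
        return sign * (fraction / 2**frac_bits) * 2.0 ** (1 - bias)
    # Normalized: implied leading 1, scale 2**(exponent - bias).
    return sign * (1 + fraction / 2**frac_bits) * 2.0 ** (exponent - bias)

# Single precision: smallest denormal, a mid denormal, smallest normal, 1.0, -pi.
for bits in (0x00000001, 0x00400000, 0x00800000, 0x3F800000, 0xC0490FDB):
    assert decode(bits, 8, 23) == struct.unpack(">f", struct.pack(">I", bits))[0]
# Double precision: smallest denormal and 1.0.
for bits in (0x0000000000000001, 0x3FF0000000000000):
    assert decode(bits, 11, 52) == struct.unpack(">d", struct.pack(">Q", bits))[0]
```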

So, just knowing the above concepts, here's how to calculate the smallest representable single precision number. The offset for single precision IEEE format is 127, or ##2^7-1##. Seven bits are needed to represent this number. The exponent field uses one more bit than this, so the exponent takes up eight bits. The sign takes up one more, leaving 32-9=23 bits for the mantissa. The smallest representable positive number has all bits zero except for the LSB. That LSB represents ##2^{-\text{mantissa length}}##, or ##2^{-23}##. The exponent factor is ##2^{1-\text{bias}}##, or ##2^{-126}##. Multiply ##2^{-126}## and ##2^{-23}## and you get ##2^{-149}##.

Doing the same with the double precision format, the offset is 1023, or ##2^{10}-1##, so that means an eleven-bit exponent. The mantissa takes up 64-(11+1)=52 bits. The smallest representable positive number in double precision format is therefore ##2^{-1022}\times2^{-52}=2^{-1074}##.
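The recipe above (bias → exponent width → mantissa width → smallest denormal) is mechanical enough to write down. A sketch with an illustrative helper `layout`, assuming the bias always has the form ##2^{k-1}-1## for a ##k##-bit exponent; the result is cross-checked against the machine's reading of the bit pattern with only the LSB set, via Python's `struct` module.

```python
import struct

def layout(total_bits, bias):
    """Illustrative helper: derive field widths from total width and bias (bias = 2**(k-1) - 1)."""
    exp_bits = bias.bit_length() + 1        # 127 needs 7 bits -> 8-bit exponent
    frac_bits = total_bits - exp_bits - 1   # what is left after sign and exponent
    smallest = 2.0 ** (1 - bias) * 2.0 ** -frac_bits   # LSB of a denormal
    return exp_bits, frac_bits, smallest

print(layout(32, 127)[:2])   # (8, 23)
print(layout(64, 1023))      # (11, 52, 5e-324)

# Cross-check: the bit pattern with only the LSB set, read back by the machine.
assert layout(32, 127)[2] == struct.unpack(">f", struct.pack(">I", 1))[0]
assert layout(64, 1023)[2] == struct.unpack(">d", struct.pack(">Q", 1))[0]
```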
Engineering HW Helper P: 7,274
 Quote by AlephZero Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.
 Quote by D H Denormalization certainly does not help with the problem you mentioned. That problem is inherent in using a fixed width representation to represent the reals.
OK, the wording of my post was ambiguous. What I meant was "if a and b are numbers represented by different floating-point bit patterns, then a-b should never be calculated as zero". The point I was trying to make had nothing to do with approximating real values with finite computer arithmetic.

If you don't allow denormalized numbers, you can't store the difference between any two normalized numbers when both have the minimum exponent. That would mean the concept of "machine epsilon" loses some of its nice properties.
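A sketch of what goes wrong without denormals, in double precision (assuming Python's float is an IEEE double). `flush_to_zero` is a hypothetical helper that only imitates hardware with no denormals by zeroing anything below the smallest normalized value; it is not how any particular machine or library does it.

```python
import sys

TINY = sys.float_info.min          # 2**-1022, smallest normalized double

def flush_to_zero(x):
    """Hypothetical helper: mimic arithmetic with no denormals by zeroing them."""
    return 0.0 if abs(x) < TINY else x

a, b = 1.5 * TINY, 1.25 * TINY     # both normalized, both with the minimum exponent
print(a != b)                      # True
print(a - b != 0.0)                # True: with denormals the difference survives
print(flush_to_zero(a - b))        # 0.0: without them, a != b yet a - b == 0
```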
PF Gold P: 571 Thanks both AZ and DH, great information. I was wondering, why is the machine epsilon not equal to the smallest representable number? My intuition is telling me that the difference between a representable number and its next closest representable number should be the smallest representable number? Some commands of interest:
EDU>> eps(1)
ans = 2.2204e-16
EDU>> 2^-1074
ans = 4.9407e-324
EDU>> 2^-1075
ans = 0
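The answer is that the gap depends on where you stand. Python's `math.ulp` (available from Python 3.9) plays roughly the role of MATLAB's `eps(x)` here: it returns the spacing of doubles at `x`. Near 1 the spacing is ##2^{-52}##; near 0 it is the smallest denormal, ##2^{-1074}##.

```python
import math

print(math.ulp(1.0))                 # 2.220446049250313e-16, the same value as eps(1)
print(math.ulp(0.0))                 # 5e-324, the smallest denormal
print(math.ulp(1.0) == 2.0**-52)     # True
print(math.ulp(0.0) == 2.0**-1074)   # True
```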
PF Gold P: 571 This is an old exam with questions about IEEE representation. What do they mean by eps(1), at 1?? Is there such a thing as eps(1) at 2, 3, ...? For the 2nd question, I'm not sure if it's correct. For 3, thankfully this thread helped me know that. And similarly for 4. For 5, I wonder if what I got is correct? Of course I have to be able to explain all of them as well.
Mentor P: 15,196 Based on your last two unanswered questions, you still appear to be a bit confused. Perhaps it might be easier to forget about base 2 for a bit and look to base 10 instead.

Suppose you want to represent positive real numbers using the form ##0.ddd\times10^{\pm ee}##. This scheme provides the ability to represent numbers between ##0.001\times10^{-99}## and ##0.999\times10^{99}##. There are three ways to represent 1 in this scheme: ##0.100\times10^{1}##, ##0.010\times10^{2}##, and ##0.001\times10^{3}##. The first is the normalized representation. In this scheme, all numbers between ##0.1\times10^{-99}## and ##0.999\times10^{99}## are represented normalized. The denormalized representation, where the leading digit is zero, is reserved for numbers smaller than ##0.1\times10^{-99}##.

What about zero? That's simple. That's ##0.000\times10^{-99}##. It's the smallest denormalized number. The next representable number after zero is ##0.001\times10^{-99}##. The difference between this and zero is of course ##0.001\times10^{-99}##. The same difference holds between consecutive representable numbers all the way up to the step from ##0.999\times10^{-99}## to ##0.100\times10^{-98}##. The next step up, however, is ##0.101\times10^{-98}##. The difference between ##0.101\times10^{-98}## and ##0.100\times10^{-98}## is ##0.001\times10^{-98}##. At the very top of the scale, you're looking at the difference between ##0.999\times10^{99}## and ##0.998\times10^{99}##, or ##0.001\times10^{99}##. The difference between successive representable numbers depends very much on the magnitude of the number in question.

The exact same concept holds for the binary IEEE floating point representations. The difference between the smallest representable number larger than one and one itself is "machine epsilon". The difference between the smallest representable number larger than two and two itself is twice this machine epsilon, and so on. The delta between successive representable numbers is very small when the number in question is small, but considerably larger when the number in question is large.
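The binary version of this picture can be checked with Python's `math.ulp` (Python 3.9+, assuming IEEE doubles): the gap doubles at each power of two, stays constant within a power-of-two range, and bottoms out at a fixed value once numbers become denormalized, just like the constant ##0.001\times10^{-99}## spacing near zero in the base-10 example.

```python
import math

# Gap to the next double at 1, 2, 4, 8, 16, in units of the gap at 1 (machine epsilon).
print([math.ulp(2.0**k) / math.ulp(1.0) for k in range(5)])   # [1.0, 2.0, 4.0, 8.0, 16.0]
# Within one power-of-two range the gap is constant: 1.0 and 1.9 share it.
print(math.ulp(1.9) == math.ulp(1.0))                          # True
# Among the denormals the gap is the same as the gap at zero.
print(math.ulp(2.0**-1070) == math.ulp(0.0))                   # True
```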
PF Gold P: 571 Were F.1 through F.3 at least correct? In F.2 they ask for the smallest non-representable positive integer. I just guessed that it is one since that is the smallest positive integer, but why is 1 not representable? I still don't understand what they mean by eps(1) at 1. What would eps(1) at 2 look like?? Are there ##2^{53}## representable numbers in the domain ##[2^{52},2^{53})##? I think the problem is that I don't know what I don't know.
Mentor P: 15,196 You got everything right except for question F.2. You shouldn't get full credit for those answers you did get correct, because you only gave an explanation for question F.3. Regarding question F.2: One is a representable number. One is ##2^0##, so the sign bit is 0, the biased exponent is 1023 (0x3ff), and the mantissa is zero, so the IEEE64 representation of one is 0x3ff0000000000000. Hint: ##1+2^{100}## is an integer, yet it can't be represented exactly in the IEEE64 floating point format. Why not? (BTW, this is not the answer to the question.) That's not the smallest positive integer that cannot be represented exactly in the IEEE64 floating point format. There are smaller integers that can't be represented in that format.
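Both facts in this post can be checked quickly, assuming Python's float is an IEEE double and its integers are exact and unbounded (true for CPython). The sketch shows the bit pattern of 1.0 and what happens to ##1+2^{100}## on conversion; it deliberately does not answer F.2.

```python
import struct

# The bit pattern of 1.0: sign 0, biased exponent 1023 (0x3ff), fraction 0.
print(struct.pack(">d", 1.0).hex())          # 3ff0000000000000

# 1 + 2**100 is an exact Python integer, but it needs 101 significant bits;
# converting it to a double keeps only 53 of them, so the trailing 1 is lost.
n = 1 + 2**100
print(float(n) == float(2**100))             # True
print(int(float(n)) == n)                    # False
```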
