Understanding IEEE Representation for Single Precision

In summary, normalized single-precision numbers range from ##2^{-126}## up to just under ##2^{128}## (the largest is ##2^{128}-2^{104}##), while denormalized numbers range from ##2^{-149}## up to ##2^{-126}##.
  • #1
gfd43tg
Hello,

For the IEEE representation of a number, I wanted to ask something for clarification. For single precision, you have 3 parts: S, Exponent, and Fraction.

The S takes 1 bit (1 slot)
Exponent is 8 bits (8 slots)
Fraction is 23 bits (23 slots).

I was watching a video, and it helped me clear up how to do this, with one tiny caveat. After you divide a number down until it is under two, say 1.35703125, your exponent is ##2^7##, so with a bias of 127 you get 134, which is 10000110 in binary. Now for the fraction part: since the number 1.35703125 is normalized, does that mean that the first '1' (the one before the binary point) is implied, and therefore does not take up one of the 23 slots permitted for the fraction? From the video it seems like that is what was done, but I got a little bit murky on that point.
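For what it's worth, the field layout is easy to inspect directly. Here is a minimal Python sketch (the thread's own snippets are MATLAB, but Python's standard struct module exposes the raw bits; the helper name is mine, not a library API):

```python
import struct

def float32_fields(x):
    """Unpack a value into IEEE single-precision S / Exponent / Fraction fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits; the leading 1 is implied, not stored
    return sign, exponent, fraction

# 1.35703125 * 2^7 = 173.7, so the biased exponent field should be 7 + 127 = 134
sign, exponent, fraction = float32_fields(173.7)
print(sign, exponent, bin(fraction))
```

The exponent field comes out as 134 (10000110 in binary), matching the worked example, and the fraction field holds only the digits after the implied leading 1.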

Thanks
 
  • #2
That's exactly how normalized numbers are represented.

When you represent a non-zero number in scientific notation (e.g., Avogadro's number, ##6.0221413\times10^{23}##), the leading digit can be any digit between 1 and 9. In the base-2 equivalent of scientific notation, the leading digit of a non-zero number can only be 1. So why store that leading digit? Not storing it means you get an extra binary digit of precision at no cost.
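That extra digit can be seen by round-tripping values through the 32-bit format: with 23 stored bits plus the implied leading 1, the significand is effectively 24 bits wide. A quick Python check (struct-based; the helper name is illustrative):

```python
import struct

def to_f32(x):
    """Round a value to the nearest IEEE single-precision value and back."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# 23 stored fraction bits + 1 implied bit = 24 significand bits,
# so 2^-23 fits next to 1.0 but 2^-24 gets rounded away.
print(to_f32(1 + 2**-23) > 1.0)
print(to_f32(1 + 2**-24) == 1.0)
```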
 
  • #3
Okay, now in the case of non-normalized numbers, does the 0 take up one of the 23 slots, or is it also implied and follows the same as a normalized number?
 
  • #4
It's an implied leading zero for the denormalized numbers rather than one. There's another special rule for the denormalized numbers: an exponent field of zero means a factor of ##2^{-126}## rather than the ##2^{-127}## that an exponent of zero would imply using the bias-127 notation.
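Both rules can be confirmed by decoding raw bit patterns. A small Python sketch (single precision; the decoder helper is mine, not a library API):

```python
import struct

def decode_f32(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack('>f', struct.pack('>I', bits))[0]

# Exponent field all zeros, fraction = 1 (just the LSB):
# value = 0.fraction * 2^-126 = 2^-23 * 2^-126 = 2^-149
smallest_denormal = decode_f32(0x00000001)

# Exponent field = 1, fraction = 0: the smallest *normalized* value, 1.0 * 2^-126
smallest_normal = decode_f32(0x00800000)

print(smallest_denormal == 2**-149, smallest_normal == 2**-126)
```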
 
  • #5
And I would assume the same goes for double precision, where the factor is ##2^{-1022}## instead of ##2^{-1023}##
 
  • #7
Brilliant. That begs the question: why would one wish to represent a number as either normalized or denormalized? Why are there two different ways, and are there certain numbers that can only be represented one way or the other? Outliers such as Inf or NaN come to mind here.

Why would the idea to 'normalize' a number come about? And what is so normal about it??
 
  • #8
Numbers between ##2^{-126}## (about ##1.175\times10^{-38}##) and ##2^{128}-2^{104}## (about ##3.4\times10^{38}##) are represented as normalized numbers. If the format treated exponent bits = 0 as it does every other exponent (i.e., no denormalized numbers), the smallest representable non-zero number would be ##2^{-127}##. Adding the concept of denormalized numbers extends that lower range down to ##2^{-149}##, but at the expense of a loss of a bit of precision for numbers between ##2^{-127}## and ##2^{-126}##.
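The upper bound can be confirmed from the extreme bit pattern. A Python sketch (the helper name is mine):

```python
import struct

def bits_to_f32(bits):
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack('>f', struct.pack('>I', bits))[0]

# Largest finite single: sign 0, exponent field 0xFE (254), all 23 fraction bits set
largest = bits_to_f32(0x7F7FFFFF)

# (2 - 2^-23) * 2^127 = 2^128 - 2^104, about 3.4e38
print(largest == 2.0**128 - 2.0**104)
```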

Regarding infinity and NaNs: The ability to represent those is a "feature" that I always turn off. My experience is that infinities and NaNs almost always represent a bug in the underlying code. I want the program to blow up the instant one of those beasts appear. That gives a nice handle for chasing down the bug. Let them persist and you'll have a much harder time finding the bug because those infinities and NaNs poison every calculation in which they appear.
 
  • #9
Maylis said:
Why would the idea to 'normalize' a number come about? And what is so normal about it??

In most numerical computing, you are using a finite-precision floating point representation (like IEEE) as an approximate model of the mathematical real numbers.

Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.

The fact that the denormalized number has lower precision is irrelevant, because subtraction of any two nearly-equal floating point numbers will lose precision, even if the result can be normalized.
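In Python, which uses IEEE doubles, the property is easy to demonstrate: take the smallest normalized double and its nearest representable neighbor above, and their difference is the smallest denormal rather than zero. A sketch:

```python
# Python floats are IEEE doubles. Smallest normalized double:
a = 2.0**-1022
# Its nearest representable neighbor above (one ulp away):
b = 2.0**-1022 * (1 + 2**-52)

# With gradual underflow (denormals), the difference is the smallest
# denormal, 2^-1074, rather than flushing to zero.
diff = b - a
print(diff == 2.0**-1074, diff != 0.0)
```

Without denormalized numbers, `b - a` would have to be flushed to zero, violating the property that distinct representable values have a non-zero difference.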
 
  • #10
How do you remember what the limits are for the normalized and denormalized values for both single and double precision?

For example, normalized single precision has a bias of 127, and normalized double precision has a bias of 1023.

denormalized single precision has a bias of 126, denormalized double precision has a bias of 1022.

It appears you take the bias and add one for the exponent of the upper bound, then take the negative of (bias minus one) for the exponent of the lower bound.

Is the range for normalized single precision ##2^{-126}## to ##2^{128}##? Normalized double precision would be ##2^{-1022}## to ##2^{1024}##.

Now that seems to fall apart for denormalized numbers. Apparently the range is brought down to ##2^{-149}##. Is that just a fact to remember? Where does the 149 come from? Is there an increase in the upper bound?

I am trying to keep all of this information straight, because you have to remember both normalized and denormalized, single and double precision. My lecture notes don't suffice or have good information, and I haven't been able to find any concise information on the web.
 
  • #11
Why do you need to remember the exact values? Humans invented writing so they didn't have to remember everything :smile:

I just remember that IEEE single precision is about 6 or 7 decimal digits with exponents up to about ##10^{\pm 38}##, and double precision is about 16 decimal digits and exponents up to about ##10^{\pm 300}## - actually it's a bit more than 300, but I can never remember exactly how much more.
 
  • #12
AlephZero said:
Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.
The sole purpose of denormalization is to extend the range of numbers that are representable by the floating point standard. That's it. Denormalization certainly does not help with the problem you mentioned. That problem is inherent to using a fixed-width representation to represent the reals. It's the reason behind having a concept of "machine epsilon", the largest positive number ε such that (1.0+ε)-1.0 == 0.0. There are a number of properties of the reals that don't hold with the IEEE floating point representation. Most importantly, associativity is gone. You can no longer trust that (a+b)+c is equal to a+(b+c).
Maylis said:
How do you remember what the limits are for the normalized and denormalized values for both single and double precision?
In a real world setting? You don't. You look them up. You should know those concepts exist, but knowing the specific values is asking too much from our lousy human memory. I would assume you're in a college setting. Understand the concepts inside and out, remember that single precision is 32 bits wide and double is 64, and remember the exponent biases for each format. That will tell you how big the exponent field is, which will in turn tell you how big the mantissa is.

Now that seems to fall apart for denormalized numbers. Apparently its range is brought down to ##2^{-149}##. Is that just a fact to remember? Where does the factor of 149 come in?
It's easy. There are three easy additional concepts to remember for the denormalized numbers, and they all make sense.
  1. The exponent bits for a denormalized number are all zero.
  2. The implied leading binary digit is zero for the denormalized numbers rather than one.
  3. The exponent factor is the same as that for an exponent field of 1. For single precision, the factor is ##2^{-126}## rather than the ##2^{-127}## that would apply if you used the bias concept. For double precision, it is ##2^{-1022}## rather than ##2^{-1023}##.

So, just knowing the above concepts, here's how to calculate the smallest representable single precision number. The offset for the single precision IEEE format is 127, or ##2^7-1##. Seven bits are needed to represent this number. The exponent field uses one more bit than this, so the exponent takes up eight bits. The sign takes up one more, leaving 32-9=23 bits for the mantissa. The smallest representable number is all bits zero except for the LSB. That LSB represents ##2^{-\text{mantissa length}}##, or ##2^{-23}##. The exponent factor is ##2^{1-\text{bias}}##, or ##2^{-126}##. Multiply ##2^{-126}## and ##2^{-23}## and you get ##2^{-149}##.

Doing the same with the double precision format, the offset is 1023, or ##2^{10}-1##, so that means an eleven bit exponent. The mantissa takes up 64-(11+1)=52 bits. The smallest representable number in double precision format is therefore ##2^{-1022}\times2^{-52}=2^{-1074}##.
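The two derivations can be folded into one small Python helper (hypothetical, for illustration only) that starts from nothing but the bias:

```python
def smallest_denormal(bias):
    """Derive the smallest denormal from the exponent bias alone,
    following the reasoning above (illustrative helper, not a library API)."""
    exponent_bits = (bias + 1).bit_length()         # bias = 2^(k-1) - 1 -> k exponent bits
    total_bits = 32 if exponent_bits == 8 else 64   # single vs double format
    mantissa_bits = total_bits - exponent_bits - 1  # minus the sign bit
    # LSB of the mantissa (2^-mantissa_bits) times the minimum exponent factor 2^(1 - bias)
    return 2.0**(1 - bias - mantissa_bits)

print(smallest_denormal(127))   # single precision: 2^-149
print(smallest_denormal(1023))  # double precision: 2^-1074
```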
 
  • #13
AlephZero said:
Denormalization is necessary to preserve an important property of this number model: if a and b are two different numbers (whether normalized or not), then a-b should never be calculated as zero.

D H said:
Denormalization certainly does not help with the problem you mentioned. That problem is inherent to using a fixed-width representation to represent the reals.

OK, the wording of my post was ambiguous - what I meant was "if a and b are numbers represented by different floating-point bit patterns, then a-b should never be calculated as zero". The point I was trying to make had nothing to do with approximating real values with finite computer arithmetic.

If you don't allow denormalized numbers, you can't store the difference between any two normalized numbers when both have the minimum exponent. That would mean the concept of "machine epsilon" loses some of its nice properties.
 
  • #14
Thanks both AZ and DH, great information.

I was wondering, why is the machine epsilon not equal to the smallest representable number? My intuition tells me that the difference between a representable number and its next closest representable number should be the smallest representable number.

Some commands of interest.
Code:
EDU>> eps(1)

ans =

   2.2204e-16

EDU>> 2^-1074

ans =

  4.9407e-324

EDU>> 2^-1075

ans =

     0
 
  • #15
[Attached image: an old exam with questions F.1-F.5 on IEEE representation]

This is an old exam with questions about IEEE representation. What do they mean by eps(1), at 1? Is there such a thing as eps at 2, 3, ...?

For the 2nd question, I'm not sure if it's correct.

For 3, thankfully this thread helped me know that.

And similarly for 4, and for 5 I wonder if what I got is correct? Of course I have to be able to explain all of them as well..
 
  • #16
Based on your last two unanswered questions, you still appear to be a bit confused.

Perhaps it might be easier to forget about base 2 for a bit and look to base 10 instead. Suppose you want to represent positive real numbers using the form ##0.ddd\times10^{\pm ee}##. This scheme provides the ability to represent numbers between ##0.001\times10^{-99}## and ##0.999\times10^{99}##. There are three ways to represent 1 in this scheme: ##0.100\times10^{1}##, ##0.010\times10^{2}##, and ##0.001\times10^{3}##. The first is the normalized representation. In this scheme, all numbers between ##0.100\times10^{-99}## and ##0.999\times10^{99}## are represented normalized. The denormalized representation, where the leading digit is zero, is reserved for numbers smaller than ##0.100\times10^{-99}##.

What about zero? That's simple. That's ##0.000\times10^{-99}##. It's the smallest denormalized number.

The next representable number after zero is ##0.001\times10^{-99}##. The difference between this and zero is of course ##0.001\times10^{-99}##. The same holds for the difference between consecutive representable numbers up to ##0.999\times10^{-99}## and ##0.100\times10^{-98}##. The next step up, however, is ##0.101\times10^{-98}##. The difference between ##0.101\times10^{-98}## and ##0.100\times10^{-98}## is ##0.001\times10^{-98}##. At the very top of the scale, you're looking at the difference between ##0.999\times10^{99}## and ##0.998\times10^{99}##, or ##0.001\times10^{99}##. The difference between successive representable numbers depends very much on the magnitude of the number in question.

The exact same concept holds for the binary IEEE floating point representations. The difference between the smallest representable number larger than one and one itself is "machine epsilon". The difference between the smallest representable number larger than two and two itself is twice machine epsilon, and so on. The delta between successive representable numbers is very small when the number in question is small, but is rather large when the number in question is large.
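Python's math.ulp (available from Python 3.9) reports exactly this gap for IEEE doubles, so the magnitude-dependent spacing is directly visible:

```python
import math  # math.ulp requires Python 3.9+

# Gap between a number and the next representable double above it
print(math.ulp(1.0))      # machine epsilon, 2^-52
print(math.ulp(2.0))      # twice machine epsilon, 2^-51
print(math.ulp(0.5))      # half machine epsilon, 2^-53
print(math.ulp(2.0**52))  # 1.0: consecutive doubles here are whole numbers
```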
 
  • #17
Was F.1 through F.3 at least correct? In F.2 they ask for the smallest non-representable positive integer. I just guessed that it is one, since that is the smallest positive integer, but why is 1 not representable?

I still don't understand what they mean by eps(1) at 1. What would eps at 2 look like?

Are there 2^53 representable numbers in the domain [2^52,2^53)?

I think the problem is that I don't know what I don't know
 
  • #18
You got everything right except for question F.2. You shouldn't get full credit for those answers you did get correct because you only gave an explanation for question F.3.

Regarding question F.2: One is a representable number. One is ##2^0##, so the sign bit is 0, the biased exponent is 1023, and the mantissa is zero, so the IEEE 64 representation of one is 0x3ff0000000000000.

Hint: 1+2^100 is an integer, yet it can't be represented exactly in the IEEE64 floating point format. Why not? (BTW, this is not the answer to the question.)

That's not the smallest positive integer that cannot be represented exactly in the IEEE64 floating point format. There are smaller integers that can't be represented in that format.
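Both the bit pattern of 1.0 and the hint can be checked in Python, whose floats are IEEE doubles:

```python
import struct

# IEEE 64 bit pattern of 1.0: sign 0, biased exponent 1023, mantissa 0
bits = struct.unpack('>Q', struct.pack('>d', 1.0))[0]
print(hex(bits))  # 0x3ff0000000000000

# The hint: 1 + 2^100 needs 101 significant bits, far more than the
# 53 (52 stored + 1 implied) available, so the low-order 1 is rounded away.
print(1.0 + 2.0**100 == 2.0**100)
```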
 
  • #19
Between any two consecutive powers of 2, are there ##2^{52}## representable numbers? I just guessed on that one. Is the number of representable numbers between ##2^3## and ##2^4## also ##2^{52}##? Or how do you determine that?

Code:
log2(eps(1))

ans =

   -52
so the distance between 1 and the next representable number is ##2^{-52}##; now I see that one. I think I discovered a pattern for determining the value of eps(x).

So you just express x as ##2^n##; if n is a whole number, then eps(x) is ##2^{n-52}##. If x is not a power of 2, then you just go back to the nearest lower number that is, so eps(5) = eps(4).

So for eps(4): 4 is ##2^2##, so eps(4) is ##2^{2-52} = 2^{-50}##.

So now, I can say with some confidence I understand F.1, F.3, and F.4. Now where I am stuck is F.2 and F.5.

Edit: I might be able to justify F.5 now, let's give this a whirl

So I know eps(##2^{52}##) is 1, because ##2^{52-52} = 2^0 = 1##. So everything in between there also has eps(x) = 1. That means you can have ##2^{52}, 1+2^{52}, 2+2^{52}, 3+2^{52}, \dots, 2^{53}##. So there are ##2^{52}## numbers between ##2^{52}## and ##2^{53}##.

So I guess what you should do to find the number of representable numbers is subtract the lowest number in the interval from the highest, then divide by eps(lower bound)?

##(2^{53} - 2^{52})/1 = 2^{52}## representable numbers

So to create another example, the number of representable numbers in [4, 8).

eps(4) = ##2^{2-52} = 2^{-50}## spacing between them.

##8-4 = 4##, so ##4/2^{-50} = 2^{52}## representable numbers between 4 and 8
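These guesses can be cross-checked against Python's math.ulp (3.9+), which returns the spacing at a given IEEE double:

```python
import math

# eps(x) = 2^(n-52) where 2^n <= x < 2^(n+1)
print(math.ulp(4.0) == 2**-50)   # eps(4)
print(math.ulp(5.0) == 2**-50)   # eps(5) == eps(4): same binade
print(math.ulp(2.0**52) == 1.0)  # doubles here are one apart

# Representable numbers in [4, 8): interval width / spacing
print((8 - 4) / 2**-50 == 2**52)
```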
 
  • #20
There are ##2^{52}## representable numbers in the interval ##[2^n, 2^{n+1})## if ##2^n## has a normalized representation. For example, there are none in the interval ##[2^{1024}, 2^{1025})## because ##2^{1024}## is out of range.

What about the denormalized numbers? There is only one representable number in the interval ##[2^{-1074}, 2^{-1073})##: ##2^{-1074}## itself. The next largest representable number after ##2^{-1074}## is ##2^{-1073}##. There are two representable numbers in the interval ##[2^{-1073}, 2^{-1072})##, three in ##[2^{-1072}, 2^{-1071})##, and so on, until you get to the interval ##[2^{-1022}, 2^{-1021})##, which contains ##2^{52}## representable numbers. Every power-of-2 interval from that one up to ##[2^{1023}, 2^{1024})## does contain ##2^{52}## representable numbers.
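The denormal intervals can be checked with math.nextafter (Python 3.9+), since denormal doubles are just integer multiples of ##2^{-1074}##:

```python
import math

tiny = 2.0**-1074  # smallest denormal double

# Only one representable number in [2^-1074, 2^-1073):
# the next one up is already 2^-1073.
print(math.nextafter(tiny, 1.0) == 2.0**-1073)

# Two representables in [2^-1073, 2^-1072): 2^-1073 and 3 * 2^-1074
print(math.nextafter(2.0**-1073, 1.0) == 3 * tiny)
```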
 
  • #21
D H said:
There are ##2^{52}## representable numbers in the interval ##[2^n, 2^{n+1})##

Does that mean the relative spacing between all numbers is the same for normalized numbers?
 
  • #22
Of course not. It means the spacing is the same for all representable numbers between ##2^n## and ##2^{n+1}##.
 
  • #23
EDIT: nevermind, I get it now.

I still don't understand the thing about the smallest non-representable integer. Someone said it was ##1+2^{53}##; that is very big to be a smallest anything. How do you do the analysis to determine that?
 
  • #24
Your question F.2 asked for the smallest non-representable integer. That obviously needs to be a biggish number.
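A quick Python check of why the answer has to be big: with 53 significand bits (52 stored plus the implied 1), every positive integer up to ##2^{53}## round-trips through a double exactly, and ##2^{53}+1## is the first that does not, matching the ##1+2^{53}## mentioned above.

```python
# Doubles have 53 significand bits, so any integer with at most
# 53 significant bits is exact.
print(float(2**53) == 2**53)          # exact
print(float(2**53 + 1) == 2**53 + 1)  # rounds back to 2^53, so False
print(float(2**53 + 2) == 2**53 + 2)  # exact again: even, fits in 53 bits
```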
 

