IEEE 754 - Little questions here and there for a problem that has sols

  • Thread starter: s3a
  • Tags: Ieee
AI Thread Summary
The discussion revolves around understanding the IEEE 754 single-precision floating-point representation, particularly focusing on the calculations for exponent and significant bits. Participants clarify the reasoning behind using log calculations to determine the exponent and the significance of choosing specific powers of two for multiplication. There is a debate about the accuracy of the final representation, including the mantissa and rounding conventions, with corrections suggested for both binary and hexadecimal outputs. The conversation emphasizes the importance of precision in representing significant digits and the conventions used in IEEE 754. Overall, the participants collaboratively refine their understanding of the problem and solution, ensuring clarity in the calculations involved.
s3a

Homework Statement


The problem and its solution are attached as "IEEE754_ProblemAndSolution.jpg".

Homework Equations


The 32-bit/single-precision IEEE 754 procedure.

The Attempt at a Solution


1) What is the reasoning behind the log|-8.573 × 10^13| / log(2) computation?

2) When obtaining the exponent 47 in 2^47 from the computation in my question #1, I noticed that a 2^47 / 1.4074 × 10^14 = 1 was multiplied with -8.573 × 10^13.

My question for this part #2 is: why was that 47 “hunted down”? Why couldn't I have multiplied -8.573 × 10^13 by 2^50 / 1.1259 × 10^15 = 1, for example? Is there a significance to this, or could I have chosen to multiply by 2^46 / 7.0369 × 10^13 as well?

3) For the number of significant bits, I get the 14 – 1 part but, I don't understand how the 4/log(2) computation was obtained.

If something is unclear with what I am asking, tell me and I will attempt to clarify the situation.

Any input would be greatly appreciated!
 

Attachments

  • IEEE754_ProblemAndSolution.jpg
Say you wanted to write the number 1234 in scientific notation. Noting that ##10^3 < 1234 < 10^4##, you know it has to be
$$1.234 \times 10^3.$$ If you were to choose a larger exponent, the mantissa would be less than 1, e.g., ##0.1234\times 10^4##. If you used a smaller exponent, the mantissa would be greater than 10, e.g., ##12.34\times 10^2##. You want the largest power of 10 that doesn't exceed 1234. So how do you know 1234 is between ##10^3## and ##10^4##? It's because ##\log_{10} 1234 = 3.091##.

You're doing the same thing here, except you're working with powers of 2. Recall that ##\log_b x = \frac{\log x}{\log b}##.

The first calculation tells you that ##2^{46} < 8.573\times 10^{13} < 2^{47}##. (You really want to use ##2^{46}## instead of the ##2^{47}## that was used in the solution. Note that there had to be a multiplication by 2 at the end to fix the mantissa up.)
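
As a quick check of that bracketing, here is a minimal Python sketch. The variable names are illustrative only, and it assumes the value -8.573 × 10^13 from the problem statement. It uses the change-of-base formula to get log2|x| and takes the floor as the exponent that leaves the significand between 1 and 2.

```python
import math

x = -8.573e13  # value from the problem statement

# Change of base: log2|x| = log|x| / log 2. Its floor is the power of 2
# that leaves the significand in the range [1, 2).
log2_x = math.log(abs(x)) / math.log(2)
exponent = math.floor(log2_x)
significand = abs(x) / 2**exponent

print(exponent)               # 46
print(round(significand, 4))  # 1.2183
```

Choosing 47 instead would give a significand of about 0.6092, which is why the solution needed the extra multiplication by 2.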

The bit about 4/log(2) = 13.28 is a typo (as it's clearly false). It should have said ##\log(10^4)/\log(2) = 13.28##. Why ##10^4##? It's because the 8.573 has 4 significant figures.
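
The bit count can be checked numerically. The sketch below is only an illustration with made-up variable names. It assumes that the 4/log(2) form uses a base-10 logarithm and compares it with the log(10^4)/log(2) form evaluated with natural logs.

```python
import math

decimal_digits = 4  # 8.573 has four significant figures

# Two ways of writing log2(10^4): with base-10 logs and with natural logs.
bits_log10 = decimal_digits / math.log10(2)
bits_ln = math.log(10**decimal_digits) / math.log(2)

# Round up to a whole number of significant bits.
bits_needed = math.ceil(bits_log10)

print(round(bits_log10, 2))  # 13.29
print(bits_needed)           # 14
```

Both expressions agree (about 13.29, quoted as 13.28 in the thread). Rounding up gives 14 significant bits, one of which is the implicit leading 1, hence the "14 – 1" fraction bits.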
 
Hello and, thank you very much! :)

Just to say, 4/log(2) = 13.28 is correct if you assume log is log_10 (as opposed to log being log_2 as was previously assumed in the solution). As you likely know, the ratio of two logarithms is independent of the base of the logarithms as long as the base of the logarithm in the numerator is the same as the base of the logarithm in the denominator.

Also, just to put it in my own words, the procedure used for the part with what should have been 2^46 instead of 2^47 was for getting the exponent in a decimal representation with an exponent with base 2 along with a mantissa of 1 and whichever fractional field whereas the point of the work that yields the approximate value of 13.28 was for knowing how many bits would be needed for keeping the same precision (or, possibly, a little more precision?) using a binary representation (instead of a decimal one), right?
 
s3a said:
Hello and, thank you very much! :)

Just to say, 4/log(2) = 13.28 is correct if you assume log is log_10 (as opposed to log being log_2 as was previously assumed in the solution). As you likely know, the ratio of two logarithms is independent of the base of the logarithms as long as the base of the logarithm in the numerator is the same as the base of the logarithm in the denominator.
Yes, of course, you're right. I was using a natural log here.

Also, just to put it in my own words, the procedure used for the part with what should have been 2^46 instead of 2^47 was for getting the exponent in a decimal representation with an exponent with base 2 along with a mantissa of 1 and whichever fractional field whereas the point of the work that yields the approximate value of 13.28 was for knowing how many bits would be needed for keeping the same precision (or, possibly, a little more precision?) using a binary representation (instead of a decimal one), right?
Yup, you seem to have it down.
 
Sorry, I double posted.
 
Looking at the final answer, did the person who made the solutions make a mistake?

I ask because, to me, it seems that the person took 0.6092 (which was the fractional part in front of the 2^47) and put the part to the right of the radix point (where the mantissa to the left of the radix point is 0) as the fraction field portion of the IEEE 754 number. Also, the decimal version of the value stored in the exponent field is 86 which just seems wrong to me and, I have trouble trying to figure out what kind of mistake the person could have made in order to try and make sense of things. After that, it seems that the hexadecimal number was also incorrectly converted as can be seen here ( http://www.wolframalpha.com/input/?i=convert+11010110100110111111001000000_2+to+hexadecimal ).

Could you please confirm (or deny), for me, that the correct final answer is the following?:
(1 10101101 00110111111010000000000)_2 = D69BF400_16
 
It looks like there's one mistake, but not the ones you're thinking of. There's a missing 0 bit, so the fourth nibble of the mantissa in the solution is 4, but it should be 2.

I think you're just misreading the final answer because the bits are split up kind of strangely. It should be

sign bit (1 bit) = 1
exponent (8 bits) = 1010 1101 = 173 = 46+127
mantissa (23 bits) = 0011 0111 1110 0010 0000 000 = 37E20_16 plus the last 3 bits, all zero.

Concatenating those together and then splitting into nibbles gives

1101 0110 1001 1011 1111 0001 0000 0000

or, in hexadecimal, D69BF100.
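
The concatenation can be reproduced with a few lines of Python. This is just a sketch that packs the three fields listed above into one 32-bit word. The variable names are illustrative, and the fraction bits are copied from this post rather than derived.

```python
sign = 1                                       # negative number
exponent = 46 + 127                            # biased exponent, 173
fraction = int("00110111111000100000000", 2)   # 23-bit fraction field from above

# Layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
word = (sign << 31) | (exponent << 23) | fraction

print(f"{exponent:08b}")  # 10101101
print(f"{word:032b}")     # 11010110100110111111000100000000
print(f"{word:08X}")      # D69BF100
```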
 
It occurred to me that the solution is actually correct as written. If you limit the mantissa to 14 bits and round up the last bit because the following bit would have been a 1, you get the result in the solution.
 
Is it convention to just round up like that in binary?

Regardless, I believe 13 digits after the radix point should be considered, since the 1 before the radix point is implicit (13 + 1 = 14 significant binary digits). Those are the first 13 digits of the fraction field below (bolded in the original post):

1 10101101 00110111111000100000000

It seems to me that including the rounding, in this case, would cause there to be a 15th significant digit to be considered (which is not wanted – as stated in the instruction) since the 14th digit after the radix point is a 0 just like the 13th digit – it's the 15th digit after the radix point that is a 1.

Basically, shouldn't the fraction field be as follows?:
00110111111000000000000
 
  • #10
Yes, I think you're right. I wasn't clear in my previous post, but I was just guessing about the rounding. IEEE 754 does specify rounding conventions, but it's probably for truncating a number to fit into 32 or 64 bits. In any case, it shouldn't matter when all you want is 13 bits.

By the way, I found a converter online: http://babbage.cs.qc.cuny.edu/IEEE-754/index.xhtml
 
  • #11
Edit: If you read my version before I edited it, read my/this post again.

Thanks for the link.

Also, sorry for being pedantic but, to be extra clear, the answer (in both binary and hexadecimal forms) is actually (slightly) wrong and the correct answer (to the exact amount of significant digits requested) is (1 10101101 00110111111000000000000)_2 = D69BF000_16 instead, right?
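
One way to check this is sketched below in Python. It assumes the value -8.573 × 10^13 from the problem and uses the standard `struct` module to get the single-precision bit pattern, then zeroes everything past the first 13 fraction bits. The names `KEEP` and `mask` are made up for the example.

```python
import struct

x = -8.573e13  # value from the problem statement

# Reinterpret the single-precision encoding of x as a 32-bit unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", x))[0]

KEEP = 13  # fraction bits to keep (14 significant bits minus the implicit 1)
mask = ~((1 << (23 - KEEP)) - 1) & 0xFFFFFFFF  # clears the low 10 fraction bits

truncated = bits & mask
print(f"{truncated:032b}")  # 11010110100110111111000000000000
print(f"{truncated:08X}")   # D69BF000
```

This truncates rather than rounds, which matches the reading that only 13 fraction bits are wanted.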
 
  • #12
No, I think you're right that the last one bit should actually be zero since you want to truncate at 13 bits.
 
  • #13
vela said:
No, I think you're right that the last one bit should actually be zero since you want to truncate at 13 bits.
Did you answer my latest update or the one before that?
 
  • #14
The one before that. :smile:
 
  • #15
vela said:
The one before that. :smile:
Okay so, unless I've gotten too sleepy, it seems that you're agreeing with my most up-to-date post. Are you?
 
  • #16
Yes.
 
  • #17
vela said:
Yes.
Yay! Alright, thank you very much for following this all with me. :D
 