C/C++ What Is Fixed Point Notation in C++?

AI Thread Summary
The discussion centers on the use of bit shifting in C++ code, specifically the notation 1 << 8, which initially appears to be a standard bit shift. However, it is also interpreted as a representation of fixed-point notation, where the high-order bits represent the integer part and the low-order bits represent the fractional part of a number. In a 16-bit integer, the high-order 8 bits hold the integer part while the low-order 8 bits hold the fractional part, fixing the binary point between the two bytes. This representation simplifies arithmetic: addition and subtraction require no shifts, while multiplication and division involve specific shifts and rounding. The discussion contrasts fixed point with floating point, noting that while fixed point is easier to work with for certain applications, floating point stores a wider range of values more efficiently. The conversation also touches on practical applications in firmware and hardware that use fixed-point arithmetic.
lawtonfogle
Recently in C++ code I came across the notation 1 << 8. At first I thought it was just a standard bit shift, until an explanation of the code told me it was fixed-point notation, something I had not heard of until then.

I have tried to read up on fixed-point notation, but am still confused about it. As far as I can tell, the code was using a 16-bit integer, where the first 8 bits were used for something completely different than the last 8, and thus 1 was shifted up 8 bits. But I have no clue what value this actually represents or how to work it out from the raw number.
 
Perhaps that is just a way of creating an integer with coded content, something like

('R' << 24) + ('I' << 16) + ('F' << 8) + 'F'

to put 'RIFF' in a 32-bit integer?

(Edit: I am almost sure I am wrong with the character order, but you should get the general idea).
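For what it's worth, here is a minimal sketch of that kind of character packing; the exact character order you want depends on how the value is compared against bytes read from a file (endianness), which is the order caveat above:

#include <cstdint>
#include <iostream>

int main() {
    // Pack 'R', 'I', 'F', 'F' into one 32-bit value, 'R' in the high byte.
    std::uint32_t riff = ('R' << 24) | ('I' << 16) | ('F' << 8) | 'F';

    // Unpack again, high byte first -- prints RIFF.
    for (int shift = 24; shift >= 0; shift -= 8)
        std::cout << static_cast<char>((riff >> shift) & 0xFF);
    std::cout << '\n';
}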
 
1 << 8 is a standard bit shift. The intent of the bit shift varies with the application. It might be a goofy way to represent 256, a way to set a specific element in a bitfield, a way to indicate a process ended with a hang-up (SIGHUP) termination status, ...

or it might be a way to represent one in a fixed binary point notation in which the low-order byte represents the fractional part of some number and the high-order byte represents the integer part of the number.
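As a rough illustration (the variable name is made up for the example), the same 1 << 8 can be read all three ways:

#include <cstdint>
#include <iostream>

int main() {
    constexpr std::uint16_t x = 1 << 8;

    std::cout << x << '\n';                      // as a plain integer: 256
    std::cout << ((x & (1 << 8)) != 0) << '\n';  // as a bitfield: bit 8 is set
    std::cout << x / 256.0 << '\n';              // as 8.8 fixed point: 1.0
}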
 
Normally, one would interpret a 16-bit number as representing powers of two ranging from 15 to 0:

{ 2^15, 2^14, ..., 2^1, 2^0 }

You're free to choose whatever representation you like, though. How about this one?

{ 2^7, 2^6, ..., 2^1, 2^0, 2^-1, ..., 2^-7, 2^-8 }

In this representation, the high order 8 bits represent an integer in the normal way. The low order 8 bits, on the other hand, represent fractions: one-half, one-quarter, one-eighth, and so on.

In this representation, the bit string "0000 0001 0000 0000" represents one. The bit string "0000 0000 0000 0001" represents 1/256.
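A small sketch of that interpretation, assuming the unsigned 8.8 layout just described (the helper name is made up for the example):

#include <cstdint>
#include <iostream>

// Treat an unsigned 16-bit word as 8.8 fixed point: the high byte holds
// 2^7 .. 2^0 and the low byte holds 2^-1 .. 2^-8, so dividing the raw
// bits by 2^8 = 256 recovers the value.
double from_fixed_8_8(std::uint16_t bits) {
    return bits / 256.0;
}

int main() {
    std::cout << from_fixed_8_8(0x0100) << '\n';  // "0000 0001 0000 0000" -> 1
    std::cout << from_fixed_8_8(0x0001) << '\n';  // "0000 0000 0000 0001" -> 0.00390625, i.e. 1/256
}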

This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case. It's easy to understand and to work with.

Other numbering systems, like floating-point, offer much better storage efficiency over a wide range of magnitudes. They permit the "decimal point" to move around, so that very small numbers and very large numbers can be stored with less total space.

- Warren
 
junglebeast said:
The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing.
Except that you'd have to define math operators for those bit fields (or do the math manually), and it would be slower than a fixed-point library. In assembler you could take the double-precision result from a pair of registers and combine it back into a fixed-point return value; in C you could use a double-precision integer type if one were available.
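A minimal sketch of what defining those operators could look like for an unsigned 8.8 type; the type name is made up, and the 32-bit intermediate stands in for the register pair / double-precision integer mentioned above:

#include <cstdint>

struct Fixed8_8 {
    std::uint16_t raw;  // high byte: integer part, low byte: fraction

    friend Fixed8_8 operator+(Fixed8_8 a, Fixed8_8 b) {
        // Addition needs no shift; the binary points already line up.
        return {static_cast<std::uint16_t>(a.raw + b.raw)};
    }
    friend Fixed8_8 operator*(Fixed8_8 a, Fixed8_8 b) {
        // The full product is a 16.16 value; shift right by 8 to get back to 8.8.
        std::uint32_t wide = std::uint32_t(a.raw) * b.raw;
        return {static_cast<std::uint16_t>(wide >> 8)};
    }
};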

Add and subtract don't require any post-op shifts for fixed-point math. Multiply requires a round and a right shift. Divide requires a pre-op left shift of the dividend. Extended-precision math may be needed for multiply and divide. I've worked with peripheral firmware that had hardware fixed-point math included: you write one number to a port, the other number to another port, and then read the answer from a third port. (The hardware is either fast enough to do this directly, or else you use dummy port writes to give it time to produce the result. A ready-status read usually isn't used because the hardware runs at a fixed speed and doesn't need a variable-timing handshake.)
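A rough sketch of those shifts for an unsigned 8.8 format, using 32-bit intermediates for the extended precision (the function names are just illustrative):

#include <cstdint>

// 8.8 multiply: the raw 32-bit product is a 16.16 value, so add half an
// LSB (1 << 7) to round, then shift right by 8.
std::uint16_t fx_mul(std::uint16_t a, std::uint16_t b) {
    std::uint32_t wide = std::uint32_t(a) * b;
    return static_cast<std::uint16_t>((wide + (1u << 7)) >> 8);
}

// 8.8 divide: pre-shift the dividend left by 8 so the quotient comes out
// already scaled as 8.8; the wider type holds the shifted value.
std::uint16_t fx_div(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((std::uint32_t(a) << 8) / b);
}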
 
chroot said:
This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case.

Ok, thanks.

I actually learned floating point in class (we had to write an app that took in the binary representation of a float, entered as a number containing only 1's and 0's so we couldn't cheat, and converted it to a float), but I have since completely forgotten how it works. But since the NDS does not use floating point, I can 'not remember' until next semester begins.
 