C/C++ What Is Fixed Point Notation in C++?

AI Thread Summary
The discussion centers on the use of bit shifting in C++ code, specifically the notation 1 << 8, which initially appears to be a standard bit shift. However, it is also interpreted as a representation of fixed-point notation, where the high-order bits represent the integer part and the low-order bits represent the fractional part of a number. In a 16-bit integer, the high-order 8 bits hold the integer part while the low-order 8 bits hold the fractional part, fixing the binary point between the two bytes. This representation simplifies arithmetic: addition and subtraction require no shifts, while multiplication and division involve specific shifts and rounding. The discussion contrasts fixed point with floating point, noting that while fixed point is easier to work with for certain applications, floating point stores a wider range of values more efficiently. The conversation also touches on practical applications in firmware and hardware that use fixed-point arithmetic.
lawtonfogle
Recently in C++ code I came across the notation 1 << 8. At first I thought it was just a standard bit shift, until an explanation of the code told me it was fixed-point notation, something I had not heard of until then.

I have tried to read up on fixed-point notation, but am still confused about it. As far as I can tell, the code was using a 16-bit integer, where the first 8 bits were used for something completely different than the last 8, and thus 1 was shifted up 8 bits. But I have no clue what value this actually represents or how to work it out from the raw number.
 
Perhaps that is just a way of creating an integer with coded content, something like

('R' << 24) + ('I' << 16) + ('F' << 8) + 'F'

to put 'RIFF' in a 32-bit integer?

(Edit: I am almost sure I am wrong with the character order, but you should get the general idea).
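For what it's worth, here is a minimal sketch of that kind of character packing; the exact character order you want depends on how the value is compared against bytes read from a file (endianness), which is the order caveat above:

#include <cstdint>
#include <iostream>

int main() {
    // Pack 'R', 'I', 'F', 'F' into one 32-bit value, 'R' in the high byte.
    std::uint32_t riff = ('R' << 24) | ('I' << 16) | ('F' << 8) | 'F';

    // Unpack again, high byte first -- prints RIFF.
    for (int shift = 24; shift >= 0; shift -= 8)
        std::cout << static_cast<char>((riff >> shift) & 0xFF);
    std::cout << '\n';
}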
 
1 << 8 is a standard bit shift. The intent of the bit shift varies with the application. It might be a goofy way to represent 256, a way to set a specific element in a bitfield, a way to indicate a process ended with a hang-up (SIGHUP) termination status, ...

or it might be a way to represent one in a fixed binary point notation in which the low-order byte represents the fractional part of some number and the high-order byte represents the integer part of the number.
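As a rough illustration (the variable name is made up for the example), the same 1 << 8 can be read all three ways:

#include <cstdint>
#include <iostream>

int main() {
    constexpr std::uint16_t x = 1 << 8;

    std::cout << x << '\n';                      // as a plain integer: 256
    std::cout << ((x & (1 << 8)) != 0) << '\n';  // as a bitfield: bit 8 is set
    std::cout << x / 256.0 << '\n';              // as 8.8 fixed point: 1.0
}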
 
Normally, one would interpret a 16-bit number as representing powers of two ranging from 15 to 0:

{ 2^15, 2^14, ..., 2^1, 2^0 }

You're free to choose whatever representation you like, though. How about this one?

{ 2^7, 2^6, ..., 2^1, 2^0, 2^-1, ..., 2^-7, 2^-8 }

In this representation, the high order 8 bits represent an integer in the normal way. The low order 8 bits, on the other hand, represent fractions: one-half, one-quarter, one-eighth, and so on.

In this representation, the bit string "0000 0001 0000 0000" represents one. The bit string "0000 0000 0000 0001" represents 1/256.
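A small sketch of that interpretation, assuming the unsigned 8.8 layout just described (the helper name is made up for the example):

#include <cstdint>
#include <iostream>

// Treat an unsigned 16-bit word as 8.8 fixed point: the high byte holds
// 2^7 .. 2^0 and the low byte holds 2^-1 .. 2^-8, so dividing the raw
// bits by 2^8 = 256 recovers the value.
double from_fixed_8_8(std::uint16_t bits) {
    return bits / 256.0;
}

int main() {
    std::cout << from_fixed_8_8(0x0100) << '\n';  // "0000 0001 0000 0000" -> 1
    std::cout << from_fixed_8_8(0x0001) << '\n';  // "0000 0000 0000 0001" -> 0.00390625, i.e. 1/256
}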

This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case. It's easy to understand and to work with.

Other numbering systems, like floating-point, offer much better storage efficiency over a wide range of magnitudes. They permit the "decimal point" to move around, so that very small numbers and very large numbers can be stored with less total space.

- Warren
 
junglebeast said:
The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing.
Except that you'd have to define math operators for those bit fields (or do the math manually), and it would be slower than a fixed-point library. In assembler you could take the double-precision result from a pair of registers and combine it back into a fixed-point return value; in C you could use a double-precision integer type if one were available.
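A minimal sketch of what defining those operators could look like for an unsigned 8.8 type; the type name is made up, and the 32-bit intermediate stands in for the register pair / double-precision integer mentioned above:

#include <cstdint>

struct Fixed8_8 {
    std::uint16_t raw;  // high byte: integer part, low byte: fraction

    friend Fixed8_8 operator+(Fixed8_8 a, Fixed8_8 b) {
        // Addition needs no shift; the binary points already line up.
        return {static_cast<std::uint16_t>(a.raw + b.raw)};
    }
    friend Fixed8_8 operator*(Fixed8_8 a, Fixed8_8 b) {
        // The full product is a 16.16 value; shift right by 8 to get back to 8.8.
        std::uint32_t wide = std::uint32_t(a.raw) * b.raw;
        return {static_cast<std::uint16_t>(wide >> 8)};
    }
};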

Add and subtract don't require any post-op shifts for fixed-point math. Multiply requires a round and a right shift. Divide requires a pre-op left shift of the dividend. Extended-precision math may be needed for multiply and divide. I've worked with peripheral firmware that had hardware fixed-point math included: you write one number to a port, the other number to another port, and then read the answer from a third port. (The hardware is either fast enough to do this directly, or else you use dummy port writes to give it time to produce the result. A ready-status read usually isn't used because the hardware runs at a fixed speed and doesn't need a variable-timing handshake.)
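A rough sketch of those shifts for an unsigned 8.8 format, using 32-bit intermediates for the extended precision (the function names are just illustrative):

#include <cstdint>

// 8.8 multiply: the raw 32-bit product is a 16.16 value, so add half an
// LSB (1 << 7) to round, then shift right by 8.
std::uint16_t fx_mul(std::uint16_t a, std::uint16_t b) {
    std::uint32_t wide = std::uint32_t(a) * b;
    return static_cast<std::uint16_t>((wide + (1u << 7)) >> 8);
}

// 8.8 divide: pre-shift the dividend left by 8 so the quotient comes out
// already scaled as 8.8; the wider type holds the shifted value.
std::uint16_t fx_div(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>((std::uint32_t(a) << 8) / b);
}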
 
chroot said:
This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case.

Ok, thanks.

I actually learned floating point in class (we had to write an app that took in the binary representation of a float, entered as a number containing only 1's and 0's so we couldn't cheat, and converted it to a float), but I have since completely forgotten how it works. But since the NDS does not use floating point, I can 'not remember' until next semester begins.
 