What Is Fixed Point Notation in C++?

lawtonfogle · Jun 2, 2009

Recently in C++ code I came across the notation 1 << 8. At first, I thought it was just a standard bitshift, till an explanation of the code told me it was fixed point notation, something I had not heard of till then.

I have tried to read up about fixed point notation, but am still confused about it. As far as I can tell, the code was using a 16 bit integer, where the first 8 bits were used for something completely different than the last 8, and thus 1 was shifted up 8 bits. But, I have no clue exactly what value this would be or how to find it given the number.

Borek · Jun 2, 2009

Perhaps that is just a way of creating an integer with coded content, something like

('R' << 24) + ('I' << 16) + ('F' << 8) + 'F'

to put 'RIFF' in a 32 bit integer?

(Edit: I am almost sure I am wrong with the character order, but you should get the general idea).
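To make the packing idea concrete, here is a minimal C++ sketch. The helper names `pack4` and `unpack` are hypothetical, and the byte order simply follows the expression above (first character in the most significant byte); whether that matches the order a given file format or CPU uses is a separate question.

```cpp
#include <cstdint>

// Pack four characters into one 32-bit integer, first character in the
// most significant byte (the same order as the expression in the post).
constexpr std::uint32_t pack4(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Recover the character stored in byte `index` (0 = most significant byte).
constexpr char unpack(std::uint32_t packed, int index) {
    return static_cast<char>((packed >> (24 - 8 * index)) & 0xFFu);
}
```

With ASCII characters, `pack4('R', 'I', 'F', 'F')` yields `0x52494646`, and `unpack` shifts each byte back down to read it out again.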

D H · Jun 2, 2009

1<<8 is a standard bit shift. The intent of the bit shift varies with application. It might be a goofy way to represent 256, a way to set a specific element in a bitfield, a way to indicate a process ended with a hang-up (SIGHUP) termination status, ...

or it might be a way to represent one in a fixed binary point notation in which the low order byte represents the fractional part of some number and the high order byte represents the integer part of the number.

chroot · Jun 2, 2009

Normally, one would interpret a 16-bit number as representing powers of two ranging from 15 to 0:

{ 2¹⁵, 2¹⁴ ... 2¹, 2⁰ }

You're free to choose whatever representation you like, though. How about this one?

{ 2⁷, 2⁶ ... 2¹, 2⁰, 2⁻¹ ... 2⁻⁷, 2⁻⁸ }

In this representation, the high order 8 bits represent an integer in the normal way. The low order 8 bits, on the other hand, represent fractions: one-half, one-quarter, one-eighth, and so on.

In this representation, the bit string "0000 0001 0000 0000" represents one. The bit string "0000 0000 0000 0001" represents 1/256.

This system is called "fixed-point" because the "decimal point" (strictly a binary point here) that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case. It's easy to understand and to work with.
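The 8.8 interpretation described above can be sketched in C++ as follows. This is an illustration only: the names `kFracBits`, `kOne`, `to_double`, and `from_double` are hypothetical, and the sketch assumes unsigned values below 256.

```cpp
#include <cstdint>

constexpr int kFracBits = 8;                    // bits to the right of the binary point
constexpr std::uint16_t kOne = 1 << kFracBits;  // 256: the raw bit pattern that means 1.0

// Interpret a raw 16-bit pattern as an 8.8 fixed-point value.
constexpr double to_double(std::uint16_t raw) {
    return static_cast<double>(raw) / kOne;
}

// Convert a non-negative value below 256 to the nearest 8.8 bit pattern.
constexpr std::uint16_t from_double(double value) {
    return static_cast<std::uint16_t>(value * kOne + 0.5);
}
```

Under these assumptions `to_double(0x0100)` is 1.0, `to_double(0x0001)` is 1/256, and `from_double(1.5)` is `0x0180`. This is why `1 << 8` is the natural way to write "one" in this format: the value of any raw pattern is simply the integer divided by 256.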

Other numbering systems, like floating-point, cover a much wider range of magnitudes in the same number of bits. They permit the "decimal point" to move around, so that very small numbers and very large numbers can be stored with less total space.

- Warren

junglebeast · Jun 2, 2009

The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing:

http://msdn.microsoft.com/en-us/library/ewwyfdbe(VS.71).aspx
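As a rough illustration of the bit-field approach, the sketch below splits a 16-bit unit into two 8-bit fields. The type and function names are hypothetical, and which field ends up in the low byte is implementation-defined, so this is not guaranteed to match the raw layout of a shifted integer.

```cpp
#include <cstdint>

// Hypothetical 8.8 value stored as two 8-bit bit fields.
// The placement of the fields within the 16-bit unit is implementation-defined.
struct Fixed8_8Fields {
    std::uint16_t frac  : 8;  // fractional part, in units of 1/256
    std::uint16_t whole : 8;  // integer part, 0..255

    constexpr Fixed8_8Fields(unsigned w, unsigned f)
        : frac(static_cast<std::uint16_t>(f & 0xFFu)),
          whole(static_cast<std::uint16_t>(w & 0xFFu)) {}
};

// Real value represented by the two fields.
constexpr double fields_to_double(Fixed8_8Fields f) {
    return f.whole + f.frac / 256.0;
}

// Addition has to carry from the fractional field into the integer field by hand.
constexpr Fixed8_8Fields fields_add(Fixed8_8Fields a, Fixed8_8Fields b) {
    return Fixed8_8Fields(a.whole + b.whole + ((a.frac + b.frac) >> 8),
                          a.frac + b.frac);
}
```

Reading the two parts is convenient, but even addition needs explicit carry handling, whereas a plain integer holding the same bits would add with a single instruction.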

rcgldr · Jun 2, 2009

junglebeast said:

The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing.

Except that you'd have to define math operators for those bit fields or do the math manually, and it would be slower than a fixed point library. In assembler you could read the double precision result from a pair of registers and combine the halves back into a fixed point return value; in C you could use a double-width integer type if one is available.

Add and subtract don't require any post-op shifts for fixed point math. Multiply requires a round and a right shift. Divide requires a pre-op left shift of the dividend. Extended precision math may be needed for multiply and divide. I've worked with peripheral firmware that had hardware fixed point math included: you write one number to a port, the other number to another port, and then read the answer from a third port. The hardware is either fast enough to do this, or else you use dummy port writes to give it time to produce the result. Ready status reading usually isn't used because the hardware has a fixed speed and doesn't need a variable timing handshake.
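The four operations can be sketched in C++ for an unsigned 8.8 format. The alias `fix8_8` and the `fx_*` function names are hypothetical, overflow of the 16-bit result is not checked, and the 32-bit intermediates stand in for the "extended precision" mentioned above.

```cpp
#include <cstdint>

using fix8_8 = std::uint16_t;  // hypothetical alias: unsigned 8.8 fixed point
constexpr int kFracBits = 8;

// Add and subtract operate directly on the raw patterns; no shift is needed.
constexpr fix8_8 fx_add(fix8_8 a, fix8_8 b) { return static_cast<fix8_8>(a + b); }
constexpr fix8_8 fx_sub(fix8_8 a, fix8_8 b) { return static_cast<fix8_8>(a - b); }

// Multiply: the 32-bit product carries 16 fraction bits, so add half of the
// final least significant bit to round, then shift right by 8.
constexpr fix8_8 fx_mul(fix8_8 a, fix8_8 b) {
    return static_cast<fix8_8>(
        (static_cast<std::uint32_t>(a) * b + (1u << (kFracBits - 1))) >> kFracBits);
}

// Divide: shift the dividend left by 8 in a 32-bit intermediate first, so the
// quotient keeps 8 fraction bits. The result is truncated; b must be nonzero.
constexpr fix8_8 fx_div(fix8_8 a, fix8_8 b) {
    return static_cast<fix8_8>((static_cast<std::uint32_t>(a) << kFracBits) / b);
}
```

For example, with 1.5 stored as `0x0180` and 2.0 as `0x0200`, `fx_mul` gives `0x0300` (3.0), and `fx_div(0x0300, 0x0200)` gives back `0x0180`.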

lawtonfogle · Jun 3, 2009

Ok, thanks.

I actually learned floating point in class (we had to write an app that took in the binary representation of a floating point number as a decimal number made up only of 1's and 0's, so we couldn't cheat, and converted it to a float), but I have since completely forgotten it. But, since the NDS does not use floating point, I can 'not remember' till next semester begins.
