What Is Fixed Point Notation in C++?


Discussion Overview

The discussion revolves around the concept of fixed point notation in C++, particularly in the context of bit shifting and integer representation. Participants explore how fixed point notation can be implemented using bit manipulation, and they share their understanding of how integer values can encode both integer and fractional parts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant notes confusion regarding fixed point notation and describes a specific example of using a bit shift (1 << 8) in a 16-bit integer context.
  • Another participant suggests that bit shifting can be used to create integers with encoded content, providing an example with character representation.
  • A different participant asserts that 1 << 8 is a standard bit shift, but its intent can vary based on application, including representing fixed binary point notation.
  • One participant explains a representation of a 16-bit number where the high order 8 bits represent integers and the low order 8 bits represent fractions, describing how this relates to fixed-point notation.
  • Another participant discusses the convenience of using bit fields over bit shifts, while also mentioning the need for defined math operators for bit fields and the potential speed advantages of fixed point libraries.
  • Further elaboration on fixed point math is provided, including details on operations like addition, subtraction, multiplication, and division, and how they differ from floating-point operations.

Areas of Agreement / Disagreement

Participants express a range of views on fixed point notation and its implementation, with no clear consensus on the best approach or understanding of the topic. Some participants agree on the basic principles of fixed point representation, while others present differing opinions on its application and efficiency.

Contextual Notes

There are limitations in the discussion regarding assumptions about the representation of numbers and the specific applications of fixed point notation. The discussion also reflects varying levels of familiarity with floating-point representation, which some participants mention but do not fully explore.

lawtonfogle
Recently in C++ code I came across the notation 1 << 8. At first, I thought it was just a standard bitshift, till an explanation of the code told me it was fixed point notation, something I had not heard of till then.

I have tried to read up about fixed point notation, but am still confused about it. As far as I can tell, the code was using a 16 bit integer, where the first 8 bits were used for something completely different than the last 8, and thus 1 was shifted up 8 bits. But, I have no clue exactly what value this would be or how to find it given the number.
 
Perhaps that is just a way of creating integer with coded content, something like

('R' << 24) + ('I' << 16) + ('F' << 8) + 'F'

to put 'RIFF' in a 32 bit integer?

(Edit: I am almost sure I am wrong with the character order, but you should get the general idea).
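For what it's worth, the packing idea above can be sketched roughly like this. The names `pack_tag` and `unpack_char` are made up for illustration, and the hex value in the usage note assumes an ASCII character set. Whether this byte order matches what a particular file format stores depends on the byte order used when the value is read or written, so treat the order here as one possible choice.

```cpp
#include <cstdint>

// Pack four characters into one 32-bit value, with the first character
// in the most significant byte. Each character is widened to an unsigned
// 32-bit type before shifting so no shift overflows a signed int.
// (pack_tag is a hypothetical helper name, not from any library.)
constexpr std::uint32_t pack_tag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8)  |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Recover the character stored in byte `index` (0 = most significant byte).
// (unpack_char is likewise a hypothetical helper name.)
constexpr char unpack_char(std::uint32_t tag, int index) {
    return static_cast<char>((tag >> (24 - 8 * index)) & 0xFFu);
}
```

With ASCII characters, `pack_tag('R', 'I', 'F', 'F')` comes out as `0x52494646`, and `unpack_char` with index 0 gives back `'R'`.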
 
1<<8 is a standard bit shift. The intent of the bit shift varies with application. It might be a goofy way to represent 256, a way to set a specific element in a bitfield, a way to indicate a process ended with a hang-up (SIGHUP) termination status, ...

or it might be a way to represent one in a fixed binary point notation in which the low order byte represents the fractional part of some number and the high order byte represents the integer part of the number.
 
Normally, one would interpret a 16-bit number as representing powers of two ranging from 15 to 0:

{ 2^15, 2^14 ... 2^1, 2^0 }

You're free to choose whatever representation you like, though. How about this one?

{ 2^7, 2^6 ... 2^1, 2^0, 2^-1 ... 2^-7, 2^-8 }

In this representation, the high order 8 bits represent an integer in the normal way. The low order 8 bits, on the other hand, represent fractions: one-half, one-quarter, one-eighth, and so on.

In this representation, the bit string "0000 0001 0000 0000" represents one. The bit string "0000 0000 0000 0001" represents 1/256.

This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case. It's easy to understand and to work with.
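To answer the "how do I find the value" part of the original question: in this layout the real value is just the raw integer divided by 256. Here is a minimal sketch, assuming an unsigned 16-bit value with 8 fraction bits. The names (`kFractionBits`, `kOne`, `to_double`, `from_double`, `integer_part`, `fraction_part`) are hypothetical, not from any library.

```cpp
#include <cstdint>

// Assumed layout: unsigned 16-bit value, high 8 bits integer, low 8 bits fraction.
constexpr int kFractionBits = 8;
constexpr std::uint16_t kOne = 1 << kFractionBits;  // the raw value 256 means 1.0

// Raw fixed-point bits -> real value: divide by 2^8.
constexpr double to_double(std::uint16_t raw) {
    return static_cast<double>(raw) / kOne;
}

// Real value -> raw fixed-point bits, rounding to the nearest 1/256.
// Assumes 0 <= x < 256; there is no range checking in this sketch.
constexpr std::uint16_t from_double(double x) {
    return static_cast<std::uint16_t>(x * kOne + 0.5);
}

// The high-order byte is the whole-number part.
constexpr unsigned integer_part(std::uint16_t raw) {
    return raw >> kFractionBits;
}

// The low-order byte is the fraction, counted in units of 1/256.
constexpr unsigned fraction_part(std::uint16_t raw) {
    return raw & (kOne - 1);
}
```

So `to_double(0x0100)` is 1.0, `to_double(0x0001)` is 1/256, and `from_double(1.5)` is `0x0180` (integer part 1, fraction part 128/256).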

Other numbering systems, like floating-point, cover a much wider range with the same number of bits. They permit the "decimal point" to move around, so that very small numbers and very large numbers can both be stored without needing more total space.

- Warren
 
junglebeast said:
The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing.
Except that you'd have to define math operators for those bit fields or do the math manually, and it would be slower than a fixed point library. In assembler you could read the double-width result from a pair of registers and combine the halves back into a fixed point return value; in C you could use a double-width integer type if one is available.
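As a rough illustration of what "defining math operators" for a bit-field version might involve, here is a sketch under the assumption of an unsigned 8.8 layout. The type `Fixed8_8Fields` and the helpers `to_raw` and `from_raw` are invented for this example. It never relies on how the compiler lays the fields out in memory, since that is implementation-defined.

```cpp
#include <cstdint>

// Hypothetical bit-field view of an unsigned 8.8 fixed-point number.
struct Fixed8_8Fields {
    std::uint16_t fraction : 8;  // low-order part, in units of 1/256
    std::uint16_t integer  : 8;  // high-order part, whole units
};

// Recombine the two fields into the plain 16-bit representation.
constexpr std::uint16_t to_raw(Fixed8_8Fields v) {
    return static_cast<std::uint16_t>((v.integer << 8) | v.fraction);
}

// Split a plain 16-bit representation back into the two fields.
constexpr Fixed8_8Fields from_raw(std::uint16_t raw) {
    return Fixed8_8Fields{static_cast<std::uint16_t>(raw & 0xFF),
                          static_cast<std::uint16_t>(raw >> 8)};
}

// Addition has to go through the combined value so that a carry out of
// the fraction field reaches the integer field. Overflow wraps silently.
constexpr Fixed8_8Fields operator+(Fixed8_8Fields a, Fixed8_8Fields b) {
    return from_raw(static_cast<std::uint16_t>(to_raw(a) + to_raw(b)));
}
```

Adding 1.5 and 1.5 this way (`from_raw(0x0180) + from_raw(0x0180)`) gives integer 3 and fraction 0, but only because the operator packs and unpacks around the add, which is the extra work a plain shifted integer avoids.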

Add and subtract don't require any post-op shifts for fixed point math. Multiply requires a round and a right shift. Divide requires a pre-op left shift of the dividend. Extended precision math may be needed for multiply and divide. I've worked with peripheral firmware that had hardware fixed point math included: you write one number to a port, the other number to another port, and then read the answer from a third port. (The hardware is either fast enough to do this, or else you use dummy port writes to give the hardware time to produce the result. Ready-status reads usually aren't used because the hardware has a fixed speed and doesn't need a variable-timing handshake.)
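Those four operations might look something like this in C++, assuming a signed 16-bit value with 8 fraction bits and a 32-bit intermediate for the extended precision. The names `fixed_t`, `fx_add`, `fx_sub`, `fx_mul` and `fx_div` are hypothetical. Note that the pre-scaling in the divide is applied to the dividend (the numerator), which is what keeps the quotient in 8.8 format. There is no overflow or divide-by-zero checking in this sketch.

```cpp
#include <cstdint>

using fixed_t = std::int16_t;        // hypothetical signed 8.8 fixed-point type
constexpr int kFractionBits = 8;

// Add and subtract work directly on the raw values; no shift is needed.
constexpr fixed_t fx_add(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(a + b);
}

constexpr fixed_t fx_sub(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(a - b);
}

// The 32-bit product carries 16 fraction bits. Adding half of the final
// least significant bit rounds it, and the right shift drops the extra
// 8 fraction bits. (Right-shifting a negative value is arithmetic on
// common compilers and is guaranteed to be from C++20 on.)
constexpr fixed_t fx_mul(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(
        (static_cast<std::int32_t>(a) * b + (1 << (kFractionBits - 1)))
        >> kFractionBits);
}

// Pre-scale the dividend by 2^8 in 32 bits, then divide. Multiplying is
// used instead of << so negative dividends are handled portably.
// The caller must make sure b is not zero.
constexpr fixed_t fx_div(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>(
        (static_cast<std::int32_t>(a) * (1 << kFractionBits)) / b);
}
```

For example, with 1.5 stored as `0x0180` and 2.0 as `0x0200`, `fx_mul` gives `0x0300` (3.0) and `fx_div(0x0300, 0x0200)` gives back `0x0180` (1.5).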
 
Ok, thanks.

I actually learned floating points in class (had to write an app which took in the binary representation of a floating point as a decimal digit number of only 1's and 0's (so we couldn't cheat) and convert it to a float), but I have since completely forgotten about them. But, since the NDS does not use them, I can 'not remember' till next semester begins.
 
