Fixed Point Notation Basics in C++

In summary, fixed-point notation represents numbers with the binary point fixed at one position: the high-order bits hold the integer part and the low-order bits hold the fraction. It is simpler than floating point, but for a given number of bits it covers a much narrower range of magnitudes. It is often used on hardware without floating-point support, and some hardware performs fixed-point math directly.
  • #1
lawtonfogle
Recently I came across the notation 1 << 8 in some C++ code. At first I thought it was just a standard bit shift, until an explanation of the code told me it was fixed-point notation, something I had not heard of until then.

I have tried to read up on fixed-point notation, but I am still confused by it. As far as I can tell, the code used a 16-bit integer in which the upper 8 bits meant something completely different from the lower 8, which is why 1 was shifted up 8 bits. But I have no idea what value this represents or how to work it out from the number.
 
  • #2
Perhaps that is just a way of creating an integer with coded content, something like

('R' << 24) + ('I' << 16) + ('F' << 8) + 'F'

to put 'RIFF' in a 32 bit integer?

(Edit: I am almost sure I have the character order wrong, but you should get the general idea.)
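The packing in that expression can be written as a small helper. This is a sketch with hypothetical function names (`pack_four_chars`, `unpack_char`); it puts the first character in the high-order byte, exactly as the expression above does, and casts through `unsigned char` so characters above 0x7F cannot produce negative values before the shift. Whether this byte order matches a RIFF header read straight from a file depends on the platform's endianness, which is the ordering doubt raised in the edit.

```cpp
#include <cstdint>

// Hypothetical helper: packs four characters into one 32-bit integer,
// first character in the high-order byte, like ('R' << 24) + ... + 'F'.
constexpr std::uint32_t pack_four_chars(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Hypothetical helper: recovers character n (0 = first, high-order byte).
constexpr char unpack_char(std::uint32_t packed, int n) {
    return static_cast<char>((packed >> (24 - 8 * n)) & 0xFFu);
}
```

With ASCII, `pack_four_chars('R', 'I', 'F', 'F')` equals `0x52494646`.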
 
  • #3
1<<8 is a standard bit shift. The intent of the bit shift varies with application. It might be a goofy way to represent 256, a way to set a specific element in a bitfield, a way to encode in a POSIX wait() status word that a process exited with status 1, ...

or it might be a way to represent one in a fixed binary point notation in which the low order byte represents the fractional part of some number and the high order byte represents the integer part of the number.
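If `1 << 8` is meant as the value one in a format with 8 fraction bits, converting to and from ordinary numbers is just a multiply or divide by 256. A minimal sketch, assuming a signed 8.8 format stored in `int16_t`; the names `kOne`, `to_fixed` and `to_double` are hypothetical:

```cpp
#include <cstdint>

// Assumed format: signed 8.8 fixed point in an int16_t. The high byte holds
// the integer part and the low byte the fraction, so one raw unit is 1/256.
constexpr int kFracBits = 8;
constexpr std::int32_t kOne = 1 << kFracBits;  // 256 raw units represent 1.0

// Converts a double to raw 8.8 units, rounding half away from zero.
// No range check: x must lie within [-128, 127.99609375].
constexpr std::int16_t to_fixed(double x) {
    const double scaled = x * kOne;
    return static_cast<std::int16_t>(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

// Converts raw 8.8 units back to a double (exact for every int16_t value).
constexpr double to_double(std::int16_t raw) {
    return static_cast<double>(raw) / kOne;
}
```

Under this convention `to_fixed(1.0)` yields 256, the same value as `1 << 8`, and `to_double(384)` yields 1.5.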
 
  • #4
Normally, one would interpret a 16-bit number as representing powers of two ranging from 15 to 0:

$\{2^{15}, 2^{14}, \ldots, 2^{1}, 2^{0}\}$

You're free to choose whatever representation you like, though. How about this one?

$\{2^{7}, 2^{6}, \ldots, 2^{1}, 2^{0}, 2^{-1}, \ldots, 2^{-7}, 2^{-8}\}$

In this representation, the high order 8 bits represent an integer in the normal way. The low order 8 bits, on the other hand, represent fractions: one-half, one-quarter, one-eighth, and so on.

In this representation, the bit string "0000 0001 0000 0000" represents one. The bit string "0000 0000 0000 0001" represents 1/256.
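These bit weights can be checked directly: in this unsigned scheme, bit i of the 16-bit pattern (counting from 0 at the right) carries weight 2^(i - 8). A sketch with hypothetical function names that sums the weights bit by bit, next to the shortcut of dividing the raw integer by 256:

```cpp
#include <cstdint>

// Value of a 16-bit pattern read as unsigned 8.8 fixed point: bit i has
// weight 2^(i - 8), so bits 15..8 weigh 2^7..2^0 and bits 7..0 weigh 2^-1..2^-8.
constexpr double value_of_bits(std::uint16_t bits) {
    double value = 0.0;
    double weight = 1.0 / 256.0;  // weight of bit 0 is 2^-8
    for (int i = 0; i < 16; ++i) {
        if ((bits >> i) & 1u) value += weight;
        weight *= 2.0;  // next bit is worth twice as much
    }
    return value;
}

// Same result without the loop: the raw integer divided by 2^8.
constexpr double value_by_scaling(std::uint16_t bits) {
    return bits / 256.0;
}
```

For example, `value_of_bits(0b0000000100000000)` is 1.0 and `value_of_bits(0b0000000000000001)` is 1/256, matching the two bit strings in the text.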

This system is called "fixed-point" because the "decimal point" that separates the integer and fractional parts of numbers is fixed at a specific location -- right in the middle in this case. It's easy to understand and to work with.

Other number systems, like floating point, use the available bits more efficiently when values span many orders of magnitude. They permit the "decimal point" to move around, so that both very small and very large numbers can be stored in far fewer bits than a fixed-point format covering the same range would need.

- Warren
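The range trade-off can be made concrete by comparing two 32-bit formats. This sketch assumes an unsigned 16.16 fixed-point layout (chosen only for illustration) and an IEEE 754 `float`; the constants come from the format definitions and `std::numeric_limits`:

```cpp
#include <cstdint>
#include <limits>

// Unsigned 16.16 fixed point in 32 bits: the binary point never moves,
// so neighbouring values are always exactly 2^-16 apart.
constexpr double kFixedStep = 1.0 / 65536.0;           // smallest positive value
constexpr double kFixedMax  = 4294967295.0 / 65536.0;  // just under 65536

// 32-bit IEEE 754 float: the exponent moves the binary point, so the same
// 32 bits span a far wider range, with gaps that grow with magnitude.
constexpr float kFloatMinNormal = std::numeric_limits<float>::min();  // about 1.2e-38
constexpr float kFloatMax       = std::numeric_limits<float>::max();  // about 3.4e38
```

Within its own range the fixed format can even be finer: just below 65536 a `float` has a spacing of 2^-8, while 16.16 keeps 2^-16. The trade is range, not raw precision.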
 
  • #6
junglebeast said:
The bit shift operator is useful, but many times it's more convenient to use bit fields for this type of thing.
Except that you'd have to define math operators for those bit fields or do the math manually, and it would be slower than a fixed-point library. (In assembler you can take a double-width result from a pair of registers and combine it back into a fixed-point return value; in C you can use a double-width integer type if one is available.)

Add and subtract don't require any post-op shifts for fixed-point math. Multiply requires a round and a right shift. Divide requires a pre-op left shift of the dividend. Extended-precision math may be needed for multiply and divide. I've worked with peripheral firmware that had hardware fixed-point math: you write one number to one port, the other number to another port, and then read the answer from a third port. (The hardware is either fast enough to do this, or else you use dummy port writes to give it time to produce the result. Reading a ready status usually isn't needed, because the hardware runs at a fixed speed and doesn't need a variable-timing handshake.)
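The rules above (no shift for add and subtract, round and shift for multiply, pre-shifted dividend for divide, wider intermediates) can be sketched in C++ as follows. This assumes a signed 8.8 format in `int16_t` with `int32_t` intermediates; the type alias and function names are hypothetical, and overflow of the final 16-bit result is not checked:

```cpp
#include <cstdint>

// Assumed format: signed 8.8 in int16_t; int32_t holds intermediates so
// products and pre-shifted dividends cannot overflow.
using fix8_8 = std::int16_t;
constexpr int kFrac = 8;

// Add/subtract: both operands share the same scale, so no shift is needed.
constexpr fix8_8 fx_add(fix8_8 a, fix8_8 b) { return static_cast<fix8_8>(a + b); }
constexpr fix8_8 fx_sub(fix8_8 a, fix8_8 b) { return static_cast<fix8_8>(a - b); }

// Multiply: the raw product has 16 fraction bits; add half an LSB
// (1 << 7) to round, then shift right by 8 to return to 8.8.
constexpr fix8_8 fx_mul(fix8_8 a, fix8_8 b) {
    const std::int32_t product = static_cast<std::int32_t>(a) * b;
    return static_cast<fix8_8>((product + (1 << (kFrac - 1))) >> kFrac);
}

// Divide: scale the dividend up by 2^8 first so the quotient keeps
// 8 fraction bits (integer division then truncates toward zero).
constexpr fix8_8 fx_div(fix8_8 a, fix8_8 b) {
    const std::int32_t dividend = static_cast<std::int32_t>(a) * (1 << kFrac);
    return static_cast<fix8_8>(dividend / b);
}
```

With 1.5 stored as 384 and 2.25 as 576, `fx_mul(384, 576)` returns 864 (3.375) and `fx_div(576, 384)` returns 384 (1.5). Right-shifting a negative product relies on an arithmetic shift, which mainstream compilers provide and C++20 guarantees; adding half an LSB before the shift rounds halves toward positive infinity.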
 
  • #7


Ok, thanks.

I actually learned floating point in class (I had to write an app that took the binary representation of a floating-point number, entered as a decimal number made up only of 1s and 0s so we couldn't cheat, and converted it to a float), but I have since completely forgotten it. Since the NDS does not use floating point, I can keep 'not remembering' until next semester begins.
 

1. What is fixed point notation in C++?

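Fixed-point notation represents numbers with a fixed number of fractional digits: binary digits, as in the 8.8 format discussed above where the low 8 bits are the fraction, or decimal digits, as in storing money as a whole number of cents. C++ has no built-in fixed-point type; the value is usually stored in an ordinary integer type with an agreed scale factor. Fixed point is common on hardware without floating-point support, in financial applications, and when dealing with measurements of known range and resolution.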

2. How is fixed point notation different from floating point notation?

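In fixed-point notation the position of the point never changes, so every value has the same absolute resolution (1/256 in the 8.8 format). In floating-point notation the point moves with the exponent, so the resolution is relative to the magnitude of the value. Fixed point therefore gives exact, evenly spaced steps within its range but cannot represent very large or very small numbers, while floating point covers an enormous range with roughly constant relative precision.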

3. How do you declare a variable using fixed point notation in C++?

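Standard C++ has no built-in fixed-point type, so a declaration such as "fixed_point<2> num = 10.50;" only works if a third-party library or your own code provides such a template. Otherwise you store the value in an integer type with an agreed scale factor: for example, 10.50 as the integer 1050 with 2 decimal places, or 1.0 as 1 << 8 in the 8.8 binary format.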
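To illustrate the "integer plus agreed scale" approach, here is a minimal class template. The name `DecimalFixed`, its members and the helper `ipow10` are hypothetical, not a standard or library API, and the sketch handles non-negative values only:

```cpp
#include <cstdint>

// Hypothetical helper: 10^n computed at compile time.
constexpr std::int64_t ipow10(int n) { return n == 0 ? 1 : 10 * ipow10(n - 1); }

// Hypothetical decimal fixed-point type: the value is stored as a whole
// number of 10^-Places units, so with Places = 2, 10.50 is stored as 1050.
// This sketch handles non-negative values only.
template <int Places>
struct DecimalFixed {
    static constexpr std::int64_t kScale = ipow10(Places);
    std::int64_t raw;  // value * kScale

    // Builds a value from its integer part and its fractional digits
    // (whole = 10, frac = 50 for 10.50), avoiding binary floating point.
    static constexpr DecimalFixed from_parts(std::int64_t whole, std::int64_t frac) {
        return DecimalFixed{whole * kScale + frac};
    }
    constexpr std::int64_t whole_part() const { return raw / kScale; }
    constexpr std::int64_t frac_part() const { return raw % kScale; }
};
```

For example, `constexpr auto num = DecimalFixed<2>::from_parts(10, 50);` stores 10.50 as the raw integer 1050. Taking the digits as integers rather than a `double` literal keeps the stored value exact.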

4. How do you perform calculations with fixed point numbers in C++?

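The operators come from whatever library or class you use, or you write them yourself on the underlying integers. Addition and subtraction work directly because both operands share the same scale. Multiplication produces a result with twice as many fractional digits, so it must be rescaled (shifted right, or divided by the decimal scale), preferably with rounding. Division needs the dividend scaled up first so the quotient keeps its fractional digits. Overflow is not handled automatically, so intermediate results may need a wider integer type.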
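The same rescaling rules apply to decimal fixed point. A sketch assuming two decimal places stored as `int64_t` hundredths; the function names are hypothetical, rounding is half away from zero, and neither overflow nor division by zero is checked:

```cpp
#include <cstdint>

// Assumed representation: amounts with two decimal places stored as an
// int64_t count of hundredths (10.50 is stored as 1050).
using cents = std::int64_t;
constexpr cents kScale = 100;

// Add: both operands share the same scale, so integer addition is exact.
constexpr cents add_cents(cents a, cents b) { return a + b; }

// Multiply: the raw product has four decimal places; divide by the scale
// to return to two, rounding half away from zero.
constexpr cents mul_cents(cents a, cents b) {
    const cents p = a * b;
    return (p >= 0 ? p + kScale / 2 : p - kScale / 2) / kScale;
}

// Divide: scale the dividend up first so the quotient keeps two decimal
// places, then round half away from zero. b must be non-zero.
constexpr cents div_cents(cents a, cents b) {
    const cents n = a * kScale;
    const bool negative = (n < 0) != (b < 0);
    const cents abs_n = n < 0 ? -n : n;
    const cents abs_b = b < 0 ? -b : b;
    const cents q = (abs_n + abs_b / 2) / abs_b;
    return negative ? -q : q;
}
```

For example, `mul_cents(1050, 300)` computes 10.50 × 3.00 and returns 3150 (31.50), and `div_cents(1000, 300)` computes 10.00 / 3.00 and returns 333 (3.33).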

5. What are some benefits of using fixed point notation in C++?

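Decimal fixed point (for example, whole cents) represents amounts such as 0.10 exactly, which binary floating point cannot, so sums of money do not pick up rounding errors; this matters in financial calculations. Fixed point also makes the number of fractional digits explicit, can be faster than floating point on processors without floating-point hardware, and can use narrower types (such as the 16-bit 8.8 format) when the range of values is known in advance.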
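A small illustration of the exactness point, assuming IEEE 754 `double`: adding 0.10 ten times in binary floating point does not give exactly 1.0, while adding 10 cents ten times in an integer gives exactly 100 cents. The function names are hypothetical:

```cpp
#include <cstdint>

// Adds 0.10 ten times in binary floating point. 0.10 has no exact binary
// representation, so rounding error accumulates and the total is not 1.0.
constexpr double sum_dimes_double() {
    double total = 0.0;
    for (int i = 0; i < 10; ++i) total += 0.10;
    return total;
}

// Adds 10 cents ten times using decimal fixed point (integer cents).
// Every step is exact, so the total is exactly 100 cents, i.e. 1.00.
constexpr std::int64_t sum_dimes_cents() {
    std::int64_t total = 0;
    for (int i = 0; i < 10; ++i) total += 10;
    return total;
}
```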
