# 32 bits

1. Aug 3, 2010

### RedX

If you have a 32 bit processor, does that mean it is useless to have more than 4 GB of RAM, since you wouldn't be able to address those additional locations?

So with a 64 bit processor, could you technically install as much as 16 EB (exabytes) of RAM?

Why does it jump from 32 bits to 64 bits? Why not 32 to 33, or 32 to 40?

What's the purpose of having 32 bits as opposed to 16 bits? I read that most CPUs have only about 150 opcodes (machine instructions such as add, jump, compare, read, write), so you would only need $$\log_2(150) \approx 7.23$$, i.e. 8 bits, to encode what most CPUs do.

Are these extra bits used to enhance the arithmetic-logic unit with more opcodes? For example, instead of building sin(x) from many lines of assembly full of additions, multiplications, and register swapping, could you do it all at once in one CPU cycle with a built-in sin(x) unit?

Also, a computer's ROM stores the BIOS, but is there also an assembler stored in the ROM that translates opcodes into machine code?

Thanks.

2. Aug 3, 2010

### hamster143

You can't address more than 4 GB within a single program. The amount of memory the machine can have actually depends on the operating system: there are intricacies in how memory is doled out to individual processes. 32-bit versions of Windows don't support more than 4 GB (with some rare exceptions), and are therefore limited to 4 GB or less, but 32-bit Linux can use up to 64 GB.

64 is a nice number, a power of two and a whole number of bytes, and it eliminates the need to upgrade the instruction set again when we discover that 40 bits is not enough, possibly in 10 years or less.

There may be only 150 different operations, but each individual operation can have hundreds of variations, depending on where the operands come from (registers? directly addressed memory? indirectly addressed memory? immediate values encoded into the instruction?).

Intel already has an instruction that calculates sin(x): it's 'fsin', machine code 0xD9 0xFE.

Your higher-level language compiler generates the machine code directly.

Here's a nice fat list of machine-code instructions to look over

http://www.intel.com/Assets/PDF/manual/253666.pdf

3. Aug 3, 2010

### rcgldr

If I remember correctly, if there were an equivalent of the 16-bit "large" model in 32-bit mode, you could get past 4 GB, at least on some Pentium CPUs, using "far pointers", where the segment and the 32-bit offset are combined to create a 36-bit address. I'm not aware of any operating system for an Intel CPU that supports this mode, though.

4. Aug 3, 2010

### RedX

I have seen that 0x prefix before, but I don't know what it means.

If you have just D9FE, then that looks like a 16-bit instruction.
If it is 0D90FE, then that's 24-bits.

Why is it split up into two parts like that (0xD9 and the 0xFE)?

At the computer store I saw a single 8 GB laptop RAM stick for \$500, and was wondering what it could be used for, since 8 GB seemed like too much for 32 bit systems.

5. Aug 3, 2010

### hamster143

0x just means that it's a hex number, so 0xD9 0xFE is a two-byte instruction. It's customary to prefix hex numbers with '0x'; the convention comes from C (and most likely from even earlier), where, if you write

Code (Text):

int x = 10;

x is assigned the value of 10, but if you write

Code (Text):

int x = 0x10;

it is assigned 16.

Last edited: Aug 3, 2010
6. Aug 4, 2010

### lostinxlation

Almost all of today's processors are byte-addressed, meaning an address is assigned to every byte. To keep things simple, the address width should be a multiple of 8. You could make it 40 bits if you wanted (the IBM S/360 had 24-bit addresses, if I remember correctly), but the logical step after 32 bits is 64, simply because it's large enough and much easier to implement, due to the nature of powers of 2. It's the same reason IPv6 got 128 bits instead of 80.

For data, a wider word first lets you handle large numbers more easily. Suppose you have a 4-bit processor: it's limited to 0 to 15 (or -8 to 7), and if you want to handle 20 you have to offset it, which requires extra steps in the calculation; therefore you want 8 bits. The same goes for 16-bit to 32-bit and beyond.
Second, it allows a larger address space.

For the instructions, 16-bit or 32-bit instructions have an advantage over 8-bit ones. If you look at the instruction manual of any ISA, you'll notice that they don't simply map the functions onto a fixed bit field. They usually divide the fields in different ways based on the type of operation, which gives you a better-organized architecture and makes instruction decoding easier and faster.

So, 8 bits is too small for modern architectures. Then why 32 bits over 16?
In the worst case you need at least 2 source operands and 1 destination if your ISA has register-to-register operations. Suppose you have 16 registers: 12 bits are taken just for addressing them. On a 16-bit instruction set, that leaves only 4 bits to encode the function for a 3-register operation. Or, when your branch instructions use an immediate offset from the program counter, you need a reasonably large immediate field to reach distant branch targets (for example, to branch to a location 1 MB away from the current PC, you need a 20-bit immediate field; if you don't have that many bits, you can't use PC-relative addressing and have to use registers instead, which defers the branch-target calculation and hurts performance).

The extra bits can be used for new functions; that's how the instruction set is extended.
Frankly, though, you don't want to implement sin(x) in hardware. It's not a simple binary operation: it requires a lot of iteration in the execution unit and can't be done in one cycle (even getting the first 2 terms of the Taylor series takes a few cycles, considering that you need to calculate x^3). Implementing such a rarely used function in hardware isn't a good idea. If a sin(x) instruction exists, I bet it uses the ALU back to back, which isn't much different from issuing multiple instructions in emulation, because today's CISC processors actually break their native instructions into many micro-instructions. For a RISC processor, having complicated native instructions obviously goes against the RISC philosophy.

Last edited: Aug 5, 2010
7. Aug 4, 2010

### mgb_phys

It doesn't - 32-bit Intel CPUs since the 386 actually have 36-bit address buses, supporting 65Gb of RAM.
The 4 GB limit (actually 2 GB user space, 1 GB OS, and 1 GB memory-mapped) is purely a Windows marketing limit: if you want more, buy the 64-bit or server version.
Linux supports a 64 GB system address space, with 4 GB per process.

You can process multiple instructions at the same time, or pack more arguments (e.g. a source and a destination address) together with the opcode in one word.

8. Aug 6, 2010

### RedX

I guess doubling the number of bits is nice, so 4, 8, 16, 32, 64.... if that's what you mean by the nature of the power of 2.

IPv4 is 32 bits, and you want multiples of 8 for whole bytes (and clean hexadecimal numbers), so 128 bits makes sense. However, 64 bits would give 2^32 times more addresses than we currently have, so why is IPv6 not 64 bits instead of 128? 64 seems to be way more than enough to give every device a unique IP instead of using dynamically overloaded NAT.
This seems unnecessary, as an assembler should take care of that.
That makes sense. I suppose you can only work with 2 registers (after all, things like multiplication and addition are binary operations) and have 8 bits for a total of 256 different 2-register operations. I'm a bit surprised that 16 different 3-register operations aren't enough.
Well, according to hamster143, it exists on the 8086 as fsin (which I assume means function: sine)
What's wrong with complicated native instructions? Why shift the burden on the compiler maker? Giving the compiler maker more tools seems like a good thing.
Why would you have a 36-bit address bus on a 32-bit CPU, and wouldn't that result in 64 GB of RAM, not 65? It seems that only some parts of the CPU would need to be 36 bits wide to support that much RAM, such as the stack pointer and instruction decoder. The arithmetic-logic unit wouldn't need to be 36 bits, since it doesn't operate on addresses but on what's stored at those addresses.

9. Aug 6, 2010

### mgb_phys

Sorry typo - should be 64Gb

Some of the address bus (and the memory controller) needed to be 36-bit. On 32-bit Intel chips it's actually implemented as a bank-switching scheme: you have 32-bit memory hardware plus an extra register somewhere that says which bank (the extra 4 address bits) you are in. That's why it's generally still 32 bits per process; it would hurt performance if an image crossed a bank boundary.
36 was a compromise between 32 bits (4 GB is a bit limiting) and 64 bits, which would be a waste of addresses, since nobody is about to fit 16 exabytes of RAM in their machine.

10. Aug 6, 2010

### lostinxlation

Yes. It's always good to work with 2^x data sizes.
If you design a 40-bit processor, for example, you'll find the hardware gets complicated.
Suppose you design the cache memory for a 40-bit processor: what line size do you pick?
Due to the tag memory structure, the cache line size must be 1, 2, 4, 8, 16, ... 2^x bytes.
40 bits is 5 bytes, i.e. 5 address locations. If you have N data words per cache line, the line size would be 5×N bytes (address locations), which never matches the 2^x bytes required for a set-associative or direct-mapped cache. That means you have to go with a fully associative cache (slow) or add extra logic to determine whether data is present in a given cache line.

As long as the computer is based on binary, it's best to have a power of 2 for data size.

64 bits may look like enough, but we can't foresee the future. They decided to go with 128 bits to make sure the same address shortage won't happen again, even with an unexpected, sudden explosion in demand for IP addresses.

No.
The instruction set is mostly for the designer. A well-organised instruction set keeps the instruction decoder simple.
I have designed a few SPARC processors for UNIX servers, and even with a neatly organized instruction set, the instruction decoder was a mess. Implementing an instruction set that randomly maps functions to whatever bit patterns are available would be a design nightmare and a potential breeding ground for bugs.

You can program with 2-register operations only, but being able to use 3 registers gives you more flexibility. It's a trade-off between convenience and simplicity; some architectures have 3-register operations, some don't.
I have a SPARC V9 architecture manual here, which says there are 16 types of integer load instruction (signed/unsigned; byte/halfword/word/doubleword; from alternate space or not) that use 3 registers (2 for address calculation, 1 for the destination). Besides those, there are integer stores and floating-point loads and stores that also allow 3-register operation. The total comes to around 50 different types, and that's only the memory-access instructions; there are other instruction types with 3 registers as well.

Simple instructions keep the hardware simple and allow a faster clock speed; that's how the RISC concept started.
Nowadays, high-end CISC processors are internally RISC: they break the native instructions into several simple micro-instructions to keep the hardware simple and fast.

Last edited: Aug 6, 2010
11. Aug 6, 2010

### hamster143

x86 chips come with a built-in floating-point coprocessor that can compute all sorts of functions. Those functions are not particularly fast (up to 100 clock cycles per sine), but it's convenient to have them there rather than coding them by hand.

12. Aug 6, 2010

### rcgldr

> cpus with floating point instructions

These CPUs normally use lookup tables to get the first term of a series, to speed things up. A few flaws in the table used for division on the early Pentiums became a somewhat famous bug in that CPU (the FDIV bug).

13. Aug 7, 2010

### Staff: Mentor

Just to set the record straight, fsin and the other floating point instructions did not exist on the 8086, 8088, 80186, 80286, 80386, or on the crippled version of the 486 (the 486SX). For those processors, if you wanted to do floating point math in assembly, you had to add a floating point unit (FPU), such as the Intel 8087, 80287, 80387, or 487, or an FPU from Cyrix or some other vendor.

Starting with the Pentium processor, all Pentium-family chips have floating point units built right into the processor, and these units can serve double duty as MMX (MultiMedia eXtensions) units, allowing parallel computation for graphics and audio processing.

14. Aug 7, 2010

### RedX

Just to be clear, it was my mistake saying fsin was on the 8086: hamster143 never said that.