What are the approximate costs for operations on common x86 chips?

In summary, the approximate costs for operations vary depending on the processor and the addressing mode being used. However, there are some general ballpark ranges that can be found on the Intel website.
  • #1
CRGreathouse
Science Advisor
Homework Helper
2,844
0
As a first step to optimizing code, I like to think about faster ways to do things. Algorithmic improvements are far better than minute improvements to code, or even use of inline assembly, in general. But most optimizations beyond common subroutine elimination and its ilk substitute inexpensive operations for expensive ones, rather than eliminating chunks entirely. Now when two solutions present themselves I can just code up both and time them, but for the purpose of planning early on into the coding it's good to have an idea of what costs more. For example, addition and negation are cheap while square roots and other transcendentals are expensive.

Toward that end, is there a decent list of approximate costs for operations (on some common x86 chip, perhaps a Pentium IV or Core II Duo or Athlon 64)? I'm looking for some kind of chart with figures like "division: 28 cycles (max throughput 5 cycles)". Since I'm just looking for general ballparks, I'm not too sensitive about the particular chip it applies to -- although general notes about what ops chips are good with would be great as well.
 
Computer science news on Phys.org
  • #2
This is not exactly what you are looking for, but it's pretty amazing. I checked wikipedia.org for computer benchmark info, and got a big page with lots of info and outside references:

http://en.wikipedia.org/wiki/Benchmark_(computing)

I know you want to compare different algorithms on the same processor, and that is different than benchmarking different processors against the same algorithm, but maybe poke around some of the links to see if you can find what you want, or see other terms to search on. Pretty interesting links that I followed...
 
  • #3
For older processors this was fairly easy.
Look up the instruction in the programmer's reference manual.
Pick the cycle count for the addresing mode being used.

For the proceesors you list this gets very complex with prefetch and what not.

This should be the page you want for Intel processors.

http://www.intel.com/products/processor/manuals/index.htm

and at least some cycle counting info appears here

http://www.intel.com/design/processor/manuals/248966.pdf

You will have to search for other manufactures
 
Last edited by a moderator:
  • #4
Thanks, that looks like it just might serve.
 

1. What are clock cycles per instruction?

Clock cycles per instruction (CPI) is a metric used to measure the efficiency of a computer's processor. It represents the average number of clock cycles required to execute a single instruction.

2. How is CPI calculated?

CPI is calculated by dividing the total number of clock cycles by the total number of instructions executed. For example, if a program takes 100 clock cycles to execute 50 instructions, the CPI would be 2.

3. Why is CPI important?

CPI is important because it can impact the overall performance of a computer. A lower CPI means that instructions are executed more efficiently, resulting in faster performance. A higher CPI may indicate inefficiencies in the processor's design or architecture.

4. What factors can affect CPI?

There are several factors that can affect CPI, including the processor's design, clock speed, cache size, and the type of instructions being executed. Other factors such as the complexity of the program and the efficiency of the compiler can also impact CPI.

5. How can CPI be improved?

CPI can be improved by optimizing the processor's design and architecture, increasing the clock speed, and utilizing larger caches. Additionally, optimizing the code and using more efficient algorithms can also help reduce CPI.

Back
Top