As a first step in optimizing code, I like to think about faster ways to do things. Algorithmic improvements are, in general, far better than minute improvements to code, or even inline assembly. But most optimizations beyond common subexpression elimination and its ilk substitute inexpensive operations for expensive ones rather than eliminating work entirely.

When two solutions present themselves I can just code up both and time them, but for planning early in the coding process it's good to have an idea of what costs more. For example, addition and negation are cheap, while square roots and transcendental functions are expensive.

Toward that end, is there a decent list of approximate costs for operations on some common x86 chip (a Pentium 4, Core 2 Duo, or Athlon 64, perhaps)? I'm looking for some kind of chart with figures like "division: 28 cycles latency (5 cycles reciprocal throughput)". Since I'm just looking for general ballparks, I'm not too sensitive about the particular chip it applies to, although general notes about which operations chips handle well would be great as well.