D H said:
What are you simulating? A complex on-orbit scene (sun and backscattered illumination of multiple vehicles with moving solar arrays, semi-realistic Earth underneath) is incredibly expensive to generate. 10-15 Hz is a fast graphics rate for even a powerful computer dedicated to scene generation. The simulation itself runs at a much higher rate.
In "normal" graphics work 30fps is considered about the minimum acceptable, so I don't see how 10-15fps is fast. Maybe for a complex simulation it is fast, but it barely qualifies for the term "real-time". 60fps is often quoted as an ideal framerate for graphics applications.
The errors in gridding will dominate unless the grid is extremely fine. Have you examined the errors that result from using this approach?
Yes, interpolating over a grid isn't as good as having the exact value, but it is still a good optimisation in the sense that it removes the most expensive parts of the program from the render loop. You are correct that errors from using the grid will accumulate, but with good interpolation you can reduce the problem substantially (not totally, of course).
Thought and analysis show that most vector and matrix operations in an orbital simulation involve 3-vectors and 3x3 matrices. The most obvious optimization is to write inline functions or macros for 3x3 matrix operations with fully unrolled loops (i.e., compute each element explicitly). The pipelined architectures used by modern CPUs zip through such linear sequences of code quickly. Enabling the compiler's optimizer will make this even faster.
Compare this to a set of generic matrix and vector functions hand-optimized in assembler. Function calls are very expensive: the stack needs to be loaded before the call and unloaded after it. Moreover, the call itself, the looping inside the functions, and the return all stall the pipeline. The simple, non-assembly optimization will beat the pants off a set of assembler functions that use loops.
I'm sorry to disagree with this, but I have written several programs where I need to do trivial operations with 4-vectors of 8-bit integers. Even for the simplest of cases, like vector addition, there is a noticeable difference between a clean C++ implementation and an assembly routine using an MMX opcode. This is exactly the sort of operation these opcodes were designed for in the first place. Ask anyone who has written blending code for software-rendered graphics of some kind (even video players and such): MMX makes it substantially faster even for simple cases like additive or subtractive blending.
Where did I recommend loops in assembly? I think you may have misinterpreted something that I said... I would also like to point out that your "most obvious optimisation" is how I would have written it in the first place... I don't consider that an optimisation, it's just the obvious way to do it. Using a loop is obviously heavy-handed, and more to the point, I never recommended it.
Also, what makes you think that I don't turn on the compiler's optimiser?
Function calls are very expensive
Function calls are very expensive? I used to have this same misconception; it took several discussions with a good friend of mine over the course of a year or so to shake it off.
Function calls are cheap and stack operations are relatively fast. For one thing, a good compiler will inline a function call. Secondly, there are __fastcall and similar specifiers (not only on MS compilers) which hint to the compiler to pass arguments in registers (although in fact it only uses two, because the compiler likes to use the others as it sees fit).
If we ignore these, then we are left with the stack issues, which you can't avoid by "letting the compiler optimise" or by not using assembler... even a = b + c requires stack operations. Operators are, after all, just functions, and they use the same calling conventions as any other function (you can test this by manually examining the stack). The only way around that is to inline, which I (mistakenly) assumed was too obvious to bother mentioning explicitly. I just assume that people will inline or not in order to optimise their program; most compilers will even inline things for you where it is appropriate.
The fact is that in the case of a vector class with inline operators this all becomes a non-issue anyway, regardless of whether or not function calls are slow, since there are no function-call overheads and no stack operations for the parameters.
Maybe I jumped the gun a bit by suggesting asm code/libraries first, but at least it is one optimisation you can apply from the beginning without too much thought, and it always makes the program faster... even if only by some tiny amount.
Anyway... this has grown quite long and rantish... sorry if I seem stubborn, but I have been using these opcodes for quite a while and they do provide a substantial performance increase where you have to do lots of vector maths. If you don't believe me, download the source code for any relatively recent 3D game; even a mod SDK will do, since they often contain the maths library. Media players are similar again, and I know that at least Winamp and WMP use MMX very heavily.
I will accept that perhaps my advice was bad though...