SixNein said:
The simple answer is no.
The long answer is that languages all get reduced down eventually into machine code. The difference in speed between languages more or less depends upon how many layers these languages have to go through in order to accomplish that task and how often they have to do it. There is also a factor in how good a compiler is at creating optimal machine code, but layering would probably dwarf that for the most part.
A much more useful study is what algorithm gives you the lowest complexity and what dependencies it has. A good example is sorting algorithms. When would you use a hash sort vs a merge sort vs a selection sort? This will provide you much more useful information than a comparison between languages.
I had a bit of fun with this, and it illustrates SixNein's points completely. First, I chose--as always when I have a choice--to write in Ada. Why use Ada? Aren't those compilers for embedded systems and very expensive? Some are, but there is an Ada compiler, GNAT, built into gcc. The advantage over using C with gcc is that the GNAT toolset manages the linking so you don't have to create a makefile. The code you get for the same algorithm in Ada and C should run at exactly the same speed. Of course, if you make significant use of Ada generics or tasking, lots of luck generating the same code in C. But that doesn't happen here. I'll put the complete code in the next post, so you can compile and run it on your system if you like.
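To give the flavor here, the first case is just the obvious loop, timed with Ada.Real_Time; in outline it looks something like the sketch below (names and formatting are illustrative, not lifted from the real program). A plain "gnatmake file.adb" compiles, binds, and links it, with no makefile.

with Ada.Text_IO;   use Ada.Text_IO;
with Ada.Real_Time; use Ada.Real_Time;

procedure Mult_Bench is
   X           : Long_Long_Float := 1000.0;
   Start, Stop : Time;
begin
   Start := Clock;
   for I in 1 .. 100_000_000 loop
      X := X * 0.999_999_9;        --  the whole benchmark is this one line
   end loop;
   Stop := Clock;
   Put_Line ("Multiplication result was" & Long_Long_Float'Image (X)
             & " and took"
             & Duration'Image (To_Duration (Stop - Start) * 1_000_000)
             & " Microseconds.");  --  the real program's output formatting is nicer
end Mult_Bench;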
I wanted to try writing the key loop two ways in Ada. That became three, and then four, and I needed details on the real-time clock Ada provides, and on the resolution of the time types that come with it, to understand the results. I broke up the second example so that compilations wouldn't last for hours while I was debugging my code. Why did I have to break up the second case? Ada rules say that 0.999_999_9**100_000_000, where ** is exponentiation, is a static expression built from numeric literals, and it has to be computed exactly. ;-) The problem isn't evaluating the expression, it is keeping all those digits around while doing so. The compiler (gcc) runs for around four hours, then hits Storage_Error when the number is too big to fit in 2 Gigabytes. (Yes, I have much more memory than that on my system, but the limit is on the bignum type.) Anyway, I broke the numeric expression up differently than in the third case, and the whole program now compiles in under a minute.
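To get a feel for the size of that exact value (my own back-of-the-envelope, not something reported by the compiler): 0.999_999_9 is exactly the rational 9_999_999/10**7, so 0.999_999_9**100_000_000 is exactly 9_999_999**100_000_000 / 10**700_000_000. The numerator alone has roughly 700 million decimal digits, i.e. a few hundred megabytes as a bignum, and the denominator is the same size again, before counting the temporaries the long multiplications need. Blowing past a 2 GB limit during that computation is not surprising.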
The next issue is why I have a lot of large multipliers in the timing code, and why I added the output at the head. I found that the compiler was using 80-bit arithmetic for Long_Long_Float and storing it in three 32-bit words, hence the 96-bit size below. Interesting. But let me show you the full output:
Sanity Checks.
Long_Long_Float'Size is 96 bits.
Duration'Small is 0.0010 Microseconds.
Real_Time.Tick is 0.2910 Microseconds.
Real_Time.Time_Unit is 0.0010 Microseconds.
Multiplication result was 0.045399907063 and took 147318.304 Microseconds.
Exponentiation result was 0.045399907063 and took 0.291 Microseconds.
Exponentiation 2 result was 0.045399907062 and took 0.583 Microseconds.
Fast exponentiation result was 0.045399907062 and took 0.875 Microseconds.
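Those four sanity-check lines need nothing beyond language-defined attributes and the constants in package Ada.Real_Time. A sketch of how they can be produced (my own names here, and plainer formatting than the real program uses):

with Ada.Text_IO;   use Ada.Text_IO;
with Ada.Real_Time; use Ada.Real_Time;

procedure Sanity_Checks is
begin
   Put_Line ("Sanity Checks.");
   Put_Line ("Long_Long_Float'Size is"
             & Integer'Image (Long_Long_Float'Size) & " bits.");
   Put_Line ("Duration'Small is"
             & Duration'Image (Duration'Small * 1_000_000)
             & " Microseconds.");
   Put_Line ("Real_Time.Tick is"
             & Duration'Image (To_Duration (Tick) * 1_000_000)
             & " Microseconds.");
   Put_Line ("Real_Time.Time_Unit is"
             & Duration'Image (Duration (Time_Unit) * 1_000_000)
             & " Microseconds.");
end Sanity_Checks;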
I gave up on trying to get that output monospaced. \mathtt gets a typewriter font in LaTeX, but then I would have to redo the spacing and line breaks explicitly. Not worth the trouble.

Now to discuss the results. That first time looks huge, but it is in microseconds; it is actually 0.1473... seconds. That may be the fastest time reported here so far, but if I ran the C code I should get close to the same thing. Still, the thing you should do if your program is too slow is not to look for tweaks here and there, but to use a better algorithm. I understand that this program was intended as a benchmark, but the other results, at one, two, and three clock ticks respectively, and for a fast real-time clock at that, indicate that there is some magic going on under the hood.

When I wrote the second case (exponentiation) I realized that the compiler was going to try to do everything at compile time, and it did. But Ada rules say that static expressions of numeric literals evaluated at compile time must be evaluated exactly. Breaking the expression up this way (X := 1000.0; Y := 0.999_999_9**10_000; Y := Y**10_000; X := X*Y;) took maybe thirty seconds of grinding at compile time, then stuffed the first 64 bits of the mantissa plus the exponent into Y, and raised that to the 10,000th power at run time. But how did it do that last step so quickly? Probably by calling the built-in power function in the chip.
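Spelled out, that broken-up second case is essentially the following (a sketch with my own surrounding declarations; the actual program is in the next post):

with Ada.Text_IO;   use Ada.Text_IO;
with Ada.Real_Time; use Ada.Real_Time;

procedure Expo_Sketch is
   X, Y        : Long_Long_Float;
   Start, Stop : Time;
begin
   Start := Clock;
   X := 1000.0;
   Y := 0.999_999_9 ** 10_000;   --  static expression: folded exactly at compile time
   Y := Y ** 10_000;             --  Y is a variable here, so this happens at run time
   X := X * Y;
   Stop := Clock;
   Put_Line ("Exponentiation result was" & Long_Long_Float'Image (X)
             & " and took"
             & Duration'Image (To_Duration (Stop - Start) * 1_000_000)
             & " Microseconds.");
end Expo_Sketch;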
We can guess that exponentiation 2 used the same trick, but calling the power function with an exponent of 100,000,000 instead of 10,000 apparently cost another clock tick. (Incidentally, if the clock function is called more than once during a tick, it adds the smallest time increment, here one nanosecond, to the value returned for the second call, twice that for the third call, and so on. This means that you always get a unique value from each call to Clock. With a six-core processor, and this code running on just one core, that can add a few nanoseconds to the measured value, which should be ignored. It can also subtract a few nanoseconds if the starting call to Clock is not the first call in that tick.)
Finally, the last approach, fast exponentiation, can't be short-circuited by trig or exponential functions. It genuinely computes the 100,000,000th power of 0.9999999 at run time. That code would work even if both values were entered from the keyboard while the program was running, and it still did the calculation 168 thousand times faster than the multiplication loop.
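For readers who haven't met it, fast exponentiation is the usual square-and-multiply scheme, which needs only about 2*log2(N) multiplications instead of N. In outline it looks like this (my own sketch; the actual routine is in the next post):

with Ada.Text_IO; use Ada.Text_IO;

procedure Fast_Expo is

   --  Square-and-multiply: one squaring per bit of the exponent,
   --  plus one multiply for each bit that is set.
   function Power (Base : Long_Long_Float; Exp : Natural) return Long_Long_Float is
      Result : Long_Long_Float := 1.0;
      B      : Long_Long_Float := Base;
      E      : Natural         := Exp;
   begin
      while E > 0 loop
         if E mod 2 = 1 then
            Result := Result * B;   --  pick up this bit of the exponent
         end if;
         B := B * B;                --  square for the next bit
         E := E / 2;
      end loop;
      return Result;
   end Power;

begin
   --  Base and exponent could just as well be read from the keyboard at run time.
   Put_Line (Long_Long_Float'Image (1000.0 * Power (0.999_999_9, 100_000_000)));
end Fast_Expo;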
So:
1. Use a language which makes the structure of the problem visible.
2. Use that to find a better algorithm, if needed.