# Assembly language programming vs Other programming languages

phinds
Gold Member
2019 Award
I'm going to stay out of this. But there are definition issues. Clear debate requires that everyone uses the same terms with exactly the same meaning. Can we agree on meanings?
Good idea. I've been too stuck on details, I think.

I understand what everyone is saying about the difference between machine code and assembly language, but let's all stick to the standard terminology and agree that assembly is not a high-level language, and in fact is excluded from BEING a high-level language by the definition and common usage of the phrase.

The original question was about the speed difference between assembly language (~ machine code) and high-level languages. Are there still open questions about that?

TMT
The original question was about the speed difference between assembly language (~ machine code) and high-level languages. Are there still open questions about that?
Most people think assembly is faster than high-level languages. But I'm saying that if a high-level language is used more intelligently, it may be faster than assembler. Speed depends mostly on algorithms: if you implement an algorithm with time cost O(n) in assembly code (supposedly faster) while a high-level language (BASIC or any interpreted language) implements the algorithm with time cost O(log n), your assembly program will be left in the dust by the high-level implementation.
Also, the intelligence embedded in a high-level compiler may generate faster code, since the embedded optimization logic may create more optimized code than a human can.
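To make the complexity argument concrete, here is a minimal sketch in C (a hypothetical example, not code from the thread): linear search runs in O(n), binary search on a sorted array in O(log n), and past some input size the better algorithm wins no matter how well the O(n) one is hand-tuned.

```c
#include <assert.h>
#include <stddef.h>

/* O(n): even a hand-tuned assembly version of this scan stays linear. */
static long linear_search(const int *a, size_t n, int key) {
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}

/* O(log n): the asymptotically better algorithm on a sorted array. */
static long binary_search(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}
```

Both return the same answers; only the growth rate differs, and that difference dwarfs any constant-factor speedup from hand-written assembly once n is large.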

ChrisVer
Mark44
Mentor
Most people think assembly is faster than high-level languages. But I'm saying that if a high-level language is used more intelligently, it may be faster than assembler. Speed depends mostly on algorithms.
Well, of course. No one in this thread is claiming that a slow algorithm coded in assembly will run faster than a faster algorithm coded in a higher-level language.
TMT said:
if you implement an algorithm with time cost O(n) in assembly code (supposedly faster) while a high-level language (BASIC or any interpreted language) implements the algorithm with time cost O(log n), your assembly program will be left in the dust by the high-level implementation.
Also, the intelligence embedded in a high-level compiler may generate faster code, since the embedded optimization logic may create more optimized code than a human can.

phinds
Gold Member
2019 Award
Most people think assembly is faster than high-level languages. But I'm saying that if a high-level language is used more intelligently, it may be faster than assembler. Speed depends mostly on algorithms.
As Mark pointed out, you have set up a straw man argument and then shown that it's wrong. Your point is valid but is really irrelevant to this thread.

TMT
As Mark pointed out, you have set up a straw man argument and then shown that it's wrong. Your point is valid but is really irrelevant to this thread.
Consider a machine with 16 registers (as on IBM mainframes). In a program, when you branch into a subroutine, you have to save registers before the subroutine starts and restore them before returning to the caller. Tell me how many programmers could track which registers are altered in the subroutine and write code to save and restore only those (the time cost of saving and restoring registers depends on the number of registers involved). But if your high-level compiler has the intelligence to consider this, the generated code will save and restore only the altered registers, optimized accordingly. Please take this simple example only as an illustration of my intention: since we can embed logic in a compiler, we can have the compiler generate more optimal code than a human can (especially if you accept that not every programmer is as smart as a well-versed one). In a high-level compiler you can even preprocess the written code, identify the parts that are optimizable, and apply specific processing to optimize the code generation. You cannot train all your staff to be expert assembly programmers, but you can use a high-quality high-level compiler (with embedded optimization intelligence) to produce faster code.

Mark44
Mentor
Consider a machine with 16 registers (as on IBM mainframes). In a program, when you branch into a subroutine, you have to save registers before the subroutine starts and restore them before returning to the caller. Tell me how many programmers could track which registers are altered in the subroutine and write code to save and restore only those (the time cost of saving and restoring registers depends on the number of registers involved). But if your high-level compiler has the intelligence to consider this, the generated code will save and restore only the altered registers, optimized accordingly. Please take this simple example only as an illustration of my intention: since we can embed logic in a compiler, we can have the compiler generate more optimal code than a human can (especially if you accept that not every programmer is as smart as a well-versed one). In a high-level compiler you can even preprocess the written code, identify the parts that are optimizable, and apply specific processing to optimize the code generation. You cannot train all your staff to be expert assembly programmers, but you can use a high-quality high-level compiler (with embedded optimization intelligence) to produce faster code.
Yes, of course to this as well. It's no surprise that a dim-witted assembly programmer would likely produce slower code than code written in a high-level language and compiled with a compiler that produces highly optimized object code.

phinds
Gold Member
2019 Award
Consider a machine with 16 registers (as on IBM mainframes) ...
Once again Mark has beaten me to it in responding, and again I agree with him. You keep making valid points that are off the main thrust of this thread. I think we're just going to have to agree to disagree on this one.

jim mcnamara
rcgldr
Homework Helper
Most people think assembly is faster than high-level languages. But I'm saying that if a high-level language is used more intelligently, it may be faster than assembler.
In the case where a compiler can produce assembly code output, that assembly code is the same program as the high-level source, so it runs at the same speed. In some cases, assembly programmers will have a compiler produce assembly code to look for "clever" code, typically when working with a processor that is new to the programmer. As posted by others here, the point of the thread isn't about intelligent compilers versus dumb assembly programmers.

See post #21 for example cases where assembly code is still being used today.

Gold Member
Is it possible to know whether your program (written in a high-level language) is slow due to the compiler's reinterpretation of your input (i.e. the assembler output could be improved by human intelligence) or if you screwed up with the algorithm and you have to look for a better one?

For example the code below could be improved in speed I guess by doing the multiplication in the end:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
sum+= 2*i;
}
improved:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
sum+=i;
}
sum *= 2;
or even better use bitwise operations (sum<<1 ??)... at what point would any remaining delay come down to the inefficiency of the compiler vs human input?

phinds
Gold Member
2019 Award
With the exception of fairly specialized circumstances, it's unlikely that a compiler will generate code that could be really significantly improved by twiddling the machine code but it's very easy to code an algorithm in a way that is very inefficient. Just as a trivial example, you could use a bubble sort instead of a quick sort.
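As a sketch of that trivial example (the helper names here are made up for illustration): the same sorting task done with an O(n^2) bubble sort versus the C library's qsort, which is typically O(n log n).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* O(n^2): the "easy to write, slow to run" choice. */
static void bubble_sort(int *a, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        for (size_t j = 0; j + 1 < n - i; j++)
            if (a[j] > a[j + 1]) {
                int t = a[j];
                a[j] = a[j + 1];
                a[j + 1] = t;
            }
}

/* Comparison callback for the C library's qsort (typically O(n log n)). */
static int cmp_int(const void *p, const void *q) {
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}
```

Both produce the same sorted array; only the growth rate differs, which is the part no amount of instruction-level tuning can fix.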

ChrisVer
It's rarely the compiler's fault.
A colleague of mine loves to build long strings one character at a time, e.g.
C++:
std::string toBeLong = "";
for (int i = 0; i < 300; i++)
    toBeLong += (char)getchar();
Depending on the implementation, this could mean 300 reallocations are performed, and the memory gets fragmented. I would allocate some memory in advance, but that is a few lines longer, and he argues that shorter code is better for maintenance.
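A minimal sketch of the preallocation alternative (next_char() is a hypothetical stand-in for the colleague's input source): std::string::reserve() requests the capacity up front, so the 300 appends cannot trigger repeated reallocations.

```cpp
#include <cassert>
#include <string>

// next_char() is a hypothetical stand-in for the input source
// (getc()/getchar() in the snippet above).
char next_char() { return 'x'; }

std::string build_long_string() {
    std::string toBeLong;
    toBeLong.reserve(300);        // capacity up front: one allocation
    for (int i = 0; i < 300; i++)
        toBeLong += next_char();  // appends can no longer reallocate
    return toBeLong;
}
```

The cost is one extra line; whether that is worth it over the shorter version is exactly the maintenance-vs-speed trade-off being argued here.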

He also uses a HashTable where I would use a simple array with a lookup function - again, more coding on my side.

He uses exceptions a lot; I catch them at the first possible place and try not to raise any.

It is choices like this that you face most of the time. You'll use the best algorithm anyway, so there is not much room for improvement there.
And rewriting a piece of code in another language is pretty rare, kind of a "desperate measure", and often not even possible (JavaScript & co.).

Knowing assembly can help you estimate how fast/slow a piece of code will be, even if you never write assembly code.

ChrisVer
Gold Member
Well, moving (for example) the multiplication by 2 outside the for loop is both an algorithmic change and an improvement to the assembler output... I tried the simple example I gave above, and the contents of the for loop in the two cases are:
Code:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   f:   c7 45 f8 01 00 00 00    movl   $0x1,-0x8(%rbp)
  16:   eb 0c                   jmp    24 <main+0x24>
  18:   8b 45 f8                mov    -0x8(%rbp),%eax
  1b:   01 c0                   add    %eax,%eax         <<<<<<<<<<<<<<<<<<<<
  1d:   01 45 fc                add    %eax,-0x4(%rbp)
  20:   83 45 f8 01             addl   $0x1,-0x8(%rbp)
  24:   83 7d f8 0a             cmpl   $0xa,-0x8(%rbp)
  28:   7e ee                   jle    18 <main+0x18>
Improved:
Code:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   f:   c7 45 f8 01 00 00 00    movl   $0x1,-0x8(%rbp)
  16:   eb 0a                   jmp    22 <main+0x22>
  18:   8b 45 f8                mov    -0x8(%rbp),%eax
  1b:   01 45 fc                add    %eax,-0x4(%rbp)
  1e:   83 45 f8 01             addl   $0x1,-0x8(%rbp)
  22:   83 7d f8 0a             cmpl   $0xa,-0x8(%rbp)
  26:   7e f0                   jle    18 <main+0x18>
I marked with arrows the line that is removed (the one doubling what was in eax),
and I might also say that the first jump to 22 is unnecessary; I think it's a feature of the for loop - to make sure the loop should be entered at all... but I trust it will be entered, since I manually wrote i=1 and not i=11 or 40 or ...
Also, one interesting thing: when the multiplication is done within the for loop, the assembler code does the doubling with an add; when it happens outside the for loop it is done with a left shift (and so *=2 and <<=1 in my case are equivalent).

What compiler flags (and compiler) did you use?

Gold Member
What compiler flags (and compiler) did you use?
hmm... so I wrote a program in AStester.cxx
then used gcc -c AStester.cxx to produce the .o file (so I think no flags?)
and looked in the contents by objdump -D AStester.o

Mark44
Mentor
Is it possible to know whether your program (written in a high-level language) is slow due to the compiler's reinterpretation of your input (i.e. the assembler output could be improved by human intelligence) or if you screwed up with the algorithm and you have to look for a better one?

For example the code below could be improved in speed I guess by doing the multiplication in the end:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
sum+= 2*i;
}
improved:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
sum+=i;
}
sum *= 2;
or even better use bitwise operations (sum<<1 ??)... at what point would any remaining delay come down to the inefficiency of the compiler vs human input?
The second example is quite a bit better, as it takes the multiplication out of the loop. Multiplication is a lot more expensive than addition in terms of processor time, although an optimizing compiler would probably replace 2 * i with a shift.

The second example is quite a bit better, as it takes the multiplication out of the loop. Multiplication is a lot more expensive than addition in terms of processor time, although an optimizing compiler would probably replace 2 * i with a shift.
Actually, only the weakest CPUs have slow multiplication. Pretty much any ARM or Pentium and above takes only 1 cycle per nonzero bit (or even less); that is, multiplying by 1001001 (binary) takes only 3 cycles.
And pretty much any compiler does indeed replace multiplication by powers of 2 with shifts.
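The strength reduction being described can be written out by hand; as a small sketch, for unsigned integers multiplying by 2^k is exactly a left shift by k, so a compiler can emit 2*i as i<<1:

```c
#include <assert.h>

/* For unsigned integers, multiplying by a power of two is exactly a
   left shift, which is why 2*i can be compiled as i<<1 (and 4*i as
   i<<2; note <<1 doubles, <<2 quadruples). */
static unsigned times2(unsigned i) { return i << 1; } /* same as 2u * i */
static unsigned times4(unsigned i) { return i << 2; } /* same as 4u * i */
```

For signed values the compiler has to be a bit more careful (shifting negative numbers is not the same as dividing or multiplying in all cases), which is why the identity is cleanest for unsigned types.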

Gold Member
Actually, only the weakest CPUs have slow multiplication. Pretty much any ARM or Pentium and above takes only 1 cycle per nonzero bit (or even less); that is, multiplying by 1001001 (binary) takes only 3 cycles.
And pretty much any compiler does indeed replace multiplication by powers of 2 with shifts.
Well, when I tested the above 2 code snippets and compared times, the "slow" one took ~120,000 and the "optimized" one took ~42,000 (tests: 10M iterations)... One extra interesting part: I also tried the bitwise operation instead of the multiplication, and for up to 10M iterations the bit-shifting operation was faster; above 10M the two became comparable (I couldn't tell the difference).

Well, when I tested the above 2 code snippets and compared times, the "slow" one took ~120,000 and the "optimized" one took ~42,000 (tests: 10M iterations)...
It's interesting that by shortening the loop from 6 to 5 instructions (and taking out the fastest one), the time was cut to a third.
This tells me:
1) The speed of short loops depends on things like the actual address where the loop lands,
2) Too much optimization is a waste of time, because you can't predict that.
You can try the -O3 flag, or the Microsoft or Intel compilers, optimized for speed. They insert NOPs at various places to fix the alignment; then the results might be more comparable.

jim hardy
Gold Member
2019 Award
Dearly Missed
Consider a machine with 16 registers (as on IBM mainframes). In a program, when you branch into a subroutine, you have to save registers before the subroutine starts and restore them before returning to the caller

Reference https://www.physicsforums.com/threa...gramming-languages.912679/page-6#post-5772249
Smart hardware guys can help too.
There's the venerable TMS9900
where the workspace pointer defines the start of your register set
which can be any location in memory
so you can context switch with just one save.....

hmm... so I wrote a program in AStester.cxx
then used gcc -c AStester.cxx to produce the .o file (so I think no flags?)
and looked in the contents by objdump -D AStester.o
Then the code is likely entirely unoptimized. An optimizing compiler would probably remove the code entirely if sum is not used elsewhere later (dead code elimination), or replace your code with sum = 110 (constant folding).
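Done by hand, the folding looks like this (a sketch, not actual compiler output): the whole loop collapses to the constant 110.

```c
#include <assert.h>

/* What constant folding does, written by hand: the loop's result does
   not depend on any run-time input, so it can be computed once. */
static int summed_by_loop(void) {
    int sum = 0;
    for (int i = 1; i <= 10; i++)
        sum += 2 * i;
    return sum;
}

/* 2 * (1 + 2 + ... + 10) = 2 * 55 = 110, i.e. 0x6e. */
static int summed_by_folding(void) {
    return 110;
}
```

An optimizer that can prove the two functions are equivalent is free to emit the second one in place of the first.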

Edit: Note that it is considered bad practice to initialize an int with a double value.

Gold Member
Edit: Note that it is considered bad practice to initialize an int with a double value.
hmm yeah, I had started it as a double, but then it complained about the usage of << . I changed it to an integer but forgot to remove the .0 ...

Hm, so that means the assembler code would be something like:
movl   $0x6e,-0x4(%rbp)
(sum is stored at -0x4(%rbp), and 110 is moved into it)
without a for loop?
Does that happen because the compiler runs the code and gets the result before producing the output?

without a for loop?
Does that happen because the compiler runs the code and gets the result before producing the output?
It doesn't really run the code, but it optimizes it.

Code:
int chrisver()
{
    int sum = 0;
    for(int i=1; i<=10; i++)
    {
        sum+= 2*i;
    }
    return sum;
0FB94090  mov         eax,6Eh
}
0FB94095  ret

Does that happen because the compiler runs the code and gets the result before producing the output?
Since it doesn't really run the code, it is usually said that the compiler evaluates the code at compile time. If you are interested in these things, the compiler optimization article on Wikipedia is a pretty good place to start. For instance, constant folding is here: https://en.wikipedia.org/wiki/Constant_folding

Added: If you want to see what the compiler does with your code in a real-world scenario, you need to make sure the number of times the for loop repeats is not known at compile time, e.g. it depends on user keyboard input, data from a file, or similar.
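A minimal sketch of that idea (the function name is made up): put the loop behind a parameter whose value would come from outside the program at run time, so the compiler cannot fold the loop to a constant.

```c
#include <assert.h>

/* If n arrived from the command line, a file, or the keyboard, the
   compiler cannot know it at compile time, so the generated loop
   survives for inspection instead of being folded to a constant. */
static int sum_doubled(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += 2 * i;
    return sum;
}

/* In a real test program, n would come from e.g. atoi(argv[1]). */
```

With n fixed at 10 the result is still 110, matching the 0x6e seen in the disassembly above, but the compiler can no longer hard-code it.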
