C/++/# Assembly language programming vs Other programming languages

  1. Apr 26, 2017 #1

    ChrisVer

    Gold Member

    I was having a morning look into assembly language videos and stuff and I was wondering:
    how much faster is assembly compared to some higher-level programming languages (like C, C++, or Python)? And also, why is it faster than those languages?
    I mean, a general C++ source file gets translated into machine code by the compiler, right? So shouldn't the actual performance of the C++ code be the same as that of assembly, even though assembly sits just above machine language?
    PS: One point I found is that a compiler can add unnecessary (compiler-dependent) instructions to the code, whereas writing directly in assembly lets you omit them.
    Thanks (especially to @phinds, whose insight motivated me to look into that language)
     
    Last edited: Apr 26, 2017
  3. Apr 26, 2017 #2

    phinds

    Gold Member
    2016 Award

    These days it's pretty much irrelevant that C code may be a bit faster than some other high-level languages. The reason C is sometimes faster is that it is structured in a way that translates very directly into machine code, whereas some other higher-level languages get a bit convoluted in that translation.

    I remember keeping track for a while of the size of the executable for "hello world". An old DOS, absolute-address version took about 15 bytes, including just the text and a single BIOS call. The C version, with relocatable code, took about 1,000 bytes. The VB.NET version, as a Windows app rather than a console app, took about 5,000 bytes. Just for grins, I did one as a "console app" under Visual Studio 2017 in VB.NET and it took 15,000 bytes, but it's not quite a true "console app" since, although it puts up a DOS command box, the message actually shows up in a little Windows form. Still, from 15 bytes to 15,000 bytes for "hello world" is quite a leap.
     
  4. Apr 26, 2017 #3

    jim mcnamara


    Staff: Mentor

    For the purposes most people write code for, assembly is massive overkill. It is processor-specific, which has advantages when developing a base component of a math library or coding CUDA threads for a PC game. A very long time ago, we had to code either in actual machine code or in assembler. For everything. In terms of productivity this was not good. So large bunches of base code modules, originally in assembler, were grouped into libraries.

    For example, a single simple code snippet in C:
    Code (Text):

    status=read(fd, buffer, nbytes);
    return status;
     
    might represent a large and varying number of lines of assembler, depending on the implementation. But now we call read(), which is a library object and contains lots of processor- and I/O-driver-specific code. One line equals thousands of hours of development and testing, by many people.
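    To make that concrete, here is a minimal, self-contained sketch of the snippet above inside a complete program (the file path is just an illustrative choice, and error handling is kept to a bare minimum):
    Code (Text):

    #include <fcntl.h>     /* open() */
    #include <stdio.h>     /* printf(), perror() */
    #include <unistd.h>    /* read(), close() */

    int main(void)
    {
        char buffer[4096];
        int fd = open("/etc/hostname", O_RDONLY);   /* any readable file will do */
        if (fd < 0) { perror("open"); return 1; }

        ssize_t status = read(fd, buffer, sizeof buffer);   /* the one-line call above */
        if (status < 0) { perror("read"); return 1; }

        printf("read %ld bytes\n", (long)status);
        close(fd);
        return 0;
    }

    That single call to read() drops down through the C library into the kernel's processor- and driver-specific code.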

    Depending on your C compiler, it probably supports inline assembler. That feature is there because many libraries and operating systems are/were developed in C. Sort of a self-referential pyramid. You can write a C compiler in C, for example. The original UNIX from AT&T was developed in C. Play with that inline feature to learn some basic assembler. But remember: x86, POWER, SPARC, and other processor flavors use different assembler code. Inline assembler only works on a single flavor of processor.
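    As a taste of what that looks like, here is a minimal sketch using GCC/Clang-style extended inline assembler on x86 (the syntax is compiler- and processor-specific; MSVC and other compilers do it differently):
    Code (Text):

    #include <stdio.h>

    int main(void)
    {
        int a = 2, b = 3, sum;

        /* Add b into a copy of a with a single x86 instruction.
           The "0"(a) constraint ties the input a to output operand 0 (sum). */
        __asm__ ("addl %1, %0"
                 : "=r"(sum)
                 : "r"(b), "0"(a));

        printf("%d\n", sum);   /* prints 5 */
        return 0;
    }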

    Have fun.
     
  5. Apr 26, 2017 #4

    jim mcnamara


    Staff: Mentor

    Looks like you got two "geezer" answers.... I like Paul's answer better.
     
  6. Apr 26, 2017 #5

    phinds

    Gold Member
    2016 Award

    Actually, it's much worse than that. It would take FOREVER to do in assembly what you can do in a few minutes in a modern IDE, since (as Jim points out) these things are based on a HUGE expenditure of time in coding and testing massive libraries of functions that you get to use with a single call, instead of spending a year or two writing code for each one (AND you could only do that after you had spent several hundred years, probably a lot more, writing all the infrastructure code that those calls make use of).

    But I agree w/ Jim that it's worthwhile to learn a little assembly just so you get the feel of programming a computer instead of programming a big fat pillow that obscures the computer from your direct vision.
     
    Last edited: Apr 26, 2017
  7. Apr 26, 2017 #6

    jedishrfu

    Staff: Mentor

    When you write any given algorithm in a programming language you can take advantage of certain features to make it run faster with less memory.

    As an example, my boss was a fantastic assembly code programmer. He would do things like reuse the program-initialization area as a buffer for reading in data from files. Doing this saved memory, at the expense of not being able to easily debug the program, because a section of code had been overwritten with data. His rationale was that you wouldn't need to see that code anyway, since it had already run and was no longer needed.

    As part of initialization, he would set up a long string of load and store commands to build a list of numbers:

    lda #1      ; load the immediate value 1 into the accumulator
    sta tbl+0   ; store it in the first table entry
    lda #2      ; load the immediate value 2
    sta tbl+1   ; store it in the second entry
    ...

    When I saw the code, I told him I could make it more compact using a loop, and he said I should compare my loop to his code in terms of clock cycles. I realized then that my loop had to increment an index, test the index, load a value, store the value, and repeat the process, which was about five clock cycles per entry versus his two. He did this in his initialization block, which he later overwrote, so he saved both clock cycles and memory.
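    In C terms, the trade-off looks roughly like this (an illustrative sketch, not his actual code):
    Code (Text):

    static unsigned char tbl[256];

    /* Loop version: every pass pays for the increment, the compare,
       and the branch on top of the load and store. */
    void init_with_loop(void)
    {
        for (int i = 0; i < 256; i++)
            tbl[i] = (unsigned char)(i + 1);
    }

    /* Unrolled version: nothing but straight-line stores, at the cost
       of much more code, which he hid in the initialization area that
       was overwritten afterwards anyway. */
    void init_unrolled(void)
    {
        tbl[0] = 1;
        tbl[1] = 2;
        tbl[2] = 3;
        /* ... and so on for the remaining entries */
    }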

    The point here is that compilers, as good as they are, will never give you truly optimal code, and that an experienced programmer willing to break the many conventions of programming, like reusing code areas as data areas, can always squeeze out code that is faster with a smaller memory footprint.

    A similar argument pops up with interpreted languages vs compiled languages vs bytecode-compiled-then-interpreted languages, where sometimes things must be recomputed. In contrast, a programmer using a compiled language can restructure things, with the compiler's help, to minimize any recomputation. Some interpreted languages introduced caching features to minimize the recomputation, but then they have to search to see whether they've already computed something and, if not, compute it. Bytecode variants have removed the syntax-compilation time of an interpreted language, so they start up faster, but the bytecodes are still interpreted, so they always run somewhat slower than a compiled language.

    It always becomes a time (clock cycles) versus space (memory) issue.
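    Stripped to its essence, the caching idea above looks something like this (a toy memoized Fibonacci, purely illustrative): trade a table of saved answers (space) for not recomputing them (time), plus the small cost of checking the table first.
    Code (Text):

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t cache[94];   /* fib(93) is the largest value that fits in 64 bits */
    static int      have[94];    /* 1 once cache[n] holds a valid answer */

    static uint64_t fib(int n)
    {
        if (n < 2) return (uint64_t)n;
        if (have[n]) return cache[n];         /* the "search to see if we've done it already" */
        cache[n] = fib(n - 1) + fib(n - 2);   /* otherwise compute it once... */
        have[n]  = 1;                         /* ...and remember it */
        return cache[n];
    }

    int main(void)
    {
        printf("%llu\n", (unsigned long long)fib(50));   /* prints 12586269025 */
        return 0;
    }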

    All the various optimization strategies make it harder to see which approach is better:
    - dynamically linked libraries vs static libraries (apps using DLLs)
    - compiled vs interpreted vs bytecode compiled/interpreted

    In the end, we programmers just use whatever language we can get away with to get our job done, knowing we can always find a better, faster way if necessary, but that it will take more time and we may have to rewrite things to do it. We also realize that rewrites may cost us in performance evaluations, so we must choose wisely to optimize our own time and energy.
     
  8. Apr 26, 2017 #7

    jim mcnamara


    Staff: Mentor

    @jedishrfu - True, but unless you run that one code section billions of times on every execution on an ancient processor, assembler works against more important and very expensive aspects of the whole process. Roughly 90% of coding is maintenance, 10% primary development. You pay for code obfuscation in maintenance costs. Speaking for a large company with mostly maintenance coders - past tense in my case now :smile: - if we had used assembler I would still be fixing stuff other people could not deal with. Assembler == code obfuscation for CIS programmers. For OS and driver developers, no. My opinion only, clearly not yours.

    I hope this thread does not devolve into a 'my dog is better than your dog' tête-à-tête.
     
  9. Apr 26, 2017 #8

    phinds

    Gold Member
    2016 Award

    Actually, I know of one case where that is NOT the reason. Early in my career we had to do real-time processing of telemetry data, and when I say "real-time", I MEAN real-time, not what people mean today, which is more like "really, really quickly".

    We had to do program self-modification to get the thing reduced from the original 150% of real-time down to just under 100% of real time. One of the tricks was to overlay a conditional jump with an absolute jump once the decision had been made, so as to avoid having to make the decision over and over in the loop. We also played games with the arithmetic coding, and so forth.

    I would not wish having to do that today on anyone, but with today's storage capacities and speeds, and today's computing speeds, it isn't likely to come up.
     
  10. Apr 26, 2017 #9

    jedishrfu

    Staff: Mentor

    To be clear, my boss was a product of the 1960s, when assembler was the ultimate programming language. He was an OS programmer for GCOS on Honeywell 6000 machines, and he wasn't affected by the latest trends in programming, such as the separation of data from program code as prescribed by Edsger Dijkstra. I was an eager newbie who learned a lot under his wing.

    https://en.wikipedia.org/wiki/Edsger_W._Dijkstra

    My boss, Ben W., could read computer dumps like novels and quickly spot what caused a failure. He was always suspicious of COBOL and Fortran compiler technology, especially any use of the newer extended-instruction-set opcodes that handled packed decimal numbers (not quite ASCII and not quite binary). He was also the go-to guy when Honeywell had a problem in the OS, even though he worked for GE and was not a Honeywell employee.

    One time, he was looking at a dump to find a problem and accidentally set his hair on fire with his cigarette. Another time, he moved the dump, knocked over his coffee cup, and almost got me with the spill. And yet another time, when he was going on vacation, he literally slid everything off his desk into the waste paper basket, some of which were his management papers. He was a great manager who kept out of your way and allowed you to fail so that later you could succeed.

    I also heard a story where he and another expert found a flaw in Honeywell's timesharing service that let you read the passwords of any user (no salt at that time). They used the flaw to break into a Honeywell demo system and crashed it while Honeywell was showing it to the military. It got Honeywell's attention and the flaw was fixed pretty quickly, although it took much longer before password encryption became the accepted practice.
     
  11. Apr 26, 2017 #10

    jedishrfu

    Staff: Mentor

    Yes, my boss did that sort of thing too. Oh the freedom to do whatever you wanted in code. I think only the malware hackers get to enjoy that nowadays.
     
  12. Apr 26, 2017 #11

    jedishrfu

    Staff: Mentor

    A great quote among many from Edsger Dijkstra:

    [image: a Dijkstra quote]
     
  13. Apr 26, 2017 #12
    The only time I've really used assembly was to optimize a library that was already heavily optimized, by forcing the generated code to be a very specific size. I had a particular piece of code that did a lot of work and that I decided to look into in greater detail during profiling. When I examined the generated machine code for that piece of code, it came to about 9 KB. I knew that the L1 cache had a normal size of 8 KB, so I rewrote large chunks of it in assembly to get the final code down to about 5 KB. That way the entire chunk of code, plus the data it was working with, could run exclusively from the L1 cache. That's something that you simply can't do with higher-level languages.
     
  14. Apr 26, 2017 #13

    jedishrfu

    Staff: Mentor

    Yet.

    AI may someday begin to take over this task.
     
  15. Apr 26, 2017 #14

    hilbert2

    Science Advisor
    Gold Member

    You're likely to need Assembly if you're going to make 1kb intros or something similar, but for scientific computing it's rarely necessary.
     
  16. Apr 26, 2017 #15

    ChrisVer

    Gold Member

    This makes me wonder, then, why assembly is still so high in the rankings if it leads to difficult-to-maintain code (which I can understand).

    Then some general questions:
    When programming in higher-level languages, do you think that knowledge of assembly can help? I know that I will always prefer writing my work code in Python, sometimes switching to C++ and the like, but what could I gain from partially learning assembly (apart from an extra language and the fun of it)?
    At least on the first day of looking into it, I learned some interesting things about CPUs (basically how they work) that I didn't know. And maybe I'll generally get some insight into hardware coding.

    Well, I will need to have a look at it... For example, I am wondering how detector components working in a high-density environment (such as in ATLAS) could make use of it (or whether they just need 0s/1s)... Of course I may be off (I'm not a hardware guy)
     
  17. Apr 26, 2017 #16

    Mark44

    Staff: Mentor

    Because it's fast.
    A basic knowledge of assembly can help you understand what the computer is doing. Just this morning in the C++ class I'm teaching, I presented a simple program that has one function with a reference parameter, and another function with a pointer parameter. By looking at the disassembled code in the debugger (VS debugger), I could see that the code for the two functions was exactly the same. In the pointer parameter version it was clearer that what was being passed in the parameter was actually an address, and that the function modified the memory that the pointer variable pointed to.
    Code (Text):
    00E129AE  mov  eax, dword ptr [number]    ; load number into eax
           *result = number * number;         << the line in the C++ source code
    00E129B1  imul eax, dword ptr [number]    ; eax = number * number
    00E129B5  mov  ecx, dword ptr [result]    ; load the address held in result into ecx
    00E129B8  mov  dword ptr [ecx], eax       ; store the product at that address
    To follow what was going on, we looked at the disassembled code (as above, the lines preceded by numbers), the registers pane, to watch the values change in the CPU registers, and a memory window, to watch the actual memory being changed (as a result of the last line above).
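    Roughly, the two functions in question look something like this (the function names here are illustrative, not the exact ones from class); a typical compiler emits essentially the same machine code for both, since a reference is passed under the hood as an address:
    Code (Text):

    // Pointer parameter: the caller passes an address explicitly.
    void squareByPointer(int number, int *result)
    {
        *result = number * number;   // the source line shown in the disassembly
    }

    // Reference parameter: the address passing is hidden by the syntax,
    // but the generated code is essentially identical to the version above.
    void squareByReference(int number, int &result)
    {
        result = number * number;
    }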

    My intent in comparing the two kinds of function parameters was to show what was actually going on, something that is obscured by the simplicity of reference parameters.
    I told my students that I wasn't going to test them on this, and that what I was showing them was purely for their knowledge. I think a few of the students got what I was saying, but most didn't, which is OK, too. They will see more about pointers a little later in the course.
     
  18. Apr 26, 2017 #17
    I think for most programmers today assembly language programming is basically for educational purposes. You learn about how processors work, among other things. You need to learn something about assembly language if you study code generation by compilers.

    Here is a good starting point for assembly language. The author shows how you can even do W*****s programming in assembly language.

    https://www.grc.com/smgassembly.htm

    In the old days, RollerCoaster Tycoon was written in assembly because it was the only way to get the desired performance. But today I think assembly is mainly used for embedded systems, device drivers, and so on.
     
  19. Apr 26, 2017 #18

    phinds

    Gold Member
    2016 Award

    Exactly. There ARE, as you note, still direct uses for it, but I think it's smart to learn about computers via assembly because it can help you debug problems even in higher-level languages.
     
  20. Apr 26, 2017 #19

    jim mcnamara


    Staff: Mentor

    You can make sense out of stack dumps when you understand assembler. Ditto for debuggers, as @Mark44 nicely pointed out. I think most of the older folks answering here know several assemblers, so we think there are advantages to it, which other well-experienced coders/admins might not agree with. We had no choice about learning it. I thought it was fun myself.

    VS compilation goes to MSIL (CIL), which is an intermediate, object-oriented pseudo-assembler. Learning that may be useful in the context of solving some deeper problems. After CIL, and possibly some more steps, it then gets linked. So no matter what language you use, you have this interposed assembler, which may also account for Mark's position on it.
     
  21. Apr 26, 2017 #20

    FactChecker

    Science Advisor
    Gold Member

    Assembly language is only used on an "as needed" basis. General compiled languages like C, C++, FORTRAN, etc. can usually get the job done and have optimization compiler options that are hard to beat. (Interpreted languages are usually much slower.)
    That being said, there are occasions when knowledge of assembly language is needed such as:
    • Debugging new compilers and hardware
    • Keeping track of test case coverage to get complete modified condition/decision coverage (MCDC)
    • Understanding why some code executes slower than expected
     