Learning Assembly and computer architecture for x86

Learning assembly and computer architecture for x86 involves understanding the differences between CISC (Complex Instruction Set Computing) and RISC (Reduced Instruction Set Computing) architectures. The discussion emphasizes starting with a C compiler to generate assembly code, which aids in understanding the underlying processes. It is noted that modern x86 assembly is complex and rarely used for full programs, with high-level languages being more effective for most applications. Resources like Intel's programmer's reference manual and various assembly programming books are recommended, but the importance of practical experience is highlighted. Ultimately, gaining foundational knowledge through existing systems and gradually expanding one's understanding is advised.
  • #31
elias001 said:
I also asked how programmers in the old days learned to write in assembly. Did they learn it by picking up a book, the way they learned to code in Fortran, etc.? I mean, there were so many different versions of assembly language. I am guessing that back then it might not have been like now, where people are expected to know multiple languages even within a single programming paradigm.
If you go back far enough, IBM dominated. I "learned" assembly language in a college class that was 75% IBM employees. There was no real variety. The latest IBM development was all that mattered.
(I quoted the "learned" because I was too immature and lazy at that time to really learn it.)
 
  • #32
@FactChecker so what was it like writing your own assembly programs? I mean, you had Basic, Fortran and Pascal. Did you find it tedious, or was it something you just did and thought nothing much of?
 
  • #33
elias001 said:
I want to know what it felt like to at least write a program in the x86 architecture in assembly.
It's just that assembly is not used for general programming on x86 these days.
It'll be very hard to get any real, useful, relevant experience apart from some variations of 'hello world'.

That's why I recommended starting with microcontrollers. There, it's still possible to get the full experience.
 
  • #34
@Rive is programming microcontrollers the same as programming embedded systems? Also, for assembly language, does each iteration of an architecture in the same family require one to learn a new version of assembly language? What I mean is, say for the x86 family, there were the Pentium 2, then 3, 4. Now they have the Core iX family of chips. For each of these, do people have to learn assembly language over again? I am guessing the same question applies to different iterations of Arduino and Raspberry Pi.

Sorry if I am asking something naive, but I keep hearing that assembly is a bit different depending on the hardware architecture. My understanding is also that there is a connection between ARM and RISC, though I am not sure what it is.

Also, if I want to use inline assembly in C, do I have to look up which CPU my laptop has in order to know how to write my assembly code?
 
  • #35
elias001 said:
@Rive is programming microcontrollers the same as programming embedded systems?
Kind of. The term 'embedded system' usually hints at a more complex environment, often with some kind of OS, while the term 'microcontroller' can mean just a minimalistic setup: the smallest devkit with minimal add-ons.

elias001 said:
for the x86 family, there were the Pentium 2, then 3, 4.
As a beginner, what you will likely need is IA-32 as a basis, from 1985... Everything can run that today.
Above that, it'll be about the interaction with the OS.

elias001 said:
I am guessing the same question applies to different iterations of Arduino and Raspberry Pi.
Yep. Usually there is a common basis for every family, with type-specific additional extras.

elias001 said:
Also, if I want to use inline assembly in C, do I have to look up which CPU my laptop has in order to know how to write my assembly code?
No. The OS (Windows?) you are using will severely limit the direct interaction with the hardware. When you have an OS then it's more about the interaction with the OS which is providing the necessary services to access what the hardware can offer.
 
  • #36
@Rive yes, I am running Windows. I have two books on advanced assembly, but most of the topics are about how to interact with a computer mouse, keyboard, etc. I am not interested in those topics yet at this point.

Are there some basic things that are applicable to assembly programming regardless of which architecture one is writing it for? I wish there were books that covered that.
 
  • #37
elias001 said:
Are there some basic things that are applicable to assembly programming regardless of which architecture one is writing it for?
It is all about learning to dissect the problem into much finer steps than usual, while building your own levels of abstraction from scratch to fit the limited (CPU) resources, and minding the environment you are working in.

My humble opinion is that it is kind of hard to see what assembly is really good for in such a resource-rich environment as a highly advanced CPU and OS.

elias001 said:
how to interact with a computer mouse, keyboard, etc.
What kind of environment (OS) is used in those books?
 
  • #38
@Rive the advanced assembly language books are all for the IBM PC; one of them uses the old Microsoft Macro Assembler. Both books were written in the late 80s and early 90s. Both books assume the reader already knows how to code in assembly language.
 
  • #39
elias001 said:
Both books were written in the late 80s and early 90s.
Then for practice you had better get a PC from that era. The examples most likely won't work on a modern PC with a modern OS, and even if you manage to make them work, it won't give you much useful knowledge.

...I would also suggest getting 'Peter Norton's Assembly Language Book for the IBM PC' from that era (a scan may be downloadable, I think).
A very good resource, for assembly programming for the 8086 under MS-DOS up to 6.22, that is...

But honestly, I really recommend starting with a PIC12 or PIC16 dev board instead.
 
  • #40
elias001 said:
Also, if I want to use inline assembly in C, do I have to look up which CPU my laptop has in order to know how to write my assembly code?
Assembly languages are very much specific to the CPU being used. Years ago, Apple Mac computers used Motorola 680x0 processors, which were completely different from the Intel x86 processors used in PCs. Later, the Macs used a PowerPC processor, which again was different from the Intel (and AMD) processors used in PCs of that era. I haven't followed Macs lately, but I understand that they now use the same or similar processors as PCs.
elias001 said:
I have two books on advanced assembly, but most of the topics are about how to interact with a computer mouse, keyboard, etc. I am not interested in those topics yet at this point.
I'm not very interested in those low-level BIOS operations, either. I'm more interested in writing mixed-language programs in which part is written in C or C++ (the main() function), with calls to functions written in x86 or x64 assembly. Doing things this way gives you a solid understanding of how parameters are passed to a function (stack-based on x86 and in registers in x64), how values are returned from functions, how pointers work with arrays and structs, and lots more.
 
  • #41
Here's a simple example of what I mean by mixed-language programming.
Below is the high-level part in C.
C:
// Driver.cpp : Call an assembly routine to add two integers, returning their sum.

#include <stdio.h>
extern "C" int AddParams(int, int);

int main()
{
    int arg1 = 5, arg2 = 13;
    int result = AddParams(arg1, arg2);
    printf("The sum of %d and %d is %d", arg1, arg2, result);
}
Below is the x86 assembly part that implements AddParams(). In 32-bit x86 code using the C calling convention, arguments to assembly functions (PROCs) are pushed onto the stack. The assembly function needs to know where on the stack the parameters appear in order to access them. On entry they sit 4 bytes and 8 bytes above the stack pointer (ESP); after the prologue pushes EBP and copies ESP into it, the same two values are at [ebp + 8] and [ebp + 12], which is what the equates below use. After the first parameter is loaded into a register, the second parameter is added to it. When the routine returns, the sum is in the EAX register, which is where the C main() function expects it.
Code:
.model flat, c
; 32-bit code -- i.e., uses 32-bit registers eax, ebx, ecx, etc.

.code

;Equates
Param1    equ <[ebp + 8]>
Param2    equ Param1 + 4

; Add the two parameters, and return their sum.
; Prototype: extern "C" int AddParams(int a, int b);
; "C" tells the compiler not to "mangle" the routine name
; Returns the sum of Param1 and Param2 in eax

AddParams PROC C
    push ebp
    mov ebp, esp
    mov eax, [Param1]   ; Load first argument
    add eax, [Param2]   ; Add second argument
    pop ebp
    ret                    ; Sum is in eax
AddParams ENDP
END

The output is as follows:
Code:
The sum of 5 and 13 is 18
 
  • #42
Rive said:
Kind of. The term 'embedded system' usually hints at a more complex environment, often with some kind of OS, while the term 'microcontroller' can mean just a minimalistic setup: the smallest devkit with minimal add-ons.


As a beginner, what you will likely need is IA-32 as a basis, from 1985... Everything can run that today.
Above that, it'll be about the interaction with the OS.


Yep. Usually there is a common basis for every family, with type-specific additional extras.


No. The OS (Windows?) you are using will severely limit the direct interaction with the hardware. When you have an OS then it's more about the interaction with the OS which is providing the necessary services to access what the hardware can offer.
Yeah. I was also gonna suggest “embedded systems”. I coded a barcode scanner, and that was low-level C against its BIOS ROM. Not much fireworks there! :)

EDIT: I also coded dictaphones and set up the backend completely with RDBMS. They were sold to Chinese hospitals close to nationwide. Made my boss a wealthy man but all I got (besides a good salary for an autodidact) was an AUDI A3. After I installed Microsoft Outlook Server I was told what Microsoft takes for the same task. I refused to show up for a couple of days after that, driving around in my new car, seething! :woot:
 
  • #43
sbrothy said:
Yeah. I was also gonna suggest “embedded systems”. I coded a barcode scanner, and that was low-level C against its BIOS ROM. Not much fireworks there! :)

EDIT: I also coded dictaphones and set up the backend completely with RDBMS. They were sold to Chinese hospitals close to nationwide. Made my boss a wealthy man but all I got (besides a good salary for an autodidact) was an AUDI A3. After I installed Microsoft Outlook Server I was told what Microsoft takes for the same task. I refused to show up for a couple of days after that, driving around in my new car, seething! :woot:
Though I guess the car made up for it with Danish taxes in mind. I just felt cheated.
 
  • #44
Mark44 said:
Here's a simple example of what I mean by mixed-language programming.
Below is the high-level part in C.
C:
// Driver.cpp : Call an assembly routine to add two integers, returning their sum.

#include <stdio.h>
extern "C" int AddParams(int, int);

int main()
{
    int arg1 = 5, arg2 = 13;
    int result = AddParams(arg1, arg2);
    printf("The sum of %d and %d is %d", arg1, arg2, result);
}
Below is the x86 assembly part that implements AddParams(). In 32-bit x86 code using the C calling convention, arguments to assembly functions (PROCs) are pushed onto the stack. The assembly function needs to know where on the stack the parameters appear in order to access them. On entry they sit 4 bytes and 8 bytes above the stack pointer (ESP); after the prologue pushes EBP and copies ESP into it, the same two values are at [ebp + 8] and [ebp + 12], which is what the equates below use. After the first parameter is loaded into a register, the second parameter is added to it. When the routine returns, the sum is in the EAX register, which is where the C main() function expects it.
Code:
.model flat, c
; 32-bit code -- i.e., uses 32-bit registers eax, ebx, ecx, etc.

.code

;Equates
Param1    equ <[ebp + 8]>
Param2    equ Param1 + 4

; Add the two parameters, and return their sum.
; Prototype: extern "C" int AddParams(int a, int b);
; "C" tells the compiler not to "mangle" the routine name
; Returns the sum of Param1 and Param2 in eax

AddParams PROC C
    push ebp
    mov ebp, esp
    mov eax, [Param1]   ; Load first argument
    add eax, [Param2]   ; Add second argument
    pop ebp
    ret                    ; Sum is in eax
AddParams ENDP
END

The output is as follows:
Code:
The sum of 5 and 13 is 18
I actually meant the asm keyword. Although nowadays, due to operating systems being paranoid about protecting their resources, it might be somewhat obsolete. Perhaps not on embedded systems and in low level OS code though.

Most compilers also have a switch for outputting your program as assembler (-S with GCC if I'm not mistaken). I'm sure Devstudio has something similar. It's been a while.

EDIT: Yeah:

MSVC does not support inline assembly on the ARM and x64 processors, and only supports the form introduced by __asm on x86 processors.
 
  • #45
How much you will use assembly language depends on what type of work you do. If you get into real-time device handlers, you will see it a lot. In 37 years of programming, I have only been involved with assembly code a few times. Once was regarding device handlers. Once was in trying to speed up some real-time aerodynamic calculations (speed issues where memory access is involved are tricky and non-intuitive). And once was inserting assembly code to record test branch coverage of a test set.
That being said, if you are a good assembly language programmer the jobs will find you. The question will be whether you like those jobs.
 
  • #46
FactChecker said:
How much you will use assembly language depends on what type of work you do. If you get into real-time device handlers, you will see it a lot. In 37 years of programming, I have only been involved with assembly code a few times. Once was regarding device handlers. Once was in trying to speed up some real-time aerodynamic calculations (speed issues where memory access is involved are tricky and non-intuitive). And once was inserting assembly code to record test branch coverage of a test set.
That being said, if you are a good assembly language programmer the jobs will find you. The question will be whether you like those jobs.

Indeed, assembly language isn't known for being fun and a breeze to work with. And as for speeding things up in this day and age, throwing more hardware at the problem, fiddling with compiler flags, starting a concurrent process, or if push comes to shove, implementing multi-threading for the problem program is probably preferable 9 out of 10 times....

EDIT: Unless, of course, as you describe, the problem is very specialized.
 
  • #47
sbrothy said:
I actually meant the asm keyword. Although nowadays, due to operating systems being paranoid about protecting their resources, it might be somewhat obsolete.
Visual Studio doesn't permit inline assembly (i.e., using the asm keyword) in 64-bit code. The example I showed was compiled as 32-bit code. The assembly code would look quite a bit different if I had done it as a 64-bit example, since the parameters would have been passed in specific registers rather than on the stack.
 
  • #48
Mark44 said:
Visual Studio doesn't permit inline assembly (i.e., using the asm keyword) in 64-bit code. [...]
No. I wrote as much in my comment above. No harm in it getting chiseled in stone though. It's been that way for quite a while if I'm not mistaken.

EDIT: Your first comment obviously. Lemme edit that quote. There.
 
  • #49
@sbrothy, @Rive and @FactChecker I think all three of you recommended that I try out microcontrollers/embedded systems. So say I wrote an assembly program to control some tiny device. Is that finished program called "firmware"?

Also the two advanced assembly language books that i have are:

Advanced Assembly Language on the IBM PC by Steven Holzner

and

Advanced Assembly Language by Allen L Wyatt

The two C books with assembly programming are:

C with Assembly Language by Steven Holzner

and

X86 Assembly Language and C fundamentals by Joseph J F Cavanagh.

I do also have that Peter Norton's book.


If I have any other questions about this topic, I will ask in a new post. I appreciate all of you explaining assembly language to me. I still feel weird and uneasy about learning a programming language only for a specific architecture, then knowing that on a different architecture the same programming language will have differences.
 
  • #50
I think firmware is more up @FactChecker’s alley. I’d call a device driver firmware. Anything that interfaces to hardware.
 
  • #51
  • #52
elias001 said:
I still feel weird and uneasy about learning a programming language only for a specific architecture, then knowing that on a different architecture the same programming language will have differences.
You are mistaken here. Different architectures have different assembly languages. Sometimes there even are different versions of an assembly language for a given processor when different OSes are being used, for example with Windows vs. Linux. That said, once you become competent with one assembly language, it's not that difficult to learn how another one does things.
 
  • #53
Mark44 said:
That said, once you become competent with one assembly language, it's not that difficult to learn how another one does things.
I think the difference between RISC and CISC is more significant than that.
 
  • #54
elias001 said:
I still feel weird and uneasy about learning a programming language only for a specific architecture, then knowing that on a different architecture the same programming language will have differences.
To be more precise, an assembly language is matched to the architecture that it runs on. A different architecture has a different assembly language.
The higher level languages have standard versions that should run properly on any compliant machine.
 
  • #55
Which is also why I’m a little surprised that @elias001 chose to start at assembler level and not, say, C, which is what I did. But to each his own.
 
  • #56
FactChecker said:
I think the difference between RISC and CISC is more significant than that.
RISC (reduced instruction set computing) instruction sets are smaller than those for CISC (complex instruction set computing), which is fairly obvious if you unpack the acronyms. For a given processor you have registers and memory, and the instruction set provides operators to load into or store out of registers into memory or other registers, to perform arithmetic operations on values in the registers, to branch to other parts of the program, and so forth.
I've written code in Intel x86 and x64 assembly (CISC) as well as MIPS and ARM (both RISC). Although CISC instruction sets have way more instructions, a lot of the time the instructions one uses are more-or-less common to both types of computing.
 
  • #57
sbrothy said:
Which is also why I’m a little surprised that @elias001 chose to start at assembler level and not, say, C, which is what I did. But to each his own.
I think he said somewhere along the line, that he had done some programming in C or C++.
 
  • #58
Mark44 said:
I think he said somewhere along the line, that he had done some programming in C or C++.
Ah.
 
  • #59
@sbrothy I went through C, and now am going through the book Introduction to Computing Systems: From Bits & Gates to C/C++ & Beyond. When I first said I was going through this book, I forgot to put down the first part, since the second part of the title was much more memorable for me. Anyway, the book uses the LC-3 mini assembly language. As you know, and as I soon found out, for the x86 family of architectures there is no online emulator/simulator, and the more I looked around, the more confusing it got and the more questions I ended up having. Also, some of those old assembly language books had sections on how to sort stuff, though none of them discusses data structures as fully as one would find in a C book.
 
  • #60
Mark44 said:
Although CISC instruction sets have way more instructions, a lot of the time the instructions one uses are more-or-less common to both types of computing.
I got the (maybe incorrect) impression that RISC programming was harder when there was a great shortage of RISC programmers. Maybe the shortage was due to the initial flood of RISC processors rather than their programming difficulty. I thought that RISC processing was like low-level array processor programming, where you had to keep track of the short time that the output of every operation was available and grab it in that cycle. That was like 3-dimensional assembler language programming.
 
