In this Insights Guide the focus is entirely on types of languages, how they relate to what computers actually do, and how they are used by people. It is a general discussion and intended to be reasonably brief, so there is very little focus on specific languages. Please keep in mind that for just about any topic mentioned in this article, entire books have been written about that topic v.s. a few sentences in this article.
The primary focus is on
• Machine language
• Assembly language
• Higher Level Languages
And then some discussion of
• interpreters (e.g. BASIC)
• markup languages (e.g. HTML)
• object oriented languages (e.g. Java)
• Integrated Development Environments (e.g. The “.NET Framework”)
Database languages (e.g. SQL) will be discussed briefly in Part 2.
Computers work in binary and their innate language is strings of 1’s and 0’s. In my early days I actually programmed computers using strings of 1’s and 0’s, input to the machine by toggle switches on the front panel. You don’t get a lot done this way but fortunately I only had to input a dozen or so instructions, which made up a tiny “loader” program. That loader program would then read in more powerful programs from magnetic tape. It was possible to do much more elaborate things directly via toggle switches. Remembering EXACTLY (mistakes are not allowed) which string of 1’s and 0’s represented which instructions was a pain.
SO … very early on the concept of assembly language arose.
This is where you represent each machine instruction’s string of 1’s and 0’s by a human-language word, or abbreviation. Things such as “add” or “ad” and “load” or “ld”. These sets of human-readable “words” are translated to machine code by a program called an assembler.
Assemblers have an exact one-to-one correspondence between each assembly language statement and the corresponding machine language statement. This is quite different from high level languages where compilers can generate dozens of machine language statements for a single high level language statement. Assemblers are a step away from machine language in terms of usability but when you program in assembly you are programming in machine language. You are using words instead of the strings of 1’s and 0’s but assembly language is direct programming of a computer, with full control over data flow, register usage, I/O, etc. None of those are directly accessible in higher level languages.
Every computer has its own set of machine language instructions, and so every machine has its own assembly language. This can be a bit frustrating when you have to move to a new machine and learn the assembly language all over again. Having learned one, it’s fairly easy to learn another, but it can be very frustrating/annoying to have to remember a simple word in one assembly language (e.g. “load”) is slightly different (e.g. “ld”) or even considerably different (e.g. “mov”) in another language. Also, machines have slightly different instruction sets which means you not only have new “words” you miss some and add some. All this means there can be a time-consuming focus on the implementation of an algorithm instead of just being able to focus on the algorithm itself.
The good news about assembly language is that you are not programming some “pillow” that sits between you and the machine, you are programming the machine directly. This is most useful for writing things like device drivers where you really need to be down in the exact details of what the machine is doing. They also give you the opportunity to do the best possible algorithm optimization, but for complex algorithms you would need a VERY compelling reason indeed to go to the trouble of writing it in assembly and with today’s very fast computers, that’s not likely to occur.
The bad news about assembly language is that you have to tell the machine EVERY ^$*&@^#*! step that you want it to perform, which is really tedious. An algebraic statement such as
A = B + COSINE(C) – D
has to be turned into a string of instructions something like what follows (which is a pseudo assembly language that is MUCH more verbose than actual assembly)
• load memory location C to register 1
• Call Subroutine “Cosine” [thus getting the cosine of what’s in register 1 back into register 1 by calling a subroutine]
• Load memory location B to register 2
• Add registers 1 and 2 [thus putting the sum into register 1]
• load memory location D to register 2
• subtract register 2 from register 1 thus putting the difference into register 1
• save register 1 to memory location A [final value]
And that’s a VERY simple algebraic statement, and you have to specify each of those steps. In the right order. Without any mistakes. It will make your head hurt. Just to give you some of the flavor of assembly, here’s a very small portion of an 8080 instruction guide (a CPU used in early personal computers). Note that both the octal and hex versions of the binary code are given. These were used during debug.
Because of the tediousness of all this, compilers were developed
COMPILERS / higher level languages
Compilers take away the need of assembly language to specify the minutiae of the machine code level manipulation of memory locations and registers. This makes it MUCH easier to concentrate on the algorithm that you are trying to implement rather than the details of the implementation. Compilers take complex math and semi-human language statements and translate them into machine code to be executed. The bad news is that when things go wrong, figuring out WHAT went wrong is more complicated than in machine/assembly code because the details of what the machine is doing (which is where things go wrong) are not obvious. This means that sophisticated debuggers are required for error location and correction. As high level languages add layers of abstraction, the distance between what you as a programmer are doing and what the machine is doing down at the detailed level gets larger and larger. This could make debugging things intolerable, but the good news is that today’s programming environments are just that; ENVIRONMENTS. That is, you don’t usually program a computer language any more, you program an Integrated Development Environment (IDE) and it includes extensive debug facilities to ease the pain of figuring out where you have gone wrong when you do. And you will. Inevitably.
Early high level languages were:
• Fortran (early 1950’s)
• Autocode (1952)
• ALGOL (late 1950’s)
• COBOL (1959)
• LISP (early 1960’s)
• APL (1960’s)
• BASIC (1964)
• PL/1 (1964)
• C (early 1970’s)
Since then there have been numerous developments, including
• Ada (late 70’s, early 80’s)
• C++ (1985)
• Python (late 1980’s)
• Perl (1987)
• Java (early 1990’s)
• Ruby (mid 1990’s)
• PHP (1994)
• C# (2002)
There were others along the way but they were of relatively little impact compared to the ones listed above. Even the ones named have had varying degrees of utility, popularity, widespread use, and longevity. One of the lesser known ones was JOVIAL, a variant of ALGOL (which was originally called the “International Algorithmic Language”). I mention JOVIAL only because the name always tickled me. It was developed by Jules Schwartz and JOVIAL stands for “Jules’ Own Version of the International Algorithmic Language”. I enjoy his lack of modesty.
A special note about the C programming language. C caught on over time as the most widely used computer languages of its time and the syntax rules (and even some of the statements) of several more modern languages come directly from C. The UNIX, Windows and Apple operating systems kernels are written in C (with some assembly for device drivers) as is some of the rest of the Operating System’s code. Unix is written entirely in C and requires a C compiler. Because of the way it was developed and implemented, C is, by design, about as close to assembly language as you can get in a high level language. Certainly it is closer than other high level languages. Java is perhaps more widely used today but C is still a popular language.
INTERPRETERS / SCRIPTING LANGUAGES / COMMAND LANGUAGES
Interpreters are unlike assemblers and compilers in that they do not generate machine code into an object file and then execute that. Rather, they interpret each statement into machine code one statement (or even just parts of statements) at a time and perform the necessary calculations/manipulations, saving intermediate results as necessary. Use of registers and such low-level detail is completely hidden from the user. The bad news about interpreters is that they are slower than mud compared to executable code generated by assemblers and compilers but that’s much less of an issue today given the speed of modern computers. The particularly good news is that they are amenable to on-the-fly debugging. That is, you can break a program at a particular instruction, change the values of the variables being processed and even change the subsequent instructions and then resume interpretation. This was WAY more important in the early days of computers when doing an edit-compile-run cycle could require an overnight submission of your program to the computer center.
A very important type of scripting language is the command script language, also called “shell scripts”. Early PC system control was done with the DOS command interpreter which is one such. Probably the most well known is Perl. In its heyday it was the most widely used system administrator’s language to automate operational and administrative tasks. It is still widely used and there are a lot of applications still running that have been built in Perl. It runs on just about all operating systems and still has a very active following because of its versatility and power.
The most widely used markup language is HTML (Hypertext Markup Language). It defines sets of rules for encoding documents in a format that is both human-readable and machine-readable. You will often hear the phrase “HTML code”. Technically this is a misnomer since HTML is NOT “code” in the original sense because its purpose is not to result in any machine code. They are actually just data. They do not get translated to machine language and executed. They are read as input data. HTML is read by a program called a browser and is used not to specify an algorithm but rather to describe how a web page should look. HTML doesn’t specify a set of steps, it “marks up” the look and feel of how a browser creates a web page.
To go at this again, because it’s fairly important to understand the difference between markup languages and other languages, consider the following. Assembler and Complier languages result in what is called object code. That is, a direct machine language result of the assembly or compilation and the object code persists during execution. Interpreter languages are interpreted directly AT execution time so clearly exist during execution. Markup languages are different in that they do not persist during execution and in fact do not themselves result directly in any execution at all. They provide information that allows a program called a browser to create an HTML web page with a particular look and feel. So HTML is not code in the sense that a program is code; it is just data to be used by the browser program. A significant adjunct to HTML is CSS (Cascading Style Sheets) which can be used to further refine the look and feel of a web page. The latest version of HTML, HTML5, is a significant increase in the power and flexibility of HTML but further discussion of CSS and HTML5 is beyond the scope of this article. Another significant markup language is XML, also beyond the scope of this article.
INTEGRATED DEVELOPMENT ENVIRONMENTS
Programming languages these days come with an entire environment surrounding them, including for example object creation by drag and drop. You don’t program the language so much as you program the environment. The number of machine language instructions generated by a simple drag and drop of a control onto the window you are working on can be enormous. You don’t need to worry about such things because the environment takes care of all that. The compiler is augmented in the environment by automatic linkers, loaders, and debuggers so you get to focus on your algorithm and the high-level implementation and not worry about the low level implementation. These environments are now called “Integrated Development Environments” (IDE’s) which is a very straightforward description of exactly what they are. They include a compiler, debugger, help tool, and as well as additional facilities that can be extremely useful (such as a version control system, a class browser, an object browser, support for more than one language, etc.). These extensions will not be explored in this article but some of them will be touched on in the second article.
Because the boundary between a full bore IDE and a less powerful programming environment is vague, it is quite arguable just how long IDE’s have existed but I think that it is fair to say that in their current very powerful incarnations they began to evolve seriously in the 1980’s and came on even stronger in the 90’s. Today’s incarnations are more evolutionary from the 80’s and 90’s than the more revolutionary nature of what was happening in those decades.
So to summarize, the basic types of computer languages are
• database (discussed in the Part 2 article)
and each has its own set of characteristics and uses.
Within the confines of those general characteristics, there are a couple of other significant categorizations of languages, the most notable being object-oriented languages and event driven languages. These are not themselves languages but they have a profound effect on how programs work so deserve at least a brief discussion in this article.
OBJECT ORIENTED LANGUAGES
“Normal” compiled high level languages have subroutines that are very rigid in their implementation. They can, of course, have things like “if … then” statements that allow for different program paths based on specific conditions but everything is well defined in advance. “Objects” are sets of data and code that work together in a more flexible way than exists in non-object-oriented programming.
C++ tacked object orientation onto C but object orientation is not enforced. That is, you can write a pure, non-object program in C++ and the compiler is perfectly happy to compile it exactly as would a C compiler. Data and code are not required to be part of objects. They CAN be but they don’t have to be. Java on the other hand is a “pure” object-oriented language. ONLY objects can be created and data and code are always part of an object. C# (“C sharp”) is Microsoft’s attempt to create a C language that is purely object-oriented like Java. C# is proprietary to Microsoft and thus is not used on other platforms (Apple, Unix, etc). I don’t know if it is still true, but an early comment about C# was “C# is sort of Java with reliability, productivity and security deleted”.
Objects, and OOP, have great flexibility and extensibility but getting into all of that is beyond the scope of this modest discussion of types of languages.
EVENT DRIVEN PROGRAMMING
“Normal” compiled high level languages are procedural. That is, they specify a strictly defined set of procedures. It’s like driving a car. You go straight, turn left or right, stop then go again, and so forth, but you are following a specific single path, a procedural map.
“Event-driven” languages such as VB.NET are not procedural, they are, as the name clearly states, driven by events. Such events include, for example, a mouse-click or the press of a key. Having subroutines that are driven by events is not actually new at all. Even in the early days of computers, things like a keystroke would cause a system interrupt (a hardware action) that would cause a particular subroutine to be called. BUT … the integration of such event driven actions into the very heart of programming is a different matter and is common in modern IDEs in ways that are far more powerful than early interrupt-driven system calls. It’s somewhat like playing a video game where new characters can pop up out of nowhere and characters can teleport to a different place and the actions of one character affect the actions of other characters. Like OOP, event driven programming has its own power and pitfalls but those are beyond the scope of this article. Both Apple operating systems and Microsoft operating systems strongly support event driven programming. It’s pretty much a necessity in today’s world of highly interactive, windows based, computer screens.
GENERATIONS OF LANGUAGES
You’ll read in the literature about “second generation”, “third generation”, and so forth, computer languages. That’s not much discussed these days and I’ve never found it to be particularly enlightening or helpful, however, just for completeness, here’s a rough summary of “generations”. One reason I don’t care for this is that you can get into pointless arguments about exactly where something belongs in this list.
• first generation = machine language
• second generation = assembly language
• third generation = high level languages
• fourth generation = database and scripting languages
• fifth generation = Integrated Development Environments
Newer languages don’t really replace older ones. Newer ones extend the capabilities and usability and are sometimes designed for specific reasons, such as database languages, rather than general use, but they don’t replace older languages. C, for example, is now 40+ years old but still in very wide use and Fortran is even older and yet because of the ENORMOUS amount of scientific subroutines that were created for it over decades, it is still in use in some areas of science.
Part 2 will include brief discussions of the following concepts which go beyond specific languages and are more about how languages are used by people and how their output code is used the machines themselves
• source code v.s. object code
• Assembly/machine language program self-modification
• relocatable vs absolute
• .com vs .exe
• linkers / shared libraries / DLLs
• command interpreters ()
• optimized compilers
• IDE features (object browser, class browser, etc.)
• intermediate code (e.g. Microsoft’s Common Intermediate Language)
• paged computers
• bit / byte / word
• structured and unstructured languages
• database languages (SQL, MySQL, Oracle, DB2, etc, and lightweights such as Access)
• OOP (C++ vs Java) etc.
• assembly embedded in C
• Von Neumann Machines v.s Harvard Architecture
• threaded programming (single processor and multiprocessor)
• code obfuscation