Interpreter vs Compiler: What Are the Differences and Benefits?

opus · Sep 3, 2018

I'm learning Python in my introductory Computer Science class and I'm going over high-level languages, more specifically interpreters vs. compilers.

To my understanding, when we write in a high-level language, which a computer does not understand, we write what's called source code or a source program. For the computer to understand this source code, it needs to be translated into machine code via an interpreter or compiler.

The main difference between an interpreter and a compiler, from what I can see, is that compilers add an extra step. That is,

Interpreters take the source code, translate it to machine code, and execute it immediately, one statement at a time.
Compilers take the source code and translate it to machine code, and from here I am confused. From what I can tell, after the source has been translated into machine code, an "executor" is used to run the program.

So my question is twofold. First, is my understanding of compilers correct? And second, what's the point of the extra step with compilers? What benefit does that have?

fresh_42 · Sep 3, 2018

opus said:

So my question is two fold. First, is my understanding of compilers correct?

Basically yes. But it does a bit more. In former times this was one or two individual additional steps, namely linking and binding all needed library code. Your source code usually uses standard tools such as random number generators, floating point arithmetic, formatting tools, date converters etc. All these have to be included in the executable file. Nowadays this is usually done in the background when you push the compile button in your IDE.

And second, what's the point of the extra step with compilers? What benefit does that have?

Interpreters are slow. They always read one step at a time, execute it and read the next. Also you must think for them, i.e. you must not forget the things I mentioned in the previous answer. Interpreters are stupid. A compiler does this for you and optimizes the machine code. You can also put in measures to catch exceptions such as divisions by zero, read errors and so on. An interpreter crashes.

Imagine I told you how to prove that a certain number is divisible by 9.
If you were an interpreter the code goes:
Set save to zero. Do while exists read digit. Add to save. End while. Do while save greater eight. Subtract nine from save. End while. If save equals zero print yes else print no.
Now for the compiler code:
Read number. If number modulo 9 equals zero print yes else print no.
This is a bit oversimplified, but it is essentially what's going on. The interpreter also has a modulo function, but it loads it on demand, whereas the compiler integrates it in the executable, which saves a lot of time. Reads from disk take longer than reads from memory by several orders of magnitude. So you can choose between time (interpreter) or memory (compiler).
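
As a rough illustration, the two recipes above can be written out in Python (the language from the original question). The function names are made up for this sketch, and both versions are ordinary Python here; the point is only that the long digit-by-digit recipe and the one-line modulo recipe give the same answer.

```python
def divisible_by_9_digit_sum(number: int) -> bool:
    """Long recipe: add up the digits, then subtract nine until less than nine is left."""
    save = 0
    for digit in str(abs(number)):   # "do while exists read digit"
        save += int(digit)           # "add to save"
    while save > 8:                  # "do while save greater eight"
        save -= 9                    # "subtract nine from save"
    return save == 0


def divisible_by_9_modulo(number: int) -> bool:
    """Short recipe: let the modulo operation do all the work."""
    return number % 9 == 0


if __name__ == "__main__":
    for candidate in (81, 82, 123456789):
        print(candidate, divisible_by_9_digit_sum(candidate), divisible_by_9_modulo(candidate))
```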

jtbell · Sep 3, 2018

opus said:

what's the point of the extra step with compilers? What benefit does that have?

The compiler saves the program's machine code in an "executable file". Then you tell your operating system to "execute" that file. The next time you run the program, it doesn't need to be compiled again. You simply execute the saved executable file.

[added] As fresh_42 noted while I was writing the previous paragraph, usually there's another step after compilation but before execution, namely "linking" your compiled code with previously-compiled code for standard operations like math functions and input/output. The details of how you do all this depend on your operating system and programming environment.

As an example, using the command line in a Unix-like OS, you might start by using a text editor to create a source program in C, in a file named myprog.c. Then you might compile it (but without linking it yet) using a command like

cc -c myprog.c -o myprog.o

which tells the compiler (cc) to store the compiled machine code in the file myprog.o. This is often called an "object file" (hence the ".o" in the file name). Then to link the file to the standard C library, you might use a command like

ld myprog.o cstdlib.o -o myprog

which produces an executable file named "myprog". (Unix geeks, please note that I'm doing this from memory and the exact name of the library is probably different! I just want to convey the flavor of the process.)

In Unix-like systems at least, you can usually combine the two steps into a single command like

cc myprog.c -o myprog

in which "cc" does both the compilation and linking steps, knows to link to the C standard library without your having to specify it explicitly, and doesn't bother to save an object file (myprog.o), only the final executable "myprog".

Tom.G · Sep 3, 2018

While I was typing this, I see several others have responded; anyhow here is a quick look at some of the internal 'dirty details' of Interpreter/Compiler differences.

An Interpreter contains all the code to execute every instruction in the language: add, multiply, search in a string, invert a matrix, etc. When you write and execute a program in the high level language, the interpreter looks at each high level instruction, then does a CALL to its own built-in code to do the processing. These built-in called routines then have to determine whether a variable is an Integer, Float, Complex, String, Matrix, etc.

A Compiler reads your source code and emits machine language for every high level instruction. This may be just a few machine language instructions for simple things like an Integer add, or several hundred machine instructions for more complex things like matrix inversion, etc.

Compilers also do Code Optimization before emitting the machine language. For instance if multiplying two numbers, you have already told the compiler if the numbers are Integers, Floats, Complex, etc. and it will emit only the code required for the operation (along with perhaps code to convert between the formats if needed.)
If you are computing an intermediate result in a calculation, a compiler can also look ahead in the code and see if the result is also used somewhere else. If that intermediate is used, say, only in the following line, the compiler realizes this, keeps the value in an internal CPU register, and uses it for the next calculation. This saves both a write and a read to memory, which are much slower than a CPU register access.

Overall, compiled code executes much faster than interpreted code. At program execution time, every time a variable is referenced the interpreter has to decide which format it is, whereas a compiler does this once at compile time, so the execution time is not impacted.
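
A rough Python sketch of that last point: the routine below stands in for one of the interpreter's built-in routines and has to work out the operand types on every single call. The name builtin_add and the returned labels are made up for illustration; real interpreters dispatch quite differently in detail.

```python
def builtin_add(left, right):
    """Hypothetical built-in routine an interpreter might CALL for the '+' instruction."""
    # The type checks below are repeated every time the instruction is executed.
    if isinstance(left, int) and isinstance(right, int):
        return ("integer add", left + right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return ("float add", float(left) + float(right))
    if isinstance(left, str) and isinstance(right, str):
        return ("string concatenation", left + right)
    raise TypeError("unsupported operand types for this sketch")


if __name__ == "__main__":
    print(builtin_add(2, 3))
    print(builtin_add(2, 0.5))
    print(builtin_add("ab", "cd"))
```

A compiler for a statically typed language can make the same decision once, at compile time, and emit only the integer add (or only the float add) for that spot in the program.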

Cheers,
Tom

FactChecker · Sep 3, 2018

To understand why a modern language like Python would be interpreted, remember that it can be used as a pre-written program but it can also be used where the code might change as it runs. Your Python program can build up a line of code as part of the program logic and then execute it. A lot of interpreted languages are used as the user interface. The user types in commands that get executed, and when he sees the results, he decides what he wants the next step to be. As time goes on, the user can automate his own decisions and make a program with larger and larger automated steps and fewer inputs from him.

But because the interpreter cannot look ahead, there is less optimization and pre-compiling that it can do. So interpreted languages are relatively slow.
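
A minimal sketch of "build up a line of code and then execute it", using Python's built-in exec. The helper name build_and_run is made up for illustration, and running text that comes from an untrusted user this way would be unsafe.

```python
def build_and_run(operation: str, left: int, right: int) -> int:
    """Assemble one line of Python source at run time, then execute it."""
    source_line = f"result = {left} {operation} {right}"  # the code is just a string so far
    namespace = {}
    exec(source_line, namespace)   # the interpreter translates and runs the new line now
    return namespace["result"]


if __name__ == "__main__":
    print(build_and_run("+", 2, 3))
    print(build_and_run("*", 4, 5))
```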

opus · Sep 3, 2018

fresh_42 said:

Basically yes. But it does a bit more. In former times this was one or two individual additional steps, namely link to and bind all needed library codes. Your source code usually uses standard tools as random generators, floating point arithmetic, formatting tools, date converter etc. All these have to be included into the executable file. Nowadays it is usually done in the background if you push the compile button in your IDE.

Interpreters are slow. They always read all steps at a time, execute them and read the next. Also you must think for them, i.e. you may not forget those things I told about in the previous step. Interpreters are stupid. A compiler does this for you and optimizes the machine code. You can also put in measurements to catch exceptions as divisions by zero, read errors and so on. An interpreter crashes.

Imagine I would tell you how to proof a certain number is divisible by 9.
If you were an interpreter the code goes:
Set save to zero. Do while exists read digit. Add to save. End while. Do while save greater eight. Subtract nine from save. End while. If save equals zero print yes else print no.
Now for the compiler code:
Read number. If number modulo 9 equals zero print yes else print no.
This is a bit over simplified, but it is what's essentially going on. The interpreter has also a modulo function, but it loads it on demand, whereas the compiler integrates it in the executable, which saves a lot of time. Reads from the disk take longer than reads from memory by several orders magnitude. So you can choose between time (interpreter) or memory (compiler).

So with a compiler, I have my source code, it gets compiled into machine code, and stored in a file.exe. In this file.exe, I have the option to "fine tune" the code, like with your divisibility test?

Say I have 5 lines of statements in my source code.
If I'm using an interpreter, and I want to run the program, are you saying that it reads line 1 -> translates line 1 into machine code -> executes line 1 -> reads line 2 -> translates line 2 into machine code -> executes line 2 -> reads line 3... whereas a compiler translates all 5 lines at once, then saves the program to an executable file, then I would then run that file. And once that file has been translated and executed, it doesn't need to be translated again?

opus · Sep 3, 2018

jtbell said:

The compiler saves the program's machine code in an "executable file". Then you tell your operating system to "execute" that file. The next time you run the program, it doesn't need to be compiled again. You simply execute the saved executable file.

[added] As fresh_42 noted while I was writing the previous paragraph, usually there's another step after compilation but before execution, namely "linking" your compiled code with previously-compiled code for standard operations like math functions and input/output. The details of how you do all this depends on your operating system and programming environment.

As an example, using the command line in a Unix-like OS, you might start by using a text editor to create a source program in C, in a file named myprog.c. Then you might compile it (but without linking it yet) using a command like

cc -c myprog.c -o myprog.o

which tells the compiler (cc) to store the compiled machine code in the file myprog.o. This is often called an "object file" (hence the ".o" in the file name). Then to link the file to the standard C library, you might use a command like

ld myprog.o cstdlib.o -o myprog

which produces an executable file named "myprog". (Unix geeks, please note that I'm doing this from memory and the exact name of the library is probably different! I just want to convey the flavor of the process.)

In Unix-like systems at least, you can usually combine the two steps into a single command like

cc myprog.c -o myprog

in which "cc" does both the compilation and linking steps, knows to link to the C standard library without your having to specify it explicitly, and doesn't bother to save an object file (myprog.o), only the final executable "myprog".

Ok, that "linking" rings a bell. I remember it being talked about in class but it wasn't in my book. I'm really weak on the jargon, but I do believe that makes sense.
So prior to discussing high-level language, assembly language was discussed. Does the translation from high-level to machine-level go through assembly, or does it skip it? Or is assembly the actual compiler/interpreter?

phinds · Sep 3, 2018

opus said:

If I'm using an interpreter, and I want to run the program, are you saying that it reads line 1 -> translates line 1 into machine code -> executes line 1 -> reads line 2 -> translates line 2 into machine code -> executes line 2 -> reads line 3... whereas a compiler translates all 5 lines at once, then saves the program to an executable file, then I would then run that file. And once that file has been translated and executed, it doesn't need to be translated again?

Yes, and where this REALLY matters is when, for example, the 5 lines are in a loop that gets executed 10,000 times. Interpreter interprets each line 10,000 times, compiler just once each.
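
A small Python sketch of the difference, as an analogy only: the built-in compile turns source text into CPython bytecode, not machine code, but it shows "translate once, run many times" against "translate on every pass". The loop body and function names are made up for illustration.

```python
LOOP_BODY = "total = total + step"   # one line of source, kept as text


def reinterpret_every_time(iterations: int) -> int:
    namespace = {"total": 0, "step": 2}
    for _ in range(iterations):
        exec(LOOP_BODY, namespace)   # the source text is parsed again on every pass
    return namespace["total"]


def translate_once(iterations: int) -> int:
    namespace = {"total": 0, "step": 2}
    code_object = compile(LOOP_BODY, "<loop body>", "exec")  # parsed a single time
    for _ in range(iterations):
        exec(code_object, namespace)  # only execution is repeated
    return namespace["total"]


if __name__ == "__main__":
    print(reinterpret_every_time(10_000), translate_once(10_000))
```

Both functions compute the same total; the second one simply skips 9,999 repeated translations of the same line.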

https://www.physicsforums.com/insights/computer-language-primer-part-1/

opus · Sep 3, 2018

Tom.G said:

An Interpreter contains all the code to execute every instruction in the language, add, multiply, search in a string, invert a matrix, etc. When you write and execute a program in the high level language, the interpreter looks at each high level instruction then does a CALL to its own built-in code to do the processing. These built-in called routines then have to determine whether a variable is an Integer, Float, Complex, String, Matrix, etc

So if an interpreter contains this code, then am I to assume that the compiler does not? And this is when we add the extra code into the file by linking?

phinds · Sep 3, 2018

opus said:

So if an interpreter contains this code, then am I to assume that the compiler does not? And this is when we add the extra code into the file by linking?

Such source code is the INPUT to the compiler but it does not exist at run-time because the compiler has generated the executable code and doesn't need the source code any longer. See the link in my previous post.

FactChecker · Sep 3, 2018

Assembly code is a standard language for a particular computer. By converting any language to assembly code, an optimizer, linker, etc. can work on the assembly code without worrying what language it all started in. The linking process allows you to take advantage of huge utility libraries that handle details that you do not need to know about. There will be standard libraries for controlling all the connected hardware, communications, math functions, etc. Those get linked into your program.

opus · Sep 3, 2018

FactChecker said:

To understand why a modern language like Python would be interpreted, remember that it can be used as a pre-written program but it can also be used where the code might change as it runs. Your Python program can build up a line of code as part of the program logic and then execute it. A lot of interpreted languages are used as the user interface. The user types in commands that get executed and when he sees the results, he decides what he wants the next step to be. As time goes on, the user can automate his own decisions and make a program with larger and larger automated steps and less inputs from him.

But because the interpreted language can not look ahead, there is less optimization and pre-compiling that it can do. So they are relatively slow.

So then Python, which gets interpreted rather than compiled, is more "edit as you go" friendly? I know with Sublime Text and Python, I can just run a program and see if it works (although I've only done print statements).
With a compiler, can I not run the program as I go to see if it works? That is, to see if it works, I have to save as a file.exe, then run it?

phinds · Sep 3, 2018

opus said:

So then an Python, which get's interpreted not compiled, is more "edit as you go" friendly? I know with Sublime Text and Python, I can just run a program and see if it works (although I've only done print statements).
With a compiler, can I not run the program as I go to see if it works? That is, to see if it works, I have to save as a file.exe, then run it?

Yes. AGAIN, I recommend the link I provided.

jtbell · Sep 3, 2018

opus said:

Does the translation from high-level to machine-level go through assembly, or does it skip it?

It can go either way. Compiling straight to machine code is faster. However, the GNU compilers (maybe others, too) for Unix-like systems compile first to assembly code, which is then assembled ("compiled") to machine language. I suppose this makes it easier to manage compilers for multiple languages (C, C++, Fortran, etc.) for many different machine (chip) architectures. Compilers for different languages can share the same assembler, for a given architecture.

opus · Sep 3, 2018

Thanks for the responses all. Having a hard time responding to all of them properly. I'm going to digest this and follow the link given by phinds and will report back with any questions.
I really appreciate the responses.

FactChecker · Sep 3, 2018

Typically a compiled program is compiled and linked into an executable before it runs. You can use a debugger to step through the execution one step at a time to examine in detail what it is doing.

The lines between a compiled language and an interpreted language are not as clear cut as we may have been implying. An interpreter that meets a loop to 100 over a few lines that do not change can interpret those lines once and loop through them without reinterpreting. But it must also be able to handle code that it cannot anticipate. That is why it must always keep access to any libraries that it might possibly need. Both approaches are very well developed and minimize their disadvantages as much as possible.

fresh_42 · Sep 4, 2018

opus said:

So with a compiler, I have my source code, it get's compiled into machine code, and stored into a file.exe. In this file.exe, I have the option to "fine tune" the code, like with your divisibility test?

No, that has to be part of your code. But an interpreter divides whenever the line says so. So if the division cannot be done, it crashes. E.g. in C++ you can catch this crash and decide what to do instead (pre-defined via coding).

Say I have 5 lines of statements in my source code.
If I'm using an interpreter, and I want to run the program, are you saying that it reads line 1 -> translates line 1 into machine code -> executes line 1 -> reads line 2 -> translates line 2 into machine code -> executes line 2 -> reads line 3... whereas a compiler translates all 5 lines at once, then saves the program to an executable file, then I would then run that file. And once that file has been translated and executed, it doesn't need to be translated again?

Yes, although "at once" is a bit optimistic. A compiler also has to read line by line, but it organizes its memory allocations, libraries and so on by itself. And you can take the executable file away and run it on another computer. For an interpreter, you will have to provide its environment and install it, too.

fresh_42 · Sep 4, 2018

Funny paradox: A browser is a compiled executable whose job is to interpret HTML code.

DrClaude · Sep 4, 2018

phinds said:

Yes, and where this REALLY matters is when, for example, the 5 lines are in a loop that gets executed 10,000 times. Interpreter interprets each line 10,000 times, compiler just once each.

There is another possibility with interpreted languages, which is to use Just-In-Time (JIT) compilation, where the interpreter will recognise that the code inside the loop will be repeated multiple times, so it will compile that part of code and execute the compiled version.

The PyPy implementation of Python uses JIT.

phinds · Sep 4, 2018

DrClaude said:

There is another possibility with interpreted languages, which is to use Just-In-Time (JIT) compilation, where the interpreter will recognise that the code inside the loop will be repeated multiple times, so it will compile that part of code and execute the compiled version.

True, of course. I was using the original, basic (pun intended), definition of interpreter.

newjerseyrunner · Sep 4, 2018

A major advantage of compiled code is that compilation doesn't happen on the user's machine, so there is no reason that the compilation itself has to be fast. If you are both interpreting and running, both have to be fast. A compiler can take hours to fully compile a program. Most of what it's doing cannot be done in real time. Translating from the language grammar into machine language is easy; it'll take the compiler a minute to do that. But then it can start analyzing the code, and because nothing is waiting for it, it can do really deep analysis and apply optimizations that you'd never think of. It'll also do things like placing code that runs together next to each other to take advantage of CPU-level caching. The last step I usually have before releasing anything is compiling with -O3 --expensive-optimizations and a few other options that take forever to complete.

harborsparrow · Oct 2, 2018

It's approximately like this.

COMPILER:
1. sometime before run time, whole program converted from source code into intermediate language (machine code with missing addresses to subroutines)
2. at run time, the whole program is loaded by a dynamic linker, machine code addresses patched "in place" so can jump to other code it needs
3. at run time, the machine code of the entire pre-translated program executes without needing to pause for more translation or linking

INTERPRETER:
* at run time, the current line of code is converted from source code into intermediate language (machine code with missing addresses to subroutines)
* at run time, the current line of code is loaded (added on to running program) by a dynamic linker, machine code addresses patched "in place" so can jump to other code it needs
* during run time, the above two steps repeat until every line of code that is going to run during this program episode has been translated

So, usually, interpreted languages execute more slowly than compiled ones. However, the speed of modern processors makes this increasingly irrelevant. Nevertheless, there are some subtle differences between interpreted and compiled languages, having to do with the fact that interpreters don't know everything they need to know ahead of time.

harborsparrow · Oct 2, 2018

DrClaude said:

There is another possibility with interpreted languages, which is to use Just-In-Time (JIT) compilation, where the interpreter will recognise that the code inside the loop will be repeated multiple times, so it will compile that part of code and execute the compiled version.

The PyPy implementation of python uses JIT.

JIT compilation is usually a feature of languages that run in a virtual machine environment. The matter of compiling, linking and loading a program then becomes much more complex, because compiling, linking and loading happen at two distinct levels (the virtual machine, and the underlying hardware too). Just pointing out that it's not a simple or straightforward matter, and bringing up JIT in this answer probably opens more questions than it answers.

FactChecker · Oct 2, 2018

harborsparrow said:

So, usually, interpretered languages execute more slowly than compiled ones. However, the speed of modern processors make this increasingly irrelevant.

This is very dependent on the situation. In my entire career in the aerospace industry, I don't think that there was one time where the hardware limitations did not eventually become the primary constraint. Whereas software always grows, there is a multitude of reasons that the hardware capability may not grow to match.

.Scott · Oct 4, 2018

opus said:

If I'm using an interpreter, and I want to run the program, are you saying that it reads line 1 -> translates line 1 into machine code -> executes line 1 -> reads line 2 -> translates line 2 into machine code -> executes line 2 -> reads line 3... whereas a compiler translates all 5 lines at once, then saves the program to an executable file, then I would then run that file. And once that file has been translated and executed, it doesn't need to be translated again?

There are lots of different ways to implement an interpreter, but what you describe is not the most common.

Let's say you have a line of code: z = zmax*sin(t)

Both the compiler and the interpreter will start by parsing that statement into tokens and then identifying the operations that will be needed. So in this case, we need to:
1) identify an existing variable "t" and an existing function "sin".
2) pass the content of "t" as the parameter to a call to "sin".
3) identify an existing variable "zmax".
4) multiply the content of "zmax" with the value returned by "sin".
5) take the product of that multiplication and store it as the content of a variable "z".

In the case of an interpreter, once it recognizes what must be done, it simply does it. It does not need to generate any machine language. The interpreter itself is a program that has been compiled into machine language and it is that machine language that is used to execute the operations.

Even in cases where the interpreter optimizes the code before running it, that optimization is usually only for parsing the source into a list of operations (coded in binary) that must be completed. Thus it does not have to reparse the statements in a loop for every iteration of the loop.
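
Here is a toy sketch of that idea in Python. It borrows Python's own parser (the standard ast module) to turn the statement into a tree of operations, then simply performs each operation; no machine language is generated anywhere. The function names and the dictionary-based variable store are assumptions made for this sketch, and it only handles the pieces this one statement needs.

```python
import ast
import math


def evaluate(node, variables, functions):
    """Perform the operation described by one node of the parsed statement."""
    if isinstance(node, ast.Name):                       # look up an existing variable
        return variables[node.id]
    if isinstance(node, ast.Call):                       # call an existing function
        arguments = [evaluate(arg, variables, functions) for arg in node.args]
        return functions[node.func.id](*arguments)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
        return (evaluate(node.left, variables, functions)
                * evaluate(node.right, variables, functions))
    raise NotImplementedError(ast.dump(node))


def run_statement(source, variables, functions):
    """Parse one assignment statement and carry it out immediately."""
    statement = ast.parse(source).body[0]                # tokens -> tree of operations
    target_name = statement.targets[0].id
    variables[target_name] = evaluate(statement.value, variables, functions)


if __name__ == "__main__":
    variables = {"zmax": 2.0, "t": math.pi / 2}
    run_statement("z = zmax*sin(t)", variables, {"sin": math.sin})
    print(variables["z"])
```

The multiplication and the call to sin are carried out by code that already exists inside the running program, which is the sense in which the interpreter's own machine language does the work.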

In contrast, the compiler needs to perform operations similar to this:
1) generate machine language code that will move the content of variable "t" onto the stack.
2) generate object code that will tell the linker that function "sin" will be needed. During the link phase, the machine language for "sin" will then be copied into executable output.
3) generate machine language code to call the "sin" function. The actual machine code generated is "relocatable", since the exact memory address of the "sin" function will not be known until link time.
4) Generate machine language code that will load the content of "zmax" into a register.
5) Generate machine language code to multiply the content of the register holding the return value of "sin" with the content of the register holding "zmax".
6) Generate machine language code to move the product of the multiplication into the memory address assigned to "z".

The final result of this is a relocatable binary that can be used by the linker to create a full executable.
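
For contrast, a toy "compiler" sketch in Python for the same statement. It executes nothing; it only emits a list of instructions for later. LOAD, CALL, MUL and STORE are made-up mnemonics for an imaginary stack machine, not real machine language, and the order of the emitted steps differs slightly from the list above.

```python
import ast


def emit(node, instructions):
    """Append the instructions that would compute this node's value."""
    if isinstance(node, ast.Name):
        instructions.append(f"LOAD {node.id}")
    elif isinstance(node, ast.Call):
        for argument in node.args:
            emit(argument, instructions)
        instructions.append(f"CALL {node.func.id}")   # address would be filled in by a linker
    elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
        emit(node.left, instructions)
        emit(node.right, instructions)
        instructions.append("MUL")
    else:
        raise NotImplementedError(ast.dump(node))


def compile_statement(source):
    """Translate one assignment statement into a list of instructions."""
    statement = ast.parse(source).body[0]
    instructions = []
    emit(statement.value, instructions)
    instructions.append(f"STORE {statement.targets[0].id}")
    return instructions


if __name__ == "__main__":
    for line in compile_statement("z = zmax*sin(t)"):
        print(line)
```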

There are endless variations on this, based on the language elements, the specific implementation of the interpreter or compiler, and the capabilities of the computer.

Rive · Oct 4, 2018

harborsparrow said:

So, usually, interpretered languages execute more slowly than compiled ones. However, the speed of modern processors make this increasingly irrelevant.

That supposed irrelevance... I wonder why I have the feeling that the main reason I keep buying better and better computers is to satisfy the hunger of the newest layer between me and the resources...

FactChecker said:

In my entire career in the aerospace industry, I don't think that there was one time where the hardware limitations did not eventually become the primary constraint.

My career feels like one long fight with management about hardware resources.

Management: these are the estimated requirements, pick the cheapest controller that fits.
Me: we will need at least 100% reserve both in computing power and memory.
Management: forget it.
Me: ***beeep***

Management, later on: why is this not running? You should have picked something stronger!
Me: ***beeep***

JayS0 · Oct 4, 2018

interpreted code runs more slowly because it is converted to machine code as it runs
but interpreted code is good for proto-typing, when you want to make lots of changes and test quickly

compiled code runs faster because it has all been converted to machine code already, and optimised for faster running
but the compilation step takes extra time beforehand, for large applications the 'build' process can take hours.

if i were to write software for a space probe or safety-critical system i would compile it, because compilers catch all syntax errors before the program ever runs.
i wouldn't want the software to crash with a syntax error while it is on the surface of Mars, 40 million miles away from a repair engineer.
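
A small Python sketch of catching syntax errors ahead of time. It relies on the built-in compile, which translates source text without running it; the helper name has_valid_syntax is made up. Worth noting, as far as CPython goes: a whole file is checked for syntax before any of it runs, so what shows up only at run time are other mistakes, such as a misspelled variable name.

```python
def has_valid_syntax(source: str) -> bool:
    """Translate the source without running it and report whether the syntax is valid."""
    try:
        compile(source, "<probe>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(has_valid_syntax("x = 1\nprint(x)"))        # well-formed program
    print(has_valid_syntax("x = = 1"))                # syntax error, found before running
    print(has_valid_syntax("print(undefined_name)"))  # valid syntax; fails only when run
```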

FactChecker · Oct 4, 2018

Rive said:

My career feels like about fighting with the management about hardware resources.

Management: these are the estimated requirements, pick the cheapest controller what fits.
Me: we will need at least 100% reserve both in computing power and memory.
Management: forget it.
Me: ***beeep***

Management, later on: why this is not running, you should have picked something stronger!
Me: ***beeep***

Or how about
Management: We have budget for 5 programmers on this. We can buy new hardware, but then we will only have budget for 4 programmers and will have to lay off one. Who should it be?
Me: ----- Ok. We'll try to make it work with the hardware we have.

.Scott · Oct 4, 2018

JayS0 said:

interpreted code runs more slowly because it is converted to machine code as it runs
but interpreted code is good for proto-typing, when you want to make lots of changes and test quickly

As I said earlier, it is not common for interpreters to convert the code to machine language. Much more common is simply for the interpreter to parse the code and then perform the operations it has discovered.

Another method is to fully or partially parse the code as it is entered and store that parsed result as source. Then you need to re-encode the parsed result to display the source as the developer expects it. This strategy is used in Forth and the early Basic interpreters.

If an interpreter is actually compiling the code into machine language just before running it, should it really be called an interpreter?

phinds · Oct 4, 2018

JayS0 said:

interpreted code runs more slowly because it is converted to machine code as it runs ...

No, as was just explained by Scott in a previous post, it is NOT converted to machine code, it is "executed" by the interpreter.

EDIT: oops. I see I'm late w/ my response.
