# Difference between interpreted and compiled files...

Hello,

I am sort of clear on the difference between compiled and interpreted computer languages.

Compilation
A program (source code), for example written in C, is essentially text that is converted by a compiler (a software program) into binary machine code (1s and 0s) for the microprocessor to execute. Different computers have different operating systems and microprocessor architectures. This means that, while the source code is the same, the compiler and the compiler's output (binary code) are machine dependent. A programmer who wants to distribute C code has two options. The first is to compile the code into binary code and share that. The generated .exe file can only run on the same type of machine that the programmer has; if the recipient's machine is different, the .exe file will not work. The second is to share the source code directly with the recipient, who must then download the appropriate compiler for their machine and compile the C source code into binary code that works on that machine.

Interpretation
Interpreted Language (Python): The first step consists of converting the source code into intermediary code (bytecode). I think the bytecode is computer hardware independent. Bytecode must then be converted by a Python virtual machine (software) into the actual binary machine code. I think that different machines (with different operating systems and processors) would require different types of VMs to perform the final conversion. Is that correct?
So the recipient of a file that was written with an interpreted language like Python still needs to download on their machine the VM/

In both the compilation and interpretation cases, the recipient of the file needs to download a piece of software (either a VM or a compiler) to use the file. And both the VM and the compiler are machine dependent, I believe. So how is Python more portable if the user still needs machine-specific software, the VM? It seems to be the same issue that we have with a machine-specific compiler...

Thanks!!

## Answers and Replies

hmmm27
Gold Member
Ignore "VM" and "bytecode" for the time being. Also "download", and different machine types.

First off, C and Python are languages, not compilers nor interpreters.

Both compilers and interpreters are available for both languages. These were produced by people who aren't the programmers who write the source code (i.e. you and me), nor are they (generally) the people who made up the languages in the first place.

- a compiler program takes a source-code file and produces an object code file - a program - which is then run independently of both the compiler and the source code.

- an interpreter program takes the source-code and runs it directly ; it's producing object code and feeding that to the computer on the fly.

PeterDonis
Mentor
A program (source code), for example written in C, is essentially text that is converted by a compiler (software program) into binary machine code (1s and 0s) for the microprocessor to execute.
Yes.

The first step consists of converting the source code into intermediary code (bytecode). I think the bytecode is computer hardware independent.
Yes. Note that this step is also called "compiling" the source code; the Python interpreter program contains a compiler that compiles Python source code into Python byte code.

Bytecode must then be converted by a Python virtual machine (software) into the actual binary machine code.
No. The Python interpreter, which is the program that contains the Python virtual machine, doesn't convert byte code into machine code. It just executes the byte code. In other words, the byte code is "machine" code--it's just machine code for the Python virtual machine, instead of for some physical machine.
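The two steps described in this reply (compile to byte code, then execute the byte code) can be observed from inside CPython itself. The following is only a minimal sketch using the standard `compile`, `dis` and `exec` facilities; the variable names and the `<example>` file label are made up for illustration, and the exact instructions that `dis` prints differ from one Python version to another.

```python
import dis

# Source code is just text as far as the interpreter is concerned.
source = "a = 6\nb = 7\nresult = a * b\n"

# Step 1: the interpreter's built-in compiler turns the text into a code object.
code = compile(source, "<example>", "exec")

# The raw byte code is a bytes object; its contents depend on the Python version.
print(type(code.co_code), len(code.co_code))

# dis prints a human-readable listing of the byte code instructions.
dis.dis(code)

# Step 2: the virtual machine executes the code object.
namespace = {}
exec(code, namespace)
print(namespace["result"])
```

Running this on different operating systems with the same Python version should give the same listing, which is the sense in which the byte code is hardware independent.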

PeterDonis
Mentor
a compiler takes a source-code file and produces an object code file, which is then run independently of both the compiler and the source code.
This describes how a C compiler works, but, as I noted in my response to the OP just now, the Python program also contains a compiler; it just compiles to Python byte code instead of object code (or more precisely, to Python byte code, which is "object code" for the Python virtual machine, instead of object code for some particular physical machine).

an interpreter takes the source-code and runs it directly ; it's producing object code and feeding it to the computer on the fly.
As I noted in my response to the OP just now, this is not what the Python interpreter does. It compiles source code to Python byte code on the fly (as you enter it at the interactive prompt), and then executes the byte code. It does not further convert byte code into machine code. (Of course the Python interpreter itself is machine code, but that machine code is not derived from your Python source code and knows nothing about it.)

Yes.

Yes. Note that this step is also called "compiling" the source code; the Python interpreter program contains a compiler that compiles Python source code into Python byte code.

No. The Python interpreter, which is the program that contains the Python virtual machine, doesn't convert byte code into machine code. It just executes the byte code. In other words, the byte code is "machine" code--it's just machine code for the Python virtual machine, instead of for some physical machine.
And is a Python virtual machine independent of the computer architecture? When we download Python we make sure we download the correct version for our operating system.
For example, if we sent the Python source code to Linux, Mac OS, and Windows users, would each user need a different, platform-specific VM for the bytecode to run? I thought that the idea of portability, which applies to Python, was the same as platform independence, but maybe I am not fully clear on what platform independence truly means. It does not sound very portable if every user needs to worry about downloading a VM specific to their system.

PeterDonis
Mentor
is a Python virtual machine independent of the computer architecture?
Yes.

When we download Python we make sure we download the correct version for our operating system.
The program that implements the virtual machine will have different machine code for different operating systems or different architectures. But all of those programs implement the same virtual machine. You can run the same Python byte code on all of them.

it does not sound very portable if every user needs to worry about downloading a VM specific to their system.
No program is "portable" by this criterion; different operating systems and different architectures always require different machine code.
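To make the "same byte code on every VM program" point concrete, here is a rough sketch that serializes a compiled code object to bytes and then loads and runs it again, as if it had been shipped to another machine. It uses the standard `marshal` module; the helper name `run_shipped_bytecode` and the sample source are hypothetical. One caveat worth stating: the serialized form is independent of machine architecture, but it is tied to the Python version, so sender and receiver are assumed to run the same version of CPython.

```python
import marshal

def run_shipped_bytecode(blob):
    """Load a serialized code object and execute it in a fresh namespace."""
    code = marshal.loads(blob)
    namespace = {}
    exec(code, namespace)
    return namespace

# "Sender" side: compile source text and serialize the resulting code object.
source = "def area(w, h):\n    return w * h\n\nanswer = area(3, 4)\n"
blob = marshal.dumps(compile(source, "<shipped>", "exec"))

# "Receiver" side: only the bytes are needed, not the source text.
ns = run_shipped_bytecode(blob)
print(ns["answer"])
```

This is roughly what happens with cached `.pyc` files as well: the interpreter program differs per platform, while the byte code it loads does not.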

sysprog
I think the key focus is what a virtual machine language is by definition. Some VMs have just-in-time compilers or hotspot compilers or even ahead-of-time compilers that attempt to optimize execution of code by literally translating the language into machine code at some point prior to or during execution. But keep in mind that some languages don't even have "byte codes" or abstract VMs in the modern sense, e.g. shell scripts or BASIC (eww... well, vb.net).

If we consider Java, for example, different platforms will require a different, machine-specific JVM.
I understand that a java program is converted to bytecode which is handled by a JVM as if it was an actual physical OS/CPU...but is in reality an abstract OS/CPU. Of course, the JVM eventually must generate a machine binary code since the CPU will have to do the job...

The bytecode is a neutral file and can be processed in exactly the same way by a Linux specific JVM, a microsoft JVM, a mac JVM....as PeterDonis mentioned, the VM is OS/CPU dependent. The bytecode may be neutral and platform independent but the VM is not....

A compiler, like the VM, is OS/CPU dependent. So I don't see the advantage and the portability benefit deriving from using a VM if different platforms still need to run different and specific VM.
In the case of a C source code, we could have OS/CPU specific C compilers on different machines and be able to run the same C source code without a problem. That seems exactly what happens with java.

The only difference seems that in the case of compilation, all the source code is compiled at once while in the case of java the bytecode is executed JIT, on the fly.

Perfect independence and portability would be writing source code in Java and being able to run it on any platform (OS/CPU) without a VM, because a VM would be platform dependent....

PeterDonis
Mentor
If we consider Java, for example, different platforms will require a different, machine-specific JVM.
The program that runs Java byte code is platform specific, yes, just like the program that runs Python byte code. But that's only because different platforms require different machine code to accomplish the same semantic task.

Of course, the JVM eventually must generate a machine binary code
No, it doesn't. The Java virtual machine, like the Python virtual machine, does not translate byte code into machine code. It just executes the byte code. There is of course machine code in the Java or Python program that executes the byte code, but none of that machine code is a translation of the byte code that is being executed.

I don't see the advantage and the portability benefit deriving from using a VM
The VM is one program, that gets compiled once for each target platform. Then the same byte code for each individual application can be run on all the target platforms.

By contrast, with a compiled language like C, a separate program has to be compiled for each target platform for each individual application. So there's a lot more compilation involved, and there are a lot more platform-specific quirks and hacks and oddities that have to be done, since they have to be done for each individual application for each platform, instead of just for one VM for each platform.

In the case of a C source code, we could have OS/CPU specific C compilers on different machines and be able to run the same C source code without a problem. That seems exactly what happens with java.
You are drastically underestimating the difficulty of running "the same C source code" on different platforms. Yes, back when C was first invented, it was a big improvement over assembly language because, compared to that, C back then was much more portable; there weren't that many platforms and the language was still pretty simple, so you could actually come close to having the exact same source code compile without errors on every target platform.

That is light years away from being the case now. Look at the source code for any non-trivial C program and, if it actually can be compiled on multiple platforms (which many C programs cannot be, they were written for one platform and have only been used on that platform), you will find that it's full of macros and preprocessor directives and other quirks and hacks that change what source code gets compiled depending on what platform you're compiling for. And then there's the huge mess of autoconf/autotools, because even the C compiler and preprocessor can't detect all the platform-specific information that needs to be detected to customize the source code for each platform, so you have another layer of tools that generate files that help the compiler and preprocessor to do that. I could go on and on.

For applications programmed in C, every individual application has to go through the whole rigmarole above. But for applications programmed in Java or Python, only one program per platform, the Java or Python VM program, has to. The rest of the programs don't; they're the same source code, the same byte code, for every platform. That's a huge reduction in complexity, and also in sources of bugs.

Perfect independence and portability
As you define it, does not exist, and probably never will exist. The question is not whether or not you can have "perfect" portability, but where you want the inevitable platform-specific quirks and hacks to be. Do you want them in every single application, or in just the VM programs?
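As a small illustration of where the platform-specific parts end up in the Python case, the sketch below asks the interpreter and its standard library for a few things that differ between operating systems. The function names `describe_platform` and `scratch_file_path` are invented for this example; the point is only that the application code contains no per-platform branches, because those live inside the interpreter and library.

```python
import os
import sys
import tempfile

def describe_platform():
    """Collect a few values that differ by platform; the values come from the interpreter."""
    return {
        "platform": sys.platform,       # e.g. a name for Linux, Windows or macOS
        "path_separator": os.sep,       # differs between POSIX and Windows
        "line_ending": repr(os.linesep),
    }

def scratch_file_path(name):
    """Build a temp-file path without hard-coding any platform's conventions."""
    return os.path.join(tempfile.gettempdir(), name)

print(describe_platform())
print(scratch_file_path("notes.txt"))
```

The same source file runs unchanged everywhere; only the printed values change, because the per-platform decisions were made once, when the interpreter was built.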

sysprog
As @PeterDonis has ably explained, making a portable C compiler is fraught with difficulty that is related to the portability (it's of course exceedingly difficult to write a C compiler in the first place). For an overview of an early C compiler designed to be portable, you can review https://en.wikipedia.org/wiki/Portable_C_Compiler.

As @PeterDonis has ably explained, making a portable C compiler is fraught with difficulty that is related to the portability (it's of course exceedingly difficult to write a C compiler in the first place). For an overview of an early C compiler designed to be portable, you can review https://en.wikipedia.org/wiki/Portable_C_Compiler.
I'll slightly disagree here, and argue in the light of the original question... The C (and C++) standards have come a very long way in the last two decades, both very much defining a compute machine in a very abstract, but well defined and logically consistent manner. So to me, the implementation of a compiler that meets the standard properly is no different than implementing a virtual machine for any given programming language interpreter. If the design is well specified, there is still an onus to implement the standard error free, whether compiler or VM. Both can be done incorrectly.

PeterDonis
Mentor
The C (and C++) standards have come a very long way in the last two decades, both very much defining a compute machine in a very abstract, but well defined and logically consistent manner.
While I agree that the standards have advanced a lot, I'm not sure how much that has actually translated into fewer platform-specific quirks and hacks in the source code of actual projects. I'm fairly familiar with the source code for the CPython interpreter, for example, and I still see plenty of #ifdefs in there that pick one block of source code for POSIX systems and a different one for Windows, for example.

The fundamental problem is that the different platforms are different platforms, built on different design principles and with no pretense at compatibility or commonality with any of the others. Any application that is going to try to run on multiple platforms has to deal with that at some level. The question is simply, as I put it before, whether you want to have to deal with that in every application, or only in the small number of VM programs.

sysprog
While I agree that the standards have advanced a lot, I'm not sure how much that has actually translated into fewer platform-specific quirks and hacks in the source code of actual projects. I'm fairly familiar with the source code for the CPython interpreter, for example, and I still see plenty of #ifdefs in there that pick one block of source code for POSIX systems and a different one for Windows, for example.

The fundamental problem is that the different platforms are different platforms, built on different design principles and with no pretense at compatibility or commonality with any of the others. Any application that is going to try to run on multiple platforms has to deal with that at some level. The question is simply, as I put it before, whether you want to have to deal with that in every application, or only in the small number of VM programs.
That's a fair point. But modern compilers basically first compile to an intermediate (aka virtual machine) abstract representation, and then have a back-end machine language compiler and optimizer. Much of the #ifdef stuff is either based on architecture word size (largely solved), basic C library support (getting better, but you have libc, Windows, Mac, and BSD), or real low-level OS stuff (which normally should be handled by the C library).

I'll slightly disagree here, and argue in the light of the original question... The C (and C++) standards have come a very long way in the last two decades, both very much defining a compute machine in a very abstract, but well defined and logically consistent manner. So to me, the implementation of a compiler that meets the standard properly is no different than implementing a virtual machine for any given programming language interpreter. If the design is well specified, there is still an onus to implement the standard error free, whether compiler or VM. Both can be done incorrectly.
I wasn't saying that it's very difficult to e.g. install the GNU C++ compiler on an OpenBSD system, and the Visual Studio C++ compiler on a Windows computer, but that doesn't make for a portable C++: the library functions, error handling, etc. will be different. If you look at the Wikipedia article on the Portable C Compiler that I referenced, you'll see some of the difficulties.

Mark44
Mentor
No, it doesn't. The Java virtual machine, like the Python virtual machine, does not translate byte code into machine code. It just executes the byte code.

There is of course machine code in the Java or Python program that executes the byte code, but none of that machine code is a translation of the byte code that is being executed.
I'm trying to parse the sentences above. I get that the Python virtual machine knows what to do with, say BINARY_MULTIPLY (a Python bytecode instruction), but it seems to me that at some point this byte code has to be translated into a multiplication of two numbers with the result stored somewhere, perhaps as something like mul ax, bx (in Intel assembly).
Am I missing something of what you wrote?

Jonathan Scott
Gold Member
I'm trying to parse the sentences above. I get that the Python virtual machine knows what to do with, say BINARY_MULTIPLY (a Python bytecode instruction), but it seems to me that at some point this byte code has to be translated into a multiplication of two numbers with the result stored somewhere, perhaps as something like mul ax, bx (in Intel assembly).
Am I missing something of what you wrote?
The virtual machine program keeps the virtual machine storage and registers in its own variables, and it interprets the byte code dynamically. For example, if it sees a multiplication operation it loads the specified source operands into virtual machine variables, multiplies them and stores the result into the specified target operand. When the virtual machine program is compiled, that multiplication operation will normally be converted to native machine code.
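The description above can be sketched as a toy stack machine. This is not CPython's actual design or instruction set; the opcode names (`PUSH`, `LOAD`, `STORE`, `MUL`, `ADD`) and the `run` function are invented purely to show the idea that the VM program keeps the virtual machine's state in its own variables and carries out each operation with ordinary code of its own.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []        # the virtual machine's operand stack
    variables = {}    # the virtual machine's storage
    pc = 0            # program counter into the instruction list

    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            # The multiplication is ordinary code that was already part of the VM.
            stack.append(left * right)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables

# x = 6 * 7; y = x + 1
program = [
    ("PUSH", 6), ("PUSH", 7), ("MUL", None), ("STORE", "x"),
    ("LOAD", "x"), ("PUSH", 1), ("ADD", None), ("STORE", "y"),
]
print(run(program))
```

Nothing in `run` generates new machine code for the program it is given; it only selects among branches that already exist in the interpreter.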

PeterDonis
Mentor
I get that the Python virtual machine knows what to do with, say BINARY_MULTIPLY (a Python bytecode instruction), but it seems to me that at some point this byte code has to be translated into a multiplication of two numbers
When the Python virtual machine sees the byte code BINARY_MULTIPLY, it calls a function inside the interpreter with particular arguments. In the Python virtual machine, the arguments are on the virtual machine's stack, so the function that gets called for the BINARY_MULTIPLY byte code pops those arguments off the stack. Those arguments are Python objects, so the function that gets called then has to figure out what kind of objects they are and what "multiply" means for those objects (which might involve extracting methods from object instances), and then doing whatever "multiply" means (which might involve calling methods it's extracted from object instances). Yes, if the two objects are Python integers, say, then eventually there will be an integer multiplication done inside the function; but that integer multiplication is not constructed on the fly in machine code based on "translating" the BINARY_MULTIPLY byte code. It's code that's already there on one particular code path of the function inside the interpreter that gets called for a BINARY_MULTIPLY byte code. And if the objects passed to the function are arbitrary Python objects that implement multiplication methods, what BINARY_MULTIPLY means for them might not be anything like an ordinary multiplication of two numbers.
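The type-dependent behaviour described here is easy to observe from Python code. In the sketch below, the class `Vector2` and the function `multiply` are hypothetical names chosen for the example; the same `*` in `multiply` ends up doing integer multiplication, string repetition, or a user-defined method call, depending on the objects it receives at run time.

```python
class Vector2:
    """A small 2D vector whose '*' means dot product or scaling."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __mul__(self, other):
        if isinstance(other, Vector2):
            # Vector * Vector: dot product, which returns a plain number.
            return self.x * other.x + self.y * other.y
        # Vector * number: scale each component.
        return Vector2(self.x * other, self.y * other)

def multiply(a, b):
    # One multiplication in the source; its meaning is decided by the operand types.
    return a * b

print(multiply(6, 7))                          # integer multiplication
print(multiply("ab", 3))                       # string repetition
print(multiply(Vector2(1, 2), Vector2(3, 4)))  # dot product via __mul__
```

The byte code for `multiply` is the same in all three calls, which is why it cannot simply be "translated" into one fixed machine multiply instruction.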

Mark44
Mentor
Yes, if the two objects are Python integers, say, then eventually there will be an integer multiplication done inside the function; but that integer multiplication is not constructed on the fly in machine code based on "translating" the BINARY_MULTIPLY byte code. It's code that's already there on one particular code path of the function inside the interpreter that gets called for a BINARY_MULTIPLY byte code.
Limiting the discussion at the moment to relatively simple objects like integers and floats, am I correct in thinking that inside the interpreter there is actual machine code for doing the multiplication, either integer multiplication or floating point? I.e., the code has to drop down to the machine level at some point, based on the relevant architecture and OS.

Jonathan Scott
Gold Member
An interpreter program does not normally need to know anything about the machine code of the machine on which it runs. It simply uses the operations in the language in which it is written to execute the processes described by the interpreted code, in the order determined by the control flow of the interpreted program. If the program loops, the interpreter ends up interpreting the same code again (using new values of the relevant variables).

PeterDonis
Mentor
Limiting the discussion at the moment to relatively simple objects like integers and floats, am I correct in thinking that inside the interpreter there is actual machine code for doing the multiplication, either integer multiplication or floating point?
In the CPython interpreter, if we are talking about integers that fit within the bit size of the platform/architecture, there will be a line of C code somewhere inside the appropriate code path of the BINARY_MULTIPLY function that looks like result = i1 * i2. This line will get compiled to the corresponding machine code for the appropriate platform and architecture in the compiled CPython interpreter program. But it's just code inside a function that's static in the interpreter.

Note that CPython int objects are "bigints", i.e., they are not limited to the bit size of the underlying platform. So even considering just integer objects, the BINARY_MULTIPLY function has to deal with multiplying integers of arbitrary size, which means it's not always as simple as the code path described above.
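A quick check of the "bigint" behaviour, as a minimal sketch (the variable names are arbitrary): the product below does not fit in a 64-bit machine word, yet Python's `int` multiplication still returns the exact result, so the interpreter cannot always rely on a single hardware multiply.

```python
# 2**64 is already one bit too large for an unsigned 64-bit machine word.
big = 2 ** 64

# The product needs 129 bits; Python computes it exactly.
product = big * big

print(product == 2 ** 128)
print(product.bit_length())
```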

PeterDonis
Mentor
When the virtual machine program is compiled, that multiplication operation will normally be converted to native machine code.
That depends on what you mean by "converted". I described in post #20 what happens inside the CPython interpreter program for a BINARY_MULTIPLY. The C code inside the appropriate function in the interpreter will have a multiplication operation that gets compiled to the appropriate machine code for the target platform and architecture, yes. But that "conversion" is only done once, when the C code for the interpreter is compiled. There is no "conversion" done when a BINARY_MULTIPLY byte code is executed by the interpreter; the interpreter just calls the function with the given arguments.

Jonathan Scott
Gold Member
But that "conversion" is only done once, when the C code for the interpreter is compiled.
Yes. Regardless of whether the interpreter goes via a virtual machine byte code interpreter (as previously mentioned) or just interprets source directly, operations such as multiplication are converted into machine code only when the interpreter is compiled.

Of course, that applies when the interpreter itself is written in a compiled language such as C. In general, there is no reason why the interpreter itself cannot be written in an interpreted language, and so on. (I have previously written a command language interpreter in REXX, which is itself normally an interpreted language, although it can be compiled using IBM's mainframe REXX compiler, which is one of the products supported by my team).

PeterDonis
Mentor
there is no reason why the interpreter itself cannot be written in an interpreted language
Yes, you can of course have multiple layers of interpreters before you actually get to something that is either implemented directly in machine code (or assembly language), or is in a language that gets compiled to machine code.

Ok. Thanks everyone. I think I am getting it now. Let me see:

In the case of hybrid interpretation (as in Python and Java), the source code (essentially text) is passed to a program called the interpreter, which comprises both an internal compiler and a virtual machine (VM).
The internal compiler converts source code into bytecode which is not machine code but something that resembles assembly code, I dare to say. Bytecode is then processed by the VM as if it was a physical machine (it is an abstract machine, just software). No new machine code file is generated after the bytecode but the VM is itself compiled (?) and the resulting binary information is run by the CPU...Clearly the CPU and OS are always working throughout the entire process so even when an application is running and mediated by the OS, binary information is continuously going and coming from the CPU...

The fact that bytecode is platform independent makes Java and Python portable languages compared to C. Certainly, users need to have a platform-dependent VM on their machines to run the bytecode. But distributing bytecode is less "problematic" than distributing C source code and having the end user compile it with their platform-specific C compiler. In the case of C and other compiled languages, though, the developers distribute various machine code versions of the same source code, and it is up to the user to select and download the correct one based on their OS/CPU...

I know it is possible to convert a Python source code into an executable .exe for Windows. I guess that can be done after the VM has processed the python bytecode....

Additionally, I read that languages themselves are independent of compilation and interpretation. The same language, say Python, can have different implementations, and in theory some implementations can be compiled, some interpreted, and some hybrid interpreted. For Python, we have CPython, Jython, IronPython, etc. as different implementations, and I guess they are all hybrid interpreted...

PeterDonis
Mentor
The internal compiler converts source code into bytecode which is not machine code but something that resembles assembly code, I dare to say.
Kind of, yes. The byte codes can be thought of as machine opcodes for the virtual machine that the VM/interpreter program realizes.

Bytecode is then processed by the VM as if it was a physical machine
Yes.

No new machine code file is generated after the bytecode
Yes.

the VM is itself compiled (?)
The VM/interpreter is a program that was already compiled, before it was first installed on the physical computer that all this is happening on. From the standpoint of that program and the operating system, the actual "program" (the Python source code, for example) is just data that is being operated on by VM/interpreter program. It's no different from having the compiled Microsoft Excel program, say, on your computer, and using it to open an Excel spreadsheet. The spreadsheet might contain "code" (for example, macros), but to the Excel program and the OS it's just data that the Excel program is operating on.

Clearly the CPU and OS are always working throughout the entire process so even when an application is running and mediated by the OS, binary information is continuously going and coming from the CPU...
Yes, of course, that's true when any program is running.

I know it is possible to convert a Python source code into an executable .exe for Windows.
There are various ways of doing this for various platforms/operating systems. All of them work basically the same: you have a small stub program for each platform/OS whose job is to load the interpreter (which is shipped as a library in this application--a .dll file on Windows or an .so file on Linux or Mac OS X) and tell it to start running the appropriate source code or byte code (usually byte code, because the process that creates the executable can also pre-compile the source code into byte code). The "executable" file is then really just the stub program, all the work is being done in the interpreter library. (It is also possible to wrap all of this--the stub program, the library, and the byte code--into a single "executable" file, which then extracts the various parts from itself as it runs, something like the way self-extracting ZIP archives work--in fact the ZIP archive format was designed to make things like this possible so it is often used by the tools that make these executables out of Python source code.)

So none of this actually changes the way the Python code is run: it's still byte code being processed by a VM/interpreter. All that's changed is how it's packaged for user convenience.
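A stripped-down version of the ZIP-based packaging idea can be tried with the standard library alone, since the Python interpreter can run a ZIP archive that contains a `__main__.py`. The sketch below is only an approximation of what the real packaging tools do: it ships source text rather than pre-compiled byte code, it relies on an already installed interpreter (`sys.executable`) instead of bundling one, and the names `MAIN_SOURCE`, `build_archive`, `run_archive` and `app.pyz` are made up for the example.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

# The "application": a single entry-point module stored inside the archive.
MAIN_SOURCE = 'print("6 * 7 =", 6 * 7)\n'

def build_archive(path):
    """Write a ZIP archive whose __main__.py is the program's entry point."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("__main__.py", MAIN_SOURCE)

def run_archive(path):
    """Hand the archive to the interpreter, which finds and runs __main__.py."""
    completed = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout

with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, "app.pyz")
    build_archive(archive_path)
    output = run_archive(archive_path)

print(output, end="")
```

The archive is only a container: the code inside it is still compiled to byte code and executed by the interpreter, exactly as if it had been run from a plain `.py` file.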
