How did early compilers overcome the need for a compiler to compile itself?

  • Thread starter Thread starter bhobba
  • Start date Start date
AI Thread Summary
The discussion centers on the concept of bootstrapping in compiler design, particularly addressing how to compile a C compiler when a C++ compiler is needed. Bootstrapping involves creating a simple assembler, often using a minimalistic language like Forth, to write the initial code. This assembler can then be used to compile a basic C compiler, such as TinyC, which is written in C. The process requires rewriting code in the simple assembler and then recompiling to produce a functional C compiler.When new CPUs are introduced, cross-compilers are typically employed, allowing for code to be tested before the CPU is fully operational. The discussion highlights the historical context of compiler development, noting that early compilers like cfront converted C++ to C, which was then compiled to machine code. Modern compilers often generate machine code directly, bypassing the assembler step. The conversation illustrates the evolution of bootstrapping techniques and the various methods used throughout computing history to develop compilers and assemblers.
Messages
10,902
Reaction score
3,782
This came up as a question on another forum - how can you compile the code of a C compiler when you need a C++ compiler to do it? How was the first C compiler made? How is it done when a new CPU comes out?

To cut a long story short, it’s done by bootstrapping. We now have more sophisticated tools, so I will first describe how it was done in the early days. It was not exactly how I am going to describe it. It used the old language BCPL, but the principles are the same. First, you write an assembler in direct bytecode. It is a simple assembler - nothing fancy at all. The easiest to write is what I call a Forth Assembler.

Forth would have to be the easiest language there is. You simply read a string of words and numbers separated by whitespace (ie spaces). If it is a word, you look up a dictionary to find the address of the associated subroutine. If it is a number, push the number onto the Forth stack. If it can't do either, create an error. The executable is so easy it can be put on the boot sector of a USB disk. It can contain a routine to read the rest of the disk into memory, execute the code, and save the code you write. Then you have a word for each computer instruction, so you have a simple assembler. You could, if you wanted, create a whole Forth-based sophisticated operating system. But most stuff these days is written in C++, so the first thing to try is to get a C compiler working.

Added Later: The above is a bit sketchy. I have located a site giving more detail:
https://niedzejkob.p4.team/bootstrap/

Probably the simplest C is TinyC which is written in C. To see a simplistic version of its code, search for KartikTalwar/3095780. I originally gave the code, but IMHO it distracted more than helped.

The full vesion does some pretty cool things:
https://nova.disfarm.unimi.it/manual/plugins/tcc-doc.htm

There is a PDF on exactly how it works (also see the book - A Retargetable C Compiler Design and Implementation):
https://www.researchgate.net/publication/2810190_A_retargetable_compiler_for_ANSI_C

The hard part is you have to rewrite that code in your simple Forth Assembler. It will be no easy task, but a determined person can do it with perseverance and the information above.

Then you use your FORTH C compiler to compile the TinyC code. But of course, the executable is the crappy executable your Forth C compiler produces. To fix that, you recompile it to produce a nice TinyC executable. This process is called bootstrapping a compiler.

Today we have much better tools, such as LLVM Lite, which allows a compiler to be written in Python. You write the compiler in Python and LLVM. Then you write the compiler in that language and bootstrap it as before. The Linux kernel is written in C. You could install the Linux kernel and a full Linux operating system.

These days the best C compilers (in fact, the best compilers in general) are written in C++. So if you want the best compiler, you need a C++ to C converter (EMMTRX makes one). You take the compiler you want, written in C++, for C++; translate it to C, then compile it. The executable is used to recompile the C++ compiler to get the best C++ executable. Bootstrapping again.

Someone developed a very efficient and small (114K) version of Python written in C++ instead of C. This version of Python was very easy to interface to programs written in C++. Since it is so small, instead of having the C programs as DLLs (you can if you want), it created a special version of the Python virtual machine with the C++ programs linked in. So what was done was to write a very efficient version of the python interpreter integrated with C++ in C++ and a Python preprocessor. It’s called Python++. They then used it to write an even better version of Python called TPython++:
https://gitlab.com/hartsantler/tpythonpp

Again bootstrapping at work.

I hope this has shown just how powerful the bootstrapping concept is.

Thanks
Bill
 
Last edited:
  • Like
Likes Filip Larsen, Mark44, FactChecker and 1 other person
Technology news on Phys.org
bhobba said:
how can you compile the code of a C compiler when you need a C++ compiler to do it?
A C compiler need not be written in C++. I worked for somebody who wrote one by hand in assembler. It had to be able to run in 64k of memory, and the low level language allowed some interesting space saving tricks.

bhobba said:
How is it done when a new CPU comes out?
Usually a cross compiler is used, which can have tested code ready even before the new CPU prototype exists. Most of the compiler code already exists and is tested, and it just needs to be altered for the new code set. The bootstrap process is at some point necessary to solve the chicken and egg problem.

More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.
 
  • Like
Likes Vanadium 50, berkeman, bhobba and 1 other person
Halc said:
A C compiler need not be written in C++.

Of course. I described a process where it was written in my quick and dirty Forth Assembler. Then bootstrapped to Tinyc.

Halc said:
More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.

You could write the quick and dirty Forth assembler in bytecode. That would in itself involve some steps. For example, you need some bytecode to store and load an executable and a simple editor to modify the bytecode.

Thanks
Bill
 
If you really want to talk about bootstrapping. The first tool is the assembler. Compilers would spit out assembler code that would be processed by the assembler to machine code. A linker would take that machine code and construct a machine image that actually got loaded.

In the case of C++ in the early days, there was the "compiler" cfront which converted C++ code to C code. The C compiler would do its magic to make assembler then to machine code for the linking process.

Newer versions of the compilers generate machine code directly skipping the assembler step.

I'm sure we can make these steps even more precise and with how its done across various platforms...

Here's a discusion on the technique:

https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/
 
  • Like
Likes bhobba
jedishrfu said:
Newer versions of the compilers generate machine code directly skipping the assembler step.
Well...sort of, They don't write out assembly, but they do "think it".

As part of the history of bootstraps, there are also cross-assemblers (in analogy to cross-compilers). Back in the day, you could buy an assembler that took 8080 assembler input, and produced 8086 output. However, you were (obviously) restricted to 8-bit operations and 64k addressing.
 
  • Like
Likes bhobba
In 1967 I worked on a project with the assembler run on a different machine (GE 225) than the target machine (GE 4020). We assembled and printed source listings weekly. Developers did machine code patches the rest of the week.

For the next cycle, the week's patches were hand translated to assembler.

The first version of the assembler source was generated by a FORTRAN compiler on yet another machine (GE 625) and hand translated from 625 assembler to 225 assembler. Our friend and mentor, @jedishrfu also worked with that same GE 625.

The point is that any and all kind of bootstrapping you can imagine was used somewhere.
 
  • Informative
Likes bhobba
Back
Top