This came up as a question on another forum - how can you compile the code of a C compiler when you need a C++ compiler to do it? How was the first C compiler made? How is it done when a new CPU comes out?
To cut a long story short, it’s done by bootstrapping. We now have more sophisticated tools, so I will first describe how it could have been done in the early days. Historically it did not happen exactly this way (the old language BCPL was involved), but the principles are the same. First, you write an assembler directly in machine code. It is a simple assembler - nothing fancy at all. The easiest to write is what I call a Forth Assembler.
Forth would have to be one of the easiest languages there is to implement. You simply read a string of words and numbers separated by whitespace (i.e. spaces). If it is a word, you look it up in a dictionary to find the address of the associated subroutine and call it. If it is a number, you push it onto the Forth stack. If it is neither, you report an error. The interpreter is so small it can fit in the boot sector of a USB disk. It can contain a routine to read the rest of the disk into memory, execute the code, and save the code you write. Then you define a word for each machine instruction, so you have a simple assembler. You could, if you wanted, create a whole sophisticated Forth-based operating system. But most stuff these days is written in C++, so the first thing to try is to get a C compiler working.
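To make that loop concrete, here is a minimal sketch in Python rather than machine code. It is only an illustration of the idea, not a real Forth: the word names (`c,`, `nop`, `ret`) and the helper `emit` are my own choices for the example, and the two opcodes are the x86 encodings of NOP (0x90) and near RET (0xC3).

```python
# Toy Forth-style interpreter: a sketch of the idea, not a real Forth.
stack = []            # the data stack
output = bytearray()  # machine code produced by the "assembler" words


def emit(*opcode_bytes):
    """Build a word that appends fixed opcode bytes to the output."""
    def word():
        output.extend(opcode_bytes)
    return word


def add():
    # Pop two numbers and push their sum.
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def emit_byte():
    # Pop a number and append its low byte to the output.
    output.append(stack.pop() & 0xFF)


# The dictionary maps each word to the subroutine that implements it.
dictionary = {
    "+": add,
    "dup": lambda: stack.append(stack[-1]),
    "c,": emit_byte,      # hypothetical name for "emit one raw byte"
    "nop": emit(0x90),    # x86 NOP
    "ret": emit(0xC3),    # x86 near RET
}


def interpret(source):
    """Run whitespace-separated words and numbers, one token at a time."""
    for token in source.split():
        word = dictionary.get(token.lower())
        if word is not None:
            word()                        # known word: call its subroutine
            continue
        try:
            stack.append(int(token, 0))   # number: push it (accepts 0x.. too)
        except ValueError:
            raise ValueError(f"unknown word: {token}") from None
```

With this, `interpret("2 3 + dup +")` leaves 10 on the stack, and `interpret("nop 0xCC c, ret")` appends the bytes 90 CC C3 to `output`. Adding more dictionary entries of the `emit(...)` kind is all it takes to grow this into a crude assembler.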
Added Later: The above is a bit sketchy. I have located a site giving more detail:
https://niedzejkob.p4.team/bootstrap/
Probably the simplest C compiler is TinyC, which is itself written in C. To see a simplistic version of its code, search for KartikTalwar/3095780. I originally gave the code, but IMHO it distracted more than helped.
The full version does some pretty cool things:
https://nova.disfarm.unimi.it/manual/plugins/tcc-doc.htm
There is a PDF on exactly how a small C compiler works; it describes lcc, a different compiler, but the ideas carry over (also see the book - A Retargetable C Compiler: Design and Implementation):
https://www.researchgate.net/publication/2810190_A_retargetable_compiler_for_ANSI_C
The hard part is you have to rewrite that code in your simple Forth Assembler. It will be no easy task, but a determined person can do it with perseverance and the information above.
Then you use your Forth C compiler to compile the TinyC source code. But of course, the resulting executable is only as good as the crappy code your Forth C compiler produces. To fix that, you use that executable to recompile the TinyC source, which produces a nice TinyC executable. This process is called bootstrapping a compiler.
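The two compile steps can be modelled with a toy sketch. Nothing here compiles real code: `Source`, `Binary` and `compile_with` are made-up names for the illustration, and "naive" and "good" are just labels for code quality. The one rule it encodes is that a compiled program behaves as its source says, but its own machine code is only as good as the compiler that built it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    emits: str      # quality of the code this compiler source generates


@dataclass(frozen=True)
class Binary:
    name: str
    emits: str      # quality of the code this compiler generates when run
    own_code: str   # quality of the machine code this binary is made of


def compile_with(compiler: Binary, source: Source) -> Binary:
    """Behaviour comes from the source; own code quality from the compiler."""
    return Binary(name=source.name, emits=source.emits,
                  own_code=compiler.emits)


# The hand-written Forth C compiler: it works, but generates naive code.
forth_cc = Binary(name="forth-cc", emits="naive", own_code="hand-written")
tinyc_src = Source(name="tinyc", emits="good")

stage1 = compile_with(forth_cc, tinyc_src)  # correct TinyC, naive machine code
stage2 = compile_with(stage1, tinyc_src)    # TinyC compiled by TinyC
stage3 = compile_with(stage2, tinyc_src)    # compiling again changes nothing

print(stage1)
print(stage2)
print(stage2 == stage3)
```

In this model `stage1` generates good code but is itself made of naive code, `stage2` is good on both counts, and `stage3` comes out identical to `stage2`. Real compiler builds sometimes run that third stage purely as a consistency check.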
Today we have much better tools, such as llvmlite, which allows a compiler to be written in Python on top of LLVM. You write a first compiler for your new language in Python, with LLVM as the back end. Then you rewrite the compiler in the new language itself and bootstrap it as before. The Linux kernel is written in C, so once you have a working C compiler you could build the Linux kernel and install a full Linux operating system.
These days the best C compilers (in fact, the best compilers in general) are written in C++. So if you want the best compiler on a system that only has a C compiler, you need a C++ to C converter (EMMTRX makes one). You take the compiler you want, which is written in C++ and compiles C++, translate it to C, then compile it with your C compiler. The resulting executable is then used to recompile the C++ compiler from its original source to get the best C++ executable. Bootstrapping again.
Someone developed a very small (114K) and efficient version of Python written in C++ instead of C, which makes it very easy to interface with programs written in C++. Since it is so small, instead of loading those programs as DLLs (you can if you want), it builds a special version of the Python virtual machine with the C++ programs linked in. So what was done was to write, in C++, a very efficient Python interpreter integrated with C++, plus a Python preprocessor. It’s called Python++. They then used it to write an even better version of Python called TPython++:
https://gitlab.com/hartsantler/tpythonpp
Again bootstrapping at work.
I hope this has shown just how powerful the bootstrapping concept is.
Thanks
Bill