# chibicc: A Small C Compiler

(The old master has moved to the
[historical/old](https://github.com/rui314/chibicc/tree/historical/old)
branch. This is a new one uploaded in September 2020.)

chibicc is yet another small C compiler that implements most C11
features. Even though it still probably falls into the "toy compilers"
category just like other small compilers do, chibicc can compile several
real-world programs, including [Git](https://git-scm.com/),
[SQLite](https://sqlite.org) and
[libpng](http://www.libpng.org/pub/png/libpng.html), without making
modifications to the compiled programs. Generated executables of these
programs pass their corresponding test suites. So, chibicc actually
supports a wide variety of C11 features and is able to compile hundreds of
thousands of lines of real-world C code correctly.

chibicc is developed as the reference implementation for a book I'm
currently writing about C compilers and low-level programming.
The book covers the vast topic with an incremental approach; in the first
chapter, readers will implement a "compiler" that accepts just a single
number as a "language", which will then gain one feature at a time in each
section of the book until the language that the compiler accepts matches
what the C11 spec specifies. I took this incremental approach from [the
paper](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf) by Abdulaziz
Ghuloum.

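To make that first step concrete, here is a minimal sketch of a "compiler" whose entire input language is a single decimal number. It is not taken from the book or from any chibicc commit; the function name `compile_number`, the buffer-based interface, and the choice of x86-64 assembly in AT&T syntax as the output are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not chibicc's code): "compiles" a program consisting
// of one non-negative decimal integer into x86-64 assembly whose main()
// returns that integer. Returns the number of bytes written to buf, or -1 if
// src is not a single number or buf is too small.
int compile_number(const char *src, char *buf, size_t size) {
  assert(src != NULL && buf != NULL);

  // The whole "language" is one or more decimal digits and nothing else.
  if (src[0] == '\0' || src[strspn(src, "0123456789")] != '\0')
    return -1;

  errno = 0;
  long val = strtol(src, NULL, 10);
  if (errno == ERANGE)
    return -1;

  // The return value of main() goes in %rax.
  int len = snprintf(buf, size,
                     "  .globl main\n"
                     "main:\n"
                     "  mov $%ld, %%rax\n"
                     "  ret\n",
                     val);
  if (len < 0 || (size_t)len >= size)
    return -1;
  return len;
}
```

Calling `compile_number("42", buf, sizeof buf)` fills `buf` with four lines of assembly. Assembling and linking that text should give a program whose exit status is 42, keeping in mind that exit statuses are usually truncated to 8 bits.
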
Each commit of this project corresponds to a section of the book. For this
purpose, not only the final state of the project but each commit was
carefully written with readability in mind. Readers should be able to learn
how a C language feature can be implemented just by reading one or a few
commits of this project. For example, this is how
[while](https://github.com/rui314/chibicc/commit/773115ab2a9c4b96f804311b95b20e9771f0190a),
[[]](https://github.com/rui314/chibicc/commit/75fbd3dd6efde12eac8225d8b5723093836170a5),
[?:](https://github.com/rui314/chibicc/commit/1d0e942fd567a35d296d0f10b7693e98b3dd037c),
and [thread-local
variables](https://github.com/rui314/chibicc/commit/79644e54cc1805e54428cde68b20d6d493b76d34)
are implemented. If you have plenty of spare time, it might be fun to read
it from the [first
commit](https://github.com/rui314/chibicc/commit/0522e2d77e3ab82d3b80a5be8dbbdc8d4180561c).

If you like this project, please consider purchasing a copy of the book
when it becomes available! 😀 I publish the source code here to give people
early access to it, because I was planning to do that anyway with a
permissive open-source license after publishing the book. If I don't charge
for the source code, it doesn't make much sense to me to keep it private. I
hope to publish the book in 2021.

I pronounce chibicc as _chee bee cee cee_. "chibi" means "mini" or
"small" in Japanese. "cc" stands for C compiler.

## Status

Features that are often missing in small compilers but supported by
chibicc include (but are not limited to):

- Preprocessor
- long double (x87 80-bit floating point numbers)
- Bit-field
- alloca()
- Variable-length array
- Thread-local variable
- Atomic variable
- Common symbol
- Designated initializer
- L, u, U and u8 string literals

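As a rough illustration of what a few of these features look like in source code, the following sketch uses a bit-field, a thread-local variable, a variable-length array, designated initializers, and a u8 string literal. The type and function names are made up for this example, and nothing here is taken from chibicc's test suite.

```c
#include <assert.h>
#include <stddef.h>

// Bit-field: two fields packed into a single unsigned int.
struct Flags {
  unsigned ready : 1;
  unsigned mode : 3;
};

// Thread-local variable: each thread gets its own copy of the counter.
static _Thread_local int call_count;

// Variable-length array: the size of squares[] is known only at run time.
int sum_of_squares(int n) {
  assert(n > 0);  // a VLA must have a positive size
  int squares[n];
  for (int i = 0; i < n; i++)
    squares[i] = (i + 1) * (i + 1);

  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += squares[i];
  call_count++;
  return sum;
}

// Designated initializers: elements and members are named explicitly, and
// everything not mentioned is zero-initialized.
int designated_demo(void) {
  int table[5] = {[1] = 10, [3] = 30};
  struct Flags f = {.mode = 5};
  return table[0] + table[1] + table[3] + f.mode + f.ready;
}

// u8 string literal: always UTF-8 encoded. U+00E9 takes two bytes in UTF-8,
// so the literal occupies three bytes including the terminating NUL.
size_t utf8_length_demo(void) {
  return sizeof(u8"\u00e9") - 1;
}

// Number of sum_of_squares() calls made on the calling thread.
int calls_on_this_thread(void) {
  return call_count;
}
```

Variable-length arrays are an optional feature in C11, so a conforming compiler may reject `sum_of_squares`; the other constructs are required by the standard.
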
chibicc does not support digraphs, trigraphs, complex numbers, K&R-style
function definitions, or inline assembly.

chibicc outputs a simple but nice error message when it finds an error in
source code.

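One common way to produce such a message is to print the offending source line and put a caret under the error position. The sketch below shows that general technique only; `format_error` and its parameters are hypothetical, and the exact layout of chibicc's own messages may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: formats a diagnostic for byte offset pos in src.
// The output has two lines: the offending source line prefixed with
// "file:line: ", then a caret under the error position followed by msg.
// Returns the number of bytes written, or -1 if buf is too small.
// Tabs in the source line are not expanded, so they can misalign the caret.
int format_error(char *buf, size_t size, const char *file, const char *src,
                 size_t pos, const char *msg) {
  assert(pos <= strlen(src));

  // Find the start and the end of the line that contains pos.
  const char *loc = src + pos;
  const char *start = loc;
  while (start > src && start[-1] != '\n')
    start--;
  const char *end = loc;
  while (*end != '\0' && *end != '\n')
    end++;

  // Count newlines before that line to get a 1-based line number.
  int line_no = 1;
  for (const char *p = src; p < start; p++)
    if (*p == '\n')
      line_no++;

  // Measure the "file:line: " prefix so the caret can be indented to match.
  int prefix = snprintf(NULL, 0, "%s:%d: ", file, line_no);
  int col = prefix + (int)(loc - start);

  int len = snprintf(buf, size, "%s:%d: %.*s\n%*s^ %s\n", file, line_no,
                     (int)(end - start), start, col, "", msg);
  if (len < 0 || (size_t)len >= size)
    return -1;
  return len;
}
```

Given the source text `"int x;\nx = y + ;\n"` and offset 15 (the stray `;` on the second line), the first output line is `a.c:2: x = y + ;` and the caret on the second line sits directly under that `;`.
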
There's no optimization pass. chibicc emits terrible code which is probably
at least twice as slow as GCC's output. I have a plan to add an
optimization pass once the frontend is done.

## Internals

chibicc consists of the following stages:

- Tokenize: A tokenizer takes a string as an input, breaks it into a list
  of tokens and returns them.

- Preprocess: A preprocessor takes a list of tokens as an input and outputs
  a new list of macro-expanded tokens. It interprets preprocessor
  directives while expanding macros.

- Parse: A recursive descent parser constructs abstract syntax trees
  from the output of the preprocessor. It also adds a type to each AST
  node.

- Codegen: A code generator emits assembly text for given AST nodes.

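The stages above can be sketched at toy scale. The following is not chibicc's code: it handles only integers, `+` and `*`, skips the preprocessor stage and type annotation entirely, and all names (`tokenize`, `expr`, `gen`, `compile`) are chosen for this example. It is meant only to show how a tokenizer, a recursive descent parser, and a code generator hand data to each other.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void fail(const char *msg) {
  fprintf(stderr, "error: %s\n", msg);
  exit(1);
}

// Stage 1, tokenize: source string -> array of tokens ending in TK_EOF.
typedef enum { TK_NUM, TK_PUNCT, TK_EOF } TokenKind;
typedef struct { TokenKind kind; long val; char op; } Token;

static void tokenize(const char *p, Token *toks, int max) {
  int n = 0;
  while (*p) {
    if (n >= max - 1) fail("too many tokens");  // keep room for TK_EOF
    if (isspace((unsigned char)*p)) {
      p++;
    } else if (isdigit((unsigned char)*p)) {
      char *end;
      long val = strtol(p, &end, 10);
      toks[n++] = (Token){TK_NUM, val, 0};
      p = end;
    } else if (strchr("+*", *p)) {
      toks[n++] = (Token){TK_PUNCT, 0, *p++};
    } else {
      fail("invalid character");
    }
  }
  toks[n] = (Token){TK_EOF, 0, 0};
}

// Stage 2, parse: tokens -> AST by recursive descent. Grammar:
//   expr    = mul ("+" mul)*
//   mul     = primary ("*" primary)*
//   primary = num
typedef struct Node { char op; long val; struct Node *lhs, *rhs; } Node;

static Token *tok;  // next token to be parsed; op == 0 in a Node means number

static Node *new_node(char op, long val, Node *lhs, Node *rhs) {
  Node *n = malloc(sizeof *n);  // never freed, to keep the sketch short
  if (!n) fail("out of memory");
  *n = (Node){op, val, lhs, rhs};
  return n;
}

static int consume(char op) {
  if (tok->kind != TK_PUNCT || tok->op != op) return 0;
  tok++;
  return 1;
}

static Node *primary(void) {
  if (tok->kind != TK_NUM) fail("expected a number");
  return new_node(0, (tok++)->val, NULL, NULL);
}

static Node *mul(void) {
  Node *n = primary();
  while (consume('*'))
    n = new_node('*', 0, n, primary());
  return n;
}

static Node *expr(void) {
  Node *n = mul();
  while (consume('+'))
    n = new_node('+', 0, n, mul());
  return n;
}

// Stage 3, codegen: AST -> x86-64 assembly text (AT&T syntax). Each node
// leaves its value in %rax; the stack holds intermediate results.
static void gen(const Node *n, FILE *out) {
  if (n->op == 0) {
    fprintf(out, "  mov $%ld, %%rax\n", n->val);
    return;
  }
  gen(n->rhs, out);
  fprintf(out, "  push %%rax\n");
  gen(n->lhs, out);
  fprintf(out, "  pop %%rdi\n");
  fprintf(out, "  %s %%rdi, %%rax\n", n->op == '+' ? "add" : "imul");
}

// Runs the stages in order and writes a main() that returns the value.
void compile(const char *src, FILE *out) {
  Token toks[64];
  tokenize(src, toks, 64);
  tok = toks;
  Node *ast = expr();
  if (tok->kind != TK_EOF) fail("unexpected token");
  fprintf(out, "  .globl main\nmain:\n");
  gen(ast, out);
  fprintf(out, "  ret\n");
}
```

Calling `compile("1+2*3", stdout)` prints assembly that evaluates the multiplication first, because `expr` delegates to `mul` before looking for a `+`. Operator precedence falls out of which grammar rule calls which, and that is the main appeal of recursive descent for a small compiler.
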
## Contributing

When I find a bug in this compiler, I go back to the original commit that
introduced the bug and rewrite the commit history as if there were no such
bug from the beginning. This is an unusual way of fixing bugs, but as a
part of a book, it is important to keep every commit bug-free.

Thus, I do not merge pull requests in this repo. You can send me a pull
request if you find a bug, but it is very likely that I will read your
patch and then apply that to my previous commits by rewriting history. I'll
credit your name somewhere, but your changes will be rewritten by me before
being submitted to this repository.

Also, please assume that I will occasionally force-push my local repository
to this public one to rewrite history. If you clone this project and make
local commits on top of it, your changes will have to be rebased by hand
when I force-push new commits.

## About the Author

I'm Rui Ueyama. I'm the creator of [8cc](https://github.com/rui314/8cc),
which is a hobby C compiler, and also the original creator of the current
version of [LLVM lld](https://lld.llvm.org) linker, which is a
production-quality linker used by various operating systems and large-scale
build systems.

## References

- [tcc](https://bellard.org/tcc/): A small C compiler written by Fabrice
  Bellard. I learned a lot from this compiler, but the designs of tcc and
  chibicc are different. In particular, tcc is a one-pass compiler, while
  chibicc is a multi-pass one.

- [lcc](https://github.com/drh/lcc): Another small C compiler. The creators
  wrote a [book](https://sites.google.com/site/lccretargetablecompiler/)
  about the internals of lcc, which I found a good resource to see how a
  compiler is implemented.

- [An Incremental Approach to Compiler
  Construction](http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf)