Toolchain architecture
Goals
The toolchain represents the production portion of Carbon. At a high level, the toolchain’s top priorities are:
- Correctness.
- Quality of generated code, including performance.
- Compilation performance.
- Quality of diagnostics for incorrect or questionable code.
TODO: Add an expanded document that details the goals and priorities and link to it here.
High-level architecture
The main components are:
- Driver: Provides commands and ties together compilation flow.
- Diagnostics: Produces diagnostic output.
- Compilation flow:
  - Source: Load the file into a SourceBuffer.
  - Lex: Transform a SourceBuffer into a Lex::TokenizedBuffer.
  - Parse: Transform a TokenizedBuffer into a Parse::Tree.
  - Check: Transform a Tree to produce SemIR::File.
  - Lower: Transform the SemIR to an LLVM Module.
  - CodeGen: Transform the LLVM Module into an Object File.
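The shape of this flow, where each step consumes the previous step's output and returns a new structure, can be sketched in a few lines. This is only an illustration: ToySource, ToyTokens, ToyParseTree, ToyLex, ToyParse, and ToyCheck are hypothetical stand-ins and do not reflect the real SourceBuffer, Lex::TokenizedBuffer, Parse::Tree, or SemIR::File interfaces.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the
// toolchain's real types or APIs.
struct ToySource {
  std::string text;
};
struct ToyTokens {
  std::vector<std::string> words;
};
struct ToyParseTree {
  std::vector<int> literals;
};

// Lex step: ToySource -> ToyTokens. Splits the text on whitespace.
auto ToyLex(const ToySource& source) -> ToyTokens {
  ToyTokens tokens;
  std::string word;
  for (char c : source.text) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!word.empty()) {
        tokens.words.push_back(word);
        word.clear();
      }
    } else {
      word.push_back(c);
    }
  }
  if (!word.empty()) {
    tokens.words.push_back(word);
  }
  return tokens;
}

// Parse step: ToyTokens -> ToyParseTree. This toy grammar only accepts
// decimal digit strings, so anything else trips the assert.
auto ToyParse(const ToyTokens& tokens) -> ToyParseTree {
  ToyParseTree tree;
  for (const std::string& word : tokens.words) {
    int value = 0;
    for (char c : word) {
      assert(std::isdigit(static_cast<unsigned char>(c)));
      value = value * 10 + (c - '0');
    }
    tree.literals.push_back(value);
  }
  return tree;
}

// Check step: ToyParseTree -> int. Stands in for a later step by summing
// the literals; the real Check step produces a SemIR::File instead.
auto ToyCheck(const ToyParseTree& tree) -> int {
  int sum = 0;
  for (int value : tree.literals) {
    sum += value;
  }
  return sum;
}
```

A caller would chain the steps as `ToyCheck(ToyParse(ToyLex(source)))`. Each intermediate structure can also be held and inspected on its own, which is what makes the input and output of every step easy to see.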
Design patterns
A few common design patterns are:
- Distinct steps: Each step of processing produces an output structure, avoiding callbacks passing data between structures.
  - For example, the parser takes a Lex::TokenizedBuffer as input and produces a Parse::Tree as output.
  - Performance: It should yield better locality versus a callback approach.
  - Understandability: Each step has a clear input and output, versus callbacks which obscure the flow of data.
- Vectorized storage: Data is stored in vectors and flyweights are passed around, avoiding more typical heap allocation with pointers.
  - For example, the parse tree is stored as a llvm::SmallVector<Parse::Tree::NodeImpl> indexed by Parse::Node which wraps an int32_t.
  - Performance: Vectorization both minimizes memory allocation overhead and enables better read caching because adjacent entries will be cached together.
- Iterative processing: We rely on state stacks and iterative loops for parsing, avoiding recursive function calls.
  - For example, the parser has a Parse::State enum tracked in state_stack_, and loops in Parse::Tree::Parse.
  - Scalability: Complex code must not cause recursion issues. We have experience in Clang seeing stack frame recursion limits being hit in unexpected ways, and non-recursive approaches largely avoid that risk.
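As a rough sketch of how vectorized storage and iterative processing fit together, the example below keeps all tree nodes in one vector, hands out small index wrappers instead of pointers, and walks the tree with an explicit stack rather than recursion. Every name here (ToyNodeId, ToyNodeImpl, FlatTree, AddNode, Sum) is hypothetical. It uses std::vector where the toolchain uses llvm::SmallVector, stacks node ids rather than a Parse::State enum, and does not mirror the actual Parse::Tree layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; a sketch, not the toolchain's Parse::Tree.

// Flyweight handle: a thin wrapper around an index into the node vector.
struct ToyNodeId {
  int32_t index;
};

// Sentinel meaning "no node".
constexpr ToyNodeId NoNode = {-1};

// Node payload. Links are ids into the same vector, not pointers.
struct ToyNodeImpl {
  int value;
  ToyNodeId first_child;
  ToyNodeId next_sibling;
};

class FlatTree {
 public:
  // Appends a node and links it as the newest child of `parent`.
  // Pass NoNode as `parent` to create a root.
  auto AddNode(int value, ToyNodeId parent) -> ToyNodeId {
    ToyNodeId id = {static_cast<int32_t>(nodes_.size())};
    nodes_.push_back({value, NoNode, NoNode});
    if (parent.index >= 0) {
      nodes_[id.index].next_sibling = nodes_[parent.index].first_child;
      nodes_[parent.index].first_child = id;
    }
    return id;
  }

  // Sums the values in the subtree rooted at `root`. Pending work lives in
  // an explicit stack, so nesting depth does not consume call stack frames.
  auto Sum(ToyNodeId root) const -> int {
    assert(root.index >= 0);
    int total = 0;
    std::vector<ToyNodeId> stack = {root};
    while (!stack.empty()) {
      ToyNodeId id = stack.back();
      stack.pop_back();
      const ToyNodeImpl& node = nodes_[id.index];
      total += node.value;
      for (ToyNodeId child = node.first_child; child.index >= 0;
           child = nodes_[child.index].next_sibling) {
        stack.push_back(child);
      }
    }
    return total;
  }

 private:
  // All nodes live contiguously; adjacent entries tend to share cache lines.
  std::vector<ToyNodeImpl> nodes_;
};
```

Because a ToyNodeId is only an index, it is cheap to copy and stays valid when the vector reallocates, which a raw pointer into the vector would not. The explicit stack also means a chain nested hundreds of thousands of levels deep is limited by heap memory rather than by the call stack.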
See also Idioms for abbreviations and more implementation techniques.
Adding features
We have a walkthrough for adding features.
Videos
Talks
These talks are focused on implementation details of the toolchain, and can be helpful for learning how the toolchain internals work.
2025
Implementation walkthroughs
These are recordings of implementing PRs.