Toolchain architecture
Goals
The toolchain represents the production portion of Carbon. At a high level, the toolchain’s top priorities are:
- Correctness.
- Quality of generated code, including performance.
- Compilation performance.
- Quality of diagnostics for incorrect or questionable code.
TODO: Add an expanded document that details the goals and priorities and link to it here.
High-level architecture
The main components are:
- Driver: Provides commands and ties together compilation flow.
- Diagnostics: Produces diagnostic output.
- Compilation flow:
  1. Source: Load the file into a SourceBuffer.
  2. Lex: Transform a SourceBuffer into a Lex::TokenizedBuffer.
  3. Parse: Transform a TokenizedBuffer into a Parse::Tree.
  4. Check: Transform a Tree to produce SemIR::File.
  5. Lower: Transform the SemIR to an LLVM Module.
  6. CodeGen: Transform the LLVM Module into an Object File.
Design patterns
A few common design patterns are:
- Distinct steps: Each step of processing produces an output structure, rather than using callbacks to pass data between structures.
  - For example, the parser takes a Lex::TokenizedBuffer as input and produces a Parse::Tree as output.
  - Performance: This should yield better memory locality than a callback approach.
  - Understandability: Each step has a clear input and output, whereas callbacks obscure the flow of data.
- Vectorized storage: Data is stored in vectors and flyweights (small index handles) are passed around, instead of the more typical approach of heap-allocating individual objects and passing pointers to them.
  - For example, the parse tree is stored as an llvm::SmallVector<Parse::Tree::NodeImpl> indexed by Parse::Node, which wraps an int32_t.
  - Performance: Storing data in vectors both minimizes memory allocation overhead and improves read caching, because adjacent entries are loaded into the cache together.
- Iterative processing: We rely on state stacks and iterative loops for parsing, rather than recursive function calls.
  - For example, the parser has a Parse::State enum tracked in state_stack_, and loops in Parse::Tree::Parse.
  - Scalability: Complex code must not cause recursion issues. In Clang, we have seen stack frame recursion limits hit in unexpected ways, and non-recursive approaches largely avoid that risk.
See also Idioms for abbreviations and more implementation techniques.
Adding features
We have a walkthrough for adding features.