Basic Syntax
Table of contents
Table of contents
Problem
The purpose of this proposal is to establish some basic syntactic elements of the Carbon language and make sure that the grammar is unambiguous and can be parsed by an LALR parser such as yacc
or bison
. The grammar presented here has indeed been checked by bison
. The language features in this basic grammar include control flow by way of if
and while
, functions, simple structures, choice, pattern matching, and a sample of operators on Booleans and integers. The main syntactic categories are declaration
, statement
, and expression
. Establishing these syntactic categories should help the other proposals choose syntax that is compatible with the rest of the language.
Background
The grammar proposed here is based on the following proposals:
- Carbon language overview
- pattern matching #87,
- structs #98,
- tuples #111,
- sum types #157, and
- metaprogramming #89.
Proposal
We summarize the four main syntactic categories here and define the grammar in the next section.
-
declaration
includes function, structure, and choice definitions. -
statement
includes local variable definitions, assignment, blocks,if
,while
,match
,break
,continue
, andreturn
. -
expression
includes function calls, arithmetic, literals, and other syntaxes that evaluate to a value. In Carbon, a type is a kind of value, and Carbon makes no syntactic distinction between type-valued expressions and other kinds of expressions, so theexpression
nonterminal is used in positions where a type is expected. -
pattern
for the patterns in amatch
statement, for the left-hand side of a variable definition, and for describing the parameters of a function. The grammar treats patterns and expressions identically, but the type system will only allow pattern variables (expression ':' identifier
) in patterns and not in value or type-producing expressions. Having the separate non-terminals forpattern
andexpression
helps to document this distinction.
The proposal also specifies the abstract syntax.
When this proposal is accepted, the accompanying flex
, bison
, and C++ files that define the grammar and implement the parser actions will be placed in the executable_semantics
directory of the carbon-lang
repository.
In the near future there will be proposals regarding the semantic analysis (aka. type checking) and specification of runtime behavior for this subset of Carbon with the plan to add the accompanying executable forms of those specifications to the executable_semantics
directory.
Looking towards growing the Carbon language, the intent is for other proposals to add to this grammar by modifying the flex
and bison
files and the abstract syntax definitions in the executable_semantics
directory.
Details
Expressions and Patterns
The following grammar defines the concrete syntax for expressions. Below we comment on a few aspects of the grammar.
pattern:
expression
;
expression:
identifier
| expression designator
| expression '[' expression ']'
| expression ':' identifier
| integer_literal
| "true"
| "false"
| tuple
| expression "==" expression
| expression '+' expression
| expression '-' expression
| expression "and" expression
| expression "or" expression
| "not" expression
| '-' expression
| expression tuple
| "auto"
| "fnty" tuple return_type
;
designator:
'.' identifier
;
tuple:
'(' field_list ')'
;
field_list:
/* empty */
| field
| field ',' field_list
;
field:
pattern
| designator '=' pattern
;
return_type:
/* empty */
| "->" expression
;
The grammar rule
expression: expression ':' identifier
is for pattern variables. For example, in a variable definition such as
var Int: x = 0;
the Int: x
is parsed with the grammar rule for pattern variables. In the right-hand side of the above grammar rule, the expression
to the left of the :
must evaluate to a type at compile time.
The grammar rule
expression: "fnty" tuple return_type
is for function types. They are meant to play the role that function pointers play in C and that std::function
plays in C++.
Regarding the grammar rule for the return type of a function:
return_type: "->" expression
the expression
is expected to evaluate to a type at compile-time.
The grammar rule
tuple: '(' field_list ')'
is primarily for constructing a tuple, but it is also used for creating tuple types and tuple patterns, depending on the context in which the expression occurs.
Regarding the grammar rules for designator
and for a named argument
designator: '.' identifier
field: designator '=' pattern
The period enables the addition of new keywords to Carbon without colliding with field names. The period also aids integrated development environments with auto-completion. The issue is that without the period, when the programmer is typing the identifier (and not yet typed the =
), the IDE doesn’t know whether the identifier is for a positional argument, in which case it is a variable occurrence, or whether the identifier is a field name.
Statements
The following grammar defines the concrete syntax for statements.
statement:
"var" pattern '=' expression ';'
| expression '=' expression ';'
| expression ';'
| "if" '(' expression ')' statement "else" statement
| "while" '(' expression ')' statement
| "break" ';'
| "continue" ';'
| "return" expression ';'
| '{' statement_list '}'
| "match" '(' expression ')' '{' clause_list '}'
;
statement_list:
/* empty */
| statement statement_list
;
clause_list:
/* empty */
| clause clause_list
;
clause:
"case" pattern "=>" statement
| "default" "=>" statement
;
Declarations
The following grammar defines the concrete syntax for declarations.
declaration:
"fn" identifier tuple return_type '{' statement_list '}'
| "fn" identifier tuple "=>" expression ';'
| "fn" identifier tuple return_type ';'
| "struct" identifier '{' member_list '}'
| "choice" identifier '{' alternative_list '}'
;
member:
"var" expression ':' identifier ';'
;
member_list:
/* empty */
| member member_list
;
alternative:
identifier tuple ';'
;
alternative_list:
/* empty */
| alternative alternative_list
;
declaration_list:
/* empty */
| declaration declaration_list
;
In the grammar rule for function definitions
declaration: "fn" identifier tuple return_type '{' statement_list '}'
the tuple
is used as a pattern to describe the parameters of the function. The grammar for function definitions does not currently include implicit parameters, but the intent is to add them to the grammar in the future.
In the rule for field declarations
member: "var" expression ':' identifier ';'
the expression
must evaluate to a type at compile time. The same is true for the tuple
in the grammar rule for an alternative:
alternative: identifier tuple ';'
Precedence and Associativity
The following precedence and associativity specification is meant to approximate the definitions in the operators and precedence proposal #168 to the extent that is possible in a yacc
/bison
generated parser. The ordering is from lowest to highest precedence, with operators on the same line having equal precedence. Proposal 168 differs in that the operator groups are partially ordered instead of being totally ordered.
nonassoc '{' '}'
nonassoc ':' ','
left "or" "and"
nonassoc "==" "not"
left '+' '-'
left '.' "->"
nonassoc '(' ')' '[' ']'
Abstract Syntax
The output of parsing is an abstract syntax tree. There are many ways to define abstract syntax. Here we simply use C-style struct
definitions. The definition of the abstract syntax is important because it is used in the specification of the semantic analysis and the specification of runtime behavior.
Abstract Syntax for Expressions
enum ExpressionKind { Variable, PatternVariable, Int, Bool,
PrimitiveOp, Call, Tuple, Index, GetField,
IntT, BoolT, TypeT, FunctionT, AutoT };
enum Operator { Neg, Add, Sub, Not, And, Or, Eq };
struct Expression {
ExpressionKind tag;
union {
struct { string* name; } variable;
struct { Expression* aggregate; string* field; } get_field;
struct { Expression* aggregate; Expression* offset; } index;
struct { string* name; Expression* type; } pattern_variable;
int integer;
bool boolean;
struct { vector<pair<string,Expression*> >* fields; } tuple;
struct {
Operator operator_;
vector<Expression*>* arguments;
} primitive_op;
struct { Expression* function; Expression* argument; } call;
struct { Expression* parameter; Expression* return_type;} function_type;
} u;
};
The correspondence between most of the grammar rules and the abstract syntax is straightforward. However, the parsing of the field_list
deserves some explanation. The fields can be labeled with the grammar rule:
field: designator '=' pattern
or unlabeled, with the rule
field: pattern
The unlabeled fields are given numbers (represented as strings) for field labels, starting with 0 and going up from left to right.
Regarding the rule for tuples:
tuple: '(' field_list ')'
if the field list only has a single unlabeled item without a trailing comma, then the parse result for that item is returned. Otherwise a tuple
AST node is created containing the parse results for the fields.
In the following grammar rules, if the parse result for tuple
is not a tuple (because the field list was a single unlabeled item without a trailing comma), then a tuple is wrapped around the expression to ensure that it is a tuple.
expression: expression tuple
expression: "fnty" tuple return_type
function_definition:
"fn" identifier tuple return_type '{' statement_list '}'
| "fn" identifier tuple DBLARROW expression ';'
| "fn" identifier tuple return_type ';'
alternative: identifier tuple ';'
Abstract Syntax for Statements
enum StatementKind { ExpressionStatement, Assign, VariableDefinition,
If, Return, Sequence, Block, While, Break, Continue,
Match };
struct Statement {
StatementKind tag;
union {
Expression* exp;
struct { Expression* lhs; Expression* rhs; } assign;
struct { Expression* pat; Expression* init; } variable_definition;
struct { Expression* cond; Statement* then_; Statement* else_; } if_stmt;
Expression* return_stmt;
struct { Statement* stmt; Statement* next; } sequence;
struct { Statement* stmt; } block;
struct { Expression* cond; Statement* body; } while_stmt;
struct {
Expression* exp;
list< pair<Expression*,Statement*> >* clauses;
} match_stmt;
} u;
};
Abstract Syntax for Declarations
struct FunctionDefinition {
string name;
Expression* param_pattern;
Expression* return_type;
Statement* body;
};
enum MemberKind { FieldMember };
struct Member {
MemberKind tag;
union {
struct { string* name; Expression* type; } field;
} u;
};
struct StructDefinition {
string* name;
list<Member*>* members;
};
enum DeclarationKind { FunctionDeclaration, StructDeclaration,
ChoiceDeclaration };
struct Declaration {
DeclarationKind tag;
union {
struct FunctionDefinition* fun_def;
struct StructDefinition* struct_def;
struct {
string* name;
list<pair<string, Expression*> >* alts;
} choice_def;
} u;
};
Alternatives considered
Expressions, type expressions, and patterns could instead be defined using completely disjoint grammar productions, but that would require adding more keywords or symbols to avoid ambiguity and there would be a fair amount of repetition in grammar productions.
The proposal does not include declarations for uninitialized variables, leaving that to a later proposal.
In this proposal, assignment is a statement. It could instead be an expression as it is in C and C++. The arguments against assignment-as-an-expression include 1) it complicates reasoning about the ordering of side-effects and 2) it can cause confusion between =
and ==
(SEI CERT C Coding Standard) Visual Studio Warning.
The syntax for variable initialization uses =
, which is the same token used in assignment. An alternative would be to use a different syntax such as :=
to better signal the semantic differences between initialization and assignment. The author does not have a preference on this choice. The :=
alternative does not introduce any complications regarding ambiguity despite the other uses of :
in the grammar.
Inside a choice
, an alternative
could instead begin with a keyword such as alt
, fn
, or var
, as discussed in the proposal for sum types #157. The author does not have a preference in this regard.
The precedence for and
and or
are equal, following the suggestion of proposal #168, whereas in C++ &&
has higher precedence than ||
.
The parameters of a function are described using the syntax of a pattern. There could instead be separate syntax that is special to function parameters, as there is in C++.
There are open questions regarding the design of function types, so we could alternatively postpone the specification of the syntax for function types. The choice to include them is based on the idea that Carbon will need something that fills the roles of std::function
and C-style function pointers. Also, the features were selected for the basic syntax proposal in a way that aimed to make the proposal complete in the sense that each value-oriented feature (functions, tuples, structures) comes with syntax for 1) creating values, 2) using values, and 3) a type for classifying values.
The parameters of a function type are described using the syntax for type expression. There could instead be separate syntax that is special to the parameters of a function type, as there is in C++.
The keyword for introducing a function type is fnty
, which is an arbitrary choice. Other alternatives include fn_type
and fn
. The fn
alternative would conflict with future plans to use fn
for lambda expressions. Some dislike of fn_type
was voiced, which motivated the choice of fnty
. It would be nice to do without a keyword, but as the grammar currently stands, that introduce ambiguity.
The pattern
non-terminal is defined in terms of expression
. It could instead be the other way around, defining expression
in terms of pattern
.
Regarding tuples and tuple types, this proposal follows #111 in that there is not a distinct syntax for tuple types. Instead, the intent is that when a tuple of types results from an expression in a context that expects a type, the type system will convert the tuple to a tuple type. An alternative approach has been discussed in which tuple types have a distinct syntax, such as Tuple(Int, Bool)
.
Rationale
Using code to validate our specification is a really promising direction, and this proposal seems like a good starting point. A reference implementation that’s simple enough to be part of the design iteration process should help us move faster, by quickly uncovering the places where our specifications are ambiguous, syntactically or semantically unsound, or don’t give the behavior we expect. In other words, it will help us keep ourselves honest, even at the proposal stage, which will help us avoid wasting time and effort implementing designs that turn out to be unworkable.
This can be considered as sort of a counterpart to In-progress design overview #83, in that the design specifics are being approved in order to bootstrap the specification process. We aren’t necessarily adopting the specific syntax and semantics expressed by this proposal, and those choices will need to be presented and justified from scratch by future proposals.
This decision is deferring the implementation to code review. The specific tooling used to implement the syntax checker, such as Bison, is a detail which may be changed, now or later, without requiring a proposal for core team review.