Symbolic Tokens


Overview

A symbolic token is one of a fixed set of tokens made of characters that are not valid in identifiers. That is, they are tokens consisting of symbols, not letters or numbers. Operators are one use of symbolic tokens, but they are also used in patterns (`:`), declarations (`->` to indicate the return type, `,` to separate parameters), statements (`;`, `=`, and so on), and other places (`,` to separate function call arguments).

Carbon has a fixed set of symbolic tokens, defined by the language specification. Developers cannot define new symbolic tokens in their own code.

Symbolic tokens are lexed using a “max munch” rule: at each lexing step, the longest symbolic token defined by the language specification that appears starting at the current input position is lexed, if any.
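The max munch rule can be illustrated with a toy lexer. This is a minimal sketch, not the Carbon toolchain's lexer. It assumes that identifiers and literals are simply runs of letters, digits, and `_`, and the names `SYMBOLIC_TOKENS` and `lex` are hypothetical. It only tokenizes and does not enforce the whitespace rules described below.

```python
# Symbolic tokens from the table in this document.
SYMBOLIC_TOKENS = frozenset({
    "+", "-", "*", "/", "%", "^", "&", "|", "<<", ">>", "=",
    "++", "--", "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=", ">>=",
    "==", "!=", ">", ">=", "<", "<=", "->", "=>",
    "[", "]", "(", ")", "{", "}", ",", ".", ":", ":!", ";",
})
MAX_LEN = max(len(t) for t in SYMBOLIC_TOKENS)


def lex(source: str) -> list[str]:
    """Toy lexer (assumption: words are runs of letters, digits, and '_');
    everything else is matched against SYMBOLIC_TOKENS using max munch."""
    tokens: list[str] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(source[i:j])
            i = j
        else:
            # Max munch: try the longest possible symbolic token first.
            for n in range(min(MAX_LEN, len(source) - i), 0, -1):
                if source[i:i + n] in SYMBOLIC_TOKENS:
                    tokens.append(source[i:i + n])
                    i += n
                    break
            else:
                raise ValueError(f"no symbolic token starts at offset {i}: {ch!r}")
    return tokens


print(lex("x >>= 2;"))  # ['x', '>>=', '2', ';']
print(lex("a<=>b"))     # ['a', '<=', '>', 'b']
print(lex("T:! Type"))  # ['T', ':!', 'Type']
```

Because `<=>` is not in the token set, max munch takes the longest known prefix `<=` and then lexes `>` on its own.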

When a symbolic token is used as an operator, the surrounding whitespace must follow certain rules:

  • There can be no whitespace between a unary operator and its operand.
  • The whitespace around a binary operator must be consistent: either there is whitespace on both sides or on neither side.
  • If there is whitespace on neither side of a binary operator, the token before the operator must be an identifier, a literal, or any kind of closing bracket (for example, `)`, `]`, or `}`), and the token after the operator must be an identifier, a literal, or any kind of opening bracket (for example, `(`, `[`, or `{`).
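The three whitespace rules above can be expressed as simple predicates. This is a minimal sketch under the assumption that each neighboring token has already been classified into one of a few coarse kinds. `Kind`, `unary_ok`, and `binary_ok` are hypothetical names used only for illustration.

```python
from enum import Enum, auto


class Kind(Enum):
    """Hypothetical coarse classification of the token next to an operator."""
    IDENTIFIER = auto()
    LITERAL = auto()
    OPEN_BRACKET = auto()   # (, [, {
    CLOSE_BRACKET = auto()  # ), ], }
    OTHER = auto()          # another symbolic token, or start/end of input


# Tokens allowed immediately before / after a binary operator with no whitespace.
ENDS_OPERAND = {Kind.IDENTIFIER, Kind.LITERAL, Kind.CLOSE_BRACKET}
STARTS_OPERAND = {Kind.IDENTIFIER, Kind.LITERAL, Kind.OPEN_BRACKET}


def unary_ok(space_before_operand: bool) -> bool:
    """Rule 1: no whitespace between a unary operator and its operand."""
    return not space_before_operand


def binary_ok(prev: Kind, space_before: bool, space_after: bool, nxt: Kind) -> bool:
    # Rule 2: whitespace must be on both sides or on neither side.
    if space_before != space_after:
        return False
    if space_before:
        return True
    # Rule 3: with no whitespace, both neighbors must be operand-like.
    return prev in ENDS_OPERAND and nxt in STARTS_OPERAND


print(binary_ok(Kind.IDENTIFIER, True, True, Kind.IDENTIFIER))     # a + b     -> True
print(binary_ok(Kind.IDENTIFIER, True, False, Kind.IDENTIFIER))    # a +b      -> False
print(binary_ok(Kind.CLOSE_BRACKET, False, False, Kind.OPEN_BRACKET))  # f(x)*(y) -> True
print(binary_ok(Kind.IDENTIFIER, False, False, Kind.OTHER))        # a*-b      -> False
```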

These rules let a token like `*` be used as a prefix, infix, and postfix operator without creating ambiguity.
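One way to see why the rules remove ambiguity is to decide the fixity of `*` from the whitespace around it. This sketch assumes that whitespace only before the token means prefix, whitespace only after means postfix, and whitespace on both sides means infix; with no whitespace, the neighboring tokens decide. That mapping is an assumption for illustration, and `classify_fixity` is a hypothetical helper, not part of any Carbon API.

```python
def classify_fixity(prev_ends_operand: bool, space_before: bool,
                    space_after: bool, next_starts_operand: bool) -> str:
    """Hypothetical helper: classify an operator token such as `*`.

    prev_ends_operand: previous token is an identifier, literal, or closing bracket.
    next_starts_operand: next token is an identifier, literal, or opening bracket.
    """
    if space_before and space_after:
        return "infix"    # a * b
    if space_before:
        return "prefix"   # x = *p
    if space_after:
        return "postfix"  # i32* = ...
    # No whitespace on either side: the neighbors decide.
    if prev_ends_operand and next_starts_operand:
        return "infix"    # a*b
    if next_starts_operand:
        return "prefix"   # (*p)
    if prev_ends_operand:
        return "postfix"  # (i32*)
    raise ValueError("operator has no operand on either side")
```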

Details

Symbolic token list

The following is the initial list of symbolic tokens recognized in a Carbon source file:

| Symbolic token | Explanation |
| --- | --- |
| `+` | Addition |
| `-` | Subtraction and negation |
| `*` | Indirection, multiplication, and forming pointer types |
| `/` | Division |
| `%` | Modulus |
| `^` | Complementing and bitwise XOR |
| `&` | Address-of and bitwise AND |
| `\|` | Bitwise OR |
| `<<` | Arithmetic and logical left shift |
| `>>` | Arithmetic and logical right shift |
| `=` | Assignment and initialization |
| `++` | Increment |
| `--` | Decrement |
| `+=` | Add-and-assign |
| `-=` | Subtract-and-assign |
| `*=` | Multiply-and-assign |
| `/=` | Divide-and-assign |
| `%=` | Modulus-and-assign |
| `&=` | Bitwise-AND-and-assign |
| `\|=` | Bitwise-OR-and-assign |
| `^=` | Bitwise-XOR-and-assign |
| `<<=` | Left-shift-and-assign |
| `>>=` | Right-shift-and-assign |
| `==` | Equality (equal to) |
| `!=` | Inequality (not equal to) |
| `>` | Greater than |
| `>=` | Greater than or equal to |
| `<` | Less than |
| `<=` | Less than or equal to |
| `->` | Return type and indirect member access |
| `=>` | Match syntax |
| `[` and `]` | Subscript and deduced parameter lists |
| `(` and `)` | Function call, function declaration, and tuple literals |
| `{` and `}` | Struct literals, blocks of control flow statements, and the bodies of definitions (classes, functions, etc.) |
| `,` | Separate tuple and struct elements, function parameters, and call arguments |
| `.` | Member access |
| `:` | Name binding patterns |
| `:!` | Compile-time binding patterns |
| `;` | Statement separator |

Alternatives considered

Alternatives from proposal #601:

  • lex the longest sequence of symbolic characters rather than lexing only the longest known operator
  • support an extensible operator set
  • different whitespace restrictions or no whitespace restrictions
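The first alternative differs from the chosen rule when a run of symbol characters is not itself a known token. This sketch compares the two strategies on a single run of symbols. It assumes a small hypothetical subset of the token table (`KNOWN`), and `split_known` and `whole_run` are illustrative names; each returns `None` where a real lexer would report an error.

```python
from typing import Optional

# Hypothetical subset of the symbolic token table, enough for the examples.
KNOWN = {"*", "-", "->", "=", "==", "=>"}


def split_known(run: str) -> Optional[list[str]]:
    """Chosen rule: split a run of symbol characters into longest known tokens."""
    tokens, i = [], 0
    while i < len(run):
        for n in range(len(run) - i, 0, -1):
            if run[i:i + n] in KNOWN:
                tokens.append(run[i:i + n])
                i += n
                break
        else:
            return None  # no known token starts here
    return tokens


def whole_run(run: str) -> Optional[list[str]]:
    """Rejected alternative: the entire run is one token and must be known."""
    return [run] if run in KNOWN else None


print(split_known("**"), whole_run("**"))  # ['*', '*'] None
print(split_known("->"), whole_run("->"))  # ['->'] ['->']
```

Under the longest-run alternative, a double dereference such as `**p` would be rejected as the unknown token `**`, whereas the chosen rule lexes it as two `*` tokens.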

References