`var` statement
Table of contents
- Problem
- Background
- Proposal
- Rationale based on Carbon’s goals
- Caveats
- Alternatives considered
Problem
The `var` statement is noted in the language overview, but it is provisional: no justification has been provided. Variable declarations are fundamental, and it should be clear to what degree the current syntax is adopted.
It’s expected that after the adoption of this proposal, `var` syntax will still not be finalized: the proposal is an experiment.
Constants
Although constants are naturally related to variables, this proposal does not include any syntax for constants. This is expected to be revisited later.
Background
Terminology
In this proposal, “variable” is defined as an identifier referring to a mutable value.
Out of scope features
Questions have come up about:
- The type system
- Type checking
- Scoping
All of these are important features. However, in the interest of small proposals, they are out of scope of this proposal.
Language comparison
Variables are standard in many languages. Some forms to consider are:
- C++:
      int x;
      int y = 0;
      bool a = true, *b = nullptr, c;
- Python:
      x = None
      y = 0
      z: int = 7  # Added by PEP 526.
- Swift:
      var x = 0
      var y: Int = 0
      var z: Int
- TypeScript:
      let y: Number = 0;
      var x = 0;  // Legacy from JavaScript.
- Rust:
      let mut x = 0;
      let mut y: i32 = 0;
      let mut z: i32;
- Go:
      var x = 0
      y := 0
      var z int
      var a, b = true, false
- Visual Basic:
      Dim x As Integer = 3
Proposal
Carbon should adopt `var <type> <identifier> [ = <value> ];` syntax for variable statements.
Considerations for this syntax are:
- Type and identifier ordering: The ordering of `<type>` before `<identifier>` reflects the typical C++ ordering syntax.
  - Some C++ syntax can put type information after the identifier, such as `int x[6];`. Carbon should be expected to place that as part of the type.
- `var` introducer keyword: The use of `var` makes it clearer for readers to skim and see where variables are being declared. It also reduces complexity and potential ambiguity in language parsing.
- One variable: In C++, multiple variables can be declared in a single statement. An equivalent Carbon syntax may end up looking like `var (Int x, String y) = (0, "foo");`, so limiting to one declaration is not fundamentally restrictive. However, by breaking with C++ and requiring the full type to be specified with each identifier, we achieve two important things:
  - It’s clear what the full type is, preventing difficult-to-read statements with a mix of stack variables, pointers, and similar.
  - As the language grows, a function returning a tuple may be assigned to distinctly named and typed variables.
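The readability concern about mixed declarations can be seen in C++ itself. The following is a minimal sketch that reuses the multi-variable statement from the language comparison (with `c` initialized so the function is well defined); the helper name `DeclaratorsHaveMixedTypes` is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Returns true when the declarators below have the types noted in the
// comments. Hypothetical helper, for illustration only.
bool DeclaratorsHaveMixedTypes() {
  // One C++ statement, three variables, two different types: the `*`
  // attaches to `b` only, so `a` and `c` are bool while `b` is bool*.
  bool a = true, *b = nullptr, c = false;
  (void)a;
  (void)b;
  (void)c;  // Silence unused-variable warnings.
  return std::is_same<decltype(a), bool>::value &&
         std::is_same<decltype(b), bool*>::value &&
         std::is_same<decltype(c), bool>::value;
}
```

Requiring the full type next to each identifier, as proposed, would make the type of `b` visible without reading the rest of the statement.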
Experiment: The ordering of type and identifier will be researched. For more information, see the alternative.
Executable semantics form
Example bison syntax for executable semantics is:
statement:
  "var" expression identifier optional_assignment ";"
| /* preexisting statements elided */
;
optional_assignment:
  /* empty */
| "=" expression
;
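To make the shape of the grammar concrete, here is a small hand-written sketch in C++17 that accepts the same statement form. It is not the executable semantics implementation: `VarStatement`, `Tokenize`, `IsWord`, and `ParseVarStatement` are hypothetical names, and as a simplifying assumption the type and the value are each a single token rather than a full `expression`.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Parsed pieces of one statement. All names here are hypothetical.
struct VarStatement {
  std::string type;
  std::string identifier;
  std::optional<std::string> initializer;
};

// Splits on whitespace, treating `=` and `;` as their own tokens.
std::vector<std::string> Tokenize(const std::string& source) {
  std::vector<std::string> tokens;
  std::string current;
  auto flush = [&] {
    if (!current.empty()) {
      tokens.push_back(current);
      current.clear();
    }
  };
  for (char ch : source) {
    if (std::isspace(static_cast<unsigned char>(ch))) {
      flush();
    } else if (ch == '=' || ch == ';') {
      flush();
      tokens.push_back(std::string(1, ch));
    } else {
      current += ch;
    }
  }
  flush();
  return tokens;
}

// True for tokens that are not punctuation.
bool IsWord(const std::string& token) { return token != "=" && token != ";"; }

// Accepts `var <type> <identifier> [ = <value> ];` and returns nullopt on
// any mismatch. Simplification: type and value are single tokens.
std::optional<VarStatement> ParseVarStatement(const std::string& source) {
  const std::vector<std::string> t = Tokenize(source);
  if (t.size() == 4 && t[0] == "var" && IsWord(t[1]) && IsWord(t[2]) &&
      t[3] == ";") {
    return VarStatement{t[1], t[2], std::nullopt};
  }
  if (t.size() == 6 && t[0] == "var" && IsWord(t[1]) && IsWord(t[2]) &&
      t[3] == "=" && IsWord(t[4]) && t[5] == ";") {
    return VarStatement{t[1], t[2], t[4]};
  }
  return std::nullopt;
}
```

Because every statement starts with the `var` token, the sketch can reject non-declarations by looking at the first token alone, which is the parsing benefit the introducer keyword is meant to provide.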
Rationale based on Carbon’s goals
Carbon needs variables in order to be writable by developers. That functionality needs a syntax.
Relevant goals are:
- 3. Code that is easy to read, understand, and write:
  - Adding a keyword makes it easy for developers to visually identify variable declarations.
- 5. Fast and scalable development:
  - The addition of a keyword should make parsing easy.
- 7. Interoperability with and migration from existing C++ code:
  - Keeping syntax close to C++ will make it easier for developers to transition.
Caveats
`var` name may change
The name `var` could still change. However, it’s used with similar meaning in other languages including Swift, Go, and TypeScript, and so it’s reasonable to expect it will not.
Changing to `let mut`
The idea that `var` may change includes the possibility that `var` may become something like `let mut` in Rust. However, this is not assumed by this proposal:
- This proposal omits constant syntax.
- It would need to be considered whether the syntax tax of `let mut` appropriately focuses on encouraging appropriate usage of features rather than restricting misuse.
- Lower verbosity syntax for variables is more consistent with C++, even if constants are made less verbose by way of `let`.
Multiple identifiers in one statement
Although `var (Int x, String y) = (0, "foo");` syntax is mentioned, this proposal is not intended to propose such a syntax. It’s noted primarily to show that limiting `var` to one variable per statement does not rule out an abbreviated syntax like this later. That should probably be covered as part of tuples.
Update provisional pattern matching syntax
Pattern matching syntax in the overview uses syntax similar to `Int: x`. As part of removing the colon between type and identifier from the provisional `var` syntax, that syntax should be changed to remove the `:`. Details should be resolved as part of the eventual pattern matching proposal, but if changes are needed to add a separator, the `var` syntax should be updated to remain consistent. The precise form of that implementation will be part of normal Carbon evolution.
For example, replacing `fn Sum(Int: a, Int: b) -> Int;` with `fn Sum(Int a, Int b) -> Int;` and `case (Int: p, (Float: x, Float: _)) if (p < 13) => {` with `case (Int p, (Float x, Float _)) if (p < 13) => {`.
Update provisional `$` syntax
Variables using `Type:$` and similar should drop the `:`, as in `Type$`.
Alternatives considered
Noted alternatives are key differences from C++ variable declaration syntax.
No `var` introducer keyword
The intent of the `var` statement is to improve readability and parsability, and it’s related to `fn` for functions. Although code is more succinct without introducers, the noted benefits are expected to be significant. Most other modern languages use similar introducers, and so this break from C++ is adopting a different norm.
Name of the `var` statement introducer
`var` is used with a similar meaning in several other languages, including Swift, Go, and JavaScript. `let` is used by TypeScript. `let mut` is used by Rust, with `let` used for constants (this use of `let` alone is consistent with other languages). In general `var` appears to be a more common choice.
Colon between type and identifier
The use of a colon (`:`) between the type and identifier is intended to reduce potential parsing ambiguity, and to make reading code easier. As proposed, there is no colon between the type and identifier.
Syntax ambiguity
Using a colon or other separator could make it easier to avoid certain kinds of ambiguities. For example, suppose we decided to use a postfix `*` operator to form pointer types, as in C++. In such a setup, we could have code like the following:
var T * x = 3;
var T * x = 3 y;
In the first statement, `*` is a unary operator and so `T*` is the type and `x` is the identifier. However, in the second statement, `*` is a binary operator and so `T * x = 3` is the type, and `y` is the identifier; the resulting compiler errors may be confusing to users. Furthermore, the place of `3` could be taken by an arbitrarily complex expression; this could cause resolving the ambiguity between unary and binary `*` to require unbounded look-ahead, adversely impacting code compilation time goals.
Consider instead the code:
var T *: x = 3;
var T * x = 3: y;
The colon makes it unambiguous whether the `*` in each case is unary or binary with only one token of look-ahead. More importantly, this syntax immediately calls the reader’s attention to the fact that the second declaration has a highly unusual type.
There are other ways of resolving ambiguities like this. For example, we could avoid allowing the same operator to have both postfix and infix forms, or we could distinguish them by the presence or absence of whitespace. However, even if we avoid formal ambiguity by such means, a separator like `:` may be useful for reducing visual ambiguity for human readers.
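For comparison, C++ already has a related ambiguity with `*`, which it resolves by name lookup rather than with a separator. The sketch below is illustrative only; `Tracked`, `StarDeclaresPointer`, and `StarMultiplies` are hypothetical names. The same token sequence `T * x` is a pointer declaration when `T` names a type and a multiplication when `T` names a value.

```cpp
#include <cassert>

// A value type whose `*` operator records that a multiplication ran.
// Hypothetical type, for illustration only.
struct Tracked {
  bool* multiplied;
};

Tracked operator*(Tracked lhs, Tracked) {
  *lhs.multiplied = true;
  return lhs;
}

// Here `T` names a type, so `T * x` declares `x` as a pointer to int.
bool StarDeclaresPointer() {
  using T = int;
  T * x = nullptr;
  return x == nullptr;
}

// Here `T` names a value, so the same token sequence `T * x` is a
// multiplication expression whose result is discarded.
bool StarMultiplies() {
  bool multiplied = false;
  Tracked T{&multiplied};
  Tracked x{&multiplied};
  T * x;
  return multiplied;
}
```

A reader has to know what `T` is before either statement can be understood, which is the kind of visual ambiguity a separator could reduce.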
Confusion with other languages and alternatives
One of the disadvantages of `:` is that with `var Int: x`, ordering is inconsistent with other languages using `:`, such as Rust and Swift, which would say `var x: Int`.
It may be worth considering other syntax options. A few to consider are:
var(Int) x;
var Int# x;
var Int @x;
var Int -> x;
These aren’t part of the proposed syntax mainly because it’s not clear any would gain as much support as `:`. However, this is an opportunity to make suggestions and see if there’s a good compromise.
Use in pattern matching
The old draft pattern matching proposal used `:` as a separator. In pattern matching, the `:` may be particularly important to distinguish between value matching and type name matching. However, the pattern matching proposal should examine these choices and alternatives before we reach a conclusion that `:` is necessary for pattern matching. Per syntax ambiguity, it is expected that `:` has some advantages, but it may not turn out to make a difference to the compiler, due to prevailing constraints on type expression syntax.
This proposal suggests we update provisional pattern matching syntax to match the proposed `var` syntax.
Advantages and disadvantages
Advantages:
- Reduces syntax ambiguity.
- This should improve readability and parsability.
- It should make it easier to debug issues during development.
Disadvantages:
- Deviates from the common syntax used by most languages with the type before the identifier, including C, C++, Java, and C#.
- Changing from C++ is especially significant because of Carbon’s goals for interoperability and migration which will mean an especially large portion of Carbon developers will be actively reading both Carbon and C++ code.
- Other notable languages that use `:` in variable statements, including Swift, put the type after the identifier.
  - It may be worth considering alternatives to `:`.
  - If the alternative of type after identifier is adopted, it’s likely a `:` separator will be adopted.
Conclusion
Right now the proposal is to not have anything between the type and identifier in order to avoid cross-language ambiguity, and to retain syntax that is closer to C++. However, the ultimate decision may hinge on type and identifier ordering, as well as related future evolution.
Type after identifier
There are many languages that put the type after the identifier. A common format used by Swift and Rust is `var x: Int`.
It’s worth considering the sentence-like readings:
- `var x Int` (or `var x: Int`) may be read as “declare x as an int” or “make a variable x and give it int storage”.
- `var Int x` may be read as “declare an int called x” or “make a variable with int storage called x”.
These readings might be of similar quality, and are presented to offer different perspectives on how to read the possible statement orderings.
Ordering as a way to quickly answer questions
Ordering is essentially a question of pairing identifiers and types. This can be cast as asking which question developers consider more important when reading code:
- What is the type of variable `x`?
- What is the identifier of the `Int` variable?
We assert the first question is the more important one: developers will see an identifier in later code, and want to know its type. However, how do we determine which order is better for this purpose?
Unfortunately, little research has been done on this. All we’re aware of right now is a study from an unpublished undergraduate project from Germany. The study was done in Java with 50 students. Its data indicates that it’s faster to answer question 1 if the type comes first, and faster to answer question 2 if the identifier comes first. We do not want to make decisions based on the study because it isn’t published, studied a small group, and doesn’t directly compare possible `var` syntaxes; however, it still influences our thoughts.
Syntax popularity
When considering what to use for now, we can consider the popularity of various languages. The top 10 on several sources (with percentages noted by sources that have them) are:
TIOBE | Pct | GitHut | Pct | PYPL | Pct | Octoverse |
---|---|---|---|---|---|---|
C | 16% | JavaScript | 19% | Python | 30% | JavaScript |
Java | 11% | Python | 16% | Java | 17% | Python |
Python | 11% | Java | 11% | JavaScript | 8% | Java |
C++ | 7% | Go | 8% | C# | 7% | TypeScript |
C# | 4% | C++ | 7% | C and C++ | 7% | C# |
Visual Basic | 4% | Ruby | 7% | PHP | 6% | PHP |
JavaScript | 2% | TypeScript | 7% | R | 4% | C++ |
PHP | 2% | PHP | 6% | Objective-C | 4% | C |
SQL | 2% | C# | 4% | Swift | 2% | Shell |
Assembly | 2% | C | 3% | TypeScript | 2% | Ruby |
Sources: TIOBE, GitHut, PYPL, and Octoverse, as named in the column headers above.
For these languages:
- C, C++, C#, Objective-C, and Java put the identifier after the type.
  - These use `<type> <identifier>`, with no keyword.
- Python, Go, TypeScript, Visual Basic, SQL, and Swift put the type after the identifier.
  - Python uses `<identifier>: <type>`, with no keyword. This was added in Python 3.6, and reflects language evolution.
  - Go uses `var <identifier> <type>`, with no colon.
  - TypeScript and Swift use `var <identifier>: <type>`.
  - Visual Basic uses `Dim <identifier> as <type>`.
  - SQL uses `DECLARE @<identifier> AS <type>`, where `AS` is optional.
- JavaScript, R, Ruby, PHP, Shell, and Assembly language do not specify types in the same ways as other languages.
Type elision
First, it has been discussed that Carbon would require `auto` (or a similar explicit syntax marker) instead of allowing developers to elide the type entirely from a `var` statement. In other words, while `var Int x = 0;` is valid, and `var auto x = 0;` is equivalent, there is no form such as `var x = 0;` which removes the type entirely.
Most languages that write `var x: Int` also allow eliding the type when assigning a value. For example, Go allows `x := 0` and Swift allows `var x = 0;`. As a result, those languages have no need for an `auto` keyword.
Eliding the type would be more surprising with `var Int x` syntax because removing the `Int` places the identifier immediately after `var`, where the type normally is. This may be subtly confusing to developers. However, if `auto` is required for explicitness, the issue is moot.
Retaining `auto` does not eliminate the need to consider type elision as part of advantages and disadvantages: if the type is put after the identifier and `var x: auto` syntax is used, it becomes an inconsistency with other languages. This inconsistency would be a Carbon innovation that may confuse developers, leading to long-term pressure to remove `auto` for consistency with similar languages, and thus a disadvantage.
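C++ is an existing example of the type-first model with an explicit marker: the type position is always filled, either by a concrete type or by `auto`. The following is a small sketch of that behavior; the helper name `AutoMatchesExplicitType` is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper showing that `auto` fills the type position
// rather than removing it.
bool AutoMatchesExplicitType() {
  // Explicit type, analogous to the proposal's `var Int x = 0;`.
  int explicit_x = 0;
  // Deduced type with an explicit marker, analogous to `var auto x = 0;`.
  auto deduced_x = 0;
  // C++ has no declaration form that drops the type position entirely:
  // writing `z = 0;` for an undeclared `z` is an error.
  static_assert(
      std::is_same<decltype(deduced_x), decltype(explicit_x)>::value,
      "auto deduces int from the literal 0");
  return explicit_x == deduced_x;
}
```

This mirrors the case described above where `auto` is required for explicitness, so the identifier never moves into the position where the type normally appears.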
Advantages and disadvantages
Advantages:
- Consistent with languages that put a `:` in the type declaration.
  - Notable languages using `x: Int` syntax include Swift, Rust, Kotlin and TypeScript. Go is similar but does not include `:`.
  - Swift puts argument labels before variable declaration. Keeping the identifier first allows consistency with Swift’s argument label syntax while also keeping names adjacent.
- More opportunity to unify a concept of putting the identifier immediately after the keyword.
  - `fn`, `import`, and `package` will all have the identifier immediately after the keyword, with non-identifier content following.
    - It’s expected that a typical function declaration will look like `fn Foo(Int bar)`, with `var` only added when a storage for a copy is required. Thus, this advantage is primarily about the function identifier `Foo`, not parameter identifiers.
  - Other cases, such as `alias`, are likely flexible and could follow `var` in concept by putting the resulting name at the end. That is, `alias To = From;` for consistency with `var x: Int;` versus `alias From as To;` similar to `var Int x;` ordering.
Disadvantages:
- If Carbon doesn’t support type elision, it creates an inconsistency with other notable languages using `x: Int` syntax.
  - It may be worth considering alternatives to `:`.
- The ordering of `x: Int` is inconsistent with C++ variable syntax.
  - This negatively affects the sub-goal Familiarity for experienced C++ developers with a gentle learning curve.
  - It is consistent with other parts of C++ syntax, particularly `using To = From;`, although not `typedef From To;`.
- Early signs are that putting the identifier first makes it slower for developers to answer the question, “what is the type of variable `x`?”
  - This indicates worse readability in spite of the sentence ordering, and readability is the essential part of the goal Code that is easy to read, understand, and write.
- Popular languages tend to use `int x` syntax, including C, Java, C++, and C#. Other notable languages include Groovy and Dart.
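The point in the list above about `using` and `typedef` can be checked directly: C++ already contains both orderings for type aliases. This is a minimal sketch; the alias names and the `AliasesAgree` helper are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Type-first ordering: the existing type precedes the new name.
typedef unsigned long CountViaTypedef;

// Name-first ordering: the new name precedes the existing type.
using CountViaUsing = unsigned long;

static_assert(std::is_same<CountViaTypedef, CountViaUsing>::value,
              "both aliases name the same type");

// Hypothetical helper exposing the same check at run time.
bool AliasesAgree() {
  return std::is_same<CountViaTypedef, CountViaUsing>::value;
}
```

Both declarations introduce the same alias, so experienced C++ developers already read name-first declarations in at least one context.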
Conclusion
We should conduct a larger study on the topic of type and identifier ordering and syntax. Until then, we should adopt C++-like syntax. This meets the migration sub-goal Familiarity for experienced C++ developers with a gentle learning curve, and allows applying the higher-priority goal Code that is easy to read, understand, and write if supporting evidence is found.
Experiment: The ordering of type and identifier will be researched.