for loops
Table of contents
- Problem
- Background
- Proposal
- Details
- Caveats
- Rationale based on Carbon’s goals
- Alternatives considered
Problem
Control flow is documented in the language overview. for loops are common in C++, and Carbon should consider providing some form of them.
Background
C++
There are two forms of for loops in C++:
- Semisemi (semicolon, semicolon): for (int i = 0; i < list.size(); ++i)
- Range-based: for (auto x : list)
Semisemi for loops have been around for a long time, and are in C. Range-based for loops were added in C++11.
For example, here is a basic semisemi:
for (int i = 0; i < list.size(); ++i) {
  printf("List at %d: %s\n", i, list[i].name);
}
An equivalent semisemi using iterators and the comma operator may look like:
int i = 0;
for (auto it = list.begin(); it != list.end(); ++it, ++i) {
  printf("List at %d: %s\n", i, it->name);
}
Range-based syntax can be simpler, but it becomes more awkward when the loop body needs more than one piece of information, such as both the element and its index:
int i = 0;
for (const auto& x : list) {
  printf("List at %d: %s\n", i, x.name);
  ++i;
}
Java
Java provides equivalent syntax to C++. Although Java doesn’t have a comma operator, it does provide for comma-separated statements in the first and third sections of semisemi for loops.
TypeScript and JavaScript
Both TypeScript and JavaScript offer three kinds of for loops:
- Semisemi, mirroring C++.
- for (x of list), mirroring range-based for loops.
- for (x in list), returning indices.
For example, here is an in loop:
for (i in list) {
  console.log('List at ' + i + ': ' + list[i].name);
}
Python, Swift, and Rust
Python, Swift, and Rust all only support range-based for loops, using for x in list syntax.
Go
Go uses for as its primary looping construct. It has:
- Semisemi, mirroring C++.
- for i < list.size() condition-only loops, mirroring C++ while loops.
- for { infinite loops.
Proposal
Carbon should adopt C++-style range-based for loop syntax. Semisemi for loops should be addressed through a different mechanism.
Related keywords are:
- for
- continue: continues with the next loop iteration.
- break: breaks out of the loop.
Details
for loop syntax looks like: for ( var type variable : expression ) { statements }
Similar to the if/else proposal, the braces are optional and must be paired ({ ... }) if present. When there are no braces, only one statement is allowed.
continue will continue with the next loop iteration directly, skipping any other statements in the loop body.
break exits the loop immediately.
All of this is consistent with C and C++ behavior.
Range inputs
The syntax for inputs is not being defined in this proposal. However, we can still establish critical things to support:
- Interoperable C++ objects that work with C++’s range-based for loops, such as containers with iterators.
- Carbon arrays and other containers.
- Range literals. These are not proposed, but for an example seen in other languages, 0..2 may indicate the set of integers [0, 2).
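To make the first and third items concrete, here is a minimal C++ sketch of a half-open integer range that works with a range-based for loop. The names IntRange and Collect are hypothetical and exist only for illustration; this proposal does not define range literals or a range type, and Carbon's eventual design may look quite different.

```cpp
#include <vector>

// Hypothetical half-open integer range [lo, hi); not part of this proposal.
class IntRange {
 public:
  // Minimal iterator: just enough for a range-based for loop.
  class Iterator {
   public:
    explicit Iterator(int value) : value_(value) {}
    int operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    int value_;
  };

  // An inverted range is clamped to empty so iteration ends immediately.
  IntRange(int lo, int hi) : lo_(lo), hi_(hi < lo ? lo : hi) {}
  Iterator begin() const { return Iterator(lo_); }
  Iterator end() const { return Iterator(hi_); }

 private:
  int lo_;
  int hi_;
};

// Collects the values visited by a range-based for loop over [lo, hi).
std::vector<int> Collect(int lo, int hi) {
  std::vector<int> values;
  for (int x : IntRange(lo, hi)) {
    values.push_back(x);
  }
  return values;
}
```

Under these assumptions, Collect(0, 2) visits 0 and 1, matching the [0, 2) reading of 0..2 mentioned above.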
Executable semantics form
%token FOR
statement:
FOR "(" pattern ":" expression ")" statement
| /* preexisting statements elided */
;
The continue and break statements are intended to be added as part of the while proposal.
Caveats
C++ as baseline
This baseline syntax is based on C++, following the migration sub-goal "Familiarity for experienced C++ developers with a gentle learning curve". To the extent that this proposal anchors on a particular approach, it aims to anchor on C++’s existing syntax, consistent with that sub-goal.
Alternatives will generally reflect breaking consistency with C++ syntax. While most proposals may consider alternatives more, this proposal suggests a threshold of only accepting alternatives that skew from C++ syntax if they are clearly better; the priority in this proposal is to avoid debate and produce a trivial proposal. Where an alternative would trigger debate, it should be examined by an advocate in a separate proposal.
Semisemi support
Carbon will not provide semisemi support. This decision will be contingent upon a better alternative loop structure which is not currently provided by while or for syntax. If Carbon doesn’t evolve a better solution, semisemi support will be added later.
For details, see the alternative "Include semisemi for loops".
Range literals
Range literals are important to the ergonomics of range-based for loops, and should be added. However, they should be examined separately as part of limiting the scope of this proposal.
Enumerating containers
Several languages have the concept of providing an index with the object in a range-based for loop:
- Python does for i, item in enumerate(items), with a global function.
- Go does for i, item := range items, with a keyword.
- Swift does for (i, item) in items.enumerated(), having removed an enumerate() global function.
- Rust does for (i, item) in items.enumerate().
An equivalent pattern for Carbon should be examined separately as part of limiting the scope of this proposal.
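For comparison with the languages above, the following is a rough C++17 sketch of the same idea. Enumerate and Describe are hypothetical helpers written for this illustration, not standard library functions and not a proposed Carbon API; the sketch copies elements for simplicity, which a real design would likely avoid.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pairs each element with its index, copying elements.
template <typename T>
std::vector<std::pair<std::size_t, T>> Enumerate(const std::vector<T>& items) {
  std::vector<std::pair<std::size_t, T>> result;
  result.reserve(items.size());
  for (const T& item : items) {
    // result.size() is the index of the element about to be appended.
    result.emplace_back(result.size(), item);
  }
  return result;
}

// Formats each element as "index: name" with a single range-based loop,
// so no separately maintained counter is needed.
std::vector<std::string> Describe(const std::vector<std::string>& names) {
  std::vector<std::string> lines;
  for (const auto& [i, name] : Enumerate(names)) {
    lines.push_back(std::to_string(i) + ": " + name);
  }
  return lines;
}
```

Compared with the earlier C++ example that increments i by hand inside the loop body, the index here cannot drift out of step with the element.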
Rationale based on Carbon’s goals
Relevant goals are:
- 3. Code that is easy to read, understand, and write:
  - Range-based for loops are easy to read and very helpful.
  - Semisemi for syntax is complex and can be error prone for cases where range-based loops work. Avoiding it, even by providing equivalent syntax with a different loop structure, should discourage its use and direct engineers towards better options. The alternative syntax should also be easier to understand than semisemi syntax, otherwise we should just keep semisemi syntax.
- 7. Interoperability with and migration from existing C++ code:
  - Keeping syntax close to C++ will make it easier for developers to transition.
Alternatives considered
Both alternatives from the if/else proposal apply to for as well: we could remove parentheses, require braces, or both. The conclusions are mirrored here in order to avoid a divergence in syntax.
Additional alternatives follow.
Include semisemi for loops
We could include semisemi for loops for greater consistency with C++.
This is in part important because switching from a semisemi for loop to a while loop is not always straightforward due to how for evaluates the third section of the semisemi. The inter-loop evaluation of the third section is important given how it interacts with continue. In particular, consider the loops:
for (int i = 0; i < 3; ++i) {
  if (i == 1) continue;
  printf("%d\n", i);
}

int j = 0;
while (j < 3) {
  if (j == 1) continue;
  printf("%d\n", j);
  ++j;
}

int k = 0;
while (k < 3) {
  ++k;
  if (k == 1) continue;
  printf("%d\n", k);
}

int l = 0;
while (l < 3) {
  if (l == 1) {
    ++l;
    continue;
  }
  printf("%d\n", l);
  ++l;
}
To explain the differences between these loops:
- The first loop will print 0 and 2.
- The second loop will print 0, then loop infinitely because the increment is never reached.
- The third loop will print 2 and 3 because the increment happens too early: 1 is skipped by the continue, and 3 is printed before the condition is rechecked.
- Only the fourth loop is equivalent to the first loop, and it duplicates the increment.
There is no easy place to put the increment in a while loop.
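The behavior of the four loops can be checked with a small C++ sketch that records values instead of printing them. The function names and the kMaxSteps guard are assumptions introduced for this illustration; the guard exists only so the non-terminating variant returns.

```cpp
#include <vector>

// Illustrative cap so the non-terminating loop can be observed safely.
constexpr int kMaxSteps = 100;

// First loop: the increment in the third section still runs after continue.
std::vector<int> SemisemiLoop() {
  std::vector<int> printed;
  for (int i = 0; i < 3; ++i) {
    if (i == 1) continue;
    printed.push_back(i);
  }
  return printed;  // Holds 0, 2.
}

// Second loop: returns true if the loop was still spinning at the cap.
bool IncrementAtEndHangs(std::vector<int>& printed) {
  int steps = 0;
  int j = 0;
  while (j < 3) {
    if (++steps > kMaxSteps) return true;
    if (j == 1) continue;  // Skips ++j, so j stays at 1.
    printed.push_back(j);
    ++j;
  }
  return false;
}

// Third loop: incrementing first shifts every observed value by one.
std::vector<int> IncrementAtStart() {
  std::vector<int> printed;
  int k = 0;
  while (k < 3) {
    ++k;
    if (k == 1) continue;
    printed.push_back(k);
  }
  return printed;  // Holds 2, 3.
}

// Fourth loop: matches the semisemi loop, at the cost of a second ++l.
std::vector<int> DuplicatedIncrement() {
  std::vector<int> printed;
  int l = 0;
  while (l < 3) {
    if (l == 1) {
      ++l;
      continue;
    }
    printed.push_back(l);
    ++l;
  }
  return printed;  // Holds 0, 2.
}
```

Only DuplicatedIncrement reproduces the output of SemisemiLoop, which is the migration difficulty this alternative is concerned with.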
Advantages:
- We need a plan for migrating both developers and code from C++ semisemi for loops, and providing them in Carbon is the easiest solution.
  - Semisemis remain common in C++ code.
- Semisemis are much more flexible than range-based for loops.
  - while loops do not offer a sufficient alternative.
Disadvantages:
- Semisemi loops can be error prone, such as for (int i = 0; i < 3; --i).
  - Syntax such as for (int x : range(0, 3)) leaves less room for developer mistakes.
  - Removing semisemi syntax will likely improve understandability of Carbon code, a language goal.
- If we add semisemi loops, it would be very difficult to get rid of them.
  - Code using them should be expected to accumulate quickly, from both migrated code and developers familiar with C++ idioms.
If we want to remove semisemi for loops, we should avoid adding them in the first place. We do need to ensure that developers are happy with the replacement, although that should be achievable through providing strong range support, including range literals.
A story for migrating developers and code is still required. For developers, it would be ideal if we could have a compiler error that detects semisemi loops and advises the preferred Carbon constructs. For both developers and code, we need a suitable loop syntax that is easy to use in cases that remain hard to write in while or range-based for loops. This will depend on a separate proposal, but there’s at least presently interest in this direction.
Writing in instead of :
Range-based for loops could write in instead of :, such as:
for (x in list) {
  ...
}
An argument for switching now, instead of using C++ as a baseline, would be that var syntax has been discussed as using a :, and avoiding : in range-based for loops may reduce syntax ambiguity risks. However, the current var proposal does not use a :, and so this risk is only a potential future concern: it’s too early to require further evaluation.
Because the benefits of this alternative are debatable and would diverge from C++, adopting in would run contrary to using C++ as a baseline. Any divergence should be justified and reviewed as a separate proposal.
Multi-variable bindings
C++ allows for (auto [x, y] : range_of_pairs) which is not explicitly part of the syntax here. Carbon is likely to support this through tuples, so adding special for syntax for this would likely be redundant.
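The C++17 sketch below illustrates why dedicated for syntax would likely be redundant: in C++, the destructuring is a general declaration feature that the loop merely reuses. SumOfPair and SumOfPairs are hypothetical names for this illustration, and nothing here describes how Carbon tuples will actually work.

```cpp
#include <utility>
#include <vector>

// Destructuring in an ordinary declaration, outside any loop.
int SumOfPair(const std::pair<int, int>& p) {
  const auto& [x, y] = p;
  return x + y;
}

// The same destructuring pattern used as the loop variable of a
// range-based for loop; no loop-specific feature is involved.
int SumOfPairs(const std::vector<std::pair<int, int>>& range_of_pairs) {
  int total = 0;
  for (const auto& [x, y] : range_of_pairs) {
    total += x + y;
  }
  return total;
}
```

If Carbon's pattern in the for grammar accepts tuple patterns in the same way, multi-variable loops would follow without further syntax.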