for loops

Pull request

Table of contents

Problem

Control flow is documented at language overview. for loops are common in C++, and Carbon should consider providing some form of it.

Background

C++

There are two forms of for loops in C++:

  • Semisemi (semicolon, semicolon): for (int i = 0; i < list.size(); ++i)
  • Range-based: for (auto x : list)

Semisemi for loops have been around for a long time, and are in C. Range-based for loops were added in C++11.

For example, here is a basic semisemi:

for (int i = 0; i < list.size(); ++i) {
  printf("List at %d: %s\n", i, list[i].name);
}

An equivalent semisemi using iterators and the comma operator may look like:

int i = 0;
for (auto it = list.begin(); it != list.end(); ++it, ++i) {
  printf("List at %d: %s\n", i, it->name);
}

Range-based syntax can be simpler, but can also make it more difficult if there are multiple pieces of interesting information:

int i = 0;
for (const auto& x : list) {
  printf("List at %d: %s\n", i, x.name);
  ++i;
}

Java

Java provides equivalent syntax to C++. Although Java doesn’t have a comma operator, it does provide for comma-separated statements in the first and third sections of semisemi for loops.

TypeScript and JavaScript

Both TypeScript and JavaScript offer three kinds of for loops:

  • Semisemi, mirroring C++.
  • for (x of list), mirroring range-based for loops.
  • for (x in list), returning indices.

For example, here is an in loop:

for (i in list) {
    console.log('List at ' + i + ': ' + list[i].name);
}

Python, Swift, and Rust

Python, Swift, and Rust all only support range-based for loops, using for x in list syntax.

Go

Go uses for as its primary looping construct. It has:

  • Semisemi, mirroring C++.
  • for i < list.size() condition-only loops, mirroring C++ while loops.
  • for { infinite loops.

Proposal

Carbon should adopt C++-style range-based for loops syntax. Semisemi for loops should be addressed through a different mechanism.

Related keywords are:

  • for
  • continue: continues with the next loop iteration.
  • break: breaks out of the loop.

Details

For loop syntax looks like: for ( var type variable : expression ) { statements }

Similar to the if/else proposal, the braces are optional and must be paired ({ ... }) if present. When there are no braces, only one statement is allowed.

continue will continue with the next loop iteration directly, skipping any other statements in the loop body.

break exits the loop immediately.

All of this is consistent with C and C++ behavior.

Range inputs

The syntax for inputs is not being defined in this proposal. However, we can still establish critical things to support:

  • Interoperable C++ objects that work with C++’s range-based for loops, such as containers with iterators.
  • Carbon arrays and other containers.
  • Range literals. These are not proposed, but for an example seen in other languages, 0..2 may indicate the set of integers [0, 2).

Executable semantics form

%token FOR

statement:
  FOR "(" pattern ":" expression ")" statement
| /* preexisting statements elided */
;

The continue and break statements are intended to be added as part of the while proposal.

Caveats

C++ as baseline

This baseline syntax is based on C++, following the migration sub-goal Familiarity for experienced C++ developers with a gentle learning curve. To the extent that this proposal anchors on a particular approach, it aims to anchor on C++’s existing syntax, consistent with that sub-goal.

Alternatives will generally reflect breaking consistency with C++ syntax. While most proposals may consider alternatives more, this proposal suggests a threshold of only accepting alternatives that skew from C++ syntax if they are clearly better; the priority in this proposal is to avoid debate and produce a trivial proposal. Where an alternative would trigger debate, it should be examined by an advocate in a separate proposal.

Semisemi support

Carbon will not provide semisemi support. This decision will be contingent upon a better alternative loop structure which is not currently provided by while or for syntax. If Carbon doesn’t evolve a better solution, semisemi support will be added later.

For details, see the alternative.

Range literals

Range literals are important to the ergonomics of range-based for loops, and should be added. However, they should be examined separately as part of limiting the scope of this proposal.

Enumerating containers

Several languages have the concept of providing an index with the object in a range-based for loop:

  • Python does for i, item in enumerate(items), with a global function.
  • Go does for i, item := range items, with a keyword.
  • Swift does for (i, item) in items.enumerated(), having removed a enumerate() global function.
  • Rust does for (i, item) in items.enumerate().

An equivalent pattern for Carbon should be examined separately as part of limiting the scope of this proposal.

Rationale based on Carbon’s goals

Relevant goals are:

  • 3. Code that is easy to read, understand, and write:

    • Range-based for loops are easy to read and very helpful.
    • Semisemi for syntax is complex and can be error prone for cases where range-based loops work. Avoiding it, even by providing equivalent syntax with a different loop structure, should discourage its use and direct engineers towards better options. The alternative syntax should also be easier to understand than semisemi syntax, otherwise we should just keep semisemi syntax.
  • 7. Interoperability with and migration from existing C++ code:

    • Keeping syntax close to C++ will make it easier for developers to transition.

Alternatives considered

Both alternatives from the if/else proposal apply to while as well: we could remove parentheses, require braces, or both. The conclusions mirror here in order to avoid a divergence in syntax.

Additional alternatives follow.

Include semisemi for loops

We could include semisemi for loops for greater consistency with C++.

This is in part important because switching from a semisemi for loop to a while loop is not always straightforward due to how for evaluates the third section of the semisemi. The inter-loop evaluation of the third section is important given how it interacts with continue. In particular, consider the loops:

for (int i = 0; i < 3; ++i) {
  if (i == 1) continue;
  printf("%d\n", i);
}

int j = 0;
while (j < 3) {
  if (j == 1) continue;
  printf("%d\n", j);
  ++j;
}

int k = 0;
while (k < 3) {
  ++k;
  if (k == 1) continue;
  printf("%d\n", k);
}

int l = 0;
while (l < 3) {
  if (l == 1) {
    ++l;
    continue;
  }
  printf("%d\n", l);
  ++l;
}

To explain the differences between these loops:

  • The first loop will print 0 and 2.
  • The second loop will print 0, then loop infinitely because the increment is never reached.
  • The third loop will only print 2 because the increment happens too early.
  • Only the fourth loop is equivalent to the first loop, and it duplicates the increment.

There is no easy place to put the increment in a while loop.

Advantages:

  • We need a plan for migrating both developers and code from C++ semisemis for loops, and providing them in Carbon is the easiest solution.
    • Semisemis remain common in C++ code.
  • Semisemis are much more flexible than range-based for loops.
    • while loops do not offer a sufficient alternative.

Disadvantages:

  • Semisemi loops can be error prone, such as for (int i = 0; i < 3; --i).
    • Syntax such as for (int x : range(0, 3)) leaves less room for developer mistakes.
    • Removing semisemi syntax will likely improve understandability of Carbon code, a language goal.
  • If we add semisemi loops, it would be very difficult to get rid of them.
    • Code using them should be expected to accumulate quickly, from both migrated code and developers familiar with C++ idioms.

If we want to remove for loops, we should avoid adding them. We do need to ensure that developers are happy with the replacement, although that should be achievable through providing strong range support, including range literals.

A story for migrating developers and code is still required. For developers, it would be ideal if we could have a compiler error that detects semisemi loops and advises the preferred Carbon constructs. For both developers and code, we need a suitable loop syntax that is easy to use in cases that remain hard to write in while or range-based for loops. This will depend on a separate proposal, but there’s at least presently interest in this direction.

Writing in instead of :

Range-based for loops could write in instead of :, such as:

for (x in list) {
  ...
}

An argument for switching now, instead of using C++ as a baseline, would be that var syntax has been discussed as using a :, and avoiding : in range-based for loops may reduce syntax ambiguity risks. However, the current var proposal does not use a :, and so this risk is only a potential future concern: it’s too early to require further evaluation.

Because the benefits of this alternative are debatable and would diverge from C++, adopting in would run contrary to using C++ as a baseline. Any divergence should be justified and reviewed as a separate proposal.

Multi-variable bindings

C++ allows for (auto [x, y] : range_of_pairs) which is not explicitly part of the syntax here. Carbon is likely to support this through tuples, so adding special for syntax for this would likely be redundant.