Tuples and tuple indexing

Pull request

Table of contents

Abstract

Add support for extracting elements of a tuple by their numerical index.

Also formally add the well-established basic syntactic and semantic rules for tuples, for which we have had leads issues but no proposal, into the design.

Problem

Currently, the only way to access the elements of a tuple is through pattern matching. While this handles many cases well, it is sometimes desirable to access an element of a tuple more succinctly, especially in cases where only a single element’s value is needed.

Background

In Python, tuple indexing is performed using square brackets:

tup = (1, 2, 3)
# Prints 2.
print(tup[1])

In C++, std::pair is indexed using .first and .second, and std::tuple is indexed using std::get<I>.

In Rust and Swift, a tuple is indexed using .N, where N is a decimal integer literal.

  • Rust disallows digit separators and base prefixes in N, but allows certain literal suffixes for historical reasons.
  • Swift disallows digit separators and base prefixes in N. swiftc allows leading 0 digits, although this appears to be an unintentional consequence of llvm::StringRef::getAsInteger allowing them.

The current Carbon documentation suggests using tuple[i] for tuple indexing, but this has not been the subject of an approved proposal.

Proposal

Formally, we have not yet approved a proposal that says that Carbon has tuple types, although we have approved several proposals that explicitly include support for tuples. So, this proposal does that: tuples exist in Carbon, and are product types with unnamed positional elements.

This proposal also updates the design to match other decisions that have been made in leads issues but not captured by a proposal, specifically:

  • Leads issue #2191 (one-tuples and one-tuple syntax), despite being focused on one-tuples, established the syntax for tuples of all arities.
  • Leads issue #710 established rules for assignment, comparison, and implicit conversion of tuples. These operations are performed elementwise, with relational comparisons being performed lexicographically.

Finally, the main intent of this proposal is to add support for indexing tuples, using the following syntaxes:

  • . N, where N is an integer literal, and
  • . ( expr ), where expr is a template constant of integer type.

For pointers to tuples, -> N and -> ( expr ) are also supported.

Details

Lexing

Multi-level tuple indexing will result in constructs such as tuple_of_tuples.1.2. It’s important that these are lexed as two tuple indexing operations, not as tuple_of_tuples . 1.2, as it would be under the current lexical rules, so a new rule is introduced:

  • When a . or -> token is followed immediately by a digit, it is lexed as a . or -> token followed by an integer literal, never a real literal.

Note that this results in lexing being slightly contextual: the rule to lex a token after a . or -> is different from the rule to lex a token in any other context. However, there is an alternative equivalent formulation of the rule that is not context-sensitive: that .integer is treated as a single lexeme that produces two tokens, and likewise for ->integer.

Indexes as names

The elements of a tuple are treated as if they had decimal integers as their names: .0, .1, and so on. It is an error to use a different spelling of that integer in a simple member access, because that spelling would not match the element name. For example, (1, 2).0x0 is invalid, as is large_tuple.1_2. These spellings can be used as an expression operand as described below: (1, 2).(0x0) and large_tuple.(1_2) are both valid.

Precedence

The . N syntax has the same precedence as postfix member access syntax, . name, and can be combined in the same expression: a.0.x.1 is valid.

The . ( expr ) syntax is not new in this proposal, and continues to have the same precedence as . name.

Expression operand

In the . ( expr ) syntax, if the first operand is a tuple and the second operand is a constant of any integer type, the result is the corresponding tuple element, as if specified by a decimal integer literal. This rule is built into the language; the . () notion is not currently overloadable.

Bounds

If the tuple index is not between 0 and one less than the number of elements in the tuple, inclusive, the indexing is invalid.

Tuple slicing

The current skeleton design suggests using tuple[a .. b] to slice tuples. For example, tuple[0 .. 2] could be used to extract the first two elements of a tuple. Tuple slicing support is not covered by this proposal, but could be added in the future with syntax such as tuple.(0 .. 2). However, note that there is a risk that this syntax may lead to an incorrect theory about how Carbon works: namely, that tuple.__ gives an element whereas tuple.(__) gives a tuple.

Rationale

Goals:

  • Language tools and ecosystem
    • The lexing rule is relatively simple to implement. Tools such as syntax highlighters can treat .i as a distinct kind of token rather than implementing any kind of context-sensitive lexing.
  • Software and language evolution
    • Consistent use of tuple field indexes can be used to support code that adds new tuple elements over time.
  • Code that is easy to read, understand, and write
    • This feature allows tuple access to be written more concisely than pattern matching would allow.
    • Lexing .1.2 as four tokens rather than two avoids a surprise that would make chained member access hard to write.
    • For simple member access, requiring a decimal integer with no digit separators allows the member access to be treated as an element name, making the indexing easier to understand.
  • Interoperability with and migration from existing C++ code
    • This feature provides a migration syntax for existing use of .first, .second, and std::get<I>. The permission to use expressions rather than only literals supports migration of std::get<expression>.

Principles:

  • Low context sensitivity.
    • We look only at the character immediately before a numeric literal to determine whether it is lexed as a tuple index that stops before the next . or as a general numeric literal.

Alternatives considered

Alternative lexing rule

We could lex .0, .1, … as a single token rather than as separate . and 0, 1, … tokens. This would somewhat simplify the lexing rules, because they would no longer be contextual. We choose to not do this because:

  • This would be inconsistent with our handling of struct.fieldname.
  • Either tuple . 0 would be invalid, unlike struct.fieldname, or it would need to use a distinct grammar production from tuple.0.

We could lex an integer literal when the previous token is ., regardless of whether the literal follows the . immediately. For example, we could treat

let n: i32 = ((1, 2, 3), 4) . 0.1;

as tuple indexing, rather than as a tuple followed by a . and a real literal. This is what Swift does. We choose to not do this because:

  • The 0.1 literal in this case looks like a real literal, not tuple indexing, so this would likely cause surprise for readers.
  • This would make the context-sensitive lexing be non-local. The chosen rule can be interpreted as lexing .[0-9]* as a single lexeme, but forming two tokens from it, whereas this alternative rule would be much more firmly a context-sensitive lexing rule.

We could get a similar result in other ways:

  • We could allowing a real literal after a ., and split it into a pair of member accesses when needed. This is what rustc does.
  • We could lex a real literal as three tokens: an integer token, a . token, and a suffix token, and merge them back together in the parser. This is what intellij does when parsing Rust.

Note that these approaches are not entirely equivalent to each other. In Rust, for example, the difference is observable in proc macros. Also, using any kind of token merging or splitting approach would result in the token stream not matching the interpretation of the program, which is problematic for tooling. For example, many common Rust syntax highlighters do not properly highlight chained tuple indexing.

Decimal indexing restriction

Carbon follows Rust and Swift in restricting tuple indexes to being decimal integers:

// OK
let a: i32 = (1, 2, 3).0;

// Error, invalid index for tuple element.
let b: i32 = (1, 2, 3).0x0;

This restriction introduces an inconsistency between .0x0 and .(0x0), and we could easily drop it. However, the restriction allows us to consider .0, .1, and so on to simply be the names of the tuple elements, analogous to struct field names, and there isn’t a clear utility for permitting a base prefix or a digit separator in a tuple index.

Square bracket notation

Instead of tuple.0 and tuple.(IndexConstant), we could use tuple[0] and tuple[IndexConstant]. This would result in more consistent syntax for indexing with a constant versus with an expression, but would make accessing an element of a tuple less consistent with accessing an element of a struct. We expect tuple access with a non-literal index to be a rare operation, so the consistency with that syntax seems to have lower value.

Also, the use of . notation aims to convey the intent of the developer better: we intend x[n] notation to be used primarily for homogenous indexing, whereas . notation is used for heterogenous access. This also reflects the difference in phase: tuple indexing requires a constant index in the same way that struct member access requires a constant name, whereas array or container indexing would typically be expected to permit a runtime index.

The .N notation can also be extended to perform member indexing into a struct or class, at least the latter of which would not be reasonable to support with [] notation. However, such support is not part of this proposal.

Use of [] notation has the advantage of reducing visual ambiguity for cases such as O.0, l.0, and Z.0, which might be visually confused with 0.0, 1.0, and 2.0, respectively. However, we’re not aware of this being a problem in practice in Rust or Swift, which use this notation, and the same problem exists even without the .0 suffix: F(O, l, Z) may resemble F(0, 1, 2).

Negative indexing from the end of the tuple

We could support tuple.-1, or perhaps tuple.(-1), as a notation for “the last element of the tuple”, as used for example in Python. We choose not to support this at this time because such notation can be confusing and has awkward edge cases. An off-by-one error, or an attempt to access a one-past-the-start element, will sometimes be accepted and silently do the wrong thing.

If a future proposal introduces tuple slicing, it should revisit this question, because this kind of indexing from the end is often desirable when forming a slice. The possibility of using a different notation for this operation should be considered, such as tuple.(.size - 1).

Trailing commas

Carbon permits optional trailing commas in tuples, with mandatory trailing commas for one-tuples. Alternatives to this choice were considered in leads issue #2191.