Nominal classes and methods

Problem
Background
Proposal
Rationale based on Carbon’s goals
Alternatives considered

Problem

We need facilities for defining new nominal record types in Carbon. They will need to support object-oriented features such as methods.

Background

This is a follow up to #561: Basic classes: use cases, struct literals, struct types, and future work which laid a foundation.

Proposal

The proposal is to update docs/design/classes.md as described in this PR.

Rationale based on Carbon’s goals

This particular proposal is focusing on the Carbon goal that code is “easy to read, understand, and write.” Future proposals will address other aspects of the class type design such as performance.

Alternatives considered

Method syntax

The proposed method syntax was decided in question-for-leads issue #494: Method syntax.

A wide variety of alternatives were considered:

Using a different introducer such as method to distinguish methods from functions.
Putting a pattern to match the receiver before the method name.
Omitting either the receiver name or the receiver type.
Allowing the user to specify the receiver name instead of using a parameter named me to indicate it is a method.
Using syntax between fn and the method name to indicate that it is a method, and whether the method takes a value or the address of the receiver.
Using a keyword to indicate that the first parameter is the receiver, as in C++’s “Deducing this” proposal.
Using a delimiter in the parameter list after the receiver type, but the main choice of ; seemed pretty subtle.
Putting the receiver pattern in a separate set of brackets. This could be an additional set of parens (…) before the explicit parameter list. Or the deduced parameters could be put in angle brackets <…> and the receiver in square brackets […].

Examples:

method (this: Self*) Set(n: Int);
fn [Me*].Set(n: Int);
fn ->Set(n: Int);
fn& Set(n: Int);
fn &Set(s: Self*, n: Int);
fn Set(this self: Self*, n: Int);
fn Set[this self: Self*](n: Int);
fn Set(self: Self*; n: Int);
fn Set(self: Self*)(n: Int);

Note: These examples are written using the parameter syntax decided in issue #542, though that does not match the syntax used in the discussion at the time.

Rationale

Full receiver type

We wanted the option to specify the type of the receiver. In C++, the type of this is controlled by way of special syntax, such as putting a const keyword at the end of the method declaration. We wanted to treat the receiver type more consistently with other parameter types, like is being considered in C++’s “Deducing this” proposal. That proposal also showed the value of including the entire type, rather than just part such as whether the parameter would be passed by value or address. We already plan to use this feature to conditionally include methods based on whether parameters to the type satisfy additional constraints. In this example, FixedArray(T, N) has a Print() method if T is Printable:

class FixedArray(T:! Type, N:! Int) {
  // ...
  fn Print[me: FixedArray(P:! Printable, N)]() { ... }
}

We did consider the option of making the type of the receiver optional. This is not what we decided to try first, but is an idea we would consider in the future.

Method names line up

Using the same introducer fn as functions, as decided in issue #463, along with the method name immediately afterwards means that function and method names line up in the same column visually.

struct IntContainer {
  // Non-methods for building instances
  fn MakeFromInts ... -> IntContainer;
  fn MakeRepeating ... -> IntContainer;

  // Methods
  fn Size ... -> Int;
  fn First ... -> Int;

  fn Clear ...;
  fn Append ...;
}

This is to ease scanning and to put the most important information up front.

We also considered using another 2-character introducer that would distinguish methods from associated functions, like me, but it did not garner much support. The name me was not evocative enough of “method”, and people generally preferred matching other function declarations. We also considered using different introducers to communicate whether the receiver was passed by value or by address, for example ro for “read-only” and rw for “read-write”.

Receiver in square brackets

Putting the receiver pattern inside the explicit parameter list, as is done in Python, would lead to an unfortunate mismatch between the arguments listed at the call and the function declaration. That motivated putting the receiver pattern inside square brackets ([…]) with deduced parameters. There were concerns, however, that this parameter did not act like deduced parameters.

Receiver parameter is named `me`

Having the receiver name be fixed is helpful for consistency. This benefits readers, and simplifies copying or moving code between functions. A fixed receiver name also allows us to identify the receiver pattern in the function declaration, and identify it as a method instead of an ordinary function associated with the type, like static methods in C++.

Finally, we wanted the bodies of methods to access members of the object using an explicit member access. This means we need the name used to access members to be short, and so we chose the name me.

Keyword to indicate pass by address

We considered using just whether the receiver type was a pointer type to indicate whether the receiver was passed by address, but we preferred type information to flow in one direction. Otherwise this would cause a problem if we want to allow deduction of the object parameter type. This deduction can’t be used to select between pointer and not pointer.

We also considered supporting reference types that could bind to lvalues without explicitly taking the address of the object, but references would have added a lot of complexity to the type system and we believed that they are not otherwise necessary.

We decided to adopt an approach where the parameter declaration can contain a marker for matching an argument’s value category separately from its type. This marker would be present for methods that mutate the object, and so requires that the receiver be an lvalue to match the pattern. The marker means “first take the address of the argument and then match the rest of the pattern.” We considered a few different possible ways to write this:

fn Set[&me: Self*](n: Int);
fn Set[*(me: Self*)](n: Int);
fn Set[ref me: Self*](n: Int);

We eventually settled on using a keyword addr.

fn Set[addr me: Self*](n: Int);

This doesn’t use up a symbol, makes it easier to find in search engines, and allows us to add more keywords in the future for other calling conventions, such as inout. We may even find that those other keywords have preferable semantics so we may eventually drop addr.

This keyword approach also is consistent with how we are marking template parameters with the template keyword, as decided in issue #565 on generic syntax.

Marking mutating methods at the call site

We considered making it visible at the call site when the receiver was passed by address, with the address-of operator &.

(&x).Set(4);

This would in effect make the mutating methods be methods on the pointer type rather than the class type.

Differences between functions and methods

Question-for-leads issue #494: Method syntax also considered what would be different between functions and methods:

Methods use a different calling syntax: x.F(n).
Methods distinguish between taking the receiver (x) by value or by pointer, without changing the call syntax.
Methods have to be declared in the body of the class.
Methods and associated functions both have access to private members of the class.
Only methods can opt in to using dynamic dispatch.
The receiver parameter to a method varies covariantly in inheritance unlike other parameter types.

Specifying linkage as part of the access modifier

We considered various access modifiers such as internal or private.internal that would enforce internal linkage in addition to restricted visibility. We decided to postpone such considerations for the time being.

Even if we do need explicit controls here, we did not feel the need to front-load that complexity right now and with relatively less time or experience to evaluate the tradeoffs. If and when linkage becomes a visible issue and important to discuss, we can add it. Until then, we can see how much mileage we can get out of purely implementation techniques.

Nominal data class

We needed some way of opting a nominal class into the fieldwise behavior of a data class. In Kotlin, you precede the class declaration with a data keyword. Carbon already has a way of marking types as having specific semantic properties, implementing interfaces. This leverages the support for interfaces in the language to be able to express things like “here is a blanket implementation of an interface for all data classes” in a consistent way.

We chose the name Data rather than DataClass for the interface since tuples implicitly implement the interface and were “product types” rather than “classes”.

Let constants

We wanted a consistent interpretation for let declarations in classes and function bodies. Experience from C++ const int variables, where we try constant-evaluation and then give the program different semantics based on whether such evaluation happened to work, suggested we wanted to instead require :! in all places where a value that’s usable during compilation is introduced.