The Core.Array type for direct-storage immutably-sized buffers

Pull request

Table of contents

Abstract

We propose to add Core.Array(T, N) as a library type in the prelude library of the Core package. Since arrays are a very frequent type, we propose to privilege use of this type by providing a builtin keyword array(T, N) that resolves to the Core.Array(T, N) type.

Problem

Carbon’s current syntax for a fixed-size, direct storage array (hereafter called “array”) is the provisional [T; N] and there is no syntax yet for a mutably-sized indirect storage buffer (hereafter called “heap-buffer”).

Arrays and heap-buffers are some of the most commonly used types, after fundamental types. The syntax, whatever it is, will be incredibly frequent in Carbon source code.

We explore and propose a new syntax for arrays that addresses design issues with the provisional syntax that allows for writing each of the following in clear ways: slice, compile-time sized slice, array, and pointer to array. And that leaves clear room for a sibling indirect-storage type.

Background

We have developed a matrix for enumerating and describing the vocabulary of owning array and buffer types. Direct refers to an in-place storage buffer, as with arrays. Indirect refers to heap allocation, where the type itself holds storage of a pointer to the buffer, as with heap-buffers.

Here we are discussing the location of storage (direct vs indirect) as a way to categorize types. Indirect-storage types may, for small payloads, store state directly in its fields (such as with the Small String Optimization), but this is an optimization for specific payloads and the category of the type remains an indirect-storage type.

To provide familiarity, here is the table for the C++ language as a baseline:

Owning type Runtime Sized Compile-time Sized
Direct, Immutable Size - T[N] / std::array<T, N>
Indirect, Immutable Size std::unique_ptr<T[]> std::unique_ptr<T[N]>
Indirect, Mutable Size std::vector<T> -

Rust

The Rust vocabulary is as follows:

Owning type Runtime Sized Compile-time Sized
Direct, Immutable Size - [T; N]
Indirect, Immutable Size Box<[T]> Box<[T; N]>
Indirect, Mutable Size Vec<T> -

There are a few things of note when comparing to C++:

  • The Rust Box and Vec types are part of std but are imported into the current scope automatically, so they do not need any prefix.
  • The [T] type represents a fixed-runtime-size buffer. The type itself is not instantiable since its size is not known at compile time. Box is specialized for the type to store a runtime size in its own type.
  • The array type syntax matches the Carbon provisional syntax.
  • The heap-buffer type name matches the C++ vector type, but it is privileged with a shorter name. The Vec type name is at most the same length as an array type name (for the same T).

Swift

The Swift vocabulary is significantly smaller, to support automatic refcounting:

Owning type Runtime Sized Compile-time Sized
Direct, Immutable Size InlineArray<T> -
Indirect, Immutable Size - -
Indirect, Mutable Size Array<T> / [T] -

Because there was historically no direct storage option, only one name was needed, and “Array” was used to refer to a heap-buffer.

On Feb 5 2025, a proposal was accepted to introduce InlineArray<T> for the direct storage immutably sized array type. Because “Array” is already taken, the original proposal called this new type “Vector” in reference to mathematical vectors. The choice of name was heavily discussed however, due to the confusion with C++’s std::vector and Rust’s std::vec::Vec. It was provisionally renamed to Slab but settled on InlineArray.

Safe C++

The Safe C++ proposal introduces array syntax very similar to Rust:

Owning type Runtime Sized Compile-time Sized
Direct, Immutable Size - [T; N]
Indirect, Immutable Size std2::box<[T; dyn]> std2::box<[T; N]>
Indirect, Mutable Size std2::vector<T> -

There are a few things of note:

  • While Rust omits a size to indicate the size is known only at runtime, Safe C++ uses a dyn keyword indicate the same.
  • The heap-buffer type name is unchanged from C++, sticking with vector.

Goals

It will help to establish some goals in order to weigh alternatives against. These goals are based on the open discussion from 2024-12-05, where we discussed the Pointers, Arrays, Slices document.

The goals here are largely informed by and trying to achieve the top-level goal of “Code that is easy to read, understand, and write”. We define some more specific targets here as relate to the specifics of the array syntax.

Privileging the most common type names

  • “Explicitness must be balanced against conciseness, as verbosity and ceremony add cognitive overhead for the reader, while explicitness reduces the amount of outside context the reader must have or assume.”

The more common it will be for a type to be used, the shorter we would like the name to be. This follows from the presumption that we weigh conciseness as increasingly valuable for types that will appear more frequently in Carbon code.

We expect the ordering of frequency in Carbon code to be:

  • fundamental types ≈ tuples » heap-buffers > arrays » everything else1.

Where fundamental types are: machine-sized integers (8 bit, 16 bit, etc.), machine-sized floating points, and pointers including slices2. Function parameters/arguments are an example of tuples.

From this, we derive that we want:

  • Fundamental types and tuples to have the most concise names.
    • We can lean on special syntax or keywords as needed to make them concise but descriptive.
  • Heap-buffers to have a concise name, even more so than arrays.
    • We could use special syntax or keywords if needed to achieve conciseness.
  • Arrays to have a concise name, but they do not need to be comparably concise to fundamental types and tuples.
    • We should try to avoid special syntax.
  • Everything else should be written as idiomatic types with descriptive names.

Absence of syntax should make clear defaults

One way to write arrays and compile-time-sized slices is like we see in Rust: [T; N] and &[T; N]. This suggests a relationship where array is like slice, and the default form. But they are very different types, rather than a modification of a single type, and this can be confusing3 for developers learning the language.

We want to avoid the situation where absence of syntax, such as a missing pointer indicator, changes the entire meaning of the remaining syntax or is otherwise confusing.

Avoiding confusion with other languages

The most general meaning of “array” is a range of consecutive values in memory.

However in many languages it is used, either in formally or informally, to refer to a direct-storage, immutably-sized memory range:

In particular, this is the usage in the languages which Carbon will most frequently interoperate, and/or from which code will be migrated to Carbon and thus comments and variable names would use these terms in this way.

Languages which require shared ownership don’t have direct-storage arrays, so the same term gets used for indirect storage:

  • Swift, Array
    • As noted earlier, Swift is in the processes of adding a direct-storage array called InlineArray. Backwards compatibility prevents the use of Array for this.
  • Javascript, Array
  • Java and Kotlin, ArrayList

And some languages use array to refer to both direct and indirect storage types.

  • Dlang has direct-storage arrays colloqually and the indirect-storage Array type.
  • Pascal uses the presence or absence of a size to determine if Array uses direct (immutably-sized) or indirect (mutably-sized) storage.

In sum, languages which have direct-storage immutably-sized arrays use the term “array” to refer to those, and most then use a separate name for the indirect-storage type.

Avoiding confusion with other domains

The term “vector” in mathematics refers to a fixed-size set of numbers. This leads to confusion with the C++ type std::vector since it holds a mutably-sized set of values. Developers coming from other domains must learn a new and contradictory term of art. The Rust language chose naming that derives from C++, with std::vec::Vec.

These type names conflict with names in mathematical and graphics libraries, which want to use vector in its mathematical sense. In Rust this leads to Vec for a mutably-sized array, and Vec3, Vec4, and so on for fixed-size mathematical vectors. While not fatal, this does create ambiguity that must be overcome by developers.

No predeclared identifiers

Recently the proposal p4864: No predeclared identifiers, Core is a keyword clarified a direction for the Carbon language, wherein there will not be implicit imports from the Core library. Anything accessible directly in the language, rather than through a package name, is done so through a builtin keyword. This ensures that raw identifier syntax is always available for those same names in Carbon code.

Proposal

The All APIs are library APIs principle states:

In Carbon, every public function is declared in some Carbon API file.

As such, we propose a Core library type for a direct-storage immutably-sized array, and then a builtin shorthand for referring to that library type.

In line with other languages surveyed above, given the presence of a direct-storage immutably-sized array in Carbon, we will reserve the unqualified name “array” for this type. In full, its name is Core.Array(T, N), where T is the type of elements in the array, and N is the number of elements. Notably this leaves room for supporting multi-dimensional arrays by adding further optional size parameters, either in the Array type or in a similar sibling type.

Here is a provisional vocabulary table to compare with other languages:

Owning type Runtime Sized Compile-time Sized
Direct, Immutable Size - array(T, N) / Core.Array(T, N)
Indirect, Immutable Size ? Core.Box(Array(T, N))
Indirect, Mutable Size Core.Buf(T) -

Carbon does not have proposed names for heap-allocated storage, so we use some placeholders here, in order to show where Array fits into the picture:

  • Box(T) for a heap-allocated T value.
  • Buf(T) for a heap-buffer of T values.

An indirect, immutably-sized buffer does not have a clearly expressible syntax at the moment. Box([T]) is the closest fit with the current provisional syntax for slices. But [T] is a sized pointer, which would make this type a heap-allocated sized pointer, rather than a heap-allocated fixed-size array. This is in contrast with Rust where &[T] is a slice, and [T] is a fixed-size buffer; so it then follows that Box<[T]> is a heap-allocated fixed-size buffer.

Because arrays will be very common in Carbon code, we want to privilege their usage. There are at least two ways in which we can do so. The first is to include them in the prelude library of the Core package. This ensures they are available in every Carbon file as Core.Array(T, N). The second is by making the type available through a shorthand without being qualfiied by the Core package name. We propose the array(T, N) builtin keyword as that shorthand.

Rationale

As this proposal is addressing the question of introducing a new prelude library type in Core, it is mostly focused on the goal Code that is easy to read, understand, and write

This proposal aims to make code easy to understand by using a name that is consistent across systems programming languages, and avoiding names that have conflicting meaning. It also uses a standard type syntax, with a type in the Core package, making the type and its documentation maximally discoverable without requiring special-casing.

We introduced some more specific sub-goals above:

  1. Privileging the most common type names

This proposal privileges Core.Array as it will appear frequently in code, by placing it in the prelude library. This avoids the need for developers to import another Core library in order to access the type.

Since array types are expected to be very frequent, we also propose an array builtin keyword as a shorthand. This is in line with the No predeclared identifiers proposal, and uses a lowercase spelling to mark the word as a builtin. The spelling of array(T, N) will resolve to the library type Core.Array(T, N).

In this proposal, we avoid introducing additional syntax (such as with [T; N] or (1, 2)) because the frequency of use of arrays will be lower than that of fundamental types and tuples.

  1. Absence of syntax should make clear defaults

We introduce a type name, with a keyword that has a clear relationship to the generic type name, rather than making arrays look more like slices but without being a pointer. This is maent to avoid the confusion raised when removing syntax changes the meaning significantly, and especially in ways that differ from defaults/options for a single language concept.

  1. Avoiding confusion with other languages

We propose using the Array type name, and array shorthand, in line with how other languages use the same term. When a direct-storage array type is part of the language, it’s consistently referred to as an “array” without qualifications.

Most importantly, the name is consistent with the meaning in C++ and its standard library (std::array<T, N>) as well as with Rust, the languages which we expect Carbon code to interact with the most.

  1. Avoiding confusion with other domains

The name Vector is a possible choice for a fixed-length set of values, due to its mathematical meaning, as was originally proposed for the direct-storage immutably-sized array type in Swift. However any use of the name Vector in a core systems programming language construct is fraught. Either the name is to be incorrectly confused with a mathematical vector or with a C++ std::vector. We avoid the confusion by avoiding this name.

Future work

Namespacing the Core package

At this time, the Core package remains small, but there will come a time where the names within need to be split into smaller namespaces. Then the name Core.Array, among others, will become longer and the act of previleging the name through the array keyword will become more pronounced and helpful. At this time, we don’t propose to put Array into a namespace in Core as there’s no such existing structure to point to yet.

Alternatives considered

[T; N] builtin syntax

This is the current syntax used by the toolchain, however it had the following problems raised:

  • It’s very similar to the syntax for slices, which is [T], but very different in nature, being storage instead of a reference to storage.
  • Given [T] is a slice, [T; N] would better suit a compile-time-sized slice.

The syntax for a slice may also be changed, we discussed adding a pointer annotation to it, such as [T]* and [T; N]*. Some downsides remained:

  • The [T; N]* syntax would be a fixed-size slice, rather than a pointer to an array. This leaves no room for writing a pointer to an array, which can indicate a different intent, that it always includes the full memory range of the array. Without this distinction, we can’t model both std::span<T, N> and std::array<T, N>* in code migrated from C++ to Carbon and would need to collapse these to a single type.
  • Removing the pointer annotation would change the meaning of the type expression more then we’d like, since it would change from a slice into an array, rather than pointer-to-an-array into an array.

array [T; N] builtin syntax

This introduces a keyword as a modifier of a fixed-size slice, rather than a builtin forwarding type. While arrays will be very common, it’s not clear that they rise to the level of requiring breaking the languages naming rules (using a lowercase name) in order to provide a shorthand. And the shorthand is longer in the end than the Array(T, N) being proposed here. So this uses a larger weirdness budget for privileging the type while achieving less conciseness.

This has a similar issue as with [T; N] but in the reverse. Removing the array modifier keyword changes the meaning of the type expression in ways that are larger than a default/modifier relationship. Fixed-size slices are not the more-default array.

The use of a lowercase keyword also costs us by preventing users from using the word array in variables, a name which is quite common.

Just the Core.Array(T, N) library type

Providing just the library type is possible, but arrays will be one of the most common types in Carbon code, as described earlier. Privileging them with a shorthand that avoids Core. will help make Carbon code significantly more concise, due to the frequency, without hurting understandability. This makes it worth the tradeoff of putting a name into the file scope (by way of a builtin type).

Implicitly importing Core.Array(T, N) to the file scope

We considered a direction where some subset of names in the Core packages’s prelude library are imported automatically to the file scope. This would be similar to the Rust std::prelude module, which names aliases that are pulled into the global scope.

Importing names poses challenges for migrating code to Carbon. Names imported from a library may allow shadowing in other scopes, which would create ambiguity about the meaning of these names. And names imported from a library would not be avoidable using the raw identifier syntax: Array and r#Array would both refer to the same imported Core.Array type. This would present a challenge for code migrated to Carbon which uses the name Array in its own types or variables. Whereas with a builtin keyword, array and r#array refer to different things: The first is the keyword, and the second is a name that the developer can use freely for other purposes.

The No predeclared identifiers, Core is a keyword proposal discusses in more detail why this approach was not taken.

  1. “[chandlerc] Prioritize: slices first, then [resizable storage], then compile-time sized storage, then everything else is vastly less common. Between those three, the difference in frequency between the first two is the biggest.” from open discussion on 2024-12-05 

  2. Slices are included with fundamental types for simplicity, since they will take the place of many pointers in C++, giving them similar frequency to pointers, and can be logically thought of as a bounded pointer. 

  3. https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing) 

  4. Maybe this is more formal than colloquial, but the name is not part of the typename/syntax.