Numeric type literal syntax

Pull request

Problem

We want to establish a syntax for fixed-size scalar number types. These types include the two’s complement signed integer, the unsigned integer, and the floating-point number.

As these types are pervasive throughout the language, our goal is to align on a syntax that is terse and convenient for the author, yet still understandable and ergonomic.

Background

For developer convenience, names are given to number types that map to native machine register widths. These sizes typically include 8-bit, 16-bit, 32-bit, 64-bit, and, more recently, 128-bit widths.

For example, C++11 and later provide integer types such as int8_t (an 8-bit two’s complement signed integer type) and uint16_t (a 16-bit unsigned integer type), along with similar types for 32- and 64-bit values. Rust correspondingly has the i8 and u16 scalar integer types, among others, and Swift has the Int8 and UInt16 integer value types, among others.

In each case, the intent is to provide a clear and pragmatic syntax.

Additional discussion around this proposal’s background can be found in #543.

Proposal

We introduce a simple keyword-like syntax of iN, uN, and fN for two’s complement signed integers, unsigned integers, and floating-point numbers, respectively, where N can be any positive multiple of 8, including the common power-of-two sizes (for example, N = 8, 16, 32). We think of these as “type literals,” just as 7 is a “numeric literal.” This structure follows the successful precedent set by the Rust and LLVM development communities and potentially saves 40% or more of the characters required compared to other options such as IntN (for example, i16 versus Int16). While bit sizes greater than 128 bits will be well supported, some operations, such as division, will not be available at these large sizes.
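The character-savings figure can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the proposal; the helper name percent_saved is hypothetical, and the longer spellings are taken from the alternatives discussed later in this document.

```python
def percent_saved(short: str, long: str) -> int:
    """Percentage of characters saved by writing `short` instead of `long`."""
    return round(100 * (len(long) - len(short)) / len(long))


# Compare the proposed spellings with capitalized long-form names.
for short, long in [("i16", "Int16"), ("u16", "UInt16"), ("f64", "Float64")]:
    print(f"{short} versus {long}: {percent_saved(short, long)}% fewer characters")
# i16 versus Int16: 40% fewer characters
# u16 versus UInt16: 50% fewer characters
# f64 versus Float64: 57% fewer characters
```

The exact saving depends on the width and on which longer spelling is used for comparison.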

Non-goals

  • This does not address any considerations around the bool type
  • This does not provide a formal plan for the shape or mapping of the underlying types (#767 comments)
  • This does not prescribe an official grammar for parsing these types
  • This proposal does not address bit sizes that are not a multiple of 8, such as those used in a bit field

Details

Syntax

The syntax for a two’s complement signed integer, an unsigned integer, or a floating-point number type consists of a lowercase ‘i’, ‘u’, or ‘f’ character, respectively, indicating the kind of type, followed by a decimal number specifying the width in bits.

As a regular expression, this can be illustrated as:

([iuf])([1-9][0-9]*)

Capture group 1 indicates either an ‘i’ for a two’s complement signed integer type, a ‘u’ for an unsigned integer type, or an ‘f’ for an IEEE-754 binary floating-point number type. Capture group 2 specifies the width in bits. The regular expression alone accepts any width written without a leading zero; the width is additionally restricted to a multiple of 8.

Examples of this syntax include:

  • i16 - A 16-bit two’s complement signed integer type
  • u32 - A 32-bit unsigned integer type
  • f64 - A 64-bit IEEE-754 binary floating-point number type
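To make the two rules concrete (the regular expression plus the multiple-of-8 restriction), here is a minimal sketch in Python. It is only an illustration under stated assumptions and not an official grammar, which this proposal explicitly does not prescribe. The function name parse_type_literal and the kind labels are hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern from this section: a kind letter followed by a decimal width
# with no leading zero.
_TYPE_LITERAL = re.compile(r"([iuf])([1-9][0-9]*)")

# Hypothetical human-readable labels for capture group 1.
_KINDS = {
    "i": "signed integer",
    "u": "unsigned integer",
    "f": "floating-point",
}


def parse_type_literal(token: str) -> Optional[Tuple[str, int]]:
    """Return (kind, width_in_bits) for a valid type literal, else None."""
    match = _TYPE_LITERAL.fullmatch(token)
    if match is None:
        return None
    width = int(match.group(2))
    # The regular expression does not enforce this; check it separately.
    if width % 8 != 0:
        return None
    return _KINDS[match.group(1)], width


print(parse_type_literal("i16"))  # ('signed integer', 16)
print(parse_type_literal("f64"))  # ('floating-point', 64)
print(parse_type_literal("i7"))   # None
```

This sketch checks only the spelling of a token. It says nothing about which widths have a defined floating-point format or how the underlying types are mapped.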

Usage

package sample api;

fn Sum(x: i32, y: i32) -> i32 {
  return x + y;
}

fn Main() -> i32 {
  return Sum(4, 2);
}

In the above example, Sum has parameters x and y, each of which is typed as a 32-bit two’s complement signed integer. Main then returns the output of Sum as a 32-bit two’s complement signed integer.

Rationale

Following Carbon’s goal to facilitate “Code that is easy to read, understand, and write”, an explicit goal is to provide excellent ergonomics.

Highlighting relevant aspects of this from the project goals:

  • Carbon should not use symbols that are difficult to type, see, or differentiate from similar symbols in commonly used contexts.
  • Syntax should be easily parsed and scanned by any human in any development environment, not just a machine or a human aided by semantic hints from an IDE.
  • Explicitness must be balanced against conciseness, as verbosity and ceremony add cognitive overhead for the reader, while explicitness reduces the amount of outside context the reader must have or assume.

The type system syntax must also complement Carbon’s target for “Performance-critical software.”

Specifically, there should be “No need for a lower level language.”

  • Developers should not need to leave the rules and structure of Carbon, whether to gain control over performance problems or to gain access to hardware facilities.

Alternatives considered

As discussed in #543, four other options were considered:

C++ LP64 convention

Under this convention, char is the 8-bit type, short is the 16-bit type, int is the 32-bit type, and long is the 64-bit type.

Advantages:

  • The type name indicates its use to the reader
  • There is an existing precedent of this pattern in many programming languages, including C++
  • In the case of a typo, potentially better compiler checks versus an abbreviated form (for example, i332)

Disadvantages:

  • The type names can be arbitrary and confusing, because they do not convey the actual width of the type and often do not convey its use
  • The names themselves can be longer than the other syntax options
  • Some common C++ implementations use other models, which may create confusion when interoperating with C++ code. For example, Windows uses the LLP64 model, where long is a 32-bit type, so Carbon code and C++ on Windows would have different and incompatible definitions for long.
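The platform dependence of long can be observed directly. The following minimal Python sketch uses the standard ctypes module to report the width of the C long type for the platform it runs on; it is an illustration only and not part of the proposal.

```python
import ctypes

# Width in bits of the C `long` type on the platform running this script.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
# Pointer width, which distinguishes a 32-bit process from a 64-bit one.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

print(f"long is {long_bits} bits; pointers are {pointer_bits} bits")

# By contrast, the explicitly sized types have the same width everywhere.
assert ctypes.sizeof(ctypes.c_int32) * 8 == 32
assert ctypes.sizeof(ctypes.c_int64) * 8 == 64
```

On a 64-bit LP64 platform such as Linux or macOS, long is reported as 64 bits, while on 64-bit Windows (LLP64) it is reported as 32 bits. Names that carry the width avoid this ambiguity.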

Type name with length suffix

Complete type name with a length-specifying suffix - int8, int16, int32, int64, uint32, float64.

Advantages:

  • Are more explicit than an abbreviated version
  • Stand out against similar variable names (for example, i8 versus i = 8)

Disadvantages:

  • Add verbosity for a potentially insignificant gain in clarity
  • There are precedents from other communities (for example, Rust) that indicate authors enjoy a more compact syntax

Uppercase suffixes

The type name can begin with an uppercase letter - Int8, UInt8, Float16; I8, U8, F16.

Advantages:

  • May help screen readers distinguish the type

Disadvantages:

  • Can be visually similar to other values, for example, I8 versus l8 (the second begins with a lowercase L)

Additional bit sizes

Support for additional bit sizes such as all bit sizes or common powers of two.

Advantages:

  • Adds flexibility and convenience for further use cases such as bit fields

Disadvantages:

  • May increase chances of typos without strong compiler guards, for example, i32 versus i22 versus i23
  • Variables such as i1 and i2 already exist in C++ code in practice (example1, example2, example3)
  • Adds complexity through additional size rules, for example, we can’t support pointers to arbitrary bits
  • Adds confusion in syntactical overlap, for example, i1, il, i18, and i18n
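The effect of allowing every bit size can be seen by classifying the example tokens from the list above under both rules. This is a minimal Python sketch with hypothetical helper names. How tokens that are not type literals would be treated (as ordinary identifiers or as errors) is not decided here.

```python
import re

# Kind letter followed by a decimal width with no leading zero.
_ANY_WIDTH = re.compile(r"[iuf][1-9][0-9]*")


def is_type_any_width(token: str) -> bool:
    """Rule of the alternative: every positive bit size is a type literal."""
    return _ANY_WIDTH.fullmatch(token) is not None


def is_type_multiple_of_8(token: str) -> bool:
    """Rule of this proposal: the width must also be a multiple of 8."""
    return is_type_any_width(token) and int(token[1:]) % 8 == 0


for token in ["i32", "i22", "i23", "i1", "i2", "il", "i18", "i18n"]:
    print(f"{token:5} any width: {is_type_any_width(token)!s:5} "
          f"multiple of 8: {is_type_multiple_of_8(token)}")
```

Under the any-width rule, i22, i23, i1, i2, and i18 all become type literals, so a typo for i32 or an existing variable name is silently accepted as a type. Under the multiple-of-8 rule, only i32 in this list is a type literal. Neither rule matches il or i18n.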