CharLiteral difference should be an IntLiteral
Abstract
Change the result type of CharLiteral - CharLiteral from i32 to Core.IntLiteral.
Problem
Per p006710, the subtraction of two character literals (for example 'b' - 'a') is of type i32. This introduces two main problems:
- Inconsistency with other literals: In Carbon, operations on literals (like IntLiteral - IntLiteral) yield other literal types (such as IntLiteral). This allows them to be used in template contexts and to be implicitly converted to other types as appropriate. Having CharLiteral - CharLiteral produce a concrete, fixed type like i32 is an exception to this rule. For example, 'b' - 'a' should be usable as a template argument of type Core.IntLiteral and should be implicitly convertible to any integer type (like i64 or u8) that can represent the value of the difference, rather than being locked to i32.
- Layering issue in the toolchain: In the toolchain implementation, Core types and arithmetic operators are defined in a layered, sequential fashion in the prelude, and this layering is enforced by the orphan rule. Core.CharLiteral and Core.IntLiteral are fundamental, primitive types that the compiler knows about intrinsically. However, i32 is not a primitive type; it is a type alias for Core.Int(32), which is defined in core/prelude/types/int.carbon. If CharLiteral - CharLiteral produces i32, then the definition of CharLiteral operations in core/prelude/operators/arithmetic.carbon must depend on i32. This creates a circular dependency: the definition of primitive arithmetic operators needs to know about a specific concrete integer type that has not been fully defined yet. The only solution that satisfies the orphan rule is to reverse the dependency edge, so that arithmetic.carbon depends on int.carbon, and to define all Core.Int operations in arithmetic.carbon. By producing Core.IntLiteral instead, we keep the integer types self-contained.
Background
In p006710-char-redesign, character literals were redesigned. As part of that, the difference of two character literals was defined as producing i32, with the rationale:
> The difference of two characters produces an i32. This is preferred even for char to be consistent with the range needed to represent the difference of two Core.CharLiteral values.
CharLiteral values are in the range [0, 0x10FFFF], so the smallest fixed-width power-of-two-sized type their differences fit within is indeed i32. However, we did not consider using a literal type, nor the layering impact of this choice.
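The range claim can be checked with a little arithmetic. The sketch below is Python rather than Carbon and only models the numbers; `min_signed_bits` and `round_up_pow2` are hypothetical helper names introduced for illustration, and two's-complement signed types are assumed.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of CharLiteral values, per the text above


def min_signed_bits(lo, hi):
    """Smallest two's-complement width N such that [lo, hi] fits in iN."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def round_up_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


# Differences of two values in [0, MAX] lie in [-MAX, MAX].
diff_bits = min_signed_bits(-MAX_CODE_POINT, MAX_CODE_POINT)
fixed_width = round_up_pow2(diff_bits)
```

Under these assumptions `diff_bits` is the exact signed width the differences need, and `fixed_width` is that width rounded up to a power of two, which is where i32 comes from.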
Proposal
Change the type of CharLiteral - CharLiteral to be IntLiteral.
Example
With this change, subtracting two character literals results in an IntLiteral, which is evaluated at compile time and can be implicitly converted to any integer type that can represent the value:
// 'b' - 'a' produces an IntLiteral with value 1.
// This can initialize a u8, i64, etc.
var offset_u8: u8 = 'b' - 'a';
var offset_i64: i64 = 'b' - 'a';
// Error, `u8` cannot represent -1.
var negative_u8: u8 = 'a' - 'b';
// OK, the result 0x110000 fits in `u32` (it is too large for `u16`).
var table_size: u32 = '\u{10FFFF}' - '\0' + 1;
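The rule that a literal difference converts to any integer type able to represent its value can be modeled as a simple range check. This is a Python sketch of the idea, not Carbon's implementation; `int_range` and `fits` are hypothetical names, and `ord` stands in for the code point value of a character literal.

```python
def int_range(signed, bits):
    """Inclusive (min, max) of a fixed-width integer type such as u8 or i64."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, signed, bits):
    """True if `value` is representable in the given integer type."""
    lo, hi = int_range(signed, bits)
    return lo <= value <= hi


b_minus_a = ord("b") - ord("a")  # models 'b' - 'a'
a_minus_b = ord("a") - ord("b")  # models 'a' - 'b'

ok_u8 = fits(b_minus_a, signed=False, bits=8)    # models the u8 initialization
ok_i64 = fits(b_minus_a, signed=True, bits=64)   # models the i64 initialization
bad_u8 = fits(a_minus_b, signed=False, bits=8)   # models the rejected case
```

In this model the first two checks succeed and the third fails, matching the accepted and rejected initializations for 'b' - 'a' and 'a' - 'b'.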
Rationale
This proposal advances Carbon’s goals and principles in the following ways:
- Language tools and ecosystem: By resolving the layering issue, we simplify the prelude implementation.
- Code that is easy to read, understand, and write: The change provides consistency across all literal types. Developers do not need to remember that subtracting character literals produces a concrete i32 while other literal operations produce literal types. It also enables more use cases, such as using the difference of two character literals to initialize an unsigned type without explicit and potentially lossy casts.
Future work
This proposal does not consider making any changes to the type of char - char, which will therefore continue to produce i32.
This is somewhat arbitrary, especially as the old rationale of consistency with CharLiteral is gone. The following options may be worth considering in a future proposal:
- Use i9, as the smallest type that can fit all possible results.
- Use i16, as the smallest power-of-2 type that can fit all possible results.
- Keep i32, as it is expected to be the “normal” integer type.
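The candidate widths can be sanity-checked by brute force. This Python sketch assumes char holds an 8-bit value in [0, 255], which is what the i9 option implies; that is an assumption of the sketch, and `signed_range` and `holds_all` are hypothetical helper names.

```python
# Assumption: char values are 8-bit, in [0, 255].
CHAR_VALUES = range(256)

# Enumerate every possible char - char result.
diffs = [a - b for a in CHAR_VALUES for b in CHAR_VALUES]
lo, hi = min(diffs), max(diffs)


def signed_range(bits):
    """Inclusive (min, max) of a two's-complement iN type."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def holds_all(bits):
    """True if every char - char result fits in an iN of this width."""
    s_lo, s_hi = signed_range(bits)
    return s_lo <= lo and hi <= s_hi


candidates = {bits: holds_all(bits) for bits in (8, 9, 16, 32)}
```

Under that assumption, `candidates` shows that an 8-bit signed type is too narrow while 9, 16, and 32 bits all suffice, so the choice among the three options is about convention rather than range.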
Alternatives considered
Use a fixed result type
We could keep the result type of CharLiteral - CharLiteral as i32 as originally specified in p006710-char-redesign, or similarly pick a different fixed type such as i16.
- Disadvantages:
  - Preserves the layering issue in the toolchain.
  - Creates an inconsistency in the language design between character literals and other literal types.
  - Requires explicit casts (for example, as u8 or as i64) when the difference is used to initialize non-i32 types, even though the value is a compile-time constant that is guaranteed to fit in the target type.