C++ interop type mapping for integer and floating-point literals

Abstract
Problem
Background
- C++ literals
  - Integer literals
  - Floating-point literals
- Carbon literals
Proposal
Details
- Carbon to C++ type mapping
  - Integer literals
  - Floating-point literals
- C++ to Carbon type mapping
  - Integer literals
  - Floating-point literals
Future work
- More robust hexadecimal and binary literals in Carbon
Rationale
Alternatives considered
- Carbon to C++ type mapping
  - Use the C++ standard rules for hexadecimal and binary literals
  - Use the Clang way of determining the type
- C++ to Carbon type mapping
  - Use Carbon literal types

Abstract

Provides bidirectional mappings for the types of integer and floating-point literals between Carbon and C++. For example, given the literal 123, it defines the type used for interop.

Problem

This document addresses the Carbon <-> C++ type mapping of integer and floating-point literals. The mapping matters, for example, during overload resolution of C++ functions, when a C++ function is called from Carbon with a literal as a call argument. It also matters when importing C++ macros into Carbon, for example macros that define a constant whose value is a literal.

Background

C++ literals

Integer literals

Type of C++ integer literals

As specified in the C++ standard, the type of the literal is the first type from the list below in which the value can fit based on its suffix and the base of the literal.

Suffix	Decimal bases	Binary, octal or hexadecimal bases
(none)	`int`, `long`, `long long`	`int`, `unsigned`, `long`, `unsigned long`, `long long`, `unsigned long long`
`u` or `U`	`unsigned`, `unsigned long`, `unsigned long long`	`unsigned`, `unsigned long`, `unsigned long long`
`l` or `L`	`long`, `long long`	`long`, `unsigned long`, `long long`, `unsigned long long`
`u` or `U` and `l` or `L`	`unsigned long`, `unsigned long long`	`unsigned long`, `unsigned long long`
`ll` or `LL`	`long long`	`long long`, `unsigned long long`
`u` or `U` and `ll` or `LL`	`unsigned long long`	`unsigned long long`
`z` or `Z` (since C++23)	the signed version of `std::size_t`	the signed version of `std::size_t`, `std::size_t`
`u` or `U` and `z` or `Z` (since C++23)	`std::size_t`	`std::size_t`

For details on std::size_t, see cppreference.

If the value is too big to fit in any of these types and the implementation provides extended integer types, the literal may be given an extended integer type. If all types in the list are signed, the extended type must be signed. If all types in the list are unsigned, the extended type must be unsigned. If the list contains both signed and unsigned types, both signed and unsigned extended types are possible. If the value can’t fit in any of the candidate types, the program is ill-formed.
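The table can be checked directly with decltype. The following is a minimal sketch rather than an exhaustive test: the alias and constant names (DecimalType, HexType, kInt32, kLongWide) are introduced here for illustration, and the value-dependent assertions only apply on targets with a 32-bit int.

```cpp
#include <type_traits>

// Suffix-only cases: small values take the first type in each list.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned>);
static_assert(std::is_same_v<decltype(1L), long>);
static_assert(std::is_same_v<decltype(1uL), unsigned long>);
static_assert(std::is_same_v<decltype(1LL), long long>);
static_assert(std::is_same_v<decltype(1uLL), unsigned long long>);
static_assert(std::is_same_v<decltype(0x1), int>);

// Value-dependent cases. Both literals spell 2^32 - 1.
constexpr bool kInt32 = sizeof(int) == 4;     // assumption checked below
constexpr bool kLongWide = sizeof(long) > 4;  // true on LP64 targets
using DecimalType = decltype(4294967295);
using HexType = decltype(0xFFFFFFFF);

// Hexadecimal: unsigned is tried right after int.
static_assert(!kInt32 || std::is_same_v<HexType, unsigned>);
// Decimal: only signed types are tried, so the next candidate is long,
// or long long where long is as narrow as int.
static_assert(!kInt32 ||
              std::is_same_v<DecimalType,
                             std::conditional_t<kLongWide, long, long long>>);
```

The same value therefore gets a signed type when written in decimal and an unsigned type when written in hexadecimal, which is the difference the rest of this document has to deal with.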

Clang

Clang diverges from the standard: if a decimal integer literal doesn’t fit in any type from the list, Clang assigns unsigned long long instead of an extended integer type.

Example:

#include <cstdio>

inline auto foo(long a) -> void { printf("hello from foo_long(%ld)", a); }
inline auto foo(unsigned long a) -> void { printf("hello from foo_unsigned_long(%lu)", a); }

inline auto foo(long long a) -> void { printf("hello from foo_long_long(%lld)", a); }
inline auto foo(unsigned long long a) -> void { printf("hello from foo_unsigned_long_long(%llu)", a); }

inline auto foo(__int128 a) -> void { printf("hello from foo___int128"); }

int main() {
    foo(9223372036854775808);  // LLONG_MAX + 1
    return 0;
}

Output:

<source>:12:9: warning: integer literal is too large to be represented in a signed integer type, interpreting as unsigned [-Wimplicitly-unsigned-literal]
  12 |     foo(9223372036854775808);
     |        ^
1 warning generated.
Execution build compiler returned: 0
Program returned: 0
hello from foo_unsigned_long_long(9223372036854775808)

GCC

GCC warns that the value will be treated as an unsigned integer, but it actually assigns the type __int128 to it.

Output of the example above:

<source>:12:9: warning: integer constant is so large that it is unsigned
  12 |     foo(9223372036854775808);
     |         ^~~~~~~~~~~~~~~~~~~
Execution build compiler returned: 0
Program returned: 0
hello from foo___int128

Floating-point literals

The type of a floating literal is double unless explicitly specified by a suffix. When there is a suffix, then the suffix determines the type.

Suffix	Floating-point literal type
(none)	`double`
`f` or `F`	`float`
`l` or `L`	`long double`
`f16` or `F16` (since C++23)	`std::float16_t`
`f32` or `F32` (since C++23)	`std::float32_t`
`f64` or `F64` (since C++23)	`std::float64_t`
`f128` or `F128` (since C++23)	`std::float128_t`
`bf16` or `BF16` (since C++23)	`std::bfloat16_t`

If the value doesn’t fit the type, the program is ill-formed.
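As a quick check of the first three rows, the following sketch inspects literal types with decltype. The alias names are introduced here for illustration only. The C++23 suffixes are left out because support for the extended floating-point types is optional.

```cpp
#include <type_traits>

// Aliases for the type of a literal with each classic suffix.
using NoSuffix = decltype(1.5);
using FSuffix = decltype(1.5f);
using LSuffix = decltype(1.5L);
// Hexadecimal floating-point literals (C++17) follow the same suffix rules.
using HexNoSuffix = decltype(0x1.8p1);

static_assert(std::is_same_v<NoSuffix, double>);
static_assert(std::is_same_v<FSuffix, float>);
static_assert(std::is_same_v<LSuffix, long double>);
static_assert(std::is_same_v<HexNoSuffix, double>);
```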

Carbon literals

Integer literals

Carbon has decimal, hexadecimal and binary integer literals. Example:

123 (decimal)
0x1FE (hexadecimal)
0b10 (binary)

There are no suffixes for the integer literal types.

Floating-point literals

Carbon supports decimal and hexadecimal floating-point literals. Example:

Decimal:
- 123.456
- 1.23456e791
Hexadecimal:
- 0x1.Ap123

Literal types

Carbon has literal types, currently Core.IntLiteral and Core.FloatLiteral. These are convertible to any type where they fit without any truncation or loss of precision.

At present, type deduction would retain the type of the literal. For example, let x: auto = 1; would result in x having type Core.IntLiteral. This is not the desired behavior, and so can be expected to change in the future.

No suffixes

Carbon literals have no suffix. The reasons for this are covered in Proposal #144’s alternative “Use an ordinary integer or floating-point type for literals”.

Proposal

Carbon literal to C++ type: A Carbon literal will be given a C++ type according to the C++ rules when used in a C++ context, such as a call to a C++ function.
C++ literal to Carbon type: A C++ literal will be given a C++ type following the C++ rules, which will then be mapped to a Carbon type as defined in primitive types mapping.
- For example, 1 becomes int according to C++ rules, and int maps to i32 in Carbon.

Details

Carbon to C++ type mapping

Integer literals

Carbon has no suffixes for integer literals, so a Carbon decimal integer literal will follow the C++ rules for decimal integers without a suffix. Carbon's integer literal type also doesn’t distinguish hexadecimal and binary literals from decimal ones, so these literals will follow the same C++ rules for decimal integers without a suffix.

The first type from this list in which the value can fit will be selected:

int
long
long long
__int128

If the value doesn’t fit in any of these types, the program will be ill-formed and diagnosed with an error. This is intended to match the C++ standard behavior, instead of Clang’s -Wimplicitly-unsigned-literal behavior.
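The selection rule above can be sketched in C++ as follows. This is an illustration only, not the toolchain's implementation: CppIntType and MapCarbonIntLiteral are hypothetical names, the literal is modeled by its magnitude, and the sketch assumes a compiler that provides __int128, as GCC and Clang do on 64-bit targets.

```cpp
#include <limits>

// Hypothetical result type for this sketch.
enum class CppIntType { Int, Long, LongLong, Int128, IllFormed };

// Hypothetical helper: picks the first type in the list
// int, long, long long, __int128 that can hold `value`.
constexpr CppIntType MapCarbonIntLiteral(unsigned __int128 value) {
  using U128 = unsigned __int128;
  // Largest __int128 value, computed by hand because std::numeric_limits
  // may not be specialized for __int128 in strict language modes.
  constexpr U128 kInt128Max = ~static_cast<U128>(0) >> 1;

  if (value <= static_cast<U128>(std::numeric_limits<int>::max())) {
    return CppIntType::Int;
  }
  if (value <= static_cast<U128>(std::numeric_limits<long>::max())) {
    return CppIntType::Long;
  }
  if (value <= static_cast<U128>(std::numeric_limits<long long>::max())) {
    return CppIntType::LongLong;
  }
  if (value <= kInt128Max) {
    return CppIntType::Int128;
  }
  // No signed candidate can hold the value: diagnosed as an error.
  return CppIntType::IllFormed;
}
```

The base of the literal is deliberately not a parameter: under this proposal 0xFF and 255 are the same input. A value just above the long long range lands on Int128 rather than on an unsigned type, which is where this rule differs from Clang's handling of C++ literals.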

Decimal numbers are most commonly used as integer literals, so this should match most of the existing C++ calls. To match the C++ calls with a non-decimal literal argument, an explicit unsigned type will need to be provided. For example, 0xDEADBEEF as u32.

Floating-point literals

As there are no suffixes in Carbon for floating-point literals, a Carbon floating-point literal will map to C++ double. If the value doesn’t fit in double, the program is ill-formed.

C++ to Carbon type mapping

Integer literals

A C++ integer literal will be given a C++ integer type following the C++ rules, which will then be mapped to a Carbon type as defined in primitive types mapping.

C++ literal suffix	Carbon type with decimal C++ integer literals	Carbon type with hexadecimal, binary and octal C++ integer literals
(none)	`Cpp.int`, `Cpp.long`, `Cpp.long_long`	`Cpp.int`, `Cpp.unsigned_int`, `Cpp.long`, `Cpp.unsigned_long`, `Cpp.long_long`, `Cpp.unsigned_long_long`
`u` or `U`	`Cpp.unsigned_int`, `Cpp.unsigned_long`, `Cpp.unsigned_long_long`	`Cpp.unsigned_int`, `Cpp.unsigned_long`, `Cpp.unsigned_long_long`
`l` or `L`	`Cpp.long`, `Cpp.long_long`	`Cpp.long`, `Cpp.unsigned_long`, `Cpp.long_long`, `Cpp.unsigned_long_long`
`u` or `U` and `l` or `L`	`Cpp.unsigned_long`, `Cpp.unsigned_long_long`	`Cpp.unsigned_long`, `Cpp.unsigned_long_long`
`ll` or `LL`	`Cpp.long_long`	`Cpp.long_long`, `Cpp.unsigned_long_long`
`u` or `U` and `ll` or `LL`	`Cpp.unsigned_long_long`	`Cpp.unsigned_long_long`
`z` or `Z` (since C++23)	`Cpp.uintptr_t`	`Cpp.uintptr_t`, `Cpp.size_t`
`u` or `U` and `z` or `Z` (since C++23)	`Cpp.size_t`	`Cpp.size_t`
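The two-step mapping (C++ rules first, then a type-to-type translation) can be sketched in C++ itself. CarbonTypeName is a hypothetical helper written for this illustration; it returns the Carbon spellings used in the table above and covers only the classic suffixes, because std::size_t is an alias for one of the types already listed.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper: the Carbon spelling from the table for a C++ type.
template <typename T>
constexpr std::string_view CarbonTypeName() {
  if constexpr (std::is_same_v<T, int>) {
    return "Cpp.int";
  } else if constexpr (std::is_same_v<T, unsigned>) {
    return "Cpp.unsigned_int";
  } else if constexpr (std::is_same_v<T, long>) {
    return "Cpp.long";
  } else if constexpr (std::is_same_v<T, unsigned long>) {
    return "Cpp.unsigned_long";
  } else if constexpr (std::is_same_v<T, long long>) {
    return "Cpp.long_long";
  } else if constexpr (std::is_same_v<T, unsigned long long>) {
    return "Cpp.unsigned_long_long";
  } else {
    return "(not covered by this sketch)";
  }
}

// Step 1: decltype applies the C++ literal rules.
// Step 2: CarbonTypeName translates the resulting C++ type.
static_assert(CarbonTypeName<decltype(42)>() == "Cpp.int");
static_assert(CarbonTypeName<decltype(42u)>() == "Cpp.unsigned_int");
static_assert(CarbonTypeName<decltype(42L)>() == "Cpp.long");
static_assert(CarbonTypeName<decltype(0x2AuLL)>() == "Cpp.unsigned_long_long");
```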

Floating-point literals

A C++ floating literal will be given a C++ type following the C++ rules, which will then be mapped to a Carbon type as defined in primitive types mapping. That means:

C++ floating-point literal suffix	Carbon floating-point literal type
(none)	`f64`
`f` or `F`	`f32`
`l` or `L`	`Cpp.long_double`
`f16` or `F16` (since C++23)	`f16`
`f32` or `F32` (since C++23)	TBD
`f64` or `F64` (since C++23)	TBD
`f128` or `F128` (since C++23)	`f128`
`bf16` or `BF16` (since C++23)	TBD

Future work

More robust hexadecimal and binary literals in Carbon

We may want to use C++ standard rules for Carbon’s hexadecimal and binary literals, per the alternative below. However, that requires addressing how that should be handled in the integer literal space; for example, either adding more information to Core.IntLiteral or a new type, as well as addressing what kind of type is produced when performing arithmetic on mixed literal types. That may be desirable, but addressing it here would substantially increase the scope of this proposal.

Rationale

This work contributes to Carbon’s goal of seamless interoperability with C++ (Interoperability with and migration from existing C++ code) by staying consistent with existing C++ usage.

Alternatives considered

Carbon to C++ type mapping

Use the C++ standard rules for hexadecimal and binary literals

For example, 0xFFFF_FFFF could be interpreted as unsigned instead of long. The difference is observable in overload resolution and type deduction.

For example:

import Cpp inline '''
void f(unsigned);

void g(unsigned);
void g(long);

template<typename T> void h(T);
''';

fn CallF() {
  // OK in both Carbon and C++.
  // We select the f(unsigned) overload, and can call it.
  Cpp.f(0xFFFF_FFFF);
}

fn CallG() {
  // Would call g(unsigned) in C++.
  // Calls g(long) in Carbon.
  Cpp.g(0xFFFF_FFFF);
}

fn CallH() {
  // Would call h<unsigned> in C++.
  // Calls h<long> in Carbon.
  Cpp.h(0xFFFF_FFFF);
}
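The C++ side of the comparison above can be reproduced with a small compile-time check. This is a sketch under stated assumptions: it requires an LP64 target (32-bit int, 64-bit long), and Picked and h_deduces_unsigned are stand-in names introduced here so the selected overload and deduced type can be observed.

```cpp
#include <type_traits>

// Assumption: 32-bit int and 64-bit long, as on typical LP64 targets.
static_assert(sizeof(int) == 4 && sizeof(long) == 8, "sketch assumes LP64");

// Stand-ins for the g overloads; the return value records the choice.
enum class Picked { Unsigned, Long };
constexpr Picked g(unsigned) { return Picked::Unsigned; }
constexpr Picked g(long) { return Picked::Long; }

// Stand-in for h: reports whether T was deduced as an unsigned type.
template <typename T>
constexpr bool h_deduces_unsigned(T) {
  return std::is_unsigned_v<T>;
}

// C++ rules for a hexadecimal literal: the type is unsigned.
static_assert(g(0xFFFFFFFF) == Picked::Unsigned);
static_assert(h_deduces_unsigned(0xFFFFFFFF));

// Decimal rules, which this proposal applies to every Carbon literal: long.
static_assert(g(4294967295) == Picked::Long);
static_assert(!h_deduces_unsigned(4294967295));

// An explicit conversion restores the unsigned overload, much like
// writing `0xFFFF_FFFF as u32` on the Carbon side.
static_assert(g(static_cast<unsigned>(4294967295)) == Picked::Unsigned);
```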

Advantages:

This would allow using the unsigned types for the cases where it’s important (for example bit manipulation). This affects understandability for interactions on interop boundaries, where developers might reasonably expect that 0xFFFF_FFFF would behave identically in Carbon and C++.

Disadvantages:

The current integer literal design doesn’t support this distinction.
Hexadecimal and binary literals seem to be used less often than decimal literals.

This decision is probably good enough for now, although we should consider it for future work.

Use the Clang way of determining the type

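As described in the background, Clang falls back to an unsigned type when a decimal literal doesn’t fit in any signed type from the list. Following that approach, we could use the biggest unsigned integer type currently supported for C++ (unsigned __int128) if the value doesn’t fit in int, long, or long long.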

Advantages:

This will match the C++ calls for the users that use Clang.

Disadvantages:

It gives up the consistently signed interpretation that __int128 provides regardless of bit width. A signed interpretation is also consistent with how we plan to require an explicit conversion to get an unsigned type.

It’s not clear how much this issue would arise, so we believe a consistent signed type interpretation is a good default.

C++ to Carbon type mapping

Use Carbon literal types

In other words, import a C++ literal as a Carbon literal type: Core.IntLiteral or Core.FloatLiteral.

Advantages:

This would allow determining the type later according to the Carbon rules.

Disadvantages:

This may not always be feasible. For example, sometimes a variable or constant needs to be initialized with the value of the literal.

Rather than trying to detect feasibility, we are choosing to assign a sized numeric type.
