Weaken digit separator placement rules
Abstract
Proposal #143: Numeric literals added digit separators with strict placement rules, which missed some use cases. To address this, this proposal removes the grouping-based placement rules for numeric literals.
Problem
Digit separator placement rules are too strict:
- For integers, the original proposal mentioned Indian digit groupings but overlooked that Chinese, Japanese, and Korean cultures use 4-digit groupings. This oversight likely stems from Wikipedia's description: "but the delimiter commonly separates every three digits".
- There are microformats where different placement rules may be desirable. See Alternatives considered for more specific examples.
Background
Proposal #143: Numeric literals added digit separators with strict rules for placement:
- For decimal integers, the digit separators shall occur every three digits starting from the right. For example, 2_147_483_648.
- For hexadecimal integers, the digit separators shall occur every four digits starting from the right. For example, 0x7FFF_FFFF.
- For real number literals, digit separators can appear in the decimal and hexadecimal integer portions (prior to the period and after the optional e or mandatory p) as described in the previous bullets. For example, 2_147.483648e12_345 or 0x1_00CA.FEF00Dp+24.
- For binary literals, digit separators can appear between any two digits. For example, 0b1_000_101_11.
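The Proposal #143 rules above can be expressed as one pattern per literal kind. The following is a minimal sketch, not Carbon's implementation: the helper name old_rule_allows is hypothetical, it checks only a digit sequence with any 0x/0b prefix already stripped (for a real literal, each integer portion would be checked separately), and it assumes uppercase hex digits as in the examples.

```python
import re

# Hypothetical helper sketching the Proposal #143 rules for the digits of an
# integer literal or integer portion, with any 0x/0b prefix already stripped.
# Uppercase hex digits are assumed, matching the examples above.
_OLD_RULES = {
    # Decimal: no separators, or one exactly every 3 digits from the right.
    "decimal": re.compile(r"[0-9]+|[0-9]{1,3}(?:_[0-9]{3})+"),
    # Hexadecimal: no separators, or one exactly every 4 digits from the right.
    "hex": re.compile(r"[0-9A-F]+|[0-9A-F]{1,4}(?:_[0-9A-F]{4})+"),
    # Binary: a single separator between any two digits.
    "binary": re.compile(r"[01]+(?:_[01]+)*"),
}


def old_rule_allows(kind: str, digits: str) -> bool:
    """Return True if `digits` follows the #143 placement rule for `kind`."""
    return _OLD_RULES[kind].fullmatch(digits) is not None


print(old_rule_allows("decimal", "2_147_483_648"))  # True
print(old_rule_allows("decimal", "1_0000_0000"))    # False: 4-digit groups
print(old_rule_allows("hex", "7FFF_FFFF"))          # True
```

The second call shows the problem this proposal addresses: a 4-digit grouping, as used in Chinese, Japanese, and Korean conventions, is rejected outright.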
These rules were questioned in Issue #1485: Reconsider digit separators, where a proposal was requested.
Proposal
Switch to a simple rule: a single digit separator can appear between any two digits. For example:
- Decimal literals: 1_2_3
- Hexadecimal literals: 0xA_B_C
- Real literals: 1_2.3_4e5_6
- Binary literals are unaffected, since they already followed this rule.
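Because the proposed rule is the same for every literal kind, a single pattern per digit class is enough. This is a minimal sketch under stated assumptions: the helper names new_rule_allows and decimal_real_allows are hypothetical, uppercase hex digits are assumed, and the decimal real grammar (integer part, period, fraction, optional exponent) is simplified for illustration.

```python
import re

# Hypothetical helper sketching the proposed rule: a single `_` may appear
# between any two digits, so leading, trailing, or doubled separators are
# rejected. Uppercase hex digits are assumed.
_DIGIT = {"decimal": "[0-9]", "hex": "[0-9A-F]", "binary": "[01]"}


def new_rule_allows(kind: str, digits: str) -> bool:
    """Return True if `digits` (prefix stripped) satisfies the proposed rule."""
    d = _DIGIT[kind]
    # A digit, then any number of digits each optionally preceded by one `_`.
    return re.fullmatch(f"{d}(?:_?{d})*", digits) is not None


# Simplified decimal real literal (an assumed grammar for illustration):
# integer part, '.', fraction, optional exponent; every run uses the same rule.
_RUN = "[0-9](?:_?[0-9])*"
_DECIMAL_REAL = re.compile(rf"{_RUN}\.{_RUN}(?:e[+-]?{_RUN})?")


def decimal_real_allows(text: str) -> bool:
    return _DECIMAL_REAL.fullmatch(text) is not None


print(new_rule_allows("decimal", "1_2_3"))   # True
print(new_rule_allows("hex", "A_B_C"))       # True
print(new_rule_allows("decimal", "1__2"))    # False: doubled separator
print(decimal_real_allows("1_2.3_4e5_6"))    # True
```

Note that the binary pattern produced here is equivalent to the #143 binary rule, which is why binary literals are unaffected.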
Rationale
- Community and culture
  - Removing restrictions on decimal literals provides flexibility for developers who don't use 3-digit groupings.
- Code that is easy to read, understand, and write
  - This permits incorrect groupings, which is undesirable. However, it's useful for microformats to be able to provide their own particular groupings.
- Interoperability with and migration from existing C++ code
  - Reduces migration hurdles for C++ code by providing a rule that's more consistent with the C++ rule, which allows a ' digit separator between any two digits.
Alternatives considered
3-digit decimal groupings
For decimal literals, we could enforce 3-digit groupings if digit separators are used at all.
Advantages:
- More enforcement of digit groupings can prevent bugs and misreadings resulting from accidentally incorrect groupings.
Disadvantages:
- Indian, Chinese, Japanese, and Korean digit grouping conventions don't fit under this rule.
  - These are worth considering, similar to how Carbon allows Unicode identifiers: although keywords are in English, we can be more flexible for both identifiers and literals.
- Microformats could also pose issues. For example, microseconds (1_00000), dates (01_12_1983), credit cards (1234_5678_9012_3456), or identity numbers such as US social security numbers (123_45_6789).
  - Since the original proposal, more cases have been raised.
Note that any regular grouping rule can present similar issues for Indian digit grouping conventions.
Proposal #143: Numeric literals chose 3-digit decimal groupings.
Given the overall advantages of not enforcing regular digit conventions, including for hex digits, there seems little need to enforce the established decimal convention. Removing enforcement allows accidental misgroupings, but that is considered a reasonable risk.
In theory, a code linter could be configured to prefer certain formats, with options to change its behavior, although the benefit may remain too low to justify implementing it.
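To make the linter idea concrete, here is a minimal sketch of how a style option might work for decimal integers. Everything in it is hypothetical: no such Carbon linter exists, and the style names "western", "myriad", and "indian" and the functions group_digits and lint are assumptions for illustration. Literals without separators are left alone, matching the idea that grouping is optional.

```python
from typing import Optional

# Hypothetical linter sketch: the compiler would accept any placement, while a
# linter option (the style names here are assumptions) picks a preferred grouping.
# Each style gives the size of the rightmost group and of every group after it.
STYLES = {
    "western": (3, 3),  # 2_147_483_648
    "myriad": (4, 4),   # Chinese, Japanese, Korean: 21_4748_3648
    "indian": (3, 2),   # 2_14_74_83_648
}


def group_digits(digits: str, style: str) -> str:
    """Regroup a decimal digit string that contains no separators."""
    first, later = STYLES[style]
    groups = []
    rest, size = digits, first
    while len(rest) > size:
        groups.append(rest[-size:])
        rest, size = rest[:-size], later
    groups.append(rest)
    return "_".join(reversed(groups))


def lint(literal: str, style: str) -> Optional[str]:
    """Return a suggested spelling, or None if the literal already conforms."""
    if "_" not in literal:
        return None  # Literals without separators are left alone.
    suggestion = group_digits(literal.replace("_", ""), style)
    return None if suggestion == literal else suggestion


print(lint("21_4748_3648", "western"))  # 2_147_483_648
print(lint("21_4748_3648", "myriad"))   # None
```

The "indian" style shows why no single regular rule fits: the rightmost group has three digits and every later group has two.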
2-digit or 4-digit hexadecimal digit groupings
Hexadecimal digit groupings could be enforced along two axes:
- Every 2 (F_FF) or 4 (F_FFFF) characters, corresponding to one or two bytes respectively.
- Regular or irregular placement. For example, if a digit separator is placed every byte (every other digit), we could require regular placement every byte as in FF_FF_FF_FF, or allow irregular placement as in FF_FFFF_FF, which skips one placement.
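The two axes above can be combined into one check. This is a minimal sketch of the rejected alternative, not the adopted rule: the function hex_grouping_ok and its group and regular parameters are hypothetical, and it treats a literal with no separators as always acceptable.

```python
# Hypothetical check for the hexadecimal alternative discussed above; it is not
# the adopted rule. Separators must fall on `group`-digit boundaries counted
# from the right (2 = one byte, 4 = two bytes). With regular=True every
# boundary needs a separator; with regular=False boundaries may be skipped.
def hex_grouping_ok(digits: str, group: int, regular: bool) -> bool:
    if "_" not in digits:
        return True  # No separators, so there is nothing to enforce.
    parts = digits.split("_")
    if "" in parts:
        return False  # Leading, trailing, or doubled separator.
    lead, rest = parts[0], parts[1:]
    if regular:
        return len(lead) <= group and all(len(p) == group for p in rest)
    # Irregular: each later group spans a whole number of boundaries.
    return all(len(p) % group == 0 for p in rest)


print(hex_grouping_ok("FF_FF_FF_FF", 2, regular=True))   # True
print(hex_grouping_ok("FF_FFFF_FF", 2, regular=True))    # False
print(hex_grouping_ok("FF_FFFF_FF", 2, regular=False))   # True
```

Under this sketch, a UUID such as 123E4567_E89B_12D3_A456_426614174000 passes the irregular 4-digit check but not the regular one, while a MAC address such as 00_B0_D0_63_C2_26 passes the regular 2-digit check but fails any 4-digit check. No single setting accommodates both microformats.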
Proposal #143: Numeric literals chose 4-digit with regular placement.
Advantages:
- More enforcement of digit groupings can prevent bugs and misreadings resulting from accidentally incorrect groupings.
Disadvantages:
- Again, microformats become an issue.
  - For example, MAC addresses are typically in byte pairs (00_B0_D0_63_C2_26) and UUIDs have particular groupings (123E4567_E89B_12D3_A456_426614174000).
- If all of the other literal formats (decimal, real, and binary) allow arbitrary digit separator placement, enforcing placement in hexadecimal numbers would be an outlier.
Enforcing 2-character (one-byte) groupings with irregular placement might prevent enough errors to outweigh the inconvenience to developers. However, it would make hexadecimal literals an outlier, and only in exchange for very loose enforcement.
As with decimal groupings, leaving these issues to code linters may be best.
Disallow digit separators in fractions
Proposal #143: Numeric literals appears to disallow digit separators in fractions. That is, 1.2345 cannot be written as 1.23_45. This proposal changes that.
Advantages:
- Reduces the chance of confusion resulting from mistaking a . for another _ separator, as in 1_234_567.890_123_456.
Disadvantages:
- Although commas are sometimes not written as digit separators after the decimal point, SI guidance is to use digit spacing instead, on both sides of the decimal marker (Digit spacing, rule #16). As a consequence, developers are likely to commonly want some kind of digit separator after the decimal point.
- Would create an inconsistency in placement rules.
We’ll allow digit separators in fractions.