Whitespace

Table of contents

Overview

The exact lexical form of Carbon whitespace has not yet been settled. However, Carbon will follow lexical conventions for whitespace based on Unicode Annex #31. TODO: Update this once the precise rules are decided; see the Unicode source files proposal.

Unicode Annex #31 suggests selecting whitespace characters based on the characters with Unicode property Pattern_White_Space, which is currently these 11 characters:

  • Horizontal whitespace:
    • U+0009 CHARACTER TABULATION (horizontal tab)
    • U+0020 SPACE
    • U+200E LEFT-TO-RIGHT MARK
    • U+200F RIGHT-TO-LEFT MARK
  • Vertical whitespace:
    • U+000A LINE FEED (traditional newline)
    • U+000B LINE TABULATION (vertical tab)
    • U+000C FORM FEED (page break)
    • U+000D CARRIAGE RETURN
    • U+0085 NEXT LINE (Unicode newline)
    • U+2028 LINE SEPARATOR
    • U+2029 PARAGRAPH SEPARATOR

The quantity and kind of whitespace separating tokens is ignored except where otherwise specified.

References