Semantic Identity and Order-Dependent Resolution for Rewrite Constraints

Pull request

Table of contents

Abstract

The rules for rewrite constraint resolution say that when there are multiple assignments to the same associated constant in a facet type, they must all be identical. But it does not clearly identify whether that means syntactically identical or semantically. And if semantically, whether that is before or after substituting from the RHS of other rewrite constraints.

Problem

Previously we had been working with the intention of requiring syntactical equivalence for all rewrite constraints within a facet type that assign to the same associated constant. However this is highly problematic for implementation in the toolchain, since we lose syntactic information when we create the semir of the rewrite constraints. In particular, once a facet type is fully constructed, all syntactic information is lost, as individual instructions are evaluated and their canonical constant values are what is stored in the facet type in order to make facet types themselves canonical.

This problem becomes most visible when applying the & operation to combine two facet types with rewrite constraints of the same associated constant, such as: (I where .X = ()) & (I where .X = .Y and .Y = ()). Here the facet type on the right will be resolved to I where .X = () and .Y = () before applying the & operation, so we have lost syntactic knowledge that .X was being assigned .Y. That makes things inconsistent if we consider it an error to give .X two differently-written values in a single facet type, such as I where .X = () and .X = .Y and .Y = ().

Therefore we would like to consider a semantic identity requirement instead, however there are soundness issues, implementation-specific inconsistencies, or diagnostic inconsistencies, unless we carefully specify how rewrite constraints are applied through substitution when checking for conflicting assignments and cycles between rewrite constraints.

For example, given I where .Y = () and .Y = .X and .X = .Y:

  • If .X = .Y is resolved to .X = () first, then the two assignments to .Y are not in conflict and can be considered identical, and no cycle is found.
  • If .Y = .X is resolved to .Y = .Y first, however, we find a cycle and diagnose this as an error.
  • If .X = .Y is resolved to .X = .X by using the second constraint for .Y instead of the first one, we also find a cycle and diagnose an error.

That leaves us with the question of whether this facet type contains a cycle .Y = .Y after being resolved, or contains two identical assignments of .Y = () and one of .X = .Y.

One question is whether we should have any reliance on ordering of rewrite constraints in resolution. In other words, whether it’s okay that reordering the rewrite constraints changes whether a cycle is found or not.

If ordering should not affect the determination of cycles, then we have a choice for what to look for:

  • Should we always use an ordering that finds a cycle, if such an ordering exists?
  • Should we always use an ordering that avoids a cycle, if such an ordering exists?

If we are okay with the ordering of rewrite constraints affecting the detection of cycles, then we need to precisely specify how the ordering works in order to make implementations consistent:

  • In what order are the RHS of rewrite constraints resolved?
  • When substituting for a reference to another rewrite constraint, in what order do we choose a constraint to substitute from?

Background

Proposal

We propose to allow ordering of rewrite constraints to affect cycle detection, with the following orderings specified when resolving:

  • Rewrite constraints should be resolved left-to-right.
  • When a reference to another rewrite constraint is found, it should be substituted with the first matching rewrite constraint found from left-to-right.
  • When combining two facet types LHS & RHS, the LHS facet type’s rewrite constraints are ordered before the RHS facet type’s constraints when resolving the combined facet type. Effectively, the rewrite constraints from the RHS are concatenated onto the constraints from the LHS, then the facet type is resolved.

Rationale

We believe that order dependence is in line with the design direction of the language overall. In particular, this aligns with the information accumulation principle. The principle says:

If a program attempts to use information that has not yet been provided, the program is invalid.

This suggests that looking ahead for a rewrite constraint to avoid a cycle is an invalid operation.

However, it also states that:

Carbon programs are invalid if they would have a different meaning if more information were available.

This might seem to suggest that the meaning can not change when the order in which code is written changes. But here we take meaning to refer to the resolved state of all associated constants when there are no errors present. Using ordering to avoid cycles and prevent error diagnostics is a common practice in Carbon, such as with the use of forward declarations. This proposal makes the ordering effect whether a cycle is found and diagnosed, but in the absence of a cycle, the ordering does not affect the resulting values of the associated constants.

Using a prescribed ordering to resolve and find cycles simplifies the implementation, avoiding the need to consider all n! orderings of n constraints. It also simplifies explaining the language rules, as you can step through the resolving algorithm without excessive branching choices, much like you would step through lines of code in a debugger.

Using a left-to-right order is consistent with how Carbon is parsed and executed generally. If there were multiple statements written on the same line, they would also be executed left-to-right.

Canonicalization

A key part of toolchain implementation is the ability to canonicalize constant values, including facet types. This means that once resolved, the order of the rewrite constraints must be able to be effectively lost, as different orderings would all result in the same canonical facet type.

After being resolved, a facet type rewrite constraints have two key properties:

  • Each associated constant appears at most once on the LHS of a rewrite constraint. There are no duplicate assignments.
  • Each associated constant that appears on the RHS of a rewrite constraint does not appear on the LHS of any rewrite constraint. There are no relationships between rewrite constraints.

When combining two facet types with LHS & RHS we said that the rewrite constraints of the LHS come before the rewrite constraints of the RHS. Since the LHS and RHS are each fully resolved before being combined, canonicalization requires that the rewrite constraints within the LHS and the RHS can be ordered in arbitrary ways without changing whether a cycle is found.

Assume that we have two orderings such that one produces a cycle, and one does not. That means we have a branching point from an associated constant, so there are two rewrite constraints for that same constant, so one must appear on the LHS and one on the RHS of the & operator. Now their order is fixed, as we have specified that all rewrite constraints from the LHS come before those from the RHS. So we will always choose the same constraint when replacing that associated constant, and consistently find a cycle or not, regardless of how constraints are ordered within the LHS and RHS facet types.

Alternatives considered

Order independence

This has two possibilities: eagerly finding cycles or eagerly avoiding cycles. In either case, this complicates the toolchain implementation without making the language more expressive. The user can always write constraints in a way that does not produce a cycle in a left-to-right ordering if the constraints are valid.

Eagerly finding cycles produces errors when unnecessary, which seems more hostile than needed. Eagerly avoiding cycles could be seen as being helpful, as we find a way to complete the concrete assignments whenever possible. In that case, given a cycle of rewrite constraints, adding a concrete value for any associated constant in the cycle, written anywhere in the facet type, breaks the cycle. Whereas with left-to-right ordering, the concrete value needs to be the first rewrite constraint for the associated constant.

With eager cycle breaking, this cycle I where .Y = .X and .X = .Y is broken by adding .X = () anywhere in the facet type: I where .Y = .X and .X = .Y and .X = (). For a human reading left to right, we still read a cycle, but have to keep reading to understand why it’s not.

With left-to-right cycle breaking, the same expressivity is kept, but the concrete value must be written before other rewrites of the same associated constant: I where .Y = .X and .X = () and .X = .Y. Now for a human reading from left to right, we can read that .Y = () and .X = () and can disregard other later assignments to .X.

Syntactic equivalence

We could choose to diagnose I where .X = () and .X = .Y and .Y = () as a conflicting assignment to .X, since the RHS of the two assignments are written differently, even though the resolved value of the RHS can be found to be () in both of them.

This is perhaps the simplest rule for humans to understand, but requires maintaining syntactic information throughout the toolchain, complicating the implementation and is at odds with canonicalization.

Given the following example:

(I where .X = .Y and .Y = ()) & (I where .X = ())

The combined facet type combines .X = .Y and .X = () which are syntactically different. But the facet type on either side of the & operator are resolved and canonicalized. Canonicalization requires that I where .X = .Y and .Y = () is the same type as I where .X = () and .Y = (), but when combined with I where .X = () one violates the syntactic rule and one does not.

Thus we would need to only apply syntactic identity within a single facet type, but use a semantic identity when combining facet types. This makes for a more inconsistent experience and implementation.