Coalescing generic functions emitted when lowering to LLVM IR

Overview

When lowering Carbon generics to LLVM, we may emit duplicate LLVM IR functions. This document describes the algorithm implemented in lowering that determines when generated specifics, while distinct at the Carbon language level, can be coalesced into a single function when lowering Carbon’s intermediate representation (SemIR) to LLVM IR.

The overall goal of this optimization is to avoid generating duplicate LLVM IR code where the duplication is easy to determine in the front-end. Such an optimization must happen after specialization, but there is some flexibility in when to do it: before lowering, through analysis of SemIR, or during/after lowering.

This doc describes the algorithm implemented in specific_coalescer: its context, its overall goal, the challenges involved, and where there is still room for improvement in subsequent iterations.

Determining the impact on compile time is beyond the scope of this document, but it is an important problem to follow up on.

Design details

To determine whether two specific functions are equivalent, such that a single one can be used in place of both, the algorithm and its implementation need to account for the following.

SemIR representation and why to coalesce during lowering

In SemIR, a specific function is defined by a unique tuple: (function_id, specific_id). There is a single in-memory representation of a generic function’s body (not one per specific), where the instructions that differ between specifics can be determined, on demand, based on a given specific_id. Hence, determining if two specifics are equivalent requires analyzing whether these specific-dependent instructions are equivalent at the LLVM IR level. This can only be determined after the eval phase is complete, using information on how Carbon types map to llvm::Types.
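
For intuition, consider a hypothetical analogy in C++ terms (the names below are illustrative and not taken from the toolchain): two instantiations whose only differences are types that map to the same llvm::Type lower to identical LLVM IR, and are exactly the kind of duplicates this optimization targets.

// Hypothetical C++ analogy: Front<int> and Front<float> differ only in a
// pointer type, and all pointers lower to the same LLVM `ptr` type, so the
// two lowered bodies are identical and a single definition could serve both.
template <typename T>
auto Front(T* p) -> T* {
  return &p[0];
}

template auto Front<int>(int*) -> int*;
template auto Front<float>(float*) -> float*;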

The algorithm described below does coalescing of specifics during lowering. Also see alternatives considered.

Recursion and strongly connected components (SCCs)

Comparing whether two different specific functions contain (access, invoke, etc.) the same specific-dependent instructions is not straightforward when recursion is involved. The simplest example is when A and B are each self-recursive functions that are otherwise equivalent. The check “are A and B equivalent” needs to start by assuming they are equivalent, so that when a self-recursive call is found in each, that call is also considered equivalent. In practice this requires comparing specific_ids, which in SemIR are distinct.
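
Continuing the C++ analogy above (again with illustrative names), the recursive case looks like this: instantiating the template for two pointer types yields two self-recursive functions that are equivalent at the LLVM IR level, but whose recursive calls name distinct specifics.

// Hypothetical analogy: CountDown<int> and CountDown<float> are equivalent
// at the LLVM IR level, but proving it requires assuming that their
// self-recursive calls (which, in the Carbon equivalent, would have distinct
// specific_ids in SemIR) are themselves equivalent.
template <typename T>
auto CountDown(T* p, int n) -> int {
  if (n == 0) {
    return 0;
  }
  return CountDown(p, n - 1);  // Self-recursive call to the same specific.
}

template auto CountDown<int>(int*, int) -> int;
template auto CountDown<float>(float*, int) -> int;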

In the general case, this analysis needs to examine the call graph of all functions and build strongly connected components (SCCs). The call graph could be created either before lowering or while lowering. The current implementation does the latter, and in a post-processing phase it concludes equivalence and simplifies the emitted LLVM IR by deleting the unnecessary parts.

A non-viable option is building the call graph from the information “what are all the call sites of this specific function”, because that information is not available until the function bodies of all specific functions have been processed. This is a consequence of an existing optimization: the definition of a specific isn’t emitted until a use of it is found. Building that call-site information up front would duplicate all of the lowering logic, minus the LLVM IR creation.

Function fingerprints

Even when limiting the comparison of specific functions to those defined from the same generic, a pairwise comparison algorithm would still have quadratic complexity in the number of specifics for that generic.

We define two fingerprints for each specific:

  1. specific_fingerprint: Includes all specific-dependent information.
  2. common_fingerprint: Includes the same information, except for specific_ids, as specific_ids can only be determined to be equivalent after building an equivalence SCC.

Two specific functions are equivalent if their specific_fingerprints are equal, and not equivalent if their common_fingerprints differ. If the common_fingerprints are equal but the specific_fingerprints are not, the two functions may still be equivalent.
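
As a minimal sketch (the function and parameter names here are illustrative, not the toolchain's API), the two fingerprints support a three-way decision:

#include <cstdint>
#include <optional>

// Returns a definite answer when the fingerprints alone decide equivalence,
// and nullopt when the call lists must be compared through the SCC walk.
auto QuickEquivalenceCheck(uint64_t common_a, uint64_t specific_a,
                           uint64_t common_b, uint64_t specific_b)
    -> std::optional<bool> {
  if (common_a != common_b) {
    return false;  // Definitely not equivalent.
  }
  if (specific_a == specific_b) {
    return true;  // Definitely equivalent.
  }
  return std::nullopt;  // Possibly equivalent: needs the deeper check.
}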

Ideally, the specific_fingerprint can be used as a hash key to first coalesce all specific functions with the same fingerprint, with no additional checks. The remaining functions can then be grouped by common_fingerprint to find the remaining candidates for coalescing, and only those with the same common_fingerprint are processed in a quadratic pass that walks all call instructions and compares whether the specific_id information is equivalent. These optimizations are not currently implemented.

Note that this does not coalesce non-specifics.

Canonical specific to use

To determine the canonical specific to use among a set of equivalent specifics, we use a disjoint-set (union-find) data structure.
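
A minimal union-find sketch is shown below, assuming specifics are identified by dense integer indices; the actual data structure and canonicalization policy in the toolchain may differ.

#include <numeric>
#include <vector>

class SpecificDisjointSet {
 public:
  explicit SpecificDisjointSet(int num_specifics) : parent_(num_specifics) {
    std::iota(parent_.begin(), parent_.end(), 0);
  }

  // Returns the canonical specific for `id`, compressing paths as it goes.
  auto Find(int id) -> int {
    while (parent_[id] != id) {
      parent_[id] = parent_[parent_[id]];
      id = parent_[id];
    }
    return id;
  }

  // Records that `a` and `b` are equivalent; `b`'s root becomes canonical.
  auto Union(int a, int b) -> void { parent_[Find(a)] = Find(b); }

 private:
  std::vector<int> parent_;
};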

Algorithm details

Below is pseudocode for the existing algorithm in toolchain/lower/specific_coalescer.*.

The implementation can be found in specific_coalescer.h and specific_coalescer.cpp.

At the top level, the current algorithm first generates all function definitions, and once this is complete, it performs the logic to coalesce specifics and delete the redundant LLVM function definitions.

LowerToLLVM () {
  for all non_generic_functions
    CreateLLVMFunctionDefinition (function, no_specific_id);
  PerformCoalescingPostProcessing ();
}

The lowering starts with all non-generic functions. While lowering these, when calls to specifics are encountered, it also generates definitions for those specific functions.

For each lowered specific function definition, we create a SpecificFunctionFingerprint, which includes the two fingerprints and a list of calls to other specific functions.
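
A rough sketch of that per-specific record follows; the field names and types are illustrative rather than the actual declaration in the toolchain.

#include <cstdint>
#include <vector>

struct SpecificFunctionFingerprint {
  // Hash of the specific-dependent instructions, excluding callee specific_ids.
  uint64_t common_fingerprint = 0;
  // Hash of all specific-dependent information, including callee specific_ids.
  uint64_t specific_fingerprint = 0;
  // Un-hashed callee specific_ids, used when the fingerprints alone can't
  // decide equivalence.
  std::vector<int32_t> calls;
};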

CreateLLVMFunctionDefinition (function, specific_id) {
  For each SemIR instruction in the function:
    Step 1: Emit LLVM IR for the instruction
    Step 2: If the instruction is specific-dependent, hash it and add it to the function's `common_fingerprint`
    Step 3: If the SemIR instruction is a call to a specific,
      a) Create a definition for the called specific if it doesn't exist:
        CreateLLVMFunctionDefinition (called_function, called_specific_id);
      b) Hash the called specific_id into the current function's `specific_fingerprint`
      c) Add the non-hashed called specific_id to the list of calls performed
}

The logic that performs the actual coalescing analyzes all specifics. For each pair of specifics, it first checks whether the LLVM function types match (using a third hash-like fingerprint, function_type_fingerprint, kept for storage optimization), then whether the two are equivalent based on the SpecificFunctionFingerprint. For each pair of equivalent functions found (in a call graph SCC), one function is marked non-canonical: its uses are replaced with the canonical one and its definition is ultimately deleted.

PerformCoalescingPostProcessing () {
  for each two specifics of the same generic {
    if function_type_fingerprints differ {
      track as non-equivalent
      continue
    }

    add the two specifics to assumed equivalent specifics list
    if (CheckIfEquivalent(two specifics, assumed equivalent specifics list)) {
      for each two equivalent specifics found {
        find the canonical specific & mark the duplicates for replacement/deletion
      }
    }
  }
  replace all duplicate specifics with the respective canonical specifics
  and delete all replaced LLVM function definitions.
}

The equivalence check for specifics based on the constructed SpecificFunctionFingerprints can make an early non-equivalence determination from the common_fingerprints and an early equivalence determination from the specific_fingerprints. Otherwise, it uses the call lists and recurses to make the determination for all functions in the SCC of the call graph (in practice the implementation uses a worklist to avoid the recursion).

CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
  if common_fingerprints are non-equal {
    track as non-equivalent specifics
    return false
  }
  if specific_fingerprints are equal {
    track as equivalent specifics
    return true
  }
  if already tracked as equivalent or assumed equivalent specifics {
    return true
  }

  for each of the calls in each of the specifics {
    if the functions called are the same or already equivalent or assumed equivalent specifics {
      continue
    }
    if the functions called are already non-equivalent specifics {
      return false
    }
    add <pair of calls> to assumed equivalent specifics
    if (!CheckIfEquivalent(specifics in <pair of calls>, assumed equivalent specifics)) {
      return false
    }
  }
  return true
}
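
Below is a hedged sketch of that worklist formulation, reusing the illustrative SpecificFunctionFingerprint fields from the earlier sketch; it omits the global tracking of pairs already known to be equivalent or non-equivalent that the description above includes.

#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct SpecificFunctionFingerprint {
  uint64_t common_fingerprint = 0;
  uint64_t specific_fingerprint = 0;
  std::vector<int> calls;  // Callee specific indices, in call order.
};

auto CheckIfEquivalent(
    int a, int b, const std::vector<SpecificFunctionFingerprint>& specifics)
    -> bool {
  std::set<std::pair<int, int>> assumed_equivalent;
  std::vector<std::pair<int, int>> worklist = {{a, b}};
  while (!worklist.empty()) {
    auto [x, y] = worklist.back();
    worklist.pop_back();
    const auto& fx = specifics[x];
    const auto& fy = specifics[y];
    if (fx.common_fingerprint != fy.common_fingerprint) {
      return false;  // Definitely not equivalent.
    }
    if (fx.specific_fingerprint == fy.specific_fingerprint) {
      continue;  // This pair is definitely equivalent; nothing more to check.
    }
    if (!assumed_equivalent.insert({x, y}).second) {
      continue;  // Already assumed equivalent; this breaks recursive cycles.
    }
    if (fx.calls.size() != fy.calls.size()) {
      return false;
    }
    for (std::size_t i = 0; i < fx.calls.size(); ++i) {
      if (fx.calls[i] != fy.calls[i]) {
        worklist.push_back({fx.calls[i], fy.calls[i]});
      }
    }
  }
  return true;  // All reachable call pairs matched under the assumptions made.
}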

Alternatives considered

Coalescing in the front-end vs back-end?

An alternative considered was not doing any coalescing in the front-end and relying on LLVM to perform the analysis and optimization. The current choice was made based on the expectation that such an LLVM pass would be more costly in terms of compile time. The relative cost has not yet been evaluated.

When to do coalescing in the front-end?

The analysis and coalescing could be done prior to lowering, after specialization. The advantage of that choice would be avoiding lowering duplicate LLVM functions only to remove the duplicates afterwards. The disadvantage would be duplicating much of the lowering logic, which is currently necessary to make the equivalence determination.

Compile-time trade-offs

Not doing any coalescing is also expected to increase back-end codegen time by more than the cost of performing the analysis and deduplication in the front-end. This can be evaluated in practice, and the feature disabled if it is found to be too costly.

Coalescing duplicate non-specific functions

We could coalesce duplicate functions in non-specific cases, similar to lld’s Identical Code Folding or LLVM’s MergeFunctions pass. This would require fingerprinting all instructions in all functions, whereas specific coalescing can focus on cases that only Carbon’s front-end knows about. Carbon would also be restricted to coalescing functions in a single compilation unit, which would require replacing function definitions that allow external calls with a placeholder that calls the coalesced definition. We don’t expect sufficient advantages over existing support.

Opportunities for further improvement

The currently implemented algorithm can be improved in at least the following ways:

  • The specific_fingerprint can be used to bucket specifics that can be coalesced right away.
  • The remaining ones can be pre-bucketed so that only specifics with the same common_fingerprint have their call lists further compared (linear in the number of calls to specifics inside the functions) to determine SCCs that may be equivalent.

This should reduce the complexity from the current O(N^2), with N being the number of specifics for a generic, to O(M^2), with M being the number of specifics for a generic that have different specific_fingerprints but equal common_fingerprints (the expectation is that M ≪ N).
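
A hedged sketch of what this two-level bucketing could look like follows, assuming fingerprints are 64-bit hashes and specifics are identified by dense integer indices; none of these names come from the toolchain.

#include <cstdint>
#include <map>
#include <vector>

struct CandidateSpecific {
  int index = 0;
  uint64_t common_fingerprint = 0;
  uint64_t specific_fingerprint = 0;
};

// Returns the groups of specifics that still need the quadratic call-list
// comparison after the two bucketing passes.
auto BucketForCoalescing(const std::vector<CandidateSpecific>& specifics)
    -> std::vector<std::vector<int>> {
  // Pass 1: specifics sharing a specific_fingerprint can be coalesced
  // immediately; only one representative per bucket needs further work.
  std::map<uint64_t, std::vector<int>> by_specific;
  for (const auto& s : specifics) {
    by_specific[s.specific_fingerprint].push_back(s.index);
  }

  // Pass 2: group the remaining representatives by common_fingerprint; only
  // groups with more than one member need the pairwise comparison.
  std::map<uint64_t, std::vector<int>> by_common;
  for (const auto& s : specifics) {
    if (by_specific[s.specific_fingerprint].front() != s.index) {
      continue;  // Already coalesced with its bucket's representative.
    }
    by_common[s.common_fingerprint].push_back(s.index);
  }

  std::vector<std::vector<int>> quadratic_groups;
  for (const auto& entry : by_common) {
    if (entry.second.size() > 1) {
      quadratic_groups.push_back(entry.second);
    }
  }
  return quadratic_groups;
}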

An additional potential improvement is defining the function fingerprints in a translation-unit-independent manner, so that the fingerprint can be used in the mangled name and the same function name emitted across translation units. This does not currently occur, as the two fingerprints use internal SemIR identifiers (function_id and specific_id respectively).