C++ Interop: Toolchain Implementation for Function Calls

Pull request

Abstract

This proposal details the toolchain implementation for calling imported C++ functions from Carbon. It covers how C++ overload sets are handled, the process of overload resolution leveraging Clang, and the generation of “thunks” (intermediate functions) when necessary to bridge Application Binary Interface (ABI) differences between Carbon and C++.

Problem

Seamless, high-performance interoperability with C++ is a fundamental goal of Carbon. The Carbon language design for C++ interoperability, particularly for function calls, is described in the Carbon calling convention design. To implement that design, the Carbon toolchain must lower Carbon-side calls to C++ functions into calls that follow the ABI the C++ callee expects. Several challenges arise at the toolchain level:

  • C++ supports function overloading, requiring the toolchain to resolve calls to the correct C++ function within an overload set.
  • C++ types do not always have identical representations or ABIs to their Carbon counterparts (see Carbon <-> C++ Interop: Primitive Types). For example, parameter passing conventions (by value, by pointer) or return value handling (direct return versus return slot) might differ. This may require the toolchain to synthesize adapter code.
  • C++ member functions require special handling of the this pointer.
  • C++ supports features like default arguments which need a defined mapping.

A clear, robust implementation strategy is needed to handle these complexities, ensuring both correctness and performance.

Background

Carbon’s C++ interoperability philosophy aims to minimize bridge code and provide unsurprising mappings. When Carbon code imports a C++ header, the functions declared within become potentially callable entities. C++ overload resolution rules are complex, and replicating them perfectly within Carbon would be difficult and likely divergent over time. Furthermore, direct calls are only possible when the ABI conventions of the Carbon call site precisely match the expectations of the C++ callee.

Proposal

  1. Import: C++ functions and methods, including overload sets, are imported into Carbon and represented internally (conceptually, as specific overload set instructions in SemIR).
  2. Overload Resolution: When a call to an imported C++ function or overload set occurs in Carbon, Carbon leverages Clang’s overload resolution mechanism. Carbon argument types are mapped to hypothetical C++ types / expressions, and Clang’s Sema determines the best viable function.
  3. ABI Bridging (Thunks):
    • If the selected C++ function’s ABI (parameter types, return type handling, calling convention) matches the Carbon call site’s ABI based on defined type mappings, a direct call is generated.
    • If the ABIs mismatch, Carbon generates an intermediate function, called a C++ thunk. This thunk has a “simple” ABI callable directly from Carbon (typically using only pointers and basic integer types like i32/i64). The thunk internally calls the actual C++ function, performing necessary argument conversions (for example, loading a value from a pointer) and handling return value conventions (for example, managing a return slot).
  4. Call Execution: The Carbon code either calls the C++ function directly or calls the generated C++ thunk.

Details

Importing C++ functions

When a C++ header is imported using import Cpp, declarations within that header are made available. Function declarations, including member functions and overloaded functions, are represented internally within Carbon’s SemIR. An overload set from C++ is represented as a single callable entity in Carbon, associated with the set of C++ candidate functions.

Overload resolution

To resolve a call like Cpp.MyNamespace.MyFunc(arg1, arg2) where MyFunc might be an overload set imported from C++:

  1. Map Arguments: Carbon argument instructions (arg1, arg2) are mapped to placeholder C++ expressions (conceptually similar to clang::OpaqueValueExpr). The types of these expressions are determined by mapping the Carbon argument types to corresponding C++ types (Carbon <-> C++ Interop: Primitive Types).
  2. Invoke Clang Sema: Carbon invokes Clang’s overload resolution logic (clang::OverloadCandidateSet::BestViableFunction()) with the mapped C++ name, the candidate functions from the imported overload set, and the placeholder argument expressions.
  3. Select Candidate: Clang determines the best viable C++ function based on C++ rules (implicit conversions and, once templates are supported, template argument deduction). If resolution fails because there is no viable function or because the call is ambiguous, Clang’s diagnostics are surfaced as Carbon diagnostics.
  4. Access Check: After selecting a function, Carbon checks if the function is accessible based on C++ access specifiers (public, protected, private) in the context of the call.
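
As a rough illustration of the candidate selection in step 3, the following plain C++ sketch shows how ordinary C++ ranking rules choose among overloads once the argument types are known. It does not use Clang's API. The overload set, the helper names, and the assumption that Carbon's i32 and f32 map to std::int32_t and float are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical C++ overload set, standing in for an imported `MyFunc`.
namespace MyNamespace {
inline std::string MyFunc(std::int32_t) { return "MyFunc(int32_t)"; }
inline std::string MyFunc(double) { return "MyFunc(double)"; }
inline std::string MyFunc(const char*) { return "MyFunc(const char*)"; }
}  // namespace MyNamespace

// Each helper models one Carbon call site: the argument has the C++ type
// that the Carbon argument type is assumed to map to.
inline std::string CallWithI32(std::int32_t arg) {
  return MyNamespace::MyFunc(arg);
}
inline std::string CallWithF32(float arg) {
  return MyNamespace::MyFunc(arg);
}

inline void CheckSelections() {
  // Exact match on the first overload.
  assert(CallWithI32(1) == "MyFunc(int32_t)");
  // float -> double is a promotion, which outranks the float -> int32_t
  // conversion, so the second overload wins.
  assert(CallWithF32(1.0f) == "MyFunc(double)");
  // A std::int64_t argument would be ambiguous here: both numeric overloads
  // need a conversion of the same rank. That is the kind of failure the
  // text says is reported as a diagnostic.
}
```

Because the ranking is delegated to C++ rules, a case like the ambiguous 64-bit argument fails in the same way it would for a C++ caller.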

Direct calls versus thunks

A direct call from Carbon to C++ is possible only if the ABI matches exactly. A C++ thunk is required if:

  • Type Representation Mismatch: A parameter or the return type has a different representation in Carbon than the C++ ABI expects, so a conversion is required. Examples include a Carbon bool (i1) passed to a C++ bool (often i8), and struct types whose layout or passing rules are complex.
  • Return Convention Mismatch: The C++ function returns a non-trivial type by value, which typically requires a hidden return slot parameter in the ABI, whereas Carbon might expect a direct return value.
  • Parameter Convention Mismatch: C++ expects a parameter by pointer or reference where Carbon provides a value, or vice versa.
  • Default Arguments: The Carbon call omits arguments that have default values in C++. The thunk provides the default values.
  • Variadic Arguments: (Future work) Calling C++ variadic functions.

If a thunk is not required, Carbon emits a direct call instruction targeting the mangled name of the C++ function.
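
The conditions above can be modeled as a single predicate. The sketch below is only a model of the listed rules, not toolchain code: ParamAbi, CalleeAbi, NeedsThunk, and every field name are hypothetical, and variadic callees are left out because the text marks them as future work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter summary of how the two sides compare.
struct ParamAbi {
  bool same_representation;      // Carbon and C++ agree on the value's layout.
  bool same_passing_convention;  // Both pass by value, or both by pointer.
};

// Hypothetical summary of the C++ function chosen by overload resolution.
struct CalleeAbi {
  std::vector<ParamAbi> params;            // One entry per declared parameter.
  bool uses_return_slot = false;           // Result is written through a hidden pointer.
  bool return_same_representation = true;  // Return type layout agrees.
};

// Returns true if any of the listed mismatch conditions applies.
inline bool NeedsThunk(const CalleeAbi& callee, std::size_t num_call_args,
                       bool carbon_expects_direct_return) {
  // More arguments than parameters would mean a variadic call (future work).
  assert(num_call_args <= callee.params.size());

  // Default arguments: the call omits trailing arguments.
  if (num_call_args < callee.params.size()) return true;
  // Return convention or representation mismatch.
  if (!callee.return_same_representation) return true;
  if (callee.uses_return_slot && carbon_expects_direct_return) return true;
  // Parameter representation or convention mismatch.
  for (const ParamAbi& param : callee.params) {
    if (!param.same_representation || !param.same_passing_convention) {
      return true;
    }
  }
  return false;  // ABIs match exactly, so a direct call is possible.
}
```

A callee with two matching parameters and a direct return yields false when both arguments are supplied, and true when the call omits the second argument and relies on its default.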

Thunk generation

If a thunk is required for a C++ function CppOriginalFunc(), Carbon generates a new internal function, conceptually CppOriginalFunc__carbon_thunk():

  1. Signature: The thunk has an ABI that is simple and directly callable from Carbon.
    • Parameters corresponding to C++ parameters with complex ABIs are passed by pointer (T*).
    • Parameters with simple ABIs (like i32, i64, raw pointers) are passed directly.
    • If CppOriginalFunc uses a return slot, the thunk takes a pointer parameter for the return slot. Its LLVM return type becomes void.
    • If CppOriginalFunc returns a simple type directly, the thunk returns the same simple type directly.
  2. Body: The thunk body performs the following:
    • Loads values from pointer arguments passed by Carbon where necessary.
    • Performs necessary type conversions between Carbon simple ABI types and C++ expected types (for example, i1 to i8 for bool).
    • Calls CppOriginalFunc with the converted arguments, potentially passing the return slot address.
    • If CppOriginalFunc returned directly, the thunk returns that value. If it used a return slot, the thunk returns void.
  3. Attributes: The thunk is typically marked always_inline to encourage the optimizer to remove the indirection. It is given a predictable mangled name based on the original function’s mangled name plus a suffix.

The Carbon call site then calls the thunk instead of the original C++ function.
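
To make the signature and body rules concrete, here is a hand-written C++ equivalent of what a generated thunk might look like. It is a sketch under assumptions: Point, the body of CppOriginalFunc, and the CARBON_THUNK_INLINE macro are hypothetical, Point is treated as having a complex ABI purely for illustration, and the real thunk name would be derived from the mangled name rather than spelled as shown.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical macro: request forced inlining where the compiler supports it.
#if defined(__GNUC__)
#define CARBON_THUNK_INLINE __attribute__((always_inline)) inline
#else
#define CARBON_THUNK_INLINE inline
#endif

// Hypothetical C++ type and function standing in for `CppOriginalFunc`.
struct Point {
  std::int32_t x;
  std::int32_t y;
};

inline std::string CppOriginalFunc(Point p, std::int32_t scale) {
  return std::to_string(p.x * scale) + "," + std::to_string(p.y * scale);
}

// Thunk with a "simple" ABI:
//  - `p` has an assumed complex ABI, so it is passed by pointer.
//  - `scale` has a simple ABI, so it is passed directly.
//  - The std::string result uses a return slot, so the thunk takes a pointer
//    to uninitialized storage and returns void.
CARBON_THUNK_INLINE void CppOriginalFunc__carbon_thunk(
    Point* p, std::int32_t scale, std::string* return_slot) {
  assert(p != nullptr && return_slot != nullptr);
  // Load the argument from its pointer, call the original function, and
  // construct the result directly in the caller-provided return slot.
  ::new (static_cast<void*>(return_slot)) std::string(CppOriginalFunc(*p, scale));
}
```

The caller owns the return slot storage: it must provide suitably aligned, uninitialized memory and later destroy the std::string that the thunk constructs there.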

Parameter and return value handling

  • Arguments: When calling a C++ function (directly or by way of a thunk), Carbon arguments undergo implicit conversions as needed to match the parameter types determined by overload resolution. For calls requiring a thunk, additional conversions might occur at the call site (for example, taking the address of an object to pass by pointer to the thunk) and within the thunk (for example, loading the object from the pointer).
  • Return Values: If the C++ function returns void, the Carbon call expression has type (). If it returns a simple type directly, the Carbon call has the corresponding mapped Carbon type. If the C++ function uses a return slot, the Carbon call is modeled as initializing the storage designated by the return slot argument (often a temporary created at the call site), and the overall call expression typically results in the initialized value.
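
The return slot case can be sketched from the caller's side in C++. This models the call site only loosely: MakeLabel, its thunk, the ReturnSlot union, and CallSite are hypothetical names, and a real Carbon call site would be lowered by the toolchain rather than written by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>
#include <utility>

// Hypothetical C++ callee that returns a non-trivial type by value.
inline std::string MakeLabel(std::int32_t id) {
  return "id-" + std::to_string(id);
}

// Hypothetical thunk: constructs the result in caller-provided storage.
inline void MakeLabel__carbon_thunk(std::int32_t id, std::string* return_slot) {
  assert(return_slot != nullptr);
  ::new (static_cast<void*>(return_slot)) std::string(MakeLabel(id));
}

// Storage for one T that stays unconstructed until the callee initializes it.
template <typename T>
union ReturnSlot {
  ReturnSlot() {}
  ~ReturnSlot() {}
  T value;
};

// Models the call site: create a temporary, let the call initialize it, use
// the initialized value as the result of the call expression, then end the
// temporary's lifetime.
inline std::string CallSite(std::int32_t id) {
  ReturnSlot<std::string> slot;                 // Temporary storage only.
  MakeLabel__carbon_thunk(id, &slot.value);     // The call initializes it.
  std::string result = std::move(slot.value);   // Value of the call expression.
  slot.value.~basic_string();                   // Destroy the temporary.
  return result;
}
```

The union keeps the storage uninitialized until the thunk runs, which mirrors the idea that the call itself is what initializes the return slot.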

Member function calls

  • Instance Methods: When object.CppMethod() is called, object becomes the implicit this argument. Clang’s overload resolution accounts for the object’s qualification (for example, const) when choosing among member overloads. The this pointer is passed as the first argument, either directly or to the thunk.
  • Static Methods: Calls like Cpp.CppClass.StaticMethod() (CppClass::StaticMethod() in C++) are treated like free function calls; no this pointer is involved.
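
The placement of the this pointer can be illustrated with hand-written C++ wrappers. This is a sketch only: Counter and the thunk names are hypothetical, and methods this simple would likely be called directly rather than through a thunk. The wrappers exist here just to show where the implicit object argument goes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ class with const and non-const instance methods and a
// static method.
class Counter {
 public:
  explicit Counter(std::int32_t start) : value_(start) {}
  void Add(std::int32_t delta) { value_ += delta; }
  std::int32_t Get() const { return value_; }
  static std::int32_t Max(std::int32_t a, std::int32_t b) {
    return a > b ? a : b;
  }

 private:
  std::int32_t value_;
};

// Instance method: the object is passed as an explicit first parameter.
inline void Counter_Add__carbon_thunk(Counter* self, std::int32_t delta) {
  assert(self != nullptr);
  self->Add(delta);
}

// Const instance method: the const qualifier moves onto the pointee type.
inline std::int32_t Counter_Get__carbon_thunk(const Counter* self) {
  assert(self != nullptr);
  return self->Get();
}

// Static method: no object parameter, same shape as a free function.
inline std::int32_t Counter_Max__carbon_thunk(std::int32_t a, std::int32_t b) {
  return Counter::Max(a, b);
}
```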

Operator calls

Calls to overloaded C++ operators are handled similarly to function calls. Carbon identifies the operator call, looks up potential C++ operator functions (both member and non-member), and uses Clang’s overload resolution to select the best candidate. Thunks may be generated if required by the selected operator function’s ABI.
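
As an illustration of member and non-member operator candidates, the sketch below wraps one of each in a thunk-shaped function. Money and the thunk names are hypothetical, and passing Money by pointer is an assumption made for the example rather than a statement about how such a type would actually be classified.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical C++ type with a member operator and a non-member operator.
struct Money {
  std::int64_t cents;

  // Member candidate: found through the left operand's class.
  Money& operator+=(const Money& rhs) {
    cents += rhs.cents;
    return *this;
  }
};

// Non-member candidate: found through ordinary and argument-dependent lookup.
inline Money operator+(Money lhs, const Money& rhs) {
  lhs += rhs;
  return lhs;
}

// Thunk for `a + b`: operands by pointer, result through a return slot.
// Writing the operator expression lets C++ overload resolution choose the
// candidate, as it would for a C++ caller.
inline void Money_plus__carbon_thunk(const Money* lhs, const Money* rhs,
                                     Money* return_slot) {
  assert(lhs != nullptr && rhs != nullptr && return_slot != nullptr);
  ::new (static_cast<void*>(return_slot)) Money(*lhs + *rhs);
}

// Thunk for `a += b`: the left operand plays the role of `this`.
inline void Money_plus_assign__carbon_thunk(Money* self, const Money* rhs) {
  assert(self != nullptr && rhs != nullptr);
  *self += *rhs;
}
```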

Rationale

  • Leverages Clang: Reusing Clang’s overload resolution avoids reimplementing complex C++ rules and ensures consistency.
  • Performance: Direct calls are used when possible. Thunks are designed to be minimal and aggressively inlined, minimizing overhead.
  • Correctness: Thunks handle ABI mismatches systematically, ensuring correct data marshalling between Carbon and C++.
  • Developer Experience: Aims for C++ calls to feel natural in Carbon, hiding much of the complexity of ABI bridging.
  • Interop Goal: Directly supports the core goal of seamless C++ interoperability.

Alternatives considered

Require manual C++ wrappers

Instead of generating thunks automatically, Carbon could require developers to write C++ wrapper functions with simple C-like ABIs for any C++ function whose ABI doesn’t directly match Carbon’s expectations.

  • Rejected because: This places a significant burden on the developer, increases boilerplate, hinders rapid iteration, and makes C++ libraries feel less integrated. It violates the goal of minimizing bridge code.

Mandate Carbon ABI compatibility with C++

Carbon could define its types and calling conventions to always match a specific C++ ABI (for example, Itanium).

  • Rejected because: This would heavily constrain Carbon’s own evolution and design choices. It also wouldn’t solve the problem entirely, as C++ ABIs themselves vary between platforms, compilers, and even standard libraries (for example, libc++ versus libstdc++ for string_view). It conflicts with the goal of software and language evolution.