Singular extern declarations

Pull request

Table of contents

Abstract

An entity may be declared extern (such as extern class Foo;); this means that its type is only complete if the definition is directly imported. It also allows for a single declaration in a different library, which must be marked as extern library "<owning_library>" (such as extern library "Bar" class Foo;).

Also, establish a different rule of thumb for when modifier keywords are required: modifier keywords are required when, if prior optional declarations were removed, the lack of the modifier keyword would change behavior.

Problem

In the extern model from #3762: Merging forward declarations, multiple extern declarations are allowed. #3763: Matching redeclarations further evolved the extern keyword.

The prior extern model assumed that the extern and non-extern declarations of a class formed two different types, which could be merged. As discussed on #packages-and-libraries, this runs into an issue with code such as:

library "a";
class C {}
library "b";
extern class C;
extern fn F() -> C*;
library "c";
import library "a";
extern fn F() -> C*;

Here, the return types of F differ.

This proposal aims to address the differing return types by unifying the type of C regardless of whether it’s extern. This could be done under multiple different approaches, and this proposal aims for one which enables efficient implementation strategies.

Background

Proposals:

Discussions:

Proposal

Declarations

A given entity may have up to three declarations:

  • An optional, non-owning extern library "<owning_library>" declaration
    • It must be in a separate library from the definition.
    • The owning library’s API file must import the extern declaration, and must also contain a declaration.
  • An optional, owning forward declaration
    • This must come before the definition. The API file is considered to be before the implementation file.
  • A required, owning definition

The consequential changes to the problem example are then:

library "a";

// This proposal makes the import required.
import library "b";

// This proposal makes `extern` required here.
extern class C {}
library "b";
// This proposal makes `library "a"` required here.
extern library "a" class C;
extern fn F() -> C*;
library "c";
import library "a";
extern fn F() -> C*;

Owning extern declarations

On an owning extern declaration, such as extern class C {}, there are two key effects:

  1. The declaration must be explicitly imported in order to be complete.
    • An “explicit import” means some import path exists where the name is available to name lookup, including export import and export <name>.
  2. A non-owning extern library "<owning_library"> declaration is allowed, but not required.

If either owning declaration has the extern modifier, both must have it.

Details

Type coherency

In the context of the example that is the problem, C will produce the same type regardless of whether C is the owning or non-owning declaration. This means that both function signatures have identical types.

We do this by only producing a complete type if the owning definition of C is imported by name: either directly through import library "a", or indirectly through a chain of export import library "a" and export C;. Otherwise, an incomplete type is used.

This does mean that adding extern to an owning declaration changes the import semantic. As a consequence, it is a potentially breaking change for API consumers that didn’t explicitly import the time.

In the presence of extern library "a" class C;, the required import library "b" means that all owning extern class C declarations are able to see the extern library "a" class C declaration as a name collision, which is merged. This allows the compiler to easily apply the same type to all declarations. That in turn will be used to ensure libraries which import both understand the type equality.

Impact on indirect imports

An entity marked as extern is only complete when the definition is explicitly imported. In the following, examples of indirect, non-explicit uses are given inside library "o".

library "m";

extern class C { fn Member(); }
library "n";
import library "m";

fn F() -> C;
var c: C = {};
var pc: C* = &c;
library "o";
import library "n";

// Invalid: The return type of `C` is incomplete, making the function signature
// invalid.
fn G() { F(); }

// Invalid: Accessing members requires `C` to be complete.
fn UseC() { c.Member(); }

// Valid: Taking the address of `C` doesn't require it to be complete. This is
// possible because `&` doesn't have an extension point.
var indirect_pc: auto = &c;

// Invalid: Copying `C` requires the complete type.
var copy_c: auto = c;

// Valid: Pointer-to-pointer copies are okay.
var copy_pc: auto = pc;

Indirect imports of non-extern types

The above rules explicitly do not apply for non-extern types, as decided in Issue #4025. In other words:

library "a";

class C { fn F(); }
library "b";
import library "a";

fn G() -> C;
library "c";
import library "b";

// Valid: `C` is complete here, even though it's not in name lookup.
G().F();

Using imported declarations

Since extern library "a" class C; must be imported by the owning library, we now allow uses of the imported name prior to its declaration within the same file. This is a divergence from #3762. It means the following now works:

library "extern";

extern library "use_extern" class MyType;
library "use_extern";
import library "extern"

// Uses the `extern library` declaration.
fn Foo(val: MyType*);

extern class MyType {
  fn Bar[addr self: Self*]() { Foo(self); }
}

private extern

Previously, in #3762, a non-owning private extern was valid to declare something as extern without exposing the name. In this proposal, that would be a non-owning private extern library "<owning_library>" for an owning public extern declaration. However, rather than supporting this version of the syntax, it will instead be invalid because the name would never be visible to the owning library. Instead, visibility must match between an extern library "<owning_library>" declaration and the owning extern declaration.

Note, because an owning extern declaration can be used independently of extern library "<owning_library>", an owning private extern declaration is valid in an API file. It has no special behaviors about it, and is merged as normal.

Validation for non-owning extern library declarations

We should offer some validation that the library in extern library is correct. When the owning library is incorrect, it’s very likely to be detected in two cases:

  • A compile-time error when the owning library imports the non-owning library, when the owning declaration is evaluated.
  • A link-time error as a fallback.

Other cases, such as when both libraries are independently imported, may or may not be caught, dependent upon the cost of validation.

No syntactic matching for extern library declarations

The non-owned extern library declarations will only use semantic matching for redeclarations, not syntactic matching. Details of syntactic matching laid out in #3763 will only apply to owned declarations in the same library, which may include owned extern declarations.

Versus proposal #3762

Versus proposal #3762, the extern feature is essentially rewritten. No part of extern should be assumed to still apply.

Rationale

  • Software and language evolution
    • Unifying the type of extern entities addresses a type coherency issue.
    • The extern behavior of requiring an explicit import is intended to assist library authors in carefully managing the dependencies on their API.
  • Fast and scalable development
    • Requiring the non-owning extern library declaration be imported by the owning library should improve compiler performance.

This proposal makes a trade-off with Interoperability with and migration from existing C++ code. The restriction of a unique extern declaration is expected to require additional work in migration, because C++ extern declarations will need to be consolidated. This is currently counter-balanced by the trade-offs involved, although it may result in a reevaluation of that aspect of this proposal.

Future work

extern and template interactions

We’ve only loosely discussed template interactions with extern. Right now, what we expect is that when a template declaration uses an extern type, the instantiation still occurs in the calling file. Thus, the extern type’s name would need to be imported in both the file declaring the template, and the file calling the template.

When the template is in the same package as the extern type, it could re-export it. However, we don’t support re-exporting names cross-package, and something like let template ExternType:! auto = OwningPackage.ExternType; would not actually forward the completeness of ExternType.

This is expected to be inconvenient, but it may be okay if extern sees limited use. It may also be that the template model ends up different from expected.

Alternatives considered

Allow multiple non-owning declarations, remove the import requirement, or both

We limit to one non-owning extern library declaration. Continuing to allow multiple extern library declarations (the previous state) is feasible. Similarly, we could not require the owning extern declaration to import the non-owning extern library declaration; this could be done with or without multiple non-owning extern library declarations. For this set of alternatives, the issues which would arise are similar.

In the compiler, we want to be able to determine that two types are equal through a unique identifier, such as a 32-bit integer. When one declaration sees another directly, as through an import, we identify the redeclaration by name, and reuse the unique identifier. This deduplication can occur once per declaration. Indirect imports can continue to use the unique identifier.

We could instead support unifying declarations that did not see each other. However, this would require canonicalizing all types by name instead of by unique identifier. For example, consider:

package Other library "type";
extern class MyType {
  fn Print();
};
package Other library "use_type";
import library "type";
fn Make() -> MyType*;
package Other library "extern";
extern library "type" class MyType;
package Other library "use_extern";
import library "extern";
fn Print(val: MyType*);
library "merge";
import Other library "use_type";
import Other library "use_extern";
Other.Print(Other.Make());

Here, the “merge” library doesn’t see either declaration of MyType directly. However, Print(Make()) requires that both declarations of MyType be determined as equivalent. This particular indirect use also means that the names will not have been added to name lookup, so there is no reason for the two declarations to be associated by name.

In order to do merge these declarations, we would need to identify that fully qualified names and other structural details are equivalent when the type is used (including non-explicit uses, such as interface lookup). We could achieve this, for example, by having a name lookup table for in-use types, managed per library. Each library would also need to validate that declarations were semantically equivalent, versus the current approach validating as part of the redeclaration. The cost of a per-library approach is expected to have a significant impact on the amount of work done as part of semantic analysis.

We may end up wanting to do similar work in order to improve diagnostics for invalid cases where the non-owning extern library is not correctly declared and imported. However, additional work building good diagnostics for already-identified invalid code is less of a concern than additional work on fully valid code.

In order to maintain a high-performance compiler, we are taking a restrictive approach that makes it simpler to associate type information.

Total number of allowed declarations (owning and non-owning)

A few options were considered regarding the number of allowed declarations.

We limit to two owning declarations: the optional forward declaration, and required definition. The need to provide interface implementations (for example, impl MyType as Add) is considered to constrain this choice.

In this category, alternatives considered were:

  • Do not restrict the number of declarations
  • Allow up to two declarations total
  • Allow up to four declarations total

Details for why each alternative was declined are below.

Do not restrict the number of forward declarations

We could not restrict the number of forward declarations, allowing an arbitrary amount – possibly also after the definition. This would be consistent with C++.

One thing to consider here is modifier keyword behavior. If we require modifier keywords to match across all declarations, that could become a maintenance burden for developers. If we don’t, it makes the meaning of a given forward declaration more ambiguous.

This option is declined due to the lack of clear benefit.

Allow up to two declarations total

Under this option, we would only allow one forward declaration, treating the non-owning extern library declaration as a forward declaration. This would mean two declarations overall, instead of three.

For this, the main concern was interactions between file placement of the definition, and file placement of interface implementations. Interface implementations must generally be in API files in order to be seen by other libraries.

For example:

library "i";
interface I {}
library "e";
import library "i";
extern library "o" class C;
extern library "o" impl C as I;
library "o";
import library "e";
extern class C { }
extern impl C as I;
impl library "o";
extern impl C as I { }

If the definition is required to be in the API file in order to allow the interface implementations in the API file, the API file would need to import libraries required to construct the definition. That could create issues for separation of build dependencies, and could also make it more difficult to unravel some dependency cycles between libraries.

If the definition was allowed to be in the implementation file even when there were interface implementations in the API file, the ambiguity of seeing a non-owning extern library declaration and being unsure of whether this was the owning library could have negative consequences for evaluation of interface constraints.

The purpose of allowing a forward declaration when there is a non-owning extern declaration is to make it clear for interface implementations that they exist in the owning library, while processing the API file.

Allow up to four declarations total

The four declarations would be:

  1. Non-owning extern library declaration
  2. Forward declaration in API file
  3. Forward declaration in implementation file
  4. Definition

The number of forward declarations allowed is consistent with the current state from #3762.

This would allow for clarity when defining in the implementation file, to also be able to put a forward declaration above – even when the forward declaration is pulled from the API file.

If we’re allowing declarations from another file (including the non-owning extern library declaration) to be used before an entity is declared in the same file, the motivating factor for allowing a repeat forward declaration in an implementation file is removed. Previously, that was required for an entity to be referenced prior to its definition.

In discussion of this option, it was considered unclear why we would allow two forward declarations, but not allow even more. The more popular choice seemed to be not restricting, which was also declined.

Don’t require a modifier on the owning declarations

Instead of requiring an extern modifier on owning declarations, we could infer from the presence of a non-owning extern library declaration.

We had declined allowing a definition to control whether extern library was allowed in discussion of #3762, although this is not directly mentioned in the proposal. At the time, it was dropped because the owning library didn’t need to include extern declarations, and so having the definition opt-in to allowing extern was viewed as low benefit. However, now that the owning library must import the extern declaration, there is a tighter association and so we reevaluated.

The extern modifier offers a benefit for being able to verify the association between non-owning and owning declarations, and offers additional parity in modifiers. It also makes it easy for a tool to know if it’s missing a declaration.

Only require extern on the first owning declaration

At present, we require extern on all owning declarations. We could instead only require extern on the first owning declaration and, if there’s a separate forward declaration and definition, infer it for the definition. For example:

// `extern` on the forward declaration.
extern class C;
// Infer `extern` for the definition.
class C {}

The decision to require extern on all owning declarations is based on wanting the forward declaration to be optional. A rule of thumb was discussed wherein if a forward declaration could be removed without breaking the definition (as defined by it being in the same lexical scope), keywords should be duplicated to the definition. This is not proposed as a rule because it’s not clear whether we’ll generally follow it, but it’s why this particular choice is taken.

Separate require-direct-import from non-owning declarations

At present, an extern modifier on an owning declaration serves two purposes:

  1. Indicates that a non-owning extern library declaration can exist.
  2. Indicates the declaration must be directly imported in order to be complete.

This means that:

  • The presence of extern on an owning declaration cannot be used to determine whether a non-owning declaration exists.
    • Because the location of a non-owning declaration isn’t explicit in the owning code, this may lead to a developer failing to find the non-owning declaration and misunderstanding that as the non-existence of a non-owning declaration.
  • Libraries which happen to be imported by the owning declaration may freely add or remove non-owning extern library declarations without modifying the owning library.

We could give distinct syntax to the two purposes, so that they could be managed separately. The preference at present is to use a single syntax for both purposes, rather than emphasizing control or correspondence.

Other extern syntaxes

Issue #3986 discussed other syntaxes for extern + extern library. These were mainly has_extern/is_extern/externed + extern.

Breaking down extern, there are two features which could have been provided separately:

  1. Declaring an entity has a forward declaration in a separate library.
    • Also, declaring that forward declaration in a separate library: extern library "<owning_library>".
  2. Declaring an entity must be imported directly.

Although (1) must depend on (2), a different design could provide (2) without making (1) possible, for example with different keywords to differentiate between intended usage (has_extern class C; meaning (1) and (2), must_import meaning (2) only). However, the extern keyword approach means developers have all or nothing.

Considering that, the trade-offs are viewed as:

  • The primary motivation is to provide feature (1).
  • Leads wanted a syntax on the owning declaration that states something positive about the owning declaration itself, rather than expressing that other declarations exist, which suggests that the syntax on the owning declaration should provide feature (2).
  • Leads consider it valuable, though secondary, to support (2) separate from (1), and find it acceptable to make (1) optional to achieve this (in other words, making the extern library "<owning_library>" declaration optional).
    • It’s okay that that extern library "<owning_library>" can be added and removed from imported libraries without modifying the owning library.
    • If a developer considers it important to disambiguate the intended use of a declaration extern class C; and whether there should be a declaration in a separate library, they can add comments.
  • extern seemed like an acceptable name for this approach, and alternative names seemed significantly less good.
  • Using extern for both features still only creates one new keyword, versus multi-keyword approaches.
  • Adding the owning library with extern library "<owning_library>" will hopefully improve diagnostics and human understandability of the code.
    • It is very verbose, but this verbosity goes on the forward declaration in the non-owning library. When it’s read, which will hopefully be less often than the actual declaration, it will provide the reader directions to find the actual declaration.
    • If in practice we find the verbosity becomes a significant issue, we can revisit syntaxes to address that specifically. For example, if we have significant repetiton, we might consider a grouping structure such as extern library "..." { <many forward declarations> }.

Have types with extern members re-export them

We expect there will be types that have extern members; these types are only truly complete if their members are complete.

We discussed having such types automatically re-export the extern members, possibly requiring the types to also be extern in order to be allowed to have extern members. For example:

library "a"
extern library "b" class A;
library "b"
import library "a"
extern class A {}
// B re-exports A so that it's complete on use.
class B { var a: A; }
library "c"
import library "b"
// Importing this function declaration gets B, which again, re-exports A so that
// it's complete on use.
fn F() -> B { ... }
library "d"
// This import loads the incomplete name for A.
import library "a"
// This import loads F, which loads B, which loads the definition of A.
import library "c"

// Because of the import behaviors, this is valid.
var a: A;

We consider this action-at-a-distance. Type coherency means the A member of B is the same as the A in name lookup; we could make them behave slightly differently, but then we get into provenance tracking of type information. Several various forms of this have been discussed as part of the extern design, and it’s something we’ve decided to avoid.

Although it’s more inconvenient, we will require A to be deliberately imported in order for B to be complete.

Require syntactic matching for extern library declarations

We will not require syntactic matching for extern library declarations, but we could.

When a redeclaration is in the same library, we’ve designed name lookup in a way such that syntactic matching is effectively a superset of semantic matching. However, that relies on poisoning entries in name lookup, with later redeclarations seeing identical name lookup data. Because different libraries have different name lookup data, syntactic matching not a superset of semantic matching cross-library. We address this schism by only requiring semantic matching.

Semantic matching will include parameter names. The difference is primarily in whether different ways of producing the same type information are considered invalid or not.

For example:

library "a";

class A {}
namespace NS;
extern library "c" fn NS.F() -> A;
library "b";

namespace NS;
class A {}
library "c"; import library "a" import library "b"

extern fn NS.F() -> NS.A {}

Semantically, NS.F in libraries “a” and “c” are identical. Syntactically, they differ because of NS.A in “c”. Writing A in “c” is invalid because it would use NS.A from “b”. But in “a”, there is nothing to make the declaration invalid: it would only be invalid after completing cross-library compilation.

However, we could also have code such as:

library "d";

class D {}
namespace NS;
extern library "e" fn NS.G() -> D;
library "e";

namespace NS;
alias NS.D = D;
extern fn NS.G() -> D {}

Here, the semantics and syntax match, but this would be invalid in a normal redeclaration due to the different name lookup result for D.

This additionally gets into a different statement made in #3763 to justify synactic matching: “The intention is that whenever the syntax matches, the semantics must also match.” Due to the differences in name lookup, syntax matching does not mean semantics must match; instead of alias NS.D = D;, that could have been alias NS.D = i32; and the syntax would have still matched. This only works in a library because “…we persist syntactic information from the API file to implementation files.” We cannot persist syntactic information cross-library, across imports.

Due to the differences in the guarantees that syntactic matching provides for owned declarations versus non-owned declarations, we will not enforced syntactic matching on the non-owned extern library declarations.