Code and name organization
Table of contents
- Problem
- Proposal
- Open questions for the decision
- Justification
- Alternatives considered
- Packages
- Libraries
- Allow exporting namespaces
- Allow importing implementation files from within the same library
- Alternative library separators and shorthand
- Collapse API and implementation file concepts
- Collapse file and library concepts
- Collapse the library concept into packages
- Collapse the package concept into libraries
- Different file type labels
- Function-like syntax
- Inlining from implementation files
- Library-private access controls
- Managing API versus implementation in libraries
- Multiple API files
- Name paths as library names
- Imports
- Namespaces
- Rationale
Problem
How do developers store code for the compiler, and access it for reuse?
Proposal
Adopt an approach with tiered files, libraries, packages and namespaces in the design.
Out-of-scope issues
Related issues that are out-of-scope for this proposal are:
- Access control: while there is some implicit access control from interface vs implementation considerations for libraries, they are more about addressing circular dependencies in code.
- Aliasing implementation: while the `alias` keyword is critical to how easy or difficult refactoring is, it should be designed on its own.
- Compilation details: this proposal sets up a framework that should enable well-designed compilation, but does not set about to design how compilation will work.
- File-private identifiers: something similar to C++ `static` functions may exist. However, that will be addressed separately.
- Incremental migration and unused imports: incrementally migrating a declaration from one library to another might require an intermediate state where callers import both libraries, with consequent issues. However, it may also not require such. Whether it does, or whether tooling needs to be added to support the specific intermediary state of transitional, unused imports, is out of scope.
- Name lookup, including addressing conflicting names between imports and names in the current file: the name lookup design is likely to address this better, including offering syntax that could refer to it if needed.
  - After discussion, we believe we do not need to support package renaming. However, that final decision should be based on name lookup addressing the issue, as implications need to be considered more deeply.
- Package management: while we want to choose syntax that won’t impose barriers on building package management into Carbon, we should not make assumptions about how package management should work.
- Prelude package, or fundamentals: while we’ve discussed how to handle name lookup for things like `Int32`, this proposal mainly lays out a framework where options for addressing that are possible.
This proposal should not be interpreted as addressing these issues. A separate discussion of these issues will remain necessary.
Open questions for the decision
Extended open question comparisons may be found in the examples doc in addition to the code_and_organization.md alternatives section.
Should we switch to a library-oriented structure that’s package-agnostic?
Decision: No.
Right now, the `package` syntax is very package-oriented. We could instead eliminate package semantics from code and organization, relying only on libraries and removing the link to distribution. This is the “collapse the package concept into libraries” alternative.
Does the core team agree with the approach to packages and libraries? If not, does the alternative capture what the core team wants to be turned into the proposal, or is some other approach preferred?
Should there be a tight association between file paths and packages/libraries?
Decision: Make paths correspond to libraries for API files, not impl files. Keep `package`.
Right now, the `package` syntax requires the package’s own name be repeated through code. This touches on a couple alternatives:
- Strict association between the file system path and library/namespace
- Referring to the package as `package`
The end result of taking both alternatives would be that:
- The `package` and `library` would no longer need to be specified on the first line.
  - The `import` would still need a `library`.
- The `package` keyword would always be used to refer to the current package.
  - Referring to the current package by name would be disallowed, to allow for easier renames of conflicting package names.
Justification
- Software and language evolution:
  - The syntax and interactions between `package` and `import` should enable moving code between files and libraries with fewer modifications to callers, easing maintenance of large codebases.
    - In C++ terms, `#include` updates are avoidable when moving code around.
- Code that is easy to read, understand, and write:
  - By setting up imports so that each name in a file is unique and refers to the source package, we make the meaning of symbols clear and easier to understand.
  - The proposed `namespace` syntax additionally makes it clear when the package’s default namespace is not being used.
    - This is in contrast to C++ namespaces, where the entire body of code above the line of code in question may be used to start a namespace.
  - Clearly marking interfaces will make it easier for both client code and IDE tab completion to determine which APIs can be used from a given library.
- Fast and scalable development:
  - The structure of libraries and imports should help enable separate compilation, particularly improving performance for large codebases.
- Interoperability with and migration from existing C++ code:
  - The syntax of `import` should enable extending imports for use in interoperability code.
Alternatives considered
Packages
Name paths for package names
Right now, we only allow a single identifier for the package name. We could allow a full name path without changing syntax.
Advantages:
- Allow greater flexibility and hierarchy for related packages, such as `Database.Client` and `Database.Server`.
- Would allow using GitHub repository names as package names. For example, `carbon-language/carbon-toolchain` could become `carbon_language.carbon_toolchain`.
Disadvantages:
- Multiple identifiers are more complex than a single identifier.
- Other languages with similar distribution packages don’t have a hierarchy, and so it may be unnecessary for us.
  - Other languages that use packages for distribution apply similar restrictions: for example, Node.js/NPM, Python PyPI, and Rust crates.
    - In Rust crates, we can observe examples such as `winapi-build` and `winapi-util`.
  - We can build a custom system for reserving package names in Carbon.
At present, we are choosing to use single-identifier package names because of the lack of clear advantage towards a more complex name path.
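For illustration only, the GitHub-name mapping mentioned under the advantages can be sketched in a few lines of Python. The proposal gives a single example and no rule, so the rule here (`-` becomes `_`, `/` becomes `.`) and the function name are assumptions.

```python
def repo_to_package_path(repo: str) -> str:
    """Map an 'owner/repo' name to a dotted name path (hypothetical rule)."""
    # Assumption: hyphens cannot appear in identifiers, so they become
    # underscores; each '/'-separated part becomes one path component.
    return ".".join(part.replace("-", "_") for part in repo.split("/"))


print(repo_to_package_path("carbon-language/carbon-toolchain"))
# prints: carbon_language.carbon_toolchain
```

Under the single-identifier decision above, no such mapping is needed; the sketch only shows what the declined alternative would have implied.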
Referring to the package as `package`
Right now, we plan to refer to the package containing the current file by name. What’s important in the below example is the use of `Math.Stats`:
package Math library "Stats" api;
api struct Stats { ... }
struct Quantiles {
  fn Stats();
  fn Build() {
    ...
    var Math.Stats: b;
    ...
  }
}
We could instead use `package` as an identifier within the file to refer to the package, giving `package.Stats`.
It’s important to consider how this behaves for `impl` files, which expect an implicit import of the API. In other words, for `impl` files, this can be compared to an implicit `import Math;` versus an implicit `import Math as package;`. However, there may also be explicit imports from the package, such as `import Math library "Trigonometry";`, which may or may not be referable to using `package`, depending on the precise option used.
Advantages:
- Gives a stable name to refer to the current library’s package.
  - This reduces the amount of work necessary if the current library’s package is renamed, although imports and library consumers may still need to be updated. If the library can also refer to the package by the package name, even with imports from other libraries within the package, work may not be significantly reduced.
- The same syntax can be used to refer to entities with the same name as the package.
  - For example, in a package named `DateTime`, `package.DateTime` is unambiguous, whereas `DateTime.DateTime` could be confusing.
Disadvantages:
- We are likely to want a more fine-grained, file-level approach proposed by name lookup.
- Allows package owners to name their packages things that they rarely type, but that importers end up typing frequently.
  - The existence of a short `package` keyword shifts the balance for long package names by placing less burden on the package owner.
- Reuses the `package` keyword with a significantly different meaning, changing from a prefix for the required declaration at the top of the file, to an identifier within the file.
  - We don’t need to have a special way to refer to the package to disambiguate duplicate names. In other words, there is likely to be other syntax for referring to an entity `DateTime` in the package `DateTime`.
  - Renaming to a `library` keyword has been suggested to address concerns with `package`. Given that `library` is an argument to `package`, it does not significantly change the con.
- Creates inconsistencies as compared to imports from other packages, such as `package Math; import Geometry;`, and imports from the current package, such as `package Math; import Math library "Stats";`.
  - Option 1: Require `package` to be used to refer to all imports from `Math`, including the current file. This gives consistent treatment for the `Math` package, but not for other imports. In other words, developers will always write `package.Stats` from within `Math`, and `Math.Stats` will only be written in other packages.
  - Option 2: Require `package` be used for the current library’s entities, but not other imports. This gives consistent treatment for imports, but not for the `Math` package as a whole. In other words, developers will only write `package.Stats` when referring to the current library, whether in `api` or `impl` files. `Math.Stats` will be used elsewhere, including from within the `Math` package.
  - Option 3: Allow either `package` or the full package name to refer to the current package. This allows code to say either `package` or `Math`, with no enforcement for consistency. In other words, both `package.Stats` and `Math.Stats` are valid within the `Math` package.
Because name lookup can be expected to address the underlying issue differently, we will not add a feature to support name lookup. We also don’t want package owners to name their packages things that even they find difficult to type. As part of pushing library authors to consider how their package will be used, we require them to specify the package by name where desired.
Remove the `library` keyword from `package` and `import`
Right now, we have syntax such as:
package Math library "Median" api;
package Math library "Median" namespace Stats api;
import Math library "Median";
We could remove `library`, resulting in:
package Math.Median api;
package Math.Median namespace Math.Stats api;
import Math.Median;
Advantages:
- Reduces redundant syntax in library declarations.
- We expect libraries to be common, so this may add up.
Disadvantages:
- Reduces explicitness of package vs library concepts.
- Creates redundancy of the package name in the namespace declaration.
  - Instead of `package Math.Median namespace Math.Stats`, could instead use `Stats`, or `this.Stats` to elide the package name.
- Potentially confuses the library names, such as `Math.Median`, with namespace names, such as `Math.Stats`.
- Either obfuscates or makes it difficult to put multiple libraries in the top-level namespace.
  - This is important because we are interested in encouraging such behavior.
  - For example, if `package Math.Median api;` uses the `Math` namespace, the presence of `Median` with the same namespace syntax obfuscates the actual namespace.
  - For example, if `package Math.Median namespace Math api` is necessary to use the `Math` namespace, requiring the `namespace` keyword makes it difficult to put multiple libraries in the top-level namespace.
As part of avoiding confusion between libraries and namespaces, we are declining this alternative.
Rename package concept
In other languages, a “package” is equivalent to what we call the name path here, which includes the `namespace`. We may want to rename the `package` keyword to avoid conflicts in meaning.
Alternative names could be ‘bundle’, ‘universe’, or something similar to Rust’s ‘crates’; perhaps ‘compound’ or ‘molecule’.
Advantages:
- Avoids conflicts in meaning with other languages.
Disadvantages:
- The meaning of `package` also overlaps a fair amount, and we would lose that context.
  - Package management systems in general.
  - NPM/Node.js, as a distributable unit.
  - Python, as a distributable unit.
  - Rust, as a collection of crates.
  - Swift, as a distributable unit.
No association between the file system path and library/namespace
Several languages create a strict association between the method for pulling in an API and the path to the file that provides it. For example:
- In C++, `#include` refers to specific files without any abstraction.
  - For example, `#include "PATH/TO/FILE.h"` means there’s a file `PATH/TO/FILE.h`.
- In Java, `package` and `import` both reflect file system structure.
  - For example, `import PATH.TO.FILE;` means there’s a file `PATH/TO/FILE.java`.
- In Python, `import` requires matching file system structure.
  - For example, `import PATH.TO.FILE` means there’s a file `PATH/TO/FILE.py`.
- In TypeScript, `import` refers to specific files.
  - For example, `import {...} from 'PATH/TO/FILE';` means there’s a file `PATH/TO/FILE.ts`.
For contrast:
- In Go, `package` uses an arbitrary name.
  - For example, `import "PATH/TO/NAME"` means there is a directory `PATH/TO` that contains one or more files starting with `package NAME`.
In Carbon, we are using a strict association to say that `import PACKAGE library "PATH/TO/LIBRARY"` means there is a file `PATH/TO/LIBRARY.carbon` under some package root.
Advantages:
- The strict association makes it harder to move names between files without updating callers.
- If there were a strict association of paths, it would also need to handle file system-dependent casing behaviors.
  - For example, on Windows, `project.carbon` and `Project.carbon` are conflicting filenames. This is exacerbated by paths, wherein a file `config` and a directory `Config/` would conflict, even though this would be a valid structure on Unix-based filesystems.
Disadvantages:
- A strict association between file system path and import path makes it easier to find source files. This is used by some languages for compilation.
- Allows getting rid of the `package` keyword by inferring related information from the file system path.
We are choosing to have some association between the file system path and library for API files to make it easier to find a library’s files. We are not getting rid of the `package` keyword because we don’t want to become dependent on file system structures, particularly as it would increase the complexity of distributed builds.
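The path association and the casing concern discussed in this section can be sketched in Python. This is a minimal illustration, not part of the proposal: the `.carbon` extension comes from the text, while the function names, the example package root `math`, and the use of `casefold()` to approximate a case-insensitive file system are assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def api_file_for(package_root: str, library: str) -> PurePosixPath:
    """Return the API file that a library name would map to under a root."""
    # library "PATH/TO/LIBRARY" -> <root>/PATH/TO/LIBRARY.carbon
    return PurePosixPath(package_root) / (library + ".carbon")


def case_conflicts(paths):
    """Find path prefixes that differ only by letter case.

    Every prefix of every path is checked, so a file `config` and a
    directory `Config/` are reported as a conflict.
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


print(api_file_for("math", "Stats/Median"))
# prints: math/Stats/Median.carbon
print(case_conflicts(["project.carbon", "Project.carbon"]))
# prints: [['Project.carbon', 'project.carbon']]
print(case_conflicts(["config", "Config/settings.carbon"]))
# prints: [['Config', 'config']]
```

A check like `case_conflicts` is the kind of extra handling that any path-based association would need on case-insensitive file systems.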
Libraries
Allow exporting namespaces
We propose to not allow exporting namespaces as part of library APIs. We could either allow or require exporting namespaces. For example:
package Checksums;
api namespace Sha256;
While this approach would mainly be syntactic, a more pragmatic use of this would be in refactoring. It implies that an aliased namespace could be marked as an `api`. For example, the below could be used to share an import’s full contents:
package Translator library "Interface" api;
import Translator library "Functions" as TranslatorFunctions;
api alias Functions = TranslatorFunctions;
Advantages:
- Avoids any inconsistency in how entities are handled.
- Reinforces whether a namespace may contain `api` entities.
- Enables new kinds of refactorings.
Disadvantages:
- Creates extra syntax for users to remember, and possibly forget, when declaring `api` entities.
  - Makes it possible to have a namespace marked as `api` that doesn’t contain any `api` entities.
- Allowing aliasing of entire imports makes it ambiguous which entities are being passed on through the namespace.
- This may impair refactoring.
- This can be considered related to broader imports, either all names or arbitrary code.
This alternative is declined because it’s not sufficiently clear it’ll be helpful, versus impairment of refactoring.
Allow importing implementation files from within the same library
The current proposal is that implementation files in a library implicitly import their API, and that they cannot import other implementation files in the same library.
We could instead allow importing implementation files from within the same library. There are two ways this could be done:
- We could add a syntax for importing symbols from other files in the same library. This would make it easy to identify a directed acyclic graph between files in the library. For example:
  package Geometry; import file("point.6c");
- We could automatically detect when symbols from elsewhere in the library are referenced, given an import of the same library. For example:
  package Geometry; import this;
Advantages:
- Allows more separation of implementation between files within a library.
Disadvantages:
- Neither approach is quite clean:
  - Using filenames creates a common case where filenames must be used, breaking away from name paths.
  - Detecting where symbols exist may cause separate parsing, compilation debugging, and compilation parallelism problems.
- Libraries are supposed to be small, and we’ve chosen to only allow one API file per library to promote that concept. Encouraging implementation files to be inter-dependent appears to support a more complex library design again, and may be better addressed through inter-library ACLs.
- Loses some of the ease-of-use that some other languages have around imports, such as Go.
- Part of the argument towards `api` and `impl`, particularly with a single `api`, has been to mirror C++ `.h` and `.cc`. Just as a `.cc` file `#include`-ing other `.cc` files is undesirable, allowing an `impl` to import another `impl` could be considered similarly undesirable.
The problems with these approaches, together with the encouragement towards small libraries, are how we reach the current approach of only importing APIs, and doing so automatically.
Alternative library separators and shorthand
Examples are using `/` to separate significant terms in library names, and `//` to separate the package name in shorthand. For example, `package Time library "Timezones/Internal";` with shorthand `Time//Timezones/Internal`.
Note that, because the library is an arbitrary string and shorthand is not a language semantic, this won’t affect much. However, users should be expected to treat examples as best practice.
We could instead use `.` for library names and `/` for packages, such as `Time/Timezones.Internal`.
Advantages:
- Clearer distinction between the package and library, increasing readability.
- We have chosen not to enforce file system paths in order to ease refactoring, and encouraging a mental model where they may match could confuse users.
Disadvantages:
- Uses multiple separators, so people need to type different characters.
- There is a preference for thinking of libraries like file system paths, even if they don’t actually correspond.
People like `/`, so we’re going with `/`.
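Because the shorthand is a documentation convention rather than a language semantic, any tooling for it would be external. A minimal Python sketch of converting between the shorthand and the declaration form used in the examples; the function names are hypothetical, and the only assumption about the format is that the first `//` separates the package from the library:

```python
def to_shorthand(package: str, library: str) -> str:
    """Build the PACKAGE//LIBRARY shorthand used in examples."""
    return f"{package}//{library}"


def parse_shorthand(shorthand: str):
    """Split shorthand at the first '//' into (package, library)."""
    package, separator, library = shorthand.partition("//")
    if not separator or not package or not library:
        raise ValueError(f"not PACKAGE//LIBRARY shorthand: {shorthand!r}")
    return package, library


def to_declaration(package: str, library: str) -> str:
    """Render the declaration form shown in the text above."""
    return f'package {package} library "{library}";'


package, library = parse_shorthand("Time//Timezones/Internal")
print(package, library)
# prints: Time Timezones/Internal
print(to_declaration(package, library))
# prints: package Time library "Timezones/Internal";
```

Single `/` characters stay inside the library name, which is why the package separator needs to be distinct from it.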
Single-word libraries
We could stick to single word libraries in examples, such as replacing `library "Algorithms/Distance"` with `library "Distance"`.
Advantages:
- Encourages short library names.
Disadvantages:
- Users are likely to end up doing some hierarchy, and we should address it.
- Consistency will improve code understandability.
We might list this as a best practice, and have Carbon only expose libraries following it. However, some hierarchy from users can be expected, and so it’s worthwhile to include a couple examples to nudge users towards consistency.
Collapse API and implementation file concepts
We could remove the distinction between API and implementation files.
Advantages:
- Removing the distinction between API and implementation would be a language simplification.
- Developers will not need to consider build performance impacts of how they are distributing code between files.
Disadvantages:
- Serializes compilation across dependencies.
- May be exacerbated because developers won’t be aware of when they are adding a dependency that affects imports.
- In large codebases, it’s been necessary to abstract out API from implementation in languages that similarly consolidate files, such as Java. However, the lack of language-level support constrains potential benefit and increases friction for a split.
- Whereas an `api`/`impl` hierarchy gives a structure for compilation, if there are multiple files we will likely need to provide a different structure, perhaps explicit file imports, to indicate intra-library compilation dependencies.
  - We could also effectively concatenate and compile a library together, reducing build parallelism options differently.
- Makes it harder for users to determine what the API is, as they must read all the files.
Requiring users to manage the `api`/`impl` split allows us to speed up compilation. This matters for large codebases, and shouldn’t directly affect small codebases that choose to only use `api` files.
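The serialization argument can be made concrete with a toy model. The Python sketch below compares the critical path (the finish time of the slowest unit, given unlimited parallel workers) for a chain of three libraries, with and without the split. Everything in it is an assumption made for illustration: the import graph, the unit costs, and the simplification that an `impl` file imports the same APIs as its `api` file.

```python
from functools import lru_cache

# Hypothetical import graph: C imports B, and B imports A.
IMPORTS = {"A": [], "B": ["A"], "C": ["B"]}
# Hypothetical compile costs in arbitrary time units.
API_COST, IMPL_COST = 1, 4


def critical_path(costs, deps):
    """Finish time of the slowest unit, assuming unlimited parallelism."""
    @lru_cache(maxsize=None)
    def finish(unit):
        # A unit starts once all of its dependencies have finished.
        start = max((finish(dep) for dep in deps[unit]), default=0)
        return start + costs[unit]
    return max(finish(unit) for unit in costs)


def collapsed_model(imports):
    """One compilation unit per library; importers wait for all of it."""
    costs = {lib: API_COST + IMPL_COST for lib in imports}
    deps = {lib: list(imported) for lib, imported in imports.items()}
    return costs, deps


def split_model(imports):
    """Separate api and impl units; importers wait only for api units."""
    costs, deps = {}, {}
    for lib, imported in imports.items():
        api, impl = f"{lib}.api", f"{lib}.impl"
        costs[api], costs[impl] = API_COST, IMPL_COST
        imported_apis = [f"{dep}.api" for dep in imported]
        deps[api] = imported_apis
        deps[impl] = [api] + imported_apis
    return costs, deps


collapsed = critical_path(*collapsed_model(IMPORTS))
split = critical_path(*split_model(IMPORTS))
print(collapsed, split)
# prints: 15 7
```

In this model the collapsed build is fully serialized along the import chain, while the split build only serializes the small `api` units and compiles the `impl` units in parallel. Real compile costs and import graphs will differ.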
Automatically generating the API separation
We could try to address the problems with collapsing API and implementation files by automatically generating an API file from the input files for a library.
Such tooling might preprocess files to split out an API, reducing the number of imports propagated for actual APIs. For example:
- Extract `api` declarations within the `api` file.
- Remove all implementation bodies.
- Add only the imports that are referenced.
Even under the proposed model, compilation will do some of this work as an optimization. However, determining which imports are referenced requires compilation of all imports that may be referenced. When multiple libraries are imported from a single package, it will be ambiguous which imports are used until all have been compiled. This will cause serialization of compilation that can be avoided by having a developer split out the `impl`, either manually or with developer tooling.
The `impl` files may make it easier to read code, but they will also allow for better parallelism than `api` files alone can. This does not mean the compiler will or will not add optimizations; it only means that we cannot wholly rely on optimizations by the compiler.
Automatically generating the API separation would only partly mitigate the serialization of compilation caused by collapsing file and library concepts. Most of the build performance impact would still be felt by large codebases, and so the mitigation does not significantly improve the alternative.
Collapse file and library concepts
We could collapse the file and library concepts. What this implies is:
- Collapse API and implementation file concepts.
- As described there, this approach significantly reduces the ability to separate compilation.
- Only support having one file per library.
- The file would need to contain both API and implementation together.
This has similar advantages and disadvantages to collapsing the API and implementation file concepts. Differences follow.
Advantages:
- Offers a uniformity of language usage.
  - Otherwise, some developers will use only `api` files, while others will always use `impl` files.
- The structure of putting API and implementation in a single file mimics other modern languages, such as Java.
- Simplifies IDEs and refactoring tools.
  - Otherwise, these systems will need to understand the potential for separation of interface from implementation between multiple files.
  - For example, see potential refactorings.
Disadvantages:
- Avoids the need to establish a hierarchy between files in a library, at the cost of reducing build parallelism options further.
- While both API and implementation are in the same file, it can be difficult to visually identify the API when it’s mixed with a lengthy implementation.
As with collapse API and implementation file concepts, we consider the split to be important for large codebases. The additional advantages of a single-file restriction do not outweigh the disadvantages surrounding build performance.
Collapse the library concept into packages
We could only have packages, with no libraries. Some other languages do this; for example, in Node.JS, a package is often similar in size to what we currently call a library.
If packages became larger, that would lead to compile-time bottlenecks. Thus, if Carbon allowed large packages without library separation, we would undermine our goals for fast compilation. Even if we combined the concepts, we should expect that to happen by turning the “package with many small libraries” concept into “many small packages”.
Advantages:
- Simplification of organizational hierarchy.
- Less complexity for users to think about on imports.
Disadvantages:
- Coming up with short, unique package names may become an issue, leading to longer package names that overlap with the intent of libraries.
  - These longer package names would need to be used to refer to contained entities in code, affecting brevity of Carbon code. The alternative would be to expect users to always rename packages on import; some organizations anecdotally see equivalent happen for C++ once names get longer than six characters.
  - For example, Boost could use per-repository packages like `BoostGeometry` and child libraries like `algorithms-distance` under the proposed approach. Under the alternative approach, it would use either a monolithic package that could create compile-time bottlenecks, or packages like `BoostGeometryAlgorithmsDistance` for uniqueness.
- While a package manager will need a way to specify cross-package version compatibility, encouraging a high number of packages puts more weight and maintenance cost on the configuration.
  - We expect libraries to be versioned at the package-level.
We prefer to keep the library separation to enable better hierarchy for large codebases, plus encouraging small units of compilation. It’s still possible for people to create small Carbon packages without breaking them into multiple libraries.
Collapse the package concept into libraries
In contrast to collapsing the library concept into packages, we could have libraries without packages. Under this model, we still have libraries of similar granularity as what’s proposed. However, there is no package grouping to them: there are only libraries which happen to share a namespace.
References to imports from other top-level namespaces would need to be prefixed with a ‘`.`’ in order to make it clear which symbols were from imports.
For example, suppose `Boost` is a large system that cannot be distributed to users in a single package. As a result, `Random` functionality is in its own distribution package, with multiple libraries contained. The difference between approaches looks like:
- `package` vs `library`:
  - Trivial:
    - Proposal: `package BoostRandom;`
    - Alternative: `library "Boost/Random" namespace Boost;`
  - Multi-layer library:
    - Proposal: `package BoostRandom library "Uniform";`
    - Alternative: `library "Boost/Random.Uniform" namespace Boost;`
  - Specifying namespaces:
    - Proposal: `package BoostRandom namespace Distributions;`
    - Alternative: `library "Boost/Random.Uniform" namespace Boost.Random.Distributions;`
  - Combined:
    - Proposal: `package BoostRandom library "Uniform" namespace Distributions;`
    - Alternative: `library "Boost/Random.Uniform" namespace Boost.Random.Distributions;`
- `import` changes:
  - Trivial:
    - Proposal: `import BoostRandom;`
    - Alternative: `import "Boost/Random";`
  - Multi-layer library:
    - Proposal: `import BoostRandom library "Uniform";`
    - Alternative: `import "Boost/Random.Uniform";`
  - Namespaces have no effect on `import` under both approaches.
- Changes to use an imported entity:
  - Proposal: `BoostRandom.UniformDistribution`
  - Alternative:
    - If the code is in the `Boost.Random` namespace: `Uniform`
    - If the code is in the `Boost` package but a different namespace: `Random.Uniform`
    - If the code is outside the `Boost` package: `.Boost.Random.Uniform`
We assume that the compiler will enforce that the root namespace must either match or be a prefix of the library name, followed by a `/` separator. For example, `Boost` in the namespace `Boost.Random.Uniform` must either match a `library "Boost"` or prefix as `library "Boost/..."`; `library "BoostRandom"` does not match because it’s missing the `/` separator.
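The enforcement rule described above is small enough to state as code. A minimal Python sketch, with a hypothetical function name, that checks whether a library name is acceptable for a given namespace under this assumed rule:

```python
def root_namespace_matches(library: str, namespace: str) -> bool:
    """Check the assumed rule: the namespace's root identifier must equal
    the library name, or be followed by '/' at the start of it."""
    root = namespace.split(".", 1)[0]
    return library == root or library.startswith(root + "/")


namespace = "Boost.Random.Uniform"
print(root_namespace_matches("Boost", namespace))
# prints: True
print(root_namespace_matches("Boost/Random.Uniform", namespace))
# prints: True
print(root_namespace_matches("BoostRandom", namespace))
# prints: False
```

Requiring the `/` after the root, rather than accepting any string prefix, is what rules out `library "BoostRandom"`.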
There are several approaches which might remove this duplication, but each has been declined due to flaws:
- We could have `library "Boost/Random.Uniform";` imply `namespace Boost`. However, we want name paths to use things listed as identifiers in files. We specifically do not want to use strings to generate identifiers in order to support understandability of code.
- We could alternately have `namespace Boost;` syntax imply `library "Boost" namespace Boost;`.
  - This approach only helps with single-library namespaces. While this would be common enough that a special syntax would help some developers, we are likely to encourage multiple libraries per namespace as part of best practices. We would then expect that the quantity of libraries in multi-library namespaces would dominate cost-benefit, leaving this to address only an edge-case of duplication issues.
  - This would create an ambiguity between the file-level `namespace` and other `namespace` keyword use. We could then rename the `namespace` argument for `library` to something like `file-namespace`.
  - It may be confusing as to what `namespace Boost.Random;` does. It may create `library "Boost/Random"` because `library "Boost.Random"` would not be legal, but the change in characters may in turn lead to developer confusion.
    - We could change the library specification to use `.` instead of `/` as a separator, but that may lead to broader confusion about the difference between libraries and namespaces.
Advantages:
- Avoids introducing the “package” concept to code and name organization.
- Retains the key property that library and namespace names have a prefix that is intended to be globally unique.
- Avoids coupling package management to namespace structure. For example, it would permit a library collection like Boost to be split into multiple repositories and multiple distribution packages, while retaining a single top-level namespace.
- The library and namespace are pushed to be more orthogonal concepts than packages and namespaces.
- Although some commonality must still be compiler-enforced.
- For the common case where packages have multiple libraries, removing the need to specify both a package and library collapses two keywords into one for both `import` and `package`.
- It makes it easier to draw on C++ intuitions, because all the concepts have strong counterparts in C++.
- The prefix `.` on imported name paths can help increase readability by making it clear they’re from imports, so long as those imports aren’t from the current top-level namespace.
- Making the `.` optional for imports from the current top-level namespace eliminates the boilerplate character when calling within the same library.
Disadvantages:
- The use of a leading `.` to mark absolute paths may conflict with other important uses, such as designated initializers and named parameters.
- Declines an opportunity to align code and name organization with package distribution.
  - Alignment means that if a developer sees `package BoostRandom library "Uniform";`, they know installing a package `BoostRandom` will give them the library. Declining this means that users seeing `library "Boost/Random.Uniform"` will still need to research which package contains `Boost/Random.Uniform` to figure out how to install it, because that package may not be named `Boost`.
  - Package distribution is a project goal, and cannot be avoided indefinitely.
  - This also means multiple packages may contribute to the same top-level namespace, which would prevent things like tab-completion in IDEs from producing cache optimizations based on the knowledge that modified packages cannot add to a given top-level namespace. For example, the ability to load less may improve performance:
    - As proposed, a package `BoostRandom` only adds to a namespace of the same name. If a user is editing libraries in a package `BoostCustom`, then `BoostRandom` may be treated as unmodifiable. An IDE could optimize cache invalidation of `BoostRandom` at the package level. As a result, if a user types `BoostRandom.` and requests a tab completion, the system need only ensure that libraries from the `BoostRandom` package are loaded for an accurate result.
    - Under this alternative, a library `Boost.Random` similarly adds to the namespace `Boost`. However, if a user is editing libraries, the IDE needs to support them adding to both `Boost` and `MyProject` simultaneously. As a result, if a user types `Boost.` and requests a tab completion, the system must have all libraries from all packages loaded for an accurate result.
    - Although many features can be restricted to current imports, some features, such as auto-imports, examine possible imports. Large codebases may have a memory-constrained quantity of possible imports.
- The string prefix enforcement between `library` and `namespace` forces duplication between both, which would otherwise be handled by `package`.
. - For the common case of packages with a matching namespace name, increases verbosity by requiring the
namespace
keyword. - The prefix
.
on imported name paths will be repeated frequently through code, increasing overall verbosity, versus the package approach which only affects import verbosity. - Making the
.
optional for imports from the current top-level namespace hides whether an API comes from the current library or an import.
We are declining this approach because we desire package separation, and because of concerns that this will lead to an overall increase in verbosity due to the preference for few child namespaces, whereas this alternative benefits when `namespace` is specified more often.
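The tab-completion concern in the disadvantages above can be illustrated with a small Python sketch of what an IDE would need to load. The index contents, the function `libraries_to_load`, and the flag `package_owns_namespace` are hypothetical names used only for illustration; this is not a description of any real tool.

```python
# Hypothetical index of available code: package name -> libraries it provides.
INSTALLED = {
    "BoostRandom": ["Uniform", "Normal"],
    "BoostCustom": ["Widgets"],
    "MyProject": ["App"],
}


def libraries_to_load(typed_prefix, packages, package_owns_namespace):
    """Return the (package, library) pairs needed to complete names after `typed_prefix.`."""
    if package_owns_namespace:
        # As proposed: only the package named by the prefix can add to that namespace.
        return [(typed_prefix, lib) for lib in packages.get(typed_prefix, [])]
    # Under the alternative: any library may add to the namespace, so all must be loaded.
    return [(pkg, lib) for pkg, libs in packages.items() for lib in libs]


as_proposed = libraries_to_load("BoostRandom", INSTALLED, package_owns_namespace=True)
alternative = libraries_to_load("Boost", INSTALLED, package_owns_namespace=False)
```

In this toy index, the proposed design loads two libraries for a completion request while the alternative loads all four, which is the scaling difference the argument relies on.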
Different file type labels
We’re using `api` and `impl` for file types, and have `test` as an open question.
We’ve considered using `interface` instead of `api`, but that introduces a terminology collision with interfaces in the type system.
We’ve considered dropping `api` from naming, but that creates a definition from absence of a keyword. It would also be unusual for `api` to be omitted if both `impl` and `test` are required. We prefer the more explicit name.
We could spell out `impl` as `implementation`, but are choosing the abbreviation for ease of typing. We also don’t think it’s an unclear abbreviation.
We expect `impl` to be used for implementations of `interface`. This isn’t quite as bad as if we used `interface` instead of `api` because of the `api` export syntax on entities, such as `api fn DoSomething()`, which could create ambiguities as `interface fn DoSomething()`. It may still confuse people to see an `interface impl` in an `api` file. However, we’re touching on related concepts and don’t see a great alternative.
Function-like syntax
We could consider more function-like syntax for `import`, and possibly also `package`.
For example, instead of:
import Math library "Stats";
import Algebra as A;
We could do:
import("Math", "Stats").Math;
alias A = import("Algebra").Algebra;
Or some related variation.
Advantages:
- Allows straightforward reuse of `alias` for language consistency.
- Easier to add more optional arguments, which we expect to need for interoperability and URLs.
- Avoids defining keywords for optional fields, such as `library`.
  - Interoperability and package management may add more fields long-term.
Disadvantages:
- It’s unusual for a function-like syntax to produce identifiers for name lookup.
- This could be addressed by requiring alias, but that becomes verbose.
- There’s a desire to explicitly note the identifier being imported some way, as with `.Math` and `.Algebra` above. However, this complicates the resulting syntax.
The preference is for keywords.
Inlining from implementation files
An implicit reason for keeping code in an `api` file is that it makes it straightforward to inline code from there into callers.
We could explicitly encourage inlining from `impl` files as well, making the location of code unimportant during compilation. Alternately, we could add an `inline` file type which explicitly supports separation of inline code from the `api` file.
Advantages:
- Allows moving code out of the main API file for easier reading.
Disadvantages:
- Requires compilation of `impl` files to determine what can be inlined from the `api` file, leading to the transitive closure dependency problems which `impl` files are intended to avoid.
We expect to only support inlining from `api` files in order to avoid confusion about dependency problems.
Library-private access controls
We currently have no special syntax for library-private APIs. However, non-exported APIs are essentially library-private, and may be in the `api` file. It’s been suggested that we could either provide a special syntax or a new file type, such as `shared_impl`, to support library-private APIs.
Advantages:
- Allows for better separation of library-private APIs.
Disadvantages:
- Increases language complexity.
- Dependencies are still an issue for library-private APIs.
  - If used from the `api` file, the dependencies are still in the transitive closure of client libraries, and any separation may confuse users about the downsides of the extra dependencies.
  - If only used from `impl` files, then they could be in the `impl` file if there’s only one, or shared from a separate library.
- Generalized access controls may provide overlapping functionality.
At this point in time, we prefer not to provide specialized access controls for library-private APIs.
Managing API versus implementation in libraries
At present, we plan to have `api` versus `impl` as a file type, and also `.carbon` versus `.impl.carbon` as the file extension. We chose to use both together, rather than one or the other, because we expect some parties to strongly want file content to be sufficient for compilation, while others will want file extensions to be meaningful for the syntax split.
Instead of the file type split, we could drift further and instead have APIs in any file in a library, using the same kind of API markup.
Advantages:
- May help users who have issues with cyclical code references.
- Improves compiler inlining of implementations, because the compiler can decide how much to actually put in the generated API.
Disadvantages:
- While allowing users to spread a library across multiple files can be considered an advantage, we see the single API file as a way to pressure users towards smaller libraries, which we prefer.
- May be slower to compile because each file must be parsed once to determine APIs.
- For users that want to see only APIs in a file, they would need to use tooling to generate the API file.
- Auto-generated documentation may help solve this problem.
Multiple API files
The proposal also presently suggests a single API file. Under an explicit API file approach, we could still allow multiple API files.
Advantages:
- More flexibility when writing APIs; could otherwise end up with one gigantic API file.
Disadvantages:
- Encourages larger libraries by making it easier to provide large APIs.
- Removes some of the advantages of having an API file as a “single place” to look, suggesting more towards the markup approach.
- Not clear if API files should be allowed to depend on each other, as they were intended to help resolve cyclical dependency issues.
We particularly want to discourage large libraries, and so we’re likely to retain the single API file limit.
Name paths as library names
We’re proposing strings for library names. We’ve discussed also using name paths (`My.Library`) and also restricting to single identifiers (`Library`).
Advantages:
- Shares the form between packages (identifiers) and namespaces (name paths).
- Enforces a constrained set of names for libraries for cross-package consistency of naming.
Disadvantages:
- Indicates that a library may be referred to in code, when only the package and namespace are used for name paths of entities.
- The constrained set of names may also get in the way for some packages that can make use of more flexibility in naming.
We’ve decided to use strings primarily because we want to draw the distinction that a library is not something that’s used when referring to an entity in code.
Imports
Block imports
Rather than requiring an `import` keyword per line, we could support block imports, as can be found in languages like Go.
In other words, instead of:
import Math;
import Geometry;
We could have:
imports {
  Math,
  Geometry,
}
Advantages:
- Allows repeated imports with less typing.
Disadvantages:
- Makes it harder to find files importing a package or library using tools like `grep`.
One concern has been that a mix of `import` and `imports` syntax would be confusing to users: we should only allow one.
This alternative has been declined because retyping `import` statements is low-cost, and `grep` is useful.
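The searchability point can be made concrete with a short Python sketch that mimics a line-based search such as `grep 'import Math'`. The file contents and the helper `grep_imports` are hypothetical, and the block form follows the alternative syntax sketched above rather than any accepted Carbon syntax.

```python
import re

# Hypothetical file contents using one `import` per line.
PER_LINE = """\
import Math;
import Geometry;
"""

# Hypothetical file contents using the block import alternative.
BLOCK = """\
imports {
  Math,
  Geometry,
}
"""


def grep_imports(source, package):
    """Return the lines that a line-based search for `import <package>` would match."""
    pattern = re.compile(r"^\s*import\s+" + re.escape(package) + r"\b")
    return [line for line in source.splitlines() if pattern.search(line)]


per_line_hits = grep_imports(PER_LINE, "Math")
block_hits = grep_imports(BLOCK, "Math")
```

The per-line form yields a match on a single self-describing line, while the block form yields none: finding it would require a search that understands the surrounding `imports { ... }` context.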
Block imports of libraries of a single package
We could allow block imports of libraries from the same package. For example:
import Containers libraries({
  "FlatHashMap",
  "FlatHashSet",
})
The result of this `api alias` allowing `Containers.HashSet()` to work regardless of whether `HashSet` is in `"HashContainers"` or `"Internal"` may be clearer if both `import Containers` statements were a combined `import Containers libraries({"HashContainers", "Internal"});`.
The advantages/disadvantages are similar to block imports. Additional advantages/disadvantages are:
Advantages:
- If we limit to one import per library, then any `alias` of the package `Containers` is easier to understand as affecting all libraries.
Disadvantages:
- If we allow both `library` and `libraries` syntax, it’s two ways of doing the same thing.
  - Can be addressed by always requiring `libraries`, removing `library`, but that diverges from `package`’s `library` syntax.
This alternative has been declined for similar reasons to block imports; the additional advantages/disadvantages don’t substantially shift the cost-benefit argument.
Broader imports, either all names or arbitrary code
Carbon imports require specifying individual names to import. We could support broader imports, for example by pulling in all names from a library. In C++, the `#include` preprocessor directive even supports inclusion of arbitrary code. For example:
import Geometry library "Shapes" names *;
// Triangle was imported as part of "*".
fn Draw(var Triangle: x) { ... }
Advantages:
- Reduces boilerplate code specifying individual names.
Disadvantages:
- Loses out on parser benefits of knowing which identifiers are being imported.
- Increases the risk of adding new features to APIs, as they may immediately get imported by a user and conflict with a preexisting name, breaking code.
- As the number of imports increases, it can become difficult to tell which import a particular symbol comes from, or how imports are being used.
- Arbitrary code inclusion can result in unexpected code execution, a way to create obfuscated code and a potential security risk.
We particularly value the parser benefits of knowing which identifiers are being imported, and so we require individual names for imports.
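The API-evolution risk listed above can be sketched in a few lines of Python. The name sets and the helper `wildcard_conflicts` are hypothetical; the sketch only models name sets and does not reflect how a Carbon compiler performs name lookup.

```python
def wildcard_conflicts(local_names, exported_names):
    """Return names that a wildcard import would bring in on top of existing local declarations."""
    return local_names & exported_names


# Hypothetical caller that declares its own `Area` and wildcard-imports "Shapes".
local = {"Draw", "Area"}
shapes_v1 = {"Triangle", "Circle"}
shapes_v2 = shapes_v1 | {"Area"}  # the library later adds a new API

before = wildcard_conflicts(local, shapes_v1)
after = wildcard_conflicts(local, shapes_v2)
```

With the first version of the library there is no collision. Once the library adds `Area`, the caller's code conflicts without having changed, which is the breakage that requiring individual names avoids.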
Direct name imports
We could allow direct imports of names from libraries. For example, under the current setup we might see:
import Math library "Stats";
alias Median = Stats.Median;
alias Mean = Stats.Mean;
We could simplify this syntax by augmenting `import`:
import Math library "Stats" name Median;
import Math library "Stats" name Mean;
Or more succinctly with block imports of names:
import Math library "Stats" names {
  Median,
  Mean,
}
Advantages:
- Avoids an additional `alias` step.
Disadvantages:
- With a single name, this isn’t a significant improvement in syntax.
- With multiple names, this runs into similar issues as block imports.
Optional package names
We could allow a short syntax for imports from the current package. For example, this code imports `Geometry.Shapes`:
package Geometry library "Operations" api;
import library "Shapes";
Advantages:
- Reduces typing.
Disadvantages:
- Makes it harder to find files importing a package or library using tools like `grep`.
- Creates two syntaxes for importing libraries from the current package.
  - If we instead disallow `import Geometry library "Shapes"` from within `Geometry`, then we end up with a different inconsistency.
Overall, consistent with the decision to disallow block imports, we are choosing to require the package name.
Namespaces
File-level namespaces
We are providing entity-level namespaces. This is likely necessary to support migrating C++ code, at a minimum. It’s been discussed whether we should also support file-level namespaces.
For example, this is the current syntax for defining `Geometry.Shapes.Circle`:
package Geometry library "Shapes" api;
namespace Shapes;
struct Shapes.Circle;
This is the proposed alternative syntax for defining `Geometry.Shapes.Circle`, and would put all entities in the file under the `Shapes` namespace:
package Geometry library "Shapes" namespace Shapes api;
struct Circle;
Advantages:
- Reduces repetitive syntax in the file when every entity should be in the same, child namespace.
- Large libraries and packages are more likely to be self-referential, and may pay a disproportionate ergonomics tax that others wouldn’t see.
- Although library authors could also avoid this repetitive syntax by omitting the namespace, that may in turn lead to more name collisions for large packages.
- Note that syntax can already be reduced with a shorter namespace alias, but the redundancy cannot be eliminated.
- Reduces the temptation of aliasing in order to reduce verbosity, wherein it’s generally agreed that aliasing creates inconsistent names which hinder readability.
- Users are known to alias long names, where “long” may be considered anything over six characters.
- This is a risk for any package that uses namespaces, as importers may also need to address it.
Disadvantages:
- Encourages longer namespace names, as they won’t need to be retyped.
- Increases complexity of the `package` keyword.
- Creates two ways of defining namespaces, and reuses the `namespace` keyword in multiple different ways.
  - We generally prefer to provide one canonical way of doing things.
- Does not add functionality which cannot be achieved with entity-level namespaces. However, the converse is not true: entity-level control allows a single file to put entities into multiple namespaces.
- Creates a divergence between code as written by the library maintainer and code as called.
- Calling code would need to specify the namespace, even if aliased to a shorter name. Library code gets to omit this, essentially getting a free alias.
We are choosing not to provide this for now because we want to provide the minimum necessary support, and then see if it works out. It may be added later, but it’s easier to add features than to remove them.
Scoped namespaces
Instead of including additional namespace information per-name, we could have scoped namespaces, similar to C++. For example:
namespace absl {
  namespace numbers_internal {
    fn SafeStrto32Base(...) { ... }
  }
  fn SimpleAtoi(...) {
    ...
    return numbers_internal.SafeStrto32Base(...);
    ...
  }
}
Advantages:
- Makes it easy to write many things in the same namespace.
Disadvantages:
- It’s not clear which namespace an identifier is in without scanning to the start of the file.
- It can be hard to find the end of a namespace. To address this, both the Google and Boost style guides call for end-of-namespace comments.
- Carbon may disallow the same-line-as-code comment style used for this. Even if not, if we acknowledge it’s a problem, we should address it structurally for readability.
- This is less of a problem for other scopes, such as functions, because they can often be broken apart until they fit on a single screen.
There are other ways to address the con, such as adding syntax to indicate the end of a namespace, similar to block comments. For example:
{ namespace absl
  { namespace numbers_internal
    fn SafeStrto32Base(...) { ... }
  } namespace numbers_internal
  fn SimpleAtoi(...) {
    ...
    return numbers_internal.SafeStrto32Base(...);
    ...
  }
} namespace absl
While we could consider such alternative approaches, we believe the proposed contextless namespace approach is better, as it reduces information that developers will need to remember when reading/writing code.
Rationale
This proposal provides an organizational structure that seems both workable and aligns well with Carbon’s goals:
- Distinct and required top-level namespace – “package”s from the proposal – both matches software best practices for long-term evolution, and avoids complex and user-confusing corner cases.
- Providing a fine-grained import structure as provided by the “library” concept supports scalable build system implementations while ensuring explicit dependencies.
- The structured namespace facilities provide a clear mechanism to migrate existing hierarchical naming structures in C++ code.
Open questions
Should we switch to a library-oriented structure that’s package-agnostic?
- Decision: No.
- Rationale: While this would simplify the overall set of constructs needed, removing the concept of a global namespace remained desirable and would require re-introducing much of the complexity around top-level namespaces. Overall, the simplification trade-off didn’t seem significantly better.
Should there be a tight association between file paths and packages/libraries?
- Decision: Yes, for the API files in libraries. Specifically, the library name should still be written in the source, but it should be checked to match – after some platform-specific translation – against the path.
- Note: Sufficient restrictions to result in a portable and simple translation on different filesystems should be imposed, but the Core team was happy for these restrictions to be developed as part of implementation work.
- Rationale: This will improve usability and readability for users by making it obvious how to find the files that are being imported. Similarly, this will improve tooling by increasing the ease with which tools can find imported APIs.
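The path check described in this decision could look roughly like the following Python sketch. The translation rule used here (path components relative to a package root, joined with `/`, with the `.carbon` extension removed) is purely an assumption for illustration, since the actual restrictions were left to implementation work; the helper names are hypothetical as well.

```python
from pathlib import PurePosixPath, PureWindowsPath

API_SUFFIX = ".carbon"        # API file extension named in this proposal
IMPL_SUFFIX = ".impl.carbon"  # implementation file extension named in this proposal


def expected_library_name(api_path):
    """Translate an API file path, relative to a hypothetical package root, into a library name.

    Assumed rule (not specified by the proposal): path components joined
    with "/", with the API file extension removed.
    """
    if api_path.name.endswith(IMPL_SUFFIX) or api_path.suffix != API_SUFFIX:
        raise ValueError(f"not an API file: {api_path}")
    return "/".join(api_path.with_suffix("").parts)


def library_matches_path(declared_library, api_path):
    """Check the library name written in the source against the name derived from the path."""
    return declared_library == expected_library_name(api_path)


posix_ok = library_matches_path("Shapes", PurePosixPath("Shapes.carbon"))
windows_ok = library_matches_path(
    "Containers/FlatHashMap", PureWindowsPath(r"Containers\FlatHashMap.carbon")
)
mismatch = library_matches_path("Shapes", PurePosixPath("Circle.carbon"))
```

Using pure path classes keeps the platform-specific part (the separator) out of the comparison, so the same declared library name can be checked against either path style.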