The Linking phase of the Compilation Pipeline takes place after all nodes have been parsed and combines them to form a single Environment, where all nodes are uniquely identified and all References are traceable to their target Nodes.
Up to the Linking phase, the output of the Compilation Pipeline might consist of many independent ASTs with Package nodes as roots, either obtained by parsing source files or generated manually. As in any tree, the Nodes in these “fragments” of ASTs are only connected to their direct children, which makes them hard to navigate and limits their behavior. They might also contain duplicated or redundant elements that need to be consolidated, or reused Node instances that have to be split in order to work in their different contexts. Some Nodes can also be hard to distinguish from each other, making them nearly impossible to trace.
The Linker’s main purpose is to join these independent AST branches into a single, intertwined structure that is consistent and easy to navigate.
The complete linking process can be summarized in the following general steps:
All the isolated Packages need to be combined into a single, unified AST. To do this, the Linker synthesizes an Environment node to act as the unifying root. All the received Package structures are recursively combined and set as children of the Environment. Any two packages that would end up having the same Fully Qualified Name are combined into a single one containing the union of both.
If two merged packages contain Entities with the same name, the former is replaced by the latter. This allows Packages to be extended from different source files and any unwanted definitions to be substituted at linking time.
:bulb: The Linker can, optionally, receive one “base” Environment as a parameter. All the contents of this Environment are included at the very start of the merging process, allowing them to be easily replaced by the given Packages. This is useful when you need to extend or change an already linked AST.
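The merge rules above can be sketched as follows. This is a simplified illustration, not the Linker's actual code: the `Entity`, `Package` and `Environment` shapes and the `mergePackage` / `mergeIntoEnvironment` functions are hypothetical. Packages are merged by name at every level (so packages with the same fully qualified name collapse into one), later members replace earlier ones with the same name, and the base Environment's packages are merged first so the new packages can override them.

```typescript
// Hypothetical, simplified node shapes for illustration only.
interface Entity {
  name: string
  kind: 'class' | 'object'
}

interface Package {
  name: string
  members: Entity[]
  packages: Package[]
}

interface Environment {
  packages: Package[]
}

// Merges `next` into `base`. Members with the same name are replaced by the
// ones from `next` (the later definition wins) and sub-packages with the same
// name are merged recursively, so packages sharing a fully qualified name end
// up as a single package.
function mergePackage(base: Package, next: Package): Package {
  const members = new Map<string, Entity>()
  for (const member of base.members) members.set(member.name, member)
  for (const member of next.members) members.set(member.name, member)

  const packages = new Map<string, Package>()
  for (const pkg of base.packages) packages.set(pkg.name, pkg)
  for (const pkg of next.packages) {
    const existing = packages.get(pkg.name)
    packages.set(pkg.name, existing ? mergePackage(existing, pkg) : pkg)
  }

  return {
    name: base.name,
    members: Array.from(members.values()),
    packages: Array.from(packages.values()),
  }
}

// The base environment's packages are merged first, so the new packages can
// replace any of their definitions.
function mergeIntoEnvironment(newPackages: Package[], base?: Environment): Environment {
  const roots = new Map<string, Package>()
  const all = (base ? base.packages : []).concat(newPackages)
  for (const pkg of all) {
    const existing = roots.get(pkg.name)
    roots.set(pkg.name, existing ? mergePackage(existing, pkg) : pkg)
  }
  return { packages: Array.from(roots.values()) }
}
```

For example, merging a base Environment whose `example` package defines a class `Bird` with a new `example` package that defines an object `Bird` and a class `Fish` yields a single `example` package containing the object `Bird` and the class `Fish`.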
The linking process assigns each node a UUID (v4) in its id attribute. These ids are meant to be unique identifiers for the nodes and can’t be shared, not even by nodes in different Environments.
If a previous Environment is provided to the Linker to be re-linked, a deep copy of it with new ids is made and used during the process.
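A minimal sketch of id assignment, assuming a hypothetical `AstNode` shape with a `children` array. The `uuidV4` helper is a hypothetical stand-in that only produces correctly formatted version 4 UUIDs from `Math.random`; it is not cryptographically strong, and a real implementation would rather use a proper source such as `crypto.randomUUID()`. `copyWithNewIds` illustrates the deep copy made when re-linking: every copied node gets a fresh id, so no id is ever shared with the original tree.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface AstNode {
  kind: string
  id?: string
  children: AstNode[]
}

// Formats a random version 4 UUID. Math.random is NOT cryptographically
// strong; this only illustrates the format (the "4" marks the version and
// the y digit is one of 8, 9, a, b to mark the variant).
function uuidV4(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => {
    const r = Math.floor(Math.random() * 16)
    const v = c === 'x' ? r : (r & 0x3) | 0x8
    return v.toString(16)
  })
}

// Assigns a fresh id to every node in the tree, recursively.
function assignIds(node: AstNode): void {
  node.id = uuidV4()
  node.children.forEach(assignIds)
}

// Deep copy in which every node gets a new id, so the copy never shares
// ids with the original nodes.
function copyWithNewIds(node: AstNode): AstNode {
  return {
    kind: node.kind,
    id: uuidV4(),
    children: node.children.map(copyWithNewIds),
  }
}
```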
⚠️ Once a Node is assigned an id, its methods may be cached, so copying a linked node without changing its id is ill-advised.
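The hazard can be illustrated with a toy cache. This sketch assumes, hypothetically, that a node method is memoized in a map keyed by node id; the `describe` function and the short ids `'a1'` and `'a2'` are made up for the example. A copy that keeps the original id silently receives the original's cached result.

```typescript
// Hypothetical node shape and cache, for illustration only.
interface AstNode {
  id: string
  name: string
}

// A method memoized by node id, standing in for any cached node query.
const cache = new Map<string, string>()

function describe(node: AstNode): string {
  const cached = cache.get(node.id)
  if (cached !== undefined) return cached
  const result = `node ${node.name}`
  cache.set(node.id, result)
  return result
}

const original: AstNode = { id: 'a1', name: 'Bird' }
describe(original) // caches "node Bird" under id "a1"

// Copying and modifying without a new id: the copy hits the original's entry.
const copy: AstNode = { ...original, name: 'Fish' }
describe(copy) // still "node Bird": a stale result

// With a fresh id (as re-linking would assign), the copy is computed anew.
const relinked: AstNode = { ...copy, id: 'a2' }
describe(relinked) // "node Fish"
```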
To speed up lookups and facilitate navigation, each Node is provided with a soft reference to its parent and its environment. The Node is also stored in an id-indexed cache on the Environment for fast id-based queries.
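A sketch of this step, under assumed, hypothetical shapes: `AstNode` with optional `parent` and `environment` fields, and an `EnvironmentNode` holding a `nodeCache` map. A single traversal sets both back-references and registers every node by id, so `getNodeById` becomes a map lookup instead of a tree search. Here the back-references are plain fields; a real implementation may store them so they are not traversed or serialized as children.

```typescript
// Hypothetical, simplified node shapes for illustration only.
interface AstNode {
  id: string
  children: AstNode[]
  parent?: AstNode
  environment?: EnvironmentNode
}

interface EnvironmentNode extends AstNode {
  nodeCache: Map<string, AstNode>
}

// Walks the tree once, pointing every node at its parent and environment
// and registering it in the environment's id-indexed cache.
function linkNodes(environment: EnvironmentNode): void {
  const visit = (node: AstNode, parent?: AstNode): void => {
    node.parent = parent
    node.environment = environment
    environment.nodeCache.set(node.id, node)
    node.children.forEach(child => visit(child, node))
  }
  visit(environment)
}

// Id-based query answered from the cache, without traversing the tree.
function getNodeById(environment: EnvironmentNode, id: string): AstNode | undefined {
  return environment.nodeCache.get(id)
}
```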
The Scope of a node is a record of all the referenceable names available to that node, associated with the Nodes those names target. This information is used primarily to identify the targets of References and to make navigating the Environment smoother, but it can also be used as metadata to understand a Node’s Lexical Scope.
Constructing the scopes is, by far, the most complex and time-consuming process of the linking phase, and should be approached with caution. Here are some key aspects to understand it:
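The core idea of a scope, a name-to-node record that falls back to an enclosing scope, can be sketched as below. This is a deliberately simplified assumption, not the Linker's actual algorithm: the `AstNode` shape, the `Scope` class and `buildScopes` are hypothetical, and real scoping must also account for things this sketch ignores, such as imports and inherited members. Each node's scope holds the names of its own children and falls back to its parent's scope, which holds the node's siblings, and so on up to the root.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface AstNode {
  id: string
  name?: string
  children: AstNode[]
}

// A scope maps names to the nodes they target and falls back to the
// enclosing (outer) scope when a name is not found locally.
class Scope {
  private readonly entries = new Map<string, AstNode>()
  private readonly outer: Scope | undefined

  constructor(outer?: Scope) {
    this.outer = outer
  }

  register(name: string, node: AstNode): void {
    this.entries.set(name, node)
  }

  resolve(name: string): AstNode | undefined {
    const local = this.entries.get(name)
    if (local !== undefined) return local
    return this.outer ? this.outer.resolve(name) : undefined
  }
}

// Builds one scope per node, keyed by node id. Inner names shadow outer ones
// because lookup checks the local entries before the outer scope.
function buildScopes(root: AstNode): Map<string, Scope> {
  const scopes = new Map<string, Scope>()
  const visit = (node: AstNode, outer?: Scope): void => {
    const scope = new Scope(outer)
    for (const child of node.children) {
      if (child.name !== undefined) scope.register(child.name, child)
    }
    scopes.set(node.id, scope)
    for (const child of node.children) visit(child, scope)
  }
  visit(root)
  return scopes
}
```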
After Scopes are assigned, every Reference node in the Environment should be able to unmistakably identify the Node it targets. However, some References might be unable to do so. There are many reasons why this could happen, from a typo in the Reference’s name to a missing import statement. Whatever the cause, the Reference is considered broken and a new entry is registered in its problems attribute.
⚠️ Notice that a Broken Reference does not stop the Compilation Pipeline. This is intentional, to allow the creation of partially broken Environments that can be debugged and even executed. To ensure that the resulting Environment is error-free, it needs to be validated.
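A sketch of how broken references could be recorded without stopping the pipeline, assuming hypothetical `Target` and `Reference` shapes and modelling the scope as a plain name-to-node map. The problem message text is made up for the example. Resolution never throws; a separate, hypothetical `brokenReferences` check plays the role of the later validation step.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Target {
  id: string
  name: string
}

interface Reference {
  name: string        // the name as written in the source
  target?: Target     // set when the name can be resolved
  problems: string[]  // broken-reference entries are registered here
}

// Resolves each reference against a scope. A reference that cannot be
// resolved is not fatal: a problem is recorded on it and linking continues.
function resolveReferences(references: Reference[], scope: Map<string, Target>): void {
  for (const reference of references) {
    const target = scope.get(reference.name)
    if (target !== undefined) reference.target = target
    else reference.problems.push(`Cannot resolve reference to "${reference.name}"`)
  }
}

// A separate validation step can then decide whether the result is error-free.
function brokenReferences(references: Reference[]): Reference[] {
  return references.filter(reference => reference.problems.length > 0)
}
```

For instance, with a scope containing only `Bird`, a reference named `Brid` (a typo) ends up with no target and one problem entry, while linking still completes for every other reference.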
After the linking phase is finished, the Linker outputs an Environment tree, which is considered to be in the Linked stage of the pipeline. From this point on, the AST should be considered frozen, and no Nodes should be copied or modified in any way. If a change to a Linked Node is needed, that node (and thus the whole Environment) should be re-linked immediately to avoid wrong references and cache issues.
⚠️ A Linked Environment heavily depends on its internal cache! Do not copy or modify any node from a Linked Environment without re-linking the result.