The main goal of the Compilation Pipeline is to turn a text-based piece of Wollok code into more computer-friendly data representations, so it can be queried, manipulated and even executed with ease.
By far the most important of these structures is the Abstract Syntax Tree (or AST, for short). This immutable assortment of interconnected Nodes contains all the information distilled from the Static Model described in the source code. Each Node on the tree represents a core concept of Wollok’s syntax so any program can be represented with some combination of them.
Every Node has a unique id, a kind label identifying its type and a sourceMap attribute that links the Node to its original position in the source code. Some Nodes might also contain a list of problems that arose during compilation and indicate that some parts of the Node might be invalid or broken.
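As a rough illustration, the common attributes could be modelled like this. The attribute names id, kind, sourceMap and problems come from the text; the concrete types, the problem code and the helper function are assumptions made only for this sketch.

```typescript
// Hypothetical shapes: only the attribute names are taken from the text.
interface SourcePosition { line: number; column: number }
interface SourceMap { start: SourcePosition; end: SourcePosition }
interface Problem { code: string; level: 'warning' | 'error' }

interface BaseNode {
  readonly id: string
  readonly kind: string
  readonly sourceMap?: SourceMap
  readonly problems?: readonly Problem[]
}

// True when compilation attached at least one problem to the node.
function hasProblems(node: BaseNode): boolean {
  return (node.problems ?? []).length > 0
}

const healthy: BaseNode = {
  id: 'n1',
  kind: 'Class',
  sourceMap: { start: { line: 1, column: 1 }, end: { line: 3, column: 2 } },
}

const broken: BaseNode = {
  id: 'n2',
  kind: 'Method',
  problems: [{ code: 'exampleProblem', level: 'error' }], // made-up problem code
}
```

Marking every attribute as readonly mirrors the immutability of the AST mentioned above.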
Even though all nodes represent a different syntactic concept, some of them can be grouped together based on their key characteristics. These groupings of node types are called Categories and are a nifty way to think about similar Nodes.
Next is a list of each Node type, grouped by their Categories.
The Entity Category includes the top-level definitions of Wollok. These Nodes represent any declaration that can exist at package or file level and have a Fully Qualified Name that can be used to uniquely identify them.
Package Nodes represent all forms of Wollok packages. This includes the ones explicitly created using the package keyword and the ones implicitly created by files and folder structure. They are the main containers of Entities.
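Since Packages contain Entities and every Entity has a Fully Qualified Name, the name can be thought of as the path of containers leading to the Entity. The following is a minimal sketch of that idea. The parent link and the function name are assumptions for illustration; the dotted format follows names such as wollok.lang.List.

```typescript
// Hypothetical entity shape: a name plus an optional containing entity.
interface NamedEntity {
  name: string
  parent?: NamedEntity
}

// Builds the dotted name by walking up through the containers.
function fullyQualifiedName(entity: NamedEntity): string {
  return entity.parent
    ? `${fullyQualifiedName(entity.parent)}.${entity.name}`
    : entity.name
}

const wollokPackage: NamedEntity = { name: 'wollok' }
const langPackage: NamedEntity = { name: 'lang', parent: wollokPackage }
const listClass: NamedEntity = { name: 'List', parent: langPackage }

const listName = fullyQualifiedName(listClass) // 'wollok.lang.List'
```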
Program Nodes represent all Wollok programs created with the program keyword.
Test Nodes represent all Wollok tests created with the test keyword. They can also be found as Describe children.
Variable Nodes represent both Wollok Constants (created using the const keyword) and Wollok Variables (created using the var keyword). These Nodes are both Entities and Sentences, but should not be confused with Fields, which are created using the same keywords but can only be defined in the context of a Module.
The Module category is a sub-category of Entity (meaning all Modules are Entities). These nodes act as Object definitions and Method Providers and can be linearized to take part in the Method Lookup process.
Class Nodes represent Wollok Classes defined with the class keyword.
Singleton Nodes represent Wollok Stand-Alone Objects. This includes named and anonymous objects explicitly created using the object keyword and some synthetic elements derived from other grammar constructions, but only named singletons are considered Entities.
Mixin Nodes represent Wollok Mixins defined with the mixin keyword.
Describe Nodes represent Wollok test evaluation contexts defined with the describe keyword. It might seem a bit odd to think of Describes as Modules since they don’t really represent object descriptions, but under the hood they have many of the same needs and behaviors because Describes are also method providers of a sort.
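To make the idea of Modules as linearizable method providers more concrete, here is a deliberately simplified sketch. The linearization order used here (the module first, then each supertype's linearization in declaration order, without repetitions) is an assumption chosen for brevity and is not claimed to be the exact rule Wollok applies; all type and function names are hypothetical.

```typescript
// Hypothetical module shape: just a name, the methods it provides and its supertypes.
interface ModuleSketch {
  name: string
  methods: string[]
  supertypes: ModuleSketch[]
}

// Simplified linearization (assumption): the module itself, followed by the
// linearization of each supertype in order, keeping only first occurrences.
function linearize(module: ModuleSketch): ModuleSketch[] {
  const result: ModuleSketch[] = [module]
  for (const supertype of module.supertypes) {
    for (const ancestor of linearize(supertype)) {
      if (result.indexOf(ancestor) === -1) result.push(ancestor)
    }
  }
  return result
}

// Method Lookup: the first module in the linearization that provides the method wins.
function lookupMethod(module: ModuleSketch, methodName: string): ModuleSketch | undefined {
  for (const candidate of linearize(module)) {
    if (candidate.methods.indexOf(methodName) !== -1) return candidate
  }
  return undefined
}

const objectClass: ModuleSketch = { name: 'Object', methods: ['toString'], supertypes: [] }
const flier: ModuleSketch = { name: 'Flier', methods: ['fly'], supertypes: [] }
const bird: ModuleSketch = { name: 'Bird', methods: ['eat'], supertypes: [flier, objectClass] }

const birdLinearization = linearize(bird).map(module => module.name) // ['Bird', 'Flier', 'Object']
```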
The Sentence Category includes all logic computations, which make up the bulk of any Wollok AST. Sentences are usually contained within the scope of a Body and constitute the building blocks for Methods, Tests and Programs.
Variable nodes are also Sentences; see the description of Variables above.
Return nodes represent Wollok return statements created with the return keyword.
Assignment nodes represent Wollok assignment statements created with the = operator or any of the Special Assignation Operators.
The Expression category is a sub-category of Sentence (meaning all Expressions are Sentences) and includes all Node types representing statements which are guaranteed to return a value (as opposed to regular Sentences, which might only produce an effect and return nothing).
Reference nodes represent any non-keyword identifier used to refer to some other term by its name. Every time you use a previously defined Variable, Field or Parameter you do it through a Reference.
References are also commonly used as a sort of “pointer” between Nodes that need to be connected but don’t directly contain each other. Manifesting these relations is one of the main goals of the Linker Stage.
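One way to picture how a Reference gets connected to its target is a lookup through nested scopes. This is only a sketch under assumed names (Scope, definitions, resolve); it does not claim to describe how the Linker Stage is actually implemented.

```typescript
// Hypothetical scope: maps visible names to the id of the node that defines them.
interface Scope {
  definitions: Record<string, string>
  parent?: Scope
}

interface ReferenceSketch {
  kind: 'Reference'
  name: string
}

// Walks from the innermost scope outwards and returns the id of the first match.
function resolve(reference: ReferenceSketch, scope: Scope): string | undefined {
  let current: Scope | undefined = scope
  while (current !== undefined) {
    if (Object.prototype.hasOwnProperty.call(current.definitions, reference.name)) {
      return current.definitions[reference.name]
    }
    current = current.parent
  }
  return undefined
}

const moduleScope: Scope = { definitions: { energy: 'field-1', amount: 'field-2' } }
const methodScope: Scope = { definitions: { amount: 'param-1' }, parent: moduleScope }

const energyTarget = resolve({ kind: 'Reference', name: 'energy' }, methodScope) // 'field-1'
const amountTarget = resolve({ kind: 'Reference', name: 'amount' }, methodScope) // 'param-1'
```

In this sketch an inner definition shadows an outer one with the same name, which is why amount resolves to the parameter rather than the field.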
Self nodes represent Wollok self-reference created with the self keyword.
Literal nodes represent any Wollok literal value. Their main purpose is to serve as a wrapper for all primitive values that we represent with abstractions from the host language: number, string, boolean and null. Literals can also contain some non-native constructions: a [Reference<Class>, List<Expression>] pair with a Reference to either wollok.lang.List or wollok.lang.Set and a list of Expression members.
Send nodes represent a message chain. Each of these nodes contains the Name of the sent message, along with the Expressions that make up the arguments and the receiver (which, of course, could also be a Send node, thus the term “message chain”).
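The nesting that gives the “message chain” its name can be sketched as follows. The shapes below are hypothetical and cover only primitive Literals, References and Sends; the attribute names receiver, message and args are assumptions.

```typescript
// Hypothetical, reduced expression model for illustration only.
type LiteralValue = number | string | boolean | null

interface LiteralSketch { kind: 'Literal'; value: LiteralValue }
interface ReferenceSketch { kind: 'Reference'; name: string }
interface SendSketch {
  kind: 'Send'
  receiver: ExpressionSketch
  message: string
  args: ExpressionSketch[]
}
type ExpressionSketch = LiteralSketch | ReferenceSketch | SendSketch

// Collects the message names of a chain, from the innermost receiver outwards.
function messageChain(expression: ExpressionSketch): string[] {
  if (expression.kind !== 'Send') return []
  return [...messageChain(expression.receiver), expression.message]
}

// Models the Wollok expression: pepita.energy().max(10)
const chain: SendSketch = {
  kind: 'Send',
  receiver: {
    kind: 'Send',
    receiver: { kind: 'Reference', name: 'pepita' },
    message: 'energy',
    args: [],
  },
  message: 'max',
  args: [{ kind: 'Literal', value: 10 }],
}

const messages = messageChain(chain) // ['energy', 'max']
```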
Super nodes represent a super-call statement created by the use of the super keyword.
New nodes represent a Class instantiation created by the use of the new keyword.
If nodes represent an “if” statement created by the use of the if-else compound keywords.
Try nodes represent a “try” statement created by the use of the try-catch-finally compound keywords. A Try contains a collection of Catch nodes representing the potential exception handlers, but these are not Sentences themselves.
Throw nodes represent the raising of an Exception, explicitly created by the use of the throw keyword. Even though this sentence does not strictly return a value, it is considered an Expression because its evaluation immediately stops the execution of the current frame, thus allowing it to be used in any place where a value is expected.
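The Categories described so far can be summarized as sets of kind labels. The sketch below assumes that kind labels match the node type names used in the text and that a Singleton carries an optional name; the constant and function names are hypothetical.

```typescript
// Kind labels grouped by Category, following the lists above.
const MODULE_KINDS: readonly string[] = ['Class', 'Singleton', 'Mixin', 'Describe']
const EXPRESSION_KINDS: readonly string[] = [
  'Reference', 'Self', 'Literal', 'Send', 'Super', 'New', 'If', 'Try', 'Throw',
]
const SENTENCE_KINDS: readonly string[] = ['Variable', 'Return', 'Assignment', ...EXPRESSION_KINDS]
const ENTITY_KINDS: readonly string[] = ['Package', 'Program', 'Test', 'Variable', ...MODULE_KINDS]

// Hypothetical minimal node: a kind label and, for some nodes, a name.
interface KindedNode { kind: string; name?: string }

const isModule = (node: KindedNode): boolean => MODULE_KINDS.indexOf(node.kind) !== -1
const isExpression = (node: KindedNode): boolean => EXPRESSION_KINDS.indexOf(node.kind) !== -1
const isSentence = (node: KindedNode): boolean => SENTENCE_KINDS.indexOf(node.kind) !== -1

// Only named Singletons count as Entities; anonymous ones do not.
function isEntity(node: KindedNode): boolean {
  if (node.kind === 'Singleton') return node.name !== undefined
  return ENTITY_KINDS.indexOf(node.kind) !== -1
}
```

Note that a Variable satisfies both isEntity and isSentence, and that a Throw is an Expression and therefore also a Sentence.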
Some nodes are just too unique to be grouped in any way and don’t belong to any Category.
Catch nodes represent an exception handler defined by the catch keyword. Even though they are not Expressions themselves, they are always contained within a parent Try node.
Import nodes represent the inclusion into a package’s scope of an externally defined Entity by using the import keyword. They can be marked with the isGeneric flag, which denotes that all the children of the referenced Entity are meant to be included instead of the Entity itself.
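The effect of the isGeneric flag can be sketched as a choice between importing the Entity or its children. Only the isGeneric name comes from the text; the shapes and the importedNames function are assumptions for illustration.

```typescript
// Hypothetical shapes for an imported Entity and the Import node itself.
interface EntitySketch { name: string; children: EntitySketch[] }
interface ImportSketch { kind: 'Import'; entity: EntitySketch; isGeneric: boolean }

// Names that the import contributes to the importing package's scope.
function importedNames(node: ImportSketch): string[] {
  return node.isGeneric
    ? node.entity.children.map(child => child.name)
    : [node.entity.name]
}

const birds: EntitySketch = {
  name: 'birds',
  children: [
    { name: 'Bird', children: [] },
    { name: 'pepita', children: [] },
  ],
}

const plainImport = importedNames({ kind: 'Import', entity: birds, isGeneric: false })  // ['birds']
const genericImport = importedNames({ kind: 'Import', entity: birds, isGeneric: true }) // ['Bird', 'pepita']
```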
Body nodes represent a sequence of Sentences and are one of the main ways to define a lexical scope. Every Entity or Expression that contains a fragment of code potentially bigger than a single sentence has a Body to contain it.
Parameter nodes represent the declared parameters of Methods or Catches. In the case of Methods, the last Parameter of the declaration can be marked as varArgs, meaning the node represents a variable number of parameters and should be considered to contain a List.
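A sketch of how a varArgs Parameter could collect the trailing arguments is shown below. The varArgs flag name follows the text, while the shapes and the bindArguments function are hypothetical, and a plain array stands in for the Wollok List.

```typescript
// Hypothetical parameter shape: a name and the varArgs mark described above.
interface ParameterSketch { name: string; varArgs: boolean }

// Binds positional arguments to parameter names; a varArgs parameter
// (assumed to be the last one) receives all remaining arguments as a list.
function bindArguments(parameters: ParameterSketch[], args: unknown[]): Record<string, unknown> {
  const bindings: Record<string, unknown> = {}
  parameters.forEach((parameter, index) => {
    bindings[parameter.name] = parameter.varArgs ? args.slice(index) : args[index]
  })
  return bindings
}

const bound = bindArguments(
  [{ name: 'first', varArgs: false }, { name: 'rest', varArgs: true }],
  [1, 2, 3],
)
// bound.first is 1 and bound.rest is [2, 3]
```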
NamedArgument nodes represent a Name-Expression pair, used to pass values bound to a name to New and ParameterizedType nodes.
ParameterizedType nodes are used to define the linearization of Modules and consist of a Module Reference and a list of NamedArguments.
Environment nodes are a special kind of node that does not relate to any syntactic construction. They act as the Root of the AST and can only be synthetically created by the Linker.
Some Nodes are not directly derived from a syntactic element and cannot be directly mapped to a source file. Some of them, like the Environment or the accessor methods of Property Fields, are created as part of the Compilation Pipeline; others can be the result of direct manipulation of the AST by a program or IDE. Whatever the reason, these nodes are usually called Synthetic Nodes and can be identified by their lack of sourceMap.
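Detecting a Synthetic Node then comes down to checking for the missing sourceMap. The node shape and the isSynthetic helper below are assumptions made for this sketch.

```typescript
// Hypothetical node shape; sourceMap is absent on Synthetic Nodes.
interface NodeSketch {
  id: string
  kind: string
  sourceMap?: { start: number; end: number }
}

// A node is considered synthetic when it cannot be mapped back to source code.
const isSynthetic = (node: NodeSketch): boolean => node.sourceMap === undefined

const environment: NodeSketch = { id: 'root', kind: 'Environment' }
const parsedClass: NodeSketch = { id: 'c1', kind: 'Class', sourceMap: { start: 0, end: 42 } }
```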
Some syntactic elements can easily be expressed in terms of others and don’t require their own kind of Node. These abstractions (such as Closures and some Special Assignation Operators), which are compiled into a combination of other constructions instead of having their own Node, are often referred to as “Surrogated Nodes”.
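As an example of this kind of translation, a compound assignment such as total += 1 could be expressed as a plain Assignment whose value is a Send. The exact expansion shown here is an assumption for illustration, as are all the shapes and the desugar function.

```typescript
// Hypothetical, reduced node shapes for illustration only.
interface ReferenceSketch { kind: 'Reference'; name: string }
interface LiteralSketch { kind: 'Literal'; value: number | string | boolean | null }
interface SendSketch {
  kind: 'Send'
  receiver: ExpressionSketch
  message: string
  args: ExpressionSketch[]
}
type ExpressionSketch = ReferenceSketch | LiteralSketch | SendSketch
interface AssignmentSketch { kind: 'Assignment'; variable: ReferenceSketch; value: ExpressionSketch }

// Assumed expansion: `x op= e` becomes `x = x.op(e)`, so no dedicated Node kind is needed.
function desugar(variableName: string, operator: string, operand: ExpressionSketch): AssignmentSketch {
  const variable: ReferenceSketch = { kind: 'Reference', name: variableName }
  const message = operator.slice(0, -1) // '+=' becomes '+'
  return {
    kind: 'Assignment',
    variable,
    value: { kind: 'Send', receiver: variable, message, args: [operand] },
  }
}

// total += 1
const expanded = desugar('total', '+=', { kind: 'Literal', value: 1 })
```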
The following diagram shows all the different nodes types, how they relate to each other, and a general overview of their most important attributes and responsibilities.