2. Language Reference

This section gives a systematic overview of the NaBL2 language.

2.1. Lexical matters

2.1.1. Identifiers

Most identifiers in NaBL2 fall into one of two categories, which we will refer to as:

  • Lowercase identifiers, that start with a lowercase character, and must match the regular expression [a-z][a-zA-Z0-9\_]*.
  • Uppercase identifiers, that start with an uppercase character, and must match the regular expression [A-Z][a-zA-Z0-9\_]*.

2.1.2. Comments

Comments in NaBL2 follow the C-style:

  • // ... single line ... for single-line comments
  • /* ... multiple lines ... */ for multi-line comments

Multi-line comments can be nested, and run until the end of the file when the closing */ is omitted.

2.2. Terms and patterns

term = ctor-id "(" {term ","}* ")"
     | "(" {term ","}* ")"
     | "[" {term ","}* "]"
     | "[" {term ","}* "|" term "]"
     | namespace-id? "{" term ("@" var-id)? "}"

pattern = ctor-id "(" {pattern ","}* ")"
        | "(" {pattern ","}* ")"
        | "[" {pattern ","}* "]"
        | "[" {pattern ","}* "|" pattern "]"
        | "_"
        | var-id

2.3. Modules

module module-id

  section*

NaBL2 specifications are organized in modules. A module is identified by a module identifier. Module identifiers consist of one or more names seperated by slashes, as in {name "/"}+. The names must match the regular expression [a-zA-Z0-9\_][a-zA-Z0-9\_\.\-]*.

Every module is defined in its own file, with the extensions .nabl2. The module name and the file paths must coincide.

Example. An empty module analysis/main, defined in a file .../analysis/main.nabl2.

module analysis/main

// work on this

Modules consist of sections for imports, signatures, and rule definitions. The rest of this section describes imports, and subsequents sections deal with signatures and rules.

2.3.1. Imports

imports

  module-ref*

A module can import definitions from other modules be importing the other module. Imports are specified in an imports section, which lists the modules being imported. A module reference can be:

  • A module identifier, which imports a single module with that name.
  • A wildcard, which imports all modules with a given prefix. A wildcard is like a module identifier, but with a dash as the last part, as in {name "/"}+ "/-".

A wildcard import does not work recursively. For example, analysis/- would imports analysis/functions, and analysis/classes, but not analysis/lets/recursive.

Example. A main module importing several submodules.

module main

imports

   builtins
   functions/-
   classes/-
   types

2.4. Signatures

signatures

  signature*

Signatures contain definitions and parameters used in the specification. In the rest of this section, signatures for terms, name binding, functions and relations, and constraint rules are described.

2.4.1. Terms

Terms in NaBL2 are multi-sorted, and are defined in the sorts and constructors signatures.

Sorts

sorts

  sort-id*

Available since version 2.3.0

The sorts signature lists the sorts that are available. Sort are identified by uppercase identifiers.

Example. Module declaring a single sort Type.

module example

signature

  sorts Type

Constructors

constructors

  ctor-def*

Constructors are defined in a constructors signature, and identified by uppercase identifiers. Constructor definitions are written as follows:

  • Nullary constructors are defined using ctor-id ":" sort-id.
  • N-ary constructors are defined using ctor-id ":" {sort-ref "*"}+ "->" sort-id.

Sort references can refer to sorts defined in the signature, or to several builtin sorts. One can refer to the following sorts:

  • User-defined sorts using its sort-id.
  • Tuples using "(" {sort-ref "*"}* ")".
  • Lists using "list(" sort-ref ")".
  • Maps using "map(" sort-ref "," sort-ref ")".
  • Generic terms using the "term" keyword. The term sort contains all possible terms, and can be seen as a supertype of all other sorts.
  • Strings using the "string" keyword.
  • Scopes using the "scope" keyword.
  • Occurrences using the "occurrence" keyword.
  • Sort variables are written using lowercase identifiers.

For example, a module specifying the types for a language with numbers, functions, and records identified by scopes, might look like this:

module example

signature

   sorts Type

   constructors
     NumT : Type
     FunT : Type * Type -> Type
     RecT : scope -> Type

2.4.2. Name binding

Two signatures are relevant for name binding. One describes namespaces, that are used for occurrences, and one describes the parameters for name resolution.

Namespaces

namespaces

  namespace-def*

Namespaces are defined in the namespaces signature. Namespaces are identified by uppercase identifiers. A namespace definition has the following form: namespace-id (":" sort-ref)? properties?. The optional ":" sort-ref indicates the sort used for the types of occurrences in this namespace.

Other properties of occurrences in this namespace, are specified as a block of the form "{" {(prop-id ":" sort-ref) ","}* "}". Properties are identified by lowercase identifiers, and type is a reserved property keyword that cannot be used.

The following example defines three namespaces: 1) for modules, without a type or properties, 2) for classes, which has a property to record the body of the class, and 3) for variables, which has a type property, of sort Type. For completeness the sort declaration for Type is shown as well.

module example

signature

  sorts Type

  namespaces
    Module
    Class { body : term }
    Var : Type

Name resolution

name resolution
  labels
    label-id*
  order
    {label-order ","}*
  well-formedness
    label-regexp

Name resolution parameters are specified in a name-resolution signature. Note that this block can only be specified once per project.

Edge labels are specified using the labels keyword, followed by a list of uppercase label identifiers. The label "D" is reserved and signifies a declaration in the same scope.

The specificity order on labels is specified using the order keyword, and a comma-separated list of label-ref "<" label-ref pairs. Label references refer to a label identifier, or the special label D.

Finally, the well-formedness predicate for paths is specified as a regular expression over edge labels, after the well-formedness keyword. The regular expression has the following syntax:

  • A literal label using its label-id.
  • Empty sequence using "e".
  • Concatenation with regexp regexp.
  • Optional (zero or one) with regexp "?".
  • Closure (zero or more) with regexp "*".
  • Non-empty (one or more) with regexp "+".
  • Logical or with regexp "|" regexp.
  • Logical and with regexp "&" regexp.
  • Empty language using "0", i.e., this will not match on anything.
  • Parenthesis, written as "(" regexp ")" , can be used to group complex expressions.

The following example shows the default parameters, that are used if no parameters are specified:

name resolution
  labels
    P I

  order
    D < P,
    D < I,
    I < P

  well-formedness
    P* I*

2.4.3. Functions and relations

Functions

functions

  ( function-id (":" sort-ref "->" sort-ref )?
     ("{" {function-case ","}* "}")? )*
function-case = pattern "->" term

Functions available at constraint time are defined in a functions signature. A function is identified by a name, followed by a type and the function cases. The cases are rewrite rules from the match in the left, to the term on the right. The function cases need to be linear, which all the variables mentioned in the right-hand side term have to be bound in the left-hand side pattern.

The type is currently not checked, but can be used to document to sorts of the elements in the function.

Example. A module that defines the left and right projection functions for pairs.

module example

signature

  functions
    left : (Type * Type) -> Type {
      (x, y) -> x
    }
    right : (Type * Type) -> Type {
      (x, y) -> y
    }

Relations

relations

  ( relation-option* relation-id
     (":" sort-ref "*" sort-ref)?
     ("{" {variance-pattern ","}* "}")? )*
relation-option = "reflexive" | "irreflexive"
                | "symmetric" | "anti-symmetric"
                | "transitive" | "anti-transitive"

variance-pattern = ctor-id "(" {variance ","}* ")"
                 | "[" variance  "]"
                 | "(" {variance ","}* ")"

variance = "="
         | "+"relation-id?
         | "-"relation-id?

The relations that are available are defined in a relations signature. A relation is identified by a name, possibly preceded by properties of the relation, and followed by an optional type and special cases for specific constructors.

The properties that are specificied are enforced at runtime. The positive properties (reflexive, symmetric, and transitive) ensure that all pairs that were not explicitly added to the relation are inferred. The negative properties (irreflexive, anti-symmetric, and anti-transitive) are checked when adding a pair to the relation, and result in an error in the program if violated. The positive and negative properties are mutually exclusive. For example, it is not allowed to specify both reflexive and irreflexive at the same time.

The type specified for the relation is currently not checked, but can be used to document the sorts of the elements in the relation.

Variance patterns are used to specify general cases for certain constructors. This can be used, for example, to add support for lists, that are checked pair-wise.

Example. Module below defines a reflexive, transitive, anti-symmetric subtype relation sub, with the common variance on function types, and covariant type lists.

module example

signature

  relations
     reflexive, transitive, anti-symmetric sub : Type * Type {
       FunT(-sub, +sub),
       [+sub]
     }

2.4.4. Rules

constraint generator

  rule-def*

The type signatures for constraint generation rules are defined in a constraint generator signature. Rule signatures describe the sort being matched, the sorts of any parameters, and optionally the sort of the type. A rule signature is written as rule-id? "[[" sort-ref "^" "(" {sort-ref ","}* ")" (":" sort-ref)?  "]]". Rules are identified by uppercase identifiers.

The following example shows a module that defines a default rule for expressions, and rules for recursive and parallel bindings. The rule for expressions has one scope parameter, and expressions are assigned a type of sort Type. The bind rules are named, and match on the same AST sort Bind. They take two scope parameters, and do not assign any type to the bind construct.

module example

signature

  constraint generator
    [[ Expr ^ (scope) : Type ]]
    BindPar[[ Bind ^ (scope, scope) ]]
    BindRec[[ Bind ^ (scope, scope) ]]

NaBL2 supports higher-order rules. In those cases, the rule-id is extended with a list of parameters, written as rule-id "(" {rule-id ","}* ")".

For example, the rule that applies some rule, given as a parameter X, to the elements of a list has signature Map1(X)[[ a ^ (b) ]]. Note that we use variables a and b for the AST and parameter sort respectively, since the map rule is polymorphic.

2.5. Rules

rules

  rule*

The rules section of a module defines syntax-directed constraint generation rules.

2.5.1. Init rule

init ^ ( {parameter ","}* ) (":" type)? := {clause ","}+ .
init ^ ( {parameter ","}* ) (":" type)? .

Constraint generation starts by applying the default rule to the top-level constructor. The init rule, which must be specified exactly once, provides the initial values to the parameters of the default rule.

Init rules come in two variants. The first variant outputs rule clauses. These can create new scopes, or defined constraints on top-level declarations. If the rule has no clauses, the rule can be closed without a clause definition. For example, init ^ (). is shorthand for init ^ () := true.

In the example module below, the default rule takes one scope parameter. The init rule creates a new scope, which will be used as the initial value for constraint generation.

module example

rules

  init ^ (s) := new s.

  [[ t ^ (s) ]].

2.5.2. Generation rules

rule-def? [[ pattern ^ ( {parameter ","}* ) (":" type)? ]] := {clause ","}+ .
rule-def? [[ pattern ^ ( {parameter ","}* ) (":" type)? ]] .
rule-def = rule-id ("(" {rule-id ","}* ")")?

Constraint generation rules are defined for the different syntactic constructs in the object language. Rules can accept a number of parameters and an optional type. The parameters are often used to pass around scopes, but can be used for other parameters as well. The rule clause consists of a comma-separated list of constraints.

Rules can be named to distinguish different versions of a rule for the same syntactic construct. Named rules can also accept rule parameters, which makes it possible to write higher-order rules. For example, the Map(X)[[ list(a) ^ (b) ]] rule accepts as argument the rule that will be applied to the elements in the list. Note that only a single rule with a certain name can be defined per AST pattern.

Rules are distinguished by name and arity, so Map1 is different from Map1(X). There is no overloading based on the number of parameters, or the presence or absence of a type.

All variables in the rule’s clauses that are not bound in the pattern, the parameters, the type, or a new directive, are automatically inferred to be unification variables.

The rule form without clauses is equal to a rule that simply return true. For example, [[ Int(_) ^ (s) : IntT() ]]. is shorthand for [[ Int(_) ^ (s) : IntT() ]] := true..

Recursive calls

clause = rule-ref "[[" var "^" "(" {var ","}* ")" (":" term)? "]]"

rule-ref = rule-id ("(" {rule-ref ","}* ")")?
         | "default"

Recursive calls are used to invoke constraint generation for subterms of the current term. Recursive calls can only be made on parts of the program AST, therefore the term argument needs to be a variable that is bound in the current match pattern.

If no rule name is specified, the default rule will be called. Rules that are applied are selected based on the name and the term argument. To pass the default rule as an argument to a higher-order rule, the default keyword is used.

There is no overloading on the number of parameters or the presence or absence of a type. Calling a rule with the wrong number of parameters will result in errors during constraint collection.

Delegating to other rules is only supported if the delegate has the same parameters and type as the rule that is delegating.

Example. A module defining and calling different rules.

module example

rules

  [[ Int(_) ^ (s) : IntT() ]].

  Map1[[ [x|xs] ^ (s) : [ty|tys] ]] :=
    [[ x ^ (s) : ty ]],        // call default rule on head
    Map1[[ xs ^ (s) : tys ]].  // recurse on tail

  Map1(X)[[ [x|xs] ^ (s) : [ty|tys] ]] :=
    X[[ x ^ (s) : ty ]],         // call rule X on head
    Map1(X)[[ xs ^ (s) : tys ]]. // recurse on tail, passing on X

The rule Map1 could also be defined in terms of Map1(X) as follows:

module example

rules

  Map1[[ xs ^ (s) : tys ]] :=
    Map1(default)[[ xs ^ (s) : tys ]].

2.6. Constraints

This section gives an overview of the different constraints that can be used in clauses of constraint rules.

2.6.1. Base constraints & error messages

clause = "true"
       | "false" message?

message = "|" message-kind message-content message-position?

message-kind     = "error" | "warning" | "note"
message-content  = "\"" chars "\""
                 | "$[" (chars | "[" term "]")* "]"
message-position = "@" var-id

The two basic constraints are true and false. The constraint true is always satisfied, while false is never satisfied.

The message argument to a constraint specifies the error message that is displayed if the constraint is not satisfied. The severity of the error can be specified to be error, warning or note. The message itself can either be a simple string, or an interpolated string that can match terms and variables used in the rule. By default the error will appear on the match term of the rule, but using the @t syntax the location can be changed to t. The variable t needs to be bound in the AST pattern of the rule.

Example. Some constraints with different ways of specifying error messages.

false | error @t1                    // generic error on whole term
false | note "Simple note"           // specific note on whole term
false | warning $[Consider [t2]] @t1 // formatted warning on first subterm

2.6.2. Term equality

clause = term "==" term message?
       | term "!=" term message?

Equality of terms is specified used equality and inequality constraints. An equality constraint t1 == t2 specifies that the two terms need to be equal. If the terms contain variables, the solver infers values for them using unificiation. If unification leads to any conflicts, an error will be reported.

Inequality is specified with a constraint of the form t1 != t2. Inequality constraints cannot always be solved if both sides contain variables. The inequality between two variables depends on the values that will be inferred for them. Only after a value is assigned, will the inequality be tested. If the constraint cannot be solved, because some variables have remained free, an error is reported as well.

Example. A few constraints for term (in)equality.

ty == FunT(ty1, ty2) | error $[Expected function type, but got [ty]]
ty != NilT()

2.6.3. Name binding

Name binding is concerned with constraints for building a scope graph, constraints for resolving references and accessing properties of declarations, and name sets that talk about sets of declarations or references in the scope graph.

Occurrences

occurrence = namespace-id? "{" term position? "}"

position   = "@" var-id

References and declarations in the scope graph are not simply names, but have a richer representation called an occurrence. An occurrence consists of the name, the namespace, and a position.

The name can be any term, although usually it is a term from the AST. Names are not restricted to strings, and can contain terms with subterms if necessary. However, it is required that the name contains only literals, or variables that are bound in the match pattern.

Namespaces allow us to separate different kinds of names, so that type names do not resolve to variables or vice versa. If there is only one namespace in a language, it can be omitted and the default namespace will be used.

The position is necessary to differentiate different occurrences of the same name in the program, and is the connection between the AST and the scope graph. The position can usually be omitted, in which case the position of the name is taken if it is an AST term, or the position of the match term, if the name is a literal.

Example. Several of occurrences, with explicit and implicit compontents.

Var{x}         // variable x, x must be bound in match
{y}            // no namespace, y must be bound in match
Type{a}        // type a, a must e bound in match
Var{"this" @c} // this variable with explicit position
Var{This()}    // this variable using a constructor instead of a string
Type{"Object"} // literal type occurrence

Scope graph

clause = occurrence "<-" scope
       | occurrence "->" scope
       | scope "-"label-id"->" scope
       | occurrence "="label-id"=>" scope
       | occurrence "<="label-id"=" scope
       | "new" var-id*

Scope graph constraints construct a scope graph. Names in the graph are represented by occurrences. Scopes in the graph are abstract, but can be created using the new directive. Rules usually receive scope parameters that allows them to extend and connect to the surrounding scope graph.

The following scope graph constraints are available:

  • Declarations are introduced with d <- s. The arrow points from the scope to the declaration, which indicates that the occurrence is reachable from scope s.
  • References are introduced with r -> s. The arrow points from the reference to the scope, indicating that the reference should be resolved in scope s. Note that a reference in the scope graph is not automatically required to resolve to a declaration. To check that, a resolution constraint needs to be specified.
  • A direct edge from one scope to another is written as s1 -l-> s2. This indicates that the scope s2 is reachable from s1. If the label is omitted, as in s1 ---> s2, the label P is used implicitly.
  • An associated scope, written as d =l=> s, exports the declarations visible in s via the declaration d. If the label is omitted, as in d ===> s, the label I is used implicitly.
  • Declarations from an associated scope can be imported from reference r with an import edge, written as r <=l= s. Given that the reference r resolves to a declaration d, and d has an associated scope edge labeled l to a scope s', the declarations visible in s' will become visible in s. Similarly to the associated scope edge, if the label on an import edge is omitted, as in r <=== s, the label I is used implicitly. Note that the import edge does not specify where the imported reference has to be resolved. Make sure that the reference itself is also added to the graph with a reference edge.
  • Scopes need to be explicitly created. This is done with the the new s constraint, which creates a new scope and binds it to the variable s.

A few restrictions apply to the scope graph. An occurrence can only be a reference in one scope. Similarly, an occurrence can only be a declaration in one scope. However, it is possible to use an occurrence as both a declaration and a reference.

It is important to note that all scope graph constraints define a scope graph. This means that the scopes or occurrences generally should not contain constraint variables. The exception is the target scope in a direct edge, or the reference in an import edge. This allows modeling type-dependent name resolution.

Example. Rules that build a scope graph.

[[ Module(x,imps,defs) ^ (s) ]] :=
  Mod{x} <- s,                       // module declaration
  new ms,                            // new scope for the module
  Mod{x} ===> ms,                    // associate module scope with the declaration
  Map2[[ imps ^ (ms, s) ]],          // recurse on imports
  Map1[[ defs ^ (ms) ]].             // recurse on statements

[[ Import(x) ^ (ms, s) ]] :=
  Mod{x} -> s,                       // module reference
  Mod{x} <=== ms.                    // import in the module scope

[[ TypeDef(x,_) ^ (s) ]] :=
  Type{x} <- s.                      // type declaration

[[ TypeRef(x) ^ (s) ]] :=
  Type{x} -> s.                      // type reference

[[ VarDef(x, t) ^ (s) ]] :=
  Var{x} <- s,                       // variable declaration
  [[ t ^ (s) ]].                     // recurse on type annotation

[[ VarRef(x) ^ (s) ]] :=
  Var{x} -> s.                       // variable reference

Name resolution

clause = occurrence "|->" occurrence message?
       | occurrence "?="label-id"=>" scope message?
       | occurrence"."prop-id ":=" term priority? message?
       | occurrence ":" type priority? message?

priority = "!"*

The following constraints are available to resolve references to declarations, and specify properties of declarations:

  • Resolution of a reference occurrence r to a declaration occurrence d is written as r |-> d. Resolution constraints can infer the declaration given a reference, however not the other way around. A resolution constraint requires that a reference resolves to exactly one declaration, or else it will fail.
  • A lookup of the associated scope s of a declaration d is done with d ?=l=> s, where l is the edge label. If the label is omitted, as in d ?===> s, the label I is used implicitly.
  • A property p of a declaration d is specified with d.p := t. If multiple constraints for the same property of the same declaration exist, their values will be unified. Priorities can be used to guide where errors will be reported. If two constraints for the same property of the same declaration are in conflict, the error will more likely be reported on the constraint with the least priority annotations.
  • A special form exists for the type property, which is preferably written as d : t.

Example. Rules that include resolution constraints and declaration types.

[[ TypeDef(x,t) ^ (s) ]] :=
  Type{x} <- s,                      // type declaration
  Type{x} : t !.                     // semantics type of the type declaration

[[ TypeRef(x) ^ (s) : ty ]] :=
  Type{x} -> s.                      // type reference
  Type{x} |-> d,                     // resolve reference
  d : ty.                            // semantic type of the resolved declaration

[[ VarDef(x, t) ^ (s) ]] :=
  Var{x} <- s,                       // variable declaration
  [[ t ^ (s) : ty ]],                // recurse on type annotation
  Var{x} : ty !,                     // type of the variable declaration
  [[ t ^ (s) ]].                     // recurse on type annotation

[[ VarRef(x) ^ (s) : ty ]] :=
  Var{x} -> s,                       // variable reference
  Var{x} |-> d,                      // resolve variable reference
  d : ty.                            // type of resolved declaration

Name sets

name-set = "D(" scope ")"("/"namespace-id)?
         | "R(" scope ")"("/"namespace-id)?
         | "V(" scope ")"("/"namespace-id)?
         | "W(" scope ")"("/"namespace-id)?

set-proj = "name"

message-position = "@NAMES"

Name sets are set expressions (see Sets) that are based on the scope graph. They can be used in set constraints to test for properties such as duplicate names, shadowing, or complete coverage.

The expression D(s) represents the set of all declarations in scope s, and similarly R(s) refers to all references in scope s. The set of all visible declarations in s is represented by V(s), and the set of all reachable declarations in s is represented by W(s). The difference between visible and reachable declarations is that de former takes the label order into account to do shadowing, while the latter only considers path well-formedness but does not shadow.

All the name sets can also be restricted to declarations or references in a particular namespace by appending a forward-slash and the namespace. For example, all variable declarations in scope s would be represented by D(s)/Var.

Name sets support the name set projection that makes it possible to compare occurrences by name only, ignoring the namespace and position.

When using set constraints on name sets, there are some options to relate the error messages to the elements in the set, instead of the term where the constraint was created. First, the position in the message can be set to @NAMES. This makes error messages appear on the names in the set. For example, distinct/name D(s)/Var | error @NAMES will report errors on all the names that are duplicate. If you want to refer to the name in the error message, use the NAME keyword. For example, in the message error $[Duplicate name [NAME]] @NAMES, the keyword will be replaced by the correct name.

2.6.4. Sets

clause = "distinct"("/"set-proj)? set-expr message?
       | set-expr "subseteq"("/"set-proj)? set-expr message?
       | set-expr "seteq"("/"set-proj)? set-expr message?

set-expr = "0"
         | "(" set-expr "union" set-expr ")"
         | "(" set-expr "isect"("/"set-proj)? set-expr ")"
         | "(" set-expr "minus"("/"set-proj)? set-expr ")"
         | name-set

Set constraints are used to test for distinct elements in a set, or subset relations between sets. Set expressions allow the usual set operations such as union and intersection.

The constraint distinct S check whether any elements in the set S appears multiple times. Note that this works because the sets behave like multisets, so every element has a count associated with it as well. If the set S supports projections, it is possible to test whether the set contains any duplicates after projection. Which projections are available depends on the sets involved. For example, when working with sets of occurrences (see Name sets), the name projection can be used.

Constraints S1 subseteq S2 and S1 seteq S2 test whether S1 is a subset of S2, or if the sets are equal, respectively. Both constraints also support projections, written as S1 subseteq/proj S2 and S1 seteq/proj S2.

The basic set expressions that are supported are 0 for the empty set, (S1 union S2) for set union, (S1 isect S2) for intersection, and (S1 minus S2) for set difference. Set intersection and difference can also be performed under projection, written as (S1 isect/proj S2) and (S1 minus/proj S2). This means the comparison to check if elements from the two sets match is done after projecting the elements. However, the resulting set will still contain the original elements, not the projections.For example, this can be used to compute sets of occurrences where a comparison by name is necessary.

Example. Set constraints over name sets.

distinct/name D(s)/Var // variable names in s must be unique

2.6.5. Functions and relations

clause = term "<"relation-id?"!" term message?
       | term "<"relation-id?"?" term message?
       | term "is" function-ref "of" term message?

function-ref = function-id
             | relation-id".lub"
             | relation-id".glb"

2.6.6. Symbolic

clause = "?-" term
       | "!-" term