2. SDF3 Reference Manual

2.1. Modules

An SDF3 specification consists of a number of module declarations. Each module may import other modules and define sections that include sort, start symbols, syntax, restrictions, priorities, and template options.

Imports

Modules may import other modules for reuse or separation of concerns. A module may extend the definition of a non-terminal in another module. A module may compose the definition of a language by importing the parts of the language. The structure of a module is as follows:

module <ModuleName>

<ImportSection>*

<Section>*

The module keyword is followed by the module name, then a series of imports can be made, followed by sections that contain the actual definition of the syntax. An import section is structured as follows:

imports <ModuleName>*

Note that SDF3 does not support parameterized modules.

2.2. Symbols

The building block of SDF3 productions is a symbol. SDF3 symbols can be compared to terminals and non-terminals in other grammar formalisms. The elementary symbols are literals, sorts and character classes.

Intrinsically, only character classes are real terminal symbols. All other symbols represent non-terminals. SDF3 also support symbols that capture BNF-like notation such as lists and optionals. Note that these symbols are also non-terminals, and are just shorthands for common structures present in context-free grammars.

Character classes

Character classes occur only in lexical syntax and are enclosed by [ and ]. A character class consists of a list of zero or more characters (which stand for themselves) such as [x] to represent the character x, or character ranges, as an abbreviation for all the characters in the range such as [0-9] representing 0, 1, …, 9. A valid range consists of [c1-c2], where the character c2 has a higher ASCII code than c1. Note that nested character classes can also be concatenated within the same character class symbol, for example [c1c2-c3c4-c5] includes the characters c1 and the ranges c2-c3, c4-c5. In this case, the nested character classes do not need to be ordered, as SDF3 orders them when performing a normalization step.

Escaped Characters: SDF3 uses a backslash (\) as a escape for the quoting of special characters. One should use \c whenever c is not a digit or a letter in a character class.

Arbitrary Unicode code points can be included in a character class by writing an escaped integer, which is particularly useful for representing characters outside the printable ASCII range. The integer can be a binary, octal, decimal, or hexadecimal number, for example: \0b101010, \052, \42, and \0x2A all represent the code point 42, or the '*' character.

Additionally, special ASCII characters are represented by:

  • \t : horizontal tabulation
  • \n : newline character
  • \v : vertical tabulation
  • \f : form feed
  • \r : carriage return

Character Class Operators: SDF3 provides the following operators for character classes:

  • (complement) ~ : Accepts all the characters that are not in the original class.
  • (difference) / : Accepts all the characters in the first class unless they are in a second class.
  • (union) \/ : Accepts all the characters in either character classes.
  • (intersection) /\ : Accepts all the characters that are accepted by both character classes.

Note that the first operator is unary and the other ones are left associative binary operators. Furthermore, such operators are not applicable to other symbols in general.

Literals

A literal symbol defines a fixed length word. This usually corresponds to a terminal symbol in ordinary context-free grammars, for example "true" or "+". Literals must always be quoted and consist of (possibly escaped) ASCII characters.

As literals are also regular non-terminals, SDF3 automatically generates productions for them in terms of terminal symbols.

"definition" = [d][e][f][i][n][i][t][i][o][n]

Note that the production above defines a case-sensitive implementation of the defined literal. Case-insensitive literals are defined using single-quoted strings as in 'true' or 'else'. SDF3 generates a different production for case-insensitive literals as

'definition' = [dD][eE][fF][iI][nN][iI][tT][iI][oO][nN]

The literal above accepts case-insensitive inputs such as definition, DEFINITION, DeFiNiTiOn or defINITION.

Sorts

A sort correspond to a plain non-terminal, for example, Statement or Exp. Sort names start with a capital letter and may be follow by letters, digits, hyphens, or underscores. Note that unlike SDF2, SDF3 does not support parameterized sorts (yet!).

Sorts must be declared before they can be used. Context-free sorts are declared in a context-free sorts block:

context-free sorts
  Statement
  Exp

Lexical sorts must be declared in a lexical sorts block:

lexical sorts
  ID
  INT
  STRING

Optionals

SDF3 provides a shorthand for describing zero or exactly one occurrence of a sort by appending the sort with ?. For example, the sort Extends? can be parsed as Extends or without consuming any input. Internally, SDF3 generates the following productions after normalizing the grammar:

Extends?.None =
Extends?.Some = Extends

Note that using ? adds the constructors None and Some to the final abstract syntax tree.

Lists

Lists symbols as the name says, indicate that a symbol should occur several times. In this way, it is also possible to construct flat structures to represent them. SDF3 provides support for two types of lists, with and without separators. Furthermore, it is also possible to indicate whether a list can be empty (*) or should have at least one element (+). For example, a list Statement* indicates zero or more Statement, whereas a list with separator {ID ","}+ indicates one or more ID separated by ,. Note that SDF3 only supports literal symbols as separators.

Again, SDF3 generates the following productions to represent lists, when normalizing the grammar

Statement* =
Statement* = Statement+
Statement+ = Statement+ Statement
Statement+ = Statement

{ID ","}* =
{ID ","}* = {ID ","}+
{ID ","}+ = {ID ","}+ "," {ID ","}
{ID ","}+ = {ID ","}

When parsing a context-free list, SDF3 produces a flattened list as an AST node such as [Statement, ..., Statement] or [ID, ..., ID]. Note that because the separator is a literal, it does not appear in the AST.

Alternative

Alternative symbols express the choice between two symbols, for example, ID | INT. That is, the symbol ID | INT can be parsed as either ID or INT. For that reason, SDF3 normalizes alternatives by generating the following productions:

ID | INT = ID
ID | INT = INT

Note that SDF3 only allow alternative symbols to occur in lexical syntax. Furthermore, note that the alternative operator is right associative and binds stronger than any operator. That is, ID "," | ID ";" expresses ID ("," | ID) ";". To express (ID ",") | (ID ";"), we can use a sequence symbol.

Sequence

A sequence operator allows grouping of two or more symbols. Sequences are useful when combined with other symbols such, lists or optionals, for example ("e" [0-9]+)?. Like alternative symbols, sequences can only occur in lexical syntax. A sequence symbol is normalized as:

("e" [0-9]+) = "e" [0-9]+

Labeled symbols

SDF3 supports decorating symbols with labels, such as myList:{elem:Stmt ";"}*. The labels have no semantics but can be used by other tools that use SDF3 grammars as input.

LAYOUT

The LAYOUT symbol is a reserved sort name. It is used to indicate the whitespace that can appear in between context-free symbols. The user must define the symbol LAYOUT such as:

LAYOUT = [\ \t\n]

Note that the production above should be defined in the lexical syntax.

2.3. Syntax

As seen before, a SDF3 module may constitute of zero or more sections. All sections contribute to the final grammar that defines a language. Sections can define production rules, priorities, restrictions, or simply specify some characteristics of the syntax definition.

Sort declarations

Sorts are declared by listing their name in the appropriate sorts section, which have the following forms. For context-free sorts:

context-free sorts

  <Sort>*

For lexical sorts:

lexical sorts

  <Sort>*

SDF3 also supports kernel sorts:

sorts

  <Sort>*
Note: Kernel sorts should be suffixed with -CF or -LEX, depending on whether they are context-free sorts or lexical sorts. When a sort in a sorts block does not have a suffix, it is treated as a context-free sort.

Writing a sort in these sections only indicates that a sort has been declared, even if it does not have any explicit production visible.

Start symbols

The lexical or context-free start symbols sections explicitly define the symbols which will serve as start symbols when parsing terms. If no start symbols are defined it is not possible to recognize terms. This has the effect that input sentences corresponding to these symbols can be parsed. So, if we want to recognize boolean terms we have to define explicitly the sort Boolean as a start symbol in the module Booleans. Any symbol and also lists, optionals, etc., can serve as a start-symbol. A definition of lexical start symbols looks like

lexical start-symbols

  <Symbol>*

while context-free start symbols are defined as:

context-free start-symbols

  <Symbol>*

SDF3 also supports kernel start-symbols:

start-symbols

  <Symbol>*

In contrast to lexical and kernel start-symbols, context-free start symbols can be surrounded by optional layout. A lexical start-symbol should have been defined by a production in the lexical syntax; a context-free symbol should have been defined in the context-free syntax. Both symbols can also be defined in kernel syntax using the suffix -LEX or -CF.

Lexical syntax

The lexical syntax usually describes the low level structure of programs (often referred to as lexical tokens.) However, in SDF3 the token concept is not really relevant, since only character classes are terminals. The lexical syntax sections in SDF3 are simply a convenient notation for the low level syntax of a language. The LAYOUT symbol should also be defined in a lexical syntax section. A lexical syntax consists of a list of productions.

Lexical syntax is described as follows:

lexical syntax

  <Production>*

An example of a production in lexical syntax:

lexical syntax

  BinaryConst = [0-1]+

Context-free syntax

The context-free syntax describes the more high-level syntactic structure of sentences in a language. A context-free syntax contains a list of productions. Elements of the right-hand side of a context-free production are pre-processed in a normalization step before parser generation that adds the LAYOUT? symbol between any two symbols. Context-free syntax has the form:

context-free syntax

  <Production>*

An example production rule:

context-free syntax

  Block.Block = "{" Statement* "}"

SDF3 automatically allows for layout to be present between the symbols of a rule. This means that a fragment such as:

{

}

will still be recognized as a block (assuming that the newline and line-feed characters are defined as layout).

Kernel syntax

The rules from context-free and lexical syntax are translated into kernel syntax by the SDF3 normalizer. When writing kernel syntax, one has more control over the layout between symbols of a production.

As part of normalization, among other things, SDF3 renames each symbol in the lexical syntax to include the suffix -LEX and each symbol in the context-free syntax to include the suffix -CF. For example, the two productions above written in kernel syntax look like

syntax

  Block-CF.Block  = "{" LAYOUT?-CF Statement*-CF LAYOUT?-CF "}"
  BinaryConst-LEX = [0-1]+

Literals and character-classes are lexical by definition, thus they do not need any suffix. Note that each symbol in kernel syntax is uniquely identified by its full name including -CF and -LEX. That is, two symbols named Block-CF and Block are different, if both occur in kernel syntax. However, Block-CF is the same symbol as Block if the latter appears in a context-free syntax section.

As mentioned before, layout can only occur in between symbols if explicitly specified. For example, the production

syntax

  Block-CF.Block  = "{" Statement*-CF LAYOUT?-CF "}"

does not allow layout to occur in between the opening bracket and the list of statements. This means that a fragment such as:

{
  x = 1;
}

would not be recognized as a block.

Productions

The basic building block of syntax sections is the production. The left-hand side of a regular production rule can be either just a symbol or a symbol followed by . and a constructor name. The right-hand side consists of zero or more symbols. Both sides are separated by =:

<Symbol>               = <Symbol>*
<Symbol>.<Constructor> = <Symbol>*

A production is read as the definition. The symbol on the left-hand side is defined by the right-hand side of the production.

Productions are used to describe lexical as well as context-free syntax. Productions may also occur in priority sections, but might also be referred to by its <Symbol>.<Constructor>. All productions with the same symbol together define the alternatives for that symbol.

Attributes

The definition of lexical and context-free productions may be followed by attributes that define additional (syntactic or semantic) properties of that production. The attributes are written between curly brackets after the right-hand side of a production. If a production has more than one attribute they are separated by commas. Attributes have thus the following form:

<Sort>               = <Symbol>* { <Attribute1>, <Attribute2>, ...}
<Sort>.<Constructor> = <Symbol>* { <Attribute1>, <Attribute2>, ...}

The following syntax-related attributes exist:

  • bracket is an important attribute in combination with priorities. For example, the sdf2parenthesize tool uses the bracket attribute to find productions to add to a parse tree before pretty printing (when the tree violates priority constraints). Note that most of these tools demand the production with a bracket attribute to have the shape: X = "(" X ")" {bracket} with any kind of bracket syntax but the X being the same symbol on the left-hand side and the right-hand side. The connection with priorities and associativity is that when a non-terminal is disambiguated using either of them, a production rule with the bracket attribute is probably also needed.
  • left, right, non-assoc, assoc are disambiguation constructs used to define the associativity of productions. See associativity.
  • prefer and avoid are deprecated disambiguation constructs to define preference of one derivation over others. See preferences.
  • reject is a disambiguation construct that implements language difference. It is used for keyword reservation. See rejections.

2.4. Templates

Templates are a major change in SDF3 when comparing to SDF2. They are essential when aiming to generate a nice pretty printer or generate proper syntactic code completion templates. When generating such artifacts, a general production simply introduces a whitespace in between symbols.

For example, when writing a grammar rule

Statement.If = "if" "(" Exp ")" Exp "else" Exp

and pretty printing a valid program, we would get the text in a single line separated by spaces, as:

../../../../../_images/pp-no-template.png

Furthermore, code completion would consider the same indentation when inserting code snippets.

However, when using template productions such as

Statement.If = <
  if (<Exp>)
    <Exp>
  else
    <Exp>>

We would get the following program.

../../../../../_images/pp-template.png

Again, code completion would also consider this indentation for proposals.

That is, in template productions, the surrounding layout is used to nicely pretty print programs and its code completion suggestions.

Template Productions

Template productions are an alternative way of defining productions. Similarly, they consist of a left-hand side and a right-hand side separated by =. The left-hand side is the same as for productive rules. The right-hand side is a template delimited by < and >. The template can contain zero or more symbols:

<Sort>               = < <Symbol>* >
<Sort>.<Constructor> = < <Symbol>* >

Alternatively, square brackets can be used to delimit a template:

<Sort>               = [ <Symbol>* ]
<Sort>.<Constructor> = [ <Symbol>* ]

The symbols in a template can either be placeholders or literal strings. It is worth noting that:

  • placeholders need to be enclosed within the same delimiters (either <...> or [...]) as the template ;
  • literal strings need not not be enclosed within quotation marks;
  • literal strings are tokenized on space characters (whitespace, tab);
  • additionally, literal strings are tokenized on boundaries between characters from the set given by the tokenize option, see the tokenize template option;
  • placeholders translate literally. If a separator containing any layout characters is given, the placeholder maps to a list with separator that strips the layout.

An example of a template rule:

Exp.Addition = < <Exp> + <Exp> >

Here, the + symbol is a literal string and <Exp> is a placeholder for sort Exp.

Placeholders are of the form:

  • <Sort?>: optional placeholder
  • <Sort*>: repetition (0…n)
  • <Sort+>: repetition (1…n)
  • <{Sort ","}*>: repetition with separator

Case-insensitive Literals

As we showed before, SDF3 allows defining case-insensitive literals as single-quoted strings in regular productions. For example:

Exp.If = 'if' "(" Exp ")" Exp 'else' Exp

accepts case-insensitive keywords for if and else such as if, IF, If, else, ELSE or ELsE. However, to generate case-insensitive literals from template productions, it is necessary to add annotate these productions as case-insensitive. For example, a template production:

Exp.If = <
   if(<Exp>)
     <Exp>
   else
     <Exp>
> {case-insensitive}

accepts the same input as the regular production mentioned before.

Moreover, lexical symbols can also be annotated as case-insensitive to parse as such. The constructed abstract syntax tree contains lower-case symbols, but the original term is preserved via origin-tracking. For example:

ID = [a-zA-z][a-zA-Z0-9]* {case-insensitive}

can parse foo, Foo, FOo, fOo, foO, fOO or FOO. Whichever option generates a node "foo" in the abstract syntax tree. By consulting the origin information on this node, it is possible to know which term was used as input to the parser.

Template options

Template options are options that are applied to the current file. A template options section is structured as follows:

template options

  <TemplateOption*>

Multiple template option sections are not supported. If multiple template option sections are specified, the last one is used.

There are three kinds of template options.

keyword

Convenient way for setting up lexical follow restrictions for keywords. See the section on follow restrictions for more information. The structure of the keyword option is as follows:

keyword -/- <Pattern>

This will add a follow restriction on the pattern for each keyword in the language. Keywords are automatically detected, any terminal that ends with an alphanumeric character is considered a keyword.

Multiple keyword options are not supported. If multiple keyword options are specified, the last one is used.

Note that this only sets up follow restrictions, rejection of keywords as identifiers still needs to be written manually.

tokenize

Specifies which characters may have layout around them. The structure of a tokenize option is as follows:

tokenize : "<Character*>"

Consider the following grammar specification:

template options

  tokenize : "("

context-free syntax

  Exp.Call = <<ID>();>

Because layout is allowed around the ( and ) characters, there may be layout between () and ; in the template rule. If no tokenize option is specified, it defaults to the default value of ().

Multiple tokenize options are not supported. If multiple tokenize options are specified, the last one is used.

reject

Convenient way for setting up reject rules for keywords. See the section on rejections for more information. The structure of the reject option is as follows:

Symbol = keyword {attrs}

where Symbol is the symbol to generate the rules for. Note that attrs can be include any attribute, but by using reject, reject rules such as ID = "true" {reject} are generated for all keywords that appear in the templates.

Multiple reject template options are not supported. If multiple reject template options are specified, the last one is used.

2.5. Disambiguation

As we showed before, the semantics of SDF3 can be seen as two-staged. First, the grammar generates all possible derivations. Second, the disambiguation constructs remove a number of derivations that are not valid. Note that SDF3 actually performs some disambiguation when generating the parse table or during parsing.

Rejections

Rejections filter derivations. The semantics of a rejection is that the set of valid derivations for the left-hand side of the production will not contain the construction described on the right-hand side. In other words, the language defined by the sort on the left-hand side has become smaller, removing all the constructions generated by the rule on the right-hand side. Disambiguation by reject occurs at parse time (mostly).

A rule can be marked as rejected by using the attribute {reject} after the rule:

<Sort> = ... {reject}

The {reject} attribute works well for lexical rejections, especially keyword reservation in the form of productions like:

ID = "keyword" {reject}

Preferences

The preferences mechanism is another disambiguation filter that provides a post parse filter to parse forests. The attributes prefer and avoid are the only disambiguation constructs that compare alternative derivations after parsing.

Warning

prefer and avoid are deprecated and will be removed in a future version of Spoofax.

The following definition assumes that derivations are represented using parse forests with “packaged ambiguity nodes”. This means that whenever in a derivation there is a choice for several sub-derivations, at that point a special choice node (ambiguity constructor) is placed with all alternatives as children. We assume here that the ambiguity constructor is always placed at the location where a choice is needed, and not higher (i.e. a minimal parse forest representation). The preference mechanism compares the top nodes of each alternative:

  • All alternative derivations that have avoid at the top node will be removed, but only if other alternatives derivations are there that do not have avoid at the top node.
  • If there are derivations that have prefer at the top node, all other derivations that do not have prefer at the top node will be removed.

The preference attribute can be used to handle the case when two productions can parse the same input. Here is an example:

Exp.FunctionApp = <<Expr> <Expr*>>
Exp.Constructor = <<ID> <Expr>>  {prefer}

Priorities

Priorities are one of SDF3’s most often used disambiguation constructs. A priority section defines the relative priorities between productions. Priorities are a powerful disambiguation construct because it occurs at parse generation time. The idea behind the semantics of priorities is that productions with a higher priority “bind stronger” than productions with a lower priority. The essence of the priority disambiguation construct is that certain parse trees are removed from the ‘forest’ (the set of all possible parse trees that can be derived from a segment of code). The basic priority syntax looks like this:

context-free priorities

  <ProductionRef> >  <ProductionRef>

Where <ProductionRef> can either be <Sort>.<Cons> or the entire production itself.

Several priorities in a priority grammar are separated by commas. If more productions have the same priority they may be grouped between curly braces on each side of the > sign.

context-free priorities

  {<ProductionRef> <ProductionRef>}
                >  <ProductionRef>,
   <ProductionRef>
                >  <ProductionRef>

By default, the priority relation is automatically transitively closed (i.e. if A > B and B > C then A > C). To specify a non-transitive priority relation it is necessary to include a dot before the > sign (.>).

SDF3 provides safe disambiguation, meaning that priority relations only remove ambiguous derivations. Furthermore, SDF3 also allows tree filtering by means of indexed priorities such as:

context-free priorities

  <ProductionRef> <idx> >  <ProductionRef>

where the symbol at position idx (starting with 0) in the first production should not derive the second production.

An example defining priorities for the addition, subtraction and multiplication operators is listed below. Because addition and subtraction have the same priority, the are grouped together between brackets.

context-free priorities

  {Exp.Times} >
  {Exp.Plus Exp.Minus}

Associativity

Like with priorities, the essence of the associativity attribute is that certain parse trees are removed from the ‘forest’.

  • The left associativity attribute on a production P filters all occurrences of P as a direct child of P in the right-most argument. This implies that left is only effective on productions that are recursive on the right (as in A B C -> C).
  • The right associativity attribute on a production P filters all occurrences of P as a direct child of P in the left-most argument. This implies that right is only effective on productions that are recursive on the left ( as in C A B -> C).
  • The non-assoc associativity attribute on a production P filters all occurrences of P as a direct child of P in any argument. This implement that non-assoc is only effective if a production is indeed recursive (as in A C B -> C).
  • The assoc attribute means the same as left

Associativity declarations occur in two places in SDF3. The first is as production attributes. The second is as associativity declarations in priority groups.

An example on how to mention associativity as a production attribute is given below:

Exp.Plus = <<Exp> + <Exp>> {left}

In priority groups, the associativity has the same semantics as the associativity attributes, except that the filter refers to more nested productions instead of a recursive nesting of one production. The group associativity attribute works pairwise and commutative on all combinations of productions in the group. If there is only one element in the group the attribute is reflexive, otherwise it is not reflexive.

context-free priorities

  {left: Exp.Times} >
  {left: Exp.Plus Exp.Minus}

Restrictions

The notion of restrictions enables the formulation of lexical disambiguation strategies. Examples are “shift before reduce” and “longest match”. A restriction filters applications of productions for certain non-terminals if the following character (lookahead) is in a certain class. The result is that specific symbols may not be followed by a character from a given character class. A lookahead may consist of more than one character class (multiple lookahead). Restrictions come in two flavors:

  • lexical restrictions that apply to lexical non-terminals
  • context-free restrictions that apply to context-free non-terminals.

The general form of a restriction is:

<Symbol>+ -/- <Lookaheads>

The semantics of a restriction is to remove all derivations that produce a certain <Symbol>. The condition for this removal is that the derivation tree for that symbol is followed immediately by something that matches the lookahead declaration. Note that to be able to check this condition, one must look past derivations that produce the empty language, until the characters to the right of the filtered symbol are found. Also, for finding multiple lookahead matches, one must ignore nullable sub-trees that may occur in the middle of the matched lookahead.

In case of lexical restrictions <Symbol> may be either a literal or sort. In case of context-free restrictions only a sort or symbol is allowed. The restriction operator -/- should be read as may not be followed by. Before the restriction operator -/- a list of symbols is given for which the restriction holds.

As an example, the following restriction rule implements the “longest match” policy: an identifier can not be followed by an alpha-numeric character.

ID -/- [a-zA-Z0-9\_]

Layout-sensitive parsing

SDF3 supports definition of layout sensitive syntax by means of layout constraints. While we haven’t covered this feature in this documentation, the paper [S2] describes the concepts.

Layout Declarations

In the paper [S2], the authors describe layout constraints in terms of restrictions involving the position of the subtree involved in the constraint (0, 1, …), token selectors (first, left, last and right), and position selectors as lines and columns (line and col). This mechanism allows writing layout constraints to express alignment, offside and indentation rules, but writing such constraints is rather cumbersome and error prone. Alternatively, one may write layout constraints using layout declarations, which are more declarative specifications and abstract over lines, columns and token selectors as the original layout constraints from [S2]. Note: if you want to use layout declarations, you should specify the jsglr-version: layout-sensitive parameter for SDF3, see JSGLR configuration.

  • tree selectors

To specify which trees should be subject to a layout constraint, one may use: tree positions, SDF3 labeled non-terminals, or unique literals that occurs in the production. For example:

context-free syntax

  Stmt.IfElse = "if" Exp "then" Stmts "else" else:Stmts  {layout(
     indent "if" 3, else &&
     align 3 else &&
     align "if" "else"
  )}

In the layout constraint for the production above, else refers to the tree for the labeled non-terminal else:Stmts, "if" refers to the tree corresponding to the "if" literal and the number 3 correspond to the tree at position 3 in the parse tree (starting at 0, ignoring trees for LAYOUT?).

  • align

The layout constraint layout(align x y1, ..., yn) specifies that the trees indicated by the tree selectors yi should be aligned with the tree indicated by the tree selector x, i.e., all these trees should start in the same column. For example, if we consider the production above, the following program is correct according to the align constraints:

if x < 0 then
··x = 0
else
··y = 1

Whereas, the following program is incorrect because neither the if and else keyword align (align "if" "else"), nor the statements in the branches (align 3 else):

if x < 0 then
··x = 0
·else
···y = 1
  • align-list

The constraint align-list can be used to indicate that all subtrees within a list should be aligned. That is, a constraint layout(align-list x), where x is a tree selector for a list subtree, can be used to enforce such constraint. For example, consider the following production and its layout constraint:

context-free syntax

  Stmt.If = "if" Exp "then" then:Stmt*  {layout(
     align-list then
  )}

This constraint indicates that statements inside the list should be aligned. Therefore, the following program is correct according to this constraint:

if x < 0 then
··x = 0
··y = 4
··z = 2

And the following program is invalid, as the second statement is misaligned:

if x < 0 then
··x = 0
···y = 4
··z = 2
  • offside

The offside rule is very common in layout-sensitive languages. It states that all lines after the first one should be further to the right compared to the first line. For a description of how the offside rule can be modelled with layout constraints, refer to [S2]. An example of a declarative specification of the offside rule can be seen in the production below:

context-free syntax

  Stmt.Assign = <<ID> = <Exp>> {layout(offside 3)}

The layout constraint specifies that when the expression in the statement spams multiple lines, all following lines should be indented with respect to the column where the expression started. For example, the following program is valid according to this constraint:

x = 4 * 10
·····+ 2

However, the following program is not valid, as the second line of the expression starts at the same column as the first line:

x = 4 * 10
····+ 2

Note that if the expression is written on a single line, the constraint is also verified. That is, the following program successfully parses:

x = 4 * 10 + 2

It is also possible to use the offside relation on different trees. For example, consider the constraint in the following production:

context-free syntax

  Stmt.If = "if" Exp "then" then:Stmt*  {layout(
     offside "if" then
  )}

This constraint states that all lines (except the first) of the statements in the then branch should be indented with respect to the if literal. Thus, the following program is invalid according to this layout constraint, because the statement x = 2 should be indented with relation to the topmost if.

if x < 0 then
··if y < 0 then
x = 2

In general, an offside constraint involving more than a single tree is combined with indent constraint to enforce that the column of the first and all subsequent lines should be indented.

  • indent

An indent constraint indicates that the column of the first line of a certain tree should be further to the right with respect to another tree. For example, consider the following production:

context-free syntax

  Stmt.If = "if" Exp "then" then:Stmt*  {layout(
     indent "if" then
  )}

This constraint indicates that the first line of the list of statements should be indented with respect to the if literal. Thus, according to this constraint the following program is valid:

if x < 0 then
··x = 2

Note that if the list of statements in the then branch spams multiple lines, the constraint does not apply to its subsequent lines. For example, consider the following program:

if x < 0 then
··x = 2 + 10
* 4
y = 3

This program is still valid, since the column of the first line of the first assignment is indented with respect to the if literal. To indicate that the first and all subsequent lines should be indented, an offside constraint should also be included.

context-free syntax

  Stmt.If = "if" Exp "then" then:Stmt*  {layout(
     indent "if" then &&
     offside "if" then
  )}

With this constraint, the remainder of the expression * 4 should also be further to the right compared to the “if” literal. The following program is correct according to these two constraints, since the second line of the first assignment and the second assignment are also indented with respect to the if literal:

if x < 0 then
··x = 2 + 10
·* 4
·y = 3

Finally, all these layout declarations can be ignored by the parser and used only when generating the pretty-printer. To do that, prefix the constraint with pp- writing, for example, pp-offside or pp-align.

Todo

Part of this documentation is not yet written.