Building the AST

Parse::Eyapp facilitates the construction of abstract syntax trees (AST) through the %tree directive. Nodes in the AST are blessed in the production name. A rhs can be named using the %name IDENTIFIER directive. For each rhs name a class/package with name IDENTIFIER is created.

Symbolic tokens (like NUM PRINT or VAR) are considered by default semantic tokens. String literals (like '+', '/', etc.) are - unless explictly declared using the semantic token directive - considered syntactic tokens. When building the AST syntactic tokens do not yield new nodes. Semantic tokens however have their own. Thus when feed with input b=2*a the generated parser produces the following AST2:

EXPS(
  ASSIGN(
    VAR(TERMINAL[b]),
    TIMES(
      NUM(TERMINAL[2]),
      VAR(TERMINAL[a]))
  )
)
Nodes of the AST are hashes that can be decorated with new keys/attributes. The only reserved field is children which is a reference to the array of children. Nodes named TERMINAL are built from the tokens provided by the lexical analyzer. The couple ($token, $attribute) returned by the lexical analyzer is stored under the keys token and attr. TERMINAL nodes also have the attribute children which is set to an anonymous empty list. Observe the absence of TERMINAL nodes corresponding to tokens '=' and '*'. If we change the status of '*' and '=' to semantic using the %semantic token directive:
1   %semantic token '*' '='
2   %right  '='
3   ....  etc.
we get a - concrete - syntax tree:
EXPS(
  ASSIGN(
    VAR(TERMINAL[b]),
    TERMINAL[=],
    TIMES(
      NUM(TERMINAL[2]),
      TERMINAL[*],
      VAR(TERMINAL[a])
    ) # TIMES
  ) # ASSIGN
)
Let us now consider the input 2*(a+1). The parser yields the tree:
EXPS(
  TIMES(
    NUM(
     TERMINAL[2]),
     exp_14(
       PLUS(
         VAR(TERMINAL[a]),
         NUM(TERMINAL[1]))
       ) # PLUS
  ) # TIMES
)
Two features are noticeable: the parenthesis rule exp: '(' exp ')' had no name and got automatically one: exp_14. The name of a rhs by default results from concatenating the left hand side of the rule with the ordinal number of the rule3. The second is that node exp_14 is useless and can be suppressed.

The %tree directive can be accompanied of the %bypass clause. A %tree bypass produces an automatic bypass of any node with only one child at tree-construction-time. A bypass operation consists in returning the only child of the node being visited to the father of the node and re-typing (re-blessing) the node in the name of the production4.

Changing the line %tree by %tree bypass in file Infix.eyp we get a more suitable AST for input 2*(a+1):

EXPS(TIMES(NUM[2],PLUS(VAR[a],NUM[1])))

The node exp_14 has disapeared in this version since the bypass operation applies to the rhs of the rule exp: '(' exp ')': Tokens '(' and ')' are syntactic tokens and therefore at tree construction time only one child is left. Observe also the absence of TERMINAL nodes. Bypass clearly applies to rules exp: NUM and exp: VAR since they have only one element on their rhs. Therefore the TERMINAL node is re-blessed as NUM and VAR respectively.

A consequence of the global scope application of %tree bypass is that undesired bypasses may occur. Consider the tree rendered for input -a*2:

EXPS(TIMES(NEG,NUM))

What happened? The bypass is applied to the rhs '-' exp. Though the rhs has two symbols, token '-' is a syntactic token and at tree-construction-time only exp is left. The bypass operation applies when building this node. This undesired bypass can be avoided applying the no bypass directive to the production:

 exp : %no bypass NEG
       '-' exp %prec NEG
Now the AST for -a*2 is correct:
EXPS(TIMES(NEG(VAR),NUM))

Eyapp provides operators +, * and ? for the creation of lists and optionals as in:

line: sts <EXPS + ';'>
which states that a line is made of a non empty list of EXPS separated by semicolons. By default the class name for such list is _PLUS_LIST. The %name directive can be used to modify the default name:
line: sts <%name EXPS + ';'>

Explicit actions can be specified by the programmer. They are managed as anonymous subroutines that receive as arguments the attributes of the symbols in the rule and are executed each time a reduction by that rule occurs. When running under the %tree directive this provides a mechanism to influence the shape of the AST. Observe however that the grammar in the example is clean of actions: Parse::Eyapp allowed us to produce a suitable AST without writing any explicit actions.

Procesadores de Lenguaje 2007-03-01