Compiler-compiler

Shen-YACC is a compiler-compiler based on earlier work by Dr Gyorgy Lajos on his METALISP Ph.D. project which was in turn based on even earlier work by Alexander Birman on TDPL parsing. The syntax of YACC is based on the BNF notation of Backus. Used at its most basic level, YACC is a generator for recognisors based on grammars. More usefully, YACC can be used to develop efficient compilers for programming languages and transducers for natural language processing.

YACC as a Recognisor Generator

Consider the following grammar.

<sent> := <np> <vp>
<np> := <det> <n> | <name>
<det> := the | a
<n> := cat | dog
<name> := Bill | Ben
<vp> := <vtrans> <np>
<vtrans> := likes | chases

In YACC, this grammar would be represented as:

Shen 2010, copyright (C) 2010 Mark Tarver
www.lambdassociates.org, version 1.8
running under Common Lisp, implementation: CLisp 2.49
port 1.0 ported by Mark Tarver


(0-)(defcc <sent>
<np> <vp>;)
warning: <np> <vp> has no semantics.
<sent>

(1-)(defcc <det>
the; a;)
warning: the has no semantics.
warning: a has no semantics.
<det>

(2-) (defcc <np>
<det> <n>;
<name>;)
warning: <det> <n> has no semantics.
warning: <name> has no semantics.
<np>

(3-)(defcc <n>
cat; dog;)
warning: cat has no semantics.
warning: dog has no semantics.
<n>

(4-) (defcc <name>
bill; ben;)
warning: bill has no semantics.
warning: ben has no semantics.

(5-) (defcc <vp>
<vtrans> <np>;)
warning: <vtrans> <np> has no semantics.
<vp>

(6-) (defcc <vtrans>
likes; chases;)
warning: likes has no semantics.
warning: chases has no semantics.
<vtrans>

If semantic actions (i.e instructions on how to process the input) are not included, YACC warns the user and inserts a default semantic action (semantic completion). This default action appends non-terminals and conses terminals to form an output. The spacing is left to the judgement of the programmer, but ;s separate rules. Unlike Shen, Shen-YACC gives no significance to symbols beginning in uppercase (i.e. Bill is just a symbol like cat, and not a variable). When one of these definitions (e.g. for <sent>) is entered to the top level, it is compiled into Common Lisp by YACC with the message warning; no semantics in <np> <vp>.

The compiler generated acts as a recogniser for sentences of the language characterised by this grammar. If it is not a sentence of the language, then the failure object (fail) is returned. If the input to the compiler belongs to this language, then (fail) is not returned by the compiler and without semantic actions the program behaves as an identity function. The compiler is invoked by the higher-order function compile, that receives the name of a YACC function and parses its second input by that function.

(10-) (compile (function <sent>) [the cat likes the dog]) 
[the cat likes the dog]

(11-) (compile (function <sent>)[the cat likes the canary])
parse error

(12-) (compile (function <vp>) [chases the cat])
[chases the cat]

Note that names of YACC functions should always be enclosed in angles. YACC is sensitive to left-recursion which will force an infinite regress. YACC code is not type checked, but the code can be tracked just like regular code. Lists are constructed in YACC using […] or cons or list or any of the conventional methods. Unlike Shen, the constructor | cannot be used in the syntax of an expansion (i.e. to the left of :=), though it can be used to the right (in a semantic action) to perform consing. However […] can be used to the left of :=. <bcs>, below, recognises the inputs belonging to [ b m ][ c n ]. Variables are uppercase as in Shen.

(16-) (defcc <bcs>
[<bs>] [<cs>];)
warning: [cons <bs> []] [cons <cs> []] has no semantics.
<bcs>

(17-)
(defcc <bs>
b <bs>;
b;)
warning: b <bs> has no semantics.
warning: b has no semantics.
<bs>

(18-)
(defcc <cs>
c <cs>;
c;)
warning: c <cs> has no semantics.
warning: c has no semantics.
<cs>

(19-) (compile <bcs> [[b b b] [c c]])
[[[b b b]] [[c c]]]

Semantic Actions in YACC

Semantic actions are attached to grammar rules by following each rule by a :=. This YACC definition receives a list of as and changes them to bs.

(20-) (defcc <as>
a <as> := [b | <as>];
a := [b];)
<as>

(21-) (compile <as> [a a a a a])
[b b b b b]

The first rule says that any input of the form a <as> is to be translated into an output consisting of b consed to the translation of <as>. The syntax of <as> requires that the input be a non-empty list of as. So (compile <as> [a a a]) gives [b b b]. The second rule is the base case.

As in Shen, round brackets signify function applications and square ones form lists. The following reformulation is an example:

(24-) (define question
NP VP -> (append [(protect Is) it true that] NP VP [?]))
question

(25-) (compile <sent> [the cat likes the dog])
[Is it true that the cat likes the dog ?]

Note returning (kill) as the value of a semantic action causes the rule in which it is embedded to fail and no following rules in the YACC definition are considered. This corrsponds to the cut and fail combination found in Prolog.

Reserved Non-Terminals, Pattern Matching

<!> and <e> are both reserved non-terminals. <e> always succeeds consuming none of the input and under semantic completion returns the empty list. <!> always succeeds and consumes all of the input and under semantic completion returns that remaining input.

Variables and wildcards are allowed to pattern match under Shen-YACC as in Shen and lists can be embedded in the input. The | notation is not used in the parsing (to the left of :=) but can occur to the right.

Further Reading