Computing leading and trailing sets for context-free grammar

TeoTN picture TeoTN · Feb 8, 2015 · Viewed 10.1k times · Source

I am seeking for a detailed algorithm describing how to generate leading and trailing sets for non-terminal symbols in context-free grammars.

I've found something like this: https://pl.scribd.com/doc/51358638/16/Operator-Precedence-Relations but I'm unsure about how it works. (See page 20)

Assume, that we have productions:

A -> YaZ | B

B -> b

then, it is said, that Leading(A) = {a}, Leading(A) = Leading(B) and Leading(B)={b}. And I have my doubts about this:

  1. Does it mean, that Leading(A) = {a, b} ? - I assume that in every step we do not replace the set already generated because of '=', but make sum of two sets.
  2. Does it mean, that Leading(B) is {b} or {a, b}? - Does the third rule mean, that we add Leading(B) to Leading(A) or that Leading(A) and Leading(B) are equivalent?

Answer

rici picture rici · Feb 8, 2015

Leading and Trailing are functions specific to generating an operator-precedence parser, which is only applicable if you have an operator precedence grammar. An operator precedence grammar is a special case of an operator grammar, and an operator grammar has the important property that no production has two consecutive non-terminals.

(An operator precedence grammar is, loosely speaking, an operator grammar which can be parsed with an operator precedence parser :-). But that's not important for now.)

Given an operator grammar, the function Leading (resp. Trailing) of a non-terminal produces the set of terminals which could be (recursively) the first (resp. last) terminal in some sentential form derived from that non-terminal.

Another possibly more intuitive definition is that a terminal is in the Leading set for a non-terminal if the terminal is "visible" from the beginning of a production. Non-terminals are "transparent", so a terminal could be visible either by looking through a leading non-terminal or by looking into a visible non-terminal.

For example, a standard expression grammar (which is an operator grammar; no production has two consecutive non-terminals):

expr   -> factor '*' expr
expr   -> factor
factor -> term '+' factor
factor -> term
term   -> ID
term   -> '(' expr ')'

From term, ID and ( are visible from the beginning, and ID and ) are visible from the end. expr is not visible from either side, because it is hidden by terminals, so we don't need to consider it.

From factor, + is visible from both ends, and factor also inherits the Leading and Trailing sets of term because term is visible from both ends. (factor is also visible from the end in itself, but that cannot add anything new to the Trailing set.)

Finally, from expr, * is visible from both ends, and expr inherits from factor.

So we end up with:

Non-terminal            Leading             Trailing
expr                    *, +, ID, (         *, +, ID, )
factor                  +, ID, (            +, ID, )
term                    ID, (               ID, )

From that, we're going to construct precedence relations. Basically, if you find

nonterminal TERMINAL

in any production, then you add the precedence relations TRAIL ⋗ TERMINAL for every TRAIL in Trailing(nonterminal). Similarly, every occurrence of

TERMINAL nonterminal

generates the relationships TERMINAL ⋖ LEAD for every LEAD in Leading(nonterminal). And finally, if you find

TERMINAL1 TERMINAL2

or

TERMINAL1 nonterminal TERMINAL2

then you generate the relationship TERMINAL1 ·=· TERMINAL2.

Once you've generated all the precedence relationships, you look at every pair of terminals T, U. If at most one precedence relation holds -- that is, T ⋖ U, T ⋗ U, T ·=· U or there is no relationship from T to U -- then you have an operator precedence grammar. (There is no connection between T, U and U, T. Precedence relationships are not antisymmetric and it is unfortunate that they are traditionally spelled with symbols that look like numeric comparison.)