I am seeking for a detailed algorithm describing how to generate leading and trailing sets for non-terminal symbols in context-free grammars.
I've found something like this: https://pl.scribd.com/doc/51358638/16/Operator-Precedence-Relations but I'm unsure about how it works. (See page 20)
Assume, that we have productions:
A -> YaZ | B
B -> b
then, it is said, that Leading(A) = {a}, Leading(A) = Leading(B) and Leading(B)={b}. And I have my doubts about this:
Leading
and Trailing
are functions specific to generating an operator-precedence parser, which is only applicable if you have an operator precedence grammar. An operator precedence grammar is a special case of an operator grammar, and an operator grammar has the important property that no production has two consecutive non-terminals.
(An operator precedence grammar is, loosely speaking, an operator grammar which can be parsed with an operator precedence parser :-). But that's not important for now.)
Given an operator grammar, the function Leading
(resp. Trailing
) of a non-terminal produces the set of terminals which could be (recursively) the first (resp. last) terminal in some sentential form derived from that non-terminal.
Another possibly more intuitive definition is that a terminal is in the Leading set for a non-terminal if the terminal is "visible" from the beginning of a production. Non-terminals are "transparent", so a terminal could be visible either by looking through a leading non-terminal or by looking into a visible non-terminal.
For example, a standard expression grammar (which is an operator grammar; no production has two consecutive non-terminals):
expr -> factor '*' expr
expr -> factor
factor -> term '+' factor
factor -> term
term -> ID
term -> '(' expr ')'
From term
, ID
and (
are visible from the beginning, and ID
and )
are visible from the end. expr
is not visible from either side, because it is hidden by terminals, so we don't need to consider it.
From factor
, +
is visible from both ends, and factor
also inherits the Leading and Trailing sets of term
because term
is visible from both ends. (factor
is also visible from the end in itself, but that cannot add anything new to the Trailing set.)
Finally, from expr
, *
is visible from both ends, and expr
inherits from factor
.
So we end up with:
Non-terminal Leading Trailing
expr *, +, ID, ( *, +, ID, )
factor +, ID, ( +, ID, )
term ID, ( ID, )
From that, we're going to construct precedence relations. Basically, if you find
nonterminal TERMINAL
in any production, then you add the precedence relations TRAIL ⋗ TERMINAL
for every TRAIL
in Trailing(nonterminal)
. Similarly, every occurrence of
TERMINAL nonterminal
generates the relationships TERMINAL ⋖ LEAD
for every LEAD
in Leading(nonterminal)
. And finally, if you find
TERMINAL1 TERMINAL2
or
TERMINAL1 nonterminal TERMINAL2
then you generate the relationship TERMINAL1 ·=· TERMINAL2
.
Once you've generated all the precedence relationships, you look at every pair of terminals T, U
. If at most one precedence relation holds -- that is, T ⋖ U, T ⋗ U, T ·=· U
or there is no relationship from T
to U
-- then you have an operator precedence grammar. (There is no connection between T, U
and U, T
. Precedence relationships are not antisymmetric and it is unfortunate that they are traditionally spelled with symbols that look like numeric comparison.)