I have trouble understanding how to compute the lookaheads for the LR(1)-items.
Lets say that I have this grammar:
S -> AB
A -> aAb | a
B -> d
A LR(1)-item is an LR(0) item with a lookahead. So we will get the following LR(0)-item for state 0:
S -> .AB , {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}
State: 1
A -> a.Ab, {lookahead}
A -> a. ,{lookahead}
A -> .aAb ,{lookahead}
A ->.a ,{lookahead}
Can somebody explain how to compute the lookaheads ? What is the general approach ?
Thank you in advance
The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form
S -> .w ($)
for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.
Next, for any state that contains an item of the form A -> x.By (t), where x is an arbitrary string of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes).
Let's try this out on your grammar. We start off by creating an item set containing
S -> .AB ($)
Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = d, so all of the productions we introduce will have lookahead d. This gives
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:
A -> a.Ab (d)
A -> a. (d)
Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = b. This gives
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:
[0]
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
[1]
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
[2]
A -> a.Ab (b)
A -> a. (b)
A -> .aAb (b)
A -> .a (b)
[3]
A -> aA.b (d)
[4]
A -> aAb. (d)
[5]
S -> A.B ($)
B -> .d ($)
[6]
B -> d. ($)
[7]
S -> AB. ($)
[8]
A -> aA.b (b)
[9]
A -> aAb. (b)
In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!
Hope this helps!