Since I have just understood what this is all about, and it’s fresh in my mind I’m writing this WAY ahead of time if I ever get this far in describing the dragon book. Anyways.

I have had problems understanding the LR1 closure set algorithm as described in every book, every googling result and every powerpoint presentation I’ve found while googling. So now that I’ve understood it, I’m going to describe it for the greater benefit of anyone else googling it. Take this grammar:

- ‘S -> S
- S -> L = R
- S -> R
- L -> * R
- L -> id
- R -> L

This is the books example of a non SLR grammar. It is a whole lot better than the grammar used to describe the algorithm, since with that grammar you can implement LR1 closure wrong and still get it to run (which I did). Anyway, here is the correct I0 closure set for this grammar

- ‘S -> •S, $
- S -> •L = R, $
- S -> •R, $
- L -> •* R, =/$
- L -> •id, =/$
- R -> •L , $

I’m assuming you understand how to do a LR0 closure and that you have the FIRST function implemented. You’ll also need to know for every terminal if it is nullable. So the main difference which is so vaguely described in the book is how you do to carry the proper lookahead symbols around with you at all times. Starting with the augmented grammars start state ‘S -> •S you first add an end of input token to it ($). Now you do the closure algorithm. This is done like this if done without any optimizations whatsoever:

- Create an output list of LR1 states, add your input state to it. We call this list Closure
- For each LR1Item in Closure
- Look at the item to the right of the dot, lets call it SymbolRightOfDot. If there is no symbol, go to step 2 for the next item in our Closure list
- To generate the lookaheads, we are going to iterate through the symbols in the production rule in our Lr1Item that we are generating closure for. Starting with the item
*one step to the right*of SymbolRightOfDot. If there is such a symbol add the contents of FIRST for the symbol right of the dot. If the symbol was nullable, continue with the next symbol else exit the loop - If the previous loop looped through to the end of the production rule without encountering any non-nullable symbols then you add every previously computed lookahead of the Lr1Item that you were investigating to the list
- Ok, so now you have all the lookaheads find every rule in the grammar that has SymbolRightOfDot as the resulting symbol. Add a LR1 item to Closure with the dot at the beginning, the production rule and every lookahead to output if a corresponding Lr1Item hasn’t been already added.
- Repeat from step 2 until the list is stable, i.e. when nothing gets added any more

Note that when you calculate LR1 items you cannot as you can when generating SLR grammars use a set of nonterminals to determine if you have investigated a nonterminal and added all the new items to the closure. If you do that you’ll generate incorrect lookaheads but the damn example in the dragon book *will work*.

Let’s do this for the example grammar. For reference, the contents of FIRST are S={*,id}, L={*,id}, R={*,id}.

None of the symbols are nullable.

Starting with the first rule, we generate an LR1 item. It looks like this ‘S->•S, $. SymbolRightOfDot is S. Doing the lookahead calculations means looking at the symbol one step to the right of SymbolRightOfDot. There is no such symbol. This means we’ve iterated to the end of the list of production symbols and need to add everything in the lookahead list of the LR1 item to our new lookahead list. This list now contains only one symbol, $.

As per step 6, we find every rule that has S as the result. There are two of those, S -> L = R and S -> R. We now add those to Closure with the lookahead that we’ve found. So in closure we have this:

- ‘S -> •S, $
- S -> •L = R, $
*new* - S -> •R, $
*new*

We added things this time, so we repeat every step again. Item 1 doesn’t yield anything new. Item 2 is yields something. Doing the lookahead, we look at the symbol to the right of SymbolRightOfDot. In this case it’s an equals sign (=). This is a terminal, those are never nullable, so we add = to the list of lookaheads. Note that we do not add $ to the lookahead list since we never reached the end of the rule. Again, as per step 6, we find every rule that results in a L symbol, which is the symbol to the right of the dot of rule 2. There are two of those, L -> * R and L -> id. We can now add two new items to our output which looks like this:

- ‘S -> •S, $
- S -> •L = R, $
- S -> •R, $
- L -> •* R, =
*new* - L -> •id, =
*new*

Note that the rules 4 and 5 doesn’t yet have the $ lookahead that they will have when we are done.

Since we added stuff, we continue again from the top. Items 1 and 2 doesn’t give any new items. Item 3 however does. We look at the symbol to the right of the dot, in this case it is an R. To the right of the R we find nothing. So we add $ as a lookahead. There is only one rule with R as the result: R -> L. This gives us one new rule to add, and our output list now looks like this:

- ‘S -> •S, $
- S -> •L = R, $
- S -> •R, $
- L -> •* R, =
- L -> •id, =
- R -> •L , $
*new*

Continuing once more from the top nothing yields anything except rule 6. At the right of the dot is symbol L but nothing after that, so the only lookahead is $. Notice that we are re-investigating L now but with a different lookahead! Looking for grammar rules that results in L gives us the same two items again, L -> * R and L -> id. So when we are generating the LR1 items, we will find two items that already exists, but they are missing the $ lookahead. Adding the lookahead to them results in the final list. Doing the algorithm again results in no more items.

- ‘S -> •S, $
- S -> •L = R, $
- S -> •R, $
- L -> •* R, =/$
*new $ lookahead!* - L -> •id, =/$
*new $ lookahead!* - R -> •L , $

I hope this makes things clearer for the algorithm interested reader. If I’ve made any mistakes in this, please do let me know. I do not want to spread disinformation if I can avoid it.

that was very helpful! Thanks!

Thank you very much for writing this, I have an exam tomorrow and it helped A LOT!

Glad to help! The algorithm really isn’t that hard, and though I doubt that I have made the best possible job in the world describing it there are quite few descriptions of this that isn’t exactly the same material as found in the venerable dragon book itself. And all of them use mathematical terms which are at least in my book quite hard to grasp. Good luck on your exam!

Thanks for the great explanation.

Theres only one thing I don’t understand – what to do with the lookaheads when calculating successor states. Any chance you could enlighten me?

I assume you mean the result of the GOTO operation?

Doing a goto is essentially just taking the subset of a set of LR1 states that has a given symbol to the right of the dot and advancing the dot one step. This gives you a new set of LR1 items. In this operation you don’t alter the lookaheads, the lookahead that the items had when you did the goto are the same as after the goto. After you have the goto set you apply the same closure to that set to get a new LR1 item set.

The lookaheads only affect the closure algorithm to differentiate between sets of states that would have been the same in the context of a SLR parser.

In essence, the goto operation is exactly the same as it was when you did this with LR0 or SLR parsers.

I hope that made some sense to you.

Yes that was exactly what I meant.

I thought I should preserve the lookaheads across the goto but I couldn’t find anything to tell me conclusivly.

Thanks for the help