Converting grammar to Chomsky Normal Form?

tehman picture tehman · Sep 19, 2011 · Viewed 23.7k times · Source

Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.

S -> AB | aB
A -> aab|lambda
B -> bbA

Ok so the first thing I did was add a new start variable S0

so now I have

S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA

then I removed all of the lambda rules:

S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb

Then I checked for S->S and A->B type rules which did not exist. And that was the answer I came up with, do I need to do anything further or did I do anything wrong?

Answer

Nayuki picture Nayuki · Sep 19, 2011

Wikipedia says:

In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:

  • A -> BC, or
  • A -> α, or
  • S -> ε

where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.

Continuing your work:

S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb

Instead of using | to denote different choices, split a rule into multiple rules.

S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb

Create new rules Y -> a and Z -> b because we will need them soon.

S0 -> S
S -> AB
S -> aB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b

S -> aB is not of the form S -> BC because a is a terminal. So change a into Y:

S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> bb
Y -> a
Z -> b

Do the same for the B -> bb rule:

S0 -> S
S -> AB
S -> YB
S -> B
A -> aab
B -> bbA
B -> ZZ
Y -> a
Z -> b

For A -> aab, create C -> YY; for B -> bbA, create D -> ZZ:

S0 -> S
S -> AB
S -> YB
S -> B
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b

For S -> B, duplicate the one rule where S occurs on the right hand side and inline the rule:

S0 -> B
S0 -> S
S -> AB
S -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b

Deal with the rules S0 -> B and S0 -> S by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):

S0 -> DA
S0 -> ZZ
S0 -> AB
S0 -> YB
A -> CZ
C -> YY
B -> DA
D -> ZZ
B -> ZZ
Y -> a
Z -> b

And we're done. Phew!