How do I generate sentences from a formal grammar?

Mark Cidade picture Mark Cidade · Mar 2, 2009 · Viewed 7.7k times · Source

What's a common way of generating sentences from a grammar?

I want an algorithm that's sort of the opposite of a parser. That is, given a formal context-free grammar (say LL), I want to generate an arbitrary sentence that conforms to that grammar. I use sentence here to mean any valid body of text, so it can actually be a whole program (even if it doesn't make any sense—as long as it's syntactially correct).

Example grammar:

program   : <imports> NEWLINE? <namespace>
imports   : ("import" <identifier> NEWLINE)* 
namespace : "namespace " <identifier> NEWLINE "{" <classes> "}" 
identifier: (A-Za-z_) (A-Za-z0-9_)*
...

Example generated program:

import jkhbhhuob
import aaaaa888_

namespace u8nFGubgykb
{ class ui0op_np { ... }
}

Answer

Bj&#246;rn Lindqvist picture Björn Lindqvist · Jul 20, 2010

Here is a Python example using the NLTK:

from nltk import parse_cfg, ChartParser
from random import choice

def produce(grammar, symbol):
    words = []
    productions = grammar.productions(lhs = symbol)
    production = choice(productions)
    for sym in production.rhs():
        if isinstance(sym, str):
            words.append(sym)
        else:
            words.extend(produce(grammar, sym))
    return words

grammar = parse_cfg('''
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
V -> 'shot' | 'killed' | 'wounded'
Det -> 'an' | 'my' 
N -> 'elephant' | 'pajamas' | 'cat' | 'dog'
P -> 'in' | 'outside'
''')

parser = ChartParser(grammar)

gr = parser.grammar()
print ' '.join(produce(gr, gr.start()))

The example is adapted from the book. The sentences generated are syntactically correct but still total gibberish.