What's a common way of generating sentences from a grammar?
I want an algorithm that's sort of the opposite of a parser. That is, given a formal context-free grammar (say LL), I want to generate an arbitrary sentence that conforms to that grammar. I use sentence here to mean any valid body of text, so it can actually be a whole program (even if it doesn't make any sense—as long as it's syntactially correct).
Example grammar:
program : <imports> NEWLINE? <namespace>
imports : ("import" <identifier> NEWLINE)*
namespace : "namespace " <identifier> NEWLINE "{" <classes> "}"
identifier: (A-Za-z_) (A-Za-z0-9_)*
...
Example generated program:
import jkhbhhuob
import aaaaa888_
namespace u8nFGubgykb
{ class ui0op_np { ... }
}
Here is a Python example using the NLTK:
from nltk import parse_cfg, ChartParser
from random import choice
def produce(grammar, symbol):
words = []
productions = grammar.productions(lhs = symbol)
production = choice(productions)
for sym in production.rhs():
if isinstance(sym, str):
words.append(sym)
else:
words.extend(produce(grammar, sym))
return words
grammar = parse_cfg('''
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
V -> 'shot' | 'killed' | 'wounded'
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'cat' | 'dog'
P -> 'in' | 'outside'
''')
parser = ChartParser(grammar)
gr = parser.grammar()
print ' '.join(produce(gr, gr.start()))
The example is adapted from the book. The sentences generated are syntactically correct but still total gibberish.