Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?
I am wondering this because I wish to investigate parsing, and also on the side, the limitation of a regex engine.
I'm not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize, perhaps this could be moved if so?
A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.
A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.
Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:
S -> BBB
B -> 0
B -> 1
The way to interpret this is to say that S
can be replaced by BBB
, and B
can be replaced by 0, and B
can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010
.
A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like
S -> AB
AB -> 1
A -> AA
B -> 0
Are not context free since they contain rules like AB -> 1
that have more than one non-terminal symbol on the left.