What is a Context Free Grammar?

Ell picture Ell · Jul 15, 2011 · Viewed 29.6k times · Source

Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?

I am wondering this because I wish to investigate parsing, and also on the side, the limitation of a regex engine.

I'm not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize, perhaps this could be moved if so?

Answer

aegrisomnia picture aegrisomnia · Jul 15, 2011

A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.

A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.

Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:

S -> BBB
B -> 0
B -> 1

The way to interpret this is to say that S can be replaced by BBB, and B can be replaced by 0, and B can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010.

A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like

S -> AB
AB -> 1
A -> AA
B -> 0

Are not context free since they contain rules like AB -> 1 that have more than one non-terminal symbol on the left.