What is the Pumping Lemma in Layman's terms?

shsteimer picture shsteimer · Jan 20, 2009 · Viewed 35.6k times · Source

I saw this question, and was curious as to what the pumping lemma was (Wikipedia didn't help much).

I understand that it's basically a theoretical proof that must be true in order for a language to be in a certain class, but beyond that I don't really get it.

Anyone care to try to explain it at a fairly granular level in a way understandable by non mathematicians/comp sci doctorates?

Answer

Graphics Noob picture Graphics Noob · Dec 19, 2009

The pumping lemma is a simple proof to show that a language is not regular, meaning that a Finite State Machine cannot be built for it. The canonical example is the language (a^n)(b^n). This is the simple language which is just any number of as, followed by the same number of bs. So the strings

ab
aabb
aaabbb
aaaabbbb

etc. are in the language, but

aab
bab
aaabbbbbb

etc. are not.

It's simple enough to build a FSM for these examples:

FSM

This one will work all the way up to n=4. The problem is that our language didn't put any constraint on n, and Finite State Machines have to be, well, finite. No matter how many states I add to this machine, someone can give me an input where n equals the number of states plus one and my machine will fail. So if there can be a machine built to read this language, there must be a loop somewhere in there to keep the number of states finite. With these loops added:

FSM 2

all of the strings in our language will be accepted, but there is a problem. After the first four as, the machine loses count of how many as have been input because it stays in the same state. That means that after four, I can add as many as as I want to the string, without adding any bs, and still get the same return value. This means that the strings:

aaaa(a*)bbbb

with (a*) representing any number of as, will all be accepted by the machine even though they obviously aren't all in the language. In this context, we would say that the part of the string (a*) can be pumped. The fact that the Finite State Machine is finite and n is not bounded, guarantees that any machine which accepts all strings in the language MUST have this property. The machine must loop at some point, and at the point that it loops the language can be pumped. Therefore no Finite State Machine can be built for this language, and the language is not regular.

Remember that Regular Expressions and Finite State Machines are equivalent, then replace a and b with opening and closing Html tags which can be embedded within each other, and you can see why it is not possible to use regular expressions to parse Html