How can I represent epsilon in a regular expression?

ubiquibacon picture ubiquibacon · Sep 15, 2010 · Viewed 12.1k times · Source

The text book teaches us to write regular expressions using the epsilon (ε) symbol, but how can I translate that symbol directly to code without having to completely rework my regular expression?

For instance, how would I write this regex which would catch all lowercase strings that either begin or end in a (or both).

Not 100% sure this is correct but...

((a|epsilon)[a-z]*a) | (a[a-z]*(a|epsilon))

So some strings that should match include:

a //single "a" starts or ends with "a"

aa //starts and ends with "a"

ab //starts with "a"

ba //ends with "a"

aba //starts and ends with "a"

aaaaaaaa //starts and ends with "a"

abbbbbbb //starts with "a"

bbbbbbba //ends with "a"

abbbbbba //starts and ends with "a"

asdfhgdu //starts with "a"

onoineca //ends with "a"

ahnrtyna //starts and ends with "a"

I only what to exchange epsilon for the correct symbol, I do not want to modify any part of the rest of the expression. Also I want to be clear, I am not actually checking for the epsilon symbol, I want to have a choice of a character or nothing (well not nothing... epsilon).

Does such a symbol exist?

Is what I want possible?

Answer

Konrad Rudolph picture Konrad Rudolph · Sep 15, 2010

Just omit the 𝜖, since it denotes the empty string:

([1-9]|)[0-9]*

There’s also a shortcut for this particular case:

([1-9]?)[0-9]*

The ? means zero or one occurrences of the preceding token.