The text book teaches us to write regular expressions using the epsilon (ε)
symbol, but how can I translate that symbol directly to code without having to completely rework my regular expression?
For instance, how would I write this regex which would catch all lowercase strings that either begin or end in a
(or both).
Not 100% sure this is correct but...
((a|epsilon
)[a-z]*
a) | (a[a-z]*
(a|epsilon
))
So some strings that should match include:
a //single "a" starts or ends with "a"
aa //starts and ends with "a"
ab //starts with "a"
ba //ends with "a"
aba //starts and ends with "a"
aaaaaaaa //starts and ends with "a"
abbbbbbb //starts with "a"
bbbbbbba //ends with "a"
abbbbbba //starts and ends with "a"
asdfhgdu //starts with "a"
onoineca //ends with "a"
ahnrtyna //starts and ends with "a"
I only what to exchange epsilon
for the correct symbol, I do not want to modify any part of the rest of the expression. Also I want to be clear, I am not actually checking for the epsilon symbol, I want to have a choice of a character or nothing (well not nothing... epsilon).
Does such a symbol exist?
Is what I want possible?
Just omit the 𝜖, since it denotes the empty string:
([1-9]|)[0-9]*
There’s also a shortcut for this particular case:
([1-9]?)[0-9]*
The ?
means zero or one occurrences of the preceding token.