In learning Regular Expressions it had me wondering how the underlying engine works. Probably more specifically, I'd like to know more about how it evalutates, prioritizies and parses the expression. I feel the RegEx engine is a blackbox to me, and I would really enjoy deciphering it.
So I'd like to ask if there are some great resources that I could read up on that discuss RegEx engine theory.
*Note: I am not interested in building an engine, just learning the inner workings of it.
There are two main classes of regex engines.
Those based on Finite State Automaton. These are generally the fastest. They work by building a state machine, and feeding it characters from the input string. It is difficult, if not impossible, to implement some more advanced features in engines like this.
Examples of FSA based engines:
Those based on back-tracking. These often compile the pattern into byte-code, resembling machine instructions. The engine then executes the code, jumping from instruction to instruction. When an instruction fails, it then back-tracks to find another way to match the input.
Examples of back-tracking based engines:
For more information:
If you want me to expand on something, post a comment.