converting context free grammar into regular expression

Ris picture Ris · Apr 10, 2014 · Viewed 10.9k times · Source

I am currently going over CFG and saw the answer and I am not sure how they got it. How did they get it to convert into Regular Expression from CFG here?

S -> aS|bX|a
X -> aX|bY|a
Y -> aY|a


answer:
R.E -> (a*(a+ba*a+ba*ba*a))

Answer

Grijesh Chauhan picture Grijesh Chauhan · Apr 12, 2014

You should learn the basic rules that I have written in my answer "constructing an equivalent regular grammar from a regular expression", those rules will help you in converting "a regular expression into right or left liner grammar" or "a right or left liner grammar into regular expression" - both.

Though, more than one regular expressions (and grammars/automata) can be possible for a language. Below, I have tried to explain how to find regular expression given in answer for the question in your textbook. Read each step precisely and linked answer(s) so that you can learn approaches to solve such questions yourself next time.

At first step, to answering such question you should be clear "what does language this grammar generate?" (similarly, if you have an automata then try to understand language represented by that automata).

As I said in linked answer, grammar rules like: S → eS | e are corresponding to "plus clouser" and generates strings e+. Similarly, you have three pairs of such rules to generate a+ in your grammar.

S → aS | a   
X → aX | a  
Y → aY | a    

(Note: a+ can also be written as a*a or aa* – describes one or more 'a'.)

Also notice in grammar, you do not have any "null production" e.g. A → ∧, so non-of the variable S, X or Y are nullable, that implies empty string is not a member of language of grammar, as: ε ∉ L(G).

If you notice start-variable's S productions rules:

S → aS | bX | a

Then it is very clear that strings ω in language can either start with symbol 'a' or with 'b' (as you have two choices to apply S productions either (1) S → aS | a that gives 'a' as the first symbol in ω, or (2) S → bX that use to produce strings those start with symbol 'b').

Now, what are the possible minimum length strings ω in L(G)? – minimum length string is "a" that is possible using production rule: S → a.

Next note that "b" ∉ L(G) because if you apples S → bX then later on you have to replace X in sentential form bX using some of X's production rules, and as we know X is also not nullable hence there would be always some symbol(s) after 'b' – in other words sentimental from bX derives &midω&mid &geq 2.

Form above discussion, it is very clear that using S production rules you can generate sentential forms either a*a or a*bX, in two steps:

  1. For a* use S → aS repeatedly that will give S &zigrarr a*S (symbol &zigrarr means more than one steps)

  2. Replace S in rhs of S &zigrarr a*S to get either by a*a or a*bX

Also, "a*a or a*bX" can be written as S &zigrarr a*(a + bX) or S &zigrarr (a*(a + bX)) if you like to parenthesizes complete expression.

Now compare production rules of S and X both are the same! So as I shown above for S, you can also describe for X that it can use to generate sentential forms X &zigrarr (a*(a + bY)).

To derive the regular expressions given in answer replace X by (a*(a + bY)) in S &zigrarr a*(a + bX), you will get:

S &zigrarr a*(a + b X )  
S &zigrarr a*(a + b (a*(a + bY)) )

And now, last Y production rules are comparatively very simple - just use to create "plus clouser" a+ (or a*a).

So let's replace Y also in S derived sentential form.

S &zigrarr a*(a + b(a*(a + bY)))   
  &zigrarr a*(a + b(a*(a + ba*a)))

Simplify it, apply distribution low twice to remove inner parenthesis and concatenate regular expressions – P(Q + R) can be written as PQ + PR.

  &zigrarr a*(a + b(a*(a + ba*a)))     
  &zigrarr a*(a + b(a*a + a*ba*a))     
  &zigrarr a*(a + ba*a + ba*ba*a)

: &plus in regular expression in formal languages use in two syntax (i) &plus as binary operator means – "union operation" (ii) &plus as unary superscript operator means – "plus clouser"
: In regex in programming languages + is only uses for "plus clouser"
: In regex we use &mid symbol for union, but that is not exactly a union operator. In union (A ∪ B) is same as (B ∪ A) but in regex (A &mid B) may not equals to (B &mid A)