Question Detail: I have a homework assignment where I need to convert a grammar into LL(1). I’ve already removed the left recursion, but I’m having trouble doing left-factoring. All of the examples I’ve found are simple, and look something like this: A -> aX | aY
becomes:
A -> aZ
Z -> X | Y I understand that. However, my grammar looks more like this:
becomes:
A -> aZ
Z -> X | Y I understand that. However, my grammar looks more like this:
X -> aE | IXE | (X)E E -> IE | BXE | ϵ I -> ++ | -- B -> + | - | ϵ
I’m not sure how to apply the simpler example to this. I’ve been trying for at least a couple of hours and I’ve lost track of all of the things I’ve tried. Generally, my attempts have looked something like this:
X -> X' | IXE X' -> aE | (X)E E -> IE | BIX'E | BX'E | ϵ
And I then try to convert the E rules into ones having only one production starting with + or -:
X -> X' | IXE X' -> aE | (X)E B' -> + | - E -> IE | B'IX'E | IX'E | B'X'E | X'E | ϵ
And then…
X -> X' | IXE X' -> aE | (X)E B' -> + | - E -> +P | -M | ϵ P -> +E | IX'E | +X'E | X'E M -> -E | IX'E | -X'E | X'E
And so on. But I continually end up with a lot of extra nonterminals, and some very long productions / chains of productions, without actually having left-factored it. I’m not sure how to approach this – I can’t seem to eliminate some nonterminal having multiple productions starting with a + and with a -.
Asked By : Kami’s Aibou
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/4862
Answered By : Raphael
Let’s have a look at your grammar: $qquad begin{align} X &to aE mid IXE mid (X)E E &to IE mid BXE mid varepsilon I &to text{++} mid text{–} B &to text{+} mid text{-} mid varepsilon end{align}$ Note that $X$ does not need left-factoring: all rules have disjoint FIRST sets¹. If you want to make this obvious, you can drop $I$ and inline it: $qquad begin{align} X &to aE mid text{++}XE mid text{–}XE mid (X)E E &to text{++}E mid text{–}E mid BXE mid varepsilon B &to text{+} mid text{-} mid varepsilon end{align}$ Similarly, we can inline $B$: $qquad begin{align} X &to aE mid text{++}XE mid text{–}XE mid (X)E E &to text{++}E mid text{–}E mid text{+}XE mid text{-}XE mid XE mid varepsilon end{align}$ Now we see that we actually have to do left-factoring on $E$: we have obvious conflicts, and we get additional conflicts via $XE$. So, let’s inline $X$ once at $XE$: $qquad begin{align} X &to aE mid text{++}XE mid text{–}XE mid (X)E E &to text{++}E mid text{–}E mid text{+}XE mid text{-}XE mid aEE mid text{++}XEE mid text{–}XEE mid (X)EE mid varepsilon end{align}$ And now we can left-factor as easily as in your example: $qquad begin{align} X &to aE mid text{++}XE mid text{–}XE mid (X)E E &to text{+}P mid text{-}M mid aEE mid (X)EE mid varepsilon P &to text{+}E mid XE mid text{+}XEE M &to text{-}E mid XE mid text{-}XEE end{align}$ By now we can see that we are not getting anywhere: by factoring away $text{+}$ or $text{-}$ from the alternatives, we dig up another $X$ which again has both $text{+}$ and $text{-}$ in its FIRST set. So let’s have a look at your language. Via $qquad displaystyle X Rightarrow aE Rightarrow^* aI^n E Rightarrow aI^nBXE$ and $qquad displaystyle X Rightarrow aE Rightarrow^* aI^n E Rightarrow aI^nIE$ you have arbitrarily long prefixes of the form $+^+$ which end differently, semantic-wise: an LL(1) parser can not decide whether any given (next) $text{+}$ belongs to a pair — which would mean choosing alternative $IE$ — or comes alone — which would mean choosing $BXE$. In consequence, it doesn’t look like your language can be expressed by any LL(1) grammar, so trying to convert yours into one is futile. It’s even worse: as $BXE Rightarrow BIXEE Rightarrow^* BI^n X E^n E$, you can not decide to chose $BXE$ with any finite look-ahead. This is not a formal proof, but it strongly suggests that your language is not even LL. If you think about what you are doing — mixing Polish notation with unary operators — it is not very surprising that parsing should be hard. Basically, you have to count from the left and from the right to identify even a single $B$-$text{+}$ in a long chain of $text{+}$. If I think of multiple $B$-$text{+}$ in a chain, I’m not even sure the language (with two semantically different but syntactically equal $text{+}$) can be parsed deterministically (without backtracking) at all.
- That would be the sets of terminals that can come first in derivations of a non-terminal/rule alternative.