Problem Detail: I am interested in testing for a given string (w) whether it is in the language (L) defined by a nondeterministic finite automaton (A). I have some blurry point in my mind. What is the complexity of this test by using the NFA and backtracking? Assume my NFA has m states and string (w) has length n (|w| = n). For regular expressions, by using Thompson’s algorithm, the resultant NFA has the same complexity (worst case) if I use backtracking? What are the methods/algorithms other than backtracking and converting to DFA?
Asked By : Deniz
Answered By : Luke Mathieson
A naïve backtracking algorithm will still result in exponential behaviour ($O(2^{n})$ where $n$ is the length of the input string). This behaviour can occur even with very small automata (i.e. where the number of states $m$ is a fixed constant). Consider for example the NFA $M$ below: The alphabet is simply $Sigma ={0}$, and the language is $L(M) = {0^{k}mid k in mathbb{N}} = Sigma^{ast}$. If our algorithm chooses poorly for even the first step (for example if we use the obvious choice and go by state ID), then we’re stuck between $q_{0}$ and $q_{1}$, where at each point in the input, we have the choice of moving to the opposite state, or staying where we are – i.e. a binary choice for each character, for the entire length of the string, giving $2^{n}$ steps before we manage to backtrack to $q_{2}$ and then move along the correct path to $q_{4}$. So even with only 4 states and where every string is valid, we can still get terrible performance from a backtracking approach. There is a much better approach however. Even without converting the NFA to a DFA we can decide acceptance in polynomial time. As we are dealing with a regular language, we don’t need to remember the entire computation history up to this point (which is what backtracking does), we only need to know what state we are up to at the current point. In a DFA this is very easy, as we can only be in one state at any point, but an NFA (without knowing the final computation path) allows more possibilities. However we can simply keep track of these possibilities. The algorithm is as follows:
- Initialise $S_{0}$ to ${s_{0}}$ where $s_{0}$ is the start state.
- For each symbol of the input $c_{i}$ with $1 leq i leq n$, for each state $s in S_{i-1}$, let $E(s,c_{i})$ be the set of states reachable from $s$ using $c_{i}$ and any number of $varepsilon$-moves1. Set $C_{i} = C_{i} cup E(s,c_{i})$.
- If there is an accepting state in the set $C_{n}$, return $mathrm{YES}$, otherwise return $mathrm{NO}$.
So all the algorithm is really doing is keeping track of all the states we could be up to after having read $k$ symbols from the input. Assuming that the NFA is fixed, this algorithm is $O(n)$-time in the length of the input string. If the NFA is part of the input, it’s a little more complicated, but still polynomial time – $O(m^{2}n)$ where $m$ is the number of states (if I haven’t missed an implicit loop in there). Footnotes.
- That is, we take the $varepsilon$-closure of the state $s$, then from each of these states see if we can move using symbol $c_{i}$, then for those where we can, we take the $varepsilon$-closure again, and this gives us $E(s,c_{i})$.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/33678