[Solved]: Efficient algorithm for iterated find/replace

Problem Detail: I’m looking for an algorithm for doing iterated find/replace, where the act of finding the replacement list of tokens for a given find is slow. Specifically: I have a function, f, that maps a sequence of tokens to either a shorter list of tokens or None. However, it is slow. I want to repeatedly try to replace the (first, in the case of ties) shortest unexplored subsequence in the input with the result of applying the function to it, assuming that there is a valid replacement for it. However, the naive way to do this ends up with exponential runtime. Is there a better algorithm for doing this? (So, for example, if f maps ba -> b, aaa -> ab, and aba -> bb, and I apply the algorithm to aaaaa, I want to get (aaaaa->abaa->aba->)ab, not (aaaaa->abaa->bba->)bb or any of the other sequences of replacements possible, if any.) (Note: this example uses characters for simplicity)

Asked By : TLW

Answered By : Rick Decker

This isn’t a complete answer, but it provides some context that’s far too long to put in a comment. What you’ve described is an instance of a string rewriting system, also known as a semi-Thue system. Start with a finite alphabet, $Sigma$, and a binary relation, $R$, on strings over $Sigma$. For your example, we’ll have $R$ defined by $$ ba, R, b,quad aaa, R, ab,quad aba, R, bb $$ We can then define another relation, $Rightarrow$, on strings by, basically, applying $R$ to substrings. For two strings $x, y$, define $xRightarrow y$ if and only if $$ text{there exist strings }p, q, r, stext{ such that } x=prq, y=psq,text{ and }r, R s $$ In your example, we have, for instance, the chain of rewrites $aaaaaRightarrow abaaRightarrow abaRightarrow bb$, and there we stop, since we cannot reduce $bb$ any further. In a rewriting system, a string like $bb$ which cannot be rewritten is called irreducible or sometimes a normal form. If there is a chain of rewrites starting from a seed string $x$ that leads to a normal form we denote the result by $xdownarrow$. Again, in your example, the normal forms derivable from $aaaaa$ are $ab, bb, aab, abb$. Your question in its most general form is

Question 1. In a string rewriting system, is there a computationally efficient way to find a normal form for a given string $x$?

In general, the answer is no, there isn’t. The problem comes in part from the fact that some (or all) seed strings might never lead to a normal form. This could arise in a rewriting system that permits chains like $$ x_1Rightarrow x_2Rightarrow cdotsRightarrow x_iRightarrow x_{i+1}RightarrowcdotsRightarrow x_i $$ Fortunately, you’ve imposed a helpful constraint, namely that whenever $x,R,y$ then the lengths satisfy $|x|>|y|$. We’ll call such a system monotone or length-reducing. In a monotone rewriting system we’re always going to have a normal form for any seed string, since every rewrite operation strictly decreases the length of the string. A rewriting system for which every string has at least one normal form is called noetherian, by the way. Refining the question gives us

Question 2. In a monotone string rewriting system, is there a computationally efficient way to find a normal form for a given string $x$?

The answer to this question is yes, of course: given a seed string $x$, simply apply rewrite rules until you reach a normal form. More systematically, if $|x| = n$ and you have $m$ base rewrite rules ${w_1,R,z_1,w_2,R,z_2,dots w_m,R,z_m}$ with $M=max{|w_i|}$, then we’ll require $O(nmM)$ character comparisons, even if we don’t use efficient string search algorithms. We haven’t quite gotten to your real question, though, since you also stipulate that at each step you will choose to apply your base rewrite rules in order of the lengths of their left-hand sided, |w_i|, from shortest to longest. It’s not clear that this is a useful constraint; in fact, some string search algorithms (Boyer-Moore, in particular) are more efficient when searching for longer substrings than they are for shorter ones. It might be helpful to impose another constraint, though: when searching for a substring to replace, always choose the leftmost substring, regardless of length. This would put you in a situation involving prefixes and there’s a fair amount known about this. Choosing rightmost substrings as replacement candidates might also be a useful avenue. Finally, we come to the most interesting question:

Question 3. In a monotone string rewriting system, is there a computationally efficient way to find a normal form of minimal length?

This should help you to find answers. There’s good news for your search, namely that there’s a lot of material out there, both in books (yes, whole books have been written on string rewriting systems) and online. I did a search for “string rewriting systems” and got a million and a half hits. The bad news is that there’s a lot of material out there, so finding an answer is likely going to take some serious drilling. Frankly, I don’t know the answer to Question 3, but my uninformed guess is that the answer is affirmative. If you haven’t gotten a definitive answer in a few days, you might want to ask your question, suitably reworded, over at theoretical computer science.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/28307  Ask a Question  Download Related Notes/Documents

Leave a Reply