[Solved]: Determining the capabilities of min-heap (and other exotic) state machines

Problem Detail: See the end of this post for some clarification on the definition(s) of min-heap automata. One can imagine using a variety of data structures for storing information for use by state machines. For instance, push-down automata store information in a stack, and Turing machines use a tape. State machines using queues, and ones using two or more stacks or tapes, have been shown to be equivalent in power to Turing machines. Imagine a min-heap machine. It works exactly like a push-down automaton, with the following exceptions (a brief code sketch of these operations appears after the list):

  1. Instead of getting to look at the last thing you added to the heap, you only get to look at the smallest element (with the ordering defined on a per-machine basis) currently on the heap.
  2. Instead of getting to remove the last thing you added to the heap, you only get to remove one of the smallest elements (with the ordering defined on a per-machine basis) currently on the heap.
  3. Instead of getting to add an element to the top of the heap, you can only add an element to the heap, with its position being determined according to the other elements in the heap (with the ordering defined on a per-machine basis).
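
For concreteness, here is a minimal sketch (the class and method names are mine, purely illustrative) of the storage discipline these three exceptions describe, using Python's `heapq`: the machine may only peek at a minimum element, remove a minimum element, or insert an element whose position is dictated entirely by the ordering.

```python
import heapq

class MinHeapStore:
    """Hypothetical helper: the storage discipline of a min-heap automaton."""

    def __init__(self):
        self._heap = []

    def peek_min(self):
        # Exception 1: only a smallest element is visible.
        return self._heap[0] if self._heap else None

    def pop_min(self):
        # Exception 2: only a smallest element may be removed.
        return heapq.heappop(self._heap) if self._heap else None

    def push(self, symbol):
        # Exception 3: the insertion position is determined by the ordering,
        # not by when the element was added.
        heapq.heappush(self._heap, symbol)
```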

This machine can accept all regular languages, simply by not using the heap. It can also accept the language $\{a^{n}b^{n} \in \{a, b\}^{*} \mid n \ge 0\}$ by adding $a$'s to the heap as it reads $a$'s, and removing $a$'s from the heap when it reads $b$'s. It can accept a variety of other context-free languages. However, it cannot accept, for instance, $\{w \in \{a, b\}^{*} \mid w = w^{R}\}$ (stated without proof). EDIT: or can it? I don't think it can, but I've been surprised before, and I'm sure I'll keep being surprised as long as my assumptions keep getting the better of me.
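
As a small illustration of the $a^n b^n$ strategy just described (the function below is my own sketch, not part of the question): the heap is pushed once per $a$ and popped once per $b$, and the input is accepted exactly when the heap empties at the end of the string.

```python
import heapq

def accepts_anbn(word: str) -> bool:
    """Sketch: push one heap symbol per 'a', pop one per 'b';
    accept iff the heap is empty exactly at the end of the input."""
    heap = []
    seen_b = False
    for ch in word:
        if ch == 'a':
            if seen_b:                 # an 'a' after a 'b' cannot be accepted
                return False
            heapq.heappush(heap, 'a')
        elif ch == 'b':
            seen_b = True
            if not heap:               # more b's than a's
                return False
            heapq.heappop(heap)
        else:
            return False
    return not heap                    # every 'a' matched by a 'b'
```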

Can it accept any context-sensitive or Turing-complete languages?

More generally, what research, if any, has been pursued in this direction? What results are there, if any? I am also interested in other varieties of exotic state machines, possibly those using other data structures for storage or various kinds of restrictions on access (e.g., how LBAs are restricted TMs). References are appreciated. I apologize in advance if this question is demonstrating ignorance.

Formal Definition: I provide some more detailed definitions of min-heap automata here in order to clarify further discussion in questions which reference this material. We define a type-1 nondeterministic min-heap automaton as a 7-tuple $$(Q, q_0, A, \Sigma, \Gamma, Z_0, \delta)$$ where…

  1. $Q$ is a finite, non-empty set of states;
  2. $q_0 \in Q$ is the initial state;
  3. $A \subseteq Q$ is the set of accepting states;
  4. $\Sigma$ is a finite, non-empty input alphabet;
  5. $\Gamma$ is a finite, non-empty heap alphabet, where the weight of a symbol $\gamma \in \Gamma$, $w(\gamma) \in \mathbb{N}$, is such that $w(\gamma_1) = w(\gamma_2) \iff \gamma_1 = \gamma_2$;
  6. $Z_0 \notin \Gamma$ is the special bottom-of-the-heap symbol;
  7. $\delta : Q \times (\Sigma \cup \{\epsilon\}) \times (\Gamma \cup \{Z_0\}) \rightarrow \mathcal{P}(Q \times \Gamma^*)$ is the transition function (see the code transcription after this list).
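
For readers who prefer code, one possible transcription of this 7-tuple (the field names and the explicit weight-function field are my own additions, not part of the formal definition) might look like the following:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Set, Tuple

@dataclass
class MinHeapAutomaton:
    """Illustrative encoding of the 7-tuple (Q, q0, A, Sigma, Gamma, Z0, delta)."""
    states: FrozenSet[str]                     # Q
    start: str                                 # q0
    accepting: FrozenSet[str]                  # A
    input_alphabet: FrozenSet[str]             # Sigma
    heap_alphabet: FrozenSet[str]              # Gamma
    bottom: str                                # Z0, not in Gamma
    weight: Callable[[str], int]               # w: Gamma -> N (injective for type 1)
    # delta: (state, input symbol or None for epsilon, visible minimal heap symbol)
    #        -> set of (next state, tuple of heap symbols to add)
    delta: Callable[[str, Optional[str], str], Set[Tuple[str, Tuple[str, ...]]]]
```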

The transition function assumes a heap that initially contains only $Z_0$. The transition function may add to the heap an arbitrary collection (finite, but possibly empty or with repeats) of elements $\gamma_1, \gamma_2, \ldots, \gamma_k \in \Gamma$. Alternatively, the transition function may remove an instance of the element $\gamma$ with the lowest weight $w(\gamma)$ of all elements remaining on the heap (i.e., the element on top of the heap). The transition function may only use the top-most (i.e., minimal-weight) symbol instance in determining any given transition.

Further, define a type-1 deterministic min-heap automaton to be a type-1 nondeterministic min-heap automaton which satisfies the following property: for all strings $x\sigma y \in \Sigma^*$ such that $|x| = n$ and $\sigma \in \Sigma$, $|\delta^{n+1}(q_0, x\sigma y, Z_0)| \leq 1$.

Define also a type-2 nondeterministic min-heap automaton exactly the same as a type-1 nondeterministic min-heap automaton, except for the following changes:

  1. $\Gamma$ is a finite, non-empty heap alphabet, where the weight of a symbol $\gamma \in \Gamma$, $w(\gamma) \in \mathbb{N}$, is such that $w(\gamma_1) = w(\gamma_2)$ does not necessarily imply $\gamma_1 = \gamma_2$; in other words, different heap symbols can have the same weight.
  2. When instances of distinct heap symbols with the same weight are added to the heap, their relative order is preserved according to a last-in, first-out (LIFO) stack-like ordering (see the sketch after this list).
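
The difference between the two orderings can be made concrete with a short sketch (the insertion-counter trick below is my own modelling device, not part of the definition): a type-2 heap breaks ties between equal-weight symbols in LIFO fashion, which can be simulated by keying each entry on its weight together with a negated insertion index.

```python
import heapq
from itertools import count

class Type2Heap:
    """Type-2 ordering sketch: equal-weight symbols behave like a stack
    (last in, first out), modelled by keys (weight, -insertion index)."""

    def __init__(self, weight):
        self._weight = weight          # weight function w: Gamma -> N (ties allowed)
        self._heap = []
        self._tick = count()

    def push(self, symbol):
        heapq.heappush(self._heap, (self._weight(symbol), -next(self._tick), symbol))

    def pop_min(self):
        # Among the minimal-weight symbols, the most recently pushed comes out first.
        return heapq.heappop(self._heap)[2]

# Example: two distinct symbols with the same weight come back out in LIFO order.
h = Type2Heap(weight=lambda s: {'x': 1, 'y': 1, 'z': 2}[s])
h.push('x')
h.push('y')
h.push('z')
assert h.pop_min() == 'y'   # same weight as 'x', but pushed later
assert h.pop_min() == 'x'
assert h.pop_min() == 'z'
```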

Thanks to Raphael for pointing out this more natural definition, which captures (and extends) the context-free languages. Some results demonstrated so far:

  1. Type-1 min-heap automata recognize a set of languages which is neither a subset nor a superset of the context-free languages. [1,2]
  2. Type-2 min-heap automata, by their definition, recognize a set of languages which is a proper superset of the context-free languages, as well as a proper superset of the languages accepted by type-1 min-heap automata.
  3. Languages accepted by type-1 min-heap automata appear to be closed under union, concatenation, and Kleene star, but not under complementation [1], intersection, or difference;
  4. Languages accepted by type-1 nondeterministic min-heap automata appear to be a proper superset of languages accepted by type-1 deterministic min-heap automata.

There may be a few other results I have missed. More results are (possibly) on the way.

Follow-up Questions

  1. Closure under reversal? — Open
  2. Closure under complementation? — No!
  3. Does nondeterminism increase power? — Yes?
  4. Is $\mathrm{HAL} \subsetneq \mathrm{CSL}$ for type-2? — Open
  5. Does adding heaps increase power for type-1? — $\mathrm{HAL}^1 \subsetneq \mathrm{HAL}^2 = \mathrm{HAL}^k$ for $k > 2$ (?)
  6. Does adding a stack increase power for type-1? — Open
Asked By : Patrick87

Answered By : Alex ten Brink

You can recognize the canonical non-context-free (but context-sensitive) language $\{ a^n b^n c^n \mid n \geq 1 \}$ with this type of state machine. The crux is that you add tokens to the heap for every $a$ character, and while parsing the $b$ characters, you add 'larger' tokens to the heap, so they only end up at the bottom of the heap when you have parsed all the $b$ characters.

Heap symbols are $a$ and $b$, where $a < b$. We consume all the $a$ symbols on the input and add $a$ symbols to the heap. If we encounter a $b$, we switch strategies: for every $b$ we encounter subsequently we remove an $a$ from the heap and add a $b$ to the heap. When we encounter a $c$ we should have run out of $a$'s to remove, and then for every $c$ in the remaining input we remove a $b$ from the heap. If the heap is empty at the end, the string is in the language. Obviously, we reject if something goes wrong. (A code sketch of this construction appears at the end of this answer.)

Update: The language $EPAL = \{ ww^R \mid w \in \{a, b\}^* \}$ cannot be recognized by min-heap automata. Suppose that we do have a min-heap automaton that can recognize $EPAL$. We look at the 'state' the automaton is in after reading $w$ (the first part of the input, so $w^R$ is next). The only 'state' we have is the contents of the heap together with the particular state of the automaton. This means that after recognizing $w$, this 'state' needs to hold enough information to match $w^R$. In particular, in order to do this, there must be $2^n$ possible different 'states' (where $n = |w|$), as there are $2^n$ possible words consisting of $a$ and $b$ characters. As there are only a finite number of states and only a finite number of heap characters, this implies that there exists some word $w$ for which the heap contains an exponential number of some heap character, say $x$.

We first prove the theorem for deterministic min-heap automata, and then extend this proof to non-deterministic min-heap automata. In particular, deterministic automata that recognize some language will not put themselves in an infinite loop, which is a useful property.

We shall prove that the heap can only contain at most a number of heap tokens that is linear in the number of characters read from the input. This immediately rules out that $x$ appears an exponential number of times on the heap, which completes the proof that $EPAL$ cannot be recognized by min-heap automata. Because we only have a finite number of states in our automaton, and because a deterministic automaton will not put itself into an infinite loop, on reading an input symbol it will add at most a constant number of heap characters onto the heap. Similarly, on consuming some heap symbol $y$, it can only add at most a constant number of heap characters that are strictly larger than $y$, and it can only decrease the number of $y$ symbols on the heap (otherwise we get an infinite loop). Consuming heap symbols may therefore cause an (enormous) buildup of larger heap symbols, but as there are only a constant number of different types of heap symbols, this buildup is bounded by a constant that does not depend on $n$. This implies that the number of heap symbols is at most some (large) constant times the number of input symbols read so far. This completes the proof for the deterministic case.

In the non-deterministic case, the proof is similar, but a bit trickier: instead of adding at most some constant number of heap tokens to the heap, the automaton may add an arbitrary number of heap tokens to the heap. However, the crucial point is that this number does not depend on $n$.
In particular, if we can non-deterministically get exactly the right heap symbols on the heap after recognizing $w$ (right for recognizing $w^R$), we can also non-deterministically choose the heap symbols that match some other word $w'$, and thereby recognize $w w'^R$, thus contradicting that the min-heap automaton recognizes exactly $EPAL$.

Update 3: I'll make the last argument (about non-determinism) rigorous. By the above argument, there must exist an infinite set of words $W \subseteq \{a,b\}^*$ such that for every $w \in W$, after recognizing $w$, the heap contains $\omega(|w|)$ elements (note that we can meaningfully use asymptotic notation here because we have an infinite set of words). As we cannot get that many elements on the heap through deterministic means, we must have had some form of a loop in which we first non-deterministically chose to add more elements to the heap (without consuming input), and later chose to exit this loop, and we must have traversed this loop $\omega(1)$ times. Take the set of all such loops used by $W$. As there are only $O(1)$ states, the size of this set is $O(1)$, and the set of all its subsets is also $O(1)$.

Now note that the 'deterministic' part of the execution paths can only contribute $O(|w|)$ of the tokens, which means that many of the exponentially many different words must have execution paths whose 'deterministic' parts contribute the same tokens to the heap. In particular, the only way to get more tokens is to take the loops we identified above. Combining these observations, there must be two distinct words in $W$, say $w_1$ and $w_2$, whose 'deterministic' parts of the execution paths contribute the same tokens to the heap, and that are differentiated only by taking some subset of the loops above a different number of times, while using the same subset of loops (remember there are only $O(1)$ of these loops). We can now show that $w_1 w_2^R$ can also be recognized by the min-heap automaton: we follow the execution path for $w_1$ as above, but we traverse the loops the same number of times the execution path for $w_2$ traverses them. This fills the min-heap with tokens such that $w_2^R$ is accepted as a suffix, thus completing the proof.

Update 2: It just occurred to me that the above means that we can simulate a deterministic min-heap automaton using only logarithmic space: we keep a counter for every type of character in the min-heap. As shown above, each counter will be at most $O(n)$, and hence can be stored using only $O(\log n)$ space (as there are only a constant number of these counters). This gives us:

$$\mathrm{DHAL} \subset \mathrm{L} \qquad \mathrm{HAL} \subset \mathrm{NL}$$

where $\mathrm{DHAL}$ is the set of languages recognized by some deterministic min-heap automaton.
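
To make the construction at the top of this answer concrete, here is a sketch (my own transcription, not code from the answer) of the $a^n b^n c^n$ recognizer. Since the weights totally order the two heap symbols, the heap contents can be represented by one counter per symbol, which is exactly the counter representation behind the logarithmic-space simulation mentioned in Update 2.

```python
def accepts_anbncn(word: str) -> bool:
    """Sketch of the a^n b^n c^n recognizer described above.  Heap symbols are
    'a' < 'b'; because the weights totally order them, the heap contents are
    fully described by two counters (the log-space observation)."""
    heap_a = heap_b = 0              # number of 'a' / 'b' tokens on the heap
    phase = 'a'                      # which block of the input we expect next
    for ch in word:
        if phase == 'a':
            if ch == 'a':
                heap_a += 1                              # push an 'a' per input 'a'
            elif ch == 'b' and heap_a > 0:
                phase = 'b'
                heap_a -= 1
                heap_b += 1                              # swap an 'a' for a 'b'
            else:
                return False
        elif phase == 'b':
            if ch == 'b' and heap_a > 0:
                heap_a -= 1
                heap_b += 1
            elif ch == 'c' and heap_a == 0 and heap_b > 0:
                phase = 'c'
                heap_b -= 1                              # a's exhausted: start matching c's
            else:
                return False
        else:  # phase == 'c'
            if ch == 'c' and heap_b > 0:
                heap_b -= 1
            else:
                return False
    return phase == 'c' and heap_a == 0 and heap_b == 0  # heap empty at the end

# e.g. accepts_anbncn("aabbcc") is True, accepts_anbncn("aabbc") is False.
```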


Question Source : http://cs.stackexchange.com/questions/110
