What are the possible sets of word lengths in a regular language?

Problem Detail: Given a language $L$, define the length set of $L$ as the set of lengths of words in $L$: $$mathrm{LS}(L) = {|u| mid u in L }$$ Which sets of integers can be the length set of a regular language?

Asked By : Gilles

Answered By : Gilles

First, an observation which is not crucial but convenient: the set $mathscr{S}$ of sets of integers that are $LS(L)$ for some regular language $L$ on a non-empty alphabet $mathscr{A}$ does not depend on the choice of alphabet. To see that, consider a finite automaton that recognizes $L$; the lengths of the words that are in $L$ are the lengths of the paths on the automaton seen as an unlabeled graph from the start state to any accept state. In particular, you can relabel every arrow to $a$ and get a regular language with the same length set over the alphabet ${a}$. Conversely, if $L$ is a regular language over a one-element alphabet, it can be trivially injected into a larger alphabet, and the result is still a regular language. Therefore we are looking for the possible length sets for words over a singleton alphabet. On a singleton alphabet, the language is the length set written out in unary: $mathrm{LS}(L) = {ninmathbb{N} mid a^n in L}$. Let $L$ be a regular language, and consider a deterministic finite automaton (DFA) that recognizes $L$. The set of lengths of words of $L$ is the set of lengths of paths in the DFA seen as a directed graph that start on the start state and end in one of the accept states. A DFA on a one-element alphabet is pretty tame (NFAs would be wilder): it’s either a finite list or a circular list. If the list is finite, number the states from $0$ to $h$ following the list order; if it’s circular, number the states from $0$ to $h$ following the head of the list, and $h$ to $h+r$ along the loop. list-shaped automata Let $F$ be the set of indices of accept states up to $h$, and $G$ be the set of indices of accept states from $h$ to $h+r$. Then $$mathrm{LS}(L) = F cup { k , r + x mid x in G, kinmathbb{N} }$$ Conversely, let $h$ and $r$ be two integers and $F$ and $G$ be two finite sets of integers such that $forall x in F, x le h$ and $forall x in G, h le x le h+r$. Then the set $L_{F,G,r} = { a^{k,r+x} mid xin G, kinmathbb{N} }$ is a regular language: it is the language recognized by the DFA described above. A regular expression that describes this language is $a^F mid a^{G} (a^r)^*$. To summarize in English, the length sets of regular languages are the sets of integers that are periodic¹ above a certain value. ¹ To hang on to a well-established notion, periodic means the characteristic function of the set (which is a function $mathbb{N}to{mathtt{false},mathtt{true}}$ which we lift to a function $mathbb{Z}to{mathtt{false},mathtt{true}}$) is periodic. Periodic above a certain value means that the function restricted to $[h,+infty[$ can be prolonged into a periodic function.
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/164

Leave a Reply