Problem Detail: I’ve come across the CYK algorithm and was wondering, as it’s quite old, if it is still relevant today. Is it or an extension of it still being used in compilers (for example), or have other algorithms proven to be better?
Asked By : Camil Staps
Answered By : babou
CYK is still relevant, afaik, as the simplest example of a family of general parsing algorithm based on dynamic programming, ranging over all parsing technique (that I know of) and many syntactic formalisms. There is a simpler general parsing algorithm (below), but where the dynamic programming (DP) aspect is no longer visible. as things are defined more globally. What I mean by general parsing algorithm, is a parsing algorithm that can parse all memeber of a family of languages, according to their given grammar, and produce all the possible parse structures when the parsed string is ambiguous, combined in a condensed form called a (shared-)parse-forest. Though this is often considered only for context-free (CF) grammars, it does generalize to other formalisms. Most known such algorithms are for CF grammars: such as CYK, Earley, GLR, GLL with diverse variants. They are also known as chart parsers, and extend beyond formal languages to some logical formalisms, often introduced by computational linguists via so-called feature structures. They can all be seen as extensions or refinement of CYK, though that is often a subjective assessment. They can also be seen as extensions or refinement of the more basic algorithm presented below. My experience (but new research might prove me wrong) is that the DP structure of CYK is a proper route to understand, in a simple context, some of the issues raised by the various techniques for improving the performance of those parsers in the structured context of the DP calculation, or to introduce various features such as Viterbi selection of the best parses. Some of the most basic issues, such as the grammatical forms to be used to keep a low time and space complexity (cubic), or how to represent parse-forests. can be more readily viewed and understood from the simplest algorithm, which is the old cross-product construction (Bar-Hillel, Perles and Shamir, 1961) between a CF grammar and a FSA, to produce the grammar of the CF intersection of their languages. This view is developed in a paper by (Lang 1995) along with extensions to other formal languages. From a practical point of view, i.e. for use in actual parsers, some of the more recent general parsers may be better tool. But I believe that CYK remains an important and simple educational step, more so in my opinion than Earley’s algorithm that is much more complex. Which of the other DP algorithm should be used for actual systems remains the object of some debate, as performance issues are somewhat subtle. I discuss some aspects a bit more in depth in another answer.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/32320