Informally, from the point of view of algorithmic information theory, the information content of a string is equivalent to the length of the shortest possible self-contained representation of that string.
What is the analogous informal rigorous definition of “useful information”? Why is “useful information” not taken as the more natural or more fundamental concept; naively it seems a purely random string must by definition contain zero information, so I’m trying to get my head around the fact that it is considered to have maximal information by the standard definition.
Asked By : user1247
Answered By : Juho
$A = 1010$ $1010$ $1010$ $1010$, and $B = 1011$ $0110$ $0111$ $1001$.
Note that $|A| = |B| = 16$. How could we quantify how much information $A$ or $B$ has? If we think about classical information theory, in general, transmitting a string of length $n$ takes $n$ bits on average. However we cannot say how many bits we need to transmit a specific string of length $n$. Why is the information content of a random string not zero? On a closer look, we can see that in fact $A = 10^8$. However, it is much harder to say if $B$ has any obvious patterns in its structure, at least it seems and feels more random than $A$. Because we can find a pattern in $A$, we can easily compress $A$ and represent it with less than $16$ bits. Likewise, since it is not easy to detect any patterns in $B$, we cannot compress it as much. Therefore we can say that $B$ has more information than $A$. Moreover, a random string of length $n$ has maximal information since there is no way we can compress it, and hence represent it with less than $n$ bits. What is useful information, then? For useful information, yes, there is a definition using a Turing machine $T$. The useful information in $x in mathbb{B}^*$ is $$min_T space {space l(T) + C(x|T) : T in { T_0, T_1, … } },$$ where $l(T)$ denotes the length of a self-limiting encoding for a Turing machine $T$. The notation is usually such that $C(x)$ denotes the Kolmogorov complexity of $x$ and $C(x|y)$ the conditional Kolmogorov complexity of $x$ given $y$. Here $T$ embodies the amount of useful information contained in $x$. What we could ask is which such $T$ to select among those that satisfy the requirement. The problem is to separate a shortest program $x^*$ into parts $x^* = pq$ s.t. $p$ represents an appropriate $T$. This is actually the very idea that spawned minimum description length (MDL).
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/945