| A | B | C | Use in plaintext | in ciphertext ---+------+------+------ ---+------------------+--------------- A | 0.16 | 0.04 | 0.01 A | 0.5 | 0.1 B | 0.00 | 0.04 | 0.09 B | 0.3 | 0.5 C | 0.01 | 0.01 | 0.04 C | 0.2 | 0.4
The Hungarian algorithm then gives me a frequency-analysis-wise optimal key: $A mapsto C, B mapsto A, C mapsto B$. This is because the sum of the errors (0.01+0.00+0.01=0.02) is minimal. This is useful, but what I need is the $n$ best keys. The only thing I can come up with is to run the Hungarian algorithm, and then set one of the cells in the matrix above corresponding to the key found to a high value, so that if you run the algorithm again the key you find doesn’t contain the mapping. So, in the example above, you could set A-C to 1 so that when you run the Hungarian algorithm again you find a different key that doesn’t contain $Amapsto C$. However, this isn’t guaranteed to find good keys in order. What would be a better way to extend the Hungarian algorithm to find the best $n$ keys? Epilogue: after implementing this using D.W.’s approach below, it turned out that this method doesn’t perform well enough to crack small-length (at least up to 1000 letters) cipher texts, because frequency analysis alone isn’t enough. Performance may be improved by taking frequent digrams or trigrams into account, but I doubt this method can be as powerful as simple hill climbing.
Asked By : Camil Staps
Answered By : D.W.
- $mathcal{S}_1$: We find the best assignment $A$ such that $A(v_1) ne A_1(v_1)$ using one invocation of the Hungarian algorithm, by changing the cost of the edge $(v_1,A_1(v_1))$ to $infty$ (or some very large positive number) and then re-running the Hungarian algorithm. This finds the best assignment out of all assignments that assign $v_1$ to something different than $A_1$ did.
- $mathcal{S}_2$: We find the best assignment $A$ such that $A(v_1) = A_1(v_1)$ and $A(v_2) ne A_1(v_2)$ using one invocation of the Hungarian algorithm: change the cost of the edge $(v_1,A_1(v_1))$ to $0$, and change the cost of the edge $(v_2,A_1(v_2))$ to $infty$.
- $mathcal{S}_i$: Similarly, for each $i$, we can find the best assignment $A$ such that $A(v_j) = A_1(v_j)$ for all $j=1,2,dots,i-1$ and such that $A(v_i) ne A_1(v_i)$, using one invocation of the Hungarian algorithm.
This gives us $m$ assignments, i.e., $m$ candidates for $A_2$. By construction, each one of these assignments is different from $A_1$. Also, by construction, this covers all the space of all assignments that are different from $A_1$. Therefore, $A_2$ will be the best of these $m$ candidates, so we can just compare these $m$ candidates and call it $A_2$. That find the second-best assignment. How can we find $A_3$, the third-best assignment? Well, the same ideas apply: we’ll use a case split, but now the case-split will be a little more involved. Suppose that $v_i$ is the first agent where $A_1$ and $A_2$ disagree (i.e., $A_1$ and $A_2$ agree on $v_1,dots,v_{i-1}$ but disagree on $v_i$, so that $A_2 in mathcal{S}_i$). Then we can break down the space of possibilities for $A_3$ by looking at the first agent that receives a different assignment from $A_2$, or from $A_1$. In particular, let $mathcal{T}$ denote the space of possible candidates for $A_3$ (i.e., the space of all assignments other than $A_1$ or $A_2$). We can partition it into the disjoint union $$mathcal{T} = mathcal{S}_1 cup dots cup mathcal{S}_{i-1} cup (mathcal{T}_1 cup dots cup mathcal{T}_m) cup mathcal{S}_{i+1} cup dots cup mathcal{S}_m.$$ In other words, since $A_2 in S_i$ and we now want to exclude $A_2$ from the space of allowable assignments, we partition $S_i$ into $S_i = {A_2} cup mathcal{T}_1 cup dots cup mathcal{T}_m$ and remove $A_2$. Here $mathcal{T}_j$ denotes the set of assignments that agree with $A_2$ on $v_1,dots,v_{j-1}$ but disagrees with $A_2$ on $v_j$ (and, if $j=i$, disagrees with $A_1$ on $v_j$ as well). Now, we use the Hungarian algorithm to find the best assignment in each of $mathcal{S}_1, dots, mathcal{S}_{i-1}, mathcal{T}_1, dots, mathcal{T}_m, mathcal{S}_{i+1}, dots, mathcal{S}_m$. This is doable using the techniques shown above, using one invocation of the Hungarian algorithm per subspace. Finally, we let $A_3$ be best of all the solutions found. We can continue in this way, at each step identifying the next-best by decomposing the space of remaining assignments into multiple subspaces and invoking the Hungarian algorithm on each subspace. At each step, we introduce at most $m$ new subspaces, and we can reuse the previously-obtained results for the other subspaces. Therefore, on each step we make at most $m$ invocations of the Hungarian algorithm, so the total number of invocations of the Hungarian algorithm is $O(nm)$. There’s probably a better way to do it, but if you can’t find any other algorithm, this is one you could use. Note that this is a general technique for problem of enumerating the $n$ best assignments to any instance of the assignment problem. It’s not specific to your substitution-cipher example.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/51384 Ask a Question Download Related Notes/Documents