[Solved]: Algorithm for a list of best solutions to the Assignment problem

Problem Detail: I am trying to brute force a classical substitution cipher. The problem is that there are $26!$ possible keys. So, I’d like to do frequency analysis to try likely keys first. Then, on the first $n$ tries I will use a dictionary to see if there are actual words in the decryption. That way, I don’t need to try all possible keys. So, as suggested here, I will use chi-squared testing. I was thinking of making a matrix, best illustrated with an example. Suppose in an alphabet {A,B,C} the letter frequencies are 50%, 30% and 20% respectively, and that in a ciphertext the frequencies are 10%, 50% and 40% respectively. Then the A-B cell in the matrix is $(0.1-0.3)^2=0.04$, which is the error rate when a B in plain is encrypted as an A.

   |  A   |  B   |  C                 | Use in plaintext | in ciphertext ---+------+------+------           ---+------------------+---------------  A | 0.16 | 0.04 | 0.01             A |        0.5       |       0.1  B | 0.00 | 0.04 | 0.09             B |        0.3       |       0.5  C | 0.01 | 0.01 | 0.04             C |        0.2       |       0.4 

The Hungarian algorithm then gives me a frequency-analysis-wise optimal key: $A mapsto C, B mapsto A, C mapsto B$. This is because the sum of the errors (0.01+0.00+0.01=0.02) is minimal. This is useful, but what I need is the $n$ best keys. The only thing I can come up with is to run the Hungarian algorithm, and then set one of the cells in the matrix above corresponding to the key found to a high value, so that if you run the algorithm again the key you find doesn’t contain the mapping. So, in the example above, you could set A-C to 1 so that when you run the Hungarian algorithm again you find a different key that doesn’t contain $Amapsto C$. However, this isn’t guaranteed to find good keys in order. What would be a better way to extend the Hungarian algorithm to find the best $n$ keys? Epilogue: after implementing this using D.W.’s approach below, it turned out that this method doesn’t perform well enough to crack small-length (at least up to 1000 letters) cipher texts, because frequency analysis alone isn’t enough. Performance may be improved by taking frequent digrams or trigrams into account, but I doubt this method can be as powerful as simple hill climbing.

Asked By : Camil Staps

Answered By : D.W.

Here’s one technique to enumerate the best $n$ assignments, for any instance of the assignment problem. I suspect my approach isn’t optimal, but it does run in polynomial time: it uses $O(nm)$ invocations of the Hungarian algorithm, where $m$ denotes the number of agents in the problem instance. In your example, $m=26$, so my approach requires $O(n)$ invocations of the Hungarian algorithm. Let $A_1,A_2,A_3,dots$ denote the assignments, from best to worse. $A_1$ is the best assignment; $A_2$ is the next-best; and so on. Our goal want to enumerate $A_1,dots,A_n$. You can find $A_1$ by solving the original assignment problem, e.g., with the Hungarian algorithm. How can we find $A_2$, the second-best assignment? The idea is to use a case analysis. Let $v_1,dots,v_m$ denote the $m$ agents in the problem instance, and let $A(v)$ denote the task assigned to agent $v$ by assignment $A$. We’ll break down the space $mathcal{S}$ of possible candidates for $A_2$ (i.e., the space of all assignments other than $A_1$) into the disjoint union $mathcal{S} = mathcal{S}_1 cup dots cup mathcal{S}_m$, where $mathcal{S}_i$ is the space of assignments that agree with $A_1$ for $v_1,dots,v_{i-1}$ but disagree with $A_1$ on $v_i$. (In other words, we look at the first agent that receives a different assignment in $A_1$ vs $A_2$. Then there are $m$ possibilities for that agent; we let $i$ denote its index, i.e., the index of the first agent whose assignment in $A_1$ is different from its assignment in $A_2$. This breaks down the space $mathcal{S}$ into subspaces $mathcal{S}_1,dots, mathcal{S}_m$, as listed before.) Now the approach will be to find the best assignment in each $mathcal{S}_i$, separately.

  • $mathcal{S}_1$: We find the best assignment $A$ such that $A(v_1) ne A_1(v_1)$ using one invocation of the Hungarian algorithm, by changing the cost of the edge $(v_1,A_1(v_1))$ to $infty$ (or some very large positive number) and then re-running the Hungarian algorithm. This finds the best assignment out of all assignments that assign $v_1$ to something different than $A_1$ did.
  • $mathcal{S}_2$: We find the best assignment $A$ such that $A(v_1) = A_1(v_1)$ and $A(v_2) ne A_1(v_2)$ using one invocation of the Hungarian algorithm: change the cost of the edge $(v_1,A_1(v_1))$ to $0$, and change the cost of the edge $(v_2,A_1(v_2))$ to $infty$.
  • $mathcal{S}_i$: Similarly, for each $i$, we can find the best assignment $A$ such that $A(v_j) = A_1(v_j)$ for all $j=1,2,dots,i-1$ and such that $A(v_i) ne A_1(v_i)$, using one invocation of the Hungarian algorithm.

This gives us $m$ assignments, i.e., $m$ candidates for $A_2$. By construction, each one of these assignments is different from $A_1$. Also, by construction, this covers all the space of all assignments that are different from $A_1$. Therefore, $A_2$ will be the best of these $m$ candidates, so we can just compare these $m$ candidates and call it $A_2$. That find the second-best assignment. How can we find $A_3$, the third-best assignment? Well, the same ideas apply: we’ll use a case split, but now the case-split will be a little more involved. Suppose that $v_i$ is the first agent where $A_1$ and $A_2$ disagree (i.e., $A_1$ and $A_2$ agree on $v_1,dots,v_{i-1}$ but disagree on $v_i$, so that $A_2 in mathcal{S}_i$). Then we can break down the space of possibilities for $A_3$ by looking at the first agent that receives a different assignment from $A_2$, or from $A_1$. In particular, let $mathcal{T}$ denote the space of possible candidates for $A_3$ (i.e., the space of all assignments other than $A_1$ or $A_2$). We can partition it into the disjoint union $$mathcal{T} = mathcal{S}_1 cup dots cup mathcal{S}_{i-1} cup (mathcal{T}_1 cup dots cup mathcal{T}_m) cup mathcal{S}_{i+1} cup dots cup mathcal{S}_m.$$ In other words, since $A_2 in S_i$ and we now want to exclude $A_2$ from the space of allowable assignments, we partition $S_i$ into $S_i = {A_2} cup mathcal{T}_1 cup dots cup mathcal{T}_m$ and remove $A_2$. Here $mathcal{T}_j$ denotes the set of assignments that agree with $A_2$ on $v_1,dots,v_{j-1}$ but disagrees with $A_2$ on $v_j$ (and, if $j=i$, disagrees with $A_1$ on $v_j$ as well). Now, we use the Hungarian algorithm to find the best assignment in each of $mathcal{S}_1, dots, mathcal{S}_{i-1}, mathcal{T}_1, dots, mathcal{T}_m, mathcal{S}_{i+1}, dots, mathcal{S}_m$. This is doable using the techniques shown above, using one invocation of the Hungarian algorithm per subspace. Finally, we let $A_3$ be best of all the solutions found. We can continue in this way, at each step identifying the next-best by decomposing the space of remaining assignments into multiple subspaces and invoking the Hungarian algorithm on each subspace. At each step, we introduce at most $m$ new subspaces, and we can reuse the previously-obtained results for the other subspaces. Therefore, on each step we make at most $m$ invocations of the Hungarian algorithm, so the total number of invocations of the Hungarian algorithm is $O(nm)$. There’s probably a better way to do it, but if you can’t find any other algorithm, this is one you could use. Note that this is a general technique for problem of enumerating the $n$ best assignments to any instance of the assignment problem. It’s not specific to your substitution-cipher example.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/51384  Ask a Question  Download Related Notes/Documents

Leave a Reply