[Solved]: Weighted Set covering problem with a fixed number of colors

Problem Detail: I have a set of elements U = {1, 2, …. , n} and a set S of k sets whose union form the whole universe. Each of these sets is associated with a cost. I have a fixed number of colors, C = {1 , 2, … , m}. Some of the sets mentioned above interfere with each other. I cannot assign same color to both those sets together. I want to pick the sets and color them from my available color list in the following way: **Objective: Minimize the total cost of the selected sets Constraints:

All elements of the universe are covered
No two sets that interfere with each other is assigned the same color**

If the second constraint, i.e., coloring constraint, is taken out, the problem reduces to standard weighted set covering problem. I can solve that using a greedy manner. For example, greedy unweighted set covering will work in the following way: — 1. pick the set with the highest number of elements at first, 2. Remove that set and the associated elements from the universe, 3. Repeat step 1 until all elements of the universe are covered. But the coloring constraint in the presence of interference among sets and a fixed number of colors complicates the issue. For example, let’s assume, U = {1, 2, 3, 4, 5}. There are three sets, {1, 2, 3}; {2, 3, 4, 5} ; {4, 5}. Assume {2, 3, 4, 5} interferes with both {1, 2, 3} and {4, 5}. {1, 2, 3} and {4, 5} do not interfere with each other. Assume that there is only one color in the system. A standard greedy unweighted set coloring solution will pick {2, 3, 4, 5} at first and {1, 2, 3} in the second round. But that is an infeasible solution to my problem since they interfere with each other and I have only one color in my system. A feasible solution will be the selection of {1, 2, 3} and {4, 5}. I wonder how I can minimize the total cost while meeting the color constraint. Any hint on the unweighted version of the problem (where all sets have equal cost) will be very helpful, too. Thanks, Nazmul Additional information: The coloring of the sets and the interference can be understood by my application scenario. I am looking at a wireless cell. I have a set of frequencies, possible base station locations and their associated users. Each set is associated with a station and it shows the set of users that the base station can serve. “Interference” from one set to the other means that the users’ signals of one set reaches the other set’s base station. In a wireless setting, this interference is not symmetric. But assumption of symmetry is OK in the algorithm because if set A interferes with set B, A & B both should get different colors if they are selected. The interference among sets is not transitive. Set A may interfere with set B (A’s users may interfere with B’s base station) and set B may interfere with set C (B’s users may interfere with C’ base station) but that does not mean that set A will interfere with set C. The current example is showing that two sets interfere if they intersect. This is just a coincidence. I will have a pre-generated look up table that shows which set interferes with which set before the algorithm starts. This pre-generated table will be an input to the optimization problem

Asked By : Nazmul

Answered By : D.W.

General advice

The question is confused about what you are looking for. There is obviously no hope for a polynomial-time algorithm that finds the optimal solution to this optimization problem. This problem is at least as hard as weighted set covering (without coloring restrictions). Weighted set covering is NP-hard. Therefore, your problem is NP-hard. Therefore, any general polynomial-time algorithm that always outputs the optimal solution to your optimization problem would imply that $P=NP$ (which seems unlikely). Obviously, if the question is confused about what you’re looking for or if you are unable to communicate it in the language that computer scientists understand, then that reduces the likelihood that anyone will be able to answer your question in a way that helps you. We’re not mind-readers. Based on your comments, I suspect that what you actually want is an approximation algorithm for your problem, or a heuristic algorithm. So, here is my advice, based upon my inference of what I suspect you actually want (as opposed to what you said you want). First, I suggest that you learn about approximation algorithms and heuristic algorithms. Learn the terminology, learn some basic results, learn some of the standard examples of approximation algorithms. This will help you communicate with others and be able to express your requirements clearly in a language that other computer scientists can understand. Also learn about standard strategies for dealing with NP-completeness (e.g., Dealing with intractability: NP-complete problems). Second, I suggest that you take a closer look at the sort of problem instances typically arise in your application. If you want to maximize the chance you find an approximation algorithm or heuristic algorithm that is useful in your application scenario, you need to analyze the parameters and properties of the problem instances that arise in your application. What are typical values of $k,m,n$? That’s the sort of question you should be asking (and communicating in the question). Without that kind of information, I don’t think anyone here can predict what the best approach to your particular problem might be. However, I’ll suggest some candidate approaches that you could try out.

SAT solvers

You can formulate (the decision problem version of) your problem as a SAT instance and then try solving it with an off-the-shelf SAT solver. We’ll solve the decision problem version: given $u$, we’ll test whether there exists a solution of total cost at most $u$. Introduce boolean variables $x_1,dots,x_k$. The intent is that $x_i=1$ means that the $i$th set $S_i$ is selected, and $x_i=0$ means that it is not. Also introduce variables $c_1,dots,x_k$, where $c_iin {1,dots,m}$ represents the color assigned to the $i$th set (if it is selected). Note that each $c_i$ can be represented as a number of boolean variables using standard techniques. Finally, introduce variables $y_0,y_1,dots,y_k$, where $y_i$ is an integer that represents the total cost of all sets that were selected from out of the first $i$ sets. Each $y_i$ can be represented as $lceil lg (u+1) rceil + 1$ bits, using binary encoding. Now we’re going to add some constraints to capture the requirements for a solution to be valid. For each element $u in U$, we’ll add a constraint to require that element $u$ is covered by at least one of the selected sets: $$bigvee_{u in S_i} x_i.$$ (Here the logical OR is taken over all sets $S_i$ that contain $u$; at least one of them must be selected.) Also, for each pair $i,j$ of sets that interfere, we’ll add a constraint to require that they have been given different colors: namely, $c_i ne c_j$. Finally, we’ll add constraints to define the $y_i$’s: $y_0=0$, and for each $i=1,dots,k$: $$y_i = y_{i-1} + begin{cases} text{cost}(S_i) &text{if $x_i=1$} 0 &text{otherwise} end{cases}.$$ Finally, we’ll add the constraint that the total cost is at most $u$: namely, $y_k le u$. To deal with overflow, it helps to add constraints that $y_i le u$ for all $i$ and to make sure you’ve allocated enough bits for each $y_i$ so that you can encode all integers in the range $0,1,dots,2u$. Notice that all of these constraints can be encoded in SAT (i.e., as CNF clauses) in a straightforward way. Some SAT solver front-ends, such as STP, even provide support to make it easy to express these: they’ll convert the integer constraints to CNF by bit-blasting. Now that you’ve formed a CNF formula representing the assertion that there exists a solution of total cost $le u$, you feed it to a SAT solver and see whether it can find a satisfying assignment. If it can, then you know the optimal solution costs at most $u$; if it can’t, you know the optimal solution costs at least $u+1$. Now you can use binary search to find the optimal solution. This is not an approximation algorithm: this algorithm finds the exact optimum. Of course, it is a heuristic. SAT is NP-complete, so there is no guarantee that this algorithm will run in polytime (and we can be pretty sure that on some problem instances, it won’t). However, SAT solvers are pretty good in practice, and so this might work pretty well on problem instances that arise from practice. This formulation is a straightforward combination of the formulation of weighted set cover as a SAT problem, and the formulation of graph coloring as a SAT problem (see e.g. here). If you are having trouble following this, study other reductions to SAT and other ways of encoding standard problems as SAT instances.

Integer linear programming

Another approach would be to formulate this as an integer linear program, then feed it to an off-the-shelf ILP solver. The approach is very similar to the formulation in SAT. You can use similar variables to that listed in my SAT formulation above. Boolean variables can be encoded as integer variables that are restricted to be in the range $0 le cdot le 1$. Boolean operations on boolean variables can be expressed using the techniques shown here: Express boolean logic operations in zero-one integer linear programming (ILP) I recommend encoding the $c_i$’s as a one-hot encoding, i.e., introduce boolean variables $c_{i,j}$ where $c_{i,j}=1$ means that the $i$th set (if selected) is assigned the color $j$. Then you can add the constraints $$sum_j c_{i,j} = 1 text{ for each $i$}$$ and $c_{i,j}+c_{i’,j} le 1$ for each pair of sets $i,i’$ that interfere and each color $j$. There’s no need for the variables $y_0,dots,y_k$ or to limit yourself to the decision version of the problem. Instead, you can make the objective function be $$sum_i text{cost}(S_i) x_i,$$ which is a linear function of the variables (as required). Now feed it to an ILP solver, and see what the best solution it can find is.

Naive greedy algorithm

There’s a straightforward extension of the greedy algorithm, that could be used for your problem. It will only sometimes find a solution, so it is incomplete; other times, it will fail to find any valid solution. In particular, at each point in time, we will have a partial solution (some sets that can be colored with $le m$ colors). We will incrementally select a set to add to the partial solution greedily. We can prioritize the sets as follows, to help us select the next set. The priority of $S_i$ is the number of additional elements of $U$ that would be covered if we selected $S_i$ (over and above the number of elements that have already been covered by our current partial solution), divided by the cost of $S_i$. Given a partial solution, we find a coloring for the sets (using some heuristic for graph coloring that tries to color them with as few colors as possible), then for each $i$ we see whether adding $S_i$ would increase the number of colors above $m$. If we can’t find a way to color the current selection + $S_i$, then we throw away $S_i$. Out of the remaining unselected sets, we choose the set with the highest priority. We repeat until we can make no further progress. If this process ends with a selection that covers all of $U$, then we have found a candidate solution. It may not be optimal (it probably won’t be), but it is a solution. If it ends without covering all of $U$, then the algorithm failed.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/19503