- Instantaneously begin processing of any $k$ workloads on any free node.
- Instantaneously receive confirmation of the completion by a node of a previously initiated batch of $k$ workloads.
- At any point in time, and instantaneously, determine the state of all nodes (free or busy) as well as the number of workloads completed and the number of workloads remaining.
For simplicity’s sake, assume $k$ divides $n$. There are at least two categories of load balancing strategies for minimizing the total execution time of all workloads using all slaves (to clarify, I’m talking about the makespan or wall-clock time, not the aggregate process time, which is independent of the load-balancing strategy being used under the simplifying assumptions being made in this question): static and dynamic. In a static scheme, all placement decisions are made at time $t = 0$. In a dynamic scheme, the master can make placement decisions using information about the progress being made by some slaves, and as such, better utilization can be attained (in practice, there are overheads associated with dynamic scheduling as compared to static scheduling, but we ignore these). Now for some questions:
- Is there a better way to statically schedule workloads than to divide batches of $k$ workloads among the $m$ slaves as evenly as possible (we can also assume, for simplicity’s sake, that $m$ divides $n/k$, so batches could be statically scheduled completely evenly)? If so, how?
- Using the best static scheduling policy, what should the mean and standard deviation be for the total execution time, in terms of the mean $mu$ and standard deviation $sigma$ of $X$?
A simple dynamic load balancer might schedule $i$ batches of $k$ workloads to each slave initially, and then, when nodes complete the initial $i$ batches, schedule an additional batch of $k$ workloads to each slave on a first-come, first-served basis. So if two slave nodes are initially scheduled 2 batches of 2 workloads each, and the first slave finishes its two batches, an additional batch is scheduled to the first slave, while the second slave continues working. If the first slave finishes the new batch before the second batch finishes its initial work, the master will continue scheduling to the first slave. Only when the second slave completes executing its work will it be issued a new batch of workloads. Example:
DYNAMIC STATIC POLICY POLICY slave1 slave2 slave1 slave2 ------ ------ ------ ------ t<0 -- -- -- -- t<1 batch1 batch3 batch1 batch3 batch2 batch4 batch2 batch4 batch5 batch7 batch6 batch8 t=1 -- batch3 batch5 batch3 batch4 batch6 batch4 batch7 batch8 t<2 batch5 batch3 batch5 batch3 batch4 batch6 batch4 batch7 batch8 t=2 -- batch4 batch6 batch4 batch7 batch8 t<3 batch6 batch4 batch6 batch4 batch7 batch8 t=3 -- -- -- batch7 batch8 t<4 batch7 batch8 -- batch7 batch8 t=4 -- -- -- batch8 t<5 -DONE- -- batch8 t=5 -- -- t < 6 -DONE-
For clarification, batches 1 and 2 take 1/2 second each to be processed, batch 3 takes 2 seconds to be processed, and batches 4-8 take 1 second each to be processed. This information is not known a-priori; in the static scheme, all jobs are distributed at t=0, whereas in the dynamic scheme, distribution can take into account what the actual runtimes of the jobs “turned out” to be. We notice that the static scheme takes one second longer than the dynamic scheme, with slave1 working 3 seconds and slave2 working 5 seconds. In the dynamic scheme, both slaves work for the full 4 seconds. Now for the question that motivated writing this:
- Using the dynamic load balancing policy described above, what should the mean and standard deviation be for the total execution time, in terms of the mean $mu$ and standard deviation $sigma$ of $X$?
Interested readers have my assurances that this isn’t homework, although it probably isn’t much harder than what one might expect to get as homework in certain courses. Given that, if anyone objects to this being asked and demands that I show some work, I will be happy to oblige (although I don’t know when I’ll have time in the near future). This question is actually based on some work that I never got around to doing a semester or two ago, and empirical results were where we left it. Thanks for help and/or effort, I’ll be interested to see what you guys put together.
Asked By : Patrick87
Answered By : Alex ten Brink
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/134