[Solved]: Get the running time of forest disjoint sets

Problem Detail: If you have a forest implementation of disjoint sets, and have the union by weight/rank heuristic where you append the smaller one. Then why is the worst case running time Θ(m log n)? (m is the number of disjoint sets and n is the number of Make Set operations.)

Asked By : omega

Answered By : templatetypedef

You’ve asked two questions:

  • Why is the tree height Θ(log n)?
  • Why is the runtime Θ(m log n)?

This answer addresses both questions. I’ll start off with a review of the ranks of the different trees to go over some basics, then discuss these time bounds. One trick that might be useful here is to think of the minimum number of nodes that can be in a tree of height n. Once we have that insight, we can try to relate the maximum runtime to the maximum height of any tree in the forest. When we initially create all of the trees, they each have height 0 and have a total of 1 node in them. Now, suppose we want to create a tree of height 1. The only way to do that would be to merge together two trees of height 0 to form a new tree of height 1. This would have a total of 2 nodes in it, and that’s the minimum number of nodes possible. Next, suppose we want to create a tree of height 2. To do that, we have to merge together two trees of height 1. Each of those trees has a minimum of 2 nodes in them, so the minimum number of nodes possible in a tree of height 2 would be 4. To make a tree of height 3, we need to merge two trees of height 2, each of which has at least 4 nodes. This means that we would end up with eight nodes in the tree. If we look so far, we see this pattern:

  • Height 0: At least 1 node.
  • Height 1: At least 2 nodes.
  • Height 2: At least 4 nodes.
  • Height 3: At least 8 nodes.

It looks like a tree of height n will have at least 2n nodes in it. This makes intuitive sense – the only way to make a tree of height n + 1 is to merge together two trees of height n, so the minimum number of nodes in a tree of height n + 1 should be at least double the number of nodes in a tree of height n. So now let’s suppose we do a series of n different union operations to link trees together. What is the largest possible tree we can make? Suppose that we want to make a tree of height k. In order to do that, we first need to make two trees of height k – 1. Each of those trees are formed from two trees of height k – 2 each, and those trees are formed from two trees of height k – 3 each, etc. Eventually, we bottom out at trees of height 0. If we work out a recurrence relation, we get the following expression for the total number of union operations required to get at tree of height k:

  • U(0) = 0
  • U(k + 1) = 2U(k) + 1

If we expand out the terms in this recurrence, we get

  • U(0) = 0
  • U(1) = 1
  • U(2) = 3
  • U(3) = 7
  • U(4) = 15
  • U(k) = 2k – 1

In other words, if we want to get to a tree of height k, we need to do 2k – 1 link operations. This means that if we do a total of n union operations, the maximum tree height we can possibly make has height Θ(log n). This explains the runtime bound you were curious about. Creating the disjoint set forest takes time O(m), where m is the number of elements. Each union operation requires us to walk up to the root of a tree whose height is at most Θ(log n), so if we do a total of n of these, the overall runtime is at worst Θ(m + n log n). Note that this is different from the bound you’ve proposed. However, if you make the assumption that m = Θ(n) (that is, you do about the same number of union operations as there are nodes in the graph), then we get that Θ(m + n log n) = Θ(n + n log n) = Θ(n log n) = Θ(m log n), which matches the bound you’ve described. Hope this helps!

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/10759

Leave a Reply