Asked By : John Swoon
Answered By : Gilles
- One that you encounter every day when working with computers is the filesystem¹: there’s a root directory, subtrees are subdirectories, and there are subtrees with no children which are regular files; the children of a node are identified by a name (so it’s not “first child”, “second child”, etc., but “the child named hello.txt”, “the child named pictures”, etc.).
- An example from real life is the section structure of a book: there’s the book, which is divided into chapters, which are divided into sections, which are divided into subsections, etc.
- One type of tree data structure is the trie, which stores information indexed by a string. Each node has one child per character in the alphabet. The information corresponding to a string is located by starting from the root, going to the child corresponding to the first character of the string, then to the child of that child corresponding to the second character, etc. For example, the string abc would be stored three levels deep, going down from the root to the a child, then to its b child, then to its c child (a small sketch follows this list).
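Here is a minimal Python sketch of such a trie; the names `TrieNode`, `trie_insert` and `trie_lookup` are just illustrative, and a dictionary per node stands in for the “one child per character of the alphabet” array:

```python
class TrieNode:
    """A trie node: children indexed by character, plus an optional stored value."""
    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.value = None    # payload for the string that ends at this node, if any

def trie_insert(root, key, value):
    """Walk down one level per character, creating missing children along the way."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value

def trie_lookup(root, key):
    """Follow the characters of key from the root; return the stored value or None."""
    node = root
    for ch in key:
        if ch not in node.children:
            return None
        node = node.children[ch]
    return node.value

# "abc" is stored three levels deep: root -> 'a' child -> 'b' child -> 'c' child.
root = TrieNode()
trie_insert(root, "abc", "some data")
print(trie_lookup(root, "abc"))   # some data
```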
You’ll notice that in the filesystem and in the book section examples, information can be inserted anywhere: I can store my files in whichever directory I want; I can add a section to the chapter of my choice. In contrast, the structure of a trie is completely rigid: there’s only a single place in the tree where abc can go.

A binary tree is a tree where each node happens to have at most two children, typically known as the left and right child. There is a common variation on this definition that defines a binary tree to have two types of nodes: internal nodes, which must have exactly two children, and leaves, which have no children.

A binary search tree is a type of tree data structure. Like a trie, it imposes a constraint on where the information must be stored. A trie indexes information by a string; a search tree indexes information that is ordered: it can be numbers, strings, etc. In a binary search tree, the data in the left subtree of a node is always smaller than the datum in the node itself, and the data in the right subtree of a node is always larger than the datum in the node. If you add an element to an existing search tree, there’s a single way to do that without changing the current structure of the tree: going down from the root, take the left child if the node is larger than the element to add, take the right child if the node is smaller, and keep going until you reach a child that doesn’t exist; then create that child. However, the structure of search trees is less rigid than the structure of tries: there are different search trees that contain the same data. For example, here are three binary search trees containing the integers 2, 4, 5, 8, 9 (there are of course many more):
```
    8        4          2
   / \      / \          \
  4   9    2   5          4
 / \            \          \
2   5            8          5
                  \          \
                   9          9
                             /
                            8
```
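Here is a minimal Python sketch of the insertion rule described above; the names `Node` and `bst_insert` are just illustrative:

```python
class Node:
    """A binary search tree node: a datum plus optional left and right children."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key without reshaping the tree: go left past larger nodes,
    go right past smaller ones, and create the missing child at the end."""
    if root is None:
        return Node(key)          # the child that didn't exist yet
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    # if key == root.key, it is already present and nothing changes
    return root

# Different insertion orders yield different trees over the same data;
# inserting 8, 4, 9, 2, 5 in that order produces the first tree above.
tree = None
for k in [8, 4, 9, 2, 5]:
    tree = bst_insert(tree, k)
```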
In the worst case, inserting an element in a search tree requires traversing the whole height of the tree, which, as the last example above shows, can be as large as the number of elements in the tree. The leeway in the tree structure allows one to reshape the tree when adding or removing elements from it. This is called balancing. Balancing is done in order to keep the height small, which keeps operations such as search, insertion and deletion efficient.

There are several variants of balanced search trees, where the balancing operations are designed to keep the height of the tree logarithmic in the number of elements: $h = O(\log n)$. The basic principle of a balanced search tree is to keep some information about the height of the subtree in each node; when the height difference between the two children of a node is too large, some nodes are moved from the deeper subtree to the shallower subtree to rebalance. The rebalancing operation costs $O(1)$ at each node and is performed along the path to the element that’s being inserted or removed, so its cost over the whole tree is $O(h) = O(\log n)$. This way, search, insertion and deletion in a balanced search tree cost $O(h)$ to find the target location, plus $O(h)$ to rebalance, which is $O(h) + O(h) = O(\log n)$ in total.

¹ In practice, filesystems can be more complex than what I’m talking about here. Filesystems with hard links aren’t even trees.
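As a rough illustration of the worst case and of the reshaping step, here is a small sketch building on the hypothetical `Node`/`bst_insert` from the previous example; `rotate_left` is the kind of $O(1)$ reshaping primitive that balanced trees (AVL trees, red-black trees, etc.) apply along the insertion path, not a full balanced-tree implementation:

```python
def height(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def rotate_left(node):
    """Left rotation: the right child becomes the new subtree root and the old
    root becomes its left child.  The search-tree ordering is preserved, and a
    single rotation takes O(1) time."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot

# Inserting 2, 4, 5, 9, 8 in that order produces the degenerate last tree above:
chain = None
for k in [2, 4, 5, 9, 8]:
    chain = bst_insert(chain, k)
print(height(chain))     # 5 -- as deep as the number of elements

# One rotation at the root already shortens the chain by one level:
chain = rotate_left(chain)
print(height(chain))     # 4

# A different insertion order (or a balancing scheme) keeps the height small:
bushy = None
for k in [5, 4, 8, 2, 9]:
    bushy = bst_insert(bushy, k)
print(height(bushy))     # 3
```

Balanced search trees apply such rotations systematically along the insertion or deletion path, which is what keeps $h = O(\log n)$.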
Question Source : http://cs.stackexchange.com/questions/35361