B-Tree and how it is used in practice

Problem Detail: I understand what a B-Tree is (I already implemented a B-Tree in Java with insert and delete methods that preserve the invariant). However I do not understand exactly how it is used for example for file systems or databases.

  1. How do you choose the keys in these 2 scenarios? And what is the data?
  2. Are data and keys the same?
  3. I can’t think of a good key that is able to order the data in a way that it is easily accessible. If I want to find an entry containing some string.
  4. how useful would be a key that gives information about the size of each entry? or the alphabetical order? I don’t exactly understand the concept.

I have to add that I don’t understand much of file systems or databases which is probably why I’m confused.

Asked By : Nocta

Answered By : Pseudonym

A B-Tree is a type of dictionary, no more and no less. It can be used to implement a set (e.g. see the interface for java.util.Set for the sort of operations we’re talking about), but is most commonly used to implement a map (ditto for java.util.Map). So let’s just look at maps for a moment. If you think about a linguistic dictionary, it’s ordered by “word”, and associated with each word is a definition. You look up a word, and get a definition. So the context of a map data structure is that you have keys (“words”) and you want to map this to values (“definitions”). B-trees are like other kinds of search tree (e.g. binary search trees) in that keys can be accessed in sorted order. This has an advantage for certain types of query. For example you can do range queries, say if you want to find all entries where the key is between two values (e.g. all words in the dictionary starting with “q”). The main advantage of B-trees over other kinds of search tree is that it is particularly efficient when represented on disk. B-trees are page-structured (meaning they can be implemented on top of fixed-size disk pages; this also avoids fragmentation), and have a wide ply, which minimises the number of disk accesses needed to perform a query. A typical use in a database server is to implement indexes. An index is, once again, much like the index in a paper book. Some books have multiple indexes (e.g. there might be a general index and an index of names of people). Given a key, the index in a book returns a set of pages on where you can find references to whatever is indexed (be it a concept, name, or whatever). A database index typically maps a key to a set of database records. In a typical organisation, there is one data structure in which records are stored, and multiple indexes which map keys to record identifiers. To find records, you query the index, which gives you a collection of record identifiers. Then you can use those record identifiers to find the records in the store. A typical use in a modern filesystem is to implement directories (“folders” if you’re more used to GUIs). A directory is, logically, a map from names (file names or directory names) to filesystem objects. A filesystem object may be a file, or another directory, or what have you. As has been pointed out elsewhere, you might want to do some reading on this. Any “introduction to databases” textbook will probably do.
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/27985

Leave a Reply