Binary Search Tree in C#
it does ... uhm ... Binary Search Tree things
Motivation
I have implemented the binary search tree library accompanied by this blog post since there is no binary search tree available in the .NET library. If you are looking for a balanced binary search tree, the SortedSet<T>
, which internally uses a red-black tree (see SortedSet.cs), might suit your needs. Consider, however, that this .NET class does not expose a classic tree interface.
But even an algorithm like a balanced tree or AVL tree, I don't use in my own programs unless I know that it' going to be a really big tree. [...] I Use an ordinary binary search tree with a little trick for randomizing it that I just put in. - Donald Knuth in Coders at Work
Introduction
A binary search tree (BST) is one of the most fundamental data structures in computer science. While its definition is rather simple it can be used as a sorted set with logarithmic runtimes for insertion, deletion and search.
Each node in a binary search tree has at most two children, which are called left child and right child. Because a tree is recursive in nature, these child nodes can also be interpreted as the left subtree and the right subtree, respectively. Moreover, the root of the tree is the node that has no parent and nodes without children are called leaves.
A binary search tree is a binary tree with nodes that have comparable keys and conform to the following rules:
- The key of a node must be larger then all the keys in the left subtree
- The key of a node must be smaller then all the keys in the right subtree
In practice you often want to store complex objects (items) on the nodes instead of primitive keys. Hence, it is a natural design decision to make the binary search tree data structure generic.
Note that the rules above imply that duplicate keys are not allowed in a binary search tree. Fortunately, this restriction does not pose any problems. If several items with the same key have to be stored, you can simply store a list of items on each node.
The exact shape of a binary search tree depends on the order of insertion. Consider the following trees:
Both are valid binary search trees. The first one is a well-shaped tree; its height \(h\) scales logarithmically with the number of nodes \(n\), \(h \in \mathcal{O}(\log n)\). The second tree resembles a linear structure and it is fair to call the tree degenerate. Its height scales linearly with the number of nodes.
The height of the tree is crucial because the runtime (time complexity) of the binary search tree operations is in \(\mathcal{O}(h)\). There are self-balancing binary search trees that ensure that the tree is always well-shaped, e.g. the AVL tree and the Red-black tree.
Whether an ordinary binary search tree with explicit rebalancing operations or a self-balancing tree is preferable depends on the problem and data at hand. Performing measurements with both can give you valuable insights into your problem and lead to a better decision.
When you decide to use a binary search tree it is a good idea to randomize the order of new items before inserting them. In this way the construction of a degenerate tree can be (most likely) avoided and the tree will have an average height in \(\mathcal{O}(\log n)\).
Operations
The most important operations on a binary search tree are Search
, Insert
and Delete
. For a well-shaped tree these operations have a time complexity in \(\mathcal{O}(\log n)\).
Search
The implementation of the search
-method can be directly derived from the rules defining a binary search tree (see above). It can be achieved as follows.
Start the search at the root node and compare its key to the key you search for. If they are equal, the node of interest has been found. If the key you are looking for is smaller than the key of the node, continue the search in the left subtree, otherwise continue the search in the right subtree. The search ends either if a node with the same key has been found or if there is no more subtree to dive into. In the latter case a node with the given key does not exist.
Insertion
Once we know how to search for a key, the insertion of a new node is rather simple. First, search for the key of the new node in the binary search tree (pretend that it exists). Secondly, add the new node into the position where the search (unsuccessfully) ended.
Deletion
In order to delete a node with a given key we can once again start with a search for the key in the tree. If the found node has no children (it is a leave node), we can simply remove the node from the tree. If it has only one child, replace the node with its child. On the other hand, if the found node has two children, things get a little bit more complicated. In this case the idea is to replace the node with its successor, which is the node with the smallest key in the right subtree. In order to accomplish that the successor node has to be found and deleted first. Since this node has the smallest key in the subtree, it has no left child and can hence be deleted via the cases already discussed.
Tree Traversals
Important algorithms on trees are tree traversals. Particularly, the in-order traversal is used for retrieving all the nodes sorted by their key. The recursive version of the in-order traversal can be straightforwardly derived from the definition of a binary search tree: Before a node is visited, visit its left subtree. This left subtree contains all the nodes with keys smaller then the current node. After the visit of the current node visit its right subtree. Here is the code:
public static void InOrderTraversal<T>(
this Node<T>? tree,
Action<Node<T>> onVisit)
{
if (tree != null)
{
InOrderTraversal(tree.Left, onVisit);
onVisit(tree);
InOrderTraversal(tree.Right, onVisit);
}
}
If the onVisit
-Method is moved to the beginning or the end of the recursive traversal calls, the tree is traversed in pre-order or post-order fashion, respectively:
onVisit(tree);
PreOrderTraversal(tree.Left, onVisit);
PreOrderTraversal(tree.Right, onVisit);
PostOrderTraversal(tree.Left, onVisit);
PostOrderTraversal(tree.Right, onVisit);
onVisit(tree);
The different methods of traversal are illustrated in the following figure:
The black path corresponds to a depth-first search and the different traversal methods are related to the time when a node is visited. When visited on the first encounter (orange), the result is a pre-order traversal. When visited on the last encounter (blue), the result is a post-order traversal. An in-order traversal results when the node is visited in between (green).
The implementations discussed above are recursive in nature. They can be understood easily but may require significantly more memory than non-recursive versions, depending on the height of the tree. The non-recursive traversal methods use stacks and queues and are also part of the published library.
Conclusion
A binary search tree is a fundamental data structure for storing items that has very likely a good performance for both, search and update. Very likely in the sense that the runtimes scale with the height of the binary search tree, which, for randomized inserts, is very likely proportional to the logarithm of the number of items. A binary search tree can be used whenever you have to store and query a sorted set of items. It can serve a range of items and the tree can be, if necessary, rebalanced. Since the .NET library is lacking such a data structure I have implemented a Binary search tree library in C# that may be of use for you in one case or another.
References
Thank you for reading.
If you have any questions, thoughts, or feedback you'd like to share,
please feel free to connect with me via mail.
For more updates on upcoming blog posts and library releases, you may follow me on
Twitter.
Happy Coding!
- Kevin