Lesson 13: Binary Search Trees

Topics

In this lesson, we are going to study tree linked data structures, in particular the Binary Search Tree (BST). Topics include:

  1. BST concepts
  2. More on recursive methods with auxiliary methods
  3. BST insertion, deletion, and traversal

Required Tasks

  1. Complete and submit the tasks of Lab 8.
  2. Complete and submit the tasks of Assignment 8.

Learning Outcomes

By the end of this lesson, you should be able to

  1. Understand the concepts behind BSTs.
  2. Implement various BST methods.

Introduction

In this Lesson we discuss in detail the linked Binary Search Tree (BST) data structure and approaches to manipulating tree-linked data structure nodes.

The Binary Search Tree (BST)

The Binary Tree

A binary tree is a non-linear data structure. A binary tree is either the empty tree or a node that has left and right subtrees that are binary trees. (This is a recursive definition.)

The Binary Search Tree

A binary search tree (BST) is a special form of a binary tree. Its properties are:

Although BSTs can be implemented with an array, it is generally easier conceptually to implement them with a linked structure. Each node in the linked structure contains two links, left and right, which point to the left and right subtrees (or children) of any given node. The node also contains the height of the node's subtree. The height of any tree or subtree is defined as the maximum height of its children, plus one.

Sample BST

The elements of each node are:

l
link to the left child node
r
link to the right child node
green circle
data stored in node
number above green circle
height of node subtree

Note that the empty left and right links actually point to None. They have been left out so as not to clutter the figure.

The BST Implementation

The _BST_Node Class

Each BST node is slightly more complex than a singly linked node. Each node contains a copy of the value it stores, links to its left and right children, if any, and the height of its subtree.

The _BST_Node Class Implementation
    
class _BST_Node:

    def __init__(self, value):
        """
        -------------------------------------------------------
        Initializes a BST node containing value. Child links
        are None, height is 1.
        Use: node = _BST_Node(value)
        -------------------------------------------------------
        Parameters:
            value - value for the node (?)
        Returns:
            A _BST_Node object (_BST_Node)
        -------------------------------------------------------
        """
        self._value = deepcopy(value)
        self._left = None
        self._right = None
        self._height = 1

    def _update_height(self):
        """
        -------------------------------------------------------
        Updates the height of the current node. _height is 1 plus
        the maximum of the node's (up to) two child heights.
        Use: node._update_height()
        -------------------------------------------------------
        Returns:
            None
        -------------------------------------------------------
        """
        if self._left is None:
            left_height = 0
        else:
            left_height = self._left._height

        if self._right is None:
            right_height = 0
        else:
            right_height = self._right._height

        self._height = max(left_height, right_height) + 1
        return

  

The _update_height method changes the height of a node whenever changes are made to the contents of the node's subtree. It does not return a value as it is updating its own private _height attribute.

The BST Class

The BST class constructor is very simple, containing only a root node initialized to None and a node count.

The BST Class Implementation
    
class BST:

    def __init__(self):
        """
        -------------------------------------------------------
        Initializes an empty BST.
        Use: bst = BST()
        -------------------------------------------------------
        Returns:
            A BST object (BST)
        -------------------------------------------------------
        """
        self._root = None
        self._count = 0

  

Searching a BST

We have not yet seen the BST insertion method, but we can understand it better if we understand how to search for a value in a BST. Searching a BST can be done either recursively or iteratively. We will examine the iterative algorithm here - we will soon see examples of BST methods that are far more easily done recursively than iteratively.

To understand how to search a BST we have to understand how nodes are order in the tree. As noted above:

for each node N, the keys in the left subtree of N are less than the key of N, and the keys in the right subtree of N are greater than the key of N

This means that each time we look at a node, if the value in the node does not match the key value we must look to either the left or right children of the node. The direction chosen depends on the comparison between the current node value and the key. The algorithm is:

Remember that we can treat each node as the root of its own tree. The following diagram shows the route through the tree taken to search for the value 8:

Searching the BST For the Value 8

The iterative algorithm for the search works nicely because we can treat the path through the BST as a singly-linked list. We never have to search both the left and right subtrees - the choice of the next subtree to search is determined strictly by the comparison of a node value to the search key.

What is the time complexity of this algorithm? The hint is to note that after each value-key comparison, potentially half of the remaining tree can be ignored. The key word here, however, is potentially. In a well-balanced BST where each level is filled with nodes before the next level has nodes attached, the search method looks like a binary search, and requires \(O(\log n)\) time on average.

The BST retrieve Implementation
    
    def retrieve(self, key):
        """
        -------------------------------------------------------
        Retrieves a copy of a value matching key in a BST. (Iterative)
        Use: v = bst.retrieve(key)
        -------------------------------------------------------
        Parameters:
            key - data to search for (?)
        Returns:
            value - value in the node containing key, otherwise None (?)
        -------------------------------------------------------
        """
        node = self._root
        value = None

        while node is not None and value is None:

            if node._value > key:
                node = node._left
            elif node._value < key:
                node = node._right
            elif node._value == key:
                # for comparison counting
                value = deepcopy(node._value)
        return value

  

BST Insertion

Inserting Into a BST

To insert into a BST we perform some of the same steps that we do when searching for a key in the tree. However, nodes are always inserted as leaves, i.e. at the bottom of the tree below an existing node, never in the middle of the tree. The important thing to understand, then, is how to find the correct position of the new value within the existing tree: following the search rules will always get us to the proper location within the tree. Once we arrive there we create a new node that becomes the root of a new subtree within the BST.

In the following example we wish to insert the value 13 into the BST. The value 13 ends up in a node that is the right child of the node containing the value 12:

Inserting the Value 13 Into the BST

The node heights in red are the node heights that are updated by the insertion of the new value.

It should be clear that the shape of a BST is determined by the insertion order of the values in the tree. A nicely balanced tree requires that some thought be put into the order in which items are inserted into the tree. We will examine this in lab.

The last consideration to take into account is what to do with duplicate values? At first we will work with trees that do not allow duplicates. Duplicate values are simply ignored. We still have to search the tree to determine if there is a duplicate, but if we find one in the tree we can stop searching and exit the insertion method. There are a number of approaches to working with trees that allow duplicates, and we will examine some of them later.

Why Recursion?

At first glance, it seems that an iterative algorithm would perform a node insertion quite nicely. It would look a great deal like the retrieve algorithm, except that it would fail if a duplicate value were found, and add a node to the bottom of the tree if no duplicate existed. However, note that when inserting 13 into the sample tree, the heights of the nodes containing 12 and 15 are incremented by one to take into account the fact that their maximum subtree heights have increased by one with the addition of the new child node. These heights cannot be updated from the top down, since the lower nodes must have their heights updated before their parent nodes. The iterative insertion algorithm, which works from the top down, does not lend itself to an easy solution for this.

A recursive solution allows us to update a tree both from the top down and the bottom up. We can find the location of the new node on the way down the tree, and update the node heights on the way back up the tree. The insertion code involves the following three methods:

_update_height is a private _BST_Node method that simply takes the maximum height of its two child nodes and adds one to it to get its new height. (See The Binary Search Tree - Code Listing 1).

insert is a public BST method that takes a value as a parameter and attempts to insert that value into its proper place in the BST. If it succeeds, it returns True, and False otherwise. insert does not do much work as its only job is to call a private auxiliary method that does the actual work. The auxiliary method is required because it requires two parameters, a node and a value, rather than the single value parameter required by insert. The auxiliary method also returns a node as well as the insertion success flag. This node updates the root node of each subtree.

Because insert_aux is recursive it allows work to be performed on the way down the tree as well as back up the tree. On the way down the tree the method searches for the proper place to insert the new node. This search has two base cases: the first when it reaches None, and is therefore at the bottom of the tree and ready to insert a new node; and the second when it finds a node that already contains a duplicate value, and therefore the insertion fails. The two general cases simply move further down the tree, either to the left or right depending on the value being inserted.

On the way back up the tree insert_aux updates the height of each node that it has already traversed all the way back to the root node. Because it is performing these height updates from the bottom up, the updated heights correctly reflect the addition of the new node at the bottom of the tree. Note that only the nodes traversed on the way down the tree can possible be affected by these new heights - the nodes on the bypassed halves of the subtrees remain unchanged.

The BST insert Implementation
    
    def insert(self, value):
        """
        -------------------------------------------------------
        Inserts a copy of value into the bst. Values may appear
        only once in a tree.
        Use: b = bst.insert(value)
        -------------------------------------------------------
        Parameters:
            value - data to be inserted into the bst (?)
        Returns:
            inserted - True if value is inserted into the BST,
                False otherwise. (boolean)
        -------------------------------------------------------
        """
        self._root, inserted = self._insert_aux(self._root, value)
        return inserted

    def _insert_aux(self, node, value):
        """
        -------------------------------------------------------
        Inserts a copy of value into the bst. Values may appear
        only once in a tree.
        Private recursive operation called only by insert.
        Use: node, inserted = self._insert_aux(node, value)
        -------------------------------------------------------
        Parameters:
            node - a bst node (_BST_Node)
            value - data to be inserted into the node (?)
        Returns:
            node - the current node (_BST_Node)
            inserted - True if value is inserted into the BST,
                False otherwise. (boolean)
        -------------------------------------------------------
        """
        if node is None:
            # Base case: add a new node containing the value.
            node = _BST_Node(value)
            self._count += 1
            inserted = True
        elif value < node._value:
            # General case: check the left subtree.
            node._left, inserted = self._insert_aux(node._left, value)
        elif value > node._value:
            # General case: check the right subtree.
            node._right, inserted = self._insert_aux(node._right, value)
        else:
            # Base case: value is already in the BST.
            inserted = False

        if inserted:
            # Update the node height if any of its children have been changed.
            node._update_height()
        return node, inserted

  

BST Insertion: Call Trace

A Call Trace is simply a visual representation of a series of recursive function calls. Each call is indented to show the ownership of the call and to show the differences between the base and general cases. Return values, if any, are also shown. The following is a call trace of the insertion of 13 into the sample tree. Node values are enclosed in {}.

Inserting 13 Call Trace
insert(13)
    _insert_aux({11}, 13)
        _insert_aux({15}, 13)
            _insert_aux({12}, 13)
                _insert_aux(None, 13)
                    # Base case - new node inserted.
                    _BST_Node(13)
                        return {13}
                    {13}._update_height()
                    return {13}, True
                {12}._update_height()
                return {12}, True
            {15}._update_height()
            return {15}, True
        {11}._update_height()
        return {11}, True
    return True
  

Visually, the increasing indentation in the call trace shows the search being done down through the tree. The decreasing indentation shows the recursive calls finishing and working their way back up through the tree. When the recursive calls return to the root node, all of the subtrees that could possibly be affected by the value insertion have had their heights updated.

BST Deletion

Deleting From a BST

Inserting nodes into a BST is fairly straightforward. Deleting a node from a BST is not. Deleting a node may cause major shifts in the positions of the rest of the nodes in the tree. We will look at deleting nodes from various positions in the tree and how that affects the structure of the tree. Starting with the BST:

Sample BST

If we wish to delete the node containing the value 18, it is fairly straightforward. First, we have to find the node to delete, keeping track of its parent in much the same way that we kept track of the previous node when removing a value from a singly-linked list. Note, however, we have to keep track of whether we are removing the parent node's left or right child.

There are three types of BST node deletion:

Deleting a Node with No Children

This is the simplest form of node deletion. Once the node is found and its value extracted, then the parent's left or right link (as appropriate) is set to None. In the following example the node containing the value 18 is deleted from the BST:

BST Deletion: No Children

The key to successfully deleting this node, as with deleting any node, is keeping track of the parent to be deleted. In this case it is sufficient to set the right link of the node containing the value 15 to None.

Deleting a Node with One Child

Deleting a node with one child is not much more difficult than deleting a node with no children. Whether the child is to the left or right, or whether the child has children of its own doesn't matter. The child node simply takes the place of the node being removed. The deleted node's parent's left or right link (as appropriate) is updated to point to its grandchild. In the following example the node containing 15 is removed from the BST:

BST Deletion: One Child

The node containing 12 is moved up to take the place of the deleted node. Again, as with other linked structures we have looked at, we do not move the data, we only change the node links.

Deleting a Node with Two Children

The rule for deleting a node with is simple, though the implementation is not: replace the deleted node with the node containing the maximum value in its left subtree. The reason for this rule is simple: the maximum value in the left subtree is, by definition, larger than all of the other values to the left of the deleted node. The maximum value in the left subtree of the deleted node is also smaller than all of the values in the right subtree of the deleted node. It has to be - it is to the left of the deleted node after all. This means that when this node takes the place of the deleted node, the BST retains its key property: all values in the left subtree are smaller than the root of the subtree, and all values in the right subtree are larger than the root of the subtree.

Finding the maximum value in the left subtree is simple. We move to the left child of the node to be deleted, then keep moving to the right of that child until there are no nodes left, i.e. we reach a node that has no right child. This is the node that we will use to replace the deleted node.

In our sample tree, if we remove the root node with the value 11, it will be replaced by the node containing the value 9. We would find this node by moving to the 7 node (the left child of 11), then moving to the right until we reach None. The 9 node is to the right of the 7 node, and the 9 node has no right child, so it must be the node with the maximum value of the left subtree of the 11 node. All of the nodes above 9 may have to have their heights adjusted appropriately.

BST Deletion: Two Children - Step 1

When the 9 node takes the place of the 11 node as the new root it is clear that the BST retains its key property: 9 is larger than any value in its left subtree and smaller than any value in its right subtree.

The only difficulty is that the 9 node has a child: the 8 node. However, removing the 9 node is no different than removing a node with a single child - the child node moves up to take the place of its parent. In this case the right link of the 7 node must now point to the 8 node, while the 9 node must change its left and right links to point to the 7 and 12 nodes respectively. (If the 9 node had no children, it would be even easier to move it.)

BST Deletion: Two Children - Step 2

As in our other examples, the parent of the removed node must have its left or right link (as appropriate) updated to point to the new node. In this particular case the node being removed is the root node, so there is no parent to update.

Recursive Removal

The search portion of node deletion is very similar to the search portion of node insertion. There are two base cases: when the key is found in the tree, or when the search reaches the bottom of the tree and no match is found. There are two general cases when the value is less than or greater than the value in the node being searched. If a node is found and deleted, all of the nodes above the deleted node must have their heights updated. The _delete_node method performs the actual node deletion according to the cases discussed above.

(Note that in code presented in this course all changes are performed on the nodes, rather than on the values in the nodes. A alternate approach would be to move values between nodes in order to preserve the BST structure. We will not be using this approach in this course.)

The BST (partial) remove Implementation
    
    def remove(self, key):
        """
        -------------------------------------------------------
        Removes a node with a value matching key from the bst.
        Returns the value matched. Updates structure of bst as
        required.
        Use: value = bst.remove(key)
        -------------------------------------------------------
        Parameters:
            key - data to search for (?)
        Returns:
            value - value matching key if found, otherwise None.
        -------------------------------------------------------
        """
        assert self._root is not None, "Cannot remove from an empty BST"

        self._root, value = self._remove_aux(self._root, key)
        return value

    def _remove_aux(self, node, key):
        """
        -------------------------------------------------------
        Attempts to find a value matching key in a BST node. Deletes the node
        if found and returns the sub-tree root.
        Private recursive operation called only by remove.
        Use: node, value = self._remove_aux(node, key)
        -------------------------------------------------------
        Parameters:
            node - a bst node to search for key (_BST_Node)
            key - data to search for (?)
        Returns:
            node - the current node or its replacement (_BST_Node)
            value - value in node containing key, None otherwise.
        -------------------------------------------------------
        """
        if node is None:
            # Base Case: the key is not in the tree.
            value = None
        elif key < node._value:
            # Search the left subtree.
            node._left, value = self._remove_aux(node._left, key)
        elif key > node._value:
            # Search the right subtree.
            node._right, value = self._remove_aux(node._right, key)
        else:
            # Value has been found.
            value = node._value
            self._count -= 1
            # Replace this node with another node.
            if node._left is None and node._right is None:
                # node has no children.

# your code here

            elif node._left is None:
                # node has no left child.

# your code here

            elif node._right is None:
                # node has no right child.

# your code here

            else:
                # Node has two children

# your code here (calls _delete_node somewhere)

        if node is not None and value is not None:
            # If the value was found, update the ancestor heights.
            node._update_height()
        return node, value


    def _delete_node_left(self, parent):
        """
        -------------------------------------------------------
        Finds a replacement node for a node to be removed from the tree.
        Private operation called only by _remove_aux.
        Use: repl_node = self._delete_node_left(node, node._right)
        -------------------------------------------------------
        Parameters:
            parent - node to search for largest value (_BST_Node)
        Returns:
            repl_node - the node that replaces the deleted node. This node
                is the node with the maximum value in the deleted node's left
                subtree (_BST_Node)
        -------------------------------------------------------
        """

# your code here

        return repl_node

  

Implementing these methods is left as a task for you.

Deletion Call Traces

A call trace for deleting the key 18 (a node with no children) from the tree:

Removing 18 Call Trace
remove(18)
    _remove_aux({11}, 18)
        _remove_aux({15}, 18)
            _remove_aux({18}, 18)
                # Base case - value found ({18})
                    return None
                return None, 18
            {15}._update_height()
            return {15}, 18
        {11}._update_height()
        return {11}, 18
    return 18
  

A call trace for deleting the key 15 (a node with one child) from the tree:

Removing 15 Call Trace
remove(15)
    _remove_aux({11}, 15)
        _remove_aux({15}, 15)
            # Base case - value found ({15})
                return {12}
            {12}._update_height()
            return {12}, 15
        {11}._update_height()
        return {11}, 15
    return 18
  

A call trace for deleting the key 11 (a node with two children) from the tree:

Removing 11 Call Trace
remove(11)
    _remove_aux({11}, 11)
        # Base case - value found
        _delete_node({11})
            return {9}
        {9}._update_height()
        return {9}, 11
    return 11
  

Note that the node containing 7 must also have its height updated. Since it does not appear in the call tree, it is clear that it is the responsibility of the _delete_node method to update its height.

BST Traversal

Traversing an Entire BST

We have seen that insertion, retrieval, and deletion - whether iterative or recursive - follows a single path through a BST. This is because all of these methods are in essence performing a search for a single node, the node that is to be inserted, retrieved, or deleted, and every node in a tree has a unique place in that specific tree. But what if we need to visit every node in the tree? Traversing a singly-linked list is straightforward, but how do we to this with nodes that have two non-linear children?

The answer is to use tree recursion. (See Types of Recursion). The general algorithm is very simple:

Imagine that the BST _count attribute is not available, and we wish to count the number of nodes in the following tree:

Sample BST

We'll define a method called count to perform the node count. Because we are going to use recursion, we'll need an auxiliary function, which we'll call _count_aux to perform the actual counting. Both methods are fruitful because clearly we have to return a count value. The count method is:

The BST Traversing count Implementation
    
    def count(self):
        """
        ---------------------------------------------------------
        Returns the number of nodes in a BST.
        Use: number = bst.count()
        -------------------------------------------------------
        Returns:
            number - count of nodes in tree (int)
        ----------------------------------------------------------
        """
        number = self._count_aux(self._root)
        return number

    def _count_aux(self, node):
        """
        ---------------------------------------------------------
        Returns the number of nodes in a BST subtree.
        -------------------------------------------------------
        Parameters:
            node - a BST node (_BST_Node)
        Returns:
            number - count of nodes in the current subtree (int)
        ----------------------------------------------------------
        """
        if node is None:
            # Base case: node does not exist
            number = 0
        else:
            # General case: node exists.
            number = 1 + self._count_aux(node._left) + \
                self._count_aux(node._right)
        return number

  

The traversal requires an auxiliary method because the top-level method call does not allow access to the internal node structure of a BST.

The method is very simple and follows the algorithm noted above: check to see if the node exists, and if it does count that node (1 +), and then call the recursive method on the left (self._count_aux(node._left)) and right (self._count_aux(node._right)) children.

BST Traversing count Call Trace

BST count Call Trace
number = count(11)
  # Count node {11}
  number = 1 + _count_aux({7}) + _count_aux({15})
    # Count node {7}
    number = 1 + _count_aux({6}) + _count_aux({9})
      # Count node {6}
      number = 1 + _count_aux(None) + _count_aux(None)
        # Base Case: left child of 6
        return 0
        # Base Case: right child of 6
        return 0
      # Return count of {6}
      return 1
      # Count node {9}
      number = 1 + _count_aux({8}) + _count_aux(None)
        # Count node {8}
        number = 1 + _count_aux(None) + _count_aux(None)
          # Base Case: left child of 8
          return 0
          # Base Case: right child of 8
          return 0
        # Return count of {8}
        return 1
        # Base Case: right child of 9
        return 0
      # Return count of {9}
      return 2
    # Return count of {7}
    return 4
    # Count node {15}
    number = 1 + _count_aux({12}) + _count_aux({18})
      # Count node {12}
      number = 1 + _count_aux(None) + _count_aux(None)
        # Base Case: left child of 12
        return 0
        # Base Case: right child of 12
        return 0
      # Return count of 12
      return 1
      # Count node {18}
      number = 1 + _count_aux(None) + _count_aux(None)
        # Base Case: left child of 18
        return 0
        # Base Case: right child of 18
        return 0
      # Return count of {18}
      return 1
    # Return count of {15}
    return 3
  # Return count of {11}
  return 8
# Return from main method
return 8
  

The elegance of recursion is that you can think of each node as the root of its own subtree. The height of any given subtree is simply the total of the heights of all its own subtrees, plus 1 to count its root node. Applying this concept to the entire tree gives the height of the entire tree, as the root node of the entire tree is no different than the root of every subtree.

Inorder, Preorder, Postorder, and Levelorder Traversals

In the count method, the order in which the tree nodes are traversed makes no difference. We could walk through the tree by recursing through the right children rather than the left children first, and the resulting count would be the same. We could increment the count between child node visits, or after child node visits, and again, the resulting count would be the same. However, there are some traversal orders that give us information about the data stored in the tree, or tell us something about the order in which data was stored in the tree.

Inorder Traversal

Inorder traversal allows us to walk through the tree and access node data in order - by that we mean we can either print or extract (i.e. copy the data to an array) the data in value order. For our sample tree, and inorder traversal would give us the data back in the following order:

Sample Inorder Traversal Result
6, 7, 8, 9, 11, 12, 15, 18

In short, we are retrieving the tree data in order, thus the name of the traversal. This makes BSTs extremely powerful in terms of doing things like sorting data.

The inorder algorithm is very simple:

Preorder Traversal

Preorder is just as simple:

Thus the data in our sample tree would be printed or extracted in preorder as:

Sample Preorder Traversal Result
11, 7, 6, 9, 8, 15, 12, 18

Preorder is useful in that if we insert data into a BST in preorder, we will produce a tree with a structure identical to that of the tree that we extracted the data from. Try it with the preorder data above: you should end up with a tree that looks the same as the tree at the top of these notes.

Postorder Traversal

Postorder moves the data processing to the end:

Thus the data in our sample tree would be printed or extracted in postorder as:

Sample Postorder Traversal Result
6, 8, 9, 7, 12, 18, 15, 11

Levelorder Traversal

A levelorder traversal returns data according to the level that the data occupies in the tree. Like preorder, inserting data into the tree in levelorder should reproduce the tree. It also helps you visualize the tree. The algorithm is:

Thus the data in our sample tree would be printed or extracted in levelorder as:

Sample Levelorder Traversal Result
11, 7, 15, 6, 9, 12, 18, 8

Summary

In this Lesson we looked at the concept and implementation of the Binary Search Tree. We looked specifically at retrieval, insertion, deletion, and various forms of traversal. We also discussed the role that auxiliary methods take.

References

  1. Data Structures and Algorithm Analysis in C++, Mark Allen Weiss. 2006.