Topics

In this lesson, we are going to introduce the Sorted_List ADT. The topic includes:

Sorted_List ADT
Use of the Sorted_List ADT
Using Python lists to implement Sorted_List ADT
The Binary Search

Required Tasks

The Sorted_List ADT
Using the Sorted_List
Implementation of the Sorted_List ADT

Required Tasks

Complete the Sorted_List tasks of Lab 3.
Complete and submit Assignment 4.

Learning Outcomes

By the end of this lesson, you should be able to

Define a Sorted_List ADT
Implement the array-based Sorted_List ADT

Introduction

In this lesson, we are going to learn in more detail about the Sorted_List. We talk about how the methods of this ADT are similar to and differ from that of the List. We will discuss Array data structures and explain how we can implement the Sorted_List ADT using the Python lists.

The Sorted_List ADT

A Sorted_List is an ADT that contains a collection of values in sorted order. The values are ordered by some inherent characteristics of the values, so the values themselves determine where they are stored in the list. The values in a Sorted_List must always be sorted.

Sorted_List ADT Methods

Many of the Sorted_List ADT methods and operators are the same as the List ADT methods and operators such as initialize(), len(), remove(key), find(key), is_empty(), count(key), min(), max(), clear(), index(key), l[i], and in. However, insert is defined differently:

insert(value): adds value to the Sorted_List at the appropriate (sorted) position. The new value must be added after any equivalent values.

A new element should be inserted so that the Sorted_List remains sorted. That's why the Sorted_List insert lacks the index parameter of the List insert.

Stop and Think

Do you think append can be considered a method for Sorted_List ADT? How about reverse?

Not all methods make sense in both the List ADT and the Sorted_List ADT. For example, the reverse method is fine for the List ADT but violates the basic Sorted_List rule that values must always be in sorted (not reverse sorted) order. In addition, the append method does not make sense in the Sorted_List ADT since we cannot just add an arbitrary at the end of a Sorted_List. Any new values must be placed in a position so that the Sorted_List remains sorted.

Array-Based Sorted_List

Part of the point of a Sorted_List is to make use of the fact that its contents are sorted in its method algorithms. Thus, even the methods that return the same result as the equivalent method in the List ADT may be implemented with a significantly different algorithm, and these algorithms may have a different efficiency than that of its List counterpart.

Sorted Data Algorithms

We provided the code for the count method in the array-based List ADT:

List count

    
    def count(self, key):
        n = len(self._values)
        i = 0
        number = 0

        while i < n:
            if self._values[i] == key:
                number += 1
            i += 1
        return number

We can rewrite this code for the Sorted_List to take advantage of the fact that values are in order:

Sorted_List count

    
    def count(self, key):
        n = len(self._values)
        i = self._binary_search(key)

        if i == -1:
            number = 0
        else:
            # Because the list is sorted, all values with the same
            # key are lumped together.
            number = 1
            i += 1

            while i < n and self._values[i] == key:
                i += 1
                number += 1
        return number

The line:
i = self._binary_search(key)
is discussed in the next section. Its purpose is to find the first occurrence of key, and if it exists at all, count the rest of the values that are equivalent to key.

Note that although potentially this algorithm has to do less work than its List version (not every value in the list has to be compared against key like in the List version), it is still \(O(n)\). Why? Imagine a Sorted_List where all the values are the same. In this worst case, every value would have to be compared against key in any case.

Stop and Think

What other methods can be implemented differently in the Sorted_List ADT than in the List ADT to take advantage of the fact the contents are sorted?

Some examples:

The min method of the List ADT must examine every value in the List to determine the minimum value. Hint: in a Sorted_List, where is the minimum value? How many comparisons would it take to find this values?
A similar situation as above holds for the max method.
The _linear_search method can be replaced by a method whose algorithm takes advantage of the fact that the list elements are sorted - this is the _binary_search discussed in the next section.

The Binary Search

In the previous section we've claimed that we can replace the List _linear_search(key) with the Sorted_List _binary_search(key) that takes advantage of the fact the Sorted_List contents are in order. Not only that, but we'll show that it has a better time efficiency than the _linear_search \(O(n)\).

Binary Search Implementations

The Unstable `_binary_search`

We noted that the List _linear_search is a private helper method, used by other methods that need to perform a similar algorithm by using the same code. (find, remove, index, etc.). The Sorted_List _binary_search method provides the same service. However, not all binary search algorithms are the equivalent.

The unstable binary search starts by looking at the value in the middle of the Sorted_List. If that value matches the key, it returns the index of that value. If not, it searches either the left or right half of the list for the key by looking at the middle value of the left or right half, or subarray. At every step until the key value is found, the binary search eliminates the need to search half of the unsearched values. This reduction in the number of comparisons to perform makes this algorithm \(O(\log n)\), which is a significant improvement over the linear search \(O(n)\).

The unstable binary search performs two comparisons with respect to the midpoint, and if both of those comparisons fail, the key and the midpoint value must be equivalent. However, there is no guarantee that the value found this way is the first in the list of that value. This code is:

Unstable Binary Search

def _binary_search(self, key):
        i = -1
        # Index of beginning of subarray to search.
        low = 0
        # Index of end of subarray to search.
        high = len(self._values) - 1

        while low <= high and i == -1:
            # Find the middle of the current subarray.
            # (avoids overflow on large values vs (low + high)//2
            mid = (high - low) // 2 + low

            if key > self._values[mid]:
                # Search the right subarray.
                low = mid + 1
            elif key < self._values[mid]:
                # Search the left subarray.
                high = mid - 1
            else:
                i = mid
        return i

The prohibition symbol is there to make sure that you understand that you are not to use this code. The unstable algorithm is not the best version of the binary search for the purposes of the Sorted_List.

To check that the algorithm in Code Listing 1 does not find the first occurrence of the element in the list , let's try a very simple test: use this algorithm to search the following list for the element 0:

[0, 0, 0, 0, 0]

This version of the binary search returns an index of 2. The length of the list is 5, and therefore the index of the midpoint is 2. The very first set of comparisons finds that the value at that midpoint is 2. The return value is set to 2, the loop stops executing, and the method returns 2. However, the 0 at position 2 is clearly not the first 0 in the list.

Deferred Detection of Equality

To find the first occurrence of an element in the list, the binary search uses the deferred detection of equality concept. It does not check to see if the current value being examined is equivalent to the element being searched until after exhausting all possible comparisons - i.e. until the search runs out of subarray midpoints to examine. This is called the stable binary search. The code is:

The Stable Binary Search

def _binary_search(self, key):
    """
    -------------------------------------------------------
    Searches for the first occurrence of element in the sorted list.
    Private helper method - used only by other ADT methods.
    Use: i = self._binary_search(key)
    -------------------------------------------------------
    Parameters:
        key - a key value to search for (?)
    Returns:
        i - the index of the first occurrence of key in
            the list, -1 if key is not found. (int)
    -------------------------------------------------------
    """
    # Index of beginning of subarray to search.
    low = 0
    # Index of end of subarray to search.
    high = len(self._values) - 1

    while low < high:
        # Find the middle of the current subarray.
        # (avoids overflow on large values vs (low + high)//2
        mid = (high - low) // 2 + low

        if key > self._values[mid] :
            # Search the right subarray.
            low = mid + 1
        else:
            # Default: search the left subarray.
            high = mid

    # Deferred test for equality.
    if low == high and self._values[low] == key:
        i = low
    else:
        i = -1
    return i

Searching the [0, 0, 0, 0, 0] array, the binary search in Code Listing 2 finds the first occurrence of 0 at position 0.

Note that the unstable binary search may find a matching value at any time during its comparisons. The stable binary search waits until it exhausts the array before checking for a matching value and therefore, on average, takes more time than the unstable binary search. However, the difference is generally not large, and the stable binary search is generally significantly faster than a linear search. As noted earlier, the stable binary search algorithm is \(O(\log n)\), a significant improvement over the linear search \(O(n)\)

Implementing the `insert` Method

Unlike the List insert method which allows a value to be inserted anywhere in the List, the Sorted_List insert must make sure that the sorted property of the contents is preserved. (And no, you don't want to have to sort the contents of the entire list after every insertion - that is grossly inefficient.) The insert method should be able to make use of a binary search when determining where to insert a new value. However, because the insertion must also be stable, it cannot use the binary search as written: the stable binary search looks for the first occurrence of a matching value (if any), while the insertion is required to find the last occurrence of a matching value (if any), and put the new value to the right of it. This requires a subtle change to the binary search algorithm above:

The Sorted_List insert Method


    def insert(self, value):
        """
        -------------------------------------------------------
        Inserts value at the proper place in the sorted list.
        Must be a stable insertion, i.e. consecutive insertions
        of the same value must keep their order preserved.
        Use: sl.insert(value)
        -------------------------------------------------------
        Parameters:
            value - the value to insert (?)
        Returns:
            None
        -------------------------------------------------------
        """
        # Index of beginning of subarray to search.
        low = 0
        # Index of end of subarray to search.
        high = len(self._values) - 1

        while low <= high:
            # Find the middle of the current subarray.
            # (avoids overflow on large values vs (low + high)//2
            mid = (high - low) // 2 + low

            if value < self._values[mid]:
                # search the left subarray.
                high = mid - 1
            else:
                # Default: search the right subarray.
                low = mid + 1
        self._values.insert(low, value)
        return

Note in contrast to the previous versions of binary search, this version moves to the right subarray for its default search, rather than the left subarray. This makes sense if we wish to find the last occurrence of a matching value (if any), and put the new value to the right of it.

Challenge

The count method described in the previous section as written is an \(O(n)\) algorithm. How could it be rewritten to make it an \(O(\log n)\) algorithm? (In fact, although not covered in this course, it could take advantage of parallel processing in a way that a linear search could not.)

Summary

In this lesson, we introduced the Sorted_List ADT and discussed different approaches to implement the Sorted_ListList ADT using arrays. We discussed how a Sorted_List can be implemented using Python lists. We looked at the Binary Search in detail.

References

CS1027b: Computer Science Fundamentals II, Department of Computer Science at Western University, Notes on the List ADT.
Design and Analysis of Data Structures, Niema Moshiri and Liz Izhikevich, 2018.

Lesson 6: The Sorted List

Topics

Required Tasks

Required Tasks

Learning Outcomes

Introduction

The Sorted_List ADT

Sorted_List ADT Methods

Array-Based Sorted_List

Sorted Data Algorithms

The Binary Search

Binary Search Implementations

The Unstable _binary_search

Deferred Detection of Equality

Implementing the insert Method

Challenge

Summary

References

The Unstable `_binary_search`

Implementing the `insert` Method