In this lesson, we are going to introduce the Sorted_List ADT. The topic includes:
By the end of this lesson, you should be able to
In this lesson, we are going to learn in more detail about the Sorted_List. We talk about how the methods of this ADT are similar to and differ from that of the List. We will discuss Array data structures and explain how we can implement the Sorted_List ADT using the Python lists.
A Sorted_List is an ADT that contains a collection of values in sorted order. The values are ordered by some inherent characteristics of the values, so the values themselves determine where they are stored in the list. The values in a Sorted_List must always be sorted.
Many of the Sorted_List ADT methods and operators are the same as the
List ADT methods and operators such as initialize()
, len()
,
remove(key)
, find(key)
, is_empty()
,
count(key)
, min()
, max()
, clear()
,
index(key)
, l[i]
, and in
.
However, insert
is defined differently:
insert(value)
value
to the Sorted_List at the appropriate (sorted)
position. The new value
must be added after
any equivalent values.
A new element should be inserted so that the Sorted_List remains sorted.
That's why the Sorted_List insert
lacks the index parameter
of the List insert
.
Part of the point of a Sorted_List is to make use of the fact that its contents are sorted in its method algorithms. Thus, even the methods that return the same result as the equivalent method in the List ADT may be implemented with a significantly different algorithm, and these algorithms may have a different efficiency than that of its List counterpart.
We provided the code for the count
method in the
array-based List ADT:
count
def count(self, key):
n = len(self._values)
i = 0
number = 0
while i < n:
if self._values[i] == key:
number += 1
i += 1
return number
We can rewrite this code for the Sorted_List to take advantage of the fact that values are in order:
count
def count(self, key):
n = len(self._values)
i = self._binary_search(key)
if i == -1:
number = 0
else:
# Because the list is sorted, all values with the same
# key are lumped together.
number = 1
i += 1
while i < n and self._values[i] == key:
i += 1
number += 1
return number
The line:
i = self._binary_search(key)
is
discussed in the next section. Its purpose is to find the first
occurrence of key
, and if it exists at all, count the rest
of the values that are equivalent to key
.
Note that although potentially this algorithm has to do less work than
its List version (not every value in the list has to be compared against
key
like in the List version), it is still \(O(n)\). Why?
Imagine a Sorted_List where all the values are the same. In this worst
case, every value would have to be compared against key
in
any case.
In the previous section we've claimed that we can replace the List _linear_search(key)
with the Sorted_List _binary_search(key)
that takes
advantage of the fact the Sorted_List contents are in order. Not only
that, but we'll show that it has a better time efficiency than the _linear_search
\(O(n)\).
_binary_search
We noted that the List _linear_search
is a private helper
method, used by other methods that need to perform a similar algorithm
by using the same code. (find
, remove
, index
,
etc.). The Sorted_List _binary_search
method provides the
same service. However, not all binary search algorithms are the
equivalent.
The unstable binary search starts by looking at the value in the middle of the Sorted_List. If that value matches the key, it returns the index of that value. If not, it searches either the left or right half of the list for the key by looking at the middle value of the left or right half, or subarray. At every step until the key value is found, the binary search eliminates the need to search half of the unsearched values. This reduction in the number of comparisons to perform makes this algorithm \(O(\log n)\), which is a significant improvement over the linear search \(O(n)\).
The unstable binary search performs two comparisons with respect to the midpoint, and if both of those comparisons fail, the key and the midpoint value must be equivalent. However, there is no guarantee that the value found this way is the first in the list of that value. This code is:
def _binary_search(self, key):
i = -1
# Index of beginning of subarray to search.
low = 0
# Index of end of subarray to search.
high = len(self._values) - 1
while low <= high and i == -1:
# Find the middle of the current subarray.
# (avoids overflow on large values vs (low + high)//2
mid = (high - low) // 2 + low
if key > self._values[mid]:
# Search the right subarray.
low = mid + 1
elif key < self._values[mid]:
# Search the left subarray.
high = mid - 1
else:
i = mid
return i
The prohibition symbol is there to make sure that you understand that you are not to use this code. The unstable algorithm is not the best version of the binary search for the purposes of the Sorted_List.
To check that the algorithm in Code Listing 1 does not find the first occurrence of the element in the list , let's try a very simple test: use this algorithm to search the following list for the element 0:
[0, 0, 0, 0, 0]
This version of the binary search returns an index of 2. The length of the list is 5, and therefore the index of the midpoint is 2. The very first set of comparisons finds that the value at that midpoint is 2. The return value is set to 2, the loop stops executing, and the method returns 2. However, the 0 at position 2 is clearly not the first 0 in the list.
To find the first occurrence of an element in the list, the binary search uses the deferred detection of equality concept. It does not check to see if the current value being examined is equivalent to the element being searched until after exhausting all possible comparisons - i.e. until the search runs out of subarray midpoints to examine. This is called the stable binary search. The code is:
def _binary_search(self, key):
"""
-------------------------------------------------------
Searches for the first occurrence of element in the sorted list.
Private helper method - used only by other ADT methods.
Use: i = self._binary_search(key)
-------------------------------------------------------
Parameters:
key - a key value to search for (?)
Returns:
i - the index of the first occurrence of key in
the list, -1 if key is not found. (int)
-------------------------------------------------------
"""
# Index of beginning of subarray to search.
low = 0
# Index of end of subarray to search.
high = len(self._values) - 1
while low < high:
# Find the middle of the current subarray.
# (avoids overflow on large values vs (low + high)//2
mid = (high - low) // 2 + low
if key > self._values[mid] :
# Search the right subarray.
low = mid + 1
else:
# Default: search the left subarray.
high = mid
# Deferred test for equality.
if low == high and self._values[low] == key:
i = low
else:
i = -1
return i
Searching the [0, 0, 0, 0, 0]
array, the binary search in
Code Listing 2 finds the first occurrence of 0 at position 0.
Note that the unstable binary search may find a matching value at any time during its comparisons. The stable binary search waits until it exhausts the array before checking for a matching value and therefore, on average, takes more time than the unstable binary search. However, the difference is generally not large, and the stable binary search is generally significantly faster than a linear search. As noted earlier, the stable binary search algorithm is \(O(\log n)\), a significant improvement over the linear search \(O(n)\)
insert
Method
Unlike the List insert
method which allows a value to be
inserted anywhere in the List, the Sorted_List insert
must
make sure that the sorted property of the contents is preserved. (And
no, you don't want to have to sort the contents of the entire list after
every insertion - that is grossly inefficient.) The insert
method should be able to make use of a binary search when determining
where to insert a new value. However, because the insertion must also be
stable, it cannot use the binary search as written: the stable binary
search looks for the first occurrence of a matching value (if
any), while the insertion is required to find the last
occurrence of a matching value (if any), and put the new value to the
right of it. This requires a subtle change to the binary search
algorithm above:
insert
Method
def insert(self, value):
"""
-------------------------------------------------------
Inserts value at the proper place in the sorted list.
Must be a stable insertion, i.e. consecutive insertions
of the same value must keep their order preserved.
Use: sl.insert(value)
-------------------------------------------------------
Parameters:
value - the value to insert (?)
Returns:
None
-------------------------------------------------------
"""
# Index of beginning of subarray to search.
low = 0
# Index of end of subarray to search.
high = len(self._values) - 1
while low <= high:
# Find the middle of the current subarray.
# (avoids overflow on large values vs (low + high)//2
mid = (high - low) // 2 + low
if value < self._values[mid]:
# search the left subarray.
high = mid - 1
else:
# Default: search the right subarray.
low = mid + 1
self._values.insert(low, value)
return
Note in contrast to the previous versions of binary search, this version moves to the right subarray for its default search, rather than the left subarray. This makes sense if we wish to find the last occurrence of a matching value (if any), and put the new value to the right of it.
The count
method described in the previous
section as written is an \(O(n)\) algorithm. How could it be
rewritten to make it an \(O(\log n)\) algorithm? (In fact, although not
covered in this course, it could take advantage of parallel
processing in a way that a linear search could not.)
In this lesson, we introduced the Sorted_List ADT and discussed different approaches to implement the Sorted_ListList ADT using arrays. We discussed how a Sorted_List can be implemented using Python lists. We looked at the Binary Search in detail.