In this lesson, we are going to study further linked algorithms. Topics include:
By the end of this lesson, you should be able to
Most of the work necessary to implement the linked List and Sorted List data structures are covered in Lab 6 and Assignment 6. Instead this Lesson discusses linked algorithm design.
We have required that data structure methods be written at the lowest
possible level, i.e. that an method should not call other public methods
to complete their word. (They may, however, call private helper methods
such as _linear_search
or _binary_search
,
which are specially designed to simplify other methods.) Why is this
important? What are the benefits of writing slightly longer chunks of
code when shortcuts are available?
There are two major reasons that we require you to code at the lowest level possible:
to demonstrate to us that you truly understand the data structures you are implementing
that these shortcuts may be shortcuts in coding only, and be grossly inefficient in actual execution.
Understanding the second point reinforces the first.
Here is an example of a simple linked list:
The function is_identical
compares two Lists to determine
if their contents are the same and in the same order. The code that uses
Lists looks like:
list_is_identical
Function
def list_is_identical(source, target):
"""
---------------------------------------------------------
Determines whether two lists are is_identical, i.e. same values
appear in the same locations in both lists.
Use: b = list_is_identical(source, target)
-------------------------------------------------------
Parameters:
source - a list (List)
target - another list (List)
Returns:
identical - True if source contains the same values
in the same order as target, otherwise False. (boolean)
-------------------------------------------------------
"""
if len(source) != len(target):
identical = False
else:
n = len(source)
i = 0
# Compare each element, stop when the elements are not the same
# or the end of the list is reached.
while i < n and source[i] == target[i]:
i += 1
if i == n:
identical = True
else:
identical = False
return identical
How efficient is this code if we compare the array-based and linked versions of the List data structure? The linked version will be significantly less efficient. The reason lies in the definition of the linked List methods:
len(source)
(and len(target)
) rely upon
the definition of the __len__
methodsource[i]
(and target[i]
) rely upon the
definition of the __getitem__
method
Using len
(__len__
) is not a problem in terms
of efficiency in either version. The important parts of the two versions
of len
are:
return len(self._values)
return self._count
Both versions are \(O(1)\). The inefficiency in the linked List is
caused by the use of [i]
(__getitem__
):
value = deepcopy(self._values[n])
i = 0
curr = self._front
while i < n:
i += 1
curr = curr._next
value = deepcopy(curr._value)
The array-based version is clearly \(O(1)\). The linked version is \(O(n)\), and what is worse, it is being called, twice, from inside a loop condition. Further, it is repetitious, traversing the same nodes many times. There are a number of different ways of determining how much work an algorithm does - counting the number of lines executed, for example - but we will examine how many nodes are visited during the execution of the method.
Based upon the sample list diagram at the top of the page, and assuming
we are comparing two identical lists, it is clear that at minimum the is_identical
method is \(O(n)\), and that in this example we would have to traverse
eight nodes: four in source
, and four in target
.
How many nodes would actually be visited given the code above? We will
step through the
while i < n and source[i] == target[i]:
loop and see.
i
is 0 (and the node values are 7), for each List
there is a call to __getitem__(0)
. The loop in __getitem__
is not executed at all, but the value is extracted from the first
node.i
is 1 (and the node values are 5), for each List
there is a call to __getitem__(1)
. The loop in __getitem__
is executed once (to move through the nodes containing 7), before the
value 5 is extracted from the next node.i
is 2 (and the node values are 3), for each List
there is a call to __getitem__(2)
. The loop in __getitem__
is executed twice (to move through the nodes containing 7 and 5),
before the value 3 is extracted from the next node.i
is 3 (and the node values are 1), for each List
there is a call to __getitem__(3)
. The loop in __getitem__
is executed thrice (to move through the nodes containing 7, 5, and 3),
before the value 1 is extracted from the next node.Total node visits: 2 + 4 + 6 + 8 = 20.
As to the repetition, the first node containing 7 is visited four times for each list, the node containing 5 three times for each list, and so on.
It turns out that we can represent the total number of node visits for
any arbitrary lists size n
as the sum of the first n
even numbers. The simple formula to calculate this is:
\(n^2 + n\)
which, by our rules of thumb, is very clearly \(O(n^2)\).
Thus, by using high-level methods instead of low-level ones, we have turned an \(O(n)\) method into an \(O(n^2)\) method.
Is it, however, really true that working at the lowest level of the data structure is going to be \(O(n)\)? The following code uses only the low level elements of the linked data structure:
is_identical
Method
def is_identical(self, target):
"""
---------------------------------------------------------
Determines whether two lists are identical.
Use: b = source.is_identical(target)
-------------------------------------------------------
Parameters:
target - another list (List)
Returns:
identical - True if this list contains the same values as
target in the same order, otherwise False.
-------------------------------------------------------
"""
if self._count != target._count:
identical = False
else:
curr1 = self._front
curr2 = target._front
while curr1 is not None and curr1._value == curr2._value:
curr1 = curr1._next
curr2 = curr2._next
if curr1 is None:
identical = True
else:
identical = False
return identical
It is trivial to show that each node in the two lists is visited only
once, for a total node visits of 8 in this sample list. Thus the time
complexity for this version of the is_identical
method is
\(O(n)\).
This is why we require you to write your data structure methods at the lowest level possible.
These side-by-side code examples compare array-based code (on the left) against linked code (on the right) to demonstrate that despite the differences in details, the algorithms are often very similar. The details are, of course, absolutely key, so you have to clearly understand those differences, but the algorithms are often - not always, but often - the same.
def peek(self):
"""
-------------------------------------------------------
Returns a copy of the value at the top of the stack.
Attempting to peek at an empty stack throws an exception.
Use: value = stack.peek()
-------------------------------------------------------
Returns:
value - a copy of the value at the top of stack (?)
-------------------------------------------------------
"""
assert len(self._values) > 0, "Cannot peek at an empty stack"
value = deepcopy(self._values[-1])
return value
def peek(self):
"""
-------------------------------------------------------
Returns a copy of the value at the top of the stack.
Attempting to peek at an empty stack throws an exception.
Use: value = stack.peek()
-------------------------------------------------------
Returns:
value - a copy of the value at the top of stack (?)
-------------------------------------------------------
"""
assert self._top is not None, "Cannot peek at an empty stack"
value = deepcopy(self._top._value)
return value
def split_alt(self):
"""
-------------------------------------------------------
Splits the source stack into separate target stacks with values
alternating into the targets. At finish source stack is empty.
Use: target1, target2 = source.split_alt()
-------------------------------------------------------
Returns:
target1 - contains alternating values from source (Stack)
target2 - contains other alternating values from source (Stack)
-------------------------------------------------------
"""
target1 = Stack()
target2 = Stack()
left = True
while len(self._values) > 0:
if left:
target1._values.append(self._values.pop())
else:
target2._values.append(self._values.pop())
left = not left
return target1, target2
def split_alt(self):
"""
-------------------------------------------------------
Splits the source stack into separate target stacks with values
alternating into the targets. At finish source stack is empty.
Use: target1, target2 = source.split()
-------------------------------------------------------
Returns:
target1 - contains alternating values from source (Stack)
target2 - contains other alternating values from source (Stack)
-------------------------------------------------------
"""
target1 = Stack()
target2 = Stack()
left = True
while self._top is not None:
if left:
target1._move_top_to_top(self)
else:
target2._move_top_to_top(self)
left = not left
return target1, target2
def count(self, key):
"""
-------------------------------------------------------
Finds the number of times key appears in list.
Use: n = lst.count(key)
-------------------------------------------------------
Parameters:
key - a partial data element (?)
Returns:
number - number of times key appears in list (int)
-------------------------------------------------------
"""
number = 0
for value in self._values:
if value == key:
number += 1
return number
def count(self, key):
"""
-------------------------------------------------------
Finds the number of times key appears in list.
Use: n = l.count(key)
-------------------------------------------------------
Parameters:
key - a partial data element (?)
Returns:
returns
number - number of times key appears in list (int)
-------------------------------------------------------
"""
number = 0
curr = self._front
while curr is not None:
if key == curr._value:
number += 1
curr = curr._next
return number
count
def count(self, key):
"""
-------------------------------------------------------
Determines the number of times key appears in the sorted list.
Use: n = sl.count(key)
-------------------------------------------------------
Parameters:
key - a data element (?)
Returns:
returns
number - the number of times key appears in the sorted list (int)
-------------------------------------------------------
"""
number = 0
i = self._binary_search(key)
if i > -1:
# Because the list is sorted, all values with the same
# key are grouped together.
n = len(self._values)
number = 1
i += 1
while i < n and self._values[i] == key:
number += 1
i += 1
return number
def count(self, key):
"""
-------------------------------------------------------
Determines the number of times key appears in the sorted list.
Use: n = sl.count(key)
-------------------------------------------------------
Parameters:
key - a data element (?)
Returns:
returns
number - the number of times key appears in the sorted list (int)
-------------------------------------------------------
"""
number = 0
_, curr, i = self._linear_search(key)
if i > -1:
# Because the list is sorted, all values with the same
# key are grouped together.
while curr is not None and curr._value == key:
number += 1
curr = curr._next
return number
We will use recursion with linked structures in Binary Search Trees extensively, but it is useful to discuss linked recursion with respect to simpler linear linked structures first.
Printing the contents of a linked List from front to rear is very
simple. Loop through the nodes, following the links to the next node
until None
is reached:
def print_i(self):
"""
-------------------------------------------------------
Prints the contents of list from front to rear.
-------------------------------------------------------
Returns:
None
-------------------------------------------------------
"""
curr = self._front
while curr is not None:
print(curr._value)
curr = curr._next
return
The algorithm is simple, and one we have sees again and again when
processing linked structures: assign a curr
variable to the
top of the stack and move through the stack node by node, updating curr
to the next node until it reaches the end of the links, i.e. when _next
is None
.
What if we want to print the contents of the List from rear to front?
(Let's call the method print_r
to distinguish it from print_i
.)
Then we have a problem. The links go only one way:
Looping through the linked List from rear to front cannot be done simply. Because the nodes used are singly-linked, each node can point to only one other node. For inserting and removing purposes, that node has to be the next node in the data structure. The node has no way to get back to its parent node. How, then, can we print the contents of this List from rear to front?
In simple recursion a function calls itself, such as in the following function that reverses a string:
def string_reverse(s):
if s == "":
result = ""
else:
result = string_reverse(s[1:]) + s[0]
return result
In this example the function string_reverse
works directly
with a simple data structure, in this case a string. The function calls
itself directly and passes the same data type to its subcall. Problems
arise, however, with more complex data structures such as Lists. The
actual data is stored within a more complex structure. Unfortunately,
this outer 'layer' of the data structure is not amenable to being used
recursively because, unlike the strings and lists of our previous
examples, it is made up of layers of elements. Our linked version of a
List has this structure:
self._front = None
self._size = 0
We wish to work recursively with the _front
portion of the
List definition - the part where the nodes begin - not the entire List.
Our recursive solution would look something like this:
def some_func(self, curr):
if curr is not None:
self.some_func(curr._next)
print(curr._value)
return
This method works by recursively moving through the List node by node
until it reaches the last node. Only then, after the last recursive call
has finished, does the method execute its print
statement
as the recursive calls finish and are popped off the top of the
recursion stack.
The problem is that the call to some_func
requires a node
as a parameter. However, the call to print_r
has no
parameters other than self
. Adding a node parameter to print_r
at the top level violates the ADT and means that the method so written
would not work with a different implementation of the ADT. The
array-based version of print_r
doesn't require extra
parameters or even recursion since it is trivial to move through a
Python list from rear to front.
The solution is to use an auxiliary method that works with the
node portion of the List only. Although we can claim that print_r
is recursive, the actual recursion is done in its auxiliary method. This
is print_r
itself:
def print_r(self):
"""
-------------------------------------------------------
Prints the contents of list from rear to front.
-------------------------------------------------------
Returns:
None
-------------------------------------------------------
"""
self._print_r_aux(self._front)
return
Note that print_r
has no extra nodes and does not
recursively call itself. Instead, it calls a private auxiliary method.
This method is passed the front of the node list rather than the entire
List.
This auxiliary method works directly with the node data:
def _print_r_aux(self, curr):
"""
-------------------------------------------------------
Prints the contents of the current node.
-------------------------------------------------------
Returns:
None
-------------------------------------------------------
"""
if curr is not None:
self._print_r_aux(curr._next)
print(curr._value)
return
This method makes a recursive call to itself, and with each call passes a link to the next node in the list of nodes.
Note that auxiliary recursive methods generally have more parameters and/or different parameters than the base method that calls them. If you write an auxiliary method that uses the same number and type of parameters as the base method that calls it, then you probably don't need the auxiliary method (or you wrote it incorrectly).
Using auxiliary methods for recursion, particularly with data structures, is a very common technique that you must master.
In this Lesson, we left the implementation of the linked List and Sorted List to the labs and assignments, but examined why writing data structure methods at the lowest possible level is sometimes required in terms of Big-O efficiency. We also briefly compared a few array-based and linked methods for efficiency.