Lesson 12: Linked Lists and Sorted Lists

Topics

In this lesson, we are going to study further linked algorithms. Topics include:

  1. Comparison of array-based and linked algorithms
  2. Why certain algorithm choices are made

Required Tasks

  1. Complete and submit the tasks of Lab 6.
  2. Complete and submit the tasks of Assignment 6.

Learning Outcomes

By the end of this lesson, you should be able to

  1. Develop efficient linked algorithms
  2. Understand the difference between low and high level data structure internals access.

Introduction

Most of the work necessary to implement the linked List and Sorted List data structures are covered in Lab 6 and Assignment 6. Instead this Lesson discusses linked algorithm design.

Low Level Data Access

We have required that data structure methods be written at the lowest possible level, i.e. that an method should not call other public methods to complete their word. (They may, however, call private helper methods such as _linear_search or _binary_search, which are specially designed to simplify other methods.) Why is this important? What are the benefits of writing slightly longer chunks of code when shortcuts are available?

There are two major reasons that we require you to code at the lowest level possible:

Understanding the second point reinforces the first.

A Bad Algorithm Example

Here is an example of a simple linked list:

A Simple Linked List

The function is_identical compares two Lists to determine if their contents are the same and in the same order. The code that uses Lists looks like:

The list_is_identical Function
    
def list_is_identical(source, target):
    """
    ---------------------------------------------------------
    Determines whether two lists are is_identical, i.e. same values
    appear in the same locations in both lists.
    Use: b = list_is_identical(source, target)
    -------------------------------------------------------
    Parameters:
        source - a list (List)
        target - another list (List)
    Returns:
        identical - True if source contains the same values
            in the same order as target, otherwise False. (boolean)
    -------------------------------------------------------
    """
    if len(source) != len(target):
        identical = False
    else:
        n = len(source)
        i = 0

        # Compare each element, stop when the elements are not the same
        # or the end of the list is reached.
        while i < n and source[i] == target[i]:
            i += 1

        if i == n:
            identical = True
        else:
            identical = False
    return identical

  

How efficient is this code if we compare the array-based and linked versions of the List data structure? The linked version will be significantly less efficient. The reason lies in the definition of the linked List methods:

Using len (__len__) is not a problem in terms of efficiency in either version. The important parts of the two versions of len are:

Both versions are \(O(1)\). The inefficiency in the linked List is caused by the use of [i] (__getitem__):

The array-based version is clearly \(O(1)\). The linked version is \(O(n)\), and what is worse, it is being called, twice, from inside a loop condition. Further, it is repetitious, traversing the same nodes many times. There are a number of different ways of determining how much work an algorithm does - counting the number of lines executed, for example - but we will examine how many nodes are visited during the execution of the method.

Based upon the sample list diagram at the top of the page, and assuming we are comparing two identical lists, it is clear that at minimum the is_identical method is \(O(n)\), and that in this example we would have to traverse eight nodes: four in source, and four in target. How many nodes would actually be visited given the code above? We will step through the

    
while i < n and source[i] == target[i]:

  

loop and see.

Total node visits: 2 + 4 + 6 + 8 = 20.

As to the repetition, the first node containing 7 is visited four times for each list, the node containing 5 three times for each list, and so on.

It turns out that we can represent the total number of node visits for any arbitrary lists size n as the sum of the first n even numbers. The simple formula to calculate this is:

\(n^2 + n\)

which, by our rules of thumb, is very clearly \(O(n^2)\).

Thus, by using high-level methods instead of low-level ones, we have turned an \(O(n)\) method into an \(O(n^2)\) method.

Is it, however, really true that working at the lowest level of the data structure is going to be \(O(n)\)? The following code uses only the low level elements of the linked data structure:

The Linked is_identical Method
    
    def is_identical(self, target):
        """
        ---------------------------------------------------------
        Determines whether two lists are identical.
        Use: b = source.is_identical(target)
        -------------------------------------------------------
        Parameters:
            target - another list (List)
        Returns:
            identical - True if this list contains the same values as
                target in the same order, otherwise False.
        -------------------------------------------------------
        """
        if self._count != target._count:
            identical = False
        else:
            curr1 = self._front
            curr2 = target._front

            while curr1 is not None and curr1._value == curr2._value:
                curr1 = curr1._next
                curr2 = curr2._next

            if curr1 is None:
                identical = True
            else:
                identical = False
        return identical

  

It is trivial to show that each node in the two lists is visited only once, for a total node visits of 8 in this sample list. Thus the time complexity for this version of the is_identical method is \(O(n)\).

This is why we require you to write your data structure methods at the lowest level possible.

Array vs Linked Algorithms

These side-by-side code examples compare array-based code (on the left) against linked code (on the right) to demonstrate that despite the differences in details, the algorithms are often very similar. The details are, of course, absolutely key, so you have to clearly understand those differences, but the algorithms are often - not always, but often - the same.

Stack Peek

Array-Based
    
def peek(self):
    """
    -------------------------------------------------------
    Returns a copy of the value at the top of the stack.
    Attempting to peek at an empty stack throws an exception.
    Use: value = stack.peek()
    -------------------------------------------------------
    Returns:
        value - a copy of the value at the top of stack (?)
    -------------------------------------------------------
    """
    assert len(self._values) > 0, "Cannot peek at an empty stack"

    value = deepcopy(self._values[-1])
    return value

  
Linked
    
def peek(self):
    """
    -------------------------------------------------------
    Returns a copy of the value at the top of the stack.
    Attempting to peek at an empty stack throws an exception.
    Use: value = stack.peek()
    -------------------------------------------------------
    Returns:
        value - a copy of the value at the top of stack (?)
    -------------------------------------------------------
    """
    assert self._top is not None, "Cannot peek at an empty stack"

    value = deepcopy(self._top._value)
    return value

  

Stack Split

Array-Based
    
def split_alt(self):
    """
    -------------------------------------------------------
    Splits the source stack into separate target stacks with values
    alternating into the targets. At finish source stack is empty.
    Use: target1, target2 = source.split_alt()
    -------------------------------------------------------
    Returns:
        target1 - contains alternating values from source (Stack)
        target2 - contains other alternating values from source (Stack)
    -------------------------------------------------------
    """
    target1 = Stack()
    target2 = Stack()
    left = True

    while len(self._values) > 0:

        if left:
            target1._values.append(self._values.pop())
        else:
            target2._values.append(self._values.pop())
        left = not left
    return target1, target2

  
Linked
    
def split_alt(self):
    """
    -------------------------------------------------------
    Splits the source stack into separate target stacks with values
    alternating into the targets. At finish source stack is empty.
    Use: target1, target2 = source.split()
    -------------------------------------------------------
    Returns:
        target1 - contains alternating values from source (Stack)
        target2 - contains other alternating values from source (Stack)
    -------------------------------------------------------
    """
    target1 = Stack()
    target2 = Stack()
    left = True

    while self._top is not None:

        if left:
            target1._move_top_to_top(self)
        else:
            target2._move_top_to_top(self)
        left = not left
    return target1, target2

  

List Count

Array-Based
    
def count(self, key):
    """
    -------------------------------------------------------
    Finds the number of times key appears in list.
    Use: n = lst.count(key)
    -------------------------------------------------------
    Parameters:
        key - a partial data element (?)
    Returns:
        number - number of times key appears in list (int)
    -------------------------------------------------------
    """
    number = 0

    for value in self._values:
        if value == key:
            number += 1
    return number

  
Linked
    
def count(self, key):
    """
    -------------------------------------------------------
    Finds the number of times key appears in list.
    Use: n = l.count(key)
    -------------------------------------------------------
    Parameters:
        key - a partial data element (?)
    Returns:
        returns
        number - number of times key appears in list (int)
    -------------------------------------------------------
    """
    number = 0
    current = self._front

    while current is not None:
        if key == current._value:
            number += 1
        current = current._next
    return number

  

Sorted List count

Array-Based
    
def count(self, key):
    """
    -------------------------------------------------------
    Determines the number of times key appears in the sorted list.
    Use: n = sl.count(key)
    -------------------------------------------------------
    Parameters:
        key - a data element (?)
    Returns:
        returns
        number - the number of times key appears in the sorted list (int)
    -------------------------------------------------------
    """
    number = 0
    i = self._binary_search(key)

    if i > -1:
        # Because the list is sorted, all values with the same
        # key are grouped together.
        n = len(self._values)
        number = 1
        i += 1

        while i < n and self._values[i] == key:
            number += 1
            i += 1
    return number

  
Linked
    
def count(self, key):
    """
    -------------------------------------------------------
    Determines the number of times key appears in the sorted list.
    Use: n = sl.count(key)
    -------------------------------------------------------
    Parameters:
        key - a data element (?)
    Returns:
        returns
        number - the number of times key appears in the sorted list (int)
    -------------------------------------------------------
    """
    number = 0
    _, current, i = self._linear_search(key)

    if i > -1:
        # Because the list is sorted, all values with the same
        # key are grouped together.
        while current is not None and current._value == key:
            number += 1
            current = current._next
    return number

  

Linked Recursion

We will use recursion with linked structures in Binary Search Trees extensively, but it is useful to discuss linked recursion with respect to simpler linear linked structures first.

Printing the Contents of a Linear Linked Structure

Printing the contents of a linked List from front to rear is very simple. Loop through the nodes, following the links to the next node until None is reached:

Traversing a Linked List
    
    def print_i(self):
        """
        -------------------------------------------------------
        Prints the contents of list from front to rear.
        -------------------------------------------------------
        Returns:
            None
        -------------------------------------------------------
        """
        curr = self._front

        while curr is not None:
            print(curr._value)
            curr = curr._next
        return

  

The algorithm is simple, and one we have sees again and again when processing linked structures: assign a curr variable to the top of the stack and move through the stack node by node, updating curr to the next node until it reaches the end of the links, i.e. when _next is None.

What if we want to print the contents of the List from rear to front? (Let's call the method print_r to distinguish it from print_i.) Then we have a problem. The links go only one way:

A Simple Linked List

Looping through the linked List from rear to front cannot be done simply. Because the nodes used are singly-linked, each node can point to only one other node. For inserting and removing purposes, that node has to be the next node in the data structure. The node has no way to get back to its parent node. How, then, can we print the contents of this List from rear to front?


Recursion with Auxiliary Functions and Methods

In simple recursion a function calls itself, such as in the following function that reverses a string:

Reversing a String Recursively
    
def string_reverse(s):
    if s == "":
        result = ""
    else:
        result = string_reverse(s[1:]) + s[0]
    return result

  

In this example the function string_reverse works directly with a simple data structure, in this case a string. The function calls itself directly and passes the same data type to its subcall. Problems arise, however, with more complex data structures such as Lists. The actual data is stored within a more complex structure. Unfortunately, this outer 'layer' of the data structure is not amenable to being used recursively because, unlike the strings and lists of our previous examples, it is made up of layers of elements. Our linked version of a List has this structure:

    
        self._front = None
        self._size = 0

  

We wish to work recursively with the _front portion of the List definition - the part where the nodes begin - not the entire List. Our recursive solution would look something like this:

Recursive List Reversal Algorithm
    
  def some_func(self, curr):

        if curr is not None:
            self.some_func(curr._next)
            print(curr._value)
        return

  

This method works by recursively moving through the List node by node until it reaches the last node. Only then, after the last recursive call has finished, does the method execute its print statement as the recursive calls finish and are popped off the top of the recursion stack.

The problem is that the call to some_func requires a node as a parameter. However, the call to print_r has no parameters other than self. Adding a node parameter to print_r at the top level violates the ADT and means that the method so written would not work with a different implementation of the ADT. The array-based version of print_r doesn't require extra parameters or even recursion since it is trivial to move through a Python list from rear to front.

The solution is to use an auxiliary method that works with the node portion of the List only. Although we can claim that print_r is recursive, the actual recursion is done in its auxiliary method. This is print_r itself:

Traversing a Linked List in Reverse
    
    def print_r(self):
        """
        -------------------------------------------------------
        Prints the contents of list from rear to front.
        -------------------------------------------------------
        Returns:
            None
        -------------------------------------------------------
        """
        self._print_r_aux(self._front)
        return

  

Note that print_r has no extra nodes and does not recursively call itself. Instead, it calls a private auxiliary method. This method is passed the front of the node list rather than the entire List.

This auxiliary method works directly with the node data:

Traversal Auxiliary Method
    
  def _print_r_aux(self, curr):
        """
        -------------------------------------------------------
        Prints the contents of the current node.
        -------------------------------------------------------
        Returns:
            None
        -------------------------------------------------------
        """
        if curr is not None:
            self._print_r_aux(curr._next)
            print(curr._value)
        return

  

This method makes a recursive call to itself, and with each call passes a link to the next node in the list of nodes.

Note that auxiliary recursive methods generally have more parameters and/or different parameters than the base method that calls them. If you write an auxiliary method that uses the same number and type of parameters as the base method that calls it, then you probably don't need the auxiliary method (or you wrote it incorrectly).

Using auxiliary methods for recursion, particularly with data structures, is a very common technique that you must master.

Summary

In this Lesson, we left the implementation of the linked List and Sorted List to the labs and assignments, but examined why writing data structure methods at the lowest possible level is sometimes required in terms of Big-O efficiency. We also briefly compared a few array-based and linked methods for efficiency.

References

  1. Data Structures and Algorithm Analysis in C++, Mark Allen Weiss. 2006.