Lesson 9: Linked Data Structures

Topics

In this lesson, we are going to introduce linked-list data structure. Topics include:

  1. What is a linked data structure?
  2. Implementation of the Stack ADT using a linked data structure

Required Tasks

  1. Read this lesson.

Learning Outcomes

By the end of this lesson, you should be able to:

  1. Define a linked data structure.
  2. Implement the Stack ADT using a linked data structure.

Introduction

In this lesson, we discuss linked-data structures where the data elements are placed in different places in memory and are linked together. Specifically, we are going to introduce and discuss in detail the linked Stack data structure.

Linked Data Structures Concepts

So far we have worked with array-based data structures. Our Stack, Queue, and List ADTs have been implemented with the arrays or Python lists that allows us to access individual elements with an index value. Linked data structures are based upon nodes. A node typically consists of two parts:

A Single-Linked Node

Figure 1 shows a node made up of the elements:

_value
an attribute that contains the node value (i.e. data). This can contain any kind of data (consistent within the data structure, i.e. you can have a linked Stack of all integers, or all strings, or all Students.)
_next
an attribute that contains a link to (i.e. memory address of) another node. Any individual node can be anywhere in memory - we neither know, nor care what the address in memory of a node actually is - so long as the _next part of the previous node contains the address of the next node.
A Linked Series of Nodes

Figure 2 shows a series of nodes linked together. The elements of the Figure are:

v
the _value, or data, part of a node, as noted in Figure 1.
n
the _next, or link, part of a node, as noted in Figure 1.
a
the (numbered) address of a node.
(a)
a reference to an address either in the node immediately previous to the addressed node, or in a variable outside the linked structure. That variable is named _front, or _top, or something else appropriate to the data structure. We have to know where the first node of the data structure resides or we will lose track of it.
green circle
indicates the presence of (unspecified) data. Note that nodes must always contain valid data. There are no nodes that contain None. If there is no data, there is no node.
None
marks the end of the linked structure and indicate that there are no more nodes to link to.

Depending on the number of links in a node and the way nodes point to each other, we may have a linear linked structure or a non-linear linked data structure. In this course, we will be talking about linear linked data structures such as linked lists and non-linear linked data structures such as trees.

In the next section we will look specifically at the linked implementation of the Stack.

The Linked Stack

A linked Stack is a linear linked data structure, as each node in a Stack links to one other node. We must keep track of the first node in a Stack, and call it the top node.

Given a node:

A Single-Linked Node

how can we construct a Stack from a linked series of these nodes?

Figure 2 displays the workings of the linked Stack. Use and to see how it works:

Linked Stack Implementation

The _Stack_Node Class

A Stack node needs nothing beyond the basic _value and _next attributes already discussed. There is no use for an empty node. The _Stack_Node class defines this node and a constructor to create a node. There are no other methods in the _Stack_Node class as the Stack class takes care of manipulating the nodes. Thus the _Stack_Node class code is very simple:

The _Stack_Node Class
    
class _Stack_Node:

    def __init__(self, value, next_):
        """
        -------------------------------------------------------
        Initializes a stack node that contains a copy of value
        and a link to the next node in the stack.
        Use: node = _Stack_Node(value, _next)
        -------------------------------------------------------
        Parameters:
            value - value for node (?)
            next_ - another Stack node (_Stack_Node)
        Returns:
            a new _Stack_Node object (_Stack_Node)
        -------------------------------------------------------
        """
        self._value = deepcopy(value)
        self._next = next_

  

Note that the name of the _Stack_Node class is prefixed with an underscore (_). This means that the entire class is private. Its definition must be included in the same source code file as the Stack class. _Stack_Node should never be referenced by an import statement because nothing other than Stack should ever access it.

Values are deepcopied into a node for exactly the same reasons they are copied into the array-based Stack list, i.e. to isolate the values from the outside world.

The parameter next_ is named with a trailing _ (underscore) because next is a Python keyword, and we shouldn't use Python keywords as attribute or variable names.

The Linked Stack Class

The Linked Stack Constructor

A linked Stack is initialized with:

The __init__ Method
    
class Stack:

    def __init__(self):
        """
        -------------------------------------------------------
        Initializes an empty stack. Values are stored in a
        linked structure.
        Use: stack = Stack()
        -------------------------------------------------------
        Returns:
            a new Stack object (Stack)
        -------------------------------------------------------
        """
        self._top = None

  

The Stack initialization is simple: the private attribute _top links to the first _Stack_Node in the stack. Because the stack is empty, it contains no _Stack_Node objects, thus _top is initialized to None. The real work comes in pushing a node onto or popping a node from a stack. (Defining the is_empty method is trivial and left to you as an exercise.)

The Linked Stack push Method

Pushing a value onto a linked Stack means creating a new node, adding the value to it, and making this new node the top of the Stack.

The push Method
    
    def push(self, value):
        """
        -------------------------------------------------------
        Pushes a copy of value onto the top of the stack.
        Use: stack.push(value)
        -------------------------------------------------------
        Parameters:
            value - value to be added to stack (?)
        Returns:
            None
        -------------------------------------------------------
        """
        self._top = _Stack_Node(value, self._top)
        return

  

push creates a new _Stack_Node object and adds a value to it. The _next link of the _Stack_Node object is set to point to the 'old' top of the stack. When a Stack is empty, the value of _top is None, otherwise it links to whatever _Stack_Node is currently the top node. The 'new' top of the stack is linked to this new node. This is analogous to updating a sum variable by incrementing its previous value with sum = sum + n, which we have done many times.

Note that there are no special cases to consider when pushing a new value onto the stack. The new node always becomes the top node whether the stack is empty or not.

The Linked Stack pop Method

Popping a value from a linked stack means extracting the value from the top node then removing that node. The stack's _top pointer must now point to the next node in the stack. This is easy: get the current top node's link to its next node by referring to its _next attribute, then making that node the new top. This is the reverse of push.

The pop Method
    
    def pop(self):
        """
        -------------------------------------------------------
        Pops and returns the top of stack. The value is removed
        from the stack. Attempting to pop from an empty stack
        throws an exception.
        Use: value = stack.pop()
        -------------------------------------------------------
        Returns:
            value - the value at the top of stack (?)
        -------------------------------------------------------
        """
        assert self._top is not None, "Cannot pop from an empty stack"

        value = self._top._value
        self._top = self._top._next
        return value

  

Moving Nodes Between Stacks

Combining Two Stacks

The Array-Based Stack combine Method

You may have been asked to write methods or functions to combine two Stacks into one. The function version stack_combine uses the ADT methods to combine the Stacks. Because the ADT methods look identical from the outside for both the array-based and linked implementations of the Stack, that function can use the linked Stack representation without change - you would simply import the linked implementation of the Stack rather than the array-based implementation.

The method that is defined as part of the Stack class extends the Stack ADT. Although the method signatures (the def lines of the methods) are identical, the implementation details of between the two Stack implementations (array-based and linked) are radically different. However, their general algorithms is not. It requires looping through the two source Stacks moving data to the target Stack until one or the other of the source Stacks is exhausted. Then the algorithm loops through the remaining source Stack (if any) moving data to the target Stack until the remaining source Stack is exhausted:

Array-Based Stack combine
    
    def combine(self, source1, source2):
        """
        -------------------------------------------------------
        Combines two source Stacks into the current target Stack.
        When finished, the contents of source1 and source2 are
        interlaced into target and source1 and source2 are empty.
        Use: target.combine(source1, source2)
        -------------------------------------------------------
        Parameters:
            source1 - an array-based Stack (Stack)
            source2 - an array-based Stack (Stack)
        Returns:
            None
        -------------------------------------------------------
        """
        while len(source1._values) > 0 and len(source2._values) > 0:
            self._values.append(source1._values.pop())
            self._values.append(source2._values.pop())

        while len(source1._values) > 0:
            self._values.append(source1._values.pop())

        while len(source2._values) > 0:
            self._values.append(source2._values.pop())
        return

  

The key things to understand about this algorithm:

The Linked Stack combine Method

The linked version of the algorithm is implemented as:

Linked Stack combine

    def combine(self, source1, source2):
        """
        -------------------------------------------------------
        Combines two source Stacks into the current target Stack.
        When finished, the contents of source1 and source2 are interlaced
        into target and source1 and source2 are empty.
        Use: target.combine(source1, source2)
        -------------------------------------------------------
        Parameters:
            source1 - a linked Stack (Stack)
            source2 - a linked Stack (Stack)
        Returns:
            None
        -------------------------------------------------------
        """
        while source1._top is not None and source2._top is not None:
            self._move_top_to_top(source1)
            self._move_top_to_top(source2)

        while source1._top is not None:
            self._move_top_to_top(source1)

        while source2._top is not None:
            self._move_top_to_top(source2)
        return

The key things to understand about this algorithm:

Note that we are going to the trouble of moving the top nodes of each source Stack to the target Stack one at a time, because we are alternating the contents of the source Stacks into the target Stack.

How, then, does _move_top_to_top actually shift nodes from one Stack to another? It performs the equivalent of self.push(source.pop()), but without copying that data that calling these methods would involve.

Linked Stack _move_top_to_top
    
    def _move_top_to_top(self, source):
        """
        -------------------------------------------------------
        Moves the top node from the source Stack to the target Stack.
        The target Stack contains the old top node of the source Stack.
        The source Stack top is updated. Equivalent of
        self.push(source.pop()), but moves nodes, not data.
        Use: target._move_top(source)
        -------------------------------------------------------
        Parameters:
            source - a linked Stack (Stack)
        Returns:
            None
        -------------------------------------------------------
        """
        assert source._top is not None, "Cannot move the top of an empty Stack"

        temp = source._top
        source._top = source._top._next
        temp._next = self._top
        self._top = temp
        return

  

Note that the reference to target is generic and refers to how a Stack variable might be referenced in a program - within the actual helper method self is the target Stack.

The assert statement makes sure we don't attempt to move a node from an empty Stack, which makes no sense in any case.

The key difference between the two approaches is that in the array-based approach the actual node data is moved/copied from one memory location to another. In the linked version it is the node connections that are changed and the data is left where it is in memory. This is a significant improvement over the work done by the array-based implementation, not in terms of Big-O, which is the same, but in terms of work done because copying data is more more expensive than changing addresses. Remember that the _top attribute simply stores the address of the first node of a Stack. This address is just an integer, and updating the value of an integer is significantly less work than copying a chunk of Student, or Food, or Movie data.

How, then, do we move nodes? What does this look like? The actual work is done by the following lines in the _move_top_to_top above contains:

    
        temp = source._top
        source._top = source._top._next
        temp._next = self._top
        self._top = temp

  

Use and to see how this works:

Moving Stack Tops Between Stacks

While moving a Stack top, both temp._next and source._top refer to the same node. Remember that we can have multiple references to an object in memory.

The order in which these lines are executed is key to moving a node correctly. We must make sure that at all times the node we are moving has a reference, and that the next nodes are referenced as well. Without the temp variable we would lose access to one of these nodes.

Once the top is moved, we no longer care what happens to temp. In the combine algorithm, it is used repeatedly to move nodes, and when the method ends, it disappears, leaving the combined Stack behind.

Moving nodes between linked data structures is a critical concept in understanding the efficient use of linked data structures. We will look at moving nodes in other data structures in later Lessons.

is and None

Comparing None

The Stack class definition use None to mark the end of the Stack, and our sample code looks for this None to determine when the end of a Stack is reached. The code uses the is operator rather than == to compare to None. Why?

The is operator is Python's way of allowing you to ask the question, Are these two objects the same object?, which is not the same as asking whether the contents of two objects are equal. It is particularly important to use when working with None. Here are some examples:

Using None Examples

f1 = Food('Ravioli', 7, False, 246)
f2 = Food('Ravioli', 7, False, 246)
f3 = f1
x = None
y = None
print(f"f1 == f2:   {f1 == f2}")
print(f"f1 is f2:   {f1 is f2}")
print(f"f1 is None: {f1 is None}")
print(f"f1 is f3:   {f1 is f3}")
print(f"x is y:     {x is y}")
print(f"f1 == None: {f1 == None}")

          

and the results are:

f1 == f2:   True
f1 is f2:   False
f1 is None: False
f1 is f2:   True
x is y:     True
Traceback (most recent call last):
  File "test.py", line 38, in <module>
    print("f1 == None: {}".format(f1 == None))
  File "Food.py", line 105, in __eq__
    result = (self.name.lower(), self.origin) == (
            target.name.lower(), target.origin)
AttributeError: 'NoneType' object has no attribute 'name'

The first thing to understand is that there is one and only one None object. Every reference to None is to the same None. Thus None is a very useful placeholder that we will use a lot in linked data structures. It is a way of clearly illustrating that whatever we are referring to has no value.

Code Listing 1 illustrates that:

The rule that arises out of this: always use the is operator when comparing against None. Even if equivalency works with simple items, consistency is almost always a better approach.

Summary

In this Lesson, we introduced linked data structures including nodes and how nodes contain data and are linked to other nodes to form a data structure. We discussed the linked Stack in particular, and provided code to demonstrate how a linked data structure can be initialized and updated. We examined how to move nodes between Stacks without copying data. We look at the role of None in linked data structures.

References

  1. Data Structures and Algorithm Analysis in C++, Mark Allen Weiss. 2006.