CP104 Notes: Files


File Access

A file is a collection of data saved under a unique name. In general there are two types of files: binary and text. Text files consist of characters and can be read by humans. Binary files store data in a form that is not meant to be read as text, e.g. photographs, executable code, etc. Python works with either.

We will not be working with binary files in CP104.

We talk about "writing to a file" and "reading from a file".

The contents of a file can be accessed in order (sequential access) or a particular position in the file can be accessed (direct access). The access method you choose depends on how the data is structured within the file.

The usual steps in a program when dealing with a file are:

  1. Open the file
  2. Process the file
  3. Close the file

Opening a File

In Python the syntax for opening a text file is

fh = open(filename, mode, encoding="utf-8")

File Modes

Code    Description
"r"     Read: Open the file for reading only - filename must already exist. The file handle points to the beginning of the file.
"w"     Write: Open the file for writing only - if filename already exists it is overwritten, otherwise it is created.
"a"     Append: Open the file for appending - if filename exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file.
"r+"    Read and Write - filename must already exist. The file handle points to the beginning of the file.
"w+"    Read and Write - if filename exists it is overwritten, otherwise it is created.
"a+"    Read and Write - if filename exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file.

An example:

fh = open("customers.txt", "r", encoding="utf-8")

opens the file "customers.txt" in the current folder for reading.
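The difference between the "w", "a", and "r" modes can be seen in a short sketch. The file name modes_demo.txt and the temporary folder are assumptions made only for this illustration, so that no existing file is overwritten.

```python
import os
import tempfile

# Assumption: a temporary folder keeps this sketch away from your own files.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "modes_demo.txt")

# "w" creates the file (or overwrites it if it already exists).
fh = open(filename, "w", encoding="utf-8")
fh.write("one\n")
fh.close()

# "a" keeps the existing contents and adds to the end of the file.
fh = open(filename, "a", encoding="utf-8")
fh.write("two\n")
fh.close()

# "r" reads starting from the beginning of the file.
fh = open(filename, "r", encoding="utf-8")
first = fh.readline()
second = fh.readline()
fh.close()

# Opening with "w" again discards everything that was in the file.
fh = open(filename, "w", encoding="utf-8")
fh.write("three\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
after_overwrite = fh.readline()
leftover = fh.readline()  # empty string: nothing else is left in the file
fh.close()

print(repr(first), repr(second), repr(after_overwrite), repr(leftover))
```

After the second "w" open, only the newly written line remains, which is why "w" should be used with care on files you want to keep.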

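Errors when opening a file are typically caused by a filename that does not exist (for the "r" and "r+" modes), a filename that refers to a folder rather than a file, or a lack of permission to read or write the file.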


Closing a File

To close a file:

fh.close()

Every file your program opens should be closed. As a general rule, the function that opens a file should be the one to close it, after the file has been processed. Conversely, a function that accepts a file handle as a parameter should not close that file, since it did not open it.

Failing to properly close a file may result in loss of data.
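The rule about which function closes a file can be sketched as follows. The function names line_count and file_line_count, the file name sample.txt, and the temporary folder are hypothetical and used only for illustration.

```python
import os
import tempfile


def line_count(fh):
    # Hypothetical helper: receives an already-open file handle.
    # It did not open the file, so it does not close it.
    count = 0
    for line in fh:
        count = count + 1
    return count


def file_line_count(filename):
    # Hypothetical helper: opens the file itself, so it also closes it.
    fh = open(filename, "r", encoding="utf-8")
    count = line_count(fh)
    fh.close()
    return count


# Assumption: a small sample file in a temporary folder.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "sample.txt")
fh = open(filename, "w", encoding="utf-8")
fh.write("a\nb\nc\n")
fh.close()

total = file_line_count(filename)

# The main program opened this handle, so the main program closes it.
fh = open(filename, "r", encoding="utf-8")
same_total = line_count(fh)
still_open = not fh.closed  # line_count left the handle open
fh.close()
print(total, same_total, still_open)
```

Both approaches count the same lines; the only difference is which piece of code is responsible for the call to close.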


Writing to a File

In Python the syntax for writing to a file is:

fh.write(f".. {value1} ... {value2} ...")

i.e. an ordinary formatted string (f-string) whose result is written to the file through the file handle. Note that write does not add a newline for you: you must explicitly append the newline (end-of-line) character '\n' if the strings being written do not already end with one.
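A small sketch of what happens when the newline is left out. The file name newline_demo.txt and the temporary folder are assumptions for illustration only.

```python
import os
import tempfile

# Assumption: a temporary folder keeps this sketch away from your own files.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "newline_demo.txt")

fh = open(filename, "w", encoding="utf-8")
fh.write("Paris")     # no newline: the next write continues on the same line
fh.write("Prague\n")  # the newline ends the line
fh.write("London\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
first_line = fh.readline()
second_line = fh.readline()
fh.close()

print(repr(first_line))
print(repr(second_line))
```

Because the first write has no '\n', "Paris" and "Prague" end up on the same line of the file.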

So we could fill a file by using the following code:

    
def file_fill(destinations_file, destinations):
    # Write the trip locations to the file
    for destination in destinations:
        destinations_file.write(f"{destination}\n")
    return

# Call the function
filename = "travelplans.txt"
output_file = open(filename, "w", encoding="utf-8")
city_list = ["Paris", "Prague", "London"]
file_fill(output_file, city_list)
output_file.close()

  

Our file travelplans.txt contains:

Paris
Prague
London
  

Reading from a File

We can read data in much the same way once a file has been created:

    
def file_read(travel_file):
    data = []
    # Read the first line of the file.
    line = travel_file.readline()

    while line != "":
        data.append(line.strip())
        line = travel_file.readline()
    return data

filename = "travelplans.txt"
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()

  

When we use a while loop we need to know when the file is finished. Note that the readline method returns an empty string ("") when it attempts to read beyond the end of the file.

Note also in this example we have passed the file handle, not the file name, as the parameter to the function. The file has already been opened and we can just use it.

Note also the use of the string method strip (discussed in Strings) to strip off the leading and trailing whitespace, including the end-of-line character, from each string read from the file. Without this you may end up with extra blank lines in your results.
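The effect of strip on a line read from a file can be seen with repr, which shows the hidden end-of-line character. The sample string is an assumption standing in for a line returned by readline; rstrip is a related string method not otherwise used in these notes, shown only for comparison.

```python
# A line as readline might return it: trailing spaces plus the end-of-line character.
line = "Prague  \n"

stripped = line.strip()         # removes whitespace from both ends
no_newline = line.rstrip("\n")  # removes only trailing newline characters

print(repr(line))        # shows the trailing spaces and the \n
print(repr(stripped))    # shows the city name only
print(repr(no_newline))  # trailing spaces are kept
```

Printing the unstripped line would show a blank line after it, since print adds its own newline after the one already in the string.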

The same can be done with a for loop:

    
def file_read(travel_file):
    data = []

    for line in travel_file:
        data.append(line.strip())
    return data

filename = input("Enter the file name: ")
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()

  

The for loop iterates once for each line in the file, one after another from beginning to end.

The for loop looks much simpler, so why use a while loop instead? For the same reasons we choose a while loop over a for loop in other situations: the for loop processes the entire file every time, whereas the while loop allows us to add extra conditions to stop processing the file before reaching its end. For example, say we had a function that just needed to know whether a particular city was one of the destinations we were interested in - say, 'Prague'. In that case we would want to stop reading the file as soon as we found the word 'Prague' in it. Why read the rest of the file at that point? Here is an example:

    
def has_city(travel_file, city_name):
    # City name not yet found.
    city_found = False
    # Get the first city in the file.
    city = travel_file.readline()

    while city != "" and city.strip() != city_name:
        # Get the next city in the file.
        city = travel_file.readline()

    # See why loop stopped - end of file or city names match?
    if city != "":
        # End of file not reached.
        city_found = True
    return city_found

filename = input("Enter the file name: ")
city_name = input("Enter the city name to look for: ")
travel_file = open(filename, "r", encoding="utf-8")
result = has_city(travel_file, city_name)
travel_file.close()

  

You may check to see if a file exists by using the Python function os.path.exists(filename). This function returns True if filename exists, and False otherwise. You must import the os module (or import path from os, as in the example) in order to use this function. For example:

    
from os import path

filename = input("Enter file name: ")

if path.exists(filename):
    fh = open(filename, "r", encoding="utf-8")
    # process file
    ...
    fh.close()
else:
    print(f"File {filename} does not exist.")

  

A marks file example:

    
def marks_process(marks_file):
    total = 0
    count = 0

    for line in marks_file:
        mark = int(line)
        total = total + mark
        count = count + 1
    average = total / count
    return average

marks_file = open("marksfile.txt", "r", encoding="utf-8")
marks_average = marks_process(marks_file)
marks_file.close()

  

In general, functions that work with files should work correctly with empty files. Which of the examples given on this page would work fine with empty files and which ones may have problems?


Different Types of Reading from a File

There are different ways of reading the contents of a file, depending on what you want to do with those contents.


Using split

Data stored in a file as a string may not be in the format we want. We may want our data in separate strings, or in a list. Once a line has been read it can be broken up into appropriate chunks using either string slicing or the split method. split is a string method that breaks a string up into chunks by whitespace (default), or based upon some other string passed as a parameter, and returns the chunks in a list. The separator is discarded.

In order to avoid having end of line characters (\n, \r, or \r\n), strip() the string before splitting it. If a file contains the lines:

12345,Tom,Black,300.00,1998-01-30
23456,Alice,Smith,1200.50,1998-02-20
...

then the following:


line = fh.readline()
data = line.split(",")
print(data)

produces:

['12345', 'Tom', 'Black', '300.00', '1998-01-30\n']

(note the \n on the last list element) whereas:


line = fh.readline()
data = line.strip().split(",")
print(data)

produces:

['12345', 'Tom', 'Black', '300.00', '1998-01-30']

without the \n. If the line has no leading or trailing whitespace (such as a trailing newline character), strip() returns the string unchanged.

Some examples:

Code:
    line = "David Brown"
    names = line.strip().split()
Result: ['David', 'Brown']
Description: List from string separated by spaces

Code:
    line = "first,second,third"
    number_names = line.strip().split(",")
Result: ['first', 'second', 'third']
Description: List from string separated by commas

Code:
    line = "I ate hamburgers and hot dogs and pizza and ice cream."
    items = line.strip().split("and")
Result: ['I ate hamburgers ', ' hot dogs ', ' pizza ', ' ice cream.']
Description: List from string separated by "and"

Code:
    line = "No commas in this sentence."
    strings = line.strip().split(",")
Result: ['No commas in this sentence.']
Description: List from string without the requested separator

Code:
    line = "1,12,94,32,10"
    digits = line.strip().split(",")
Result: ['1', '12', '94', '32', '10']
Description: List of digits

Once the list is obtained its contents can be processed. In the last example, the list of digits could be turned into a list of integers with the following code:

    
numbers = []

for digit in digits:
    numbers.append(int(digit))
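
Putting reading, strip, split, and type conversion together, a file of comma-separated records such as the customer lines shown earlier could be processed along these lines. This is only a sketch: the function name customers_read, the temporary folder, and the interpretation of the fields (an integer ID, two names, a decimal amount, and a date) are assumptions, since the notes do not say what each field means.

```python
import os
import tempfile


def customers_read(customers_file):
    # Hypothetical helper: returns one list per line of the file.
    customers = []

    for line in customers_file:
        fields = line.strip().split(",")
        # Assumed field types: the first field is an integer and the
        # fourth is a decimal number; the rest are left as strings.
        customer = [int(fields[0]), fields[1], fields[2],
                    float(fields[3]), fields[4]]
        customers.append(customer)
    return customers


# Assumption: build a small sample file in a temporary folder.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, "customers.txt")
fh = open(filename, "w", encoding="utf-8")
fh.write("12345,Tom,Black,300.00,1998-01-30\n")
fh.write("23456,Alice,Smith,1200.50,1998-02-20\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
customers = customers_read(fh)
fh.close()

for customer in customers:
    print(customer)
```

An empty file simply produces an empty list here, because the for loop body never runs.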