CP104 Notes: Files

File Access

A file is a collection of data saved under a unique name. In general there are two types of files: binary and text. Text files consist of characters and can be read by humans. Binary files store data in a form that is not meant to be read as text, e.g. photographs, executable code, etc. Python works with either.

We will not be working with binary files in CP104.

We talk about "writing to a file" and "reading from a file".

The contents of a file can be accessed in order (sequential access) or a particular position in the file can be accessed (direct access). The access method you choose depends on how the data is structured within the file.

The usual steps in a program when dealing with a file are:

Open the file
Process the file
Close the file
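
As a rough illustration of these three steps, here is a minimal sketch. The file name `steps_demo.txt` is a hypothetical choice, and the file is created first only so that the example can run on its own. The individual calls are explained in the sections that follow.

```python
# Set-up: create a small file so the sketch is self-contained.
# (The file name is a hypothetical choice for this example.)
filename = "steps_demo.txt"
fh = open(filename, "w", encoding="utf-8")
fh.write("first line\nsecond line\n")
fh.close()

# Step 1: open the file
fh = open(filename, "r", encoding="utf-8")

# Step 2: process the file (here, count its lines)
line_count = 0
for line in fh:
    line_count = line_count + 1

# Step 3: close the file
fh.close()

print(f"{filename} has {line_count} lines")
```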

Opening a File

In Python the syntax for opening a text file is

fh = open(filename, mode, encoding="utf-8")

fh (file handle): the variable that Python uses to represent the file while processing it - it is not the name of the file as stored on disk (that is the variable filename).
filename: the name of the file on disk. The only time the name of the file is used is when opening the file.
mode: determines how the file is opened - for reading, writing, or appending.
encoding="utf-8": determines how the text in the file is to be read. There are a number of encodings, but Python 3.0+ generally works with an encoding called 'utf-8'. This works with plain old ASCII text as well as a wide range of non-English characters. It is a good default encoding.

File Modes
Code	Description
`"r"`	Read: Open the file for reading only - `filename` must already exist. The file handle points to the beginning of the file.
`"w"`	Write: Open the file for writing only - if `filename` already exists it is overwritten, otherwise it is created
`"a"`	Append: Open the file for appending - if `filename` exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file.
`"r+"`	Read and Write - `filename` must already exist. The file handle points to the beginning of the file.
`"w+"`	Read and Write - if `filename` exists it is overwritten, otherwise it is created
`"a+"`	Read and Write - if `filename` exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file.

An example:

fh = open("customers.txt", "r", encoding="utf-8")

opens the file "customers.txt" in the current folder for reading.
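
To see how the "w", "a", and "r" modes differ, the following sketch writes to the same file several times and reads it back. The file name `modes_demo.txt` is hypothetical, and the `write` and `read` methods it uses are covered later on this page.

```python
filename = "modes_demo.txt"  # hypothetical file name

# "w" creates the file, or overwrites it if it already exists.
fh = open(filename, "w", encoding="utf-8")
fh.write("one\n")
fh.close()

# "a" keeps the existing contents and adds to the end.
fh = open(filename, "a", encoding="utf-8")
fh.write("two\n")
fh.close()

# "r" reads the file from the beginning.
fh = open(filename, "r", encoding="utf-8")
after_append = fh.read()
fh.close()

# Opening with "w" again discards the previous contents.
fh = open(filename, "w", encoding="utf-8")
fh.write("three\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
after_overwrite = fh.read()
fh.close()

print(repr(after_append))     # 'one\ntwo\n'
print(repr(after_overwrite))  # 'three\n'
```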

Errors are typically caused by:

attempting to open a file that does not exist in r and r+ modes
writing to a file open for reading only
reading from a file open for writing only
having the same file open more than once at a time
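
The first three of these errors can be reproduced with the sketch below. It relies on `try`/`except` and on the exception names `FileNotFoundError` and `io.UnsupportedOperation`, which these notes do not otherwise cover, so treat it as an illustration only. Both file names are hypothetical, and the first is assumed not to exist.

```python
import io

# Opening a missing file in "r" mode raises FileNotFoundError.
# (Assumes no file with this hypothetical name exists.)
try:
    fh = open("no_such_file_cp104.txt", "r", encoding="utf-8")
    fh.close()
    missing_error = False
except FileNotFoundError:
    missing_error = True
    print("Cannot open a missing file for reading.")

# Reading from a file open for writing only.
fh = open("errors_demo.txt", "w", encoding="utf-8")
try:
    fh.read()
    read_error = False
except io.UnsupportedOperation:
    read_error = True
    print("Cannot read from a file opened with mode 'w'.")
fh.close()

# Writing to a file open for reading only.
fh = open("errors_demo.txt", "r", encoding="utf-8")
try:
    fh.write("some text\n")
    write_error = False
except io.UnsupportedOperation:
    write_error = True
    print("Cannot write to a file opened with mode 'r'.")
fh.close()
```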

Closing a File

To close a file:

fh.close()

In every program you write, you should close any file that you have opened. As a rule, the function that opens a file should be the one to close it, once the file has been processed. Conversely, if a function accepts a file handle as a parameter, it should not close that file, since it did not open it.

Failing to properly close a file may result in loss of data.
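
The sketch below shows one way to divide that responsibility. The function names `count_lines` and `count_lines_in` and the file name `close_demo.txt` are hypothetical. The `closed` attribute of a file handle reports whether the file has been closed.

```python
def count_lines(fh):
    # Receives an open file handle: this function did not open
    # the file, so it does not close it.
    count = 0
    for line in fh:
        count = count + 1
    return count


def count_lines_in(filename):
    # Opens the file itself, so it is responsible for closing it.
    fh = open(filename, "r", encoding="utf-8")
    count = count_lines(fh)
    fh.close()
    return count


# Set-up so the sketch runs on its own.
demo = open("close_demo.txt", "w", encoding="utf-8")
demo.write("a\nb\nc\n")
demo.close()

total = count_lines_in("close_demo.txt")
print(total)        # 3
print(demo.closed)  # True
```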

Writing to a File

In Python the syntax for writing to a file is:

fh.write(f".. {value1} ... {value2} ...")

i.e. an ordinary formatted string (f-string): the resulting string is written to the file through the file handle. Note that write does not add a newline for you: you must explicitly append the newline (end-of-line) character '\n' if the strings being written do not already end with one.
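
The effect of leaving out the newline can be seen in this small sketch. The file name `newline_demo.txt` and the word list are hypothetical, and the `read` method used to check the result is described later on this page.

```python
words = ["red", "green", "blue"]

# Without "\n" every value lands on the same line.
fh = open("newline_demo.txt", "w", encoding="utf-8")
for word in words:
    fh.write(f"{word}")
fh.close()

fh = open("newline_demo.txt", "r", encoding="utf-8")
no_newlines = fh.read()
fh.close()

# With "\n" each value gets its own line.
fh = open("newline_demo.txt", "w", encoding="utf-8")
for word in words:
    fh.write(f"{word}\n")
fh.close()

fh = open("newline_demo.txt", "r", encoding="utf-8")
with_newlines = fh.read()
fh.close()

print(repr(no_newlines))    # 'redgreenblue'
print(repr(with_newlines))  # 'red\ngreen\nblue\n'
```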

So we could fill a file by using the following code:

    
def file_fill(destinations_file, destinations):
    # Write the trip locations to the file
    for destination in destinations:
        destinations_file.write(f"{destination}\n")
    return

# Call the function
filename = "travelplans.txt"
output_file = open(filename, "w", encoding="utf-8")
city_list = ["Paris", "Prague", "London"]
file_fill(output_file, city_list)
output_file.close()

Our file travelplans.txt contains:

Paris
Prague
London

Reading from a File

We can read data in much the same way once a file has been created:

    
def file_read(travel_file):
    data = []
    # Read the first line of the file.
    line = travel_file.readline()

    while line != "":
        data.append(line.strip())
        line = travel_file.readline()
    return data

filename = "travelplans.txt"
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()

When we use a while loop we need to know when the file is finished. Note that the readline method returns an empty string ("") when it attempts to read beyond the end of the file.

Note also in this example we have passed the file handle, not the file name, as the parameter to the function. The file has already been opened and we can just use it.

Note also the use of the string method strip (discussed in Strings), which removes leading and trailing whitespace, including the end-of-line character, from each string read from the file. Without this you may end up with extra blank lines in your results.

The same can be done with a for loop:

    
def file_read(travel_file):
    data = []

    for line in travel_file:
        data.append(line.strip())
    return data

filename = input("Enter the file name: ")
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()

The for loop iterates once for each line in the file, one after another from beginning to end.

The for loop looks much simpler, so why use a while loop versus a for loop? For similar reasons to using a while loop over a for loop in other situations: the for loop processes the entire file every time. The while loop allows us to add extra conditions to stop processing the file before reaching its end. For example, say we had a function that just needed to know whether a particular city was one of the destinations we were interested in - say, 'Prague'. In that case we would want to stop reading the file just as soon as we found the word 'Prague' in it. Why read the rest of the file at that point? Here is an example:

    
def has_city(travel_file, city_name):
    # City name not yet found.
    city_found = False
    # Get the first city in the file.
    city = travel_file.readline()

    while city != "" and city.strip() != city_name:
        # Get the next city in the file.
        city = travel_file.readline()

    # See why loop stopped - end of file or city names match?
    if city != "":
        # End of file not reached.
        city_found = True
    return city_found

filename = input("Enter the file name: ")
city_name = input("Enter the city name to look for: ")
travel_file = open(filename, "r", encoding="utf-8")
result = has_city(travel_file, city_name)
travel_file.close()

You may check to see if a file exists by using the Python function os.path.exists(filename). This function returns True if filename exists, and False otherwise. You must import os, or import path from os as is done here, in order to use this function. For example:

    
from os import path

filename = input("Enter file name: ")

if path.exists(filename):
    fh = open(filename, "r", encoding="utf-8")
    # process file
    ...
    fh.close()
else:
    print(f"File {filename} does not exist.")

A marks file example:

    
def marks_process(marks_file):
    total = 0
    count = 0

    for line in marks_file:
        mark = int(line)
        total = total + mark
        count = count + 1
    average = total / count
    return average

marks_file = open("marksfile.txt", "r", encoding="utf-8")
marks_average = marks_process(marks_file)
marks_file.close()

In general, functions that work with files should work correctly with empty files. Which of the examples given on this page would work fine with empty files, and which ones may have problems?

Different Types of Reading from a File

There are different ways of reading the contents of a file, depending on what you want to do with those contents.

Read from a file until the end of file is reached, or some other condition(s) is reached:

        
line = fh.readline()

while line != "" and other condition:
    # process line
    line = fh.readline()

Read and process the first line of the file, then process the rest of the file until the end of file is reached, or some other condition(s) is reached:
```
line = fh.readline()
# process first line
line = fh.readline()

while line != "" and other condition:
    # process line
    line = fh.readline()
```
Note that the second line is read before the loop starts so that the loop begins with it. Without that second readline, the loop would process the first line a second time.

Read all of the lines in a file.

        
for line in fh:
    # process line

Read and process the first line of a file, and then read the rest of the file.

        
line = fh.readline()
# process first line

for line in fh:
    # process line
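
A concrete version of this pattern might look like the sketch below, which assumes a hypothetical file `header_demo.txt` whose first line is a title and whose remaining lines each hold one integer mark.

```python
# Set-up: a title line followed by marks (hypothetical data).
fh = open("header_demo.txt", "w", encoding="utf-8")
fh.write("Quiz 1 Marks\n72\n85\n90\n")
fh.close()

fh = open("header_demo.txt", "r", encoding="utf-8")
# Read and process the first line on its own.
title = fh.readline().strip()

# The for loop continues from the second line.
marks = []
for line in fh:
    marks.append(int(line))
fh.close()

print(title)  # Quiz 1 Marks
print(marks)  # [72, 85, 90]
```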

Read all of the lines of a file into a list of strings, one line per list item.
```
data = fh.readlines()
# process data
```

Read all the lines of a file into a single string.

        
big_string = fh.read()
# process big_string
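
To compare the last two approaches, this sketch reads the same file once with `readlines` and once with `read`. The file name `read_demo.txt` is hypothetical. Note that neither method removes the newline characters.

```python
# Set-up: a three-line file.
fh = open("read_demo.txt", "w", encoding="utf-8")
fh.write("Paris\nPrague\nLondon\n")
fh.close()

# readlines: a list of strings, one per line, newlines included.
fh = open("read_demo.txt", "r", encoding="utf-8")
lines = fh.readlines()
fh.close()

# read: the whole file as one string.
fh = open("read_demo.txt", "r", encoding="utf-8")
big_string = fh.read()
fh.close()

print(lines)             # ['Paris\n', 'Prague\n', 'London\n']
print(repr(big_string))  # 'Paris\nPrague\nLondon\n'
```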

Using `split`

Data stored in a file as a string may not be in the format we want. We may want our data in separate strings, or in a list. Once a line has been read it can be broken up into appropriate chunks using either string slicing or the split method. split is a string method that breaks a string up into chunks by whitespace (default), or based upon some other string passed as a parameter, and returns the chunks in a list. The separator is discarded.

In order to avoid having end of line characters (\n, \r, or \r\n) in the data, strip() the string before splitting it. If a file contains the lines:

12345,Tom,Black,300.00,1998-01-30
23456,Alice,Smith,1200.50,1998-02-20
...

then the following:


line = fh.readline()
data = line.split(",")
print(data)

produces:

['12345', 'Tom', 'Black', '300.00', '1998-01-30\n']

(note the \n on the last list element) whereas:


line = fh.readline()
data = line.strip().split(",")
print(data)

produces:

['12345', 'Tom', 'Black', '300.00', '1998-01-30']

without the \n. If the line has no leading or trailing whitespace (including a trailing newline character), strip() has no effect.

Some examples:

Code	Result	Description
`line = "David Brown"; names = line.strip().split()`	['David', 'Brown']	List from string separated by spaces
`line = "first,second,third"; number_names = line.strip().split(",")`	['first', 'second', 'third']	List from string separated by commas
`line = "I ate hamburgers and hot dogs and pizza and ice cream."; items = line.strip().split("and")`	['I ate hamburgers ', ' hot dogs ', ' pizza ', ' ice cream.']	List from string separated by "and"
`line = "No commas in this sentence."; strings = line.strip().split(",")`	['No commas in this sentence.']	List from string without the requested separator
`line = "1,12,94,32,10"; digits = line.strip().split(",")`	['1', '12', '94', '32', '10']	List of digits

Once the list is obtained its contents can be processed. In the last example, the list of digits could be turned into a list of integers with the following code:

    
numbers = []

for digit in digits:
    numbers.append(int(digit))
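
Putting reading, strip, split, and type conversion together, the following sketch processes the comma-separated records shown earlier in this section. The function name `customer_read`, the file name `customers_demo.txt`, and the reading of the first field as an integer ID and the fourth as a balance are assumptions made for illustration.

```python
def customer_read(fh):
    # Returns a list of records, one list per line of the file.
    customers = []

    for line in fh:
        fields = line.strip().split(",")
        # Assumed layout: ID, first name, last name, balance, date.
        fields[0] = int(fields[0])
        fields[3] = float(fields[3])
        customers.append(fields)
    return customers


# Set-up: write the two sample records to a hypothetical file.
fh = open("customers_demo.txt", "w", encoding="utf-8")
fh.write("12345,Tom,Black,300.00,1998-01-30\n")
fh.write("23456,Alice,Smith,1200.50,1998-02-20\n")
fh.close()

fh = open("customers_demo.txt", "r", encoding="utf-8")
customers = customer_read(fh)
fh.close()

print(customers[0])  # [12345, 'Tom', 'Black', 300.0, '1998-01-30']
```

Because the loop body never runs when the file has no lines, this function returns an empty list for an empty file.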