split
A file is a collection of data saved under a unique name. In general there are two types of files: binary and text. Text files can be read by humans and consist of characters. Binary files are just that: binary data, e.g. photographs, executable code, etc. Python works with either.
We will not be working with binary files in CP104.
We talk about "writing to a file" and "reading from a file".
The contents of a file can be accessed in order (sequential access) or a particular position in the file can be accessed (direct access). The access method you choose depends on how the data is structured within the file.
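The difference between the two access methods can be sketched as follows. This is a minimal illustration, not part of the course examples: the file name letters.txt and its contents are made up, and the file is created in a temporary folder. It uses the file handle methods readline (sequential access) and seek (direct access).

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "letters.txt")
fh = open(path, "w", encoding="utf-8")
fh.write("alpha\nbravo\ncharlie\n")
fh.close()

fh = open(path, "r", encoding="utf-8")
# Sequential access: each readline picks up where the previous one stopped.
first = fh.readline().strip()
second = fh.readline().strip()
# Direct access: jump straight back to the start of the file.
fh.seek(0)
again = fh.readline().strip()
fh.close()

print(first, second, again)
```

After the seek call, the next read starts from the beginning of the file again rather than continuing with the third line.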
The usual steps in a program when dealing with a file are to open the file, read from or write to it, and then close it.
In Python the syntax for opening a text file is
fh = open(filename, mode, encoding="utf-8")
fh (file handle): the variable that Python uses to represent the file while processing it - it is not the name of the file as stored on disk (that is the variable filename).
filename: the name of the file on disk. The only time the name of the file is used is when opening the file.
mode: determines how the file is opened - for reading, writing, or appending.
encoding="utf-8": determines how the text in the file is to be read. There are a number of encodings, but Python 3.0+ generally works with an encoding called 'utf-8'. This works with plain old ASCII text as well as a wide range of non-English characters. It is a good default encoding.
Code | Description |
---|---|
"r" | Read: Open the file for reading only - filename must already exist. The file handle points to the beginning of the file. |
"w" | Write: Open the file for writing only - if filename already exists it is overwritten, otherwise it is created. |
"a" | Append: Open the file for appending - if filename exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file. |
"r+" | Read and Write - filename must already exist. The file handle points to the beginning of the file. |
"w+" | Read and Write - if filename exists it is overwritten, otherwise it is created. |
"a+" | Read and Write - if filename exists data is appended to the end of it, otherwise it is created. The file handle points to the end of the file. |
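The difference between the "w" and "a" modes can be seen in a short sketch. The file name modes_demo.txt is hypothetical and the file is created in a temporary folder so that nothing on disk is disturbed.

```python
import os
import tempfile

# Hypothetical file name in a temporary folder, used only for this demonstration.
filename = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "w" creates the file (or overwrites it if it already exists).
fh = open(filename, "w", encoding="utf-8")
fh.write("first\n")
fh.close()

# "a" keeps the existing contents and adds to the end.
fh = open(filename, "a", encoding="utf-8")
fh.write("second\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
after_append = fh.read()
fh.close()

# Opening with "w" again discards what was there.
fh = open(filename, "w", encoding="utf-8")
fh.write("third\n")
fh.close()

fh = open(filename, "r", encoding="utf-8")
after_overwrite = fh.read()
fh.close()

print(after_append)
print(after_overwrite)
```

After the append the file holds both lines; after the second "w" open it holds only the last line written.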
An example:
fh = open("customers.txt", "r", encoding="utf-8")
opens the file "customers.txt" in the current folder for reading.
Errors are typically caused by, among other things, using the r and r+ modes to open a file that does not exist.
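As a hedged illustration of one such error: opening a missing file in "r" mode raises Python's built-in FileNotFoundError, which a program can catch with try and except. The file name below is hypothetical and is assumed not to exist.

```python
import os
import tempfile

# A path assumed not to exist (hypothetical name in a fresh temporary folder).
filename = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

try:
    fh = open(filename, "r", encoding="utf-8")
    fh.close()
    opened = True
except FileNotFoundError:
    # Raised by open when the file is missing in "r" or "r+" mode.
    opened = False
    print(f"File {filename} does not exist.")
```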
To close a file:
fh.close()
Every program you write should close any file it has opened. It is generally good practice for the function that opens a file to be the one that closes it, after the file is processed. Conversely, if a function accepts a file handle as a parameter it should not close that file, since it did not open it.
Failing to properly close a file may result in loss of data.
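Python also provides the with statement, which closes the file automatically when its block ends, even if an error occurs inside the block. The sketch below is an alternative to calling close() explicitly, not a replacement for the style used in these notes; the file name notes.txt is hypothetical.

```python
import os
import tempfile

# Hypothetical file name in a temporary folder.
filename = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(filename, "w", encoding="utf-8") as fh:
    fh.write("saved\n")

# The file was closed automatically when the with block ended.
closed_after_block = fh.closed
print(closed_after_block)
```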
In Python the syntax for writing to a file is:
fh.write(f".. {value1} ... {value2} ...")
i.e. just a typical formatted string, where the resulting string is written to the file handle. Note that you may have to explicitly append the newline (end-of-line) character '\n' if the lines being written to the file do not already have a newline.
So we could fill a file by using the following code:
def file_fill(destinations_file, destinations):
    # Write the trip locations to the file
    for destination in destinations:
        destinations_file.write(f"{destination}\n")
    return

# Call the function
filename = "travelplans.txt"
output_file = open(filename, "w", encoding="utf-8")
city_list = ["Paris", "Prague", "London"]
file_fill(output_file, city_list)
output_file.close()
Our file travelplans.txt contains:
Paris
Prague
London
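To see why the newline matters, here is a small sketch of what happens when it is left out. The file name no_newlines.txt is hypothetical and the file is written to a temporary folder.

```python
import os
import tempfile

# Hypothetical file name in a temporary folder.
filename = os.path.join(tempfile.mkdtemp(), "no_newlines.txt")

fh = open(filename, "w", encoding="utf-8")
for destination in ["Paris", "Prague", "London"]:
    # No "\n" is appended, so every name lands on the same line.
    fh.write(destination)
fh.close()

fh = open(filename, "r", encoding="utf-8")
contents = fh.read()
fh.close()

print(contents)  # ParisPragueLondon
```

The write method adds nothing of its own, so the three names run together on a single line.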
We can read data in much the same way once a file has been created:
def file_read(travel_file):
    data = []
    # Read the first line of the file.
    line = travel_file.readline()
    while line != "":
        data.append(line.strip())
        line = travel_file.readline()
    return data

filename = "travelplans.txt"
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()
When we use a while loop we need to know when the file is finished. Note that the readline method returns an empty string ("") when it attempts to read beyond the end of the file.
Note also in this example we have passed the file handle, not the file name, as the parameter to the function. The file has already been opened and we can just use it.
Note also the use of the string method strip (discussed in Strings) to strip off the leading and trailing whitespace, including the end-of-line character, of each string read from the file. Without this you may end up with extra blank lines in your results.
The same can be done with a for loop:
def file_read(travel_file):
    data = []
    for line in travel_file:
        data.append(line.strip())
    return data

filename = input("Enter the file name: ")
travel_file = open(filename, "r", encoding="utf-8")
travel_contents = file_read(travel_file)
travel_file.close()
The for loop iterates once for each line in the file, one after another from beginning to end.
The for loop looks much simpler, so why use a while loop versus a for loop? For similar reasons to using a while loop over a for loop in other situations: the for loop processes the entire file every time. The while loop allows us to add extra conditions to stop processing the file before reaching its end. For example, say we had a function that just needed to know whether a particular city was one of the destinations we were interested in - say, 'Prague'. In that case we would want to stop reading the file just as soon as we found the word 'Prague' in it. Why read the rest of the file at that point? Here is an example:
def has_city(travel_file, city_name):
    # City name not yet found.
    city_found = False
    # Get the first city in the file.
    city = travel_file.readline()
    while city != "" and city.strip() != city_name:
        # Get the next city in the file.
        city = travel_file.readline()
    # See why loop stopped - end of file or city names match?
    if city != "":
        # End of file not reached.
        city_found = True
    return city_found

filename = input("Enter the file name: ")
city_name = input("Enter the city name to look for: ")
travel_file = open(filename, "r", encoding="utf-8")
result = has_city(travel_file, city_name)
travel_file.close()
You may check to see if a file exists by using the Python function os.path.exists(filename). This function returns True if filename exists, and False otherwise. You must import os (or import path from os, as in the example) in order to use this function. For example:
from os import path

filename = input("Enter file name: ")
if path.exists(filename):
    fh = open(filename, "r", encoding="utf-8")
    # process file
    ...
    fh.close()
else:
    print(f"File {filename} does not exist.")
A marks file example:
def marks_process(marks_file):
    total = 0
    count = 0
    for line in marks_file:
        mark = int(line)
        total = total + mark
        count = count + 1
    average = total / count
    return average

marks_file = open("marksfile.txt", "r", encoding="utf-8")
marks_average = marks_process(marks_file)
marks_file.close()
In general, functions that work with files should work correctly with empty files. Which of the examples given on this page would work fine with empty files and which ones may have problems?
There are different ways of reading the contents of a file, depending on what you want to do with those contents.
Read from a file until the end of file is reached, or some other condition(s) is met:
line = fh.readline()
while line != "" and other condition:
    # process line
    line = fh.readline()
Read and process the first line of a file, then process the rest of the file until the end of file is reached, or some other condition(s) is met:
line = fh.readline()
# process first line
line = fh.readline()
while line != "" and other condition:
    # process line
    line = fh.readline()
Note that the second line is read before the loop starts so that the loop begins with it. Otherwise, the first line would be processed a second time by the loop body.
Read all of the lines in a file.
for line in fh:
    # process line
Read and process the first line of a file, and then read the rest of the file.
line = fh.readline()
# process first line
for line in fh:
    # process line
Read all of the lines of a file into a list of strings, one line per list item.
data = fh.readlines()
# process data
Read all the lines of a file into a single string.
big_string = fh.read()
# process big_string
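The last two patterns differ in what they hand back, which a short sketch makes visible. It assumes a small file holding the three cities used earlier on this page; the file name cities.txt is hypothetical and the file is written to a temporary folder.

```python
import os
import tempfile

# Hypothetical sample file with the three cities used earlier.
filename = os.path.join(tempfile.mkdtemp(), "cities.txt")
fh = open(filename, "w", encoding="utf-8")
fh.write("Paris\nPrague\nLondon\n")
fh.close()

# readlines: a list of strings, one per line, newlines included.
fh = open(filename, "r", encoding="utf-8")
data = fh.readlines()
fh.close()

# read: the whole file as one string.
fh = open(filename, "r", encoding="utf-8")
big_string = fh.read()
fh.close()

print(data)
print(repr(big_string))
```

Note that both keep the newline characters, so the lines usually still need strip before further processing.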
split
Data stored in a file as a string may not be in the format we want. We may want our data in separate strings, or in a list. Once a line has been read it can be broken up into appropriate chunks using either string slicing or the split method. split is a string method that breaks a string up into chunks by whitespace (default), or based upon some other string passed as a parameter, and returns the chunks in a list. The separator is discarded.
In order to avoid having end-of-line characters (\n, \r, or \r\n), strip() the string before splitting it. If a file contains the lines:
12345,Tom,Black,300.00,1998-01-30
23456,Alice,Smith,1200.50,1998-02-20
...
then the following:
line = fh.readline()
data = line.split(",")
print(data)
produces:
['12345', 'Tom', 'Black', '300.00', '1998-01-30\n']
(note the \n on the last list element) whereas:
line = fh.readline()
data = line.strip().split(",")
print(data)
produces:
['12345', 'Tom', 'Black', '300.00', '1998-01-30']
without the \n. If there is no leading or trailing whitespace (such as a newline character), strip() has no effect.
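Every item returned by split is still a string, so numeric fields need converting before they can be used in arithmetic. The sketch below does this for the first sample line. The variable names, and the meaning given to each field (an ID, two names, an amount, a date), are assumptions made for illustration; the notes do not say what the columns represent.

```python
# The first sample line, as readline would return it.
line = "12345,Tom,Black,300.00,1998-01-30\n"
data = line.strip().split(",")

# Field meanings are assumed for illustration only.
customer_id = int(data[0])
first_name = data[1]
last_name = data[2]
amount = float(data[3])
date = data[4]

print(customer_id, first_name, last_name, amount, date)
```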
Some examples:
Code | Result | Description |
---|---|---|
 | ['David', 'Brown'] | List from string separated by spaces |
 | ['first', 'second', 'third'] | List from string separated by commas |
 | ['I ate hamburgers ', ' hot dogs ', ' pizza ', ' ice cream.'] | List from string separated by "and" |
 | ['No commas in this sentence.'] | List from string without the requested separator |
 | ['1', '12', '94', '32', '10'] | List of digits |
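Calls along the following lines give the results listed in the table. The input strings and variable names here are inferred from the results and descriptions, so treat them as one plausible reconstruction rather than the original code; only the name digits is taken from the notes themselves.

```python
# Split on whitespace (the default).
name_parts = "David Brown".split()

# Split on commas.
ordinals = "first,second,third".split(",")

# Split on the string "and"; the surrounding spaces stay in the chunks.
foods = "I ate hamburgers and hot dogs and pizza and ice cream.".split("and")

# The separator does not appear, so the whole string comes back as one item.
no_commas = "No commas in this sentence.".split(",")

# A string of numbers separated by spaces.
digits = "1 12 94 32 10".split()

print(name_parts)
print(ordinals)
print(foods)
print(no_commas)
print(digits)
```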
Once the list is obtained its contents can be processed. In the last example, the list of digits could be turned into a list of integers with the following code:
numbers = []
for digit in digits:
    numbers.append(int(digit))