By the end of this lesson, you should be able to:
Files provide the functionality to store data on a disk. Computers have two kinds of storage for saving data. One is Random Access Memory (RAM) and the other is a disk, which is also called a hard drive.
RAM has the following characteristics:
A hard drive or a disk has the following features:
Installing software like Eclipse stores it on the disk on your computer. When you open Eclipse for writing programs, it is loaded from the disk to the RAM (working memory), and it is executed from there. It remains in the RAM as long as you are using it, and once you close Eclipse, it can be removed from the RAM to make space for other software. It takes longer to start Eclipse, as it is on the disk initially (slower) and it is being loaded in the RAM, but it works interactively when it is open as it is present in the RAM (faster).
When you write a program, it is being stored in the RAM as you code. Once you save the program by specifying a file name, it is written on the disk. That is why you can access your programs even after you have shut down the computer and restarted it.
In this lesson we will learn how to work with files. Files store data, and directories maintain a hierarchical scheme for saving files. This hierarchical structure helps users organize their files.
Let's look at a real-life scenario that we can use to understand filing and its key concepts.
Suppose you went on a trip with your family over the summer to see the Great Pyramids in Egypt. On your return, your parents ask you and your younger sister Maria to co-produce a writeup on your journey, so that it can be shared with your grandparents. Your younger sister decides to write the first draft, which you can proofread and modify later.
Below is the sequence of events that took place.
No. | Events | Technical Term to note |
---|---|---|
1. | Maria opens a notebook to write about the pyramids. | Open the file for writing |
2. | Maria writes into the notebook. | Write into the file |
3. | Maria closes her notebook. | Close the file which was opened for writing |
4. | You go into her room and reach into the drawer to access the notebook. | Path to the file |
5. | You open her notebook. | Open the file for reading |
6. | You read her writeup. | Read the file |
7. | You close her notebook to reflect on her writing. | Close the file which was opened for reading |
8. | She is young and didn't do a good job, as she didn't like the pyramids and was not interested in them. You open the notebook, erase what she wrote, and rewrite on the same page. | Open the file for writing. Note: Technically you just opened the file and everything is erased automatically. |
9. | You write into the notebook. | Write into the file |
10. | It's dinner time, and you also need a break. You close the notebook. | Close the file which was opened for writing. |
11. | You return to the notebook, open it, and start writing to complete your task. | Open the file to append. Note: Nothing is deleted, and your earlier work is still available. |
12. | You close the notebook. | Close the file which was opened for appending |
There are some important technical points to note from the last column.
- You must open a file before reading or writing into it. The mode for opening (read, write etc.) is mentioned while opening the file.
- The existing contents are erased when you open a file for writing.
- The existing contents are kept when you open a file for appending (mode: append).
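The notebook scenario maps onto Python's built-in open function roughly as follows. This is only a sketch: the file name notebook.txt, the temporary directory, and the sample sentences are illustrative assumptions and are not part of the lesson's files.

```python
import os
import tempfile

# Hypothetical notebook file, created in a temporary directory.
notebook = os.path.join(tempfile.mkdtemp(), "notebook.txt")

# Events 1-3: Maria opens the notebook, writes, and closes it.
f = open(notebook, 'w')
f.write("The pyramids were boring.\n")
f.close()

# Events 5-7: you open the notebook, read it, and close it.
f = open(notebook, 'r')
first_draft = f.read()
f.close()

# Events 8-10: opening for writing erases Maria's draft.
f = open(notebook, 'w')
f.write("The Great Pyramids were amazing.\n")
f.close()

# Events 11-12: opening for appending keeps your earlier work.
f = open(notebook, 'a')
f.write("We also rode camels.\n")
f.close()

# Read the finished writeup back.
f = open(notebook, 'r')
final_text = f.read()
f.close()
print(final_text)
```

Maria's sentence no longer appears in final_text, while both of your sentences do.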
When working with files, programmers first need to open the file by specifying its path, its name, and the mode of opening. The following function is used in Python:
file_object = open('C:\\trip\\pyramids.txt', 'r')
file_object is the name of the variable that references the file object. It can be used to perform filing operations. open is the built-in function that is used for opening a file. Two arguments are passed to the function. The first argument specifies the path (C:\\trip\\) and the file name (pyramids.txt) of the file to open, whereas the second one specifies the mode. 'r' means the file will be opened for reading only.
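The doubled backslashes in the path can be confusing, so here is a small sketch of what they mean, followed by a portable way of building a path. The trip folder is created in a temporary directory purely so that the sketch runs on any machine, and the sample text written to the file is an assumption.

```python
import os
import tempfile

# In a normal string literal, '\\' stands for one backslash character.
windows_path = 'C:\\trip\\pyramids.txt'
raw_path = r'C:\trip\pyramids.txt'    # raw string: backslashes are kept as typed
print(windows_path)                    # C:\trip\pyramids.txt
print(windows_path == raw_path)        # True

# Portable alternative: build the path from its parts.
folder = os.path.join(tempfile.mkdtemp(), "trip")
os.makedirs(folder)
path = os.path.join(folder, "pyramids.txt")

# Create the file first, because mode 'r' does not create missing files.
file_object = open(path, 'w')
file_object.write("Notes about the pyramids\n")
file_object.close()

# Now the file can be opened for reading only.
file_object = open(path, 'r')
text = file_object.read()
file_object.close()
print(text, end='')
```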
There are modes to perform reading and writing simultaneously, but they are outside the scope of this lesson. The rest of the modes are listed below:
No. | Mode | Description |
---|---|---|
1. | 'r' | Opens a text file for reading only. It does not allow writing into the file and does not create the file if it does not already exist. Text file: The data in such files is encoded using ASCII or a Unicode scheme. It is in a human-readable format and can be opened in programs like TextPad or TextEdit. |
2. | 'rb' | Opens a binary file for reading only. It does not allow writing into the file and does not create the file if it does not already exist. Binary file: The data in a binary file is not in a human-readable format. Images, videos and sound files are stored as binary files. |
3. | 'w' | Opens a text file for writing only. It does not allow reading of the file. If the file does not exist, it will be created, and if the file already exists, its contents will be erased. Do not use this mode if you want to keep the current contents of the file. |
4. | 'wb' | Opens a binary file for writing only. All the properties of 'w' are applicable here as well. |
5. | 'a' | Opens a text file for appending only. It does not allow reading of the file. If the file does not exist, it will be created, and if the file exists, its contents will not be erased. Use this mode if you want to keep the current contents of the file. |
6. | 'ab' | Opens a binary file for appending only. All the properties of 'a' are applicable here as well. |
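Two rows of the table can be checked with a short experiment: mode 'r' refuses to create a missing file, and the text and binary modes hand back different types of data. This is a minimal sketch; the file name modes_demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")  # hypothetical file

# Mode 'r' does not create a missing file; Python raises FileNotFoundError.
try:
    open(path, 'r')
    missing_file_error = False
except FileNotFoundError:
    missing_file_error = True

# Mode 'w' creates the file.
f = open(path, 'w')
f.write("abc\n")
f.close()

# Text mode returns a str, binary mode returns bytes.
f = open(path, 'r')
as_text = f.read()
f.close()

f = open(path, 'rb')
as_bytes = f.read()
f.close()

print(missing_file_error)
print(type(as_text).__name__, repr(as_text))
print(type(as_bytes).__name__, repr(as_bytes))
```

The exact bytes depend on the operating system, because text mode may store the line ending differently on disk.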
As explained earlier, any file that is opened, whether it is for reading, writing or appending, must be closed. The following syntax can be used to close an open file.
file_object.close()
Technical Note:
The end of a line in a file is marked with \n, just as it is in strings.
In this section, we will learn about reading and processing data from a text file. The following functions are commonly used for reading data.
Code | Description |
---|---|
contents = fl.read() | fl is an object that refers to a file that is open in read mode. read is a method of the file object that reads all the contents of the file and returns them as a single string. The string is assigned to a variable named contents in this case. |
line = fl.readline() | readline is a method that reads a single line from the file and returns it. Each line in a file is terminated with \n, except possibly the last one. |
lines = fl.readlines() | readlines reads each line of the file as a separate string, stores those strings in a list, and returns the list. |
Code Listing 1 shows a simple program that reads the same file thrice using three different methods and displays the content on the console.
# Program Name: Prog 10-01
# This program is part of Lesson 10 of the course

def read_all(file_obj):
    contents = file_obj.read()
    print(contents)
    print("*** Printing Ends ***")

def read_line(file_obj):
    line = file_obj.readline()
    print(line)

    line = file_obj.readline()
    print(line)

    line = file_obj.readline()
    print(line)
    print("*** Printing Ends ***")

def read_list(file_obj):
    a_list = file_obj.readlines()
    print(a_list)
    print("*** Printing Ends ***")

def main():
    file_object = open("code_listing_1.txt", 'r')
    read_all(file_object)

    file_object.seek(0)
    read_line(file_object)

    file_object.seek(0)
    read_list(file_object)

    file_object.close()
main()
The text file used in this program is named code_listing_1.txt, and it is saved in the same directory as the Python program. The file is given below:
First line of text
Second line of text
Third line of text
It is important to note the following points about code_listing_1.txt:
- \n is present after line 1 and line 2.
- \n is not displayed, but it is stored as part of the file. It is used for line breaks when displaying the file.
- There is no \n after the third line.
Internally the file looks like:
First line of text\n
Second line of text\n
Third line of text
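The internal layout can be made visible with repr, which shows \n as text instead of breaking the line. The sketch below recreates the file in a temporary directory (an assumption, so that it runs anywhere) and inspects the string returned by read.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "code_listing_1.txt")

# Recreate the lesson's file: no \n after the third line.
f = open(path, 'w')
f.write("First line of text\nSecond line of text\nThird line of text")
f.close()

f = open(path, 'r')
contents = f.read()
f.close()

print(repr(contents))            # repr shows \n instead of breaking the line
print(contents.count('\n'))      # 2
print(contents.endswith('\n'))   # False
```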
Prog 10-01 has four functions. The main function opens and closes the file, whereas the rest of the three functions print the contents of the file using three different methods.
The output of the program is given in Console 1. Let's go through the code line by line.
Line | Description |
---|---|
Lines 4, 9, 20, 25, 36 | The four function headers are read by the Python Interpreter, and main is called for execution. read_all reads and prints the entire contents of the file. read_line reads and prints the three lines of the file individually. read_list reads the entire data of the file as a list of strings and displays the list on the console. |
Line 26 | The file code_listing_1.txt is opened in read only mode, and the file object is assigned to file_object. No path is given with the file name, because Prog 10-01 and code_listing_1.txt are in the same directory and the program is run from there. We need to specify a path only when the program and the file are in different directories. |
Lines 27, 4 | The read_all function is called and the file object is passed to it. The function starts execution. |
Line 5 | The read method is called on file_obj. It reads the entire data stored in code_listing_1.txt and returns it as a string, which is assigned to the variable named contents. The \n characters at the ends of the lines are also copied and are part of the string. At this point contents holds: First line of text\nSecond line of text\nThird line of text |
Lines 6, 7 | The string assigned to contents is printed. We expect a newline after the first and the second line of text due to the \n present in the string. The third line of text does not have \n at the end, but we still expect a newline after it due to the print function. You might remember from lesson 2 that the print function automatically starts a new line at the end. *** Printing Ends *** is displayed below the contents. See the first four lines of Console 1. |
Line 29 | As a child you might have read a page by placing your index finger under the text. Python reads a file similarly. It starts reading from position 0 (the start of the file) and moves forward until the end of the file. When a file has been read completely (as in line 5), the position is at the end of the file, and any further read returns an empty string until the position is moved. The seek function is called in line 29 and passed 0 (the starting position of the file) to reset the position. |
Lines 30, 9 | The read_line function is called and it starts execution. |
Line 10 | The readline function is called to read one line of text from the file. The line is read including the \n at its end. |
Line 11 | The first line is printed on the console. We can expect two newlines at the end of this print call: the first due to the \n at the end of the string, and the second due to the print function. This results in an empty line after the first line of text. See the 5th and the 6th line of Console 1. |
Lines 13, 14 | The second line of text is read and printed, with an empty line following it. |
Line 16 | The last line of text is read. This line does not have \n at the end. |
Lines 17, 18 | The last line is printed, but only one newline is produced, by the print function. We should not expect an empty line before *** Printing Ends ***. See the 9th and the 10th line of Console 1. |
Lines 32, 33, 20 | The file position is reset to the start, and the read_list function starts execution. |
Line 21 | All the lines of text in the file are read, and each line (string) is stored as an element of the list. The strings are read along with their \n. |
Lines 22, 23 | The list is printed with *** Printing Ends *** below it. See the last two lines of Console 1. |
Line 35 | The file is closed. |
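The role of the file position can be checked directly. In this sketch a second read returns an empty string because the position is already at the end of the file, and seek(0) makes the data readable again. The temporary file is an assumption standing in for code_listing_1.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "code_listing_1.txt")
f = open(path, 'w')
f.write("First line of text\nSecond line of text\nThird line of text")
f.close()

f = open(path, 'r')
first_read = f.read()       # reads everything; the position is now at the end
second_read = f.read()      # nothing is left to read, so this is ''
f.seek(0)                   # move the position back to the start
position_after_seek = f.tell()
third_read = f.read()       # the whole file is available again
f.close()

print(repr(second_read))
print(position_after_seek)
print(third_read == first_read)
```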
First line of text
Second line of text
Third line of text
*** Printing Ends ***
First line of text

Second line of text

Third line of text
*** Printing Ends ***
['First line of text\n', 'Second line of text\n', 'Third line of text']
*** Printing Ends ***
We have noticed that there are unwanted empty lines when we print the lines read with the readline method. This can be solved by removing the \n at the end of the string. The following code can help us achieve that.
line = file_obj.readline()
line = line.rstrip("\n")
print(line)
We can also override the default behaviour of print so that it does not start a new line at the end of printing:
print(line, end='')
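The two fixes are not quite identical, and a small comparison shows the difference. In this sketch io.StringIO objects stand in for the open file and for the console, which is an assumption made so that the printed text can be inspected; the function names are hypothetical.

```python
import io

TEXT = "First line of text\nSecond line of text\nThird line of text"

def show_with_rstrip(file_obj, out):
    for _ in range(3):
        line = file_obj.readline()
        print(line.rstrip("\n"), file=out)   # print adds exactly one newline

def show_with_end(file_obj, out):
    for _ in range(3):
        line = file_obj.readline()
        print(line, end='', file=out)        # print adds nothing

out1 = io.StringIO()
show_with_rstrip(io.StringIO(TEXT), out1)

out2 = io.StringIO()
show_with_end(io.StringIO(TEXT), out2)

print(repr(out1.getvalue()))
print(repr(out2.getvalue()))
```

With rstrip every line, including the last, ends with a newline. With end='' the last line ends without one, because the file itself has no \n after the third line.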
The data in a file can be read conveniently using for and while loops. The data is often present as multiple lines of text. The read_line function in Prog 10-01 reads one line of data from the file at a time, so the readline and readlines functions are well suited to extracting data in a loop.
Prog 10-02 in Code Listing 2 reads the data from code_listing_1.txt line by line. The first word of each line is extracted through string processing, capitalized, and displayed.
# Program Name: Prog 10-02
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_1.txt", 'r')

    for line in file_object:
        ind = line.index(' ')     # position of the first space
        str = line[:ind]          # first word (this name shadows the built-in str)
        str = str.capitalize()
        print(str)

    file_object.close()

main()
The text file is opened in read only mode. A for loop is started with a target variable named line, and the file object called file_object is used in the loop header. The for loop extracts one line of text from the text file in each iteration. The line of text is assigned to the variable line as a string.
Inside the loop body, the index position of the first occurrence of a space is found using the index function and is assigned to the variable ind. The value of ind will be 5 for the first and the third iteration and 6 for the second iteration of the loop. ind is used to slice the string, and the resulting substring is assigned to str. str is then capitalized and displayed on the console using the print function.
The output of Prog 10-02 is shown in Console 2 below.
First
Second
Third
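One possible variation, offered only as a sketch, uses split instead of index. The index function raises a ValueError when a line contains no space, whereas split copes with one-word and blank lines. The function name first_words is hypothetical, and io.StringIO stands in for an open text file.

```python
import io

def first_words(file_obj):
    """Return the capitalized first word of every non-empty line."""
    words = []
    for line in file_obj:
        parts = line.split()      # split on whitespace; the trailing \n is dropped
        if parts:                 # skip blank lines
            words.append(parts[0].capitalize())
    return words

sample = io.StringIO("first line of text\nsecond line of text\n\nthird\n")
print(first_words(sample))        # ['First', 'Second', 'Third']
```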
While loops are more general purpose loops, and therefore programmers need to supply the termination condition that ends the loop.
We will take the same file that we used in the last two programs and read it twice using the while loop. The data is extracted using the readline method first, and then using the readlines method. Prog 10-03 in Code Listing 3 shows the code.
# Program Name: Prog 10-03
# This program is part of Lesson 10 of the course

def read_data1(file_obj):
    print("***Read Data 1***")
    line = file_obj.readline()
    while line != '':
        line = line.rstrip("\n")
        print(line.capitalize(), end=".\n")
        line = file_obj.readline()

def read_data2(file_obj):
    print("***Read Data 2***")
    lines = file_obj.readlines()
    ind = 0
    while ind < len(lines):
        line = lines[ind]
        line = line.rstrip("\n")
        print(line.capitalize(), end=".\n")
        ind += 1

def main():
    file_object = open("code_listing_1.txt", 'r')
    read_data1(file_object)

    file_object.seek(0)
    read_data2(file_object)

    file_object.close()

main()
The main function opens the file (code_listing_1.txt) in read only mode, calls the read_data1 function, and passes it the file object. Once read_data1 has finished executing, the file position is at the end of the file, so the position is reset to zero before calling the read_data2 function. The file is closed before the program ends.
The line by line description of read_data1 is given below:
Line | Description |
---|---|
Line 4 | The read_data1 function starts execution with the file object of an open file passed to it as a parameter. |
Line 5 | ***Read Data 1*** is printed on the console. |
Line 6 | The first line of the file is read as a string and assigned to the variable line. The string assigned to line will be: First line of text\n The readline function returns an empty string when there is nothing left to read, which is also the case if the file is empty. |
Line 7 | The while loop starts execution, and it will iterate as long as the readline function does not return an empty string. |
Line 8 | The \n at the end of the line is removed. |
Line 9 | The text is capitalized, a period is added to the end of the string, and \n is also appended to the output to start a new line. |
Line 10 | The next line is read from the file, and the while loop starts the next iteration. |
The line by line description of read_data2 is given below:
Line | Description |
---|---|
Line 12 | The function read_data2 starts execution with a file object passed as a parameter. |
Line 13 | ***Read Data 2*** is displayed on the console. |
Line 14 | The entire data in the file is read as a list of strings. The data assigned to the variable lines will be: ['First line of text\n', 'Second line of text\n', 'Third line of text'] |
Line 15 | ind is a loop variable, and it is initialized to zero. |
Lines 16, 20 | A while loop starts iterating, and it will iterate as long as the value of ind is less than the length of lines. The value of ind is incremented by one in each iteration of the loop. |
Line 17 | One string from the list is extracted in each iteration. |
Line 18 | The \n at the end of the string is removed. |
Line 19 | The string is capitalized, and .\n is appended to the end before printing on the console. |
The output of Prog 10-03 is shown in Console 3.
***Read Data 1***
First line of text.
Second line of text.
Third line of text.
***Read Data 2***
First line of text.
Second line of text.
Third line of text.
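The termination condition line != '' is safe even when a file contains blank lines, because a blank line is read as '\n' and only the end of the file produces ''. The sketch below illustrates this with a hypothetical count_lines function and an in-memory io.StringIO object standing in for an open file.

```python
import io

def count_lines(file_obj):
    """Return (total lines, blank lines) using a readline-driven while loop."""
    count = 0
    blank = 0
    line = file_obj.readline()
    while line != '':             # '' is returned only at the end of the file
        count += 1
        if line == '\n':          # a blank line still contains its newline
            blank += 1
        line = file_obj.readline()
    return count, blank

sample = io.StringIO("First\n\nThird\n")
print(count_lines(sample))        # (3, 1)
```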
Files with comma-separated values (CSV) are commonly used when dealing with numeric data. Data elements in a CSV file are separated from each other by commas. A CSV file named code_listing_4.txt is shown below:
Quizzes,10,5,5,8,9
Assignments,10,10,9,7
Activities,5,5,10,10,10,9,9,8,7
We want to write a program to calculate the average of the data in each line of the file. Prog 10-04 shown in Code Listing 4 extracts each line of data, and then calculates the average of all the values except the first one.
# Program Name: Prog 10-04
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_4.txt", 'r')
    for line in file_object:
        line = line.rstrip('\n')
        a_list = line.split(',')

        print(f"Average of {a_list[0]}", end=':')
        ind = 1
        total = 0

        while ind < len(a_list):
            total += int(a_list[ind])
            ind += 1

        ind -= 1
        average = total / ind
        print(f"{average:.2f}")
    file_object.close()

main()
Let's go through the code line by line:
Line | Description |
---|---|
Lines 5, 21 | The file is opened in read only mode, and the file object is stored in file_object. The code for closing the file is in line 21. |
Line 6 | A for loop is started, which extracts the data in the file line by line as strings. The extracted line of text is assigned to the target variable named line. The value of line for the first iteration of the for loop will be: Quizzes,10,5,5,8,9\n |
Line 7 | \n is removed from the end of the string, and line becomes: Quizzes,10,5,5,8,9 |
Line 8 | The split method is called to divide the string into multiple strings based on the positions of the commas, and the resulting substrings are returned as a list. The value of a_list will be: ['Quizzes', '10', '5', '5', '8', '9'] As expected, all the values are strings, and we need to typecast them to integers before applying mathematical operations. |
Line 10 | Average of is printed on the console, followed by the first element of the list, which is Quizzes in the first iteration of the loop. A newline is not started; instead just a colon is displayed. Later, we plan to calculate the average of the marks and print it on the same line. |
Lines 11, 12 | A loop variable called ind is initialized to 1. We know from the structure of the file that the first value in each line of data is a string, and the numeric data starts from the element at index 1. The number of elements in each line can vary. total is initialized to zero, as we will use this variable to calculate the total of all the numbers in a line of data. |
Line 14 | A while loop starts execution. It is an inner loop, and it will fully iterate for each iteration of the outer for loop. The first iteration begins from index position 1 and not 0. For the first iteration of the for loop, the while loop will iterate 5 times. |
Line 15 | One value in the list is typecast to an integer and then added to total. We know that for the first iteration of the for loop the value of a_list will be: ['Quizzes', '10', '5', '5', '8', '9'] The values of ind (at the start of the iteration) and total (after the addition) in each iteration of the while loop are: ind 1, total 10; ind 2, total 15; ind 3, total 20; ind 4, total 28; ind 5, total 37. |
Line 16 | The value of ind is incremented by one in each iteration of the loop. |
Lines 18, 19 | When the while loop terminates, the value of ind equals the length of a_list, which is one more than the number of numeric values. We decrement the value of ind by one so that it can be used to calculate the average. The average is calculated, and it is assigned to a variable named average. |
Line 20 | The value of average is printed on the console. It is displayed on the same line, right after the string printed in line 10. |
The output of Prog 10-04 is shown below:
Average of Quizzes:7.40
Average of Assignments:9.00
Average of Activities:8.11
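For comparison, the same averages can be computed with Python's standard csv module and the built-in sum and len functions. This is a sketch of an alternative and not the lesson's program: the function name averages is hypothetical, and io.StringIO stands in for the open file. When reading a real file with the csv module, the Python documentation recommends opening it with newline=''.

```python
import csv
import io

def averages(file_obj):
    """Return a list of (label, average) pairs, one per row of data."""
    results = []
    for row in csv.reader(file_obj):
        if len(row) < 2:          # skip rows without numbers (avoids dividing by zero)
            continue
        marks = [int(value) for value in row[1:]]
        results.append((row[0], sum(marks) / len(marks)))
    return results

data = io.StringIO(
    "Quizzes,10,5,5,8,9\n"
    "Assignments,10,10,9,7\n"
    "Activities,5,5,10,10,10,9,9,8,7\n"
)
for label, average in averages(data):
    print(f"Average of {label}:{average:.2f}")
```

Dividing by len(marks) removes the need to adjust a loop counter after the loop has finished.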
The two most important tasks in filing are reading a file and writing a file. In this section we will learn how to write data into text files. Two modes are used for opening a file for writing.
Mode 'a', which stands for append, is used when you want to add more contents to a file without deleting what is already there. An example is a Word document: you may start writing a report, take a break, and later continue from where you left off.
Mode 'w' stands for write. When a file is opened in this mode, any contents that exist in the file are erased. Examples of its use are a file that stores the highest score of a game or a file that contains an application's settings.
Technical Note:
Be careful to choose the correct mode when writing into a file. Using 'w' by mistake instead of 'a' can result in losing all the contents of a file. Python won't even warn you before deleting everything inside a file.
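Since Python gives no warning, a program can check for itself before opening a file in 'w' mode. The sketch below shows one way to do this with os.path.exists. The helper write_new_file and the file name high_score.txt are hypothetical and serve only as an illustration.

```python
import os
import tempfile

def write_new_file(path, text):
    """Write text to path in 'w' mode, but only if the file does not exist yet.

    Returns True when the file was written and False when it was left alone.
    """
    if os.path.exists(path):
        return False              # refuse to erase the existing contents
    f = open(path, 'w')
    f.write(text)
    f.close()
    return True

path = os.path.join(tempfile.mkdtemp(), "high_score.txt")  # hypothetical file
first = write_new_file(path, "100\n")    # the file is created
second = write_new_file(path, "5\n")     # the file exists, so nothing is written

f = open(path, 'r')
saved = f.read()
f.close()
print(first, second, repr(saved))
```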
We have already discussed that it is not uncommon for a programmer to forget to close a file that was opened. The Python language provides a with keyword which automates the closing of an open file. A file is opened using the with keyword, and it remains open only in the body of the with block. As soon as the program execution leaves the with block, the file is automatically closed by the Python Interpreter.
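The automatic closing can be observed through the closed attribute of a file object. The following sketch also shows that the file is closed when an error interrupts the with block. The temporary file and its contents are assumptions made for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file

with open(path, 'w') as fo:
    fo.write("Line of text\n")
    inside = fo.closed        # False: the file is open inside the block
after = fo.closed             # True: the file was closed on leaving the block

# The file is closed even if an error occurs inside the block.
try:
    with open(path, 'r') as fo2:
        int(fo2.read())       # raises ValueError: the text is not a number
except ValueError:
    pass
closed_after_error = fo2.closed

print(inside, after, closed_after_error)
```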
The following syntax is useful when writing into files or when using the with keyword:
Code | Description |
---|---|
fo = open("name.txt", 'w') | Opens the file name.txt in write mode. The file object is assigned to fo. |
fo = open("name.txt", 'a') | Opens the file name.txt in append mode. The file object is assigned to fo. |
num = fo.write('Line of text') | The string Line of text is written into the file using the file object. The total number of characters written into the file is returned and assigned to num. |
fo.flush() | The flush method flushes the internal buffer. When we write into a file using the write method, the data is not directly written into the file. It is first stored in a buffer, and it is then copied into the file when the buffer is full. The flush method results in the data being written into the file immediately, without waiting for the buffer to fill up. It is usually used when you have completed writing into the file and waiting for the buffer to fill up is of no use. |
with open('na.txt', 'r') as fo: | The with keyword is used to open a file named na.txt in read only mode. The name of the file object is fo. |
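The return value of write and the effect of flush can be seen in a short experiment. In this sketch a second file object reads the file while the first is still open. The temporary file name is an assumption.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "name.txt")  # hypothetical location

fo = open(path, 'w')
num = fo.write('Line of text')    # returns the number of characters written
fo.flush()                        # push the buffered data out to the file

# A second file object can now see the data, although fo is still open.
reader = open(path, 'r')
seen = reader.read()
reader.close()

fo.close()
print(num, repr(seen))
```

Closing a file also flushes its buffer, so an explicit flush matters mainly when the data must reach the file before the file is closed.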
Prog 10-05 demonstrates file writing as well as the use of the with keyword.
The program asks the user to write a story, which is saved line by line as the user writes. Later on, the story is displayed back on the console for proofreading.
# Program Name: Prog 10-05
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_5.txt", 'w')

    print("Write your story:")
    bytes_written = 0
    str_input = ""

    while str_input != "exit":
        str_input = input(" ")
        bytes_written += file_object.write(str_input + '\n')

    file_object.flush()
    file_object.close()

    print(f"\nBytes Written: {bytes_written:d}")

    print("\nProof read your story")

    with open('code_listing_5.txt', 'r') as file_obj:
        list_of_lines = file_obj.readlines()

    for line in list_of_lines:
        print(line, end='')

main()
The line by line explanation of the code is given below:
Line | Description |
---|---|
Line 5 | The file code_listing_5.txt is opened in write only mode. Any data present in the file will be deleted. |
Line 7 | Write your story: is printed on the console. |
Line 8 | A variable named bytes_written is initialized with a value of zero. We want to use this variable to record the total number of characters in the user's story. |
Line 9 | str_input is a variable that will be used to take user input as a string, which is then written into the file using the file object named file_object. str_input is initialized with an empty string. |
Line 11 | A while loop starts iterating, and it will terminate when the value of str_input is equal to exit. As the user input is assigned to str_input, the user needs to input exit when the story is finished to terminate the loop. |
Line 12 | User input is taken on the console and assigned to str_input. |
Line 13 | The write function is used to save the string assigned to str_input into the file, and \n is appended to signal the end of a line. The write function returns the number of characters written into the file, and this value is added to bytes_written. Note that the word exit is written into the file as well, because the loop condition is checked only after the write. |
Lines 15, 16 | The flush function is used to write any remaining data in the buffer to the file. The file is closed after flushing the data. |
Line 18 | The number of characters written into the file is printed on the console. |
Line 20 | Proof read your story is printed on the console. |
Line 22 | The with keyword is used to open the file in read only mode. The name of the file object is file_obj. This file will remain open as long as the Python Interpreter is executing the block of code which is part of the with statement (just line 23 in this case). |
Line 23 | The data in the file is read as a list of strings using the readlines function. Each string in the list is a line of text in the file. |
Line 25 | A for loop iterates over the list. |
Line 26 | Each element (string) in the list is printed on the console. |
The output of the code is given in Console 5.
Write your story:
Very Short Story:
Once upon a time there was a child.
He wanted to become a programmer. He took admission at WLU.
On track to start writing code soon!
exit

Bytes Written: 156

Proof read your story
Very Short Story:
Once upon a time there was a child.
He wanted to become a programmer. He took admission at WLU.
On track to start writing code soon!
exit
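Console 5 shows that the word exit ends up in the saved story and is counted in the total. If that is not wanted, the check can be made before writing. The following sketch is a variation and not the lesson's program: the function save_story is hypothetical, and it takes the typed lines as a list instead of calling input so that it can run unattended.

```python
import os
import tempfile

def save_story(lines, path):
    """Write lines to path until the word 'exit' is met.

    Returns the number of characters written.
    """
    written = 0
    with open(path, 'w') as file_object:
        for text in lines:
            if text == "exit":    # stop before the sentinel is written
                break
            written += file_object.write(text + '\n')
    return written

typed = ["Very Short Story:", "Once upon a time there was a child.", "exit"]
path = os.path.join(tempfile.mkdtemp(), "story.txt")  # hypothetical file
count = save_story(typed, path)

with open(path, 'r') as file_obj:
    story = file_obj.read()

print(count)
print(story, end='')
```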
In this lesson we learnt how to read data from and write data into text files, and we met the modes for binary files. Several methods for reading data were discussed, and the with keyword was also introduced. The assignment will reinforce the concepts covered in this lesson and in the textbook.
We hope you are ready to attempt this week’s assessments. They are available on MyLearningSpace. You are encouraged to consult the syllabus for clarity about the deadlines for preparation activities, quizzes, and assignments. Please regularly visit MyLearningSpace for announcements, course material, and discussion boards.