Files

Tasks

  1. Complete reading this lesson, which involves the following:
    1. Introduction
    2. Reading a file
    3. Writing in a file
    4. Working with binary files
  2. Complete quiz 9 to verify your reading and understanding.
  3. Read and complete the activities at Zybook, Z10: Files.

Learning Outcomes

By the end of this lesson, you should be able to:

  1. Describe the difference between text files and binary files.
  2. Use modes in programs to open a file for appropriate filing operations.
  3. Write loops to read data from a file.
  4. Write loops to write data into a file.

Key Terms/Concepts

Introduction

Files provide the functionality to store data on a disk. Computers have two kinds of storage for saving data. One is Random Access Memory (RAM) and the other is a disk, which is also called a hard drive.

RAM has the following characteristics:

A hard drive or a disk has the following features:

Installing software like Eclipse stores it on the disk on your computer. When you open Eclipse for writing programs, it is loaded from the disk to the RAM (working memory), and it is executed from there. It remains in the RAM as long as you are using it, and once you close Eclipse, it can be removed from the RAM to make space for other software. It takes longer to start Eclipse, as it is on the disk initially (slower) and it is being loaded in the RAM, but it works interactively when it is open as it is present in the RAM (faster).

When you write a program, it is being stored in the RAM as you code. Once you save the program by specifying a file name, it is written on the disk. That is why you can access your programs even after you have shut down the computer and restarted it.

In this lesson we will learn how we can work with the files. The files store the data, and the directories are used to maintain a hierarchical scheme to save files. Hierarchical structure helps users organize the files.

10.1.1 Key Concepts

Let's look at a real life scenario that we can use to understand filing and its key concepts.

Suppose you went on a trip with your family over the summer to see the Great Pyramids in Egypt. On your return, your parents ask you and your younger sister Maria to co-produce a write-up on your journey, so that it can be shared with your grandparents. Your sister decides to write the first draft, which you will proofread and modify later.

Below is the sequence of events that took place.

Event | Technical term to note
1. Maria opens a notebook to write about the pyramids. | Open the file for writing
2. Maria writes into the notebook. | Write into the file
3. Maria closes her notebook. | Close the file which was opened for writing
4. You go into her room and reach into the drawer to access the notebook. | Path to the file
5. You open her notebook. | Open the file for reading
6. You read her writeup. | Read the file
7. You close her notebook to reflect on her writing. | Close the file which was opened for reading
8. She is young and didn't do a good job, as she didn't like the pyramids and was not interested in them. You open the notebook, erase what she wrote, and rewrite on the same page. | Open the file for writing. Note: Technically you just opened the file and everything is erased automatically.
9. You write into the notebook. | Write into the file
10. It's dinner time, and you also need a break. You close the notebook. | Close the file which was opened for writing
11. You return to the notebook, open it, and start writing to complete your task. | Open the file to append. Note: Nothing is deleted, and your earlier work is still available.
12. You close the notebook. | Close the file which was opened for appending

There are some important technical points to note from the last column.

  1. Files have a path that you need to access them.
  2. You need to open a file before reading or writing into it. The mode for opening (read, write etc.) is mentioned while opening the file.
  3. Any file that you opened MUST be closed. You can end up with some of your text missing from the file or issues with accessing your file from outside the program if you do not properly close the file.
  4. The contents of the file are overwritten when you open a file for writing.
  5. If you plan to add to what you already have in a file, you need to open the file for appending (mode: append).
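
The notebook story can be sketched in a few lines of Python. This is only an illustration of the five points above; the open and close calls it uses are explained in the next section. The file name notebook.txt and the temporary folder are assumptions made so that the sketch can run on any computer.

```python
import os
import tempfile

# Assumption: a throwaway folder stands in for "the drawer".
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notebook.txt")   # the path to the file

# Events 1-3: Maria opens the notebook, writes, and closes it.
notebook = open(path, 'w')
notebook.write("The pyramids were boring.\n")
notebook.close()

# Events 5-7: you open it for reading, read it, and close it.
notebook = open(path, 'r')
first_draft = notebook.read()
notebook.close()

# Events 8-10: opening for writing again erases Maria's text.
notebook = open(path, 'w')
notebook.write("The Great Pyramids are enormous.\n")
notebook.close()

# Events 11-12: opening to append keeps the earlier work.
notebook = open(path, 'a')
notebook.write("We visited them in the summer.\n")
notebook.close()

# Read the final result back to check what survived.
notebook = open(path, 'r')
final_text = notebook.read()
notebook.close()
print(final_text)
```

Maria's sentence does not appear in the final text, because the second open in write mode erased it, while the appended sentence follows the rewritten one.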

10.1.2 Key Syntax

While working with the files, programmers first need to open the file by specifying the path, the name, and the mode of opening. The following function is used in Python:


file_object = open('C:\\trip\\pyramids.txt', 'r')

file_object is the name of the variable that references the file object. It can be used to perform filing operations. open is the built-in function that is used for opening a file. Two arguments are passed to the function. The first argument specifies the path (C:\\trip\\) and the filename (pyramids.txt) of the file to open, whereas the second one specifies the mode. The backslashes are doubled because a single backslash starts an escape sequence inside a Python string. 'r' means the file will be opened for reading only.
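
The short sketch below shows why the backslashes in the path are doubled. It only builds strings and does not open any file, so the folder C:\trip does not need to exist. The variable names are chosen for illustration only.

```python
# Two ways to spell the same Windows path in Python source code.
escaped = 'C:\\trip\\pyramids.txt'   # each \\ stands for one backslash
raw = r'C:\trip\pyramids.txt'        # raw string: backslashes are kept as typed

print(escaped)           # C:\trip\pyramids.txt
print(escaped == raw)    # True

# With a single backslash, \t is read as a tab character instead.
broken = 'C:\trip'
print('\t' in broken)    # True: the path now contains a tab
```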

There are modes to perform reading and writing simultaneously, but they are outside the scope of this lesson. The rest of the modes are listed below:

Mode | Description
1. 'r' | Opens a text file for reading only. It does not allow writing into the file and does not create the file if it does not exist already. Text file: The data in such files is encoded using an ASCII or a Unicode scheme. It is in human readable format and it can be opened in computer programs like TextPad or TextEdit.
2. 'rb' | Opens a binary file for reading only. It does not allow writing into the file and does not create the file if it does not exist already. Binary file: The data in the binary file is not in human readable format. Images, videos and sound files are stored as binary files.
3. 'w' | Opens a text file for writing only. It does not allow the reading of the file. If the file does not exist, it will be created, and if the file already exists it will be overwritten. Do not use this mode if you want to keep the current contents of the file.
4. 'wb' | Opens a binary file for writing only. All the properties of 'w' are applicable here as well.
5. 'a' | Opens a text file for appending only. It does not allow the reading of the file. If the file does not exist, it will be created, and if the file exists it will not be overwritten. Use this mode if you want to keep the current contents of the file.
6. 'ab' | Opens a binary file for appending only. All the properties of 'a' are applicable here as well.
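
The behaviour of the modes can be checked with a small experiment. The sketch below is a minimal illustration; the file name modes_demo.txt and the temporary folder are assumptions, and FileNotFoundError is the error Python raises when a file opened with 'r' does not exist.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                    # assumption: scratch folder
path = os.path.join(folder, "modes_demo.txt")  # hypothetical file name

# 'r' does not create a missing file; Python raises FileNotFoundError.
try:
    open(path, 'r')
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

# 'w' creates the file (or empties it if it already exists).
f = open(path, 'w')
f.write("one\n")
f.close()

# 'a' keeps the existing contents and adds to the end.
f = open(path, 'a')
f.write("two\n")
f.close()

# 'r' gives back text (a str); 'rb' gives back the raw bytes.
f = open(path, 'r')
as_text = f.read()
f.close()

f = open(path, 'rb')
as_bytes = f.read()
f.close()

print(missing_raised)    # True
print(repr(as_text))     # 'one\ntwo\n'
print(type(as_bytes))    # <class 'bytes'>
```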

As explained earlier, any file that is opened, whether it is for reading, writing or appending, must be closed. The following syntax can be used to close an open file.


file_object.close()

Technical Note:

The end of a line is marked with \n in files, just as it is in strings.

Reading a File

In this section, we will learn about reading and processing data from a text file. The following functions are commonly used for reading data.

Code | Description
contents = fl.read() | fl is an object that refers to the file that is open in read mode. read is a method of the file object that reads all the contents of a file as a single string and returns it. The string will be assigned to a variable named contents in this case.
line = fl.readline() | readline is a method that reads a single line from a file and returns it. Each line in a file is terminated with \n, except possibly the last one.
lines = fl.readlines() | readlines reads each line of a file as a separate string and stores each of those strings in a list. The list is then returned. The variable is named lines rather than list, because list is the name of a built-in Python type.

10.2.1 Reading a Text File

Code Listing 1 shows a simple program that reads the same file thrice using three different methods and displays the content on the console.


# Program Name: Prog 10-01
# This program is part of Lesson 10 of the course

def read_all(file_obj):
    contents = file_obj.read()
    print(contents)
    print("*** Printing Ends ***")

def read_line(file_obj):
    line = file_obj.readline()
    print(line)

    line = file_obj.readline()
    print(line)

    line = file_obj.readline()
    print(line)
    print("*** Printing Ends ***")

def read_list(file_obj):
    a_list = file_obj.readlines()
    print(a_list)
    print("*** Printing Ends ***")

def main():
    file_object = open("code_listing_1.txt", 'r')
    read_all(file_object)

    file_object.seek(0)
    read_line(file_object)

    file_object.seek(0)
    read_list(file_object)

    file_object.close()
main()


Code Listing 1: (Prog 10-01).

The text file that we use in this program is named code_listing_1.txt and it is present in the same directory where the Python program is saved too. The file is given below:


First line of text
Second line of text
Third line of text

It is important to note the following points about code_listing_1.txt:

Internally the file looks like:


First line of text\n
Second line of text\n
Third line of text

Prog 10-01 has four functions. The main function opens and closes the file, whereas the rest of the three functions print the contents of the file using three different methods.

The output of the program is given in Console 1. Let's go through the code line by line.

Lines 4, 9, 20, 25, 36 Four function headers are read by the Python Interpreter and the main is called for execution:
read_all: Reads and prints the entire contents of the file.
read_line: Reads and prints the three lines of the file individually.
read_list: Reads the entire data of the file as a list of strings and displays the list on the console.  
Line 26 The code_listing_1.txt is opened in read only mode. A file object is assigned to file_object. There is no path mentioned with the file name, as the program Prog 10-01 and code_listing_1.txt are in the same directory. We will need to specify a path only when the program and the file are in different directories.
Lines 27, 4 The read_all function is called and the file object is passed to the function. The function starts execution.
Line 5 The read method is called using file_obj. This method will read the entire data stored in code_listing_1.txt and assign it to the variable named contents. The \n characters at the ends of lines are also copied, and they are part of the string assigned to contents.
The contents at this point will contain a string like the following:
First line of text\nSecond line of text\nThird line of text
Lines 6, 7 The string assigned to contents is printed. We expect a newline after the first and the second line of text due to \n present in the string. The third line of text does not have \n at the end but we still expect a newline after the third line due to the print function.

You might remember from lesson 2 that the print function automatically starts a new line at the end.
*** Printing Ends *** is displayed below the contents. See the first four lines of Console 1.
Line 29 As a child you might have read a page by placing your index finger under the text. Python reads a file similarly. It starts reading from position 0 (the start of the file) and goes forward until the end of the file. When a file has been read completely (as in line 5), the position is at the end of the file. From there on you can:
  • Read again, which will read nothing as it will try to read from the current position onwards.
  • Close the file and then reopen it. It will reset the position to the start of the file.
  • Use the seek function to place the position at any location in the file.
The seek function is called in Line 29 and passed 0 (starting position of the file) to reset the position.
Lines 30, 9 The read_line function is called and it starts execution.
Line 10 The readline function is called to read one line of text from the file. A line is read including the \n at the end of the line.
Line 11 The first line is printed on the console. We can expect two newlines at the end of this print function. First, due to the presence of \n at the end of the string, and second, due to the print function. This will result in an empty line after the first line of text. See the 5th and the 6th line of the Console 1.
Lines 13, 14 The second line of text is printed with an empty line following it.
Line 16 The last line of text is read. This line does not have \n at the end.
Lines 17, 18 The last line is printed, but only one newline will be present, only due to the print function. We should not expect an empty line before *** Printing Ends ***. See the 9th and the 10th line of Console 1.
Lines 32, 33, 20 The file position is reset to the start, and the read_list function starts execution.
Line 21 All the lines of text in the file are read and each line (string) is stored as an element of the list. The strings will be read along with the \n.
Lines 22, 23 The list is printed with *** Printing Ends *** below it. See the last two lines of Console 1.
Line 35 The file is closed.

First line of text
Second line of text
Third line of text
*** Printing Ends ***
First line of text

Second line of text

Third line of text
*** Printing Ends ***
['First line of text\n', 'Second line of text\n', 'Third line of text']
*** Printing Ends *** 

Console 1: Output of Code Listing 1.
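
The effect of the file position described for line 29 can be seen in isolation. The sketch below is a minimal illustration that first creates its own scratch copy of the three-line file (an assumption, so that it runs without Prog 10-01's data file), then reads it twice in a row before calling seek.

```python
import os
import tempfile

# Assumption: a scratch copy of the three-line file used in Prog 10-01.
path = os.path.join(tempfile.mkdtemp(), "code_listing_1.txt")
f = open(path, 'w')
f.write("First line of text\nSecond line of text\nThird line of text")
f.close()

f = open(path, 'r')
first_read = f.read()     # reads everything; the position is now at the end
second_read = f.read()    # nothing is left to read, so this is ''
f.seek(0)                 # move the position back to the start
third_read = f.read()     # the whole file again
f.close()

print(repr(second_read))          # ''
print(first_read == third_read)   # True
```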

We have noticed that there are unwanted newlines when we read a file using the readline method. This can be solved by removing the \n at the end of the string. The following code can help us achieve that.


    line = file_obj.readline()
    line = line.rstrip("\n")
    print(line)

We can also override the default behaviour of print so that it does not start a new line at the end of printing:


    print(line, end='')
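
The difference between the two approaches is easier to see with repr, which shows the \n explicitly. This sketch uses plain strings in place of lines read from a file, which is an assumption made to keep it self-contained.

```python
line = "First line of text\n"     # what readline returns for the first line
last = "Third line of text"       # the last line has no \n at the end

stripped = line.rstrip("\n")
print(repr(line))        # 'First line of text\n'
print(repr(stripped))    # 'First line of text'

# rstrip("\n") leaves a string unchanged when there is no trailing \n.
unchanged = last.rstrip("\n")
print(unchanged == last)  # True

# rstrip("\n") removes only newlines; rstrip() removes all trailing whitespace.
padded = "text  \n"
only_newline = padded.rstrip("\n")   # 'text  '
all_space = padded.rstrip()          # 'text'
print(repr(only_newline), repr(all_space))
```

Note that print(line, end='') leaves the string itself unchanged, so the last line of a file that does not end in \n will not be followed by a newline on the console.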

10.2.2 Using Loops for Reading Data

The data in a file can be read conveniently using the for and the while loops. The data is often present as multiple lines of text. The read_line function in Prog 10-01 reads one line of data from the file at a time. The readline and readlines methods are therefore well suited to extracting data using a loop.

For Loop

Prog 10-02 in Code Listing 2 reads the data from code_listing_1.txt line by line. The first word of each line is then capitalized and displayed through string processing.


# Program Name: Prog 10-02
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_1.txt", 'r')

    for line in file_object:
        ind = line.index(' ')       # position of the first space
        word = line[:ind]           # everything before that space
        word = word.capitalize()
        print(word)
    file_object.close()

main()

Code Listing 2: (Prog 10-02).

The text file is opened in read only mode. A for loop is started with a target variable named line. The file object called file_object is used in the loop header. The for loop extracts one line of text from the text file in each iteration of the loop. The line of text is assigned to the variable line as a string.

Inside the loop body, the index position of the first occurrence of a space is found using the index method and is assigned to the variable ind. The value of ind will be 5 for the first and the third iteration and 6 for the second iteration of the loop. The ind is used to slice the string, and the resulting substring is assigned to word. word is then capitalized and displayed on the console using the print function.

The output of Prog 10-02 is shown in Console 2 below.


First
Second
Third

Console 2: Output of Code Listing 2.
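
The index method used in Prog 10-02 raises a ValueError when a line contains no space at all. The sketch below shows one possible alternative based on the split method. The function name first_word is hypothetical and is not part of the lesson's programs.

```python
def first_word(line):
    """Return the first word of line, capitalized ('' for a blank line)."""
    words = line.split()     # with no argument, split also discards the \n
    if not words:            # a blank line has no words
        return ""
    return words[0].capitalize()

print(first_word("First line of text\n"))   # First
print(first_word("single\n"))               # Single

# For comparison: index fails on a line without a space.
try:
    "single\n".index(' ')
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)                         # True
```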

While Loop

While loops are more general-purpose loops, and therefore programmers need to supply the termination condition that ends the loop.

We will take the same file that we used in the last two programs and read it twice using the while loop. The data is extracted using the readline method first, and then using the readlines method. Prog 10-03 in Code Listing 3 shows the code.


# Program Name: Prog 10-03
# This program is part of Lesson 10 of the course

def read_data1(file_obj):
    print("***Read Data 1***")
    line = file_obj.readline()
    while line != '':
        line = line.rstrip("\n")
        print(line.capitalize(), end=".\n")
        line = file_obj.readline()

def read_data2(file_obj):
    print("***Read Data 2***")
    lines = file_obj.readlines()
    ind = 0
    while ind < len(lines):
        line = lines[ind]
        line = line.rstrip("\n")
        print(line.capitalize(), end=".\n")
        ind += 1

def main():
    file_object = open("code_listing_1.txt", 'r')

    read_data1(file_object)
    file_object.seek(0)
    read_data2(file_object)

    file_object.close()

main()

Code Listing 3: (Prog 10-03).

The main function opens the file (code_listing_1.txt) in read only mode, calls the read_data1 function and passes the file object. Once read_data1 has finished executing, the file position is at the end of the file. The position is reset to zero before calling the read_data2 function. The file is closed before the program ends.

The line by line description of read_data1 is given below:

Line 4 The read_data1 function starts execution with a file object of an open file passed to it as a parameter.
Line 5 ***Read Data 1*** is printed on the console.
Line 6 The first line of the file is read as a string and assigned to the variable line. The string assigned to the variable line will be: First line of text\n
The readline function returns an empty string when there is nothing left to read, which happens immediately if the file is empty.
Line 7 While loop starts execution, and it will iterate as long as the readline function does not return an empty string.
Line 8 The \n at the end of the line is removed.
Line 9 The text is capitalized, a period is added to the end of the string and \n is also appended to the output to start a newline.
Line 10 The next line is extracted from the file and the while loop will start the next iteration.

The line by line description of read_data2 is given below:

Line 12 The function read_data2 starts execution with a file object passed as a parameter.
Line 13 ***Read Data 2*** is displayed on the console.
Line 14 The entire data in the file is read as a list of strings. The data assigned to the variable lines will be:
['First line of text\n', 'Second line of text\n', 'Third line of text']
Line 15 ind is a loop variable, and it is initialized with a zero.
Lines 16, 20 A while loop starts iterating, and it will iterate as long as the value of ind is less than the length of lines. The value of ind is incremented by one in each iteration of the loop.
Line 17 One string from the list is extracted in each iteration.
Line 18 The \n at the end of the string is removed.
Line 19 The string is capitalized and .\n is appended to the end before printing on the console.

The output of Prog 10-03 is shown in Console 3.


***Read Data 1***
First line of text.
Second line of text.
Third line of text.
***Read Data 2***
First line of text.
Second line of text.
Third line of text.

Console 3: Output of Code Listing 3.
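
A common question about the loop in read_data1 is whether a blank line in the middle of a file would end it early. It would not: a blank line is read as '\n', and only the end of the file gives ''. The sketch below illustrates this using io.StringIO, which behaves like a text file opened for reading. It is used here only as an assumption so that the example needs no file on disk.

```python
import io

# A three-line "file" whose second line is blank.
fake_file = io.StringIO("First\n\nThird")

results = []
line = fake_file.readline()
while line != '':            # '' is returned only at the end of the file
    results.append(line)
    line = fake_file.readline()

print(results)               # ['First\n', '\n', 'Third']
```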

10.2.3 Comma Separated Values

Files with comma separated values (csv) are commonly used when dealing with numeric data. Data elements in csv are separated by commas from each other. A csv file named code_listing_4.txt is shown below:


Quizzes,10,5,5,8,9
Assignments,10,10,9,7
Activities,5,5,10,10,10,9,9,8,7

We want to write a program to calculate the average of the data in each line of the file. Prog 10-04 shown in Code Listing 4 extracts each line of data, and then calculates the average of all the values except the first one.


# Program Name: Prog 10-04
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_4.txt", 'r')
    for line in file_object:
        line = line.rstrip('\n')
        a_list = line.split(',')

        print(f"Average of {a_list[0]}", end=':')
        ind = 1
        total = 0

        while ind < len(a_list):
            total += int(a_list[ind])
            ind += 1

        ind -= 1
        average = total / ind
        print(f"{average:.2f}")
    file_object.close()

main()

Code Listing 4: (Prog 10-04).

Let's go through the code line by line:

Lines 5, 21 The file is opened in read only mode, and the file object is stored in the file_object. The code for closing the file is in line 21.
Line 6 A for loop is initiated, which will extract the data in the file line by line as strings. The extracted line of text is assigned to the target variable named line. The value of line for the first iteration of the for loop will be:
Quizzes,10,5,5,8,9\n
Line 7 \n is removed from the end of the string and the line becomes:
Quizzes,10,5,5,8,9
Line 8 The split method is called to divide the string into multiple strings at the commas, and the resulting substrings are returned as a list. The value of a_list will be:
['Quizzes', '10', '5', '5', '8', '9']

As expected, all the values are strings, and we need to typecast them to integers before applying mathematical operations.
Line 10 Average of is printed on the console, followed by the first element of the list, which is Quizzes in case of the first iteration of the loop. A newline is not started, and instead just a colon is displayed. Later, we plan to calculate the average of the marks and print that.
Lines 11, 12 A loop variable called ind is initialized to 1. We know from the structure of the file that the first value in each line of data is a string, and the numeric data starts from the element at the index number 1 onwards. The number of elements in each line can vary. The total is initialized to zero, as we will use this variable to calculate the total of all the numbers in a line of data.
Line 14 A while loop starts execution. It is an inner loop, and it will fully iterate for each iteration of the outer for loop. The first iteration will begin from the index position 1 and not 0. For the first iteration of the for loop the while loop will iterate 5 times.
Line 15 One value in the list is converted to an integer and then added to the total. We know that for the first iteration of the for loop the value of a_list will be:
['Quizzes', '10', '5', '5', '8', '9']

The values of ind (at the start of the loop) and total in each iteration of the while loop are shown below:

Iteration 1: ind = 1, total = 10
Iteration 2: ind = 2, total = 15
Iteration 3: ind = 3, total = 20
Iteration 4: ind = 4, total = 28
Iteration 5: ind = 5, total = 37

Line 16 The value of ind is incremented by one in each iteration of the loop.
Lines 18, 19 When the while loop terminates, the value of ind equals the length of the list, which is one more than the number of marks in the line. We decrement the value of ind by one so that it can be used to calculate the average. The average is calculated, and it is assigned to a variable named average.
Line 20 The value of average is printed on the console. It will be displayed right after the string printed in line 10, on the same line.

The output of Prog 10-04 is shown below:


Average of Quizzes:7.40
Average of Assignments:9.00
Average of Activities:8.11

Console 4: Output of Code Listing 4.
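
The same averages can also be computed with Python's standard csv module and the built-in sum and len functions. The sketch below is one possible alternative, not the lesson's method. It keeps the three rows in memory with io.StringIO (an assumption, so that it runs without code_listing_4.txt), and the names data, marks and averages are chosen for illustration.

```python
import csv
import io

# Assumption: the same three rows as code_listing_4.txt, held in memory.
data = io.StringIO(
    "Quizzes,10,5,5,8,9\n"
    "Assignments,10,10,9,7\n"
    "Activities,5,5,10,10,10,9,9,8,7\n"
)

averages = {}
for row in csv.reader(data):        # each row is a list of strings
    name = row[0]
    marks = []
    for value in row[1:]:           # skip the label in position 0
        marks.append(int(value))
    averages[name] = sum(marks) / len(marks)
    print(f"Average of {name}:{averages[name]:.2f}")
```

Slicing with row[1:] removes the need for the ind counter and the adjustment made in lines 18 and 19 of Prog 10-04.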

Writing a File

The two most important tasks in filing are the ability to read a file and to write a file. In this section we will learn how we can write data into the text files. Two modes are used for opening a file for writing.

Mode 'a', which stands for append, is used when you want to add more contents to a file without deleting what is already there. An example can be a Word document. You may start writing a report and later on you might want to take a break and then begin from where you left off.

Mode 'w' stands for write. When a file is opened in this mode, any contents that exist in the file are overwritten. Examples of its use cases can be a file that stores the highest score of a game or a file that contains the application's settings.

Technical Note:

Be careful in choosing the correct mode when writing into a file. A mistake of 'w' instead of an 'a' can result in losing all the contents of a file. Python won't even warn you before deleting everything inside a file.
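
One way to protect yourself from this mistake is to check whether the file exists before opening it in 'w' mode. The helper below is a hypothetical sketch: open_for_writing is not a Python built-in, and the file name high_score.txt is an assumption. It relies on os.path.exists, which reports whether a path already exists.

```python
import os
import tempfile

def open_for_writing(path, overwrite=False):
    """Hypothetical helper: open path in 'w' mode, but refuse to touch
    an existing file unless overwrite=True is passed."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} already exists")
    return open(path, 'w')

# Assumption: a scratch file standing in for a high-score file.
path = os.path.join(tempfile.mkdtemp(), "high_score.txt")

fo = open_for_writing(path)          # the file does not exist yet: allowed
fo.write("100\n")
fo.close()

try:
    open_for_writing(path)           # the file exists now: refused
    refused = False
except FileExistsError:
    refused = True

fo = open_for_writing(path, overwrite=True)   # explicit overwrite: allowed
fo.write("250\n")
fo.close()

fo = open(path, 'r')
final_score = fo.read()
fo.close()
print(refused, repr(final_score))    # True '250\n'
```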

10.3.1 with Keyword

We have already discussed that every open file must be closed, and it is not uncommon for a programmer to forget to do so. The Python language provides a with keyword which automates the closing of an open file. A file is opened using the with keyword and the file remains open only in the body of the with block. As soon as the program execution leaves the with block, the file is automatically closed by the Python Interpreter.

The following syntax is useful when writing into files or when using the with keyword:

Code | Description
fo = open("name.txt", 'w') | Opens the file name.txt in write mode. The file object is assigned to fo.
fo = open("name.txt", 'a') | Opens the file name.txt in append mode. The file object is assigned to fo.
num = fo.write('Line of text') | The string Line of text is written into the file using the file object. The total number of characters written into the file is returned and assigned to num.
fo.flush() | The flush method flushes the internal buffer. When we write into a file using the write method, the data is not directly written into the file. It is first stored in a buffer, and it is then copied into the file when the buffer is full. The flush method immediately results in the data being written into the file without waiting for the buffer to fill up. It is usually used when you have completed writing into the file and waiting for the buffer to fill up is of no use. Closing the file also flushes the buffer.
with open('na.txt', 'r') as fo: | The with keyword is used to open a file named na.txt in read only mode. The name of the file object is fo.
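
That the with block really closes the file can be checked with the closed attribute of the file object. The sketch below is a minimal illustration; the temporary folder is an assumption made so that it can run anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "name.txt")   # assumption: scratch file

with open(path, 'w') as fo:
    num = fo.write('Line of text')   # number of characters written
    open_inside = not fo.closed      # still open inside the block

closed_after = fo.closed             # closed on leaving the block

# The file is closed even if the block ends because of an error.
try:
    with open(path, 'r') as fo2:
        raise ValueError("simulated problem while reading")
except ValueError:
    closed_after_error = fo2.closed

print(num, open_inside, closed_after, closed_after_error)   # 12 True True True
```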

In Prog 10-05 we demonstrate file writing as well as the use of the with keyword.

The program asks the user to write a story which is being saved line by line as the user writes. Later on the story is displayed back on the console for proof reading.


# Program Name: Prog 10-05
# This program is part of Lesson 10 of the course

def main():
    file_object = open("code_listing_5.txt", 'w')

    print("Write your story:")
    bytes_written = 0
    str_input = ""

    while str_input != "exit":
        str_input = input(" ")
        bytes_written += file_object.write(str_input + '\n')

    file_object.flush()
    file_object.close()

    print(f"\nBytes Written: {bytes_written:d}")

    print("\nProof read your story")

    with open('code_listing_5.txt', 'r') as file_obj:
        list_of_lines = file_obj.readlines()

    for line in list_of_lines:
        print(line, end='')

main()

Code Listing 5: (Prog 10-05).

The line by line explanation of the code is given below:

Line 5 The file code_listing_5.txt is opened in write only mode. Any data present in the file will be deleted.
Line 7 Write your story: is printed on the console.
Line 8 A variable named bytes_written is initialized with a value of zero. We want to use this variable to record the total number of characters in the user's story.
Line 9 The str_input is a variable that will be used to take user input as a string, and is then written into the file object named file_object. The str_input is initialized with an empty string.
Line 11 A while loop starts iterating, and it will terminate when the value of str_input is equal to exit. Because the user input is assigned to str_input, the user needs to type exit when the story is finished to terminate the loop. Note that the word exit itself is also written into the file, because the condition is checked again only after line 13 has run.
Line 12 User input is taken on the console and assigned to str_input.
Line 13 The write function is used to save the string assigned to str_input into the file and \n is appended to signal the end of a line. The write function returns the number of characters written into the file, and this value is added to the bytes_written.
Lines 15, 16 The flush function is used to write any remaining data in the buffer to the file. The file is closed after flushing the data.
Line 18 The number of characters in the story is printed on the console.
Line 20 Proof read your story is printed on the console.
Line 22 The with keyword is used to open the file in read only mode. The name of the file object is file_obj. This file will remain open as long as the Python Interpreter is executing the block of code which is part of the with statement (just line 23 in this case).
Line 23 The data in the file is read as a list of strings using the readlines function. Each string in the list is a line of text in the file.
Line 25 A for loop iterates over the list.
Line 26 Each element (string) in the list is printed on the console.

The output of the code is given in Console 5.


Write your story:
 Very Short Story:
 Once upon a time there was a child.
 He wanted to become a programmer. He took admission at WLU.
 On track to start writing code soon!
 exit

Bytes Written: 156

Proof read your story
Very Short Story:
Once upon a time there was a child.
He wanted to become a programmer. He took admission at WLU.
On track to start writing code soon!
exit 

Console 5: Output of Code Listing 5.
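
Console 5 shows that the word exit ends up in the saved story. If that is not wanted, the sentinel can be checked before writing. The sketch below is one possible variant, not the lesson's program: the function name save_story is hypothetical, and a fixed list stands in for the lines a user would type through input, so that the example can run without a keyboard.

```python
import os
import tempfile

def save_story(lines, path):
    """Write each line to path, stopping at (and not saving) the word 'exit'.
    Returns the number of characters written."""
    written = 0
    with open(path, 'w') as file_obj:
        for text in lines:
            if text == "exit":       # sentinel: stop without saving it
                break
            written += file_obj.write(text + '\n')
    return written

# Assumption: a fixed list replaces what the user would type via input().
typed = ["Very Short Story:", "Once upon a time there was a child.", "exit"]
path = os.path.join(tempfile.mkdtemp(), "story.txt")

count = save_story(typed, path)

with open(path, 'r') as file_obj:
    saved = file_obj.read()

print(count)    # 54
print(saved, end='')
```

Because the file is opened with the with keyword, no explicit flush or close call is needed here.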

Conclusion

In this lesson we learnt how to read and write data in text files and in binary files. Several methods for reading data were discussed and the with keyword was also introduced. The assignment will reinforce the concepts covered in this lesson and in the textbook.

We hope you are ready to attempt this week’s assessments. They are available on MyLearningSpace. You are encouraged to consult the syllabus for clarity about the deadlines for preparation activities, quizzes, and assignments. Please regularly visit MyLearningSpace for announcements, course material, and discussion boards.