By the end of this lesson, you should be able to:
Speech is the main communication medium for humans. We regularly speak, listen, read, write and communicate our thoughts using human language. Languages are built from sounds that form words, which are organized into sentences. For most languages, sounds are also scripted using characters, allowing for a written medium of communication. Anything that humans write to convey words (as opposed to other forms of expression, like paintings) is referred to in programming terminology as a string.
Strings in programming languages play a central role in data processing. Most computer files exist in the form of strings, or at least store some strings in the form of file metadata. When users interact with the console using basic input and output operations, strings are the main medium. If you are using a Linux operating system, string manipulation is essential to your interaction with the machine. The applications of strings are countless, so approach this lesson with motivation.
In this lesson, you will learn how to construct and manipulate strings in the Python language. You will be introduced to a variety of built-in tools to inspect and manipulate strings. You will also undertake some steps to create your own functions to manipulate strings. The lesson is rich with small coding examples. As you progress in your programming skills, you tend to speak less and code more.
You are familiar with the concept of creating strings. However, let us have a quick refresher.
Strings in Python are created using single quotes, double quotes or triple quotes. For instance, the following variables are all strings:
str1 = 'Wilfrid Laurier'
str2 = "Waterloo"
str3 = """Ontario"""
Once the above initializations are executed, Python creates objects of type str, which is one of the basic data types in the language. You can verify this by executing print(type(str1)), which will produce: <class 'str'>.
When we want to create an empty string, we simply use quotes with no characters in between, like:
str1 = ''
str2 = ""
str3 = """"""
However, for simplicity, it is a convention for programmers to use single quotes when constructing an empty string.
Strings could also be constructed using the plus + operator. This process of combining two strings together is called string concatenation. An example of this:
str1 = 'AB'
str2 = 'CD'
str3 = str1 + str2
The value of str3 will be 'ABCD'.
We can also use the star operator to concatenate multiple instances of the strings. For example,
str1 = 'AB'
str2 = str1*4
The value of str2 will be 'ABABABAB'.
As you see, constructing strings in Python is simple and there is a flexibility in how this could be achieved. Next, we will learn about some basic features of strings in Python.
As noted in Lesson 7, Python strings are iterable objects. This means we can use for loops to iterate through their members, which are characters. The format for doing this is:
for <var> in <str var>:
    <block of code>
Strings display similar features to lists, which you studied in Lesson 8. For instance, the len function is used to count the number of characters in a string, just as it is used to count the number of items in a list. Observe the following console.
>>> str1 = 'Waterloo'
>>> print(len(str1))
8
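The for loop format and the len function shown above can be combined in one short sketch. The variable names and the choice of counting lower case vowels are only illustrative assumptions.

```python
# Visit each character of the string in order, from left to right.
word = 'Waterloo'
vowel_count = 0
for ch in word:
    # The in operator checks whether ch is one of the listed characters.
    if ch in 'aeiou':
        vowel_count += 1

print(vowel_count, 'of', len(word), 'characters are vowels')
```

The loop variable ch holds one character per iteration, which is itself a string of length 1.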
Strings are also indexed and ordered objects in Python. Therefore, we can access characters using square brackets [ ] similar to how you would do for a list.
>>> str1 = 'Waterloo'
>>> str1[0]
'W'
>>> str1[2]
't'
>>> str1[-1]
'o'
We can also access a group of characters at the same time. These are called substrings. The process is similar to lists.
>>> str1 = 'Waterloo'
>>> str1[:2]
'Wa'
>>> str1[2:]
'terloo'
>>> str1[:]
'Waterloo'
This strong similarity to lists is good to celebrate. However, there is one important distinction which you should be aware of.
Unlike lists, strings are immutable objects. This means you cannot change them once you have created them. You can modify the elements of a list as you like, either by adding elements, deleting elements or updating the values, but you cannot do this with strings. Study the following example:
>>> list1 = ['a', 'b','c']
>>> list1[1] = 'Z'
>>> list1
['a', 'Z', 'c']
>>> str1 = 'abc'
>>> str1[1] = 'Z'
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    str1[1] = 'Z'
TypeError: 'str' object does not support item assignment
As you see, the interpreter generated an error when attempting to modify the middle character in the string, but allowed changing the middle element of a list.
Remember this: Strings are immutable objects in Python. Forgetting this point is a common programming mistake for beginners.
To summarize, strings are iterable, indexed and immutable objects in Python.
As noted in the previous section, strings are immutable. You might be wondering how strings could be modified. It is actually very common to come across scenarios that require string modification. For instance, a clerk might need to change a typo in a client's name, or you could update your address in your online bank account. We call this string manipulation.
The term string manipulation is a generic term that covers a wide variety of operations on strings. This includes changing a string, parsing a string, analyzing a string, searching a string and extracting substrings. We will examine string manipulation in more detail in Section 9.4. For now, we want to focus on the simple scenario of changing a character in a string.
The main idea behind manually changing a string is to construct a new string in which the undesired character is replaced with the new one.
Assume you have the following string:
my_str = 'Waterfoo'
You recognize that there is a typo in the above string and you are interested in correcting it.
The character that we want to change is the 'f' character, which is located at position 5. Therefore, all characters before and after position 5 should remain the same, and we need to replace the character at position 5. We can do this through the following command:
my_str = my_str[:5] + 'l' + my_str[6:]
The above statement contains both slicing and concatenation operations. The slicing appears in the expressions my_str[:5] and my_str[6:]. The concatenation is performed by the + operator. The above statement is equivalent to writing:
my_str = 'Water' + 'l' + 'oo'
If there are multiple typos in the string, we can reconstruct the string in the same manner.
For instance, to correct the following typos:
my_str = 'Wilfrik Lauriar'
We can use:
my_str = my_str[:6] + 'd' + my_str[7:13] + 'e' + my_str[14:]
As you see, the above construction is not as simple as changing elements in a list. At the same time, it is not that difficult. You just need to do a little extra work. If the string is short, as in the above example, we can opt for rewriting the whole string, i.e. my_str = 'Wilfrid Laurier'.
However, for a large text, e.g. editing a book, rewriting the entire text would not make sense. As you become more familiar with string manipulation techniques, you will be able to modify strings in more efficient ways.
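The slice-and-concatenate pattern used above can be wrapped in a small helper. This is only a sketch: the function name replace_char and its parameters are hypothetical and are not part of Python or of this lesson's programs.

```python
def replace_char(text, pos, new_char):
    """Return a copy of text with the character at index pos replaced."""
    # Keep everything before pos, insert the new character,
    # then keep everything after pos. The original string is untouched.
    return text[:pos] + new_char + text[pos + 1:]


my_str = 'Waterfoo'
my_str = replace_char(my_str, 5, 'l')
print(my_str)

# Several typos can be fixed by calling the helper once per position.
name = replace_char(replace_char('Wilfrik Lauriar', 6, 'd'), 13, 'e')
print(name)
```

Because strings are immutable, the helper has to return a new string, and the caller has to assign the result back to a variable.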
In the previous section, you saw how strings could be constructed and modified manually. For convenience, the Python language has several built-in string manipulation methods, which save us the burden of performing 'manual' string manipulation. These methods can be classified into two main categories: string analysis methods and string formatting methods.
String analysis methods inspect the contents of a string and return some result about it. These methods do not change the contents of the string. Examples include searching for the position of a character in a string and checking if the string is uppercase or lowercase.
String analysis methods can be further divided into two sub-groups: searching methods and checker methods. The searching methods return some result, normally an index or a count, while the checker methods return True or False.
String formatting methods return a modified version of the string. Examples include stripping white space or converting the case of characters.
Let us explore these methods in more depth. We will be using the interactive mode to demonstrate the functionality of each method.
The Count Method:
This method returns the number of times a character (or a substring) appears in a string. If the given character does not appear in the string, the method returns 0. The method is case sensitive, dealing with upper case and lower case characters as different characters. Below is a demonstration of how the method works:
>>> my_str = 'Wilfrid Laurier'
>>> my_str.count('i')
3
>>> my_str.count('L')
1
>>> my_str.count('z')
0
>>> my_str.count('ie')
1
>>> my_str.count('ri')
2
The Find Method:
This method searches for a character (or a substring) within a string. The method returns the index of the first occurrence of the character in the string. If the character is not found, it returns -1. The method is case sensitive. Below is an example of how the method could be used:
>>> my_str = 'Wilfrid Laurier'
>>> my_str.find('W')
0
>>> my_str.find('r')
4
>>> my_str.find('z')
-1
>>> my_str.find('ri')
4
If you are interested in finding the last occurrence of the character, then you can use the rfind method. This method searches the string from right to left and returns the index of the first occurrence it finds. Below is an example:
>>> my_str = 'Wilfrid Laurier'
>>> my_str.rfind('W')
0
>>> my_str.rfind('r')
14
>>> my_str.rfind('z')
-1
>>> my_str.rfind('ri')
11
The Index Method:
This method searches for a character (or a substring) within a string. The method returns the index of the first occurrence of the character in the string. If the character is not found, it raises a ValueError exception. The method is case sensitive.
Hmm… this looks exactly like the find method, other than the minor change in how it handles the situation when the character is not found! The reason Python maintains the index method is consistency. The index method is also available for lists and tuples, so why not apply it to strings too? On the other hand, the find method is only applicable to strings. Therefore, think of the index and find methods as two approaches to searching in strings.
Similar to the rfind method, Python supports rindex, which searches from right to left. Below is an example of using the index and rindex methods:
>>> my_str = 'Wilfrid Laurier'
>>> my_str.index('W')
0
>>> my_str.index('r')
4
>>> my_str.index('ri')
4
>>> my_str.index('z')
Traceback (most recent call last):
  File "<pyshell#59>", line 1, in <module>
    my_str.index('z')
ValueError: substring not found
>>> my_str.rindex('r')
14
>>> my_str.rindex('ri')
11
>>> my_str.rindex('z')
Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    my_str.rindex('z')
ValueError: substring not found
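The practical difference between find and index shows up in how a program reacts to a missing substring. The sketch below contrasts the two styles. The helper name position_or_none is hypothetical, and returning None for "not found" is just one possible design choice.

```python
def position_or_none(text, target):
    """Return the index of the first occurrence of target, or None if absent."""
    try:
        return text.index(target)
    except ValueError:
        # index raises ValueError when target is not in text.
        return None


my_str = 'Wilfrid Laurier'

# Style 1: find signals a miss with -1, so the result must be compared.
if my_str.find('z') == -1:
    print('z not found (find)')

# Style 2: index raises an exception, which the helper turns into None.
print(position_or_none(my_str, 'z'))
print(position_or_none(my_str, 'ri'))
```

With find, forgetting the -1 check can silently produce a wrong slice, since -1 is also a valid index. With index, a miss cannot go unnoticed.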
Case Checker Methods:
Python supports the islower and isupper methods to inspect the case of characters within a string. If all characters in a string are lower case, the islower method returns True, otherwise it returns False. Similarly, if all characters in a string are upper case, the isupper method returns True, otherwise it returns False. Both methods ignore non-alphabetical characters, and return False for an empty string. Below are examples of using the two methods:
>>> 'ABC'.isupper()
True
>>> 'ABc'.isupper()
False
>>> '1 A B C'.isupper()
True
>>> 'abc'.islower()
True
>>> 'Abc'.islower()
False
>>> 'a-b-c'.islower()
True
>>> ''.islower()
False
A method related to case checking is the istitle method. This method checks if the first character of each word is upper case, while everything else is lower case. Below is an example of how it is used:
>>> 'Wilfrid Laurier'.istitle()
True
>>> 'Ontario'.istitle()
True
>>> 'Computer Science department'.istitle()
False
>>> 'ON'.istitle()
False
>>> ''.istitle()
False
Checking Character Type:
Python offers a variety of tools to check the type of characters in a string. These methods return True or False depending on whether all characters in the string are of the given category. For instance, the method isalpha() returns True if all characters in the string are alphabetical characters. If there is a single character that is non-alphabetical, like a space or a digit, then the method returns False. The following table summarizes the commonly used methods:
Method | Description |
---|---|
isalpha() | Check if all characters in a string are alphabetical characters. The method is not case sensitive. |
isdecimal() | Check if all characters in a string are decimal characters. |
isdigit() | Check if all characters in a string are digit characters. |
isnumeric() | Check if all characters in a string are numeric characters. |
isspace() | Check if all characters in a string are white space. This includes space, tab and newline characters. |
isalnum() | Check if all characters in a string are either alphabetic or numeric characters. |
Technical Note:
The methods isdecimal(), isdigit() and isnumeric() do essentially the same thing, which is verifying that the string contains only number characters. They differ only in how they handle special Unicode characters, an aspect of no concern to this course.
All decimal characters are also digit characters, and all digit characters are also numeric characters. Therefore, the method isnumeric() is the most comprehensive one. Many Python programmers prefer to use this method over the other two.
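For the curious, the nesting described in the note can be observed with three sample characters. The choice of samples is an assumption made for illustration: an ordinary digit, the superscript two (written as '\u00b2') and the one-half fraction (written as '\u00bd').

```python
# Each sample is a single character. The Unicode escapes avoid typing
# the special characters directly.
samples = ['7', '\u00b2', '\u00bd']

for ch in samples:
    # Print the character followed by the three checker results.
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
```

The ordinary digit passes all three checks, the superscript passes isdigit and isnumeric only, and the fraction passes isnumeric only.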
Note that if you pass a string containing a negative number or a float number, such as '-5' or '3.5', to the isnumeric() method, it will return False, because the minus sign and the decimal point are not numeric characters.
It is interesting to see that Python does not have a method to check for special characters. Such a method would be useful if you are writing a piece of software that checks if a password contains special characters. Given this limitation of the language, we would need to write our own customized function to achieve the task.
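One possible shape for such a customized function is sketched below. It assumes that a "special" character means anything that is not a letter, a digit or white space. That definition and the function name has_special are assumptions, since real password rules usually list the allowed symbols explicitly.

```python
def has_special(text):
    """Return True if text contains at least one special character.

    Assumption: a character is special when it is not alphanumeric
    and not white space.
    """
    for ch in text:
        if not (ch.isalnum() or ch.isspace()):
            return True
    return False


print(has_special('password1'))
print(has_special('p@ssword1'))
print(has_special(''))
```

The function stops at the first special character it meets, so it does not need to scan the rest of the string.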
The string formatting methods manipulate a string by changing its contents and return the modified version. It is important to realize that these methods do not change the contents of the original variable. Each method works by creating a copy of the string contents, manipulating it and then returning the result.
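A brief sketch makes this behaviour visible. The variable names are illustrative only.

```python
name = 'laurier'

# upper() returns a new string; name itself is left untouched.
shout = name.upper()
print(name)
print(shout)

# To keep a change under the same name, assign the result back.
name = name.capitalize()
print(name)
```

Calling a formatting method without using its return value has no visible effect, which is a common source of confusion.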
These methods can be grouped into three main groups: case conversion methods, removal methods and formatting methods.
Our focus will be on the most commonly used methods under the above categories:
Case conversion methods:
As the name suggests, these methods apply some changes to the case of the characters. The following table summarizes the description of these methods:
Method | Description |
---|---|
capitalize() | Convert the first character into upper case |
lower() | Convert all characters into lower case |
upper() | Convert all characters into upper case |
swapcase() | Convert all lower case characters into upper case and all upper case characters into lower case |
title() | Convert the first character of every word into upper case |
The following console sessions display examples of how the above methods could be used:
>>> str1 = 'good'
>>> str2 = 'morning'
>>> print(str1.upper())
GOOD
>>> print(str2.capitalize())
Morning
>>> str1 = 'good'
>>> str2 = 'morning'
>>> str3 = str1.upper()
>>> print(str3)
GOOD
>>> print(str3.lower())
good
>>> str4 = str2.capitalize()
>>> print(str4)
Morning
>>> str5 = str1 + ' ' + str3
>>> print(str5)
good GOOD
>>> print(str5.swapcase())
GOOD good
>>> print(str5.title())
Good Good
Removal methods:
Removal methods return a result built from the original string with some characters omitted. Perhaps the most useful methods are the strip and split methods. Learning about these two methods will be useful in the next lesson, when you learn about file handling.
The strip method removes white spaces from the beginning and end of a string. White spaces include the space character, the tab and the newline character. This method will be used frequently when reading from files, as will be outlined in the next lesson. If a string of characters is passed as an argument, the method removes those characters instead of white space. Below is a demonstration of how the strip method works:
>>> x = 'ab\n'
>>> print(x.strip())
ab
>>> y = 'c d \t\n'
>>> print(y.strip())
c d
>>> z = 'abcdef'
>>> print(z.strip('ef'))
abcd
>>> a = ' abcdef '
>>> print(a.strip())
abcdef
The split method receives an input parameter, called a delimiter. The method searches for every occurrence of the delimiter and splits the string at that position. The result is stored in a list.
For instance, if my_str = 'ab-cd-ef' then calling my_str.split('-') will return ['ab', 'cd', 'ef'].
If the delimiter does not exist, then the split method will return the full string as the only element in a list. Using the above value, my_str.split('?') will return ['ab-cd-ef'].
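The strip and split methods are often used together, for instance when a line of text holds several values. The sketch below assumes a made-up comma-separated line. It also shows how split behaves when called with a single space versus no argument at all.

```python
# A made-up line, as it might be typed by a user or read from a file.
line = ' waterloo , kitchener,cambridge \n'

cities = []
for part in line.split(','):
    # Each part may still carry spaces or a newline, so strip it.
    cities.append(part.strip())
print(cities)

# With ' ' as the delimiter, two spaces in a row yield an empty string.
print('a  b'.split(' '))
# With no argument, any run of white space counts as one delimiter.
print('a  b'.split())
```

Calling split with no argument is often the safer choice when the spacing of the input cannot be trusted.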
Formatting methods:
Formatting methods return a copy of the original string after applying some additional formatting features. The most commonly used method is one that you have been using all the time, but probably did not recognize as a string method. It is the famous .format method that you have been using in the print statements.
Since you are familiar with this kind of formatting, it will suffice to show how it can be directly used to create and update strings:
>>> x = f"{'price of oranges = $1.53 per lb':>40s}"
>>> x
'         price of oranges = $1.53 per lb'
>>> x = x[:29] + '{:5.3f}'.format(1.53) + x[33:]
>>> x
'         price of oranges = $1.530 per lb'
The following table provides a summary of all methods presented in this section, ordered alphabetically.
Method | Description |
---|---|
capitalize() | Convert the first character into upper case |
count(<s>) | Returns number of times the given substring appears in the string |
endswith(<s>) | Check if string ends with given substring |
find(<s>) | Returns position of first occurrence of given substring. Returns -1 if not found |
format() | Formats a string using formatting specifiers |
index(<s>) | Returns position of first occurrence of given substring. Raises an exception if not found |
isalnum() | Check if all characters in a string are either alphabetic or numeric characters. |
isalpha() | Check if all characters in a string are alphabetical characters. The method is not case sensitive. |
isdecimal() | Check if all characters in a string are decimal characters. |
isdigit() | Check if all characters in a string are digit characters. |
islower() | Check if all characters in string are lower case |
isnumeric() | Check if all characters in a string are numeric characters. |
isspace() | Check if all characters in a string are white space. This includes space, tab and newline characters. |
istitle() | Check if first character of every word is upper case and all other characters are lower case |
isupper() | Check if all characters in string are upper case |
join(<iter>) | Joins every item in the iterable object with the string separator |
lower() | Convert all characters into lower case |
lstrip(<s>) | Removes any leading characters that appear in <s> from the start of the string. With no argument, removes leading white space. |
partition(<d>) | Splits a string at the first occurrence of given delimiter. The result is a tuple containing substring before delimiter, the delimiter and substring after the delimiter |
rfind(<s>) | Returns position of last occurrence of given substring. Returns -1 if not found |
rindex(<s>) | Returns position of last occurrence of given substring. Raises an exception if not found |
rstrip(<s>) | Removes any trailing characters that appear in <s> from the end of the string. With no argument, removes trailing white space. |
split(<d>) | Splits a string at every occurrence of given delimiter. The result is stored in a list. |
startswith(<s>) | Check if string starts with given substring |
strip() | Removes all white spaces from the start and end of a string. If a string of characters is passed, then it removes those characters from the start and end instead. |
swapcase() | Convert all lower case characters into upper case and all upper case characters into lower case |
title() | Convert the first character of every word into upper case |
upper() | Convert all characters into upper case |
You are not expected to memorize the above methods. You can use the above table as your reference as you program. The more you use these methods, the less you will need to revisit the table.
When users interact with the console through basic input/output operations, or when they read from or write to files, the primary data format is strings. This string data is normally converted to other data types for processing.
You have seen in an earlier lesson how a string could be converted into an integer and vice versa. The same process could be used to convert from and to the float data type. If you recall, this process is called casting. The following interactive commands provide an example:
>>> x_int = 3
>>> x_float = 3.5
>>> x_str = str(x_int)
>>> x_str
'3'
>>> x_str = str(x_float)
>>> x_str
'3.5'
>>> print(float(x_str))
3.5
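Casting becomes more useful when combined with the split method from the previous section. The sketch below assumes a made-up input string holding several numbers separated by spaces, and converts each piece into a float so the values can be processed.

```python
# A made-up line of input, e.g. typed by a user at the console.
raw = '3 4.5 10'

values = []
for token in raw.split():
    # Each token is a string such as '4.5'; float() casts it to a number.
    values.append(float(token))

print(values)
print(sum(values))
```

If a token is not a valid number, float() raises a ValueError, so real programs usually validate the input first.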
Another essential conversion to learn is converting from strings to lists and vice versa. You can convert from a string to a list by simply using the following casting syntax: list(<str_var>). The result is a list in which each element is a character from the string, preserving the character order. Below is an example:
>>> my_str = 'CP104'
>>> my_list = list(my_str)
>>> my_list
['C', 'P', '1', '0', '4']
>>> my_str = ''
>>> print(list(my_str))
[]
Unfortunately, using the reverse casting, i.e. str(<list_var>), does not produce the desired output. The casting results in a string that looks like the list as it appears when printed to the console, as in the following:
>>> my_list = ['a','b','c']
>>> my_str = str(my_list)
>>> my_str
"['a', 'b', 'c']"
To properly convert a list of characters into a string we need to use another method called join. The join method takes the elements of the list, assuming they are strings, and joins them into one string. If the elements are non-strings, the method raises a TypeError.
>>> my_list = ['a','b','c']
>>> my_str = ''.join(my_list)
>>> my_str
'abc'
>>> my_list = [1,2,3]
>>> my_str = ''.join(my_list)
Traceback (most recent call last):
  File "<pyshell#224>", line 1, in <module>
    my_str = ''.join(my_list)
TypeError: sequence item 0: expected str instance, int found
Notice how an empty string is used to construct the join, i.e. ''.join(my_list), which means the list elements are joined with an empty string between them. This suggests that the join method is not limited to list-to-string conversion and could be used as a general method for joining strings. This is true, but more accurately, the join method uses a string to join the items of any iterable object. Formally, the syntax of the join method is as follows:
<str var>.join(<iterable>)
The iterable could be a list, tuple, dictionary or another string. However, all elements should be strings (for a dictionary, the keys are the elements that get joined). The method takes every item in the iterable object and places the given string between consecutive items. The result is the concatenation of all items of the iterable with the separators in between. Here is an example:
>>> str1 = '-'
>>> list1 = ['a','b','c']
>>> print(str1.join(list1))
a-b-c
>>> str1 = ':'
>>> str2 = 'Books'
>>> print(str1.join(str2))
B:o:o:k:s
>>> str1 = '##'
>>> tuple1 = ('A','B')
>>> print(str1.join(tuple1))
A##B
>>> print(''.join(['O','n','t','a','r','i','o']))
Ontario
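When the items are not strings, as in the earlier list of integers, one workaround is to cast each item before joining. The following is a minimal sketch with illustrative variable names.

```python
numbers = [1, 2, 3]

# Build a list of strings first, since join only accepts string items.
as_text = []
for n in numbers:
    as_text.append(str(n))

print('-'.join(as_text))
print(''.join(as_text))
```

The same idea works for floats or any other type that str can convert.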
Technical Note:
Every object in Python has a built-in method called __str__(). This method converts the object into a string. For instance, try:
x = 3
print(x.__str__())
When performing string casting, Python invokes this method. In CP164, you will be using this method to convert your customized objects into strings.
In this section, we will provide two examples of string manipulation. In the developed solutions, we will use some of the built-in string methods, but also write our own functions. As a reminder, string manipulation refers to any operation we do with strings, which may be analysis, editing, or formatting. There is no way to cover all string manipulation scenarios. Therefore, you will not be able to directly use what we develop here in your solutions. Instead, you will need to use problem solving skills to customize the solution to meet your objectives.
Example 9.5.B:
Implement the function format_cities(cities).
The function receives a string composed of city names separated by a space character. Assume all cities are single word.
The function formats the cities by capitalizing the first character of each city name and converting every lower case 'e' into upper case.
The function returns a string similar to the input but with the above formatting applied.
We will start by providing the function prototype and setting the return value.
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    cities_formatted = ''
    return cities_formatted
Since we will be applying formatting steps to each city, it will be better to separate the cities and handle each one at a time. We can do this through the split method which converts the input string into a list of cities.
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    cities_formatted = ''
    # create a list of cities
    cities_list = cities.split(' ')
    return cities_formatted
We will also create a temporary list to store the formatted cities. So, the program will maintain two lists, one with non-formatted cities and one with formatted cities. At the start, the non-formatted list is full and the formatted list is empty.
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    cities_formatted = ''
    # create a list of cities
    cities_list = cities.split(' ')
    # create a temporary list to store formatted cities
    formatted_list = []
    return cities_formatted
Since we have a list, we can loop through the cities and capitalize one city at a time, adding it to the formatted list at the end of each iteration.
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    cities_formatted = ''
    # create a list of cities
    cities_list = cities.split(' ')
    # create a temporary list to store formatted cities
    formatted_list = []
    for city in cities_list:
        # capitalize first character
        city = city.capitalize()
        # add formatted city to formatted list
        formatted_list.append(city)
    return cities_formatted
Before we apply the second formatting step, it would be better to test the first one. To do that, we need to convert the list back to a string. This can be done using the join method, as follows:
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    # create a list of cities
    cities_list = cities.split(' ')
    # create a temporary list to store formatted cities
    formatted_list = []
    for city in cities_list:
        # capitalize first character
        city = city.capitalize()
        # add formatted city to formatted list
        formatted_list.append(city)
    # convert list to a string
    cities_formatted = ' '.join(formatted_list)
    return cities_formatted
For testing the function, you can use the following code:
cities = "baltimore milwaukee boston"
print('cities before formatting:')
print(cities)
print('cities after formatting:')
cities_formatted = format_cities(cities)
print(cities_formatted)
print()
cities = "chicago charlotte seattle"
print('cities before formatting:')
print(cities)
print('cities after formatting:')
cities_formatted = format_cities(cities)
print(cities_formatted)
Finally, we can add code to convert each lower case 'e' to upper case. To do that, we need to find the position of the character 'e' within the city name. Then we need to reconstruct the string with the new formatting. We could use a built-in method called replace, which would replace every instance of 'e' with 'E'. However, for learning purposes, we will show how to do that manually.
The final version of the code becomes:
# Program Name: Prog 9-03
# Solution to Example 9.5.B
def format_cities(cities):
    # create a list of cities
    cities_list = cities.split(' ')
    # create a temporary list to store formatted cities
    formatted_list = []
    for city in cities_list:
        # capitalize first character
        city = city.capitalize()
        # capitalize every e
        while city.find('e') != -1:
            pos = city.index('e')
            city = city[:pos] + 'E' + city[pos+1:]
        # add formatted city to formatted list
        formatted_list.append(city)
    # convert list to a string
    cities_formatted = ' '.join(formatted_list)
    return cities_formatted
If you run the above code with the testing code, you should get the following output:
cities before formatting:
baltimore milwaukee boston
cities after formatting:
BaltimorE MilwaukEE Boston
cities before formatting:
chicago charlotte seattle
cities after formatting:
Chicago CharlottE SEattlE
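For comparison, the replace method mentioned earlier allows a shorter variant of the same function. This is only a sketch of an alternative, not the lesson's solution, and the name format_cities_replace is hypothetical.

```python
def format_cities_replace(cities):
    """Variant of format_cities that relies on the built-in replace method."""
    formatted_list = []
    for city in cities.split(' '):
        # capitalize() upper-cases the first character and lower-cases the rest,
        # then replace() swaps every lower case 'e' for an upper case 'E'.
        formatted_list.append(city.capitalize().replace('e', 'E'))
    # convert list to a string
    return ' '.join(formatted_list)


print(format_cities_replace('baltimore milwaukee boston'))
print(format_cities_replace('chicago charlotte seattle'))
```

Both versions should produce the same output for the test strings. The manual version is useful for practice, while replace is what most programs would use.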
In Lesson 9, you learned about Python strings. You should now be able to create strings using simple initialization or using concatenation. You are now familiar with the built-in string methods. These methods offer a variety of options, from analyzing strings to editing and formatting them. You should also be able to write commands to convert from basic data types to strings and vice versa. You studied some examples of how you can create your own functions to manipulate strings. These functions use a combination of the built-in methods along with your own code.
Overall, string manipulation is not a difficult topic in itself. You just need to be familiar with the available tools and know how to properly use them in your solution.
What you learned in this lesson will be useful to the next lesson when you learn about files. Data stored in files appear in the form of strings. Therefore, you will be applying concepts learned here to process data stored in files.