CP367: Lab 05 - Winter 2026 - Text Processing II

Due 11:59 PM, Friday, February 13, 2026

Regular Expressions

A regular expression (or regex) is a sequence of characters that specifies a string pattern. This pattern formally describes a set of strings. (We've already seen a brief discussion of these patterns when using grep in the Text Processing lab.) For example, the regular expression pattern "r(a|i|u)ng" describes the set containing the three strings "rang", "ring", and "rung". More practically, regular expression patterns are used to process text: to find matching strings or to extract portions of strings that fit a pattern. They are particularly useful with tools such as grep and sed. Be warned: there are different standards of regular expressions. Emacs, for example, has its own regular expression syntax.

Metacharacters are special characters used in regular expressions for various purposes.

Metacharacters
Metacharacter  Description                                        Example
?              Matches 0 or 1 of the preceding element            hoa?rse matches horse and hoarse
*              Matches 0 or more of the preceding element         Khaa*n! matches Khan! and Khaaaaaaaan!
+              Matches 1 or more of the preceding element         Kha+n! matches Khan! and Khaaaaaaaan!
|              Separates alternatives                             string|strong matches string and strong
(…)            Groups elements                                    str(i|o)ng matches string and strong
^…             Matches the beginning of a line                    ^H matches Hello and Hi!
…$             Matches the end of a line                          go$ matches no go and way to go
.              Matches any single character                       ^.$ matches any line with a single character
[…]            Matches any one character in the set or range      ^[1-9][0-9]*$ matches a line containing a single positive integer with no leading zero
[^…]           Matches any one character not in the set or range  [^0-9] matches any single character that is not a digit
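
The patterns in the table can be tried out directly with grep. The sketch below is only an illustration: the word list is made up, and it uses grep -E (extended regular expressions) because in basic regular expressions the characters ?, +, |, and the parentheses are treated as ordinary characters unless escaped.

```shell
# A made-up word list to test the patterns against.
words=$(printf '%s\n' horse hoarse house string strong strung 42 007 x)

# ? : zero or one of the preceding element
optional=$(printf '%s\n' "$words" | grep -E '^hoa?rse$')

# (...) and | : grouping and alternatives
alternatives=$(printf '%s\n' "$words" | grep -E '^str(i|o)ng$')

# [...] with * : a positive integer with no leading zero
integers=$(printf '%s\n' "$words" | grep -E '^[1-9][0-9]*$')

# ^.$ : lines that are exactly one character long
single=$(printf '%s\n' "$words" | grep -E '^.$')

printf '%s\n' "$optional" "$alternatives" "$integers" "$single"
```

Note that 007 is not matched by the integer pattern, because [1-9] requires the first digit to be non-zero.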

Using sed

sed (Stream EDitor) performs basic transformations on text read from a file or a pipe. The result is sent to standard output. The syntax for the sed command has no output file specification, but results can be saved to a file using output redirection. The editor does not modify the original input.

What distinguishes sed from other editors, such as vi, is its ability to filter text that it receives from a pipeline. You do not need to interact with the editor while it is running; that is why sed is sometimes called a batch editor. This feature allows editing commands to be placed in scripts, greatly easing repetitive editing tasks. When text must be replaced in a large number of files, sed is a great help.
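
As a minimal sketch of this batch behaviour, the following feeds three made-up lines to sed through a pipe. No file and no interaction are involved; the input text and the variable name are only for illustration.

```shell
# sed reads the piped text, applies the edit to each line, and writes the result.
result=$(printf 'one\ntwo\nthree\n' | sed 's/two/2/')

printf '%s\n' "$result"
```

The same command could sit in the middle of a longer pipeline, with its output passed on to another tool.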

Commands

sed has simple editing commands:

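sed Commands
Command  Result
a        Append text below the current line.
c        Replace the current line with new text.
d        Delete the current line.
i        Insert text above the current line.
p        Print the current line.
r        Read a file and add its contents to the output.
s        Search and replace text.
g        Copy text from sed's hold space over the current line (G appends it instead).
w        Write the current line to a file.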

Options

sed has a number of options:

sed Options
Option          Effect
-e SCRIPT       Add the commands in SCRIPT to the set of commands to be run while processing the input.
-f SCRIPT-FILE  Add the commands contained in the file SCRIPT-FILE to the set of commands to be run while processing the input.
-n              Suppress default output (silent mode).

The following examples use the file example.txt:

example.txt
This is the first line of an example text.
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.
This is a line not containing any errors.
This is the last line.

Search and Replace

We can use grep to do a simple search through this file to look for the misspelling of errors:

grep
/home/dbrown> grep erors example.txt
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.

However, grep does not allow us to change the contents of the file. sed does. First, we can look for the strings as we did with grep. sed requires its regular expressions to be placed between forward slashes:

sed -n
/home/dbrown> sed -n '/erors/p' example.txt
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.

The -n option suppresses sed's automatic printing of every input line, and the p command prints only the lines that match the regex pattern. To search for and replace the bad string, use the s command. The g flag at the end of the s command tells sed to do a global replacement; otherwise only the first occurrence of erors on each line is replaced.

sed Search and Replace
/home/dbrown> sed 's/erors/errors/g' example.txt
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these errors are making me sick.
This is a line not containing any errors.
This is the last line.
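
The effect of -n and of the g flag is easiest to see by running the same command with and without each one. This is a small sketch on three lines borrowed from example.txt; the variable names are arbitrary.

```shell
# Three lines borrowed from example.txt; two of them contain "erors".
sample=$(printf '%s\n' 'Lots of erors.' \
  'So much erors, all these erors are making me sick.' \
  'This is the last line.')

# With -n, only the lines printed by p appear (2 lines).
count_with_n=$(printf '%s\n' "$sample" | sed -n '/erors/p' | grep -c '')

# Without -n, every line is printed automatically and p prints the
# matching lines a second time (3 + 2 = 5 lines).
count_without_n=$(printf '%s\n' "$sample" | sed '/erors/p' | grep -c '')

# Without g, only the first match on a line is replaced.
line='So much erors, all these erors are making me sick.'
first_only=$(printf '%s\n' "$line" | sed 's/erors/errors/')
all_matches=$(printf '%s\n' "$line" | sed 's/erors/errors/g')

printf '%s\n' "$count_with_n" "$count_without_n" "$first_only" "$all_matches"
```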

This has not corrected the contents of example.txt! sed writes its output to standard output, which is normally the console. To save the corrections in a file, redirect the command output to a new file:

sed Search and Replace Redirect
/home/dbrown> sed 's/erors/errors/g' example.txt > good.txt
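
The output must go to a different file from the input: redirecting straight back to example.txt would make the shell empty that file before sed reads it. One common way to update the original is to write to a temporary file and then rename it, as in this sketch. The temporary directory and the name example.tmp are assumptions made for the illustration.

```shell
# Work in a scratch directory so that no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'It is a text with erors.' 'Lots of erors.' > "$workdir/example.txt"

# Write the corrected text to a new file, then rename it over the original.
# The mv only runs if sed succeeded.
sed 's/erors/errors/g' "$workdir/example.txt" > "$workdir/example.tmp" &&
  mv "$workdir/example.tmp" "$workdir/example.txt"

fixed=$(cat "$workdir/example.txt")
printf '%s\n' "$fixed"

rm -rf "$workdir"
```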

Multiple search-and-replace commands can be given in one call, each with its own -e option:

sed Multiple Search and Replace
/home/dbrown> sed -e 's/erors/errors/g' -e 's/last/final/g' example.txt
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these errors are making me sick.
This is a line not containing any errors.
This is the final line.
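
The commands are applied to each line in the order given, so a later command sees the result of an earlier one. The sketch below uses a made-up one-word input to show that the order matters, and that simple commands such as s can also be separated by semicolons inside a single script.

```shell
# The first command turns cat into dog; the second then turns dog into bird.
chained=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# In the opposite order, the dog-to-bird command runs before any dog exists.
reversed=$(printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/')

# The same two commands in one script, separated by a semicolon.
semicolons=$(printf 'cat\n' | sed 's/cat/dog/; s/dog/bird/')

printf '%s\n' "$chained" "$reversed" "$semicolons"
```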

Deletion

sed can delete lines that match a pattern with the d command:

sed Deletion
/home/dbrown> sed '/erors/d' example.txt
This is the first line of an example text.
This is a line not containing any errors.
This is the last line.
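
Two variations on pattern-based deletion are often handy. This sketch, on made-up input, removes empty lines with the pattern ^$ and uses ! after the pattern to invert the match, so that every line that does not contain erors is deleted.

```shell
# Made-up input with two empty lines.
text=$(printf '%s\n' 'first' '' 'Lots of erors.' '' 'last')

# ^$ matches a line with nothing between its beginning and its end.
no_blank=$(printf '%s\n' "$text" | sed '/^$/d')

# ! inverts the address: delete the lines that do NOT match.
only_erors=$(printf '%s\n' "$text" | sed '/erors/!d')

printf '%s\n' "$no_blank" "$only_erors"
```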

Ranges

sed also works with ranges of lines. The following example deletes lines 2 through 4:

sed With Range
/home/dbrown> sed '2,4d' example.txt
This is the first line of an example text.
This is a line not containing any errors.
This is the last line.
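
A range does not have to consist of two line numbers. In the sketch below, on a made-up five-line input, $ stands for the last line, and a range can also start and end at lines matching patterns.

```shell
lines=$(printf '%s\n' one two three four five)

# Delete lines 2 through 4.
middle_removed=$(printf '%s\n' "$lines" | sed '2,4d')

# Delete from line 3 to the last line ($).
to_end=$(printf '%s\n' "$lines" | sed '3,$d')

# Print from the first line matching "two" through the next line matching "four".
by_pattern=$(printf '%s\n' "$lines" | sed -n '/two/,/four/p')

printf '%s\n' "$middle_removed" "$to_end" "$by_pattern"
```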

Metacharacters

Regex metacharacters can be used in sed patterns. For example, substituting at ^ (the beginning of the line) inserts a string at the start of every line in the file:

sed With Metacharacters
/home/dbrown> sed 's/^/> /' example.txt
> This is the first line of an example text.
> It is a text with erors.
> Lots of erors.
> So much erors, all these erors are making me sick.
> This is a line not containing any errors.
> This is the last line.
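
The end-of-line anchor works the same way. As a small sketch on made-up input, the first command adds text at the end of a line, and the second combines ^ and $ with * to strip leading and trailing spaces.

```shell
# $ matches the (empty) end of the line, so the replacement is added there.
exclaimed=$(printf 'hello\n' | sed 's/$/!/')

# ' *' matches zero or more spaces; anchor it at each end of the line.
trimmed=$(printf '   hello   \n' | sed 's/^ *//; s/ *$//')

printf '%s\n' "$exclaimed" "$trimmed"
```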

Scripts

Complicated sed commands can be put into scripts and executed by sed with the -f option. Multiple scripts may be called from the command line. Here are two different sed scripts:

fix.sed contains the replacement command used above, in a separate script file:

sed Simple Command File
s/erors/errors/g

wraphtml.sed contains commands to wrap a simple HTML page around text:

sed Commands File
1i\
<html>\
<head><title>sed generated html</title></head>\
<body>\
<pre>
$a\
</pre>\
</body>\
</html>

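wraphtml.sed contains commands in a form we haven't seen yet: 1i\ inserts the text that follows before line 1, and $a\ appends the text that follows after the last line ($ addresses the last line of input). A backslash at the end of a text line continues the text onto the next line.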

These scripts can now be called from the command line. The scripts are applied in the order they appear on the command line:

sed Execute Commands File
/home/dbrown> sed -f fix.sed -f wraphtml.sed example.txt
<html>
<head><title>sed generated html</title></head>
<body>
<pre>
This is the first line of an example text.
It is a text with errors.
Lots of errors.
So much errors, all these errors are making me sick.
This is a line not containing any errors.
This is the last line.
</pre>
</body>
</html>
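
To see the i and a commands in isolation, the sketch below writes a tiny script file and applies it to two lines of made-up input. The file name banner.sed and the marker text are hypothetical, chosen only for this illustration.

```shell
workdir=$(mktemp -d)

# banner.sed (hypothetical): one line of text before line 1,
# and one line of text after the last line.
cat > "$workdir/banner.sed" <<'EOF'
1i\
--- begin ---
$a\
--- end ---
EOF

framed=$(printf 'alpha\nbeta\n' | sed -f "$workdir/banner.sed")
printf '%s\n' "$framed"

rm -rf "$workdir"
```

The here-document delimiter is quoted ('EOF') so that the shell leaves the backslashes and the $ in the script untouched.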

Reusing a Matched String

sed allows you to reuse a matched string. This is particularly useful when you want to add data to a string rather than just replace it. The ampersand (&) character in the replacement stands for the matched text. The following example fixes the spelling of 'errors', then wraps asterisks around the word 'errors':

sed Reuse
/home/dbrown> sed -e 's/erors/errors/g' -e 's/errors/*&*/g' example.txt
This is the first line of an example text.
It is a text with *errors*.
Lots of *errors*.
So much *errors*, all these *errors* are making me sick.
This is a line not containing any *errors*.
This is the last line.
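
The ampersand is most useful when the matched text is not known in advance. In this sketch, on made-up input, every run of digits is wrapped in square brackets. To put a literal ampersand into the replacement, escape it as \&.

```shell
# & stands for whatever the pattern matched: here, each run of digits.
bracketed=$(printf 'Room 12, floor 3\n' | sed 's/[0-9][0-9]*/[&]/g')

# \& is a literal ampersand rather than the matched text.
literal=$(printf 'salt and pepper\n' | sed 's/and/\&/')

printf '%s\n' "$bracketed" "$literal"
```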

The sed stream editor is a powerful command line tool which can handle streams of data: it can take input from a pipe. This makes it fit for non-interactive use. The sed editor uses vi-like commands and accepts regular expressions.

The sed tool can read commands from the command line or from a script. It is often used to perform find-and-replace actions on lines containing a pattern.

  1. Create a new script file p.sed that wraps <p>…</p> tags around every line in a file.


  2. Create a new script file newwrap.sed that wraps the body of an HTML document around an entire file - i.e. it should do the same as wraphtml.sed, but without the <pre>…</pre> tags.


  3. Combine the previous two tasks to produce the following output:

    <html>
    <head><title>sed generated html</title></head>
    <body>
    <p>This is the first line of an example text.</p>
    <p>It is a text with erors.</p>
    <p>Lots of erors.</p>
    <p>So much erors, all these erors are making me sick.</p>
    <p>This is a line not containing any errors.</p>
    <p>This is the last line.</p>
    </body>
    </html>