A regular expression (or regex) is a sequence of characters that specifies a string pattern. This pattern formally describes a set of strings. (We've already seen a brief discussion of these patterns when using grep in the Text Processing lab.) For example, the regular expression pattern "r(a|i|u)ng" describes the set containing the three strings "rang", "ring", "rung". More prosaically, regular expression patterns are used to process text: to find matching strings or extract portions of strings that fit a pattern. They are particularly useful with tools such as grep and sed. Be warned: there are different standards of regular expressions. grep and sed use basic regular expressions by default, while several of the metacharacters below (?, +, |, and parentheses for grouping) belong to extended regular expressions, which grep enables with the -E option. Emacs, for example, has its own regular expression syntax.
Metacharacters are special characters used in regular expressions for various purposes.
| Metacharacter | Description | Example |
|---|---|---|
| ? | Matches 0 or 1 of the preceding element | hoa?rse matches horse and hoarse |
| * | Matches 0 or more of the preceding element | Khaa*n! matches Khan! and Khaaaaaaaan! |
| + | Matches 1 or more of the preceding element | Kha+n! matches Khan! and Khaaaaaaaan! |
| \| | Separates alternatives | string\|strong matches string and strong |
| (…) | Groups elements | str(i\|o)ng matches string and strong |
| ^… | Matches the beginning of a line | ^H matches Hello and Hi! |
| …$ | Matches the end of a line | go$ matches no go and way to go |
| . | Matches any single character | ^.$ matches any line with a single character |
| […] | Matches any one character from a set or range | ^[1-9][0-9]*$ matches a line holding a single positive integer with no leading zero |
| [^…] | Matches any one character not in the set or range | [^0-9] matches any single character that is not a digit |
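The metacharacters in the table can be tried out with grep. The following is a minimal sketch, assuming a grep that supports the -E option (extended regular expressions); the word list and the variable names are invented for illustration and are not part of the lab files.

```shell
# Try a few metacharacters with grep -E (extended regular expressions).
# The word list is a made-up sample, not one of the lab files.
words=$(printf '%s\n' rang ring rung wrong horse hoarse 42 007)

# Alternation inside a group, anchored to the whole line
group_matches=$(printf '%s\n' "$words" | grep -E '^r(a|i|u)ng$')

# Optional element: the "a" may appear zero or one time
optional_matches=$(printf '%s\n' "$words" | grep -E '^hoa?rse$')

# Character ranges: an integer with no leading zero
integer_matches=$(printf '%s\n' "$words" | grep -E '^[1-9][0-9]*$')

printf '%s\n' "$group_matches" "$optional_matches" "$integer_matches"
```

Anchoring with ^ and $ matters here: without them, a pattern only has to match somewhere inside the line.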
sed (Stream EDitor) performs basic transformations on text read from a file or a pipe. The result is sent to standard output. The syntax for the sed command has no output file specification, but results can be saved to a file using output redirection. The editor does not modify the original input.
What distinguishes sed from other editors, such as vi, is its ability to filter text that it gets from a pipeline feed. You do not need to interact with the editor while it is running; that is why sed is sometimes called a batch editor. This feature allows editing commands in scripts, greatly easing repetitive editing tasks. When facing replacement of text in a large number of files, sed is a great help.
sed has simple editing commands:
| Command | Result |
|---|---|
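| a | Append text below the current line. |
| c | Replace the current line with new text. |
| d | Delete the current line. |
| i | Insert text above the current line. |
| p | Print the current line. |
| r | Read a file and append its contents after the current line. |
| s | Search and replace text. |
| g | Copy sed's hold buffer (the hold space) over the current line; G appends it instead. Not to be confused with the g flag of the s command. |
| w | Write the current line to a file. |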
sed has a number of options:
| Option | Effect |
|---|---|
| -e SCRIPT | Add the commands in SCRIPT to the set of commands to be run while processing the input. |
| -f SCRIPT-FILE | Add the commands contained in the file SCRIPT-FILE to the set of commands to be run while processing the input. |
| -n | Suppresses default output (silent mode). |
The following examples use the file example.txt:
This is the first line of an example text.
It is a text with erors.
Lots of erors.
So much erors, all these erors are making me sick.
This is a line not containing any errors.
This is the last line.
We can use grep to do a simple search through this file to look for the misspelling of errors:
grep

    /home/dbrown> grep erors example.txt
    It is a text with erors.
    Lots of erors.
    So much erors, all these erors are making me sick.
However, grep does not allow us to change the contents of the file. sed does. First, we can look for the strings as we did with grep. sed requires its regular expressions to be placed between forward slashes:
sed -n

    /home/dbrown> sed -n '/erors/p' example.txt
    It is a text with erors.
    Lots of erors.
    So much erors, all these erors are making me sick.
The -n option suppresses sed's automatic printing of every input line, and the p command prints only the lines that fit the regex pattern. To search and replace the bad string, use the s command. The g flag at the end of the s command tells sed to do a global replacement; otherwise only the first occurrence of erors on each line is replaced.
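The effect of the g flag is easiest to see on a line that contains the misspelling twice. The sketch below feeds one line from example.txt to sed through a pipe; the variable names are only for illustration.

```shell
# One line from example.txt that contains the misspelling twice.
line='So much erors, all these erors are making me sick.'

# Without g: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/erors/errors/')

# With g: every match on the line is replaced
all_matches=$(printf '%s\n' "$line" | sed 's/erors/errors/g')

printf '%s\n' "$first_only" "$all_matches"
```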
sed Search and Replace

    /home/dbrown> sed 's/erors/errors/g' example.txt
    This is the first line of an example text.
    It is a text with errors.
    Lots of errors.
    So much errors, all these errors are making me sick.
    This is a line not containing any errors.
    This is the last line.
This has not corrected the contents of example.txt! sed writes its output to standard output. To save the corrections in a file, redirect the command output to a new file:
sed Search and Replace Redirect

    /home/dbrown> sed 's/erors/errors/g' example.txt > good.txt
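Redirecting the output back onto the input file itself (sed … example.txt > example.txt) would empty the file, because the shell truncates the output file before sed starts reading it. One common workaround, sketched here under the assumption that a scratch directory and a temporary file name (example.txt.tmp, a name chosen for this example) are acceptable, is to write to a temporary file and rename it afterwards:

```shell
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d)
printf '%s\n' 'It is a text with erors.' 'Lots of erors.' > "$workdir/example.txt"

# Write the corrected text to a temporary file first, then replace the
# original only if sed succeeded.
sed 's/erors/errors/g' "$workdir/example.txt" > "$workdir/example.txt.tmp" &&
    mv "$workdir/example.txt.tmp" "$workdir/example.txt"

corrected=$(cat "$workdir/example.txt")
printf '%s\n' "$corrected"
```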
Multiple find and replace commands are separated with individual -e options:
sed Multiple Search and Replace

    /home/dbrown> sed -e 's/erors/errors/g' -e 's/last/final/g' example.txt
    This is the first line of an example text.
    It is a text with errors.
    Lots of errors.
    So much errors, all these errors are making me sick.
    This is a line not containing any errors.
    This is the final line.
sed can delete lines that match a pattern with the d command:
sed Deletion

    /home/dbrown> sed '/erors/d' example.txt
    This is the first line of an example text.
    This is a line not containing any errors.
    This is the last line.
sed also works with ranges of lines. The following example deletes lines 2 through 4:
sed With Range

    /home/dbrown> sed '2,4d' example.txt
    This is the first line of an example text.
    This is a line not containing any errors.
    This is the last line.
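Ranges are not limited to the d command or to line numbers. The sketch below uses a made-up six-line sample (not example.txt) to show a range combined with -n and p, a range bounded by two patterns, and $ as the address of the last line:

```shell
# A six-line sample; the contents are illustrative only.
sample=$(printf '%s\n' one two three four five six)

# Print only lines 2 through 4: -n turns off automatic printing,
# p prints the lines in the range
middle=$(printf '%s\n' "$sample" | sed -n '2,4p')

# A range may be bounded by patterns: delete from the first line
# matching "two" through the next line matching "four"
outside=$(printf '%s\n' "$sample" | sed '/two/,/four/d')

# $ stands for the last line, so this deletes line 5 through the end
head_part=$(printf '%s\n' "$sample" | sed '5,$d')

printf '%s\n' "$middle" "$outside" "$head_part"
```

The single quotes around each sed script keep the shell from expanding $d as a variable.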
Regex metacharacters can be used in sed commands. For example, ^ matches the beginning of a line, so substituting for it inserts a string at the beginning of each line in the file:
sed With Metacharacters

    /home/dbrown> sed 's/^/> /' example.txt
    > This is the first line of an example text.
    > It is a text with erors.
    > Lots of erors.
    > So much erors, all these erors are making me sick.
    > This is a line not containing any errors.
    > This is the last line.
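The other anchors work the same way. As a small sketch on invented input lines: $ can be used to add text at the end of each line or to strip trailing spaces, and ^$ matches an empty line.

```shell
# $ matches the end of a line: append a semicolon to every line
suffixed=$(printf '%s\n' 'alpha' 'beta' | sed 's/$/;/')

# Zero or more spaces followed by end of line: strip trailing spaces
trimmed=$(printf '%s\n' 'alpha  ' 'beta' | sed 's/ *$//')

# ^$ matches a line with nothing between its start and end: delete empty lines
nonblank=$(printf '%s\n' 'alpha' '' 'beta' | sed '/^$/d')

printf '%s\n' "$suffixed" "$trimmed" "$nonblank"
```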
Complicated sed commands can be put into scripts and executed by sed with the -f option. Multiple scripts may be called from the command line. Here are two different sed scripts:
fix.sed contains the replacement command above in a separate script file:
sed Simple Command File

    s/erors/errors/g
wraphtml.sed contains commands to wrap a simple HTML page around text:
sed Commands File

    1i\
    <html>\
    <head><title>sed generated html</title></head>\
    <body>\
    <pre>
    $a\
    </pre>\
    </body>\
    </html>
wraphtml.sed contains commands we haven't seen yet:
The 1i (insert) command tells sed to insert the following lines before the first line. If the 1 is left off, the text is inserted before every line.
The $a (append) command tells sed to append the following lines after the last line of the text to be wrapped.
Backslashes (\) allow sed commands to spread across multiple lines. The insert and append commands each use multiple lines. (Do not use a backslash on the last line.)
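The addresses in front of i and a can be tried on a smaller scale. The sketch below is an illustration only: the file names body.txt and wrap.sed, the two-line input, and the inserted text are all invented for this example. It wraps a two-line file in a pair of tags, then shows what happens when the address is left off.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'first' 'last' > "$workdir/body.txt"

# Quoting the here-document delimiter ('EOF') keeps the backslashes
# and the $ literal when the script file is written.
cat > "$workdir/wrap.sed" <<'EOF'
1i\
<ul>
$a\
</ul>
EOF

# 1i inserts before line 1 only; $a appends after the last line only
wrapped=$(sed -f "$workdir/wrap.sed" "$workdir/body.txt")

# With no address, i inserts its text before every line
every=$(sed 'i\
====' "$workdir/body.txt")

printf '%s\n' "$wrapped" "$every"
```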
These scripts can now be called from the command line. The scripts are applied in the order they appear on the command line:
sed Execute Commands File

    /home/dbrown> sed -f fix.sed -f wraphtml.sed example.txt
    <html>
    <head><title>sed generated html</title></head>
    <body>
    <pre>
    This is the first line of an example text.
    It is a text with errors.
    Lots of errors.
    So much errors, all these errors are making me sick.
    This is a line not containing any errors.
    This is the last line.
    </pre>
    </body>
    </html>
sed allows you to reuse a matched string. This is particularly useful when you want to add data to a string rather than just replace it. The ampersand (&) character in the replacement corresponds to the match. The following example fixes the spelling of 'errors', then wraps asterisks around the word 'errors':
sed Reuse

    /home/dbrown> sed -e 's/erors/errors/g' -e 's/errors/*&*/g' example.txt
    This is the first line of an example text.
    It is a text with *errors*.
    Lots of *errors*.
    So much *errors*, all these *errors* are making me sick.
    This is a line not containing any *errors*.
    This is the last line.
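The ampersand is most useful when the pattern can match different strings, since & then stands for whatever was actually matched. A small sketch on invented input follows; it also shows that a backslash in front of the ampersand produces a literal & in the output.

```shell
line='Call 555 or 7777 for details.'

# & stands for whatever the pattern matched, here each run of digits
bracketed=$(printf '%s\n' "$line" | sed 's/[0-9][0-9]*/[&]/g')

# \& in the replacement is a literal ampersand, not the match
literal=$(printf '%s\n' 'salt and pepper' | sed 's/and/\&/')

printf '%s\n' "$bracketed" "$literal"
```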
The sed stream editor is a powerful command line tool which can handle streams of data: it can take input from a pipe. This makes it fit for non-interactive use. The sed editor uses vi-like commands and accepts regular expressions.
The sed tool can read commands from the command line or from a script. It is often used to perform find-and-replace actions on lines containing a pattern.
Create a new script file p.sed that wraps <p>…</p> tags around every line in a file.
Create a new script file newwrap.sed that wraps the body of an HTML document around an entire file, i.e. it should do the same as wraphtml.sed, but without the <pre>…</pre> tags.
Combine the previous two tasks to produce the following output:
<html>
<head><title>sed generated html</title></head>
<body>
<p>This is the first line of an example text.</p>
<p>It is a text with erors.</p>
<p>Lots of erors.</p>
<p>So much erors, all these erors are making me sick.</p>
<p>This is a line not containing any errors.</p>
<p>This is the last line.</p>
</body>
</html>