CP367: Lab 04 - Winter 2026 - Text Processing I

Due 11:59 PM, Friday, February 6, 2026

Finding and Filtering Text

grep options pattern files : Global Regular Expression Print

grep examines the contents of one or more files and prints every line that contains pattern. For example:

grep
/home/dbrown/CP367> grep printf heat.c
  printf("How tall is the tree (ft): ");
  printf("Radius of the tree (ft): ");
  printf("Wood to heat house for one day (ft^3): ");
  printf("Volume: %d ft^3\n", volume);
  printf("Days: %d\n", days);

The command grep printf heat.c looks through the file heat.c for the string printf, and prints every line that contains that string.

grep has a number of options, the most useful of which are:

grep -n
/home/dbrown/CP367> grep -n printf *.c
ctest.c:11:        printf("%c\n", letter);
heat.c:18:  printf("How tall is the tree (ft): ");
heat.c:20:  printf("Radius of the tree (ft): ");
heat.c:22:  printf("Wood to heat house for one day (ft^3): ");
heat.c:27:  printf("Volume: %d ft^3\n", volume);
heat.c:28:  printf("Days: %d\n", days);

The command grep -n printf *.c looks through all the files whose names end in .c for the string printf, and prints the file name, line number, and contents of every line that contains that string.
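A few other grep options can be tried on a small scratch file. The sketch below is only an illustration: the file notes.txt and its contents are made up, and the options shown (-n, -i, -c, -v) are standard ones that may or may not match the list this lab originally had in mind.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# notes.txt is a made-up sample file for this illustration.
printf 'Error: disk full\nwarning: low memory\nerror: file missing\nall good\n' > notes.txt

grep -n error notes.txt    # -n: prefix each matching line with its line number
grep -i error notes.txt    # -i: ignore case, so "Error" matches as well
grep -c error notes.txt    # -c: print only the count of matching lines
grep -v error notes.txt    # -v: print the lines that do NOT match

# Options can be combined: count matches while ignoring case.
count=$(grep -ci error notes.txt)
echo "$count"
```

Because grep is case-sensitive by default, the plain pattern error matches only the third line of the sample file, while -i also picks up the first.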

grep may use various regular expressions in its patterns:

More grep
/home/dbrown/CP367> grep -n {$ *.c
ctest.c:8:int main() {
ctest.c:10:    for (letter = 'A'; letter <= 'z'; letter++) {
heat.c:10:int main() {

The command grep -n {$ *.c looks through all the files whose names end in .c for lines that end with the character {, and prints the file name, line number, and contents of each such line. The $ in the pattern anchors the match to the end of the line.
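The anchors and wildcards used in grep patterns are easier to see on a file whose contents are known. This is a minimal sketch, assuming a made-up file demo.c; the patterns are wrapped in single quotes so the shell passes them to grep unchanged.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.c is a made-up sample file for this illustration.
cat > demo.c <<'EOF'
int main() {
  int x = 1; /* { in a comment */
  return x;
}
EOF

grep -n '{$' demo.c      # $ anchors the end: lines that END with {
grep -n '^}' demo.c      # ^ anchors the start: lines that BEGIN with }
grep -n 'x.*;' demo.c    # . is any character, * repeats it: x ... ;

# Only the first line ends with {, even though line 2 contains one.
ends_with_brace=$(grep -c '{$' demo.c)
echo "$ends_with_brace"
```

The second line of demo.c contains a { but does not end with one, so the anchored pattern skips it.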


Pipes

The pipe ( | ) operator allows data to pass from one process to another. It is useful for combining system utilities to create more complex functions than one utility could accomplish alone. The pipe operator passes the standard output of one process to the standard input of the next, and pipes can be chained through a third, a fourth, or more processes.

grep with pipe
/home/dbrown/CP367> grep printf *.c | grep -v %
heat.c:  printf("How tall is the tree (ft): ");
heat.c:  printf("Radius of the tree (ft): ");
heat.c:  printf("Wood to heat house for one day (ft^3): ");

The command grep printf *.c | grep -v % looks through all the files whose names end in .c for lines that contain the string printf, and then pipes those lines through a second grep that keeps only the lines that do not contain the character %.

ls and grep with pipe
/home/dbrown/CP367> ls -l | grep "Jan 12"
drwxr-x---   2 dbrown student      512 Jan 12 16:21 a1
-rwxr-x---   1 dbrown student     8605 Jan 12 16:21 dbrown.zip

The command ls -l | grep "Jan 12" creates a long file listing, then filters the list for lines that contain the string Jan 12.
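Pipes are not limited to two commands. The following sketch chains three stages; the file sample.c is made up, and wc -l (which counts lines) is a standard utility that this lab does not otherwise cover.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.c is a made-up sample file for this illustration.
cat > sample.c <<'EOF'
printf("Enter a number: ");
puts("working");
printf("Total: %d\n", total);
printf("Done.");
EOF

# Stage 1 keeps the printf lines, stage 2 drops those containing %,
# stage 3 counts what is left. tr strips the padding some wc versions add.
plain=$(grep printf sample.c | grep -v '%' | wc -l | tr -d ' ')
echo "$plain"
```

Each stage reads only what the previous stage wrote, so the final count covers the printf lines without a format specifier.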


Redirecting I/O

The output from programs is usually written to the screen (standard output), while their input usually comes from the keyboard (standard input) when no file arguments are provided. There is a third 'standard' file: standard error. Error messages are sent to standard error, which by default is also the screen. However, there are various operators available to redirect input and output to other places, usually files. (The pipe operator provides a form of redirection.) The operators are:

Redirection
/home/dbrown/CP367> ls -la > output.txt
/home/dbrown/CP367> cat output.txt
total 64
-rwxr-x---   1 dbrown student      230 Jan  8 20:14 #ctest.c#
drwxr-x---   3 dbrown student      512 Feb  1 14:34 .
drwx--x--x   5 dbrown student      512 Jan 18 08:54 ..
drwxr-x---   2 dbrown student      512 Jan 12 16:21 a1
-rwxr-x---   1 dbrown student     8605 Jan 12 16:21 dbrown.zip
-rwxr-x---   1 dbrown student     7332 Jan 16 10:58 ctest
-rwxr-x---   1 dbrown student      230 Jan  3 23:36 ctest.c
-rwxr-x---   1 dbrown student     7832 Jan  8 20:09 heat
-rwxr-x---   1 dbrown student      768 Jan  8 20:14 heat.c
-rwxr-x---   1 dbrown student      767 Jan  8 20:04 heat.c~
-rw-------   1 dbrown student        0 Feb  1 14:47 output.txt

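The command ls -la > output.txt redirects the output of the long list command to the file output.txt. The command cat output.txt then displays the contents of the file on the screen. Note that output.txt itself appears in the listing with a size of 0, because the shell creates the file before ls runs.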
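The three most common redirection operators can be compared side by side. This is a small sketch using a made-up file log.txt; it assumes only the standard shell operators >, >> and <.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  >  log.txt    # > creates the file, or overwrites it if it exists
echo "second" >> log.txt    # >> appends to the end of the file
echo "third"  >  log.txt    # > again: the two earlier lines are lost
echo "fourth" >> log.txt

cat log.txt

# < connects the file to the command's standard input.
lines=$(wc -l < log.txt | tr -d ' ')
echo "$lines"
```

The second use of > discards the earlier contents, so only the last two lines survive in log.txt.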

Standard output and standard error can be redirected separately by putting a number (1 for standard output, 2 for standard error) in front of the output redirection symbol.

Error Redirection
/home/dbrown/CP367> grep hello nofile.txt 2> error.txt
/home/dbrown/CP367> cat error.txt
grep: can't open nofile.txt

The command grep hello nofile.txt 2> error.txt redirects the error output (if any) of the command to the file error.txt. Because the file nofile.txt does not exist, an error message is generated and sent to error.txt. The command cat error.txt then displays the contents of the file on the screen. If there are no errors, the error output file is empty.

Both types of output can be redirected at the same time. For example, the command grep printf *.c 1> output.txt 2> error.txt sends the normal command output (if any) to the file output.txt, and the error output (if any) to the file error.txt.
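Splitting the two streams is easiest to observe when one command produces both normal output and an error. The sketch below assumes a made-up file real.c that exists and a name, missing.c, that does not.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# real.c is a made-up file; missing.c is deliberately never created.
echo 'printf("hi");' > real.c

# Matches go to output.txt; the complaint about missing.c goes to error.txt.
grep printf real.c missing.c 1> output.txt 2> error.txt

echo "--- output.txt ---"
cat output.txt
echo "--- error.txt ---"
cat error.txt
```

Nothing appears on the screen while grep runs, because both of its output streams have been sent to files. The exact wording of the error message varies between systems.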


Sorting Files

There are two simple facilities for sorting files in Unix: sort and uniq.

Their default output is to the screen. Both of these commands can be used with pipes and/or redirection. uniq is particularly useful when used in conjunction with sort.

Sorting
/home/dbrown/CP367> grep printf heat.c | sort
  printf("Days: %d\n", days);
  printf("How tall is the tree (ft): ");
  printf("Radius of the tree (ft): ");
  printf("Volume: %d ft^3\n", volume);
  printf("Wood to heat house for one day (ft^3): ");

The command grep printf heat.c | sort searches the contents of heat.c for lines that contain printf. These lines are then displayed in alphabetical order.
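The reason uniq is usually paired with sort is that uniq only collapses duplicate lines that are next to each other. A minimal sketch, assuming a made-up file fruit.txt:

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# fruit.txt is a made-up sample file with scattered duplicates.
printf 'pear\napple\npear\nfig\napple\npear\n' > fruit.txt

uniq fruit.txt              # no two equal lines are adjacent, so nothing is removed
sort fruit.txt              # sorting brings equal lines together
sort fruit.txt | uniq       # now uniq can collapse each run to one line
sort fruit.txt | uniq -c    # -c also reports how many times each line occurred

distinct=$(sort fruit.txt | uniq | wc -l | tr -d ' ')
echo "$distinct"
```

Running uniq on the unsorted file leaves all six lines in place, while the sorted pipeline reduces them to the distinct names.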


Quotation

Metacharacters are characters that have special meanings on the command line, such as *, $, >, <, and ^. If you need to use these characters as actual data, they must be escaped. A metacharacter can be escaped either by wrapping it in single quotes or by putting a \ character in front of it.

Unescaped Metacharacter
/home/dbrown/CP367> grep * heat.c

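This command attempts to look for the * character in the file heat.c, but the shell expands the unquoted * into the names of the files in the current directory before grep runs, so grep never receives the asterisk as its pattern. The empty output may make you think that the file contains no asterisks. If the metacharacter * is escaped instead: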

Escaped Metacharacter
/home/dbrown/CP367> grep '*' heat.c
/*
 * heat.c
 *
 *  Created on: 2011-01-08
 */
  volume = height * radius * radius * PI;

which is what we were actually looking for.
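The effect of quoting can be checked directly with echo, which simply prints the arguments the shell hands to it. This is a small sketch, assuming a scratch directory that contains only the made-up file calc.txt.

```shell
# Work in a fresh scratch directory so the expansion of * is predictable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# calc.txt is a made-up sample file for this illustration.
printf 'area = w * h;\nplain line\n' > calc.txt

echo *       # unquoted: the shell replaces * with the file names found here
echo '*'     # single quotes: a literal * is passed to echo
echo \*      # backslash: also a literal *

# Both escaped forms hand grep a real asterisk to search for.
stars=$(grep -c '*' calc.txt)
escaped=$(grep -c \* calc.txt)
echo "$stars $escaped"
```

The first echo shows what grep would have received as its pattern if the asterisk had been left unquoted.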

  1. Create a file in1.txt containing the numbers:

    1
    2
    3
    4
    4
    5
    6
    6
    6
    7
    

    and a file in2.txt containing the numbers:

    3
    4
    9
    9
    9
    12
    

    Issue a one-line command that concatenates the contents of the two files into the file output.txt. The numbers in the new file must be in numeric order with duplicates removed.


  2. Generate and append at least three different error messages to a file named errors.txt.


  3. Issue a one-line command that lists only the files or directories in your hopper workspace that have rwx as one of their permissions.

    Sample Output
    drwx--x--x   5 dbrown student       512 Feb  1 16:19 .
    drwxr-xr-x   2 dbrown student       512 Jan  3 21:57 ./public_html
    drwxr-x---   3 dbrown student       512 Feb  1 16:06 ./CP367
    -rwxr-x---   1 dbrown student       230 Jan  3 23:36 ./CP367/ctest.c
    -rwxr-x---   1 dbrown student      7332 Jan 16 10:58 ./CP367/ctest
    -rwxr-x---   1 dbrown student      7832 Jan  8 20:09 ./CP367/heat
    -rwxr-x---   1 dbrown student       768 Jan  8 20:14 ./CP367/heat.c
    drwxr-x---   2 dbrown student       512 Jan 12 16:21 ./CP367/a1
    -rwx------   1 dbrown student         6 Feb  1 16:19 ./test.txt