Patterns in awk



Last revision August 6, 2004

The simplest pattern is a regular expression, enclosed within slashes, just like sed or vi.
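As a minimal sketch, the command below prints every line that contains the string erebus. The sample input is invented for illustration. Because the pattern has no action attached, awk falls back on its default action of printing the matching line.

```shell
# Hypothetical sample input: host, device, count.
input='erebus disk 4
styx tape 1
erebus cpu 2'

# A bare regular expression pattern; the default action prints the line.
printf '%s\n' "$input" | awk '/erebus/'
```

Only the first and third lines of the sample are printed.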

awk goes beyond other Unix editing utilities in that you can combine regular expressions with logical operators, as follows:

      /pat1/ || /pat2/
The line matches if either pat1 or pat2 is found.

      /pat1/ && /pat2/
The line matches only if both pat1 and pat2 are found.

      /pat1/ && !/pat2/
The line matches only if it contains pat1 and does not contain pat2.
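The three combinations above can be tried side by side. This is a small sketch on made-up input, with the words disk and tape standing in for pat1 and pat2.

```shell
# Hypothetical sample input.
input='disk tape
disk only
tape only
neither'

# Either word is present: prints the first three lines.
printf '%s\n' "$input" | awk '/disk/ || /tape/'

# Both words are present: prints only "disk tape".
printf '%s\n' "$input" | awk '/disk/ && /tape/'

# disk is present and tape is absent: prints only "disk only".
printf '%s\n' "$input" | awk '/disk/ && !/tape/'
```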

Another class of patterns consists of logical relations between variables and constants (not regular expressions). Examples:

      NR > 8
The line matches if its line number is greater than 8.

      NF < 5
The line matches if it has fewer than five fields.

      $1 == "erebus"
The line matches if its first field is exactly equal to the string of characters erebus.
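A short sketch of the same kinds of comparison on invented input follows. The threshold for NR is lowered to 2 so that the three-line sample shows a result; the host name erebus2 is included only to show that == compares the whole field.

```shell
# Hypothetical sample input; the second line has five fields.
input='erebus disk 4
styx tape 1 spare extra
erebus2 cpu 2'

# Line number greater than 2: prints only the third line.
printf '%s\n' "$input" | awk 'NR > 2'

# Fewer than five fields: prints the first and third lines.
printf '%s\n' "$input" | awk 'NF < 5'

# First field exactly "erebus": prints only the first line (erebus2 differs).
printf '%s\n' "$input" | awk '$1 == "erebus"'
```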

You can mix and match variables with regular expressions in a pattern. In particular, your pattern can specify that a regular expression should match or not match a particular variable, rather than the entire line. Examples:

     $1 ~ /pat1/
The line matches if the first field ($1) contains pat1.

     $3 !~ /pat1/
The line matches if the third field ($3) does not contain pat1.
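To illustrate matching against a single field, here is a sketch on made-up input in which the string erebus appears in different fields. The second command tests field 2 rather than field 3, purely to suit the sample data.

```shell
# Hypothetical sample input.
input='erebus disk 4
styx erebus-backup 1
hades tape 7'

# First field contains erebus: prints only the first line, even though
# the second line contains erebus elsewhere.
printf '%s\n' "$input" | awk '$1 ~ /erebus/'

# Second field does not contain erebus: prints the first and third lines.
printf '%s\n' "$input" | awk '$2 !~ /erebus/'
```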

A program can specify a range of lines to be matched. Provide two patterns of any type (string comparisons, regular expressions, numeric comparisons, etc.) separated by a comma. The range begins with the first line that matches the first pattern and ends with the next line that matches the second pattern, or at the end of the input if no such line is found. awk tests the second pattern starting with the line that opened the range, so a line that matches both patterns forms a one-line range.

As in sed, the range includes the line that matches the second pattern.

More than one set of lines in the input can match the same range pattern. After a range ends, awk resumes looking for the first pattern in the remainder of the input, and each later match starts a new range.
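The behavior of range patterns can be seen in a small sketch. The markers START and END and the other lines are invented sample data. The first command shows the range matching twice in one input; the second mixes a numeric comparison with a string comparison.

```shell
# Hypothetical sample input with two START/END blocks.
input='header
START
a
END
between
START
b
END
trailer'

# Prints both blocks, end lines included; header, between and trailer
# fall outside every range and are skipped.
printf '%s\n' "$input" | awk '/START/,/END/'

# From line 2 through the first line whose first field is "between":
# prints START, a, END, between.
printf '%s\n' "$input" | awk 'NR == 2, $1 == "between"'
```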

Built-in patterns: BEGIN and END

The keyword BEGIN "matches" the beginning of the input; any action that goes with this pattern is executed before any lines are read from the input file. This can be used to initialize variables, write header lines to the output, and so on. Example:
     BEGIN {print "Inventory of hardware on erebus"}

The keyword END "matches" the end of the input; any action that goes with this pattern is executed after all input lines have been processed. This is useful for printing sums and totals. Example:
     END {print "Number of records processed = " NR}
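The two examples can be combined into one program. The following is a sketch only: the input data, the variable name total, and the "Total units" line are assumptions added for illustration.

```shell
# Hypothetical sample input: device name and a count.
input='disk 4
tape 1
cpu 2'

printf '%s\n' "$input" | awk '
BEGIN { print "Inventory of hardware on erebus" }   # runs before any input is read
      { total += $2 }                               # no pattern: runs for every line
END   { print "Number of records processed = " NR   # runs after the last line
        print "Total units = " total }'
```

The header is printed first, the middle rule silently accumulates the second field, and the END action reports three records and a total of 7.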
