awk pattern matching and processing language
Last revision August 6, 2004
awk is a simple programming language especially useful for specifying actions that are to be taken when a pattern is found in an input file. Some of the basic features are:
awk operates as a filter, transforming or selecting from the input to create the output.
awk allows you to perform initializations or other actions before any data is read; then loop through the data line by line, applying any number of tests (patterns to match) and associated actions; and finally perform ending or cleanup actions that can include printing summaries.
Patterns can include tests on numeric values, as well as string matches.
Complicated patterns can be created from arbitrary logical combinations of individual patterns.
The "actions" to be taken when a pattern is found can be more general than simple editing of text.
You can increment or update variables or arrays that can then be printed at the end of processing or other times.
You can re-arrange text in very complicated ways.
You have complete "flow of control" programming constructs (if-then, loops, etc.) available for conditional or repetitive actions.
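The features listed above can be sketched in one short program. This is only an illustration: the function name hours_report, the three-column "name department hours" layout, and the sample data are all made up for this example. It shows an initialization step, a pattern that combines a numeric test with a string match, an action that updates an array, and a closing summary.

```shell
# hours_report: hypothetical helper that reads "name department hours" lines on stdin.
hours_report() {
    awk '
    BEGIN { print "Staff over 30 hours:" }        # runs once, before any input is read
    $3 > 30 && $2 ~ /^(eng|sales)$/ {             # numeric test AND string match
        print "  " $1
        total[$2] += $3                           # accumulate hours per department
        count++
    }
    END {                                         # runs once, after the last line
        printf "matched=%d eng=%d sales=%d\n", count, total["eng"], total["sales"]
    }'
}

# Made-up sample data, one record per line.
printf '%s\n' 'alice sales 42' 'bob eng 35' 'carol sales 18' 'dave eng 50' | hours_report
```

With this sample data the program prints the heading, the names alice, bob, and dave, and then the summary line matched=3 eng=85 sales=42. Because the function reads standard input and writes standard output, it behaves as a filter and can be placed anywhere in a pipeline.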
Although awk is more general than sed or grep, it is less efficient than they are for simple editing or selecting tasks.
awk is also very useful for prototyping programs that can then be re-written in C once the basic algorithms are devised and shown to work.
The current version of awk included with most modern Unix systems (including pangea) dates from 1985. It has many powerful general-purpose programming features, such as functions and command line arguments. It is fully covered, along with many interesting examples, in the book The AWK Programming Language, by Alfred Aho, Brian Kernighan, and Peter Weinberger, 1988, Addison-Wesley Pub. Co., ISBN: 020107981X.
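As a small sketch of the two features just mentioned, the program below defines a function and reads its command line arguments through the built-in ARGC and ARGV variables. The shell wrapper largest and the awk function max are hypothetical names chosen for this example, and the program assumes its arguments are non-negative numbers.

```shell
# largest: hypothetical wrapper that prints the largest of its numeric arguments.
largest() {
    awk '
    # User-defined function: return the larger of two numbers.
    function max(a, b) { return (a > b) ? a : b }

    BEGIN {
        # ARGV[0] is the program name; the arguments are ARGV[1] .. ARGV[ARGC-1].
        best = ARGV[1] + 0                    # adding 0 forces a numeric value
        for (i = 2; i < ARGC; i++)
            best = max(best, ARGV[i] + 0)
        print best
    }' "$@"
}

largest 4 9 2    # prints 9
```

Because the program contains only a BEGIN action, awk never tries to open the arguments as input files.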
An alternative open-source implementation, called "GNU awk", has been created by the Free Software Foundation. This version is available on pangea under the program name gawk. It is basically compatible with the standard awk, with some extensions. It is fully documented on the GNU web site.
Some older Unix systems may have an earlier, less powerful version of awk that was developed in 1978; they may refer to the current (1985) version as "new awk" or nawk.
All functions described in these notes work for all versions. In general, the current (1985) version of awk is a superset of the old version, and gawk is a superset of the current (1985) awk. There are, however, a few subtle differences between the old and current versions of awk, particularly in the handling of multiple statements on a single line. For this reason, some systems keep the old version available alongside the new one rather than simply replacing it.