Fields and variables in awk



Last revision August 6, 2004

Table of Contents:
  1. Running awk
  2. awk commands
  3. Fields and variables in awk
  4. Patterns in awk
  5. Actions in awk
  6. Simple awk examples

Throughout your awk "script" or program, you have access to built-in and user-defined variables.

Variables may be used in either patterns or actions.

Built-in variables

The main built-in variables are the words or "fields" of the input line. By default, awk splits each input line into fields wherever it finds one or more blanks or tabs.

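You can change the delimiter character to character x with the option -Fx on the awk command line, or by setting the variable FS = "x" within your script. If you set FS within the script, do it in a BEGIN action so that the new delimiter is in effect before the first line is read.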
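As a small sketch, the two commands below split the same colon-separated line with each method. The sample line and the shell variable names (line, out_opt, out_fs) are invented for this illustration.

```shell
# Sample colon-separated input (made up for this illustration).
line='alice:x:1001'

# Method 1: the -F option on the awk command line.
out_opt=$(printf '%s\n' "$line" | awk -F: '{ print $3 }')

# Method 2: set FS inside the script, before any input is read.
out_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $3 }')

echo "$out_opt"   # prints 1001
echo "$out_fs"    # prints 1001
```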

Note that some older awk implementations (including pangea's) are limited to 99 fields per line and to no more than 3000 characters per line. If any input line exceeds those limits, the awk program will quit at that point with an error. Newer implementations such as GNU awk (gawk) do not impose these fixed limits.

The first field on the line is referred to as $1, the second as $2, and so on. The entire line may be referred to as a unit as $0.
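For instance, with a made-up three-word input line, the field variables can be printed one at a time or as the whole line. The shell variable out is only there to hold the result.

```shell
# One input line with three blank-separated fields (sample data).
out=$(printf 'red green blue\n' | awk '{ print $1; print $3; print $0 }')
echo "$out"
# prints:
#   red
#   blue
#   red green blue
```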

Other useful built-in variables are:

NR Line number of current input line.
NF Number of fields in the current line.
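A quick way to see NR and NF at work is to print them for every line of a small, invented input. Because NF holds the number of fields, $NF refers to the last field on the line.

```shell
# Two sample lines with different numbers of fields.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
echo "$out"
# prints:
#   1 3 c
#   2 2 e
```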

All built-in variables (except the field variables) have names that are all uppercase letters.

User-defined variables

You can create as many variables as you like, giving them arbitrary names. A name may contain only alphabetic letters, numerals, and the underscore character (_); no other punctuation symbols are allowed. A name must start with a letter or an underscore, not a numeral. Names are case-sensitive: total and Total are two different variables, so pick one spelling and use it consistently.

Variables are "typeless". They are treated as either character strings or numeric values, depending upon context.

Variables do not need to be declared or initialized; they "spring into being" when first used, and are automatically initialized to the null (empty) string (or numeric value 0).
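The sketch below illustrates this with two hypothetical variables, count and label. Neither is declared or initialized; count behaves as 0 the first time it is used in arithmetic, and label, which is never assigned, prints as an empty string.

```shell
# Three sample input lines; count is incremented once per line.
out=$(printf 'x\ny\nz\n' | awk '
    { count = count + 1 }                      # count starts out as 0 with no declaration
    END { print count; print "[" label "]" }   # label was never set, so it is empty
')
echo "$out"
# prints:
#   3
#   []
```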

You can create arrays - see the awk documentation.
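As a minimal sketch only, an array element is written as a name followed by a subscript in square brackets, and elements spring into being on first use just like ordinary variables. The array name seen and the sample words are invented for this example.

```shell
# Count how many times each word appears in the first field.
out=$(printf 'apple\npear\napple\n' | awk '
    { seen[$1] = seen[$1] + 1 }                # one array element per distinct word
    END { print seen["apple"], seen["pear"] }
')
echo "$out"   # prints 2 1
```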

Variable expressions and operations

Expressions involving multiple variables may be used as part of either "patterns" or "actions". Assigning the value of an expression to a new variable is normally done as part of an "action".

Variables may be assigned or operated on with the standard set of "C" language operators. Operations can be grouped with parentheses to indicate the proper order of operation.

Variable values are interpreted as "strings" of characters, except that they are automatically converted to numeric values for arithmetic operations. The numeric value of a string is the value of the longest prefix (beginning portion) of the string that looks numeric, so "3 apples" has the numeric value 3. If the string does not begin with a number (after any leading blanks and an optional sign), then its numeric value is 0 (zero). See The AWK Programming Language for details of the conversion rules.
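The conversion can be tried out directly with a few string constants chosen for illustration. Adding 0 or multiplying forces each string to be treated as a number.

```shell
# No input is read; everything happens in the BEGIN action.
out=$(awk 'BEGIN {
    print "12abc" + 0    # numeric prefix is 12
    print "3.5kg" * 2    # numeric prefix is 3.5
    print "abc" + 0      # no numeric prefix, so the value is 0
}')
echo "$out"
# prints:
#   12
#   7
#   0
```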

Assignment of an expression to a variable:

= Variable on left set equal to value of expression on right.
+= Variable on left incremented by value of expression on right.
-= Variable on left decremented by value of expression on right.
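A short sketch of all three assignment operators, using invented variable names (total, remaining) and sample numbers:

```shell
# Each input line holds one number in its first field.
out=$(printf '5\n10\n20\n' | awk '
    BEGIN { remaining = 100 }             # plain assignment
    { total += $1; remaining -= $1 }      # add to one variable, subtract from the other
    END { print total, remaining }
')
echo "$out"   # prints 35 65
```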

Arithmetic operations between two expressions or variables:

+ Addition
- Subtraction
* Multiplication
/ Division
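The example below applies each operator to two sample values and then shows how parentheses change the order of operation. Note that, unlike integer division in C, dividing 7 by 2 in awk gives 3.5.

```shell
# The values of a and b are arbitrary sample numbers.
out=$(awk 'BEGIN {
    a = 7; b = 2
    print a + b, a - b, a * b, a / b
    grouped = (a + b) * 2      # parentheses first: 9 * 2
    ungrouped = a + b * 2      # multiplication first: 7 + 4
    print grouped, ungrouped
}')
echo "$out"
# prints:
#   9 5 14 3.5
#   18 11
```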

Concatenation: place two variable names next to each other (blank between) to create an expression that concatenates their string values into a single string, for example:
     newvar = oldvar1 oldvar2
Here, newvar is given a string value that is simply the concatenation of the string values of oldvar1 and oldvar2.
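A runnable version of the same idea, with sample values assigned to oldvar1 and oldvar2. String constants can be concatenated in the same way as variables.

```shell
out=$(awk 'BEGIN {
    oldvar1 = "data"; oldvar2 = "base"
    newvar = oldvar1 oldvar2       # concatenate two variables
    print newvar
    print oldvar1 "-" oldvar2      # a constant string in the middle
}')
echo "$out"
# prints:
#   database
#   data-base
```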

Comparison and logical operators connect two expressions or variables to create a "logical expression" that is evaluated to be either "true" or "false". Logical expressions are used as patterns and in if statements, which control the logical flow of processing in the script.

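< True if left side is less than right side. The comparison is numeric if both sides can be interpreted as numbers; otherwise the two sides are compared as character strings.
<= True if left side is less than or equal to right side (same comparison rule as <).
> True if left side is greater than right side (same comparison rule as <).
>= True if left side is greater than or equal to right side (same comparison rule as <).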
== True if left side is equal to right side. If both sides can be interpreted as numbers, their numerical values are compared; otherwise their string values must be exactly the same.
!= True if left side is not equal to right side, with the same rules for string or numeric comparison as ==, above.
|| Logical "or" operator. Joins two separate logical expressions and indicates that overall result is true if either separate expression is true.
&& Logical "and" operator. Joins two separate logical expressions and indicates that overall result is true only if both separate expressions are true.
! Logical "not" operator. Use in front of an entire logical expression to negate its value (true becomes false and false becomes true). Example:
      ! ($1 == 5 && $2 == 10)
is false, rather than true, if $1 has the value 5 and $2 has the value 10.
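To see these operators in use, the sketch below runs three logical expressions as patterns over a few invented input lines, then uses == in an if statement. The tags printed ("both", "not both", "either") are arbitrary labels for this example.

```shell
# Three sample lines; each pattern prints a tag when it is true.
out=$(printf '5 10\n5 3\n7 10\n' | awk '
    $1 == 5 && $2 == 10    { print NR ": both" }
    !($1 == 5 && $2 == 10) { print NR ": not both" }
    $1 > 5 || $2 < 5       { print NR ": either" }
')
echo "$out"
# prints:
#   1: both
#   2: not both
#   2: either
#   3: not both
#   3: either

# Both fields look numeric, so == compares their values, not their spelling.
eq=$(printf '5.0 5\n' | awk '{ if ($1 == $2) print "equal"; else print "different" }')
echo "$eq"    # prints equal
```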

Many built-in arithmetic functions (sqrt, exp, etc.) and string functions (length, substr, index) are available for use in expressions. The "new" awk (nawk) has more built-in functions than the old one.
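A few of the functions named above, applied to a sample string and number. In awk, character positions in a string are counted from 1.

```shell
out=$(awk 'BEGIN {
    s = "variable"
    print length(s)         # number of characters in s
    print substr(s, 1, 3)   # 3 characters starting at position 1
    print index(s, "ia")    # position where "ia" first appears
    print sqrt(16)
}')
echo "$out"
# prints:
#   8
#   var
#   4
#   4
```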
