Regular expressions
Last revision August 2, 2004
A regular expression is a pattern or template used in a string matching or searching operation. Regular expressions are used by many programs that need to search for text in a file or perform substitutions or other operations on text. Such programs include the vi editor, the grep file searching program, and the data matching and manipulation utilities expr, awk, and sed. For either searching or replacing, regular expressions allow you to work with patterns of characters, not just fixed strings of characters. This allows greater power and flexibility in your commands.

In addition to regular characters, regular expressions contain special characters called "metacharacters". These characters stand for something other than themselves, much as variables do in a program.

Remember that many metacharacters in regular expressions also have a special (different) meaning to the shell. So if you are typing a command such as grep at the shell prompt, and it requires a regular expression as an argument, be sure to enclose the regular expression in a pair of single quotes.

In regular expressions, all characters that are not "special" match themselves only. If you want a metacharacter to stand for itself, rather than its special meaning, precede it with the escape character \ (backslash). To match a backslash character itself, you need two in a row (the first "escapes" the special meaning of the second backslash as an escape character).

The basic metacharacters permit matches of arbitrary characters.
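The escaping rule described above can be tried out in a short sketch. It uses Python's re module rather than grep or vi, which is an assumption made for illustration: Python's syntax is not identical to the one those tools use, but the period and the backslash behave the same way in both. The helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text.
    (Hypothetical helper, introduced only for this illustration.)"""
    return re.search(pattern, text) is not None

# An unescaped period is a metacharacter: it matches any single character.
print(matches(r"a.c", "abc"))      # True
print(matches(r"a.c", "a.c"))      # True

# Preceded by a backslash, the period stands only for itself.
print(matches(r"a\.c", "abc"))     # False
print(matches(r"a\.c", "a.c"))     # True

# Two backslashes in the pattern match one literal backslash in the text.
print(matches(r"\\", "C:\\temp"))  # True
print(matches(r"\\", "no slash"))  # False
```

The raw-string prefix r"..." plays a role similar to the single quotes recommended at the shell prompt: it keeps Python itself from interpreting the backslashes before the regular expression engine sees them.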
A second group of metacharacters allows you to "anchor" the match to a location in the line.
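As a rough illustration of anchoring, the sketch below tests the ^ and $ anchors against a single sample line. It is written with Python's re module, which is an assumption made for illustration and not one of the programs named in this document; the helper name found is hypothetical.

```python
import re

line = "the tree is bare"

def found(pattern, text):
    """Return True if the pattern matches somewhere in the text.
    (Hypothetical helper, introduced only for this illustration.)"""
    return re.search(pattern, text) is not None

# ^ anchors the match to the beginning of the line.
print(found(r"^the", line))    # True:  the line starts with "the"
print(found(r"^tree", line))   # False: "tree" is not at the start

# $ anchors the match to the end of the line.
print(found(r"bare$", line))   # True:  the line ends with "bare"
print(found(r"tree$", line))   # False: "tree" is not at the end

# Without an anchor, the same word is found anywhere in the line.
print(found(r"tree", line))    # True
```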
^ and $ have special meanings only if used at the beginning or end, respectively, of a regular expression (or, in the case of ^, also at the beginning of a list of characters in square brackets). Otherwise they are ordinary characters.

The general rule is that a regular expression matches the longest among the possible leftmost matches in a line. For example, if you use the regular expression t.*e with a substitution command in an editor such as vi, and the next line has "the tree is bare", the expression will match not just the first word "the", but the entire phrase "the tree is bare", which starts with the "t" character, has any number of other characters following, and ends with the "e" character.
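The t.*e example can be reproduced in a small sketch. It uses Python's re module as a stand-in for vi, which is an assumption: Python's engine reaches its result by a different strategy (backtracking with greedy repetition) than the longest-leftmost rule stated above, but for these patterns the outcome is the same.

```python
import re

line = "the tree is bare"

# .* takes as many characters as it can, so the match runs from the
# first "t" to the last "e" in the line, not just the word "the".
m = re.search(r"t.*e", line)
print(m.group(0))                   # the tree is bare

# A substitution therefore replaces the whole phrase, not one word.
print(re.sub(r"t.*e", "X", line))   # X

# "Leftmost" comes first: e.*e starts at the earliest "e" (in "the")
# and then extends as far to the right as possible.
m2 = re.search(r"e.*e", line)
print(m2.start(), m2.group(0))      # 2 e tree is bare
```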