File searching with grep
Last revision August 2, 2004
The name grep comes from the ed editor command g/re/p, meaning "global regular expression print". It finds and prints (to standard output) every line of its standard input, or of the files you name, that matches a specified regular expression. This lets you filter the contents of a file, copying only certain lines to a second file or into a pipeline.
Basic syntax:
grep regular_expression filenames
The regular expression can be a fixed string (like Geology), or it can use metacharacters (such as ., *, ^, or $). If it uses metacharacters, enclose the entire regular expression in single quotes (apostrophes). Otherwise the shell may interpret some of these metacharacters itself, for example by expanding * into matching file names, and grep never sees the pattern you intended.
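A minimal sketch of the basic syntax, assuming a POSIX shell with mktemp available. The file name note and its three lines are made up for illustration. It searches a named file, then searches standard input (no file name given), then uses a quoted pattern containing metacharacters.

```shell
#!/bin/sh
# Sketch only: the file "note" and its contents are hypothetical.
export LC_ALL=C                      # predictable character matching
tmpdir=$(mktemp -d) || exit 1        # scratch directory, removed on exit
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'Geology 101\nbiology lab\nGeography notes\n' > note

# Search a named file for a fixed string.
fromfile=$(grep Geology note)

# No file name, so grep reads standard input (here, a pipe).
fromstdin=$(printf 'Geology 101\nbiology lab\n' | grep Geology)

# The pattern contains . and *, so it is quoted: the shell passes it to
# grep unchanged instead of treating * as a filename wildcard.
quoted=$(grep 'Geo.*y' note)

printf '%s\n' "$fromfile" "$fromstdin" "$quoted"
```

The quoted pattern matches both "Geology 101" and "Geography notes", since .* allows any characters between Geo and y.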
Examples:
grep '^[A-Z]' note
    prints (lists) all lines in file note that begin with a capital letter.

grep '^$' note
    prints all empty (null) lines.

grep '^[ ^I][ ^I]*' note
    prints all lines that begin with at least one, and possibly several, blank or tab characters; in other words, all lines that begin with "white space". Here ^I stands for a literal tab character, not the two characters ^ and I (in many shells you can type a tab by pressing Ctrl-V and then Tab).

grep '[gG]eology' note
    prints all lines containing the word geology, capitalized or not.
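The four examples above can be tried on a small test file. This sketch assumes a POSIX shell; the contents of note are invented so that each pattern has something to match. Instead of typing a literal tab, it builds one with printf and inserts it into a double-quoted pattern, and it sets LC_ALL=C so that [A-Z] means only the 26 uppercase ASCII letters.

```shell
#!/bin/sh
# Sketch: run the four example patterns against a hypothetical "note" file.
export LC_ALL=C          # [A-Z] = uppercase ASCII letters only
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'Geology field trip\nthe geology lab\n\n\tindented with a tab\n  indented with spaces\n' > note

tab=$(printf '\t')       # a real tab character, the ^I of the examples

caps=$(grep '^[A-Z]' note)               # lines starting with a capital letter
empty=$(grep -c '^$' note)               # -c prints how many lines are empty
white=$(grep "^[ $tab][ $tab]*" note)    # lines starting with blanks or tabs
geo=$(grep '[gG]eology' note)            # "geology", capitalized or not

printf '%s\n' "$caps" "$empty" "$white" "$geo"
```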
If you give more than one filename, grep searches each in turn for matching lines. As the matching lines are listed to standard output, they are prefixed with the name of the file in which they were found. For example, suppose you have three files named red, green, and blue. Files green and blue contain the word "Geology". If you use grep to search all three files at once with the command
grep Geology red green blue
it will list the matching lines to standard output prefixed by the filename, for example,
green:Geology ...
blue:The Geology Corner ...
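To reproduce the red, green, and blue example, the sketch below creates the three files with made-up contents (only green and blue mention Geology) in a scratch directory, assuming a POSIX shell.

```shell
#!/bin/sh
# Sketch: hypothetical contents for the red/green/blue example.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'Chemistry notes\n' > red
printf 'Geology ...\n' > green
printf 'The Geology Corner ...\n' > blue

# Several file names are given, so each matching line gets a "file:" prefix.
prefixed=$(grep Geology red green blue)
printf '%s\n' "$prefixed"
```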
If you just want to know which files contain a line that matches the regular expression, but don't need to actually see the line(s), then use the -l option to grep. This tells grep to simply list the names of the files (from the input argument list) that contain the regular expression. For example, you could just list the names of the files from the set above that contain at least one line with the word "Geology" using:
grep -l Geology red green blue
which would produce the output:
green
blue
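A short sketch of the -l option, using invented file contents. File green deliberately contains two matching lines, to show that -l still lists each file name only once.

```shell
#!/bin/sh
# Sketch: -l prints file names, not lines. Contents are hypothetical.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'Chemistry notes\n' > red
printf 'Geology lab\nGeology field trip\n' > green
printf 'The Geology Corner\n' > blue

names=$(grep -l Geology red green blue)
printf '%s\n' "$names"
```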
If you want to filter (select) certain lines from a set of files, and don't want grep to prefix each matching line with the name of the file where it was found, simply use cat to concatenate all the files into one data stream and pipe that to grep, for example:
cat red green blue | grep Geology
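A sketch of the cat pipeline, assuming the same kind of hypothetical red, green, and blue files as above. Because grep reads one combined stream from standard input, it has no file names to print.

```shell
#!/bin/sh
# Sketch: concatenate first, then filter. File contents are hypothetical.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'Chemistry notes\n' > red
printf 'Geology ...\n' > green
printf 'The Geology Corner ...\n' > blue

merged=$(cat red green blue | grep Geology)   # no "file:" prefixes
printf '%s\n' "$merged"
```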
Warning: Be careful when using grep to search multiple files with output redirection. Don't ever try a command like this:
grep sometext * > textout
which may quickly fill up the disk if any of the files contain matching lines. Why? The shell first creates an empty output file, textout, and then expands the * wildcard to match all files in the directory, including the new textout. It passes this alphabetically sorted list to grep. Any matching lines that grep finds in files sorted before textout are written to textout. When grep then reaches textout itself (remember, the shell included textout in the list of files matching *), it finds those same lines and appends them to textout again, then reads the newly appended copies and appends them yet again, and so on until the disk is completely full. Newer versions of GNU grep detect that an input file is also the output file and skip it with a warning, but other or older implementations may not, so avoid this pattern anyway.
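One way to avoid the problem is to put the output file somewhere the wildcard cannot reach. This sketch assumes a POSIX shell; the directory name data and the files a and b are hypothetical. Writing to ../textout keeps the output one level above the searched directory, so * never expands to it.

```shell
#!/bin/sh
# Sketch: redirect grep's output outside the directory being searched.
# Directory and file names are hypothetical.
export LC_ALL=C
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
mkdir "$work/data"
cd "$work/data" || exit 1

printf 'sometext one\nunrelated line\n' > a
printf 'more sometext\n' > b

# * expands to "a b" only, so grep never reads the file it is writing.
grep sometext * > ../textout

safe=$(cat ../textout)
printf '%s\n' "$safe"
```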
On many Unix systems, there are actually several grep commands with slightly different names that are optimized for different situations. Typically, you will find these two programs in addition to standard grep. Check the online manual entries to see the special syntax for these variants.
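fgrep
    A faster version that searches only for fixed strings, not regular expressions. Equivalent to grep -F.

egrep
    An extended version that accepts extended regular expressions, including alternation (the "or" of two patterns, written with |), the + and ? operators, and parentheses for grouping. Equivalent to grep -E. It is often faster than standard grep, though older implementations could not handle some very large files.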
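A minimal sketch contrasting the three kinds of matching on an invented file, assuming a grep that supports the POSIX options -F and -E (which behave like fgrep and egrep). With standard grep, the dot in 3.50 is a metacharacter; with -F it is literal; with -E the | operator selects lines matching either pattern.

```shell
#!/bin/sh
# Sketch: basic vs. fixed-string vs. extended matching. File is hypothetical.
export LC_ALL=C
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf 'price: 3.50\nprice: 3x50\nred apple\nblue sky\ngreen leaf\n' > items

basic=$(grep '3.50' items)        # . matches any character: 3.50 and 3x50
fixed=$(grep -F '3.50' items)     # like fgrep: the dot is taken literally
alt=$(grep -E 'red|blue' items)   # like egrep: | means "or"

printf '%s\n' "$basic" "$fixed" "$alt"
```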