Text formatting in Unix



Last revision August 3, 2004

Simple formatting of ASCII files

These programs work with any text file and do not require any embedded formatting commands. All of them read from standard input if no files are named on the command line, so they can be used in pipelines. With the exception of enscript, they send the formatted text to standard output; to print it on paper, pipe it to lpr. enscript sends its output directly to the specified printer.

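The fold and fmt programs are also useful for tidying up "ragged" lines within a file from inside the vi editor, using its filter (!) operator. For example, typing !}fmt and pressing Return reformats the text from the cursor to the end of the paragraph.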

fold - break long lines into shorter pieces

fold splits long lines into several shorter ones. You can specify the maximum allowable line length with the -w option. By default, the maximum allowable output line length is 80 characters (the screen width).

By default, fold just splits the line when the maximum length is reached, even if that is in the middle of a word. On pangea (but not all Unix systems), there is a -s option to fold that tells it to try to split the line on a blank, preserving whole words. If there are no blanks on the line, it just splits at the maximum line length.
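A minimal sketch comparing the two behaviours, assuming a fold that supports -s (as pangea's does, and as POSIX and GNU fold do). The file name sample.txt is hypothetical:

```shell
# Create a one-line sample file (sample.txt is a hypothetical name)
printf 'The quick brown fox jumps over the lazy dog\n' > sample.txt

# Plain fold: break every 8 characters, even in the middle of a word.
# The first output line is "The quic".
fold -w 8 sample.txt

# fold -s: break after the last blank that fits within 8 characters.
# The first output line is "The " (the blank stays at the end of the line).
fold -s -w 8 sample.txt
```

In both cases fold only inserts newlines, so deleting the newlines from its output gives back the original line.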

fold does not fill short lines. That is, it does not "borrow" words from the next line to fill in a short line. Using tr to first convert newline characters to blanks and then piping that output to fold should result in filling and folding to get all lines at the right length. Paragraph breaks would be lost, however.
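The tr-then-fold approach described above can be sketched as follows. The -s option is added here so that words stay whole, and the file name ragged.txt is hypothetical. Because tr also turns the file's final newline into a blank, an echo restores it at the end:

```shell
# Four short, ragged lines (ragged.txt is a hypothetical name)
printf 'one two\nthree\nfour five six\nseven\n' > ragged.txt

# Turn every newline into a blank, producing one long line,
# then re-split it on blanks so that no line exceeds 20 characters
tr '\n' ' ' < ragged.txt | fold -s -w 20
echo    # the last newline became a blank, so end the output with a newline
```

This prints "one two three four " and then "five six seven ": every line is filled as far as 20 characters allows, and, as noted above, any blank lines separating paragraphs would have disappeared.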

fold can be used to give the equivalent of "screen wrap-around" when printing to paper, or to split up long lines into shorter ones for more convenient editing.

In tests on pangea, fold has correctly handled input lines of at least 16,000 characters in length.

fmt - fill lines

fmt is an older, simpler program than fold. It is designed to adjust line lengths so that lines come as close as possible to 72 characters without exceeding that limit; the version on pangea does not let you specify a different width (GNU fmt, by contrast, accepts a -w option). It is not as versatile as fold at splitting long lines, but it has the advantage that it also fills short lines.

Lines that begin with the same number of blank spaces (or tabs) are formatted so that they retain those leading blanks after filling and splitting.
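A small sketch of both behaviours. The file name short.txt is hypothetical, and the lines are kept short so that the result does not depend on the exact width your fmt uses (72 on pangea, 75 by default in GNU fmt):

```shell
# Two paragraphs of short lines separated by a blank line;
# the second paragraph is indented by two blanks
printf 'one\ntwo\nthree\n\n  alpha\n  beta\n' > short.txt

# fmt joins the short lines of each paragraph and keeps the leading blanks.
# It prints "one two three", a blank line, then "  alpha beta".
fmt short.txt
```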

On some older Unix systems, fmt suffers from a bug that causes it to truncate input lines at some low number (such as 132 or 255 characters) before folding those lines, with the result that data is lost. On pangea, the input line length limit for fmt is at least 8000 characters, as determined by empirical testing.

pr - format a file for printing

pr's basic purpose is to add header and footer lines to text before printing; by default, each page header shows the date, the file name, and the page number.
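A minimal sketch of the header, using the POSIX -h (header title) and -l (page length) options; the file name report.txt is hypothetical:

```shell
# A one-line file to paginate (report.txt is a hypothetical name)
printf 'Line one of the report\n' > report.txt

# Default header: date, file name and page number on a 66-line page
pr report.txt

# -h: put your own title in the header instead of the file name
# -l 20: use a 20-line page instead of the default 66 lines
pr -h "Weekly report" -l 20 report.txt
```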

pr can also expand tabs, indent text, and print multiple columns on a page, taking the columns either from a single file or from several files side by side. For example, to view two files side by side:

pr -t -m firstfile secondfile | more
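A sketch of the column, indentation and tab options, using the POSIX flags -m (files side by side), -2 (two columns), -l (page length), -o (left offset) and -e (expand tabs). The file names are hypothetical:

```shell
# Two small files to merge (hypothetical names, matching the example above)
printf 'apple\nbanana\n' > firstfile
printf 'red\nyellow\n' > secondfile

# -m: one column per file; -t: no header or trailer
pr -t -m firstfile secondfile

# -2: split one file into two columns; -l 3: three lines per page,
# so lines 1-3 fill the first column and lines 4-6 the second
printf '1\n2\n3\n4\n5\n6\n' > nums.txt
pr -t -2 -l 3 nums.txt

# -o 4: indent every line by four spaces
pr -t -o 4 nums.txt

# -e: expand tabs to spaces (tab stops every 8 columns)
printf 'a\tb\n' | pr -t -e
```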

pr does not fold or fill any lines; if lines are too long, the excess is simply lost when printing.

pr sends its output to standard output; to actually print the formatted file, you must pipe the output to lpr.

enscript - format text for PostScript printers

enscript is similar to pr in that it adds headers and footers giving the file name, page number, and so on. It is specifically designed to convert text to PostScript and send it to a PostScript printer (which includes essentially all of the laser printers in the School).

Unlike pr, enscript will automatically fold lines that are too long to fit within the width of the page, and continue them onto multiple output lines.

It also has options to choose the font and to print two reduced-size pages on a single sheet.

Examples of use:

fmt infile | pr | lpr -Pa65-laserjet
fold infile | pr | lpr -Pa65-laserjet
enscript -Pa65-laserjet infile
