Data compression programs
Last revision July 20, 2004
Data and text files tend to contain repeating patterns of characters or bytes. Compression programs shrink such files by replacing each repeated pattern with a shorter code, and by keeping track of the correspondence between codes and patterns so that the original content can be restored exactly.
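As a rough illustration of why repetition matters, the sketch below compresses two files of the same size, one made of a single repeated pattern and one filled with random bytes. The file names and the 100000-byte size are arbitrary choices for this example, and it assumes that gzip, head -c and /dev/urandom are available on your system.

```shell
# Work in a throwaway directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A 100000-byte file that repeats one short pattern over and over.
yes 'abcabcabc' | head -c 100000 > "$workdir/repetitive.txt"

# The same amount of random data, which has no patterns to exploit.
head -c 100000 /dev/urandom > "$workdir/random.bin"

# gzip -c writes the compressed data to standard output and leaves
# the original files untouched, so wc -c can count the bytes.
repetitive_size=$(gzip -c "$workdir/repetitive.txt" | wc -c)
random_size=$(gzip -c "$workdir/random.bin" | wc -c)

echo "repetitive: 100000 bytes -> $repetitive_size bytes"
echo "random:     100000 bytes -> $random_size bytes"
```

The repetitive file should shrink to a tiny fraction of its size, while the random file should stay at about its original size.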
Two sets of programs are available on pangea and most Unix systems to compress and restore files:
- compress and uncompress are built-in to all Unix systems that derive from the old Berkeley Unix.
- gzip and gunzip are free "GNU" programs from the Free Software Foundation. They are distributed under the GNU General Public License rather than being in the public domain. Not all Unix vendors include these programs in their distributions. You can download and compile the source code in that case.
Both sets of programs work on individual files. They do not create compressed archives of multiple files or an entire directory. To make a compressed archive, first make the archive with the tar (or gtar) program, and then compress it. Better yet, connect tar and one of these compression programs in a pipeline, so that the uncompressed archive never has to be written to disk; this saves both disk space and time. Read the on-line manual pages for these programs to get the correct options for pipeline use.
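A minimal sketch of such a pipeline follows. The directory name project and the archive name project.tar.gz are made up for this example, and the options shown (tar cf - and tar xf - to use standard output and input, gunzip -c to write to standard output) are the commonly available ones; check the manual pages on your own system before relying on them.

```shell
# Work in a throwaway directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Make a small example directory to archive.
mkdir project
echo "first file" > project/a.txt
echo "second file" > project/b.txt

# Archive and compress in one pass. "f -" tells tar to write the
# archive to standard output, which gzip reads and compresses.
tar cf - project | gzip > project.tar.gz

# Restore the archive into a separate directory. gunzip -c writes the
# expanded archive to standard output, and tar reads it from there.
mkdir restored
gunzip -c project.tar.gz | (cd restored && tar xf -)

ls restored/project
```

Because the data flows directly from one program to the other, the intermediate uncompressed archive never takes up space on disk.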
Using gzip and gunzip
These programs are preferred for compressing files on Unix systems because they provide a very efficient and fast compression algorithm, and they are freely available for installation on all systems.
Despite the similarity in name, the gzip program does not produce compressed files in the same format as Windows "Zip" archives.
To compress a regular file (or archive) in order to save space, simply run the gzip program with the name of the file as argument. gzip will make a new compressed version of the file using the same filename, but with the suffix .gz added. Then the original uncompressed file will be deleted. Obviously, to compress files, you must have write access in the directory where they live. Also, there must be enough free space on the disk to temporarily hold both the uncompressed and compressed versions while gzip is running.
You can also list several files at once in a single gzip command. Each will be separately compressed and replaced by the compressed version, with a .gz suffix added.
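The two ways of calling gzip described above might look like this in practice. The file names are invented for the example, and the files are created in a scratch directory so that nothing of yours is touched.

```shell
# Work in a throwaway directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create three example text files.
for name in report.txt data1.txt data2.txt; do
    yes "sample line of text" | head -n 1000 > "$name"
done

# Compress a single file: report.txt is replaced by report.txt.gz.
gzip report.txt

# Compress several files with one command: each is compressed
# separately and replaced by its own .gz version.
gzip data1.txt data2.txt

# Only the .gz versions remain in the directory.
ls
```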
Unix programs generally cannot work directly with compressed files. Instead, you have to expand them back to their original content. To reverse the operation and expand a file that has been compressed by gzip, simply run the gunzip command, giving the compressed filename (including the .gz suffix). The compressed file will be expanded back to its original contents and filename, and the compressed version with the ".gz" suffix will be deleted.
The same notes about write permissions and disk free space apply to gunzip as well as gzip. gunzip can also be told to expand multiple compressed files by listing them all on the same command line.
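As a sketch of the expansion step, the example below compresses a few scratch files and then restores them with gunzip, first one at a time and then several at once. The file names are invented for the example, and the copy named report.orig exists only so the result can be compared with the original.

```shell
# Work in a throwaway directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create and compress three example files, keeping one untouched copy.
for name in report.txt data1.txt data2.txt; do
    yes "sample line of text" | head -n 1000 > "$name"
done
cp report.txt report.orig
gzip report.txt data1.txt data2.txt

# Expand a single file: report.txt.gz is replaced by report.txt.
gunzip report.txt.gz

# Expand several files with one command.
gunzip data1.txt.gz data2.txt.gz

# The restored file is identical to the copy made before compression.
cmp report.txt report.orig && echo "report.txt restored unchanged"
```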
Using compress and uncompress
Only use these programs if the preferred gzip and gunzip programs are not available on your system, or if you have old files already compressed with compress.
Like gzip, compress compresses the contents of the file (or files) specified as arguments on the command line, and then replaces that file on disk with the compressed version. The compressed version is given the suffix .Z to indicate that it is in compressed format.
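To return the compressed file to its uncompressed state, you simply type the uncompress command followed by the name of the compressed file (filename below stands for the name of your own file):
    uncompress filename.Z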
Here, the .Z suffix is a signal to uncompress that this file is compressed. Its contents will be expanded, and the compressed file will be replaced on disk with the uncompressed one, minus the .Z suffix.
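A complete round trip with compress and uncompress might look like the sketch below. The file names are invented for the example. Since compress is not installed on every system, the sketch checks for it first and does nothing to the file if it is missing.

```shell
# Work in a throwaway directory that is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a repetitive example file and keep an untouched copy of it.
yes "sample line of text" | head -n 1000 > old.txt
cp old.txt old.orig

if command -v compress >/dev/null 2>&1; then
    # old.txt is replaced by old.txt.Z.
    compress old.txt
    ls

    # old.txt.Z is replaced by old.txt again.
    uncompress old.txt.Z
    cmp old.txt old.orig && echo "old.txt restored unchanged"
else
    echo "compress is not installed on this system"
fi
```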