Reading and writing raw data files with dd
Last revision July 20, 2004
dd is used to create large data sets or to transfer data to non-UNIX systems in a raw format that you can control. dd copies only the contents of a file; it does not save the filename used on disk, the ownership, or any other information about the file.
dd is actually a general-purpose data copying and converting program that can be used with either disk or tape files.
The dd command takes arguments that all follow the format of
option=value
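As a minimal sketch of the option=value form, the commands below copy a small scratch file on disk and check that the copy has the same contents. The file names demo_in.dat and demo_out.dat and the 512-byte block sizes are placeholders chosen for illustration, not values taken from this page.

```shell
# Create a small scratch input file (hypothetical name).
printf 'line one\nline two\n' > demo_in.dat

# Every dd argument is an option=value pair.
# dd prints its record counts on standard error, which is discarded here.
dd if=demo_in.dat of=demo_out.dat ibs=512 obs=512 2>/dev/null

# cmp exits with status 0 when the two files are byte-for-byte identical.
cmp demo_in.dat demo_out.dat && echo "contents match"
```

Only the bytes are compared here, because dd carries no filename or ownership information across to the copy.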
The most important options for reading and writing tapes are:
if=infile | Specify the input file in place of infile. Use a normal filename for disk. For tape, use the appropriate device name, such as /dev/nrmt1h. |
of=outfile | Specify the output file in place of outfile. The same considerations apply as for the if option. |
ibs=n | Input block size equals n bytes. Use only when reading tape files (the system already knows disk block sizes). The suffix k can be used as an abbreviation for kilobytes (1024 bytes), for example, 10k. |
obs=n | When writing the output file, use a block size equal to n bytes. Use when writing tapes to indicate what output block size to use. Large sizes make more efficient use of the tape, but many tape drives cannot handle sizes larger than 65,536 bytes (64k) and many programs cannot handle more than 10,240 bytes (10k). dd will read multiple input blocks or break input blocks as needed to put the correct amount of data in each output block. Kilobytes can be abbreviated with the suffix k as for input block size. |
cbs=n | Use a conversion buffer size equal to n bytes. Use when specifying a conversion (see below) to indicate how many bytes are operated on at one time. This is especially important for the "block" conversion to indicate "line" length in the input. Kilobytes can be abbreviated with the suffix k as for input block size. |
conv=type | Here, "type" specifies a type of data conversion to perform. Several different conversions can be specified, separated by commas. Particularly useful conversions are: |
ibm | Converts normal ASCII text encoding to the "EBCDIC" codes understood by IBM mainframe computers. |
ascii | Converts EBCDIC codes from tapes written by IBM mainframe computers to ASCII codes. |
block | Converts variable length records to fixed length. Good if sending a Unix text file on disk (such as a program source file), which separates lines with new-line characters, to an IBM mainframe computer that wants to deal with lines of fixed lengths. To use this, specify the cbs option to be the fixed "line" length (for example, 80 bytes, which is the width of the terminal screen). Then each Unix line will be read, the new-line character removed, and blanks added to pad out to the length specified with the cbs option before being sent to the output. Input lines will be truncated if they are longer than the cbs option specification. For example, to make a fixed-line length version of a Fortran source file created with vi that has varying length lines, none longer than 80 characters, use:
dd if=file.f cbs=80 conv=ibm,block of=fixedlengthfile.f
Here, you also used the ibm conversion option to get EBCDIC characters so you could send this copy to an IBM mainframe computer. Every line in this file will be exactly 80 bytes long, padded with blanks if needed, with no Unix "newline" characters. If you want to put the file onto tape, use an output block size (obs option) that is an integral multiple of the conversion buffer size (cbs option). |
unblock | Reverse the action of block. Useful for reading files from IBM mainframe computers where records are fixed length instead of having new-line characters. Use the cbs option to indicate the input "line length". That many bytes will be read, trailing blanks removed, and a newline character added. You can "undo" the example for the block option above with:
dd if=fixedlengthfile.f cbs=80 conv=ascii,unblock of=file.f |
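The block and unblock conversions described above can be tried on disk without a tape drive or an IBM system. The sketch below leaves out the ibm and ascii conversions so the intermediate file stays readable, and uses a short record length of 8 bytes; the file names and the cbs value are assumptions made for this illustration only.

```shell
# Scratch text file with lines of different lengths (hypothetical name).
printf 'abc\nde\n' > demo_var.txt

# block: remove each newline and pad each line with blanks to cbs=8 bytes.
dd if=demo_var.txt of=demo_fixed.dat cbs=8 conv=block 2>/dev/null

# Two input lines become two 8-byte records, so this reports 16 bytes.
wc -c < demo_fixed.dat

# unblock: read 8-byte records, drop trailing blanks, add a newline.
dd if=demo_fixed.dat of=demo_back.txt cbs=8 conv=unblock 2>/dev/null

# The round trip restores the original file.
cmp demo_var.txt demo_back.txt && echo "round trip ok"
```

The round trip is exact only because no input line ends in blanks or exceeds the cbs length. Trailing blanks would be stripped by unblock, and longer lines would be truncated by block.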