Reading and writing raw data files with dd



Last revision July 20, 2004

Use dd to create large data sets or to transfer data to non-UNIX systems in a raw format that you control. dd copies only the contents of files; it does not save the filename used on disk, the ownership, or any other information about the file.

dd is actually a general purpose data copying and converting program that can be used with either disk or tape files.

The dd command takes arguments that all follow the format of

option=value
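As a minimal sketch of the option=value form, the script below copies a small disk file with explicit input and output block sizes (the ibs and obs options in the list that follows). The file names and the temporary directory are hypothetical, and the block sizes of 10 and 16 bytes are chosen only to make the record counts easy to follow.

```shell
#!/bin/sh
# Sketch: copy a disk file with dd using only option=value arguments.
# All paths are hypothetical and live in a throwaway directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Sample input: three text lines, 34 bytes in total.
printf 'first line\nsecond line\nthird line\n' > "$workdir/sample.txt"

# Read 10 bytes at a time and write 16 bytes at a time.
# dd reports its record counts on standard error, so capture that stream.
# LC_ALL=C keeps the report in English.
dd_summary=$(LC_ALL=C dd if="$workdir/sample.txt" of="$workdir/copy.txt" \
    ibs=10 obs=16 2>&1)
echo "$dd_summary"

# The block sizes change how the data is grouped, not the data itself.
if cmp -s "$workdir/sample.txt" "$workdir/copy.txt"; then
    copy_status=identical
else
    copy_status=different
fi
echo "copy is $copy_status"
```

The report should show 3+1 records in (three full 10-byte blocks plus one partial block) and 2+1 records out (two full 16-byte blocks plus one partial block), while cmp confirms that the copy matches the original byte for byte.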

The most important options for reading and writing tapes are:

if=infile Specify the input file in place of infile. Use a normal filename for disk. For tape, use the appropriate device name, such as /dev/nrmt1h.
of=outfile Specify the output file in place of outfile. The same considerations apply as for the if option.
ibs=n Input block size equals n bytes. Use only when reading tape files (the system already knows disk block sizes). The suffix k can be used for kilobytes (1024 bytes), for example, 10k.
obs=n When writing the output file, use a block size equal to n bytes. Use when writing tapes to indicate what output block size to use. Large sizes make more efficient use of the tape, but many tape drives cannot handle sizes larger than 65,536 bytes (64k) and many programs cannot handle more than 10,240 bytes (10k). dd will combine multiple input blocks or split input blocks as needed to put the correct amount of data in each output block. Kilobytes can be abbreviated with the suffix k, as for the input block size.
cbs=n Use a conversion buffer size equal to n bytes. Use when specifying a conversion (see below) to indicate how many bytes are operated on at one time. This is especially important for the "block" conversion, where it gives the "line" length of the output records. Kilobytes can be abbreviated with the suffix k, as for the input block size.
conv=type Here, "type" specifies a type of data conversion to perform. Several different conversions can be specified, separated by commas. Particularly useful conversions are:
ibm Converts normal ASCII text encoding to the "EBCDIC" codes understood by IBM mainframe computers.
ascii Converts EBCDIC codes from tapes written by IBM mainframe computers to ASCII codes.
block Converts variable length records to fixed length. Useful when sending a Unix text file on disk (such as a program source file), which separates lines with new-line characters, to an IBM mainframe computer that expects lines of a fixed length. To use this, set the cbs option to the fixed "line" length (for example, 80 bytes, the width of a standard terminal screen). Each Unix line is then read, the new-line character is removed, and blanks are added to pad the line out to the length given by the cbs option before it is sent to the output. Input lines longer than the cbs value are truncated. For example, to make a fixed-line length version of a Fortran source file created with vi that has varying length lines, none longer than 80 characters, use:

dd if=file.f cbs=80 conv=ibm,block of=fixedlengthfile.f

This example also uses the ibm conversion option to produce EBCDIC characters, so that the copy can be sent to an IBM mainframe computer. Every line in the output file is exactly 80 bytes long, padded with blanks if needed, with no Unix "newline" characters. If you want to put the file onto tape, use an output block size (obs option) that is an integral multiple of the conversion buffer size (cbs option), for example obs=8000 with cbs=80.

unblock Reverses the action of block. Useful for reading files from IBM mainframe computers, where records are fixed length instead of ending with new-line characters. Use the cbs option to indicate the input "line length". That many bytes are read, trailing blanks are removed, and a newline character is added. You can "undo" the example for the block option above with:

dd if=fixedlengthfile.f cbs=80 conv=ascii,unblock of=file.f
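
The block and unblock conversions can be checked against each other on disk before any tape is involved. The following is only a sketch: the file names, the sample lines, and the temporary directory are hypothetical, and the ibm and ascii conversions are left out on purpose so that the comparison tests the record handling alone, without depending on how any particular character maps between ASCII and EBCDIC.

```shell
#!/bin/sh
# Sketch: round trip a text file through conv=block and conv=unblock.
# All paths are hypothetical and live in a throwaway directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Sample input: three lines of varying length, none longer than 80
# characters and none ending in blanks.
printf 'PROGRAM HELLO\n      PRINT *, 1\nEND\n' > "$workdir/lines.txt"

# block: drop each newline and pad the line with blanks to 80 bytes.
dd if="$workdir/lines.txt" cbs=80 conv=block \
    of="$workdir/fixed.dat" 2>/dev/null

# Three lines of 80 bytes each should give 240 bytes.
fixed_size=$(( $(wc -c < "$workdir/fixed.dat") ))
echo "fixed-length file: $fixed_size bytes"

# unblock: read 80 bytes at a time, strip trailing blanks, add a newline.
dd if="$workdir/fixed.dat" cbs=80 conv=unblock \
    of="$workdir/restored.txt" 2>/dev/null

if cmp -s "$workdir/lines.txt" "$workdir/restored.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip is $roundtrip"
```

The round trip is exact only under the assumptions stated in the comments. A line longer than 80 characters would be truncated by block, and blanks at the end of an original line would be removed by unblock along with the padding.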
