Using dbx to see why a program is crashing
Last revision August 4, 2004
This brief tutorial shows how to use the dbx debugger on pangea to find out which functions and statements were being executed when a program crashed. This assumes that all conditions for use of dbx as outlined above have been met:
- The program was compiled with the -g debugging flag and no optimization.
- The coredumpsize shell resource limit was set large enough to allow a core dump.
- The crashing program was run from the same directory as its source code.
- The debugger is run in the same directory as the source code.
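The core dump size limit can also be raised from inside a C program, as an alternative to setting it in the shell. The following is only a minimal sketch, assuming a POSIX system that provides getrlimit and setrlimit; the helper name enable_core_dumps is hypothetical and is not part of this tutorial's example program.

```c
#include <sys/resource.h>

/* Build with debugging symbols and no optimization, for example:
 *     cc -g -o myprog myprog.c
 */

/* Hypothetical helper: raise this process's soft core-file size limit
 * to its hard limit, so that a crash is allowed to leave a core file.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* the soft limit may rise only to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

You would call such a helper once near the start of main, before any code that might crash. It cannot exceed the hard limit, so if the hard limit itself is zero, no core file will be written.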
First, start the dbx program and tell it to load your program and the core dump file. For example, if your program was named myprog, you would issue this command:
dbx myprog core
dbx will print various informative messages about itself and the program and then give you a (dbx) prompt. You type commands at this prompt for dbx to execute.
Type the dbx command where to request a "stack trace" of the program. This shows you a list of functions or routines that were active at the time of the crash. That is, it shows you the most recently called function, followed by the function that called it, etc., back down to the main program. For each function in this "stack", it lists the values of the calling arguments, and the line number in the source file that was being executed. For the most recently called function, that line number is the actual statement that was executing when the crash occurred. For the "calling" functions further down the stack, that line number is the statement that called the next function up in the stack.
For example, consider a small C program named dumpcore that dereferences a NULL pointer, causing a crash. It consists of a main program and a subroutine dumpcore, both stored in the single source file dumpcore.c in the pangea directory ~gp111ins/Errorprog. When it is run, it crashes and produces a core dump file. Suppose that, after starting dbx to look at this core file, you ran the where command and got this output:
(dbx) where
> 0 dumpcore(lim = 5) ["dumpcore.c":17, 0x12000128c]
1 main() ["dumpcore.c":8, 0x12000121c]
This tells you that the crash occurred while the subroutine dumpcore was active, at the statement found in line 17 of the source code file dumpcore.c. This subroutine was called with the lim argument variable set to the value 5, from the main program at line 8 of the same source file.
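To make the call chain concrete, here is a sketch of what the crashing subroutine might look like, based only on the fragments that dbx lists further on in this tutorial. The complete dumpcore.c is not reproduced here, so several details are assumptions: the value of LIMIT (the trace only implies it is 5 or less, since a call with lim = 5 reached line 17), the ANSI-style prototype, and the hypothetical helper run_dumpcore_in_child, which runs the call in a child process so that the caller survives the crash.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define LIMIT 5      /* assumed value; the trace only implies LIMIT <= 5 */

int *ip;             /* file-scope pointer: starts out NULL and is never set */

/* Same logic as the listed subroutine, written with an ANSI prototype. */
void dumpcore(int lim)
{
    if (lim >= LIMIT)
        *ip = lim;   /* store through a null pointer: the crash site */
}

/* Hypothetical helper: run dumpcore(lim) in a child process and return
 * the child's wait status.  Returns -1 if fork or waitpid fails. */
int run_dumpcore_in_child(int lim)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        dumpcore(lim);
        _exit(0);    /* reached only if no crash occurred */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

In the original program, main calls dumpcore directly, so the crash kills the program itself and leaves the core file that dbx reads. With the helper above, WIFSIGNALED(status) from <sys/wait.h> can be used to check whether the child was killed by a signal.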
In some cases, the actual crash may occur while a system routine is active, for example, while a disk read or write is active. In that case, there will be no information on the system routine variable names or line number, because those routines are stripped of symbolic debugging information to improve efficiency. However, you will be able to follow back down the stack to the line in your program that called the system routine. Most likely, the error will be due to invalid arguments supplied by your program.
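As an illustration of this kind of mistake, passing a null pointer to a library routine such as strlen or memcpy typically causes a crash inside that routine, not in your own code. One way to guard against it is to check arguments before making the call. The wrapper below is only a sketch; the name checked_copy and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: validate the arguments before handing them to
 * library routines.  Copies the string src into dst, which has room
 * for dstsize bytes.  Returns 0 on success, -1 if an argument is
 * invalid or the string does not fit. */
int checked_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;               /* invalid arguments: refuse the call */
    len = strlen(src);
    if (len >= dstsize)
        return -1;               /* copying would overflow dst */
    memcpy(dst, src, len + 1);   /* copy including the terminating '\0' */
    return 0;
}
```

With a check like this, a bad argument shows up as an error return that your program can report, instead of a crash whose stack trace starts in a system routine with no symbolic information.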
Once you have the line number of the source file where the crash occurred, you can ask dbx to list the source lines around that point with the list command. For example, for the dumpcore program that crashed at line 17, you could give a command to list lines 15 through 20, with results as follows:
(dbx) list 15,20
15 {
16 if (lim >= LIMIT)
>* 17 *ip = lim;
18 }
The crashing line is highlighted for you with the >* symbols. You could also ask dbx to list all the source lines in the entire subroutine that was executing, if you needed more context, for example:
(dbx) list dumpcore
11 }
12 int *ip;
13 dumpcore(lim)
14 int lim;
15 {
16 if (lim >= LIMIT)
>* 17 *ip = lim;
18 }
Because the crash occurred while assigning a value to the integer memory location pointed to by ip, you might want to see what the pointer's value actually was as a clue to the problem. You can see values of variables with the print command, for example:
(dbx) print *ip
can't read from process (address 0x0)
(dbx) print ip
(nil)
Trying to print the value of the actual memory location (*ip) gave an error message. Printing the value of the pointer itself (the address of the memory location) showed the problem: the pointer has the value (nil). That is, it does not point to any location in memory, so the assignment statement cannot work. Looking at the whole program again, you can see that this pointer was never initialized to point at a valid memory location. That is the bug in this program.
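One possible repair is to make the pointer refer to real storage before it is used. The sketch below is not the original author's fix. The variable storage, the function name dumpcore_fixed, its return value, and the value of LIMIT are all assumptions made for illustration.

```c
#include <stddef.h>

#define LIMIT 5              /* assumed value; not shown in the tutorial */

static int storage;          /* hypothetical: real memory for ip to refer to */
int *ip = &storage;          /* the pointer now holds a valid address */

/* Returns 1 if a value was stored through ip, 0 otherwise. */
int dumpcore_fixed(int lim)
{
    if (ip == NULL)          /* defensive check in case ip is cleared later */
        return 0;
    if (lim >= LIMIT) {
        *ip = lim;           /* safe: ip points at storage */
        return 1;
    }
    return 0;
}
```

With this change, the assignment at the former crash site writes into storage instead of address 0x0. If you rebuilt the program along these lines and examined it in dbx, print ip should show a non-nil address.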
Use the quit command to end your dbx debugging session. Don't forget to remove the core file when you are done in order to save disk space. On pangea, a daily system management program removes any file named "core" on the user disks that is more than three days old.
A longer tutorial for dbx using this same dumpcore example can be found in the article "Debugging with dbx", number PS1:11 in the 4.3BSD Unix Programmer's Manual Supplementary Documents, Volume 1. That book is on permanent reserve in Branner Library. The article goes on to illustrate how the program can be run under the control of dbx, with "stop" points set to halt execution near where the error occurred, and then stepped through line by line while you examine variable values. That can be a very powerful way to understand what is actually happening in your program. You can even use dbx to force variables to take on other values in order to see what effect that has on the program execution, such as whether it correctly handles "out of bounds" cases.
Please note, however, that the specific syntax of many dbx commands on pangea differs from the syntax described in that article. In general, different Unix systems have different versions of the dbx program, with possibly different command syntaxes. Use the dbx online manual entry to see complete information on the commands available inside dbx to do things like set breakpoints, turn on statement tracing, step through the program, examine variable values, etc. On pangea, and many other systems, there is also online help for all commands accessible from the (dbx) prompt. Simply type the command help to see a list of topics.