3.6. Debugging multi-threaded programs with GDB

Debugging multi-threaded programs can be tricky due to the multiple streams of execution and due to interactions between the concurrently executing threads. In general, here are some things to make debugging multi-threaded programs a bit easier:

  • Try to debug a version of the program with as few threads as possible.

//out the executing thread's ID and then call `fflush(stdout)` after
//the call to `printf` to avoid any buffering of the `printf` output.
  • Limit the amount of debug output by having only one of the threads print its information and common information. For example, if each thread stores its logical ID in a local variable named my_tid, then a conditional statement on the my_tid can be used to limit printing debug output to one thread:

if (my_tid == 1) {
    printf("Tid:%d: value of count is %d and my i is %d\n", my_tid, count, i);
    fflush(stdout);
}

3.6.1. GDB and Pthreads

The GDB debugger has specific support for debugging threaded programs, including setting breakpoints for individual threads and examining the Stacks of individual threads. One thing to note when debugging Pthreads programs in GDB is that there are at least three identifiers for each thread:

  1. The Pthreads library’s ID for the thread (its pthread_t value).

  2. The operating system’s Lightweight Process (LWP) ID value for the thread. This ID is used in part for the OS to keep track of this thread for scheduling purposes.

  3. The GDB ID for the thread: this is the ID to use when specifying a specific thread in GDB commands.

The specific relationship between thread IDs can differ from one OS and Pthreads library implementation to another, but on most systems there is a one-to-one-to-one correspondence between a Pthreads ID, an LWP ID, and a GDB thread ID.

We present a few GDB basics for debugging threaded programs in GDB. See the following for more information about debugging threaded programs in GDB.

3.6.2. GDB thread-specific commands:

  • Enable printing thread start and exit events:

    set print thread-events
  • List all existing threads in the program (the GDB thread number is the first value listed and the thread that hit the breakpoint is denoted with an *):

    info threads
  • Switch to a specific thread’s execution context (e.g., to examine its stack when executing where), specify the thread by its thread ID:

    thread <threadno>
    
    thread 12         Switch to thread 12's execution context
    where             Thread 12's stack trace
  • Set a breakpoint for just a particular thread. Other threads executing at the point in the code where the breakpoint is set will not trigger the breakpoint to pause the program and print the GDB prompt:

    break <where> thread <threadno>
    
    break foo thread 12     Break when thread 12 executes function foo
  • To apply a specific GDB command to all or to a subset of threads, by adding the prefix thread apply <threadno | all> to a GDB command, where threadno refers to the GDB thread ID:

    thread apply <threadno|all> command

This doesn’t work for every GDB command, setting breakpoints in particular, so use this syntax instead for setting thread-specific breakpoints:

break <where> thread <threadno>

Upon reaching a breakpoint, by default, GDB pauses all threads until the user types cont. The user can change the behavior to request that GDB only pause the threads that hit a breakpoint, allowing other threads to continue executing.

3.6.3. Examples:

We show some GDB commands and output from a GDB run on the multi-threaded executable named racecond.c.

This errant program lacks synchronization around accesses to the shared variable count. As a result, different runs of the program produce different final values for count, indicating a race condition. For example, here are two runs of the program with 5 threads that produce different results:

./a.out 5
hello I'm thread 0 with pthread_id 139673141077760
hello I'm thread 3 with pthread_id 139673115899648
hello I'm thread 4 with pthread_id 139673107506944
hello I'm thread 1 with pthread_id 139673132685056
hello I'm thread 2 with pthread_id 139673124292352
count = 159276966

./a.out 5
hello I'm thread 0 with pthread_id 140580986918656
hello I'm thread 1 with pthread_id 140580978525952
hello I'm thread 3 with pthread_id 140580961740544
hello I'm thread 2 with pthread_id 140580970133248
hello I'm thread 4 with pthread_id 140580953347840
count = 132356636

The fix is to put accesses to count inside a critical section, using a pthread_mutex_t variable. If the user was not able to see this fix by examining the C code alone, then running in gdb and putting breakpoints around accesses to the count variable may help the programmer discover the problem.

Here is a list of some example commands from a GDB run of this program:

(gdb) break worker_loop     Set a breakpoint for all spawned threads
(gdb) break 77 thread 4     Set a breakpoint just for thread 4
(gdb) info threads          List information about all threads
(gdb) where                 List stack of thread that hit the breakpoint
(gdb) print i               List values of its local variable i
(gdb) thread 2              Switch to different thread's (2) context
(gdb) print i               List thread 2's local variables i

Shown below is partial output of a GDB run of the racecond program with 3 threads (run 3), showing examples of GDB thread commands in the context of a GDB debugging session. The main thread is always GDB thread number 1, and the 3 spawned threads are GDB threads 2 to 4.

When debugging multi-threaded programs, the GDB user must keep track of which threads exist when issuing commands. For example, when the breakpoint in main is hit, only thread 1 (the main thread) exists. As a result, the GDB user must wait until threads are created before setting a breakpoint for only a specific thread (this example shows setting a breakpoint for thread 4 only at line 77 in the program). In viewing this output, note when breakpoints are set and deleted, and note the value of each thread’s local variable i when thread contexts are switched with GDB’s thread command:

$ gcc -g racecond.c -lpthread

$ gdb ./a.out
(gdb) break main
Breakpoint 1 at 0x919: file racecond.c, line 28.
(gdb) run 3
Starting program: ...
[Thread debugging using libthread_db enabled] ...

Breakpoint 1, main (argc=2, argv=0x7fffffffe388) at racecond.c:28
28	    if (argc != 2) {
(gdb) list 76
71	  myid = *((int *)arg);
72
73	  printf("hello I'm thread %d with pthread_id %lu\n",
74	      myid, pthread_self());
75
76	  for (i = 0; i < 10000; i++) {
77	      count += i;
78	  }
79
80	  return (void *)0;

(gdb) break 76
Breakpoint 2 at 0x555555554b06: file racecond.c, line 76.
(gdb) cont
Continuing.

[New Thread 0x7ffff77c4700 (LWP 5833)]
hello I'm thread 0 with pthread_id 140737345505024
[New Thread 0x7ffff6fc3700 (LWP 5834)]
hello I'm thread 1 with pthread_id 140737337112320
[New Thread 0x7ffff67c2700 (LWP 5835)]
[Switching to Thread 0x7ffff77c4700 (LWP 5833)]

Thread 2 "a.out" hit Breakpoint 2, worker_loop (arg=0x555555757280)
    at racecond.c:76
76	  for (i = 0; i < 10000; i++) {
(gdb) delete 2

(gdb) break 77 thread 4
Breakpoint 3 at 0x555555554b0f: file racecond.c, line 77.
(gdb) cont
Continuing.

hello I'm thread 2 with pthread_id 140737328719616
[Switching to Thread 0x7ffff67c2700 (LWP 5835)]

Thread 4 "a.out" hit Breakpoint 3, worker_loop (arg=0x555555757288)
    at racecond.c:77
77	      count += i;
(gdb) print i
$2 = 0
(gdb) cont
Continuing.
[Switching to Thread 0x7ffff67c2700 (LWP 5835)]

Thread 4 "a.out" hit Breakpoint 3, worker_loop (arg=0x555555757288)
    at racecond.c:77
77	      count += i;
(gdb) print i
$4 = 1

(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6fc3700 (LWP 5834))]
#0  0x0000555555554b12 in worker_loop (arg=0x555555757284) at racecond.c:77
77	      count += i;
(gdb) print i
$5 = 0

(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff77c4700 (LWP 5833))]
#0  worker_loop (arg=0x555555757280) at racecond.c:77
77	      count += i;
(gdb) print i
$6 = 1