3.3. Debugging Memory with Valgrind

Valgrind’s Memcheck debugging tool highlights heap memory errors in programs. Heap memory is the part of a running program’s memory that is dynamically allocated by calls to malloc() and freed by calls to free() in C programs. The types of memory errors that Valgrind finds include:

  • Reading (getting) a value from uninitialized memory. For example:

    int *ptr, x;
    ptr = malloc(sizeof(int) * 10);
    x = ptr[3];    // reading from uninitialized memory
  • Reading (getting) or writing (setting) a value at an unallocated memory location, which often indicates an array out-of-bounds error. For example:

    ptr[11] = 100;  // writing to unallocated memory (no 11th element)
    x = ptr[11];    // reading from unallocated memory
  • Freeing already freed memory. For example:

    free(ptr);
    free(ptr); // freeing the same pointer a second time
  • Memory leaks. A memory leak is a chunk of allocated heap memory space that is not referred to by any pointer variable in the program, and thus it cannot be freed. That is, a memory leak occurs when a program loses the address of an allocated chunk of heap space. For example:

    ptr = malloc(sizeof(int) * 10);
    ptr = malloc(sizeof(int) * 5);  // memory leak of first malloc of 10 ints

Memory leaks can eventually cause the program to run out of heap memory space, resulting in subsequent calls to malloc() failing. The other types of memory access errors, such as invalid reads and writes, can lead to the program crashing or can result in some program memory contents being modified in seemingly mysterious ways.

Memory access errors are some of the most difficult bugs to find in programs. Often a memory access error does not immediately result in a noticeable error in the program’s execution. Instead, it may trigger an error that occurs later in the execution, often in a part of the program that seemingly has little to do with the source of the error. At other times, a program with a memory access error may run correctly on some inputs and crash on other inputs, making the cause of the error difficult to find and fix.

Using Valgrind helps a programmer identify these difficult to find and fix heap memory access errors, saving significant amounts of debugging time and effort. Valgrind also assists the programmer in identifying any lurking heap memory errors that were not discovered in the testing and debugging of their code.

3.3.1. An Example Program with a Heap Memory Access Error

As an example of how difficult it can be to discover and fix programs with memory access errors, consider the following small program (bigfish.c). This program exhibits a "write to unallocated heap memory" error in the second for loop, when it assigns values beyond the bounds of the bigfish array (note: the listing includes source code line numbers, and the print_array() function definition is not shown, but it behaves as described):

 1  #include <stdio.h>
 2  #include <stdlib.h>
 3
 4  /* print size elms of array p with name name */
 5  void print_array(int *p, int size, char *name) ;
 6
 7  int main(int argc, char *argv[]) {
 8      int *bigfish, *littlefish, i;
 9
10      // allocate space for two int arrays
11      bigfish = (int *)malloc(sizeof(int) * 10);
12      littlefish = (int *)malloc(sizeof(int) * 10);
13      if (!bigfish || !littlefish) {
14          printf("Error: malloc failed\n");
15          exit(1);
16      }
17      for (i=0; i < 10; i++) {
18          bigfish[i] = 10 + i;
19          littlefish[i] = i;
20      }
21      print_array(bigfish,10, "bigfish");
22      print_array(littlefish,10, "littlefish");
23
24      // here is a heap memory access error
25      // (write beyond bounds of allocated memory):
26      for (i=0; i < 13; i++) {
27          bigfish[i] = 66 + i;
28      }
29      printf("\nafter loop:\n");
30      print_array(bigfish,10, "bigfish");
31      print_array(littlefish,10, "littlefish");
32
33      free(bigfish);
34      free(littlefish);  // program will crash here
35      return 0;
36  }

In the main() function, the second for loop causes a heap memory access error when it writes to three indices beyond the bounds of the bigfish array (to indices 10, 11, and 12). The program does not crash at the point where the error occurs (at the execution of the second for loop); instead, it crashes later in its execution at the call to free(littlefish):

bigfish:
 10  11  12  13  14  15  16  17  18  19
littlefish:
  0   1   2   3   4   5   6   7   8   9

after loop:
bigfish:
 66  67  68  69  70  71  72  73  74  75
littlefish:
 78   1   2   3   4   5   6   7   8   9
Segmentation fault (core dumped)

Running this program in GDB indicates that the program crashes with a segfault at the call to free(littlefish). Crashing at this point may make the programmer suspect that there is a bug with accesses to the littlefish array. However, the cause of the error is due to writes to the bigfish array and has nothing to do with errors in how the program accesses the littlefish array.

The most likely reason that the program crashes is that the for loop goes beyond the bounds of the bigfish array and overwrites memory between the heap memory location of the last allocated element of bigfish and the first allocated element of littlefish. The heap memory locations between the two (and right before the first element of littlefish) are used by malloc() to store meta-data about the heap memory allocated for the littlefish array. Internally, the free() function uses this meta-data to determine how much heap memory to free. The modifications to indices 10 and 11 of bigfish overwrite these meta-data values, resulting in the program crash on the call to free(littlefish). We note, however, that not all implementations of the malloc() function use this strategy.

Because the program includes code to print out littlefish after the memory access error to bigfish, the cause of the error may be more obvious to the programmer: the second for loop is somehow modifying the contents of the littlefish array (its element 0 value "mysteriously" changes from 0 to 78 after the loop). However, even in this very small program, it may be difficult to find the real error: if the program didn’t print out littlefish after the second for loop with the memory access error, or if the for loop upper bound was 12 instead of 13, there would be no visible mysterious change to program variable values that could help a programmer see that there is an error with how the program accesses the bigfish array.

In larger programs, a memory access error of this type could be in a very different part of the program code than the part that crashes. There also may be no logical association between variables used to access heap memory that has been corrupted and the variables that were used to erroneously overwrite that same memory; instead, their only association is that they happen to refer to memory addresses that are allocated close together in the heap. Note that this situation can vary from run to run of a program and that such behavior is often hidden from the programmer. Similarly, sometimes bad memory accesses will have no noticeable effect on a run of the program, making these errors hard to discover. Whenever a program seems to run fine for some input, but crashes on other input, this is a sign of a memory access error in the program.

Tools like Valgrind can save days of debugging time by quickly pointing programmers to the source and type of heap memory access errors in their code. In the previous program, Valgrind delineates the point where the error occurs (when the program accesses elements beyond the bounds of the bigfish array). The Valgrind error message includes the type of error, the point in the program where the error occurs, and where in the program the heap memory near the bad memory access was allocated. For example, here is the information Valgrind will display when the program executes line 27 (some details from the actual Valgrind error message are omitted):

Invalid write
 at main (bigfish.c:27)
 Address is 0 bytes after a block of size 40 alloc'd
   by main (bigfish.c:11)

This Valgrind error message says that the program is writing to invalid (unallocated) heap memory at line 27 and that this invalid memory is located immediately after a block of memory that was allocated at line 11, indicating that the loop is accessing some elements beyond the bounds of the allocated memory in heap space to which bigfish points. A potential fix to this bug is to either increase the number of bytes passed to malloc() or change the second for loop bounds to avoid writing beyond the bounds of the allocated heap memory space.

In addition to finding memory access errors in heap memory, Valgrind can also find some errors with stack memory accesses, such as using uninitialized local variables or trying to access stack memory locations that are beyond the bounds of the current stack. However, Valgrind does not detect stack memory access errors at the same granularity as it does with heap memory, and it does not detect memory access errors with global data memory.

A program can have memory access errors with stack and global memory that Valgrind cannot find. However, these errors result in erroneous program behavior or program crashing that is similar to the behavior that can occur with heap memory access errors. For example, overwriting memory locations beyond the bounds of a statically declared array on the stack may result in "mysteriously" changing the values of other local variables or may overwrite state saved on the stack that is used for returning from a function call, leading to a crash when the function returns. Experience using Valgrind for heap memory errors can help a programmer identify and fix similar errors with accesses to stack and global memory.

3.3.2. How to Use Memcheck

We illustrate some of the main features of Valgrind’s Memcheck memory analysis tool on an example program, valgrindbadprog.c, which contains several bad memory access errors (comments in the code describe the type of error). Valgrind runs the Memcheck tool by default; we depend on this default behavior in the code snippets that follow. You can explicitly specify the Memcheck tool by using the --tool=memcheck option. In later sections, we will invoke other Valgrind profiling tools by invoking the --tool option.

To run Memcheck, first compile the valgrindbadprog.c program with the -g flag to add debugging information to the executable (e.g., a.out) file. Then, run the executable with valgrind. Note that for non-interactive programs, it may be helpful to redirect Valgrind’s output to a file for viewing after the program exits:

$ gcc -g valgrindbadprog.c
$ valgrind -v ./a.out

# re-direct valgrind (and a.out) output to file 'output.txt'
$ valgrind -v ./a.out >& output.txt

# view program and valgrind output saved to out file
$ vim output.txt

Valgrind’s Memcheck tool prints out memory access errors and warnings as they occur during the program’s execution. At the end of the program’s execution, Memcheck also prints out a summary about any memory leaks in the program. Even though memory leaks are important to fix, the other types of memory access errors are much more critical to a program’s correctness. As a result, unless memory leaks are causing a program to run out of heap memory space and crash, a programmer should focus first on fixing these other types of memory access errors before considering memory leaks. To view details of individual memory leaks, use the --leak-check=yes option.

When first using Valgrind, its output may seem a bit difficult to parse. However, the output all follows the same basic format, and once you know this format, it’s easier to understand the information that Valgrind is displaying about heap memory access errors and warnings. Here is an example Valgrind error from a run of the valgrindbadprog.c program:

==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)

Each line of Valgrind output is prefixed with the process’s ID (PID) number (31059 in this example):

==31059==

Most Valgrind errors and warnings have the following format:

  1. The type of error or warning.

  2. Where the error occurred (a stack trace at the point in the program’s execution when the error occurs.)

  3. Where heap memory around the error was allocated (usually the memory allocation related to the error.)

In the preceding example error, the first line indicates an invalid write to memory (writing to unallocated memory in the heap — a very bad error!):

==31059== Invalid write of size 1

The next few lines show the stack trace where the error occurred. These indicate an invalid write occurred at line 29 in function foo(), which was called from function main() at line 56:

==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)

The remaining lines indicate where the heap space near the invalid write was allocated in the program. This section of Valgrind’s output says that the invalid write was immediately after (0 bytes after) a block of 5 bytes of heap memory space that was allocated by a call to malloc() at line 18 in function foo(), called by main() at line 56:

==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)

The information from this error identifies that there is an unallocated heap memory write error in the program, and it directs the user to specific parts of the program where the error occurs (line 29) and where memory around the error was allocated (line 18). By looking at these points in the program, the programmer may see the cause of and the fix for the error:

 18   c = (char *)malloc(sizeof(char) * 5);
 ...
 22   strcpy(c, "cccc");
 ...
 28   for (i = 0; i <= 5; i++) {
 29       c[i] = str[i];
 30   }

The cause is that the for loop executes one time too many, accessing c[5], which is beyond the end of array c. The fix is to either change the loop bounds at line 29 or to allocate a larger array at line 18.

If examining the code around a Valgrind error is not sufficient for a programmer to understand or fix the error, using GDB might be helpful. Setting breakpoints around the points in the code associated with the Valgrind errors can help a programmer evaluate the program’s runtime state and understand the cause of the Valgrind error. For example, by putting a breakpoint at line 29 and printing the values of i and str, the programmer can see the array out-of-bounds error when i is 5. In this case, the combination of using Valgrind and GDB helps the programmer determine how to fix the memory access bugs that Valgrind finds.

Although this chapter has focused on Valgrind’s default Memcheck tool, we characterize some of Valgrind’s other capabilities later in the book, including the Cachegrind cache profiling tool (Chapter 11), the Callgrind code profiling tool (Chapter 12), and the Massif memory profiling tool (Chapter 12). For more information about using Valgrind, see the Valgrind homepage, and its online manual.