3.3. Debugging Memory with Valgrind
Valgrind’s Memcheck debugging tool highlights heap memory errors in programs. Heap memory is the part of a running program’s memory that is dynamically allocated by calls to malloc() and freed by calls to free() in C programs. The types of memory errors that Valgrind finds include:
- Reading (getting) a value from uninitialized memory. For example:
    int *ptr, x;
    ptr = malloc(sizeof(int) * 10);
    x = ptr[3];    // reading from uninitialized memory
- Reading (getting) or writing (setting) a value at an unallocated memory location, which often indicates an array out-of-bounds error. For example:
    ptr[11] = 100;  // writing to unallocated memory (valid indices are 0 through 9)
    x = ptr[11];    // reading from unallocated memory
- Freeing already freed memory. For example:
    free(ptr);
    free(ptr);  // freeing the same pointer a second time
- Memory leaks. A memory leak is a chunk of allocated heap memory space that is not referred to by any pointer variable in the program, and thus it cannot be freed. That is, a memory leak occurs when a program loses the address of an allocated chunk of heap space. For example:
    ptr = malloc(sizeof(int) * 10);
    ptr = malloc(sizeof(int) * 5);  // memory leak of first malloc of 10 ints
Memory leaks can eventually cause the program to run out of heap memory space, resulting in subsequent calls to malloc() failing. The other types of memory access errors, such as invalid reads and writes, can lead to the program crashing or can result in some program memory contents being modified in seemingly mysterious ways.
Memory access errors are some of the most difficult bugs to find in programs. Often a memory access error does not immediately result in a noticeable error in the program’s execution. Instead, it may trigger an error that occurs later in the execution, often in a part of the program that seemingly has little to do with the source of the error. At other times, a program with a memory access error may run correctly on some inputs and crash on other inputs, making the cause of the error difficult to find and fix.
Using Valgrind helps a programmer identify these difficult-to-find heap memory access errors, saving significant amounts of debugging time and effort. Valgrind also helps the programmer identify lurking heap memory errors that testing and debugging did not uncover.
3.3.1. An Example Program with a Heap Memory Access Error
As an example of how difficult it can be to discover and fix programs with memory access errors, consider the following small program (bigfish.c). This program exhibits a "write to unallocated heap memory" error in the second for loop, when it assigns values beyond the bounds of the bigfish array (note: the listing includes source code line numbers, and the print_array() function definition is not shown, but it behaves as described):
 1  #include <stdio.h>
 2  #include <stdlib.h>
 3
 4  /* print size elms of array p with name name */
 5  void print_array(int *p, int size, char *name) ;
 6
 7  int main(int argc, char *argv[]) {
 8      int *bigfish, *littlefish, i;
 9
10      // allocate space for two int arrays
11      bigfish = (int *)malloc(sizeof(int) * 10);
12      littlefish = (int *)malloc(sizeof(int) * 10);
13      if (!bigfish || !littlefish) {
14          printf("Error: malloc failed\n");
15          exit(1);
16      }
17      for (i=0; i < 10; i++) {
18          bigfish[i] = 10 + i;
19          littlefish[i] = i;
20      }
21      print_array(bigfish,10, "bigfish");
22      print_array(littlefish,10, "littlefish");
23
24      // here is a heap memory access error
25      // (write beyond bounds of allocated memory):
26      for (i=0; i < 13; i++) {
27          bigfish[i] = 66 + i;
28      }
29      printf("\nafter loop:\n");
30      print_array(bigfish,10, "bigfish");
31      print_array(littlefish,10, "littlefish");
32
33      free(bigfish);
34      free(littlefish); // program will crash here
35      return 0;
36  }
In the main() function, the second for loop causes a heap memory access error when it writes to three indices beyond the bounds of the bigfish array (to indices 10, 11, and 12). The program does not crash at the point where the error occurs (at the execution of the second for loop); instead, it crashes later in its execution at the call to free(littlefish):
bigfish: 10 11 12 13 14 15 16 17 18 19
littlefish: 0 1 2 3 4 5 6 7 8 9

after loop:
bigfish: 66 67 68 69 70 71 72 73 74 75
littlefish: 78 1 2 3 4 5 6 7 8 9
Segmentation fault (core dumped)
Running this program in GDB indicates that the program crashes with a segfault at the call to free(littlefish). Crashing at this point may make the programmer suspect that there is a bug with accesses to the littlefish array. However, the error is caused by writes to the bigfish array and has nothing to do with how the program accesses the littlefish array.
The most likely reason that the program crashes is that the for loop goes beyond the bounds of the bigfish array and overwrites memory between the heap memory location of the last allocated element of bigfish and the first allocated element of littlefish. The heap memory locations between the two (and right before the first element of littlefish) are used by malloc() to store meta-data about the heap memory allocated for the littlefish array. Internally, the free() function uses this meta-data to determine how much heap memory to free. The modifications to indices 10 and 11 of bigfish overwrite these meta-data values, resulting in the program crash on the call to free(littlefish). We note, however, that not all implementations of the malloc() function use this strategy.
Because the program includes code to print out littlefish after the memory access error to bigfish, the cause of the error may be more obvious to the programmer: the second for loop is somehow modifying the contents of the littlefish array (its element 0 value "mysteriously" changes from 0 to 78 after the loop). However, even in this very small program, it may be difficult to find the real error: if the program didn’t print out littlefish after the second for loop with the memory access error, or if the for loop upper bound was 12 instead of 13, there would be no visible mysterious change to program variable values that could help a programmer see that there is an error with how the program accesses the bigfish array.
In larger programs, a memory access error of this type could be in a very different part of the program code than the part that crashes. There also may be no logical association between variables used to access heap memory that has been corrupted and the variables that were used to erroneously overwrite that same memory; instead, their only association is that they happen to refer to memory addresses that are allocated close together in the heap. Note that this situation can vary from run to run of a program and that such behavior is often hidden from the programmer. Similarly, sometimes bad memory accesses will have no noticeable effect on a run of the program, making these errors hard to discover. When a program seems to run fine for some input but crashes on other input, a memory access error is a likely cause.
Tools like Valgrind can save days of debugging time by quickly pointing programmers to the source and type of heap memory access errors in their code. In the previous program, Valgrind pinpoints where the error occurs (when the program accesses elements beyond the bounds of the bigfish array). The Valgrind error message includes the type of error, the point in the program where the error occurs, and where in the program the heap memory near the bad memory access was allocated. For example, here is the information Valgrind will display when the program executes line 27 (some details from the actual Valgrind error message are omitted):
Invalid write
   at main (bigfish.c:27)
 Address is 0 bytes after a block of size 40 alloc'd
   by main (bigfish.c:11)
This Valgrind error message says that the program is writing to invalid (unallocated) heap memory at line 27 and that this invalid memory is located immediately after a block of memory that was allocated at line 11, indicating that the loop is accessing some elements beyond the bounds of the allocated memory in heap space to which bigfish points. A potential fix for this bug is to either increase the number of bytes passed to malloc() or change the second for loop bounds to avoid writing beyond the bounds of the allocated heap memory space.
In addition to finding memory access errors in heap memory, Valgrind can also find some errors with stack memory accesses, such as using uninitialized local variables or trying to access stack memory locations that are beyond the bounds of the current stack. However, Valgrind does not detect stack memory access errors at the same granularity as it does with heap memory, and it does not detect memory access errors with global data memory.
A program can have memory access errors with stack and global memory that Valgrind cannot find. However, these errors result in erroneous program behavior or program crashing that is similar to the behavior that can occur with heap memory access errors. For example, overwriting memory locations beyond the bounds of a statically declared array on the stack may result in "mysteriously" changing the values of other local variables or may overwrite state saved on the stack that is used for returning from a function call, leading to a crash when the function returns. Experience using Valgrind for heap memory errors can help a programmer identify and fix similar errors with accesses to stack and global memory.
3.3.2. How to Use Memcheck
We illustrate some of the main features of Valgrind’s Memcheck memory analysis tool on an example program, valgrindbadprog.c, which contains several bad memory access errors (comments in the code describe the type of error). Valgrind runs the Memcheck tool by default; we depend on this default behavior in the code snippets that follow. You can explicitly specify the Memcheck tool by using the --tool=memcheck option. In later sections, we will invoke Valgrind’s profiling tools by using the --tool option.
To run Memcheck, first compile the valgrindbadprog.c program with the -g flag to add debugging information to the executable file (e.g., a.out). Then, run the executable with valgrind. Note that for non-interactive programs, it may be helpful to redirect Valgrind’s output to a file for viewing after the program exits:
$ gcc -g valgrindbadprog.c
$ valgrind -v ./a.out

# re-direct valgrind (and a.out) output to file 'output.txt'
$ valgrind -v ./a.out >& output.txt

# view program and valgrind output saved to out file
$ vim output.txt
Valgrind’s Memcheck tool prints out memory access errors and warnings as they occur during the program’s execution. At the end of the program’s execution, Memcheck also prints out a summary about any memory leaks in the program. Even though memory leaks are important to fix, the other types of memory access errors are much more critical to a program’s correctness. As a result, unless memory leaks are causing a program to run out of heap memory space and crash, a programmer should focus first on fixing these other types of memory access errors before considering memory leaks. To view details of individual memory leaks, use the --leak-check=yes option.
When first using Valgrind, its output may seem a bit difficult to parse. However, the output all follows the same basic format, and once you know this format, it’s easier to understand the information that Valgrind is displaying about heap memory access errors and warnings. Here is an example Valgrind error from a run of the valgrindbadprog.c program:
==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
Each line of Valgrind output is prefixed with the process’s ID (PID) number (31059 in this example):
==31059==
Most Valgrind errors and warnings have the following format:
- The type of error or warning.
- Where the error occurred (a stack trace at the point in the program’s execution when the error occurs).
- Where heap memory around the error was allocated (usually the memory allocation related to the error).
In the preceding example error, the first line indicates an invalid write to memory (writing to unallocated memory in the heap — a very bad error!):
==31059== Invalid write of size 1
The next few lines show the stack trace where the error occurred. These indicate an invalid write occurred at line 29 in function foo(), which was called from function main() at line 56:
==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
The remaining lines indicate where the heap space near the invalid write was allocated in the program. This section of Valgrind’s output says that the invalid write was immediately after (0 bytes after) a block of 5 bytes of heap memory space that was allocated by a call to malloc() at line 18 in function foo(), called by main() at line 56:
==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
The information from this error identifies that there is an unallocated heap memory write error in the program, and it directs the user to specific parts of the program where the error occurs (line 29) and where memory around the error was allocated (line 18). By looking at these points in the program, the programmer may see the cause of and the fix for the error:
18   c = (char *)malloc(sizeof(char) * 5);
...
22   strcpy(c, "cccc");
...
28   for (i = 0; i <= 5; i++) {
29       c[i] = str[i];
30   }
The cause is that the for loop executes one time too many, accessing c[5], which is beyond the end of array c. The fix is to either change the loop bounds at line 28 or to allocate a larger array at line 18.
If examining the code around a Valgrind error is not sufficient for a programmer to understand or fix the error, using GDB might be helpful. Setting breakpoints around the points in the code associated with the Valgrind errors can help a programmer evaluate the program’s runtime state and understand the cause of the Valgrind error. For example, by putting a breakpoint at line 29 and printing the values of i and str, the programmer can see the array out-of-bounds error when i is 5. In this case, the combination of using Valgrind and GDB helps the programmer determine how to fix the memory access bugs that Valgrind finds.
Although this section has focused on Valgrind’s default Memcheck tool, we describe some of Valgrind’s other capabilities later in the book, including the Cachegrind cache profiling tool (Chapter 11), the Callgrind code profiling tool (Chapter 12), and the Massif memory profiling tool (Chapter 12). For more information about using Valgrind, see the Valgrind homepage and its online manual.