12. Code Optimization
Code optimization is the process by which a program is improved by reducing its code size, complexity, memory use, or runtime (or some combination thereof) without changing the program’s inherent function. Many compilation systems include a code optimizer as an intermediate step. Specifically, an optimizing compiler applies code-improving transformations as part of the compilation process. Virtually all modern compilers (including GCC) are optimizing compilers. The GCC C compiler implements a wide variety of optimization flags that give programmers direct access to a subset of the implemented optimizations. Compiler optimization flags optimize code at the expense of compile time and ease of debugging. For simplicity, GCC wraps up a subset of these optimization flags into different optimization levels that the programmer can directly invoke. For example, the following command compiles a sample program with level 1 optimizations:
$ gcc -O1 -o program program.c
The level 1 (-O1 or -O) optimizations in GCC perform basic optimizations to reduce code size and execution time while attempting to keep compile time to a minimum. Level 2 (-O2) optimizations include most of GCC's implemented optimizations that do not involve a space-performance trade-off. Lastly, level 3 (-O3) performs additional optimizations (such as function inlining, discussed later in this chapter) and may cause the program to take significantly longer to compile. The GCC documentation describes the implemented optimization flags in detail.
A detailed discussion of optimizing compilers and their construction and operation is beyond the scope of this textbook; we encourage interested readers to check out the seminal text, Compilers: Principles, Techniques, and Tools, by Aho, Sethi, and Ullman. Rather, the purpose of this chapter is to highlight some things that most compilers can (and cannot) do, and how programmers can partner with their compilers and profiling tools to help improve their code.
What Compilers Already Do
Several of the common optimizations performed by virtually every compiler are described briefly in the upcoming sections. Students should never manually implement these optimizations, because they are already implemented by the compiler.
- Constant Folding: Constants in the code are evaluated at compile time to reduce the number of resulting instructions. For example, in the code snippet that follows, macro expansion replaces the statement int debug = N - 5 with int debug = 5 - 5. Constant folding then updates this statement to int debug = 0.
#define N 5
int debug = N - 5; //constant folding changes this statement to debug = 0;
- Constant Propagation: Constant propagation replaces variables with a constant value if such a value is known at compile time. Consider the following code segment:
int debug = 0;
//sums up all the elements in an array
int doubleSum(int *array, int length){
    int i, total = 0;
    for (i = 0; i < length; i++){
        total += array[i];
        if (debug) {
            printf("array[%d] is: %d\n", i, array[i]);
        }
    }
    return 2 * total;
}
A compiler employing constant propagation will change if (debug) to if (0).
- Dead Code Elimination: It is not uncommon for a program to be littered with unused variables, assignments, or statements. Even though these unneeded statements are rarely introduced intentionally, they are often a natural by-product of the constant iteration and refinement of the software development cycle. If left undetected, these so-called dead code sequences can cause compilers to output unnecessary assembly instructions that in turn waste processing time. Most compilers employ techniques such as dataflow analysis to identify unreachable code segments and remove them. Dead code elimination often makes a program faster by shrinking code size and the associated set of instructions. As an example, let's revisit the doubleSum function, in which the compiler employed constant propagation to replace debug with 0 in the if statement:
int debug = 0;
//sums up all the elements in an array
int doubleSum(int *array, int length){
    int i, total = 0;
    for (i = 0; i < length; i++){
        total += array[i];
        if (0) { //debug replaced by constant propagation by compiler
            printf("array[%d] is: %d\n", i, array[i]);
        }
    }
    return 2 * total;
}
A compiler employing dataflow analysis recognizes that the if statement always evaluates to false and that the printf statement never executes. The compiler therefore eliminates the if statement and the call to printf in the compiled executable. Another pass also eliminates the statement debug = 0.
- Simplifying Expressions: Some instructions are more expensive than others. For example, the imul and idiv arithmetic instructions in assembly take a long time to execute. Compilers commonly attempt to reduce the number of expensive instructions by simplifying mathematical operations whenever possible. For example, in the doubleSum function, the compiler may replace the expression 2 * total with total + total because the addition instruction is less expensive than multiplication:
//declaration of debug removed through dead-code elimination
//sums up all the elements in an array
int doubleSum(int *array, int length){
    int i, total = 0;
    for (i = 0; i < length; i++){
        total += array[i];
        //if statement removed through data-flow analysis
    }
    return total + total; //simplifying expression
}
Likewise, the compiler will transform code sequences with bit-shifting and other bitwise operators to simplify expressions. For example, the compiler may replace the expression total * 8 with total << 3, or the expression total % 8 with total & 7, given that bitwise operations are performed with a single fast instruction.
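The following small snippet (written for this discussion, not drawn from any compiler's output) checks these equivalences. Note that the modulus-to-mask rewrite is guaranteed only for non-negative operands; for negative signed values, total % 8 and total & 7 differ, so compilers apply it only when they can prove the operand's sign or emit a short fix-up sequence:
#include <assert.h>

int main(void) {
    unsigned int total = 37;
    assert((total * 8) == (total << 3));  //shifting left by 3 multiplies by 2^3
    assert((total % 8) == (total & 7));   //masking keeps the 3 low-order bits
    return 0;
}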
What Compilers Cannot Always Do: Benefits of Learning Code Optimization
Given the benefits of optimizing compilers, it may not be immediately obvious why learning code optimization is useful. It may be tempting to think of the compiler as a magical black box that is "smart." At the end of the day, the compiler is a piece of software that performs a series of code transformations in an effort to speed up code. Compilers are also limited in the types of optimizations they can perform.
- Algorithmic Strength Reduction Is Impossible: The top reason for poor code performance is bad choices of data structures and algorithms. Compilers cannot magically fix these bad decisions. For example, a compiler will never optimize a program implementing bubble sort into one that implements quick sort. While the sophistication of compilers and their optimizations continues to improve, the quality of any individual compiler's optimizations varies between platforms. The onus is therefore on the programmer to ensure that their code leverages the best algorithms and data structures.
- Compiler Optimization Flags Are Not Guaranteed to Make Code "Optimal" (or Consistent): Increasing the level of compiler optimizations (e.g., from -O2 to -O3) may not always decrease the runtime of a program. Sometimes, the programmer may discover that updating the optimization flags from -O2 to -O3 slows down a program, or yields no performance increase at all. In other cases, a programmer may discover that a program compiled without optimization flags seemingly yields no errors, whereas compiling it with -O2 or -O3 results in segmentation faults or other errors. These types of programming errors are especially difficult to debug, because GCC's debug (-g) flag does not combine well with its optimization (-O) flags: the transformations performed at the -O levels interfere with the debugger's ability to analyze the underlying code. The -g flag is required by many common debugging and profiling tools, such as GDB and Valgrind.
One large reason for inconsistent behavior is that the C/C++ standard does not provide clear guidance for resolving undefined behavior. As a result, it is often up to the compiler to decide how to resolve ambiguity. Inconsistencies in how different optimization levels handle undefined behavior can cause answers to change. Consider the following example from John Regehr [1]:
int silly(int a) {
    return (a + 1) > a;
}
Suppose that silly was run with a = INT_MAX. In this case, the computation a + 1 results in signed integer overflow. However, the C/C++ standard does not define how integer overflow should be handled by the compiler. In fact, compiling the program with no optimizations causes the function to return 0, while compiling it with -O3 optimizations results in the function returning 1.
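A small driver program (added here for illustration; it is not part of Regehr's original example) makes the difference easy to observe. Because the behavior is undefined, other compilers or versions may legitimately produce something else:
#include <limits.h>
#include <stdio.h>

int silly(int a) {
    return (a + 1) > a;  //signed overflow when a == INT_MAX: undefined behavior
}

int main(void) {
    //reported results: 0 when compiled without optimizations, 1 with -O3
    printf("silly(INT_MAX) = %d\n", silly(INT_MAX));
    return 0;
}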
In short, optimization flags should be used with caution, thoughtfully, and when necessary. Learning which optimization flags to employ can also help the programmer work with their compiler instead of against it.
The compiler is not required to handle undefined behavior
The C language standard does not specify what a compiler must do when a program invokes undefined behavior, such as signed integer overflow; the compiler is free to assume it never happens and to transform the code accordingly (see the comp.lang.c FAQ [2]).
- Pointers Can Prove Problematic: Recall that the compiler makes transformations that leave the fundamental behavior of the source program unchanged. If a transformation risks changing the behavior of the program, the compiler will not make the transformation. This is especially true in the case of memory aliasing, where two different pointers point to the same address in memory. As an example, consider the following function shiftAdd, which takes two integer pointers as its parameters. The function multiplies the first number by 10 and adds the second number to it. So, if the shiftAdd function were passed the integers 5 and 6, the result would be 56.
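A sketch of the two versions, reconstructed from the description above (the exact listing may differ slightly): shiftAdd updates *a in two separate steps, while shiftAddOpt folds the work into a single statement:
//multiplies *a by 10 and adds *b to it, using two separate updates of *a
void shiftAdd(int *a, int *b) {
    *a = *a * 10;  //multiply the first number by 10
    *a += *b;      //add the second number
}

//optimized version: reads and writes *a only once
void shiftAddOpt(int *a, int *b) {
    *a = (*a * 10) + *b;
}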
The shiftAddOpt function optimizes the shiftAdd function by removing an additional memory reference to a, resulting in a smaller set of instructions in the compiled assembly.
However, the compiler will never make this optimization, due to the risk of memory aliasing. To understand why, consider the following main function:
int main(void){
    int x = 5;
    int y = 6;
    shiftAdd(&x, &y); //should produce 56
    printf("shiftAdd produces: %d\n", x);
    x = 5; //reset x
    shiftAddOpt(&x, &y); //should produce 56
    printf("shiftAddOpt produces: %d\n", x);
    return 0;
}
Compiling and running this program gives the expected output:
$ gcc -o shiftadd shiftadd.c
$ ./shiftadd
shiftAdd produces: 56
shiftAddOpt produces: 56
Suppose, instead, that the program were modified so that a pointer to x is passed as both arguments to shiftAdd (and shiftAddOpt):
int main(void){
    int x = 5;
    shiftAdd(&x, &x); //should produce 55
    printf("shiftAdd produces: %d\n", x);
    x = 5; //reset x
    shiftAddOpt(&x, &x); //should produce 55
    printf("shiftAddOpt produces: %d\n", x);
    return 0;
}
The expected output is 55. However, recompiling and rerunning the updated code gives two different outputs:
$ gcc -o shiftadd shiftadd.c
$ ./shiftadd
shiftAdd produces: 100
shiftAddOpt produces: 55
Retracing through the shiftAdd functions with the assumption that a and b point to the same memory location reveals the issue. The multiplication of *a by 10 in shiftAdd updates x to 50. Next, adding *b to *a in shiftAdd doubles x to 100. The risk of memory aliasing reveals that shiftAdd and shiftAddOpt are not in fact equivalent, though the programmer may have intended them to be. To fix this issue, recognize that the second parameter of shiftAdd does not need to be passed in as a pointer. Replacing the second parameter with an integer eliminates the risk of aliasing and allows the compiler to optimize one function into the other:
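A sketch of the fixed versions (again reconstructed from the description): the second parameter is now a plain int, so a and b can no longer refer to the same memory location:
//fixed version: the second operand is passed by value
void shiftAdd(int *a, int b) {
    *a = *a * 10;  //multiply the first number by 10
    *a += b;       //add the second number
}

//fixed optimized version: equivalent to shiftAdd even if the caller reuses x
void shiftAddOpt(int *a, int b) {
    *a = (*a * 10) + b;
}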
Removing the unneeded memory reference allows the programmer to maintain the readability of the original shiftAdd function while enabling the compiler to optimize the function.
Partnering with Your Compiler: A Sample Program
In the following sections, we concentrate on learning more about popular types of optimizations and discuss programming and profiling strategies to help make it easier for compilers to optimize our code. To guide our discussion, we will work to optimize the following (suboptimally written) program that attempts to find all the prime numbers between 2 and n:
#include <math.h> //for sqrt; the program is linked with -lm

//helper function: checks to see if a number is prime
int isPrime(int x) {
    int i;
    for (i = 2; i < sqrt(x) + 1; i++) { //no prime number is less than 2
        if (x % i == 0) { //if the number is divisible by i
            return 0; //it is not prime
        }
    }
    return 1; //otherwise it is prime
}

// finds the next prime
int getNextPrime(int prev) {
    int next = prev + 1;
    while (!isPrime(next)) { //while the number is not prime
        next++; //increment and check again
    }
    return next;
}

// generates a sequence of primes
int genPrimeSequence(int *array, int limit) {
    int i;
    int len = limit;
    if (len == 0) return 0;
    array[0] = 2; //initialize the first number to 2
    for (i = 1; i < len; i++) {
        array[i] = getNextPrime(array[i-1]); //fill in the array
        if (array[i] > limit) {
            len = i;
            return len;
        }
    }
    return len;
}

int main(int argc, char **argv) {
    //omitted for brevity
    int *array = allocateArray(limit);
    int length = genPrimeSequence(array, limit);
    return 0;
}
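One way to produce timings like those in the table below is to wrap the call to genPrimeSequence with gettimeofday. The following harness is an illustrative sketch rather than the book's exact measurement code; it assumes allocateArray (from the omitted portion of main above) returns a heap-allocated array of the requested length:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    int limit = 5000000;               //upper bound on the primes to generate
    int *array = allocateArray(limit); //assumed helper from the listing above

    struct timeval start, end;
    gettimeofday(&start, NULL);        //record the start time
    int length = genPrimeSequence(array, limit);
    gettimeofday(&end, NULL);          //record the end time

    double seconds = (end.tv_sec - start.tv_sec) +
                     (end.tv_usec - start.tv_usec) / 1e6;
    printf("generated %d primes in %g seconds\n", length, seconds);

    free(array);
    return 0;
}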
Table 3 shows the timing results for producing the primes between 2 and 5,000,000 with the different optimization level flags using the following basic compilation command:
$ gcc -o optExample optExample.c -lm
| Unoptimized | -O1 | -O2 | -O3 |
|---|---|---|---|
| 3.86 s | 2.32 s | 2.14 s | 2.15 s |
The fastest observed time with optimization flags is approximately 2.14 seconds. Although using optimization flags does shave off more than a second from the runtime of this program, upping the optimization flags provides minimal improvement. In the next sections, we will discuss how we can modify our program to make it easier for the compiler to optimize.
References
1. John Regehr. "A Guide to Undefined Behavior in C and C++, Part 1." https://blog.regehr.org/archives/213
2. comp.lang.c FAQ list: Question 11.33. http://c-faq.com/ansi/undef.html