12.4. Key Takeaways and Summary

Our short (and perhaps frustrating) journey into code optimization should convey one very important message to the reader: if you are thinking about manually optimizing your code, think carefully about what is worth your time and what should be left to the compiler. Below are some important tips to consider when looking to improve code performance:

Choose good data structures and algorithms

There is no substitute for using proper algorithms and data structures; it is often the top reason for poor performance in code. For example, the famous Sieve of Eratosthenes algorithm is a much more efficient way to generate prime numbers than our custom algorithm in optExample, and yields a significant improvement in performance. The listing below shows the time needed to generate all prime numbers between 2 and 5 million using an implementation of the sieve:

$ gcc -o genPrimes genPrimes.c
$ ./genPrimes 5000000
Found 348513 primes (0.122245 s)

The sieve algorithm requires only 0.12 seconds to find all the prime numbers between 2 and 5 million, compared to the 1.46 seconds it takes optExample2 to generate the same set of primes with the -O3 optimization flags turned on (12x improvement). The implementation of the sieve algorithm is left as an exercise to the reader; however, it should be clear that choosing a better algorithm upfront would have saved hours of tedious optimization effort. Our example demonstrates why a knowledge of data structures and algorithms is foundational for computer scientists.

Use standard library functions whenever possible

Don’t reinvent the wheel. If in the course of programming you need a function that should do something very standard (e.g. find the absolute value, find the max or min of a list of numbers), stop and check to see if the function already exists as part of the higher level language’s standard library. Functions in the standard libraries are well tested and tend to be optimized for performance. For example, if a reader manually implemented their own version of the sqrt() function, the compiler may not have known to automatically replace the function call with the fsqrt instruction.

Optimize based on data and not on feelings

If after choosing the best data structures and algorithms and employing standard library functions, additional improvements in performance are required, enlist the help of a good code profiler like Valgrind. Optimization should never be based on gut feelings. Concentrating too much on what one feels should be optimized (without the data to back up the thought) often leads to wasted time.

Split complex code into multiple functions

Manually in-lining code usually does not result in a sizable performance gain over what modern compilers can achieve. Instead, make it easier for your compiler to help optimize for you. Compilers have an easier time optimizing shorter code segments. Splitting complex operations into multiple functions simultaneously increases code readability and makes it easier for a compiler to optimize. Check to see if your compiler attempts in-lining by default or has a separate flag to attempt in-lining code. It is better to let your compiler perform in-lining instead of manually doing it yourself.

Prioritize code readability

In many applications today, readability is king. The truth is that code is read more often than it is written. Many companies spend considerable time training their software engineers to write code in a very particular way to maximize readability. If optimizing your code results in a noticeable hit to code readability, it is important to check if the gained performance improvement is worth the hit. For example, many compilers today have optimization flags that enable loop unrolling. Programmers should always use available optimization flags for loop unrolling instead of trying to manually unroll loops, which can lead to a significant hit in code readability. Reducing code readability often increases the likelihood that bugs are inadvertently introduced into code, which can lead to security vulnerabilities.

Pay attention to memory use

A program’s memory usage often has a bigger impact on a program’s execution time than the number of instructions that the program executes. The loop interchange example exemplifies this point. In both cases, the loop executes the same number of instructions. However, the ordering of the loops have a significant impact on memory access and locality. Remember to also explore memory profiling tools like massif and cachegrind when attempting to optimize a program.

Compilers are constantly improving

Compiler writers continually update compilers to perform more sophisticated optimizations safely. For example, GCC switched to the link: Static Single Assignment (SSA) form starting in version 4.0, which significantly improved the effects of some of its optimizations. The GRAPHITE branch of the GCC code base implements the polyhedral model, which allows the compiler to perform more complex types of loop transformations. As compilers get more sophisticated, the benefits of manual compilation significantly reduce.