14.8. Summary

This chapter provided an overview of multicore processors and how to program them. Specifically, we covered the POSIX threads (Pthreads) library and how to use it to create correct multithreaded programs that speed up a single-threaded program. Libraries like Pthreads and OpenMP employ the shared memory model of communication, in which threads share data in a common memory space.

Key Takeaways

Threads are the fundamental unit of concurrent programs

To parallelize a serial program, programmers use lightweight constructs known as threads. Each thread in a multithreaded process has its own allocation of stack memory but shares the program data, heap, and instructions of the process. Like processes, threads run nondeterministically on the CPU: the order of execution changes from run to run, and which thread is assigned to which core is left up to the operating system.
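
As a concrete illustration, consider the minimal Pthreads sketch below (the thread count, worker function name, and file name are illustrative, not taken from the chapter). Each thread prints from a variable on its own private stack, while the program's instructions and any global or heap data are shared; running the program repeatedly typically shows the messages in a different order each time.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4   /* illustrative; any small number works */

    /* Thread function: my_id lives on this thread's private stack, while the
     * program's instructions (and any globals or heap data) are shared. */
    void *worker(void *arg) {
        long my_id = (long) arg;
        printf("hello from thread %ld\n", my_id);  /* order varies run to run */
        return NULL;
    }

    int main(void) {
        pthread_t tids[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++) {
            pthread_create(&tids[i], NULL, worker, (void *) i);
        }
        for (long i = 0; i < NUM_THREADS; i++) {
            pthread_join(tids[i], NULL);   /* wait for each thread to finish */
        }
        return 0;
    }

Compiling with a command like gcc -pthread hello_threads.c (the file name is hypothetical) and running it a few times makes the nondeterministic scheduling visible.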

Synchronization constructs ensure that programs work correctly

A consequence of shared memory is that threads can accidentally overwrite data residing in shared memory. A race condition occurs whenever the correctness of a computation depends on the order in which two or more concurrent operations execute. When those operations update shared data, a special type of race condition called a data race can arise. Synchronization constructs (mutexes, semaphores, etc.) help guarantee program correctness by ensuring that threads execute one at a time when updating shared variables.
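
For example, the sketch below (a minimal example with a hypothetical shared counter and illustrative thread and iteration counts) shows how a Pthreads mutex serializes updates so the final value is deterministic; removing the lock and unlock calls would turn counter++ into a data race.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4        /* illustrative */
    #define NUM_ITERS   100000   /* illustrative */

    long counter = 0;            /* shared by all threads */
    pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg) {
        for (int i = 0; i < NUM_ITERS; i++) {
            pthread_mutex_lock(&counter_lock);   /* one thread updates at a time */
            counter++;                           /* unsynchronized, this would be a data race */
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tids[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tids[i], NULL, increment, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tids[i], NULL);
        printf("final counter: %ld\n", counter);  /* NUM_THREADS * NUM_ITERS with the lock */
        return 0;
    }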

Be mindful when using synchronization constructs

Synchronization inherently introduces points of serial computation in an otherwise parallel program. It is therefore important to be mindful of how one uses synchronization constructs. The set of operations that must run atomically is referred to as a critical section. If a critical section is too large, the threads effectively execute serially, yielding no improvement in runtime. Careless use of synchronization constructs can also lead to problems such as deadlock. A good strategy is to have threads employ local variables as much as possible and update shared variables only when necessary.
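
One way to apply that strategy is sketched below (only the thread function is shown; the sum_args_t struct and its field names are hypothetical): each thread accumulates into a stack-local variable and enters the critical section just once to add its partial result to the shared total.

    #include <pthread.h>

    /* hypothetical per-thread argument struct */
    typedef struct {
        long *array;            /* shared input array */
        long start, end;        /* this thread's slice of the array */
        long *total;            /* shared result */
        pthread_mutex_t *lock;  /* protects *total */
    } sum_args_t;

    void *partial_sum(void *arg) {
        sum_args_t *a = (sum_args_t *) arg;
        long local = 0;                    /* private: lives on this thread's stack */
        for (long i = a->start; i < a->end; i++) {
            local += a->array[i];          /* no lock needed for purely local work */
        }
        pthread_mutex_lock(a->lock);
        *a->total += local;                /* short critical section: one update per thread */
        pthread_mutex_unlock(a->lock);
        return NULL;
    }

Locking around a single addition, rather than around the whole loop, keeps the serial portion of the computation small.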

Not all components of a program are parallelizable

Some programs necessarily contain large serial components that limit a multithreaded program’s performance on multiple cores, a constraint quantified by Amdahl’s Law. Even when a high percentage of a program is parallelizable, speedup is rarely linear. Readers are also encouraged to consider other metrics, such as efficiency and scalability, when evaluating the performance of their programs.
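
As a reminder of the relationship, Amdahl’s Law bounds the speedup S achievable on c cores for a program whose parallelizable fraction is P:

    S(c) = \frac{1}{(1 - P) + \frac{P}{c}}

For instance, with P = 0.9 and c = 8 the bound is 1 / (0.1 + 0.9/8) ≈ 4.7, well short of an 8x linear speedup; the corresponding efficiency is roughly 4.7 / 8 ≈ 0.59.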

Further Reading

This chapter is meant to give a taste of concurrency topics with threads; it is by no means exhaustive. To learn more about programming with POSIX threads and OpenMP, check out the excellent Pthreads and OpenMP tutorials by Blaise Barney of Lawrence Livermore National Laboratory. For automated help with debugging parallel programs, readers are encouraged to check out the Valgrind tools Helgrind and DRD.

In the final chapter of the book, we give a high-level overview of other common parallel architectures and how to program them. Read on to learn more.