14.8. Summary

This chapter gives an overview of multicore processors and how to program them. Specifically, we cover the POSIX Threads (or Pthreads) library and how to use it to create correct multi-threaded programs that improve on a single-threaded program’s performance. Libraries like Pthreads and OpenMP use the shared memory model of communication, in which threads share data in a common memory space.

Key Takeaways

Threads are the fundamental unit of concurrent programs

To parallelize a serial program, programmers utilize lightweight constructs known as threads. For a particular multi-threaded process, each thread has its own allocation of stack memory, but shares the program data, heap, and instructions of the process. Like processes, threads run non-deterministically on the CPU (i.e., the order of execution changes between runs, and which thread is assigned to which core is left up to the operating system).
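As a quick refresher on the basic workflow, here is a minimal sketch of creating and joining threads with Pthreads. The worker function and thread count are illustrative, not taken from this chapter's examples:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4  /* illustrative thread count */

/* each thread runs this function; the argument encodes its logical id */
void *worker(void *arg) {
    long id = (long) arg;
    printf("hello from thread %ld\n", id);  /* print order varies between runs */
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++) {
        pthread_create(&threads[i], NULL, worker, (void *) i);
    }
    for (long i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);  /* wait for each thread to finish */
    }
    return 0;
}
```

Compile with the -pthread flag (e.g., gcc -pthread). Running it several times typically shows the threads' output interleaving differently each time, illustrating the non-deterministic scheduling described above.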

Synchronization constructs ensure that programs work correctly

A consequence of the shared memory model is that threads can accidentally overwrite one another's data in shared memory. A race condition can occur whenever two concurrent operations update a shared value in an incorrect order. When that shared value is data, a special type of race condition called a data race can arise. Synchronization constructs (e.g., mutexes and semaphores) help ensure program correctness by forcing threads to execute one at a time when updating shared variables.
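The following sketch shows the typical mutex pattern for protecting a shared counter; the variable names and loop counts are illustrative assumptions, not code from this chapter:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4  /* illustrative */

long counter = 0;                                  /* shared value */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects counter */

void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* enter the critical section */
        counter++;                    /* only one thread updates at a time */
        pthread_mutex_unlock(&lock);  /* leave the critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, increment, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("counter = %ld\n", counter);  /* 400000 every run with the mutex */
    return 0;
}
```

Without the lock/unlock calls, the increments of counter can interleave and the final value varies from run to run, which is exactly the data race described above.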

Be mindful when using synchronization constructs

Synchronization inherently introduces points of serial computation into an otherwise parallel program. It is therefore important to be mindful of how synchronization constructs are used. The set of operations that must run atomically is referred to as a critical section. If a critical section is too large, the threads effectively execute serially, yielding no improvement in run time. Used carelessly, synchronization constructs can also inadvertently lead to situations like deadlock. A good strategy is to have threads employ local variables as much as possible and update shared variables only when necessary.
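A minimal sketch of that local-variable strategy, assuming the same shared counter and mutex as in the previous example: each thread accumulates privately and acquires the lock exactly once, shrinking the critical section to a single update.

```c
#include <pthread.h>

long counter = 0;                                  /* shared value */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* protects counter */

/* each thread accumulates into a local variable, then takes the lock once */
void *increment_local(void *arg) {
    long local = 0;
    for (int i = 0; i < 100000; i++) {
        local++;                      /* no synchronization needed here */
    }
    pthread_mutex_lock(&lock);        /* one short critical section */
    counter += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}
```

Substituting this function for the earlier increment function produces the same final count while holding the lock four times in total instead of 400,000 times.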

Not all components of a program are parallelizable

Some programs necessarily have large serial components that can hinder a multi-threaded program’s performance on multiple cores (see Amdahl’s Law). Even when a high percentage of a program is parallelizable, speedup is rarely linear. Readers are also encouraged to consider other metrics, such as efficiency and scalability, when evaluating the performance of their programs.
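As a concrete illustration of Amdahl's Law, the expected speedup on n cores is 1 / (S + P/n), where S is the serial fraction of the program and P = 1 - S is the parallelizable fraction. The 90%/16-core figures below are purely illustrative:

```c
#include <stdio.h>

/* Amdahl's Law: speedup = 1 / (S + P/n), where S is the serial fraction,
 * P = 1 - S is the parallel fraction, and n is the number of cores. */
double amdahl_speedup(double parallel_fraction, int cores) {
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}

int main(void) {
    /* illustrative numbers: 90% parallelizable on 16 cores -> about 6.4x */
    printf("speedup: %.1f\n", amdahl_speedup(0.9, 16));
    return 0;
}
```

Even with 90% of the work parallelized, 16 cores yield only about a 6.4x speedup, far short of linear scaling, because the 10% serial portion dominates as n grows.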

Further Reading

This chapter is meant to give a taste of concurrency topics with threads; it is by no means exhaustive. To learn more about programming with POSIX threads and OpenMP, check out the excellent Pthreads and OpenMP tutorials by Blaise Barney of Lawrence Livermore National Laboratory. For automated tools that help debug parallel programs, readers are encouraged to check out the Helgrind and DRD Valgrind tools.

In the last chapter of the book, we give a high-level overview of other common parallel architectures and how to program them. Read on to learn more.