In the previous chapter, we discussed shared memory parallelism and multi-threaded programming. In this chapter, we introduce other parallel programming models and languages for different classes of architecture. Namely, we introduce parallelism for: hardware accelerators focusing on graphics processing units (GPUs) and general purpose computing on GPUs (GPGPU computing), using CUDA as an example; distributed memory systems and message passing, using MPI as an example; and Cloud computing, using MapReduce and Apache Spark as examples.
A Whole New World: Flynn’s Taxonomy of Architecture
Flynn’s Taxonomy is commonly used to describe the ecosystem of modern computing architecture (Figure 1).
The horizontal axis refers to the data stream, while the vertical axis refers to the instruction stream. A stream in this context is a flow of data or instructions. A single stream issues a one element per time unit, similar to a queue. In contrast, multiple streams typically issue many elements per time unit (think of multiple queues). Thus, a single instruction stream (SI) issues a single instruction per time unit, while a multiple instruction stream (MI) issues many instructions per time unit. Likewise a single data stream (SD) issues one data element per time unit, while a multiple data stream (MD) issues many data elements per time unit.
A processor can be classified into one of four categories based on the types of streams it employs:
SISD On a SISD machine, there is a single control unit processing a single stream of instructions, allowing it to execute only one instruction at a time. Likewise, the processor can only process a single stream of data, or process one data unit at a time. Most commercially-available processors prior to the mid-2000s were SISD machines.
MISD systems (Multiple Instruction/Single Data) have multiple instruction units performing on a single data stream. MISD systems were typically designed for incorporating fault tolerance in mission-critical systems, such as the flight control programs for NASA shuttles. That said, MISD machines are rarely used in practice anymore.
SIMD systems (Single Instruction/Multiple Data) execute the same instruction on multiple data simultaneously and in lock-step fashion. During a "Lock-step" execution, all instructions are placed into a queue, while data is distributed amongst different compute units. During execution, each compute unit executes the first instruction in the queue simultaneously, before simultaneously executing the next instruction in the queue, and then the next, and so forth. The most well known example of the SIMD architecture is the Graphics Processing Unit (GPU). Early supercomputers also followed the SIMD architecture. We discuss GPUs more in the next section.
MIMD systems (Multiple Instruction/Multiple Data) represent the most widely used architecture class. They are extremely flexible and have the ability to work on multiple instructions or multiple data streams. Since nearly all modern computers use multicore CPUs, most are classified as MIMD machines. We discuss another class of MIMD systems, distributed memory systems, in Section 15.2.