This chapter explored the characteristics of computer storage devices and their trade-offs with respect to key measures like access latency, storage capacity, transfer latency, and cost. Because devices embody many disparate design and performance trade-offs, they naturally form a memory hierarchy, which arranges them according to their capacity and access time. At the top of the hierarchy, primary storage devices like CPU caches and main memory quickly provide data directly to the CPU, but their capacity is limited. Lower in the hierarchy, secondary storage devices like solid-state drives and hard disks offer dense bulk storage at the cost of performance.
Because modern systems require both high capacity and good performance, system designers build computers with multiple forms of storage. Crucially, the system must manage which storage device holds any particular chunk of data. Systems aim to store data that’s being actively used in faster storage devices, and they relegate infrequently used data to slower storage devices.
To determine which data is being used, systems rely on program data access patterns known as locality. Programs exhibit two important types of locality:
Temporal Locality: Programs tend to access the same data repeatedly over time.
Spatial Locality: Programs tend to access data that is nearby other, previously accessed data.
Locality serves as the basis for CPU caches, which store a small subset of main memory in fast storage directly on the CPU chip. When a program attempts to access main memory, the CPU first checks for the data in the cache; if it finds the data there, it can avoid the more costly trip to main memory.
When a program issues a request to read or write memory, it provides the address of the memory location that it wants to access. CPU caches use three sections of the bits in a memory address to identify which subset of main memory a cache line stores:
The middle index bits of an address map the address to a storage location in the cache.
The high-order tag bits uniquely identify which subset of memory the cache location stores.
The low-order offset bits identify which bytes of stored data the program wants to access.
Finally, this chapter concluded by demonstrating how the Cachegrind tool can enable cache performance profiling for a running program. Cachegrind simulates a program’s interaction with the cache hierarchy and collects statistics about a program’s use of the cache (for example, the hit and miss rates).