4.7. Integer Byte Order

So far, this chapter has described several schemes for encoding numbers with bits, but it hasn’t mentioned how the values are organized in memory. For modern systems, the smallest addressable unit of memory is a byte, which consists of 8 bits. Consequently, to store a one-byte value (e.g., a variable of type char) starting at address X, you don’t really have any options — just store the byte at location X.

However, for multi-byte values (e.g., variables of type short or int), the hardware has more options for assigning a value’s bytes to memory addresses. For example, consider a two-byte short variable s whose bytes are labeled A (containing the high-order bits of s) and B (containing the low-order bits of s). When a system is asked to store a short like s at address X (i.e., in addresses X and X+1), it must define which byte of the variable (A or B) should occupy which address (X or X+1). Figure 1 shows the two options for storing s in memory:

In the first layout, byte A occupies address X, and byte B occupies address X+1.  In the other layout, their positions are reversed.
Figure 1. Two potential memory layouts for a two-byte short starting at memory address X.

The byte order (or endianness) of a system defines how its hardware assigns the bytes of a multi-byte variable to consecutive memory addresses. While byte order is rarely an issue for programs that only run on a single system, it might appear surprising if one of your programs attempts to print bytes one-at-a-time or if you’re examining variables with a debugger.

For example, consider the following program:

#include <stdio.h>

int main(int argc, char **argv) {
    // Initialize a four-byte integer with easily-distinguishable byte values
    int value = 0xAABBCCDD;

    // Initialize a character pointer to the address of the integer.
    char *p = (char *) &value;

    // For each byte in the integer, print its memory address and value.
    int i;
    for (i = 0; i < sizeof(value); i++) {
        printf("Address: %p, Value: %02hhX\n", p, *p);
        p += 1;
    }

    return 0;
}

The program above allocates a four-byte integer and initializes the bytes, in order from most to least significant, to the hexadecimal values 0xAA, 0xBB, 0xCC, and 0xDD. It then prints the bytes one at a time starting from the base address of the integer. You’d be forgiven for expecting the bytes to print in alphabetical order. However, commonly used CPU architectures (i.e., x86 and most ARM hardware) print the bytes in reverse order when executing the example program:

$ ./a.out
Address: 0x7ffc0a234928, Value: DD
Address: 0x7ffc0a234929, Value: CC
Address: 0x7ffc0a23492a, Value: BB
Address: 0x7ffc0a23492b, Value: AA

x86 CPUs store integers in a little-endian format — from the least significant byte ("little end") to the most significant byte in consecutive addresses. Other big-endian CPU architectures store multi-byte integers in the opposite order. Figure Figure 2 depicts a four-byte integer in the (a) big-endian and (b) little-endian layouts.

In the big-endian format, byte AA occupies position X, and the bytes proceed in alphabetical order in consecutive addresses.  In the little-endian format, byte DD occupies position X, and the bytes proceed in reverse alphabetical order.
Figure 2. The memory layout of a four-byte integer in the (a) big-endian and (b) little-endian formats.

The seemingly strange "endian" terminology originates from Jonathan Swift’s satirical novel Gulliver’s Travels (1726)1. In the story, Gulliver finds himself among two empires of six-inch-tall people who are fighting a war over the proper method for breaking eggs. The "big-endian" empire of Blefuscu cracks the large end of their eggs, whereas people in the "little-endian" empire of Lilliput crack the small end.

In the computing world, whether a system is big-endian or little-endian typically only affects programs that communicate across machines (e.g., over a network). When communicating data between systems, both systems must agree on the byte order for the receiver to properly interpret the value. In 1980, Danny Cohen authored a note to the Internet Engineering Task Force (IETF) titled On Holy Wars and a Plea for Peace 2. In that note, Cohen adopts Swift’s "endian" terminology and suggests that the IETF adopts a standard byte order for network transmissions. The IETF eventually adopted big-endian as the "network byte order" standard.

The C language provides two libraries that allow a program to reorder an integer’s bytes3,4 for communication purposes.

4.7.1. References