4.7. Integer Byte Order
So far, this chapter has described several schemes for encoding numbers with
bits, but it hasn’t mentioned how the values are organized in memory. For
modern systems, the smallest addressable unit of memory is a byte, which
consists of eight bits. Consequently, to store a one-byte value (e.g., a
variable of type char) starting at address X, you don’t really have any
options: just store the byte at location X.
However, for multibyte values (e.g., variables of type short or int), the
hardware has more options for assigning a value’s bytes to memory addresses.
For example, consider a two-byte short variable s whose bytes are labeled A
(containing the high-order bits of s) and B (containing the low-order bits of
s). When a system is asked to store a short like s at address X (i.e., in
addresses X and X+1), it must define which byte of the variable (A or B)
should occupy which address (X or X+1). Figure 1 shows the two options for
storing s in memory.
The byte order (or endianness) of a system defines how its hardware assigns the bytes of a multibyte variable to consecutive memory addresses. Although byte order is rarely an issue for programs that only run on a single system, it might appear surprising if one of your programs attempts to print bytes one at a time or if you’re examining variables with a debugger.
For example, consider the following program:
#include <stdio.h>

int main(int argc, char **argv) {
    // Initialize a four-byte integer with easily distinguishable byte values.
    int value = 0xAABBCCDD;

    // Initialize a character pointer to the address of the integer.
    unsigned char *p = (unsigned char *) &value;

    // For each byte in the integer, print its memory address and value.
    size_t i;
    for (i = 0; i < sizeof(value); i++) {
        // %p expects a void pointer, so cast p explicitly.
        printf("Address: %p, Value: %02hhX\n", (void *) p, *p);
        p += 1;
    }

    return 0;
}
This program allocates a four-byte integer and initializes the bytes, in
order from most to least significant, to the hexadecimal values 0xAA, 0xBB,
0xCC, and 0xDD. It then prints the bytes one at a time starting from the
base address of the integer. You’d be forgiven for expecting the bytes to
print in alphabetical order (AA, BB, CC, DD). However, on commonly used CPU
architectures (e.g., x86 and most ARM hardware), the example program prints
the bytes in reverse order:
$ ./a.out
Address: 0x7ffc0a234928, Value: DD
Address: 0x7ffc0a234929, Value: CC
Address: 0x7ffc0a23492a, Value: BB
Address: 0x7ffc0a23492b, Value: AA
x86 CPUs store integers in a little-endian format: from the least-significant byte ("little end") to the most-significant byte in consecutive addresses. Big-endian CPU architectures store multibyte integers in the opposite order, starting with the most-significant byte at the lowest address. Figure 2 depicts a four-byte integer in the (a) big-endian and (b) little-endian layouts.
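One way to check which layout a machine uses is to inspect the byte stored at an integer's base address. The following is a minimal sketch rather than a standard facility; the function name host_is_little_endian is hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the host stores the least-significant
 * byte of an integer at the lowest address (little-endian), 0 otherwise. */
int host_is_little_endian(void) {
    uint32_t probe = 1;  /* Only the least-significant byte is nonzero. */

    /* Accessing an object through an unsigned char pointer is always
     * permitted in C, so this reads the byte at the base address. */
    unsigned char *first = (unsigned char *) &probe;

    return *first == 1;
}
```

On an x86 machine, host_is_little_endian() returns 1 because the byte holding the value 1 occupies the lowest address.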
The seemingly strange "endian" terminology originates from Jonathan Swift’s satirical novel Gulliver’s Travels (1726) [1]. In the story, Gulliver finds himself caught between two empires of six-inch-tall people who are fighting a war over the proper method for breaking eggs. The "big-endian" empire of Blefuscu cracks the large end of their eggs, whereas people in the "little-endian" empire of Lilliput crack the small end.
In the computing world, whether a system is big-endian or little-endian typically affects only programs that communicate across machines (e.g., over a network). When communicating data between systems, both systems must agree on the byte order for the receiver to properly interpret the value. In 1980, Danny Cohen authored an Internet Experiment Note titled On Holy Wars and a Plea for Peace [2]. In that note, Cohen adopted Swift’s "endian" terminology and suggested that the Internet community adopt a standard byte order for network transmissions. The IETF eventually adopted big-endian as the "network byte order" standard.
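To illustrate how two systems can agree on a byte order regardless of their own hardware, the sketch below writes a 32-bit value into a byte buffer in big-endian order using shifts, and reads it back the same way. The function names put_u32_be and get_u32_be are hypothetical, introduced here only for illustration. Because shifts operate on the value rather than on its memory layout, the code should behave the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical helper: store value in buf[0..3], most-significant byte
 * first (big-endian, i.e., network byte order). */
void put_u32_be(uint8_t buf[4], uint32_t value) {
    buf[0] = (uint8_t) (value >> 24);  /* most-significant byte */
    buf[1] = (uint8_t) (value >> 16);
    buf[2] = (uint8_t) (value >> 8);
    buf[3] = (uint8_t) value;          /* least-significant byte */
}

/* Hypothetical helper: rebuild a 32-bit value from four big-endian bytes. */
uint32_t get_u32_be(const uint8_t buf[4]) {
    return ((uint32_t) buf[0] << 24) |
           ((uint32_t) buf[1] << 16) |
           ((uint32_t) buf[2] << 8)  |
           (uint32_t) buf[3];
}
```

A sender would call put_u32_be before transmitting the four bytes, and the receiver would call get_u32_be on the bytes it receives, so neither side needs to know the other's native byte order.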
Most systems provide two C libraries that allow a program to reorder an integer’s bytes [3, 4] for communication purposes.
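As one example of such library support, POSIX systems declare htonl ("host to network long") and ntohl ("network to host long") in <arpa/inet.h>. The sketch below assumes a POSIX environment; the wrapper names pack_for_network and unpack_from_network are hypothetical and exist only to show where the conversions would typically go.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX, not part of standard C) */
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical helper: convert a host-order value to network byte order
 * and copy its four bytes into out, ready to be sent. */
void pack_for_network(uint32_t host_value, unsigned char out[4]) {
    uint32_t net_value = htonl(host_value);
    memcpy(out, &net_value, sizeof(net_value));
}

/* Hypothetical helper: copy four received bytes into an integer and
 * convert it from network byte order back to host order. */
uint32_t unpack_from_network(const unsigned char in[4]) {
    uint32_t net_value;
    memcpy(&net_value, in, sizeof(net_value));
    return ntohl(net_value);
}
```

On a big-endian host, htonl and ntohl leave the value unchanged; on a little-endian host, they reverse the bytes. Either way, the bytes placed in out appear with the most-significant byte first.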
4.7.1. References
1. Jonathan Swift. Gulliver’s Travels. http://www.gutenberg.org/ebooks/829
2. Danny Cohen. On Holy Wars and a Plea for Peace. https://www.ietf.org/rfc/ien/ien137.txt