4.1. Number Bases and Unsigned Integers
Having seen that binary sequences can be interpreted in all sorts of nonnumerical ways, let’s turn our attention to numbers. Specifically, we’ll start with unsigned numbers, which can be interpreted as zero or positive, but they can never be negative (they have no sign).
4.1.1. Decimal Numbers
Rather than starting with binary, let’s first examine a number system we’re already comfortable using, the decimal number system, which uses a base of 10. Base 10 implies two important properties for the interpretation and representation of decimal values.
-
Any individual digit in a base 10 number stores one of 10 unique values (0-9). To store a value larger than 9, the value must carry to an additional digit to the left. For example, if one digit starts at its maximum value (9) and we add 1 to it, the result requires two digits (9 + 1 = 10). The same pattern holds for any digit, regardless of its position within a number (e.g., 5080 + 20 = 5100).
-
The position of each digit in the number determines how important that digit is to the overall value of the number. Labeling the digits from right to left as d0, d1, d2, etc., each successive digit contributes a factor of ten more than the next. For example, take the value 8425 (Figure 1).
For the example value 8425, the 5 in the "ones" place contributes 5 (5 × 100). The 2 in the "tens" place contributes 20 (2 × 101). The 4 in the "hundreds" place contributes 400 (4 × 102), and, finally, the 8 in the "thousands" place contributes 8000 (8 × 103). More formally, one could express 8425 as
This pattern of increasing exponents applied to a base of 10 is the reason why it’s called a base 10 number system. Assigning position numbers to digits from right to left starting with d0 implies that each digit di contributes 10i to the overall value. Thus, the overall value of any N-digit decimal number can be expressed as:
Fortunately, as we’ll soon see, a very similar pattern applies to other number systems.
Distinguishing Number Bases
Now that we’re about to introduce a second number system, one potential problem is a lack of clarity regarding how to interpret a number. For example, consider the value 1000. It’s not immediately obvious whether you should interpret that number as a decimal value (i.e., one thousand) or a binary value (i.e., eight, for reasons explained soon). To help clarify, the remainder of this chapter will explicitly attach a prefix to all nondecimal numbers. We’ll soon introduce binary, for which the prefix is 0b, and hexadecimal, which uses a prefix of 0x. Therefore, if you see 1000, you should assume it’s a decimal "one thousand", and if you see 0b1000, you should interpret it as a binary number, in this case the value "eight". |
4.1.2. Unsigned Binary Numbers
While you may never have considered the specific formula describing decimal numbers as powers of 10, the concept of { ones, tens, hundreds, etc. } places should hopefully feel comfortable. Luckily, similar terminology applies to other number systems, like binary. Of course, the base is different in other number systems, so each digit position contributes a different amount to its numerical value.
A binary number system uses a base of 2 instead of decimal’s 10. Analyzing it the same way that we just did for decimal reveals several parallels (with 2 substituted for 10):
-
Any individual bit in a base 2 number stores one of two unique values (0 or 1). To store a value larger than 1, the binary encoding must carry to an additional bit to the left. For example, if one bit starts at its maximum value (1) and we add 1 to it, the result requires two bits (1 + 1 = 0b10). The same pattern holds for any bit, regardless of its position within a number (e.g., 0b100100 + 0b100 = 0b101000).
-
The position of each bit in the number determines how important that bit is to the numerical value of the number. Labeling the digits from right to left as d0, d1, d2, etc., each successive bit contributes a factor of two more than the next.
The first point implies that counting in binary follows the same pattern as decimal: by simply enumerating the values and adding digits (bits). Because this section focuses on unsigned numbers (zero and positives only), it’s natural to start counting from zero. Table 1 shows how to count the first few natural numbers in binary. As you can see from the table, counting in binary quickly increases the number of digits. Intuitively, this growth makes sense, since each binary digit (two possible values) represents less information than a decimal digit (10 possible values).
Binary value | Decimal value |
---|---|
0 |
0 |
1 |
1 |
10 |
2 |
11 |
3 |
100 |
4 |
101 |
5 |
… |
… |
The second point about labeling digits looks really familiar! In fact, it’s so similar to decimal that it leads to a nearly identical formula for interpreting a binary number. Simply replace the 10 at the base of each exponent with a 2:
Applying this formula yields the unsigned interpretation of any binary number. For example, take 0b1000:
Here’s a longer one-byte example, 0b10110100:
4.1.3. Hexadecimal
Thus far, we’ve examined two number systems, decimal and binary. Decimal is notable due to its comfort for humans, whereas binary matches the way data is stored in hardware. It’s important to note that they are equivalent in their expressive power. That is, there’s no number you can represent in one system that you can’t represent in the other. Given their equivalence, it may surprise you that we’re going to discuss one more number system: the base 16 hexadecimal system.
With two perfectly good number systems, you may wonder why we need another. The answer is primarily convenience. As shown in Table 1, binary bit sequences quickly grow to a large number of digits. Humans tend to have a tough time making sense of long sequences containing only 0’s and 1’s. And whereas decimal is more compact, its base of 10 is a mismatch with binary’s base 2.
Decimal doesn’t easily capture the range that can be expressed using a fixed number of bits. For example, suppose that an old computer uses 16-bit memory addresses. Its valid addresses range from 0b0000000000000000 to 0b1111111111111111. Represented in decimal, the addresses range from 0 to 65535. Clearly, the decimal representations are more compact than the long binary sequences, but unless you memorize their conversions, it’s more difficult to reason about the decimal numbers. Both problems only get worse on modern devices, which use 32- or 64-bit addresses!
These long bit sequences are where hexadecimal’s base 16 shines. The large base allows each digit to represent enough information for hexadecimal numbers to be compact. Furthermore, because the base is itself a power of two (24 = 16), it’s easy to map hexadecimal to binary, and vice versa. For the sake of completeness, let’s analyze hexadecimal in the same way as decimal and binary:
-
Any individual digit in a base 16 number stores one of 16 unique values. Having more than 10 values presents a new challenge for hexadecimal — traditional base 10 digits stop at a maximum value of 9. By convention, hexadecimal uses letters to represent values larger than 9, with A for 10, B for 11, up to F for 15. Like the other systems, to store a value larger than 15, the number must carry to an additional digit to the left. For example, if one digit starts at its maximum value (F) and we add 1 to it, the result requires two digits (0xF + 0x1 = 0x10; note that we use 0x to indicate hexadecimal numbers).
-
The position of each digit in the number determines how important that digit is to the numerical value of the number. Labeling the digits from right to left as d0, d1, d2, etc., each successive digit contributes a factor of 16 more than the next.
Unsurprisingly, the same trusty formula for interpreting a number applies to hexadecimal, with 16 as the base:
For example, to determine the decimal value of 0x23C8:
Hexadecimal Misconception
You may not encounter hexadecimal numbers frequently as you’re first learning
about systems programming. In fact, the only context where you’re likely to
find them is in representing memory addresses. For example, if you print the
address of a variable using the Many students often begin to equate memory addresses (e.g., C pointer variables) with hexadecimal. While you may get used to seeing addresses represented that way, keep in mind that they are still stored using binary in the hardware, just like all other data! |
4.1.4. Storage Limitations
Conceptually, there are infinitely many unsigned integers. In practice, a programmer must choose how many bits to dedicate to a variable prior to storing it, for a variety of reasons:
-
Before storing a value, a program must allocate storage space for it. In C, declaring a variable tells the compiler how much memory it needs based on its type.
-
Hardware storage devices have finite capacity. Whereas a system’s main memory is typically large and unlikely to be a limiting factor, storage locations inside the CPU that are used as temporary "scratch space" (i.e., registers) are more constrained. A CPU uses registers that are limited to its word size (typically 32 or 64 bits, depending on the CPU architecture).
-
Programs often move data from one storage device to another (e.g., between CPU registers and main memory). As values get larger, storage devices need more wires to communicate signals between them. Hence, expanding storage increases the complexity of the hardware and leaves less physical space for other components.
The number of bits used to store an integer dictates the range of its representable values. Figure 2 depicts how we might conceptualize infinite and finite unsigned integer storage spaces.
Attempting to store a larger value to a variable than the variable’s size allows is known as integer overflow. This chapter defers the details of overflow to a later section. For now, think of it like a car’s odometer that "rolls over" back to zero if it attempts to increase beyond its maximum value. Similarly, subtracting one from zero yields the maximum value.
At this point, a natural question to ask about unsigned binary is "What’s the largest positive value that N bits can store?" In other words, given a sequence of N bits that are all 1, what value does the sequence represent? Reasoning about this question informally, the analysis in the previous section shows that N bits yield 2N unique bit sequences. Since one of those sequences must represent the number 0, that leaves 2N - 1 positive values ranging from 1 to 2N - 1. Thus, the maximum value for an unsigned binary number of N bits must be 2N - 1.
For example, 8 bits provide 28 = 256 unique sequences. One of those sequences, 0b00000000, is reserved for 0, leaving 255 sequences for storing positive values. Therefore, an 8-bit variable represents the positive values 1 through 255, the largest of which is 255.