9.1. Diving into Assembly: Basics

For a first look at assembly, we modify the adder() function from the beginning of the chapter to simplify its behavior. The modified function (adder2()) is shown below:

#include <stdio.h>

//adds two to an integer and returns the result
int adder2(int a) {
    return a + 2;
}

int main(){
    int x = 40;
    x = adder2(x);
    printf("x is: %d\n", x);
    return 0;
}

To compile this code, use the following command:

$ gcc -o adder adder.c

Next, let’s view the corresponding assembly of this code by using the objdump command:

$ objdump -d adder > output
$ less output

Search for the code snippet associated with adder2 by typing /adder while examining the file output using less. The section associated with adder should look similar to the following:

Assembly output for the adder2 function
0000000000000724 <adder2>:
 724:   d10043ff        sub     sp, sp, #0x10
 728:   b9000fe0        str     w0, [sp, #12]
 72c:   b9400fe0        ldr     w0, [sp, #12]
 730:   11000800        add     w0, w0, #0x2
 734:   910043ff        add     sp, sp, #0x10
 738:   d65f03c0        ret

Don’t worry if you don’t understand what’s going on just yet. We will cover assembly in greater detail in future sections. For now, let’s study the structure of these individual instructions.

Each line above contains the instruction’s 64-bit address in program memory (shortened to the lowest 3 positions to save space), the bytes corresponding to the instruction, and the plain-text representation of the instruction itself. For example d10043ff is the machine code representation of the instruction sub sp, sp, #0x10 and the instruction occurs at address 0x724 in code memory. Note that 0x724 is an abbreviation of the full 64-bit address associated with the sub sp, sp #0x10 instruction; objdump omits the leading zeroes to help with readability.

It is important to note that a single line of C code often translates to multiple instructions in assembly. The operation a + 2 is represented by the three instructions at code memory addresses 0x728 through 0x730: str w0, [sp, #12], ldr w0, [sp, #12] and add w0, w0, #0x2.

Your assembly may look different!

If you are compiling your code along with us, you may notice that some of your assembly examples look different. The precise assembly instructions outputted depends on the generating compiler’s version, the precise architecture, and the underlying operating system. Most of the assembly examples in this chapter were generated on a Raspberry Pi 3B+ running the 64-bit Ubuntu Mate operating system and using gcc. If you use a different operating system, a different compiler, or a different Raspberry Pi or SBC, your assembly output may vary.

In the examples that follow, we do not use any optimization flags. For example, we compile any example file (e.g. example.c) using the command gcc -o example example.c. Consequently, there are many seemingly redundant instructions in the examples that follow. Remember that the compiler is not "smart" — it simply follows a series of rules to translate human-readable code into machine language. During this translation process, it is not uncommon for some redundancy to occur. Optimizing compilers remove many of these redundancies during optimization, which is covered in a later chapter.

9.1.1. Registers

Recall that a register is a word-sized storage unit located directly on the CPU. The ARMv8 CPU has 31 total registers for storing general-purpose 64-bit data, which are numbered x0 to x30. While a program may interpret a register’s contents as integers or as addresses, the register itself makes no distinction. Programs can read from or write to all 31 registers.

The ARMv8-A ISA also specifies special purpose registers. The first two worth noting are the stack pointer register (SP) and the program counter register (PC). The compiler reserves the SP register for maintaining the layout of the program stack. The PC register points to the next instruction to be executed by the CPU. Unlike the 32 registers mentioned previously, programs cannot write directly to the PC register. Next, the zero register zr permanently stores the value 0, and is only useful as a source register.

9.1.2. Advanced register notation

Since ARMv8-A is an extension of the 32-bit ARMv7-A architecture, the A64 ISA provides mechanisms to access the lower 32 bits of each of the general purpose registers, or w0 through w30. Figure 1 shows a sample layout of register x0. If 32-bit data is stored in component register w0, then the upper 32 bits of the register become inaccessible, and are zeroed out.

32-bit component register w0 and its relation to the 64-bit x0 register
Figure 1. Component register layout of register %x0.
The compiler may choose component registers depending on type

When reading assembly code, keep in mind that the compiler typically uses the 64-bit registers when dealing with 64-bit values (e.g. pointers or long types) and the 32-bit component registers when dealing with 32-bit types (e.g. int). In A64, it is very common to see 32-bit component registers intermixed with the full 64-bit registers. For example, in the adder2() function shown above, the compiler references component register w0 instead of x0 since int types typically take up 32 bits (4 bytes) of space on 64-bit systems. If the adder2() function had a long parameter instead of a int parameter, the compiler would store a in register x0 instead of component register w0.

For readers previously familiar with the A32 ISA, it is important to note that the 32-bit general-purpose registers r0 to r12 from the A32 ISA map to the A64 component registers w0 to w12. The A64 ISA more than doubles the number of available registers.

9.1.3. Instruction Structure

Each instruction consists of an operation code (or opcode) that specifies what it does, and one or more operands that tells the instruction how to do it. For most A64 instructions, the following format is typically used:

opcode D, O1, O2

Where opcode is the operation code, D is the destination register, O1 is the first operand, and O2 is a second operand. For example, the instruction add w0, w0, #0x2 has the opcode add, a destination register of w0, and the two operands w0 and #0x2. Note that operands can have many forms, including:

  • Constant (literal) values are preceded by the # sign. For example in the instruction add w0, w0, #0x2, the operand #0x2 is a literal value that corresponds to the hexadecimal value 0x2.

  • Register forms refer to individual registers. The instruction add sp, sp, #0x10 uses the stack pointer register sp to designate the destination register and the first of the two operands needed for the add instruction.

  • Memory forms correspond to some value inside of main memory (RAM) and are commonly used for address lookups. Memory address forms can contain a combination of registers and constant values. For example, in the instruction str w0, [sp, #12], the operand [sp, #12] is an example of a memory form. It loosely translates to "add 12 to the value in register sp, and then perform a memory lookup on the corresponding address." If this sounds like a pointer dereference, that is because it is!

9.1.4. An example with operands

The best way to explain operands in detail is to present a quick example. Suppose memory contains the following values:

Address Value

0x804

0xCA

0x808

0xFD

0x80c

0x12

0x810

0x1E

Let’s also assume the following registers contain the values:

Register Value

x0

0x804

x1

0xC

x2

0x2

w3

0x4

Then the operands in Table 1 evaluate to the values shown below. Each row of the table matches an operand with its form (e.g. constant, register, memory), how it is translated, and its value:

Table 1. Example operands
Operand Form Translation Value

x0

Register

x0

0x804

[x0]

Memory

*(0x804)

0xCA

#0x804

Constant

0x804

0x804

[x0, #8]

Memory

*(x0 + 8) or *(0x80c)

0x12

[x0, x1]

Memory

*(x0 + x1) or *(0x810)

0x1E

[x0, w3, SXTW]

(Sign-Extend) Memory

*(x0 + SignExtend(w3)) or *(0x808)

0xFD

[x0, x2, LSL, #2]

Scaled Memory

*(x0 + (x2 << 2)) or *(0x80c)

0x12

[x0, w3, SXTW, #1]

(Sign-Extend) Scaled Memory

*(x0 + SignExtend(w3 << 1)) or *(0x80c)

0x12

In Table 1, the notation x0 indicates the value stored in 64-bit register x0, while w3 indicates a 32-bit value stored in component register w3. The operand [x0] indicates that the value inside x0 should be treated as an address, and to dereference (look up) the value at that address. Therefore, the operand [x0] corresponds to *(0x804) or value 0xCA. An operation on a 32-bit register can be combined with a 64-bit register using the sign-extend word (SXTW) instruction. So, [x0, w3, SXTW] sign extends w3 into a 64-bit value before adding it to x0 and performing a memory lookup. Lastly, scaled memory types enables the calculation of offsets through the use of a left shift.

A few important notes before continuing. While Table 1 shows many valid operand forms, not all forms can be used interchangeably in all circumstances.

Specifically:

  • Data cannot be read or written to memory directly; instead, ARM follows a load/store model, which requires data to be operated on in registers. Thus, data must be transferred to memory before being operated on, and transferred back to memory after the operations are complete.

  • The destination component of an instruction must always be a register.

Table 1 is provided as a reference; however, digesting key operand forms will help improve the reader’s speed in parsing assembly language.