9.2. Common Instructions
In this section, we discuss several common ARM assembly instructions. Table 1 lists the most foundational instructions in ARM assembly:
Instruction | Translation |
---|---|
ldr D, [addr] | D = *(addr) (i.e., loads the value in memory at address addr into register D) |
str S, [addr] | *(addr) = S (i.e., stores S into memory location *(addr)) |
mov D, S | D = S (i.e., copies the value of S into D) |
add D, O1, O2 | D = O1 + O2 (adds O1 to O2 and stores the result in D) |
sub D, O1, O2 | D = O1 - O2 (subtracts O2 from O1 and stores the result in D) |
Therefore, the sequence of instructions:
str w0, [sp, #12]
ldr w0, [sp, #12]
add w0, w0, #0x2
translates to:
- Store the value in register w0 in the memory location specified by sp + 12 (or *(sp + 12)).
- Load the value from memory location sp + 12 (or *(sp + 12)) into register w0.
- Add the value #0x2 to register w0, and store the result in register w0 (or w0 = w0 + 0x2).
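
The three steps above can be sketched in C. This is only an illustrative model, not compiler output: the function name run_sequence and the 16-byte array standing in for the memory at sp are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: `frame` stands in for the 16 bytes of memory
 * starting at the address held in sp. */
uint32_t run_sequence(uint32_t w0) {
    uint8_t frame[16] = {0};
    uint8_t *sp = frame;

    memcpy(sp + 12, &w0, sizeof w0);  /* str w0, [sp, #12] : *(sp + 12) = w0 */
    memcpy(&w0, sp + 12, sizeof w0);  /* ldr w0, [sp, #12] : w0 = *(sp + 12) */
    w0 = w0 + 0x2;                    /* add w0, w0, #0x2  : w0 = w0 + 0x2   */
    return w0;
}
```

Calling run_sequence(0x28) under this model yields 0x2A: the store and load leave w0 unchanged, and only the add alters it.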
The add and sub instructions shown in Table 1 also assist with maintaining the organization of the program stack (i.e., the call stack). Recall that the stack pointer (sp) is reserved by the compiler for call stack management. Recall also from our earlier discussion on program memory that the call stack typically stores local variables and parameters and helps the program track its own execution (see Figure 1). On ARM systems, the execution stack grows toward lower addresses. Like all stack data structures, operations occur at the "top" of the call stack; sp therefore "points" to the top of the stack, and its value is the address of the top of the stack.
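
Because the stack grows toward lower addresses, growing it means subtracting from sp and shrinking it means adding to sp. A minimal sketch of that arithmetic follows; the helper names grow_stack and shrink_stack are hypothetical, and addresses are treated as plain integers.

```c
#include <stdint.h>

/* Hypothetical helpers that model sp as a plain integer address. */

/* sub sp, sp, #nbytes : the top of the stack moves to a lower address */
uint64_t grow_stack(uint64_t sp, uint64_t nbytes) {
    return sp - nbytes;
}

/* add sp, sp, #nbytes : the top of the stack moves back to a higher address */
uint64_t shrink_stack(uint64_t sp, uint64_t nbytes) {
    return sp + nbytes;
}
```

Growing and then shrinking by the same amount returns sp to its starting value.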

The ldp and stp instructions shown in Table 2 assist with moving multiple values between registers and memory, usually either onto or off of the program stack. In Table 2, the register X0 holds a memory address.
Instruction | Translation |
---|---|
ldp D1, D2, [X0] | D1 = *(X0), D2 = *(X0+8) (i.e., loads the values at X0 and X0+8 into registers D1 and D2 respectively) |
ldp D1, D2, [X0, #0x10]! | sets X0 = X0 + 0x10, then loads D1 = *(X0), D2 = *(X0+8) |
ldp D1, D2, [X0], #0x10 | loads D1 = *(X0), D2 = *(X0+8), then sets X0 = X0 + 0x10 |
stp S1, S2, [X0] | *(X0) = S1, *(X0+8) = S2 (i.e., stores S1 and S2 at locations *(X0) and *(X0+8) respectively) |
stp S1, S2, [X0, #-16]! | sets X0 = X0 - 16, then stores *(X0) = S1, *(X0+8) = S2 |
stp S1, S2, [X0], #-16 | stores *(X0) = S1, *(X0+8) = S2, then sets X0 = X0 - 16 |
In short, the ldp instruction loads the pair of values at locations X0 and X0+0x8 into the destination registers D1 and D2 respectively. Meanwhile, the stp instruction stores the pair of values in source registers S1 and S2 at memory locations X0 and X0+0x8. Note that the assumption here is that the values in the registers are 64-bit quantities. If 32-bit registers are being used instead, the memory offsets change to X0 and X0+0x4 respectively.
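
The effect of register width on the second offset can be sketched in C. The functions stp64, ldp64, stp32, and ldp32 are hypothetical names for this illustration, and the pointer x0 stands in for the address held in X0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical models of the base (no write-back) forms.
 * `x0` plays the role of the address held in register X0. */

/* stp S1, S2, [X0] with 64-bit registers: *(X0) = S1, *(X0+8) = S2 */
void stp64(uint8_t *x0, uint64_t s1, uint64_t s2) {
    memcpy(x0, &s1, sizeof s1);
    memcpy(x0 + 8, &s2, sizeof s2);
}

/* ldp D1, D2, [X0] with 64-bit registers: D1 = *(X0), D2 = *(X0+8) */
void ldp64(const uint8_t *x0, uint64_t *d1, uint64_t *d2) {
    memcpy(d1, x0, sizeof *d1);
    memcpy(d2, x0 + 8, sizeof *d2);
}

/* With 32-bit registers the second value sits at X0+4 instead. */
void stp32(uint8_t *x0, uint32_t s1, uint32_t s2) {
    memcpy(x0, &s1, sizeof s1);
    memcpy(x0 + 4, &s2, sizeof s2);
}

void ldp32(const uint8_t *x0, uint32_t *d1, uint32_t *d2) {
    memcpy(d1, x0, sizeof *d1);
    memcpy(d2, x0 + 4, sizeof *d2);
}
```

A 64-bit pair therefore occupies 16 bytes starting at X0, while a 32-bit pair occupies 8.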
There are also two special forms of the ldp and stp instructions that enable simultaneous updates to X0. For example, the instruction stp S1, S2, [X0, #-16]! implies that 16 bytes should first be subtracted from X0, and only afterward should S1 and S2 be stored at locations X0 and X0+0x8. In contrast, the instruction ldp D1, D2, [X0], #0x10 states that the values at locations X0 and X0+0x8 should first be loaded into destination registers D1 and D2, and only afterward should X0 have 16 bytes added to it. These special forms are commonly used at the beginning and end of functions that have multiple function calls, as we will see in future sections.
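
The difference between the two forms is the order of the address update and the memory access. The sketch below models that ordering in C under some assumptions: mem is a byte array standing in for memory, x0 is an index into it rather than a real address, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* stp S1, S2, [X0, #-16]! : update X0 first, then store the pair */
void stp_pre_index(uint8_t *mem, uint64_t *x0, uint64_t s1, uint64_t s2) {
    *x0 -= 16;                              /* X0 = X0 - 16  */
    memcpy(mem + *x0, &s1, sizeof s1);      /* *(X0)   = S1  */
    memcpy(mem + *x0 + 8, &s2, sizeof s2);  /* *(X0+8) = S2  */
}

/* ldp D1, D2, [X0], #0x10 : load the pair first, then update X0 */
void ldp_post_index(const uint8_t *mem, uint64_t *x0,
                    uint64_t *d1, uint64_t *d2) {
    memcpy(d1, mem + *x0, sizeof *d1);      /* D1 = *(X0)     */
    memcpy(d2, mem + *x0 + 8, sizeof *d2);  /* D2 = *(X0+8)   */
    *x0 += 0x10;                            /* X0 = X0 + 0x10 */
}
```

Used back to back, the pair behaves like a push of two values followed by a pop: the loaded values match the stored ones, and x0 ends where it began.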
9.2.1. Putting it all together: a more concrete example
Let’s take a closer look at the adder2() function:
//adds two to an integer and returns the result
int adder2(int a) {
    return a + 2;
}
and its corresponding assembly code:
0000000000000724 <adder2>:
 724:   d10043ff    sub sp, sp, #0x10
 728:   b9000fe0    str w0, [sp, #12]
 72c:   b9400fe0    ldr w0, [sp, #12]
 730:   11000800    add w0, w0, #0x2
 734:   910043ff    add sp, sp, #0x10
 738:   d65f03c0    ret
The assembly code consists of a sub instruction, followed by a str and an ldr instruction, two add instructions, and finally a ret instruction.
To understand how the CPU executes this set of instructions, we need to revisit the structure of program memory. Recall that every time a program executes, the operating system allocates the new program’s address space (also known as virtual memory). Virtual memory and the related concept of processes are covered in greater detail in chapter 13; for now, it suffices to think of processes as the abstraction of a running program and virtual memory as the memory that is allocated to a single process. Every process has its own region of memory called the call stack. Keep in mind that the call stack is located in process/virtual memory, unlike registers (which are located in the CPU).
Figure 2 depicts a sample state of the call stack and registers prior to the execution of the adder2() function.

Notice that the stack grows toward lower addresses. The parameter to the adder2() function (or a) is stored in register x0 by convention. Since a is of type int, it is stored in component register w0, which is shown in the figure above. Likewise, since the adder2() function returns an int, component register w0 is used for the return value instead of x0.
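
The relationship between x0 and its component register w0 can be sketched in C. The function names are hypothetical. The sketch also relies on a fact about AArch64 that the text above does not state: writing a W register clears the upper 32 bits of the matching X register.

```c
#include <stdint.h>

/* Reading w0 yields the low 32 bits of x0. */
uint32_t read_w0(uint64_t x0) {
    return (uint32_t)x0;
}

/* Writing w0 produces a new x0 whose low 32 bits hold the value.
 * Assumption labeled in the lead-in: on AArch64 the upper 32 bits
 * of x0 are cleared by such a write. */
uint64_t write_w0(uint32_t value) {
    return (uint64_t)value;
}
```

A 32-bit int such as the parameter a therefore fits entirely in w0.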
The addresses associated with the instructions in the code segment of program memory have been shortened to 0x724-0x738 to improve figure readability. Likewise, the addresses associated with the call stack segment of program memory have been shortened to 0xe40-0xe50 from a range of 0xffffffffee40 to 0xffffffffee50. In truth, call stack addresses occur at much higher addresses in program memory than code segment addresses.
Pay close attention to the initial values of registers sp and pc: they are 0xe50 and 0x724 respectively. The pc register (or program counter) indicates the next instruction to execute, and the address 0x724 corresponds to the first instruction in the adder2() function. The arrow visually indicates the currently executing instruction.

The first instruction (sub sp, sp, #0x10) subtracts the constant value #0x10 from the stack pointer, and updates the stack pointer with the new result. Since the stack pointer contains the address of the top of the stack, this operation grows the stack by 16 bytes. The stack pointer now contains the address 0xe40, while the program counter (pc) register contains the address of the next instruction to execute, or 0x728.

Recall that the str instruction stores a value located in a register into memory. Thus, the next instruction (str w0, [sp, #12]) places the value in w0 (the value of a, or 0x28) at call stack location sp + 12, or 0xe4c. Note that this instruction does not modify the contents of register sp in any way; it simply stores a value on the call stack. Once this instruction executes, pc advances to the address of the next instruction, or 0x72c.
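
The target address of the store is simple arithmetic on the value in sp. A minimal sketch, with the hypothetical helper effective_address taking sp by value so that the caller's copy is left untouched, mirroring the fact that str does not modify sp:

```c
#include <stdint.h>

/* Address used by `str w0, [sp, #imm]`: the value in sp plus the
 * immediate offset. sp is passed by value, so it is not modified. */
uint64_t effective_address(uint64_t sp, uint64_t imm) {
    return sp + imm;
}
```

With the shortened addresses used in the figures, effective_address(0xe40, 12) gives 0xe4c.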

Next, ldr w0, [sp, #12] executes. Recall that the ldr instruction loads a value in memory into a register. By executing this instruction, the CPU replaces the value in register w0 with the value located at stack address sp + 12. While this may seem like a nonsensical operation (0x28 is replaced by 0x28 after all), it highlights a convention where the compiler typically stores function parameters onto the call stack for later use, and then reloads them into registers as needed. Again, the value stored in the sp register is not impacted by the ldr operation. As far as the program is concerned, the "top" of the stack is still 0xe40. Once the ldr instruction executes, pc advances to address 0x730.

Afterwards, add w0, w0, #0x2 executes. Recall that the add instruction has the form add D, O1, O2 and places O1 + O2 in the destination register D. So, add w0, w0, #0x2 adds the constant value #0x2 to the value stored in w0 (0x28), resulting in 0x2A being stored in register w0. Register pc advances to the next instruction to be executed, or 0x734.

The next instruction that executes is add sp, sp, #0x10. This instruction adds 16 bytes to the address stored in sp. Since the stack grows toward lower addresses, adding 16 bytes to the stack pointer consequently shrinks the stack, and reverts sp to its original value of 0xe50. The pc register then advances to 0x738.
Recall that the purpose of the call stack is to store the temporary data that each function uses as it executes in the context of a larger program. By convention, the stack "grows" at the beginning of a function call, and reverts to its original state once the function ends. As a result, it is common to see a sub sp, sp, #v instruction (where #v is some constant value v) at the beginning of a function, and add sp, sp, #v at the end.

The last instruction that executes is ret. We will talk more about what ret does in future sections when we discuss function calls, but for now it suffices to know that ret ends the function and returns control to its caller. By convention, the register x0 always contains the return value (if one exists). In this case, since adder2() is of type int, the return value is stored in component register w0 and the function returns the value 0x2A, or 42.
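
The whole walkthrough can be replayed as a small C model. This is a sketch under stated assumptions rather than a real emulator: the struct cpu, the function trace_adder2, and the STACK_LOW constant are hypothetical, the addresses are the shortened ones from the figures, and ret itself is not modeled because its behavior is covered in later sections.

```c
#include <stdint.h>
#include <string.h>

#define STACK_LOW 0xe40u  /* lowest shortened stack address in the model */

/* Hypothetical register file holding only what the walkthrough tracks. */
struct cpu {
    uint64_t sp;
    uint64_t pc;
    uint32_t w0;
};

/* Replays adder2 one instruction at a time, up to (not including) ret. */
struct cpu trace_adder2(uint32_t a) {
    uint8_t stack[0x10] = {0};  /* bytes at 0xe40 through 0xe4f */
    struct cpu c = { .sp = 0xe50, .pc = 0x724, .w0 = a };

    c.sp -= 0x10;                                       /* sub sp, sp, #0x10 */
    c.pc = 0x728;

    memcpy(stack + (c.sp + 12 - STACK_LOW), &c.w0, 4);  /* str w0, [sp, #12] */
    c.pc = 0x72c;

    memcpy(&c.w0, stack + (c.sp + 12 - STACK_LOW), 4);  /* ldr w0, [sp, #12] */
    c.pc = 0x730;

    c.w0 += 0x2;                                        /* add w0, w0, #0x2  */
    c.pc = 0x734;

    c.sp += 0x10;                                       /* add sp, sp, #0x10 */
    c.pc = 0x738;

    return c;  /* w0 now holds the value that ret would hand back */
}
```

Starting from a = 0x28, the model finishes with w0 equal to 0x2A, sp restored to 0xe50, and pc at 0x738, matching the register values described in the walkthrough.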