9.4.1. Preliminaries
Conditional Comparison Instructions
Comparison instructions perform an arithmetic operation for the purpose of guiding the conditional execution of a program. Table 1 lists the basic instructions associated with conditional control.
Instruction | Translation |
---|---|
cmp O1, O2 | Compares O1 with O2 (computes O1 - O2) |
tst O1, O2 | Computes O1 & O2 |
The cmp instruction compares the value of two operands, O1 and O2. Specifically, it subtracts O2 from O1. The tst instruction performs a bitwise AND. It is common to see an instruction like:
tst x0, x0
In this example, the bitwise AND of x0 with itself is zero only when x0 contains zero. In other words, this is a test for a zero value and, for that purpose, is equivalent to:
cmp x0, #0
Unlike the arithmetic instructions covered thus far, cmp and tst do not modify a destination register. Instead, both instructions modify a series of single-bit values known as condition code flags. For example, cmp will modify condition code flags based on whether the value O1 - O2 results in a positive (greater), negative (less), or zero (equal) value. Recall that condition code values encode information about an operation in the ALU. The condition code flags are part of the ARM processor state (PSTATE), which replaces the current program status register (CPSR) from ARMv7-A systems.
Flag | Translation |
---|---|
Z | Is equal to zero (1: yes, 0: no) |
N | Is negative (1: yes, 0: no) |
V | Signed overflow has occurred (1: yes, 0: no) |
C | Arithmetic carry/unsigned overflow has occurred (1: yes, 0: no) |
Table 2 depicts the common flags used for condition code operations. Revisiting the cmp O1, O2 instruction:
- The Z flag is set to 1 if O1 and O2 are equal.
- The N flag is set to 1 if O1 is less than O2 (i.e. O1 - O2 results in a negative value).
- The V flag is set to 1 if the operation O1 - O2 results in overflow (useful for signed comparisons).
- The C flag is set to 1 if the operation O1 - O2 results in an arithmetic carry (useful for unsigned comparisons). On ARM, a subtraction produces a carry when no borrow is needed, that is, when O1 is greater than or equal to O2 as unsigned values.
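The flag rules above can be sketched in C. This is a minimal model for illustration, not a description of how the hardware is built: the Flags struct and the cmp_flags function are hypothetical names introduced here, and the operands are assumed to be 64-bit register values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition code flags. */
typedef struct {
    bool n; /* result is negative */
    bool z; /* result is zero */
    bool c; /* carry: for a subtraction, no borrow was needed */
    bool v; /* signed overflow */
} Flags;

/* Sketch of the flags that cmp o1, o2 would produce for 64-bit operands. */
Flags cmp_flags(uint64_t o1, uint64_t o2) {
    uint64_t result = o1 - o2; /* unsigned subtraction wraps modulo 2^64 */
    Flags f;
    f.z = (result == 0);
    f.n = (result >> 63) != 0; /* top bit is the sign bit */
    f.c = (o1 >= o2);          /* no borrow when o1 >= o2 (unsigned) */
    /* Signed overflow: the operands differ in sign and the result's
       sign differs from the sign of o1. */
    f.v = (((o1 ^ o2) & (o1 ^ result)) >> 63) != 0;
    return f;
}
```

Under this model, cmp_flags(5, 5) sets Z and C, while cmp_flags(3, 5) sets N and leaves C clear because the subtraction needs a borrow.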
While an in-depth discussion of condition code flags is beyond the scope of this book, the setting of these flags by cmp and tst enables the next set of instructions we cover (the branch instructions) to operate correctly.
The Branch Instructions
A branch instruction enables a program’s execution to "jump" to a new position in the code. In the assembly programs we have traced through thus far, pc always points to the next instruction in program memory. The branch instructions enable pc to be set to either a new instruction not yet seen (as in the case of an if statement) or to a previously executed instruction (as in the case of a loop).
Direct branch instructions
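Instruction | Description |
---|---|
b addr L | pc = addr |
br A | pc = A (A is an address held in a register) |
cbz R, addr L | If R is equal to 0, pc = addr (conditional branch) |
cbnz R, addr L | If R is not equal to 0, pc = addr (conditional branch) |
b.c addr L | If c, pc = addr (conditional branch) |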
Table 3 lists the set of common branch instructions; L refers to a symbolic label, which serves as an identifier in the program’s object file. All labels consist of some letters and digits followed by a colon. Labels can be local or global to an object file’s scope. Function labels tend to be global and usually consist of the function name and a colon. For example, main: (or <main>:) is used to label a user-defined main function. In contrast, labels whose scope is local are preceded by a period. For example, .L1: is a label one may encounter in the context of an if statement or loop.
All labels have an associated address (addr in Table 3). When the CPU executes a b instruction, it sets the pc register to addr. The b instruction enables the program counter to change within 128 MB of its current location; a programmer writing assembly can also specify a particular address to branch to by using the br instruction, which takes the target address from a register. Unlike the b instruction, there are no restrictions on the address range of br.
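The 128 MB limit on b can be made concrete with a short C sketch. It assumes the usual A64 encoding, in which b stores a signed 26-bit offset counted in 4-byte instructions, giving a byte range of -128 MB up to just under +128 MB. The function name b_can_reach is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a b instruction located at address pc
   branch directly to address target? */
bool b_can_reach(uint64_t pc, uint64_t target) {
    const int64_t limit = INT64_C(1) << 27; /* 128 MB in bytes */
    /* Signed distance in bytes; assumes two's complement conversion,
       as on ARM64 with GCC or Clang. */
    int64_t offset = (int64_t)(target - pc);
    if (offset % 4 != 0) {
        return false; /* A64 instructions are 4 bytes wide and aligned */
    }
    return offset >= -limit && offset < limit;
}
```

A target that fails this check would have to be reached another way, for example by loading its address into a register and using br.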
Sometimes, local labels are also shown as an offset from the start of a function. Therefore, an instruction whose address is 28 bytes away from the start of main may be represented with the label <main+28>. For example, the instruction b 0x7d0 <main+28> indicates a branch to address 0x7d0, which has the associated label <main+28>, meaning that it is 28 bytes away from the starting address of the main function (which therefore starts at 0x7d0 - 28 = 0x7b4). Executing this instruction sets pc to 0x7d0.
The last three instructions are conditional branch instructions. In other words, the program counter register is set to addr only if the given condition evaluates to true. The cbz and cbnz instructions require a register in addition to an address. In the case of cbz, if R is zero, the branch is taken and pc is set to addr. In the case of cbnz, if R is nonzero, the branch is taken and pc is set to addr.
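The behavior of cbz and cbnz can be sketched with C goto statements (the goto statement itself is discussed later in this section). The function sum_to and its labels are hypothetical, chosen only to show one branch of each kind; the comments indicate which instruction each test is meant to mirror.

```c
#include <stdint.h>

/* Hypothetical example: adds n + (n - 1) + ... + 1 using goto forms
   that mirror cbz and cbnz. */
uint64_t sum_to(uint64_t n) {
    uint64_t total = 0;

    if (n == 0) goto done;  /* like cbz  n, done: branch if n is zero */
loop:
    total += n;
    n -= 1;
    if (n != 0) goto loop;  /* like cbnz n, loop: branch if n is nonzero */
done:
    return total;
}
```

In both cases the test and the jump form a single step, which is why cbz and cbnz do not need a preceding cmp or tst.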
The most powerful of the conditional branch instructions are the b.c instructions, which enable the compiler or assembly writer to pick a custom suffix that indicates the condition on which a branch is taken.
Conditional branch instruction suffixes
Table 4 lists the set of common conditional branch suffixes (c). When used in conjunction with a branch, each instruction starts with the letter b and a dot, denoting that it is a branch instruction. The suffix of each instruction (c) indicates the condition for the branch. The branch instruction suffixes also determine whether to interpret numerical comparisons as signed or unsigned. Note that conditional branch instructions have a much more limited range (1 MB) than the b instruction. These suffixes are also used for the conditional select instruction (csel), which is covered in the next section.
Signed Comparison | Unsigned Comparison | Description |
---|---|---|
b.eq | b.eq | branch if equal (==) or branch if zero |
b.ne | b.ne | branch if not equal (!=) |
b.mi | b.mi | branch if minus (negative) |
b.pl | b.pl | branch if non-negative (>= 0) |
b.gt | b.hi | branch if greater than (higher) (>) |
b.ge | b.cs (b.hs) | branch if greater than or equal (>=) |
b.lt | b.lo (b.cc) | branch if less than (<) |
b.le | b.ls | branch if less than or equal (<=) |
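The difference between the signed and unsigned columns matters because the same bit pattern can compare differently under each interpretation. The following C sketch illustrates this with two hypothetical helper functions; it assumes two's complement conversion between unsigned and signed 64-bit values, as on ARM64 with GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the condition tested by b.lt after cmp a, b:
   is a less than b when both are read as signed values? */
bool signed_less_than(uint64_t a, uint64_t b) {
    return (int64_t)a < (int64_t)b;
}

/* Mirrors the condition tested by b.lo after cmp a, b:
   is a lower than b when both are read as unsigned values? */
bool unsigned_lower_than(uint64_t a, uint64_t b) {
    return a < b;
}
```

With a holding all one bits (UINT64_MAX) and b holding 1, signed_less_than reports true because a reads as -1, while unsigned_lower_than reports false because a reads as the largest 64-bit value.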
The goto Statement
In the following subsections, we look at conditionals and loops in assembly and reverse engineer them back to C. When translating assembly code of conditionals and loops back into C, it is useful to understand their corresponding C language goto forms. The goto statement is a C primitive that forces program execution to switch to another line in the code. The assembly instruction associated with the goto statement is b.
The goto statement consists of the goto keyword followed by a goto label, a type of program label that indicates that execution should continue at the corresponding label. So, goto done means that the program execution should branch to the line marked by label done. Other examples of program labels in C include the switch statement labels previously covered in Chapter 2.
The following code listings depict a function getSmallest written in regular C code (left) and its associated goto form in C (right). The getSmallest function compares the value of two integers (x and y), and assigns the smaller value to variable smallest.
Regular C version | Goto version |
---|---|
|
|
The goto form of this function may seem counterintuitive, but let’s discuss what exactly is going on. The conditional checks to see whether variable x is less than or equal to y.
- If x is less than or equal to y, the program transfers control to the label marked by else_statement, which contains the single statement smallest = x. Since the program executes linearly, the program continues on to execute the code under the label done, which returns the value of smallest (x).
- If x is greater than y, then smallest is set to y. The program then executes the statement goto done, which transfers control to the done label, which returns the value of smallest (y).
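The two control-flow paths described above can be sketched in C as follows. This is a reconstruction from the description in the text, not necessarily the exact listing; in particular, the goto version is given the hypothetical name getSmallestGoto only so that both functions can be compiled in the same file.

```c
/* Regular C form: assign the smaller of x and y to smallest. */
int getSmallest(int x, int y) {
    int smallest;
    if (x > y) {
        smallest = y;
    } else {
        smallest = x;
    }
    return smallest;
}

/* Goto form of the same logic. The condition is inverted (x <= y)
   so that the branch skips over the code for the x > y case. */
int getSmallestGoto(int x, int y) {
    int smallest;

    if (x <= y) {
        goto else_statement;
    }
    smallest = y;
    goto done;

else_statement:
    smallest = x;

done:
    return smallest;
}
```

Both functions return the same value for any pair of inputs; only the way control reaches the return statement differs.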
While goto statements were commonly used in the early days of programming, their use in modern code is considered bad practice, because it reduces the overall readability of code. In fact, computer scientist Edsger Dijkstra wrote a famous paper, "Go To Statement Considered Harmful" [1], lambasting the use of goto statements.
In general, well-designed C programs do not use goto statements, and programmers are discouraged from using them to avoid writing code that is difficult to read, debug, and maintain. However, the C goto statement is important to understand, as GCC typically changes C code with conditionals into a goto form prior to translating it to assembly, including code that contains if statements and loops.
The following subsections cover the assembly representation of if statements and loops in greater detail.
References
1. Edsger Dijkstra. "Go To Statement Considered Harmful". Communications of the ACM 11(3), pp. 147-148. 1968.