8.7. Arrays
Recall that arrays
are ordered collections of data elements of the same type that are contiguously stored
in memory. Statically allocated single
dimension arrays have the form Type arr[N]
where Type
is the data type, arr
is the
identifier associated with the array, and N
is the number of data elements. Declaring
an array statically as Type arr[N]
or dynamically as arr = malloc(N*sizeof(Type))
allocates
N x sizeof(Type) total bytes of memory, with arr
pointing to it.
To access the element at index i in array arr
, use the syntax arr[i]
.
Compilers commonly convert array references into pointer arithmetic
prior to translating to assembly. So arr+i
is
equivalent to &arr[i]
and *(arr+i)
is equivalent to arr[i]
. Since each data element
in arr
is of type Type
, arr+i
implies that element i is stored at address
arr + sizeof(Type) * i
.
Table 1 outlines some common array operations and their corresponding assembly
instructions. Assume register %edx
stores the address of arr
, register %ecx
stores the
value i
, and register %eax
represents some variable x
.
Operation | Type | Assembly Representation |
---|---|---|
|
int * |
|
|
int |
|
|
int |
|
|
int * |
|
|
int * |
|
|
int |
|
Pay close attention to the type of each expression in [ArrayOps]. In general,
the compiler uses mov
instructions to dereference pointers and the lea
instruction to
compute addresses.
Notice that to access element arr[3]
(or *(arr+3)
using pointer arithmetic) the
compiler performs a memory lookup on address arr*3*4
instead of arr*3
. To understand
why this is necessary, recall that any element at index i in an array is stored at address
arr + sizeof(Type) * i
. The compiler must therefore multiply the index by the size of the
data type to compute the correct offset. Recall also that memory is byte-addressable; offsetting
by the correct number of bytes is the same as computing an address.
As an example, consider a sample array (array
) with five integer elements (Figure 1):

Notice that since array
is an array of integers, each element takes up exactly four bytes. Thus, an
integer array with five elements consumes 20 bytes of contiguous memory.
To compute the address of Element 3, the compiler multiplies the index 3 by the data size of the integer type (4) to yield an offset of 12. Sure enough, Element 3 in [FigArray6] is located at byte offset x12.
Let’s take a look at a simple C function called sumArray()
that sums up all the elements
in an array:
int sumArray(int *array, int length) {
int i, total = 0;
for (i = 0; i < length; i++) {
total += array[i];
}
return total;
}
The sumArray()
function takes the address of an array and the array’s associated length
and sums up all the elements in the array. Now take a look at the corresponding assembly
of the sumArray()
function:
<sumArray>: <+0>: push %ebp # save ebp <+1>: mov %esp,%ebp # update ebp (new stack frame) <+3>: sub $0x10,%esp # add 16 bytes to stack frame <+6>: movl $0x0,-0x8(%ebp) # copy 0 to %ebp-8 (total) <+13>: movl $0x0,-0x4(%ebp) # copy 0 to %ebp-4 (i) <+20>: jmp 0x80484ab <sumArray+46> # goto <sumArray+46> (start) <+22>: mov -0x4(%ebp),%eax # copy i to %eax <+25>: lea 0x0(,%eax,4),%edx # copy i*4 to %edx <+32>: mov 0x8(%ebp),%eax # copy array to %eax <+35>: add %edx,%eax # copy array+i*4 to %eax <+37>: mov (%eax),%eax # copy *(array+i*4) to %eax <+39>: add %eax,-0x8(%ebp) # add *(array+i*4) to total (total+=array[i]) <+42>: addl $0x1,-0x4(%ebp) # add 1 to i <+46>: mov -0x4(%ebp),%eax # copy i to %eax <+49>: cmp 0xc(%ebp),%eax # compare i with length <+52>: jl 0x8048493 <sumArray+22> # if (i < length) goto <sumArray+22> (loop) <+54>: mov -0x8(%ebp),%eax # copy total to eax <+57>: leave # prepare to leave the function <+58>: ret # return total
When tracing the above assembly code, consider whether the data being accessed represents an address or
a value. For example, the instruction at <sumArray+13>
results in %ebp-4
containing a variable of
type int
which is initially set to 0
. In contrast, the argument stored at %ebp+8
is the first
argument to the function (array
) which is of type int *
and corresponds to the base address of the
array. A different variable (which we call total
) is stored at location %ebp-8
.
Let’s take a closer look at the five instructions between locations <sumArray+22>
and <sumArray+39>
:
<+22>: mov -0x4(%ebp),%eax # copy i to %eax <+25>: lea 0x0(,%eax,4),%edx # copy i*4 to %edx <+32>: mov 0x8(%ebp),%eax # copy array to %eax <+35>: add %edx,%eax # copy array+i*4 to %eax <+37>: mov (%eax),%eax # copy *(array+i*4) to %eax <+39>: add %eax,-0x8(%ebp) # add *(array+i*4) to total (total+=array[i])
Recall that the compiler commonly uses lea
to perform simple arithmetic on operands.
The operand 0x0(,%eax,4)
translates to %eax*4 + 0x0
. Since %eax
holds the value
i
, this operation copies the value i*4
to %edx
. At this point, %edx
contains the
number of bytes that must be added to calculate the correct offset of array[i]
.
The next instruction (mov 0x8(%ebp), %eax
) copies the first argument (the address to array
) into %eax
.
Adding %edx
to %eax
in the next instruction causes %eax
to contain array+i*4
.
Recall that the element at index i in array
is stored at address array + sizeof(T) * i
Therefore, %eax
now contains the assembly level computation of address &array[i]
.
The instruction at <sumArray+37>
dereferences the value located at %eax
, placing the value array[i]
into %eax
. Lastly, %eax
is added to the value in %ebp-8
, or total
. Therefore, the
five instructions between locations <sumArray+22>
and <sumArray+39>
correspond to the line
total += array[i]
in the sumArray()
function.