8.8. Matrices

A matrix is a 2D array. A matrix in the C language can be statically allocated as a 2-dimensional array (M[n][m]), dynamically allocated with a single call to malloc(), or dynamically allocated as an array of arrays. Let’s consider the array of arrays implementation. The first array contains n elements (M[n]), and each element M[i] in our matrix contains an array of m elements. The following code snippets each declare matrices of size 4 x 3:

//statically-allocated matrix (allocated on stack)
int M1[4][3];

//dynamically-allocated matrix (programmer friendly, allocated on heap)
int **M2, i;
M2 = malloc(4 * sizeof(int*));
for (i = 0; i < 4; i++) {
    M2[i] = malloc(3 * sizeof(int));
}

In the case of the dynamically allocated matrix, the main array contains a contiguous array of int pointers. Each integer pointer points to a different array in memory. Figure 1 illustrates how we would normally visualize each of these matrices:

matrices
Figure 1. Illustration of a statically-allocated (M1) and dynamically-allocated (M2) 3x4 matrix.

For both of the above matrix declarations, element (i,j) can be accessed using the double indexing syntax M[i][j], where M is either M1 or M2. However, the organization of these matrices are different in memory. While both store the elements in their primary array contiguously in memory, our statically allocated matrix also stores all the rows contiguously in memory:

matrixArray
Figure 2. Matrix M1’s memory layout in row-major order.

This contiguous ordering is not guaranteed for M2. Recall that to contiguously allocate an n x m matrix on the heap, we should use a single call to malloc() that allocates n x m elements:

//dynamic matrix (allocated on heap, memory efficient way)
#define ROWS 4
#define COLS 3
int *M3;
M3  = malloc(ROWS*COLS*sizeof(int));

Recall that with the declaration of M3, element (i,j) cannot be accessed using the M[i][j] notation. Instead, we must index the element using the format: M3[i*cols + j].

8.8.1. Contiguous Two Dimensional Arrays

Consider a function sumMat() that takes a pointer to a contiguously allocated (either statically allocated or a memory efficient dynamically allocated) matrix as its first parameter, along with a number of rows and columns and returns the sum of all the elements inside the matrix.

We use scaled indexing in the code snippet below, since it applies to both statically and dynamically allocated contiguous matrices. Recall that the syntax m[i][j] does not work with the memory efficient contiguous dynamic allocation previously discussed.

int sumMat(int *m, int rows, int cols) {
    int i, j, total = 0;
    for (i = 0; i < rows; i++){
        for (j = 0; j < cols; j++){
            total += m[i*cols + j];
        }
    }
    return total;
}

Here is the corresponding assembly. Each line is annotated with its English translation:

<sumMat>:
0x08048507 <+0>:   push  %ebp                   #save ebp
0x08048508 <+1>:   mov   %esp,%ebp              #update ebp (new stack frame)
0x0804850a <+3>:   sub   $0x10,%esp             #add 4 more spaces to stack frame
0x0804850d <+6>:   movl  $0x0,-0xc(%ebp)        #copy 0 to ebp-12 (total)
0x08048514 <+13>:  movl  $0x0,-0x4(%ebp)        #copy 0 to ebp-4 (i)
0x0804851b <+20>:  jmp   0x8048555 <sumMat+78>  #goto <sumMat+78>
0x0804851d <+22>:  movl  $0x0,-0x8(%ebp)        #copy 0 to ebp-8 (j)
0x08048524 <+29>:  jmp   0x8048549 <sumMat+66>  #goto <sumMat+66>
0x08048526 <+31>:  mov   -0x4(%ebp),%eax        #copy i to eax
0x08048529 <+34>:  imul  0x10(%ebp),%eax        #multiply i with cols, place in eax
0x0804852d <+38>:  mov   %eax,%edx              #copy i*cols to edx
0x0804852f <+40>:  mov   -0x8(%ebp),%eax        #copy j to %eax
0x08048532 <+43>:  add   %edx,%eax              #add i*cols with j, place in eax
0x08048534 <+45>:  lea   0x0(,%eax,4),%edx      #multiply (i*cols+j) by 4,put in edx
0x0804853b <+52>:  mov   0x8(%ebp),%eax         #copy m pointer to eax
0x0804853e <+55>:  add   %edx,%eax              #add m to (i*cols+j)*4, place in eax
0x08048540 <+57>:  mov   (%eax),%eax            #copy m[i*cols+j] to eax
0x08048542 <+59>:  add   %eax,-0xc(%ebp)        #add eax to total
0x08048545 <+62>:  addl  $0x1,-0x8(%ebp)        #increment j by 1 (j+=1)
0x08048549 <+66>:  mov   -0x8(%ebp),%eax        #copy j to eax
0x0804854c <+69>:  cmp   0x10(%ebp),%eax        #compare j with cols
0x0804854f <+72>:  jl    0x8048526 <sumMat+31>  #if (j < cols) goto <sumMat+31>
0x08048551 <+74>:  addl  $0x1,-0x4(%ebp)        #add 1 to i (i+=1)
0x08048555 <+78>:  mov   -0x4(%ebp),%eax        #copy i to eax
0x08048558 <+81>:  cmp   0xc(%ebp),%eax         #compare i with rows
0x0804855b <+84>:  jl    0x804851d <sumMat+22>  #if (i < rows) goto sumMat+22
0x0804855d <+86>:  mov   -0xc(%ebp),%eax        #copy total to eax
0x08048560 <+89>:  leave                        #prepare to leave the function
0x08048561 <+90>:  ret                          #return total

Local variables i, j, and total are loaded at addresses %ebp-4, %ebp-8 and %ebp-12 on the stack, respectively. Input parameters m, row, and cols are located at locations %ebp+8, %ebp+12, and %ebp+16 respectively. Using this knowledge, let’s zoom in on the component that just deals with the access of element (i,j) in our matrix:

0x08048526 <+31>:  mov   -0x4(%ebp),%eax        #copy i to eax
0x08048529 <+34>:  imul  0x10(%ebp),%eax        #multiply i with cols, place in eax
0x0804852d <+38>:  mov   %eax,%edx              #copy i*cols to edx

The first set of instructions computes i * cols and places the result in register %edx. Recall that for a matrix matrix, the pointer notation matrix + (i * cols) is equivalent to the access &matrix[i].


0x0804852f <+40>:  mov   -0x8(%ebp),%eax       #copy j to eax
0x08048532 <+43>:  add   %edx,%eax             #add i*cols with j, place in eax
0x08048534 <+45>:  lea   0x0(,%eax,4),%edx     #multiply (i*cols+j) by 4, put in edx

The next set of instructions computes (i * cols + j) * 4. The compiler multiplies the index (i * cols) + j by 4 since each element in the matrix is a 4 byte integer and this multiplication enables the compiler to calculate the correct offset.


The last set of instructions adds the calculated offset to the matrix pointer and dereferences it to yield the value of element (i,j):

0x0804853b <+52>:  mov   0x8(%ebp),%eax         #copy m pointer to eax
0x0804853e <+55>:  add   %edx,%eax              #add m to (i*cols+j)*4, place in eax
0x08048540 <+57>:  mov   (%eax),%eax            #copy m[i*cols+j] to eax
0x08048542 <+59>:  add   %eax,-0xc(%ebp)        #add eax to total

The first instruction loads the address of matrix m into register %eax. The add instruction adds the offset (i*cols + j)*4 to the address of m to correctly calculate the address of element (i,j), and places this address in register %eax. The third instruction dereferences %eax and places the resulting value in register %eax. The last instruction adds the value in %eax to the accumulator total, which is located at stack address %ebp-0xc.

Let’s consider how element (1,2) is accessed in Figure 2 (reproduced below for convenience):

matrixArray
Figure 3. Matrix M1’s memory layout in row-major order.

Element (1,2) is located at address M1 + (1 * cols) + 2. Since cols = 3, element (1,2) corresponds to M1+5. To access the element at this location, the compiler must multiply 5 by the size of the int data type (4 bytes), yielding the offset M1 + 20, which corresponds to byte x20 in the figure. Dereferencing this location yields element 5, which is indeed element (1,2) in the matrix.

8.8.2. Non-contiguous Matrix

The non-contiguous matrix implementation is a bit more complicated. Figure 4 visualizes how M2 may be laid out in memory:

matrixDynamic
Figure 4. Matrix M2’s non-contiguous layout in memory.

Notice that the array of pointers is contiguous, and that each array pointed to by an element of M2 (i.e. M2[i]) is contiguous. However, the individual arrays are not contiguous with each other.

The sumMatrix() function below takes an array of integer pointers (called matrix) as its first parameter, and a number of rows and columns as its second and third parameters:

int sumMatrix(int **matrix, int rows, int cols) {
    int i, j, total=0;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            total += matrix[i][j];
        }
    }
    return total;
}

While this function looks nearly identical to the sumMat() function shown earlier, the matrix accepted by this function consists of a contiguous array of pointers. Each pointer contains the address of a separate contiguous array, which corresponds to a separate row in the matrix.

The corresponding assembly for sumMatrix() follows. Each line is annotated with its English translation.

0x080484ad <+0>:   push  %ebp                      # save ebp
0x080484ae <+1>:   mov   %esp,%ebp                 # update ebp (new stack frame)
0x080484b0 <+3>:   sub   $0x10,%esp                # add 4 more spaces to stack frame
0x080484b3 <+6>:   movl  $0x0,-0xc(%ebp)           # copy 0 to %ebp-12 (total)
0x080484ba <+13>:  movl  $0x0,-0x4(%ebp)           # copy 0 to %ebp-4 (i)
0x080484c1 <+20>:  jmp   0x80484fa <sumMatrix+77>  # goto <sumMatrix+77>
0x080484c3 <+22>:  movl  $0x0,-0x8(%ebp)           # copy 0 to %ebp-8 (j)
0x080484ca <+29>:  jmp   0x80484ee <sumMatrix+65>  # goto <sumMatrix+65>
0x080484cc <+31>:  mov   -0x4(%ebp),%eax           # copy i to %eax
0x080484cf <+34>:  lea   0x0(,%eax,4),%edx         # multiply i by 4, place in %edx
0x080484d6 <+41>:  mov   0x8(%ebp),%eax            # copy matrix to %eax
0x080484d9 <+44>:  add   %edx,%eax                 # add i*4 to matrix, place in %eax
0x080484db <+46>:  mov   (%eax),%eax               # copy matrix[i] to %eax
0x080484dd <+48>:  mov   -0x8(%ebp),%edx           # copy j to %edx
0x080484e0 <+51>:  shl   $0x2,%edx                 # multiply j by 4, place in %edx
0x080484e3 <+54>:  add   %edx,%eax                 # add j*4 to matrix[i],put in %eax
0x080484e5 <+56>:  mov   (%eax),%eax               # copy matrix[i][j] to %eax
0x080484e7 <+58>:  add   %eax,-0xc(%ebp)           # add matrix[i][j] to total
0x080484ea <+61>:  addl  $0x1,-0x8(%ebp)           # add 1 to j (j+=1)
0x080484ee <+65>:  mov   -0x8(%ebp),%eax           # copy j to %eax
0x080484f1 <+68>:  cmp   0x10(%ebp),%eax           # compare j with cols
0x080484f4 <+71>:  jl    0x80484cc <sumMatrix+31>  # if (j < cols) goto <sumMatrix+31>
0x080484f6 <+73>:  addl  $0x1,-0x4(%ebp)           # add 1 to i (i+=1)
0x080484fa <+77>:  mov   -0x4(%ebp),%eax           # copy i to %eax
0x080484fd <+80>:  cmp   0xc(%ebp),%eax            # compare i with rows
0x08048500 <+83>:  jl    0x80484c3 <sumMatrix+22>  # if (i < rows) goto <sumMatrix+22>
0x08048502 <+85>:  mov   -0xc(%ebp),%eax           # copy total to %eax
0x08048505 <+88>:  leave                           # prepare to leave the function
0x08048506 <+89>:  ret                             # return total

Once again, variables i, j, and total are at stack addresses %ebp-4, %ebp-8 and and %ebp-12 respectively. Input parameters m, row, and cols are located at stack addresses %ebp+8, %ebp+12, and %ebp+16 respectively.

Let’s zoom in on the section that deals specifically with an access to element (i,j), or matrix[i][j]:

0x080484cc <+31>:  mov   -0x4(%ebp),%eax      # copy i to %eax
0x080484cf <+34>:  lea   0x0(,%eax,4),%edx    # multiply i by 4, place in %edx
0x080484d6 <+41>:  mov   0x8(%ebp),%eax       # copy matrix to %eax
0x080484d9 <+44>:  add   %edx,%eax            # add i*4 to matrix, place in %eax
0x080484db <+46>:  mov   (%eax),%eax          # copy matrix[i] to %eax

The five instructions between <sumMatrix+31> and <sumMatrix+46> compute matrix[i], or *(matrix+i). Note that the compiler needs to multiply i by 4 prior to adding it to matrix to calculate the correct offset (recall that pointers are 4 bytes in size). The instruction at <sumMatrix+46> then dereferences the calculated address to get the element matrix[i].

Since matrix is an array of int pointers, the element located at matrix[i] is itself an int pointer. The jth element in matrix[i] is located at offset j*4 in the matrix[i] array.


The next set of instructions extract the jth element in array matrix[i]:

0x080484dd <+48>:  mov   -0x8(%ebp),%edx      # copy j to %edx
0x080484e0 <+51>:  shl   $0x2,%edx            # multiply j by 4, place in %edx
0x080484e3 <+54>:  add   %edx,%eax            # add j*4 to matrix[i], place in %eax
0x080484e5 <+56>:  mov   (%eax),%eax          # copy matrix[i][j] to %eax
0x080484e7 <+58>:  add   %eax,-0xc(%ebp)      # add matrix[i][j] to total

The first instruction in this snippet loads variable j into register %edx. The compiler uses the left shift (shl) instruction to multiply j by 4 and stores the result in register %edx. The compiler then adds the resulting value to the address located in matrix[i] to get the address of matrix[i][j].

Let’s revisit Figure 4 to consider an example access of M2[1][2] (reproduced below for convenience).

matrixDynamic
Figure 5. Matrix M2’s non-contiguous layout in memory.

Note that M2 starts at memory location x0. The compiler first computes the address of M2[1] by multiplying 1 by 4 (sizeof(int *)) and adding it to the address of M2 (x0), yielding the new address x4. A dereference on this address yields the address associated with M2[1], or x36. The compiler then multiplies index 2 by 4 (sizeof(int)), and adds the result (8) to x36, yielding a final address of x44. The address x44 is dereferenced, yielding the value 5. Sure enough, the element in [DynamicMatrix6] that corresponds to M2[1][2] has the value 5.