7.9. Structs in Assembly

A struct is another way to group a collection of data in C. Unlike arrays, structs allow values of different data types to be grouped together. C stores structs much like single-dimension arrays, with the data elements (fields) laid out one after another in memory.

Let’s revisit the studentT struct from Chapter 1:

struct studentT {
    char name[64];
    int  age;
    int  grad_yr;
    float gpa;
};

struct studentT student;

Figure 1 shows how the student struct is laid out in memory. Each x_i denotes the address of a particular field, where the subscript i is the field's byte offset from the start of the struct.

Figure 1. The memory layout of a struct studentT.

The fields are stored next to one another in memory, in the order in which they are declared. In Figure 1, the age field is allocated at the memory location directly after the name field (at byte offset x64), and is followed by the grad_yr (byte offset x68) and gpa (byte offset x72) fields. This organization lets the compiler reach any field by adding a fixed offset to the base address of the struct.
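
One way to check these offsets on your own machine is the standard offsetof macro from <stddef.h>. The sketch below is illustrative only: print_student_layout() is a hypothetical helper that does not appear in the original example, and the values it reports depend on the platform's type sizes and alignment rules.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <stdio.h>

struct studentT {
    char  name[64];
    int   age;
    int   grad_yr;
    float gpa;
};

/* Hypothetical helper (not part of the original example): prints the byte
 * offset of every field, measured from the start of the struct. */
void print_student_layout(void) {
    /* The C standard guarantees that the first field sits at offset 0. */
    assert(offsetof(struct studentT, name) == 0);

    printf("name:    %zu\n", offsetof(struct studentT, name));
    printf("age:     %zu\n", offsetof(struct studentT, age));
    printf("grad_yr: %zu\n", offsetof(struct studentT, grad_yr));
    printf("gpa:     %zu\n", offsetof(struct studentT, gpa));
    printf("size:    %zu\n", sizeof(struct studentT));
}
```

Calling print_student_layout() from main() on an x86-64 Linux system, where int and float are 4 bytes each, should report offsets 0, 64, 68, and 72 and a total size of 76 bytes, matching Figure 1.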

To understand how the compiler generates assembly code to work with structs, consider the function initStudent():

void initStudent(struct studentT *s, char *nm, int ag, int gr, float g) {
    strcpy(s->name, nm);
    s->grad_yr = gr;
    s->age = ag;
    s->gpa = g;
}

The initStudent() function takes the base address of a studentT struct as its first parameter, and the desired values for each field as its remaining parameters. The listing below shows this function in assembly.

Dump of assembler code for function initStudent:
0x400596 <+0>:   push   %rbp                  # save rbp
0x400597 <+1>:   mov    %rsp,%rbp             # update rbp (new stack frame)
0x40059a <+4>:   sub    $0x20,%rsp            # add 32 bytes to stack frame
0x40059e <+8>:   mov    %rdi,-0x8(%rbp)       # copy 1st parameter to %rbp-0x8 (s)
0x4005a2 <+12>:  mov    %rsi,-0x10(%rbp)      # copy 2nd parameter to %rbp-0x10 (nm)
0x4005a6 <+16>:  mov    %edx,-0x14(%rbp)      # copy 3rd parameter to %rbp-0x14 (ag)
0x4005a9 <+19>:  mov    %ecx,-0x18(%rbp)      # copy 4th parameter to %rbp-0x18 (gr)
0x4005ac <+22>:  movss  %xmm0,-0x1c(%rbp)     # copy 5th parameter to %rbp-0x1c (g)
0x4005b1 <+27>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005b5 <+31>:  mov    -0x10(%rbp),%rdx      # copy nm to %rdx
0x4005b9 <+35>:  mov    %rdx,%rsi             # copy nm to %rsi
0x4005bc <+38>:  mov    %rax,%rdi             # copy s to %rdi
0x4005bf <+41>:  callq  0x400460 <strcpy@plt> # call strcpy(s->name, nm)
0x4005c4 <+46>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005c8 <+50>:  mov    -0x18(%rbp),%edx      # copy gr to %edx
0x4005cb <+53>:  mov    %edx,0x44(%rax)       # copy gr to %rax+0x44 (s->grad_yr)
0x4005ce <+56>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005d2 <+60>:  mov    -0x14(%rbp),%edx      # copy ag to %edx
0x4005d5 <+63>:  mov    %edx,0x40(%rax)       # copy ag to %rax+0x40 (s->age)
0x4005d8 <+66>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005dc <+70>:  movss  -0x1c(%rbp),%xmm0     # copy g to %xmm0
0x4005e1 <+75>:  movss  %xmm0,0x48(%rax)      # copy g to %rax+0x48 (s->gpa)
0x4005e7 <+81>:  leaveq                       # prepare stack for exiting function
0x4005e8 <+82>:  retq                         # return (void function, %rax ignored)

Being mindful of the byte offsets of each field is key to understanding this code. A few things to keep in mind:

  • The strcpy() call takes the base address of the name field of struct s and the address of array nm as its two arguments. Because name is the first field in the studentT struct, the address of s is the same as the address of s→name.

0x40059e <+8>:   mov    %rdi,-0x8(%rbp)       # copy 1st parameter to %rbp-0x8 (s)
0x4005a2 <+12>:  mov    %rsi,-0x10(%rbp)      # copy 2nd parameter to %rbp-0x10 (nm)
0x4005a6 <+16>:  mov    %edx,-0x14(%rbp)      # copy 3rd parameter to %rbp-0x14 (ag)
0x4005a9 <+19>:  mov    %ecx,-0x18(%rbp)      # copy 4th parameter to %rbp-0x18 (gr)
0x4005ac <+22>:  movss  %xmm0,-0x1c(%rbp)     # copy 5th parameter to %rbp-0x1c (g)
0x4005b1 <+27>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005b5 <+31>:  mov    -0x10(%rbp),%rdx      # copy nm to %rdx
0x4005b9 <+35>:  mov    %rdx,%rsi             # copy nm to %rsi
0x4005bc <+38>:  mov    %rax,%rdi             # copy s to %rdi
0x4005bf <+41>:  callq  0x400460 <strcpy@plt> # call strcpy(s->name, nm)
  • The code snippet above contains a register (%xmm0) and an instruction (movss) that have not been discussed so far. The %xmm0 register is an example of a register reserved for floating-point values. The movss instruction indicates that the data being moved onto the call stack is a single-precision floating-point value.

  • The next part of the code (instructions <initStudent+46> through <initStudent+53>) places the value of the gr parameter at an offset of 0x44 (or 68) bytes from the start of s. Revisiting the memory layout of the struct in Figure 1 shows that this address corresponds to s→grad_yr.

0x4005c4 <+46>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005c8 <+50>:  mov    -0x18(%rbp),%edx      # copy gr to %edx
0x4005cb <+53>:  mov    %edx,0x44(%rax)       # copy gr to %rax+0x44 (s->grad_yr)
  • The next section of code (instructions <initStudent+56> through <initStudent+63>) copies the ag parameter to the s→age field of the struct, which is located at an offset of 0x40 (or 64) bytes from the address of s.

0x4005ce <+56>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005d2 <+60>:  mov    -0x14(%rbp),%edx      # copy ag to %edx
0x4005d5 <+63>:  mov    %edx,0x40(%rax)       # copy ag to %rax+0x40 (s->age)
  • Lastly, the g parameter value is copied to the s→gpa field (byte offset 0x48, or 72) of the struct. Notice the use of the %xmm0 register, since the data stored at location %rbp-0x1c is a single-precision floating-point value:

0x4005d8 <+66>:  mov    -0x8(%rbp),%rax       # copy s to %rax
0x4005dc <+70>:  movss  -0x1c(%rbp),%xmm0     # copy g to %xmm0
0x4005e1 <+75>:  movss  %xmm0,0x48(%rax)      # copy g to %rax+0x48 (s->gpa)
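
To connect the offsets in the assembly back to C, the sketch below writes each field through explicit address arithmetic on the struct's base address, much as the compiled code does with 0x40(%rax), 0x44(%rax), and 0x48(%rax). The function initStudentByOffset() is a hypothetical variant written for illustration; it is not the code the compiler generated, and the length check on nm is an added assumption.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */
#include <string.h>  /* strcpy, strlen, memcpy */

struct studentT {
    char  name[64];
    int   age;
    int   grad_yr;
    float gpa;
};

/* Hypothetical variant of initStudent() that spells out the
 * base-address-plus-offset arithmetic instead of using s->field. */
void initStudentByOffset(struct studentT *s, const char *nm,
                         int ag, int gr, float g) {
    char *base = (char *)s;  /* byte-granular view of the struct */

    /* Added safety check (an assumption, not in the original function). */
    assert(strlen(nm) < sizeof s->name);

    /* name is the first field, so its address equals the address of s. */
    strcpy(base, nm);

    /* Each store targets base + offset; on x86-64 these offsets are
     * 0x44 (grad_yr), 0x40 (age), and 0x48 (gpa). */
    memcpy(base + offsetof(struct studentT, grad_yr), &gr, sizeof gr);
    memcpy(base + offsetof(struct studentT, age),     &ag, sizeof ag);
    memcpy(base + offsetof(struct studentT, gpa),     &g,  sizeof g);
}
```

After a call such as initStudentByOffset(&student, "Ada", 20, 2027, 3.5f), reading student.age, student.grad_yr, and student.gpa in the usual way returns the values that were stored through the offsets.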

7.9.1. Data Alignment and Structs

Consider the following modified declaration of our studentT struct:

struct studentTM {
    char name[63]; //updated to 63 instead of 64
    int  age;
    int  grad_yr;
    float gpa;
};

struct studentTM student2;

The size of the name field of the struct is modified to be 63 bytes, instead of the original 64 bytes. Consider how this affects the way the struct is laid out in memory. It may be tempting to visualize it like so:

Figure 2. An incorrect memory layout for the updated struct studentTM. Note that the struct’s "name" field is reduced from 64 to 63 bytes.

In this depiction, the age field occurs in the byte immediately following the name field. But this is incorrect. Figure 3 depicts the actual layout of the struct in memory:

Figure 3. The correct memory layout for the updated struct studentTM. Byte x63 is added by the compiler to satisfy memory alignment constraints, but it doesn’t correspond to any of the struct’s fields.

x86_64’s alignment policy requires that 2-byte data types (e.g., short) reside at 2-byte aligned addresses, that 4-byte data types (e.g., int, float, and unsigned) reside at 4-byte aligned addresses, and that larger data types (long, double, and pointer data) reside at 8-byte aligned addresses. For structs, the compiler adds empty bytes as "padding" between fields to ensure that each field satisfies its alignment requirement. For example, in the struct declared above, the compiler adds a byte of padding at byte x63 to ensure that the age field starts at an address that is a multiple of 4. Values that are properly aligned in memory can be read or written in a single operation, enabling greater efficiency.
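
The padding byte can be observed directly from C. The sketch below is a minimal check, assuming an x86-64 system with 4-byte int and float; padding_before_age() is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

struct studentTM {
    char  name[63];  /* 63 bytes instead of 64 */
    int   age;
    int   grad_yr;
    float gpa;
};

/* Hypothetical helper: counts the filler bytes the compiler inserted
 * between the end of name and the start of age. */
size_t padding_before_age(void) {
    size_t end_of_name = offsetof(struct studentTM, name)
                       + sizeof ((struct studentTM *)0)->name;

    /* On x86-64 an int is 4-byte aligned, so its offset is a multiple
     * of sizeof(int). */
    assert(offsetof(struct studentTM, age) % sizeof(int) == 0);

    return offsetof(struct studentTM, age) - end_of_name;
}
```

On such a system padding_before_age() returns 1: name occupies bytes 0 through 62, byte 63 is padding, and age begins at offset 64, as in Figure 3. The overall size of the struct is still 76 bytes.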

Consider what happens when the struct is defined as the following:

struct studentTM {
    int  age;
    int  grad_yr;
    float gpa;
    char name[63];
};

struct studentTM student3;

Moving the name array to the end of the struct ensures that age, grad_yr, and gpa are 4-byte aligned without any padding between the fields. The filler byte does not disappear, however. The compiler places it at the end of the struct so that the struct's total size remains a multiple of 4 (76 bytes rather than 75). This trailing padding matters whenever the struct is used in an array (e.g., struct studentTM courseSection[20];), because it guarantees that the fields of every element in the array satisfy their alignment requirements.
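
To see where the filler byte ends up in the reordered struct, it helps to compare sizeof with the offset at which the name field ends, and to measure the distance between neighboring array elements. The sketch below assumes an x86-64 system with 4-byte int and float; trailing_padding() and array_stride() are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

struct studentTM {
    int   age;
    int   grad_yr;
    float gpa;
    char  name[63];
};

/* Hypothetical helper: bytes between the end of the last field (name)
 * and the end of the struct. */
size_t trailing_padding(void) {
    size_t end_of_name = offsetof(struct studentTM, name)
                       + sizeof ((struct studentTM *)0)->name;
    return sizeof(struct studentTM) - end_of_name;
}

/* Hypothetical helper: distance in bytes between consecutive elements
 * of an array of structs. */
size_t array_stride(void) {
    struct studentTM courseSection[20];
    char *first  = (char *)&courseSection[0];
    char *second = (char *)&courseSection[1];

    /* Array elements are packed back to back, one sizeof apart. */
    assert((size_t)(second - first) == sizeof(struct studentTM));
    return (size_t)(second - first);
}
```

On such a system name starts at offset 12 and ends at offset 75, trailing_padding() returns 1, and array_stride() returns 76, so the age field of each element in courseSection lands on a 4-byte aligned address.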