8.9. structs in Assembly
A struct is another way to create a
collection of data types in C. Unlike arrays, structs enable different data
types to be grouped together. C stores a struct
like a single-dimension
array, where the data elements (fields) are stored contiguously.
Let’s revisit struct studentT
from Chapter 1:
struct studentT {
char name[64];
int age;
int grad_yr;
float gpa;
};
struct studentT student;
Figure 1 shows how student
is laid out in memory. For the
sake of example, assume that student
starts at address x0. Each xi
denotes the address of a field.
The fields are stored contiguously next to one another in memory in the order in
which they are declared. In Figure 1, the age
field is allocated at
the memory location directly after the name
field (at byte offset x64) and
is followed by the grad_yr
(byte offset x68) and gpa
(byte offset x72)
fields. This organization enables memory-efficient access to the fields.
To understand how the compiler generates assembly code to work a struct
,
consider the function initStudent
:
void initStudent(struct studentT *s, char *nm, int ag, int gr, float g) {
strncpy(s->name, nm, 64);
s->grad_yr = gr;
s->age = ag;
s->gpa = g;
}
The initStudent
function uses the base address of a struct studentT
as its
first parameter, and the desired values for each field as its remaining
parameters. The listing that follows depicts this function in assembly.
In general, parameter i to function initStudent
is located at stack
address (ebp+8)
+ 4 × i.
<initStudent>: <+0>: push %ebp # save ebp <+1>: mov %esp,%ebp # update ebp (new stack frame) <+3>: sub $0x18,%esp # add 24 bytes to stack frame <+6>: mov 0x8(%ebp),%eax # copy first parameter (s) to eax <+9>: mov 0xc(%ebp),%edx # copy second parameter (nm) to edx <+12> mov $0x40,0x8(%esp) # copy 0x40 (or 64) to esp+8 <+16>: mov %edx,0x4(%esp) # copy nm to esp+4 <+20>: mov %eax,(%esp) # copy s to top of stack (esp) <+23>: call 0x8048320 <strncpy@plt> # call strncpy(s->name, nm, 64) <+28>: mov 0x8(%ebp),%eax # copy s to eax <+32>: mov 0x14(%ebp),%edx # copy fourth parameter (gr) to edx <+35>: mov %edx,0x44(%eax) # copy gr to offset eax+68 (s->grad_yr) <+38>: mov 0x8(%ebp),%eax # copy s to eax <+41>: mov 0x10(%ebp),%edx # copy third parameter (ag) to edx <+44>: mov %edx,0x40(%eax) # copy ag to offset eax+64 (s->age) <+47>: mov 0x8(%ebp),%edx # copy s to edx <+50>: mov 0x18(%ebp),%eax # copy g to eax <+53>: mov %eax,0x48(%edx) # copy g to offset edx+72 (s->gpa) <+56>: leave # prepare to leave the function <+57>: ret # return
Being mindful of the byte offsets of each field is key to understanding this code. Here are a few things to keep in mind.
-
The
strncpy
call takes the base address of thename
field ofs
, the address of arraynm
, and a length specifier as its three arguments. Recall that becausename
is the first field instruct studentT
, the address ofs
is synonymous with the address ofs→name
.
<+6>: mov 0x8(%ebp),%eax # copy first parameter (s) to eax <+9>: mov 0xc(%ebp),%edx # copy second parameter (nm) to edx <+12> mov $0x40,0x8(%esp) # copy 0x40 (or 64) to esp+8 <+16>: mov %edx,0x4(%esp) # copy nm to esp+4 <+20>: mov %eax,(%esp) # copy s to top of stack (esp) <+23>: call 0x8048320 <strncpy@plt> # call strncpy(s->name, nm, 64)
-
The next part of the code (instructions
<initStudent+28>
thru<initStudent+35>
) places the value of thegr
parameter at an offset of 68 from the start ofs
. Revisiting the memory layout in Figure 1 shows that this address corresponds tos→grad_yr
.
<+28>: mov 0x8(%ebp),%eax # copy s to eax <+32>: mov 0x14(%ebp),%edx # copy fourth parameter (gr) to edx <+35>: mov %edx,0x44(%eax) # copy gr to offset eax+68 (s->grad_yr)
-
The next section of code (instructions
<initStudent+38>
thru<initStudent+53>
) copies theag
parameter to thes→age
field. Afterward, theg
parameter value is copied to thes→gpa
field (byte offset 72):
<+38>: mov 0x8(%ebp),%eax # copy s to eax <+41>: mov 0x10(%ebp),%edx # copy third parameter (ag) to edx <+44>: mov %edx,0x40(%eax) # copy ag to offset eax+64 (s->age) <+47>: mov 0x8(%ebp),%edx # copy s to edx <+50>: mov 0x18(%ebp),%eax # copy g to eax <+53>: mov %eax,0x48(%edx) # copy g to offset edx+72 (s->gpa)
8.9.1. Data Alignment and structs
Consider the following modified declaration of struct studentT
:
struct studentTM {
char name[63]; //updated to 63 instead of 64
int age;
int grad_yr;
float gpa;
};
struct studentTM student2;
The size of the name
field is modified to be 63 bytes, instead of
the original 64. Consider how this affects the way the struct
is
laid out in memory. It may be tempting to visualize it as in Figure 2.
In this depiction, the age
field occupies the byte immediately following the
name
field. But this is incorrect. Figure 3 depicts the actual
layout in memory.
IA32’s alignment policy requires that two-byte data types (i.e., short
) reside
at a two-byte-aligned address whereas four-byte data types (int
, float
,
long
, and pointer types) reside at four-byte-aligned addresses, and
eight-byte data types (double
, long long
) reside at eight-byte-aligned
addresses. For a struct
, the compiler adds empty bytes as padding
between fields to ensure that each field satisfies its alignment requirements.
For example, in the struct
declared in the previous code snippet, the compiler adds a byte of empty
space (or padding) at byte x63 to ensure that the age
field starts at an
address that is at a multiple of four. Values aligned properly in memory can
be read or written in a single operation, enabling greater efficiency.
Consider what happens when a struct
is defined as follows:
struct studentTM {
int age;
int grad_yr;
float gpa;
char name[63];
};
struct studentTM student3;
Moving the name
array to the end ensures that age
, grad_yr
,
and gpa
are four-byte aligned. Most compilers will remove the
filler byte at the end of the struct
. However, if the struct
is ever
used in the context of an array (e.g., struct studentTM courseSection[20];
)
the compiler will once again add the filler byte as padding between
each struct
in the array to ensure that alignment requirements are
properly met.