Dive Into Systems

2.7. C Structs

In the previous chapter we introduced C struct types. In this chapter we dive deeper into C structs, examine statically and dynamically allocated structs, and combine structs and pointers to create more complex data types and data structures.

We begin with a quick overview of statically declared structs. See the previous chapter for more details.

2.7.1. Review of the C struct Type

A struct type represents a heterogeneous collection of data; it’s a mechanism for treating a set of different types as a single, coherent unit.

There are three steps to defining and using struct types in C programs:

1. Define a struct type that specifies the field names and their types.
2. Declare variables of the struct type.
3. Use dot notation to access individual field values in the variable.

In C, structs are lvalues (they can appear on the left-hand side of an assignment statement). The value of a struct variable is the contents of its memory (all of the bytes making up its field values). When calling functions with struct parameters, the value of the struct argument (a copy of all of the bytes of all of its fields) gets copied to the struct function parameter.

When programming with structs, and in particular when combining structs and arrays, it’s critical to carefully consider the type of every expression. Each field in a struct represents a specific type, and the syntax for accessing field values and the semantics of passing individual field values to functions follow those of their specific type.

The following full example program demonstrates defining a struct type, declaring variables of that type, accessing field values, and passing structs and individual field values to functions. (We omit some error handling and comments for readability.)

#include <stdio.h>
#include <string.h>

/* define a new struct type (outside function bodies) */
struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

/* function prototypes */
int checkID(struct studentT s, int min_age);
void changeName(char *old, char *new);

int main(void) {
    int can_vote;
    // declare variables of struct type:
    struct studentT student1, student2;

    // access field values using .
    strcpy(student1.name, "Ruth");
    student1.age = 17;
    student1.gpa = 3.5;
    student1.grad_yr = 2021;

    // structs are lvalues
    student2 = student1;
    strcpy(student2.name, "Frances");
    student2.age = student1.age + 4;

    // passing a struct
    can_vote = checkID(student1, 18);
    printf("%s %d\n", student1.name, can_vote);

    can_vote = checkID(student2, 18);
    printf("%s %d\n", student2.name, can_vote);

    // passing a struct field value
    changeName(student2.name, "Kwame");
    printf("student 2's name is now %s\n", student2.name);

    return 0;
}

int checkID(struct studentT s, int min_age) {
    int ret = 1;

    if (s.age < min_age) {
        ret = 0;
        // changes age field IN PARAMETER COPY ONLY
        s.age = min_age + 1;
    }
    return ret;
}

void changeName(char *old, char *new) {
    if ((old == NULL) || (new == NULL)) {
        return;
    }
    strcpy(old,new);
}

When run, the program produces:

Ruth 0
Frances 1
student 2's name is now Kwame

When working with structs, it’s particularly important to think about the types of the struct and its fields. For example, when passing a struct to a function, the parameter gets a copy of the struct’s value (a copy of all bytes from the argument). Consequently, changes to the parameter’s field values do not change the argument’s value. This behavior is illustrated in the preceding program in the call to checkID, which modifies the parameter’s age field. The changes in checkID have no effect on the corresponding argument’s age field value.

When passing a field of a struct to a function, the semantics match the type of the field (the type of the function’s parameter). For example, in the call to changeName, the value of the name field (the base address of the name array inside the student2 struct) gets copied to the parameter old, meaning that the parameter refers to the same set of array elements in memory as its argument. Thus, changing an element of the array in the function also changes the element’s value in the argument; the semantics of passing the name field match the type of the name field.
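The contrast between these two cases can be sketched with a pair of small functions. The helper names renameCopy and renameField are hypothetical (they are not part of the program above). The sketch assumes the same struct studentT definition and that the new name fits in the 64-byte name array. Note that when a whole struct is passed, the name array inside it is copied along with every other field, so writing to s.name in the function touches only the copy.

```c
#include <string.h>

struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

/* hypothetical helper: the parameter s is a COPY of the whole struct,
 * including all 64 bytes of its name array, so the caller's struct
 * is left unchanged */
void renameCopy(struct studentT s, char *new_name) {
    strcpy(s.name, new_name);   // modifies the parameter's copy only
}

/* hypothetical helper: the parameter name holds the base address of
 * the caller's name array, so the caller's array elements change */
void renameField(char *name, char *new_name) {
    strcpy(name, new_name);
}
```

For a variable student declared as a struct studentT, a call like renameCopy(student, "Kwame") leaves student.name as it was, whereas renameField(student.name, "Kwame") changes it.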

2.7.2. Pointers and Structs

As with other C types, a programmer can declare a variable as a pointer to a user-defined struct type. The semantics of using a struct pointer variable resemble those of other pointer types such as int *.

Consider the struct studentT type introduced in the previous program example:

struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

A programmer can declare variables of type struct studentT or struct studentT * (a pointer to a struct studentT):

struct studentT s;
struct studentT *sptr;

// think very carefully about the type of each field when
// accessing it (name is an array of char, age is an int ...)
strcpy(s.name, "Freya");
s.age = 18;
s.gpa = 4.0;
s.grad_yr = 2020;

// malloc space for a struct studentT for sptr to point to:
sptr = malloc(sizeof(struct studentT));
if (sptr == NULL) {
    printf("Error: malloc failed\n");
    exit(1);
}

Note that the call to malloc initializes sptr to point to a dynamically allocated struct in heap memory. Using the sizeof operator to compute malloc’s size request (e.g., sizeof(struct studentT)) ensures that malloc allocates space for all of the field values in the struct.

To access individual fields through a pointer to a struct, the pointer variable first needs to be dereferenced. Based on the rules for pointer dereferencing, you may be tempted to access struct fields like so:

// the grad_yr field of what sptr points to gets 2021:
(*sptr).grad_yr = 2021;

// the age field of what sptr points to gets s.age plus 1:
(*sptr).age = s.age + 1;

However, because pointers to structs are so commonly used, C provides a special operator (->) that both dereferences a struct pointer and accesses one of the struct’s field values. For example, sptr->grad_yr is equivalent to (*sptr).grad_yr. Here are some examples of accessing field values using this notation:

// the gpa field of what sptr points to gets 3.5:
sptr->gpa = 3.5;

// the name field of what sptr points to is an array of char
// (can use strcpy to init its value):
strcpy(sptr->name, "Lars");

Figure 1 sketches what the variables s and sptr may look like in memory after the code above executes. Recall that malloc allocates memory from the heap, and local variables are allocated on the stack.

All the fields of struct s (Freya) are stored on the stack. The sptr pointer on the stack stores the heap address of another student struct (Lars).

Figure 1. The differences in memory layout between a statically allocated struct (data on the stack) and a dynamically allocated struct (data on the heap).
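As a minimal sketch, the allocation and field-access steps from this section could be gathered into one function. The name makeStudent and its parameter list are hypothetical. The sketch uses both the (*sptr).field and sptr->field notations to show that they are interchangeable. As a precaution that the snippets above do not take, it uses strncpy to avoid writing past the end of the 64-byte name array.

```c
#include <stdlib.h>
#include <string.h>

struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

/* hypothetical helper: allocates one struct studentT on the heap and
 * initializes its fields; returns NULL if malloc fails.
 * The caller is responsible for calling free on the result. */
struct studentT *makeStudent(char *name, int age, float gpa, int grad_yr) {
    struct studentT *sptr;

    sptr = malloc(sizeof(struct studentT));
    if (sptr == NULL) {
        return NULL;
    }

    // copy at most 63 chars and always terminate the string
    strncpy(sptr->name, name, sizeof(sptr->name) - 1);
    sptr->name[sizeof(sptr->name) - 1] = '\0';

    (*sptr).age = age;        // explicit dereference, then dot notation
    sptr->gpa = gpa;          // equivalent arrow notation
    sptr->grad_yr = grad_yr;

    return sptr;
}
```

A caller that no longer needs the struct would release it with free, passing the pointer that makeStudent returned.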

2.7.3. Pointer Fields in Structs

Structs can also be defined to have pointer types as field values. For example:

struct personT {
    char *name;     // for a dynamically allocated string field
    int  age;
};

int main(void) {
    struct personT p1, *p2;

    // need to malloc space for the name field:
    p1.name = malloc(sizeof(char) * 8);
    strcpy(p1.name, "Zhichen");
    p1.age = 22;


    // first malloc space for the struct:
    p2 = malloc(sizeof(struct personT));

    // then malloc space for the name field:
    p2->name = malloc(sizeof(char) * 4);
    strcpy(p2->name, "Vic");
    p2->age = 19;
    ...

    // Note: for strings, we must allocate one extra byte to hold the
    // terminating null character that marks the end of the string.
}

In memory, these variables will look like Figure 2 (note which parts are allocated on the stack and which are on the heap).

Example struct with a pointer field type

Figure 2. The layout in memory of a struct with a pointer field.
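Because a struct personT on the heap involves two separate allocations (one for the struct and one for the string its name field points to), it also needs two calls to free. The following sketch shows one possible way to pair them up. The helper names makePerson and freePerson are hypothetical, and the size of the name buffer is computed with strlen rather than hard-coded as in the example above.

```c
#include <stdlib.h>
#include <string.h>

struct personT {
    char *name;     // for a dynamically allocated string field
    int  age;
};

/* hypothetical helper: allocates a struct personT and a separate heap
 * buffer for its name; returns NULL if either allocation fails */
struct personT *makePerson(char *name, int age) {
    struct personT *p;

    // first malloc space for the struct:
    p = malloc(sizeof(struct personT));
    if (p == NULL) {
        return NULL;
    }

    // then malloc space for the name field; strlen does not count
    // the terminating null character, so add 1 for it
    p->name = malloc(sizeof(char) * (strlen(name) + 1));
    if (p->name == NULL) {
        free(p);    // avoid leaking the struct
        return NULL;
    }
    strcpy(p->name, name);
    p->age = age;

    return p;
}

/* hypothetical helper: frees the name buffer first, then the struct
 * (after free(p), p->name can no longer be accessed) */
void freePerson(struct personT *p) {
    if (p == NULL) {
        return;
    }
    free(p->name);
    free(p);
}
```

The order in freePerson matters: freeing the struct first would leave no valid way to reach the name buffer.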

As structs and the types of their fields increase in complexity, be careful with their syntax. To access field values appropriately, start from the outermost variable type and use its type syntax to access individual parts. For example, the types of the struct variables shown in Table 1 govern how a programmer should access their fields.

Table 1. Struct field access examples
Expression	Type	Field Access Syntax
p1	struct personT	p1.age, p1.name
p2	struct personT *	p2->age, p2->name

Further, knowing the types of field values allows a program to use the correct syntax in accessing them, as shown by the examples in Table 2.

Table 2. Accessing different struct field types
Expression	Type	Example Access Syntax
p1.age	int	p1.age = 18;
p2->age	int	p2->age = 18;
p1.name	char *	printf("%s", p1.name);
p2->name	char *	printf("%s", p2->name);
p1.name[2]	char	p1.name[2] = 'a';
p2->name[2]	char	p2->name[2] = 'a';

In examining the last example, start by considering the type of the outermost variable (p2 is a pointer to a struct personT). Therefore, to access a field value in the struct, the programmer needs to use -> syntax (p2->name). Next, consider the type of the name field, which is a char *, used in this program to point to an array of char values. To access a specific char storage location through the name field, use array indexing notation: p2->name[2] = 'a'.
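The same outside-in reasoning applies inside a function that receives a struct personT *. The sketch below uses a hypothetical helper, replaceChar, to combine the -> and [] operators. The expression p->name[i] is parsed as (p->name)[i], so its type is char. The sketch assumes that name points to writable memory (for example, a malloc’ed buffer) and not to a string literal.

```c
#include <stddef.h>

struct personT {
    char *name;
    int  age;
};

/* hypothetical helper: replaces every occurrence of old_ch in the
 * person's name with new_ch and returns the number of replacements */
int replaceChar(struct personT *p, char old_ch, char new_ch) {
    int i, count = 0;

    if ((p == NULL) || (p->name == NULL)) {
        return 0;
    }

    // p is a struct personT *, so use -> to reach the name field;
    // p->name is a char *, so use [] to reach one char in the string
    for (i = 0; p->name[i] != '\0'; i++) {
        if (p->name[i] == old_ch) {
            p->name[i] = new_ch;
            count++;
        }
    }
    return count;
}
```

For a statically declared struct such as p1, the caller would pass its address: replaceChar(&p1, 'h', 'H').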

2.7.4. Arrays of Structs

Arrays, pointers, and structs can be combined to create more complex data structures. Here are some examples of declaring variables of different types of arrays of structs:

struct studentT classroom1[40];   // an array of 40 struct studentT

struct studentT *classroom2;      // a pointer to a struct studentT
                                  // (for a dynamically allocated array)

struct studentT *classroom3[40];  // an array of 40 struct studentT *
                                  // (each element stores a struct studentT *)

Again, thinking very carefully about variable and field types is necessary for understanding the syntax and semantics of using these variables in a program. Here are some examples of the correct syntax for accessing these variables:

// classroom1 is an array:
//    use indexing to access a particular element
//    each element in classroom1 stores a struct studentT:
//    use dot notation to access fields
classroom1[3].age = 21;

// classroom2 is a pointer to a struct studentT
//    call malloc to dynamically allocate an array
//    of 15 studentT structs for it to point to:
classroom2 = malloc(sizeof(struct studentT) * 15);

// each element in array pointed to by classroom2 is a studentT struct
//    use [] notation to access an element of the array, and dot notation
//    to access a particular field value of the struct at that index:
classroom2[3].grad_yr = 2013;

// classroom3 is an array of struct studentT *
//    use [] notation to access a particular element
//    call malloc to dynamically allocate a struct for it to point to
classroom3[5] = malloc(sizeof(struct studentT));

// access fields of the struct using -> notation
// set the age field pointed to in element 5 of the classroom3 array to 21
classroom3[5]->age = 21;

A function that takes an array of type struct studentT as a parameter might look like this:

void updateAges(struct studentT *classroom, int size) {
    int i;

    for (i = 0; i < size; i++) {
        classroom[i].age += 1;
    }
}

A program could pass this function either a statically or dynamically allocated array of struct studentT:

updateAges(classroom1, 40);
updateAges(classroom2, 15);

The semantics of passing classroom1 (or classroom2) to updateAges match the semantics of passing a statically declared (or dynamically allocated) array to a function: the parameter refers to the same set of elements as the argument, and thus changes to the array’s values within the function affect the argument’s elements.

Figure 3 shows what the stack might look like for the second call to the updateAges function (showing the passed classroom2 array with example field values for the struct in each of its elements).

Main’s classroom2 variable points to an array of studentT structs on the heap. When classroom2 gets passed to updateAges, it makes a copy of the pointer, yielding another pointer that points to the same heap array.

Figure 3. The memory layout of an array of struct studentT passed to a function.

As always, the parameter gets a copy of the value of its argument (the memory address of the array in heap memory). Thus, changes that the function makes to the array’s elements persist after the function returns, because the parameter and the argument refer to the same array in memory.

The updateAges function cannot be passed the classroom3 array because its type is not the same as the parameter’s type: classroom3 is an array of struct studentT *, not an array of struct studentT.
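To make the type difference concrete, the sketch below places updateAges next to a hypothetical variant, updateAgesPtrs, whose parameter type matches an array of struct studentT * such as classroom3. The variant is an illustration only. It assumes that every element of the array has been set either to NULL or to the address of a valid struct before the call.

```c
#include <stdlib.h>

struct studentT {
    char  name[64];
    int   age;
    float gpa;
    int   grad_yr;
};

/* each element of classroom is a struct studentT: use dot notation */
void updateAges(struct studentT *classroom, int size) {
    int i;

    for (i = 0; i < size; i++) {
        classroom[i].age += 1;
    }
}

/* hypothetical variant: each element of classroom is a
 * struct studentT *, so use -> notation (skipping NULL elements) */
void updateAgesPtrs(struct studentT **classroom, int size) {
    int i;

    for (i = 0; i < size; i++) {
        if (classroom[i] != NULL) {
            classroom[i]->age += 1;
        }
    }
}
```

With these definitions, updateAges(classroom1, 40) and updateAges(classroom2, 15) remain valid calls, and classroom3 could be passed as updateAgesPtrs(classroom3, 40).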

2.7.5. Self-Referential Structs

A struct can be defined with fields whose type is a pointer to the same struct type. These self-referential struct types can be used to build linked implementations of data structures, such as linked lists, trees, and graphs.

The details of these data types and their linked implementations are beyond the scope of this book. However, we briefly show one example of how to define and use a self-referential struct type to create a linked list in C. Refer to a textbook on data structures and algorithms for more information about linked lists.

A linked list is one way to implement a list abstract data type. A list represents a sequence of elements that are ordered by their position in the list. In C, a list data structure could be implemented as an array or as a linked list using a self-referential struct type for storing individual nodes in the list.

To build the latter, a programmer would define a node struct to contain one list element and a link to the next node in the list. Here’s an example that could store a linked list of integer values:

struct node {
    int data;           // used to store a list element's data value
    struct node *next;  // used to point to the next node in the list
};

Instances of this struct type can be linked together through the next field to create a linked list.

This example code snippet creates a linked list containing three elements (the list itself is referred to by the head variable that points to the first node in the list):

struct node *head, *temp;
int i;

head = NULL;  // an empty linked list

head = malloc(sizeof(struct node));  // allocate a node
if (head == NULL) {
    printf("Error malloc\n");
    exit(1);
}
head->data = 10;    // set the data field
head->next = NULL;  // set next to NULL (there is no next element)

// add 2 more nodes to the head of the list:
for (i = 0; i < 2; i++) {
    temp = malloc(sizeof(struct node));  // allocate a node
    if (temp == NULL) {
        printf("Error malloc\n");
        exit(1);
    }
    temp->data = i;     // set data field
    temp->next = head;  // set next to point to current first node
    head = temp;        // change head to point to newly added node
}

Note that the temp variable temporarily points to a malloc’ed node that gets initialized and then added to the beginning of the list by setting its next field to point to the node currently pointed to by head, and then by changing the head to point to this new node.

The result of executing this code would look like Figure 4 in memory.

Two stack variables, head and temp, contain the address of the first node on the heap. The first node’s next field points to the second node, whose next field points to the third. The third node’s next pointer is null, indicating the end of the list.

Figure 4. The layout in memory of three example linked list nodes.
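Once a list has been built, a program typically walks it by following next pointers until it reaches NULL, and eventually frees every node. The following sketch shows one possible way to do both. The function names addToHead, listLength, and freeList are hypothetical. addToHead simply wraps the insertion steps from the loop above.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;           // used to store a list element's data value
    struct node *next;  // used to point to the next node in the list
};

/* hypothetical helper: adds a new node holding value to the front of
 * the list and returns the new head of the list */
struct node *addToHead(struct node *head, int value) {
    struct node *temp;

    temp = malloc(sizeof(struct node));
    if (temp == NULL) {
        printf("Error malloc\n");
        exit(1);
    }
    temp->data = value;
    temp->next = head;  // new node points to the current first node
    return temp;
}

/* hypothetical helper: counts nodes by following next pointers
 * until reaching the NULL that marks the end of the list */
int listLength(struct node *head) {
    int count = 0;
    struct node *curr;

    for (curr = head; curr != NULL; curr = curr->next) {
        count++;
    }
    return count;
}

/* hypothetical helper: frees every node in the list */
void freeList(struct node *head) {
    struct node *temp;

    while (head != NULL) {
        temp = head->next;  // save the link before freeing the node
        free(head);
        head = temp;
    }
}
```

Starting from head = NULL, calling head = addToHead(head, 10), then head = addToHead(head, 0), then head = addToHead(head, 1) builds the same three-node list as the snippet above, with the nodes in the order 1, 0, 10.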