Dive Into Systems

16.5. Arrays and Strings

An array is a C construct that creates an ordered collection of data elements of the same type and associates this collection with a single program variable. Ordered means that each element is in a specific position in the collection of values (that is, there is an element in position 0, position 1, and so on), not that the values are necessarily sorted. Arrays are one of C’s primary mechanisms for grouping multiple data values and referring to them by a single name. Arrays come in several flavors, but the basic form is a one-dimensional array, which is useful for implementing list-like data structures and strings in C. C arrays are most similar to Java’s Array class.

16.5.1. Introduction to Arrays

C arrays can store multiple data values of the same type. In this chapter, we discuss statically declared arrays, meaning that the total capacity (the maximum number of elements that can be stored in an array) is fixed and is defined when the array variable is declared. In the Chapter 2, we discuss dynamically allocated arrays and multi-dimensional arrays.

Table 1 shows Java and C versions of a program that initializes and then prints a collection of integer values. Both the Java and the C versions use an array of int types to store the collection of values.

In general, Java provides a high-level interfaces to the programmer that hide much of the low-level implementation details. C, on the other hand, exposes a low-level array implementation to the programmer and leaves it up to the programmer to implement higher-level functionality. In other words, arrays enable low-level data storage without higher-level list functionality, such as length, compare, binarySearch, and so on. Java also provides several higher-level list abstractions in its List and ArrayList classes, both of which support dynamically resizing of the list of values. In contrast, it is up to the C programmer to implement these types of abstractions on top of its fixed-size arrays.

Java version C version

Table 1. Syntax Comparison of Arrays in Java and C
Java version	C version
/* Example Java program using an Array */ class ArrayExample { public static void main(String[] args) { int i, size = 0; // create and init array of 3 ints int[] small_arr = {1, 3, 5}; // declare and create array of 10 ints int[] nums = new int[10]; // set value of each element for (i = 0; i < 10; i++) { nums[i] = i; size++; } // set value at position 3 to 5 nums[3] = small_arr[2]; // print number of array elements System.out.printf("array size: %d\n", size); // or nums.length // print each element of nums for (i = 0; i < 10; i++) { System.out.printf("%d\n", nums[i]); } } }	`/* Example C program using arrays */ #include <stdio.h> int main(void) { int i, size = 0; // declare and init array of 3 ints int small_arr[] = {1, 3, 5}; // declare array of 10 ints int nums[10]; // set value of each element for (i = 0; i < 10; i++) { nums[i] = i; size++; } // set value at position 3 to 5 nums[3] = small_arr[2]; // print number of array elements printf("array size: %d\n", size); // print each element of nums for (i = 0; i < 10; i++) { printf("%d\n", nums[i]); } return 0; }`

/* Example Java program using an Array */

class ArrayExample {

 public static void  main(String[] args) {

   int i, size = 0;

   // create and init array of 3 ints
   int[] small_arr = {1, 3, 5};

   // declare and create array of 10 ints
   int[] nums = new int[10];

   // set value of each element
   for (i = 0; i < 10; i++) {
      nums[i] = i;
      size++;
   }

   // set value at position 3 to 5
   nums[3] = small_arr[2];

   // print number of array elements
   System.out.printf("array size: %d\n",
        size);  // or nums.length

   // print each element of nums
   for (i = 0; i < 10; i++) {
     System.out.printf("%d\n", nums[i]);
   }

 }
}

/* Example C program using arrays */

#include <stdio.h>

int main(void) {

  int i, size = 0;

  // declare and init array of 3 ints
  int small_arr[] = {1, 3, 5};

  // declare array of 10 ints
  int nums[10];

  // set value of each element
  for (i = 0; i < 10; i++) {
    nums[i] = i;
    size++;
  }

  // set value at position 3 to 5
  nums[3] = small_arr[2];

  // print number of array elements
  printf("array size: %d\n",
         size);

  // print each element of nums
  for (i = 0; i < 10; i++) {
    printf("%d\n", nums[i]);
  }

  return 0;
}

The C and Java versions of this program are almost identical. In particular, the individual elements can be accessed via indexing, and that index values start at 0. That is, both languages refer to the very first element in a collection as the element at position 0.

In both C and Java arrays are fixed-capacity data structures (vs. ones that grow in capacity as more elements are added). The main differences in the C and Java versions of this program relate to how the array type is declared and how space for its capacity is allocated.

In Java, the syntax for an array type is <typename>[] and space for an array of a some capacity is allocated using new <typename>[<capacity>]. For example:

For a Java array:

int[] nums;          // declare nums as an array of int
nums = new int[10];  // create a new int array of capacity 10

In C, array types are declared using <typename> <varname>[<capacity>]. For example:

For a C array:

int nums[10];    // declare nums as an array of capacity 10

When declaring an array variable in C, the programmer must specify its type (the type of each value stored in the array) and its total capacity (the maximum number of storage locations) as part of the definition. For example:

int  arr[10];  // declare an array of 10 ints
char str[20];  // declare an array of 20 chars

The preceding declarations create one variable named arr, an array of int values with a total capacity of 10, and another variable named str, an array of char values with a total capacity of 20.

Both Java and C also allow a programmer to both declare and initialize the elements in the declaration (the small_arr array in both is an array with capacity 3 that stores the int values 1, 3, and 5):

// java version:
int[]  small_arr = {1, 3, 5};

// C version:
int  small_arr[] = {1, 3, 5};

Because arrays are objects in Java, there are a large set of methods of the Array class that can be used to interact with Java arrays beyond simple indexing to get and set values. Some of these include methods to search the array and to create other data structures from the array. C’s array support is limited to creating an ordered collection of elements of the same type, and supporting indexing to access individual array elements. Any higher-level processing on the array must be implemented by a C programmer.

Both Java and C store array values in contiguous memory locations. C dictates the array layout in program memory, whereas Java hides some of the details of this from the programmer. In C, individual array elements are allocated in consecutive locations in the program’s memory. For example, the third array position is located in memory immediately following the second array position and immediately before the fourth array position. The same is true for Java, however often what is stored in a Java array is an object reference and not the object value itself. As a result, although the object references of contiguous array elements are stored contiguously in program memory, the objects to which they refer may not be stored contiguously in memory.

16.5.2. Array Access Methods

Java provides multiple ways to access elements in its arrays. C, however, supports only indexing, as described earlier. Valid index values range from 0 to the capacity of the array minus 1. Here are some examples:

int i, num;
int arr[10];  // declare an array of ints, with a capacity of 10

num = 6;      // keep track of how many elements of arr are used

// initialize first 5 elements of arr (at indices 0-4)
for (i=0; i < 5; i++) {
    arr[i] = i * 2;
}

arr[5] = 100; // assign the element at index 5 the value 100

This example declares the array with a capacity of 10 (it has 10 elements), but it only uses the first six (our current collection of values is size 6, not 10). It’s often the case when using statically declared arrays that some of an array’s capacity will remain unused. As a result, we need another program variable to keep track of the actual size (number of elements) in the array (num in this example).

Java and C differ in their error-handling approaches when a program attempts to access an invalid index. Java throws a java.lang.ArrayIndexOutOfBoundsException exception if an invalid index value is used to access elements in an array. In C, it’s up to the programmer to ensure that their code uses only valid index values when indexing into arrays. As a result, for code like the following that accesses an array element beyond the bounds of the allocated array, the program’s runtime behavior is undefined:

int array[10];    // an array of size 10 has valid indices 0 through 9

array[10] = 100;  // 10 is not a valid index into the array

The C compiler is happy to compile code that accesses array positions beyond the bounds of the array; there is no bounds checking by the compiler or at runtime. As a result, running this code can lead to unexpected program behavior (and the behavior might differ from run to run). It can lead to your program crashing, it can change another variable’s value, or it might have no effect on your program’s behavior. In other words, this situation leads to a program bug that might or might not show up as unexpected program behavior. Thus, as a C programmer, it’s up to you to ensure that your array accesses refer to valid positions!

16.5.3. Arrays and Functions

The semantics of passing arrays to functions in C is similar to that of passing arrays to functions in Java: the function can alter the elements in the passed array. Here’s an example function that takes two parameters, an int array parameter (arr), and an int parameter (size):

void print_array(int arr[], int size) {
    int i;
    for (i = 0; i < size; i++) {
        printf("%d\n", arr[i]);
    }
}

The [] after the parameter name tells the compiler that the type of the parameter arr is array of int, not int like the parameter size. In Chapter 2 we show an alternate syntax for specifying array parameters. The capacity of the array parameter arr isn’t specified: arr[] means that this function can be called with an array argument of any capacity. Because there is no way to get an array’s size or capacity just from the array variable, functions that are passed arrays almost always also have a second parameter that specifies the array’s size (the size parameter in the preceding example).

To call a function that has an array parameter, pass the name of the array as the argument. Here is a C code snippet with example calls to the print_array function:

int some[5], more[10], i;

for (i = 0; i < 5; i++) {  // initialize the first 5 elements of both arrays
    some[i] = i * i;
    more[i] = some[i];
}

for (i = 5; i < 10; i++) { // initialize the last 5 elements of "more" array
    more[i] = more[i-1] + more[i-2];
}

print_array(some, 5);    // prints all 5 values of "some"
print_array(more, 10);   // prints all 10 values of "more"
print_array(more, 8);    // prints just the first 8 values of "more"

In C, the name of the array variable is equivalent to the base address of the array (that is, the memory location of its 0th element). Due to C’s pass by value function call semantics, when you pass an array to a function, each element of the array is not individually passed to the function. In other words, the function isn’t receiving a copy of each array element. Instead, an array parameter gets the value of the array’s base address. This behavior implies that when a function modifies the elements of an array that was passed as a parameter, the changes will persist when the function returns. For example, consider this C program snippet:

void test(int a[], int size) {
    if (size > 3) {
        a[3] = 8;
    }
    size = 2; // changing parameter does NOT change argument
}

int main(void) {
    int arr[5], n = 5, i;

    for (i = 0; i < n; i++) {
        arr[i] = i;
    }

    printf("%d %d", arr[3], n);  // prints: 3 5

    test(arr, n);
    printf("%d %d", arr[3], n);  // prints: 8 5

    return 0;
}

The call in main to the test function is passed the argument arr, whose value is the base address of the arr array in memory. The parameter a in the test function gets a copy of this base address value. In other words, parameter a refers to the same array storage locations as its argument, arr. As a result, when the test function changes a value stored in the a array (a[3] = 8), it affects the corresponding position in the argument array (arr[3] is now 8). The reason is that the value of a is the base address of arr, and the value of arr is the base address of arr, so both a and arr refer to the same array (the same storage locations in memory)! Figure 1 shows the stack contents at the point in the execution just before the test function returns.

A stack with two frames: main at the bottom and test on the top. main has two variables, an integer n (5) and an array storing values 0, 1, 2, 8, and 4. Test also has two values, an integer size (2) and an array parameter arr that stores the base memory address of the array in main’s stack frame.

Figure 1. The stack contents for a function with an array parameter

Parameter a is passed the value of the base address of the array argument arr, which means they both refer to the same set of array storage locations in memory. We indicate this with the arrow from a to arr. Values that get modified by the function test are highlighted. Changing the value of the parameter size does not change the value of its corresponding argument n, but changing the value of one of the elements referred to by a (for example, a[3] = 8) does affect the value of the corresponding position in arr.

16.5.4. Introduction to Strings and the C String Library

Java implements a String class and provides a rich interface for using strings. C does not define a string type. Instead, strings are implemented as arrays of char values. Not every character array is used as a C string, but every C string is a character array.

Recall that arrays in C might be defined with a larger size than a program ultimately uses. For example, we saw earlier in the section "Array Access Methods" that we might declare an array of size 10 but only use the first six positions. This behavior has important implications for strings: we can’t assume that a string’s length is equal to that of the array that stores it. For this reason, strings in C must end with a special character value, the null character ('\0'), to indicate the end of the string.

Strings that end with a null character are said to be null-terminated. Although all strings in C should be null-terminated, failing to properly account for null characters is a common source of errors for novice C programmers. When using strings, it’s important to keep in mind that your character arrays must be declared with enough capacity to store each character value in the string plus the null character ('\0'). For example, to store the string "hi", you need an array of at least three chars (one to store 'h', one to store 'i', and one to store '\0').

Because strings are commonly used, C provides a string library that contains functions for manipulating strings. Programs that use these string library functions need to include the string.h header.

The C string library provides some similar functionality to the Java String class for manipulating string values. However, in C the program is responsible for ensuring that strings passed to the C string library are well-formed (null terminated char arrays) and that passed char arrays have enough capacity for the library function. Java hides these details from the programmer, and thus the programmer does not need to think about these when using strings in their Java program.

When printing the value of a string with printf, use the %s placeholder in the format string. The printf function will print all the characters in the array argument until it encounters the '\0' character. Similarly, string library functions often either locate the end of a string by searching for the '\0' character or add a '\0' character to the end of any string that they modify.

Here’s an example program that uses strings and string library functions:

#include <stdio.h>
#include <string.h>   // include the C string library

int main(void) {
    char str1[10];
    char str2[10];
    int len;

    str1[0] = 'h';
    str1[1] = 'i';
    str1[2] = '\0';

    len = strlen(str1);

    printf("%s %d\n", str1, len);  // prints: hi 2

    strcpy(str2, str1);     // copies the contents of str1 to str2
    printf("%s\n", str2);   // prints:  hi

    strcpy(str2, "hello");  // copy the string "hello" to str2
    len = strlen(str2);
    printf("%s has %d chars\n", str2, len);   // prints: hello has 5 chars
}

The strlen function in the C string library returns the number of characters in its string argument. A string’s terminating null character doesn’t count as part of the string’s length, so the call to strlen(str1) returns 2 (the length of the string "hi"). The strcpy function copies one character at a time from a source string (the second parameter) to a destination string (the first parameter) until it reaches a null character in the source.

Note that most C string library functions expect the call to pass in a character array that has enough capacity for the function to perform its job. For example, you wouldn’t want to call strcpy with a destination string that isn’t large enough to contain the source; doing so will lead to undefined behavior in your program!

C string library functions also require that string values passed to them are correctly formed, with a terminating '\0' character. It’s up to you as the C programmer to ensure that you pass in valid strings for C library functions to manipulate. Thus, in the call to strcpy in the preceding example, if the source string (str1) was not initialized to have a terminating '\0' character, strcpy would continue beyond the end of the str1 array’s bounds, leading to undefined behavior that could cause it to crash.

The previous example uses the strcpy function safely. In general, though, strcpy poses a security risk because it assumes that its destination is large enough to store the entire string, which may not always be the case (for example, if the string comes from user input).

We chose to show strcpy now to simplify the introduction to strings, but we illustrate safer alternatives in Section 2.6.

In Chapter 2, we discuss C strings and the C string library in more detail.