2.6. Strings and the String Library

In the previous chapter we introduced arrays and strings in C. In this chapter we discuss dynamically allocated C strings and their use with the C string library. We first give a brief overview of statically declared strings.

2.6.1. C’s Support for Statically Allocated Strings (Arrays of char)

C does not support a separate string type, but a string can be implemented in C programs using an array of char values that is terminated by a special null character value '\0'. The terminating null character identifies the end of the sequence of character values that make up a string. Not every character array is a C string, but every C string is an array of char values.

Because strings frequently appear in programs, C provides libraries with functions for manipulating strings. Programs that use the C string library need to include string.h. Most string library functions require the programmer to allocate space for the array of characters that the functions manipulate. When printing out the value of a string, use the %s placeholder.

Here’s an example program that uses strings and some string library functions:

#include <stdio.h>
#include <string.h>   // include the C string library

int main(void) {
    char str1[10];
    char str2[10];

    str1[0] = 'h';
    str1[1] = 'i';
    str1[2] = '\0';   // explicitly add null terminating character to end

    // strcpy copies the bytes from the source parameter (str1) to the
    // destination parameter (str2) and null terminates the copy.
    strcpy(str2, str1);
    str2[1] = 'o';
    printf("%s %s\n", str1, str2);  // prints: hi ho

    return 0;
}

2.6.2. Dynamically Allocating Strings

Arrays of characters can be dynamically allocated (as discussed in the Pointers and Arrays sections). When dynamically allocating space to store a string, it’s important to remember to allocate space in the array for the terminating '\0' character at the end of the string.

The following example program demonstrates static and dynamically allocated strings (note the value passed to malloc):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int size;
    char str[64];         // statically allocated
    char *new_str = NULL; // for dynamically allocated

    strcpy(str, "Hello");
    size = strlen(str);   // returns 5

    new_str = malloc(sizeof(char) * (size+1)); // need space for '\0'
    if(new_str == NULL) {
        printf("Error: malloc failed!  exiting.\n");
        exit(1);
    }
    strcpy(new_str, str);
    printf("%s %s\n", str, new_str);    // prints "Hello Hello"

    strcat(str, " There");  // concatenate " There" to the end of str
    printf("%s\n", str);    // prints "Hello There"

    free(new_str);  // free malloc'ed space when done
    new_str = NULL;

    return 0;
}
C String Functions and Destination Memory

Many C string functions (notably strcpy and strcat) store their results by following a destination string pointer (char *) parameter and writing to the location it points to. Such functions assume that the destination contains enough memory to store the result. Thus, as a programmer, you must ensure that sufficient memory is available at the destination prior to calling these functions.

Failure to allocate enough memory will yield undefined results that range from program crashes to major security vulnerabilities. For example, the following calls to strcpy and strcat demonstrate mistakes that novice C programmers often make:

// Attempt to write a 12-byte string into a 5-character array.
char mystr[5];
strcpy(mystr, "hello world");

// Attempt to write to a string with a NULL destination.
char *mystr = NULL;
strcpy(mystr, "try again");

// Attempt to modify a read-only string literal.
char *mystr = "string literal value";
strcat(mystr, "string literals aren't writable");

2.6.3. Libraries for Manipulating C Strings and Characters

C provides several libraries with functions for manipulating strings and characters. The string library (string.h) is particularly useful when writing programs that use C strings. The stdlib.h and stdio.h libraries also contain functions for string manipulation, and the ctype.h library contains functions for manipulating individual character values.

When using C string library functions, it’s important to remember that most do not allocate space for the strings they manipulate, nor do they check that you pass in valid strings; your program must allocate space for the strings that the C string library will use. Furthermore, if the library function modifies the passed string, the caller needs to ensure that the string is correctly formatted (that is, it has a terminating \0 character at the end). Calling string library functions with bad array argument values will often cause a program to crash. The documentation (for example, manual pages) for different library functions specifies whether the library function allocates space or if the caller is responsible for passing in allocated space to the library function.

char[] and char * Parameters and char * Return Type

Both statically declared and dynamically allocated arrays of characters can be passed to a char * parameter because the name of either type of variable evaluates to the base address of the array in memory. Declaring the parameter as type char [] will also work for both statically and dynamically allocated argument values, but char * is more commonly used for specifying the type of string (array of char) parameters.

If a function returns a string (its return type is a char *), its return value can only be assigned to a variable whose type is also char *; it cannot be assigned to a statically allocated array variable. This restriction exists because the name of a statically declared array variable is not a valid lvalue (its base address in memory cannot be changed), so it cannot be assigned a char * return value.

strlen, strcpy, strncpy

The string library provides functions for copying strings and finding the length of a string:

// returns the number of characters in the string (not including the null character)
int strlen(char *s);

// copies string src to string dst up until the first '\0' character in src
// (the caller needs to make sure src is initialized correctly and
// dst has enough space to store a copy of the src string)
// returns the address of the dst string
char *strcpy(char *dst, char *src);

// like strcpy but copies up to the first '\0' or size characters
// (this provides some safety to not copy beyond the bounds of the dst
// array if the src string is not well formed or is longer than the
// space available in the dst array); size_t is an unsigned integer type
char *strncpy(char *dst, char *src, size_t size);

The strcpy function is unsafe to use in situations when the source string might be longer than the total capacity of the destination string. In this case, one should use strncpy. The size parameter stops strncpy from copying more than size characters from the src string into the dst string. When the length of the src string is greater than or equal to size, strncpy copies the first size characters from src to dst and does not add a null character to the end of the dst. As a result, the programmer should explicitly add a null character to the end of dst after calling strncpy.

Here are some example uses of these functions in a program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // include the string library

int main(void) {
    // variable declarations that will be used in examples
    int len, i, ret;
    char str[32];
    char *d_str, *ptr;

    strcpy(str, "Hello There");
    len = strlen(str);  // len is 11

    d_str = malloc(sizeof(char) * (len+1));
    if (d_str == NULL) {
        printf("Error: malloc failed\n");
        exit(1);
    }

    strncpy(d_str, str, 5);
    d_str[5] = '\0';   // explicitly add null terminating character to end

    printf("%d:%s\n", strlen(str), str);      // prints 11:Hello There
    printf("%d:%s\n", strlen(d_str), d_str);  // prints 5:Hello

    free(d_str);

    return 0;
}
strlcpy

The strlcpy function is similar to strncpy, except it always adds the '\0' character to the end of the destination string. Always terminating the string makes it a safer alternative to strncpy because it doesn’t require the programmer to remember to explicitly null terminate the string.

// like strncpy but copies up to the first '\0' or size-1 characters
// and null terminates the dest string (if size > 0).
char *strlcpy(char *dest, char *src, size_t size);

Linux’s GNU C library added strlcpy in a recent version (2.38). It’s currently only available on some systems, but its availability will increase as newer versions of the C library become more widespread. We recommend using strlcpy when it is available.

On systems where strlcpy is available, the following call to strncpy from the example above:

  // copy up to 5 chars from str to d_str
  strncpy(d_str, str, 5);
  d_str[5] = '\0';   // explicitly add null terminating character to end

could be replaced with this call to strlcpy:

  // copy up to 5 chars from str to d_str
  strlcpy(d_str, str, 6);  // strlcpy always adds '\0' to the end

strcmp, strncmp

The string library also provides a function to compare two strings. Comparing string variables using the == operator does not compare the characters in the strings — it compares only the base addresses of the two strings. For example, the expression:

if (d_str == str) { ...

compares the base address of the char array in the heap pointed to by d_str to the base address of the str char array allocated on the stack.

To compare the values of the strings, a programmer needs to either write code by hand to compare corresponding element values, or use the strcmp or strncmp functions from the string library:

int strcmp(char *s1, char *s2);
// returns 0 if s1 and s2 are the same strings
// a value < 0 if s1 is less than s2
// a value > 0 if s1 is greater than s2

int strncmp(char *s1, char *s2, size_t n);
// compare s1 and s2 up to at most n characters

The strcmp function compares strings character by character based on their ASCII representation. In other words, it compares the char values in corresponding positions of the two parameter arrays to produce the result of the string comparison, which occasionally yields unintuitive results. For example, the ASCII encoding for the char value 'a' is larger than the encoding for the char value 'Z'. As a result, strcmp("aaa", "Zoo") returns a positive value indicating that "aaa" is greater than "Zoo", and a call to strcmp("aaa", "zoo") returns a negative value indicating that "aaa" is less than "zoo".

Here are some string comparison examples:

strcpy(str, "alligator");
strcpy(d_str, "Zebra");

ret =  strcmp(str,d_str);
if (ret == 0) {
    printf("%s is equal to %s\n", str, d_str);
} else if (ret < 0) {
    printf("%s is less than %s\n", str, d_str);
} else {
    printf("%s is greater than %s\n", str, d_str);  // true for these strings
}

ret = strncmp(str, "all", 3);  // returns 0: they are equal up to first 3 chars

strcat, strstr, strchr

String library functions can concatenate strings (note that it’s up to the caller to ensure that the destination string has enough space to store the result):

// append chars from src to end of dst
// returns ptr to dst and adds '\0' to end
char *strcat(char *dst, char *src)

// append the first chars from src to end of dst, up to a maximum of size
// returns ptr to dst and adds '\0' to end
char *strncat(char *dst, char *src, size_t size);

It also provides functions for finding substrings or character values in strings:

// locate a substring inside a string
// (const means that the function doesn't modify string)
// returns a pointer to the beginning of substr in string
// returns NULL if substr not in string
char *strstr(const char *string, char *substr);

// locate a character (c) in the passed string (s)
// (const means that the function doesn't modify s)
// returns a pointer to the first occurrence of the char c in string
// or NULL if c is not in the string
char *strchr(const char *s, int c);

Here are some examples using these functions (we omit some error handling for the sake of readability):

char str[32];
char *ptr;

strcpy(str, "Zebra fish");
strcat(str, " stripes");  // str gets "Zebra fish stripes"
printf("%s\n", str);     // prints: Zebra fish stripes

strncat(str, " are black.", 8);
printf("%s\n", str);     // prints: Zebra fish stripes are bla  (spaces count)

ptr = strstr(str, "trip");
if (ptr != NULL) {
    printf("%s\n", ptr);   // prints: tripes are bla
}

ptr = strchr(str, 'e');
if (ptr != NULL) {
    printf("%s\n", ptr);   // prints: ebra fish stripes are bla
}

Calls to strchr and strstr return the address of the first element in the parameter array with a matching character value or a matching substring value, respectively. This element address is the start of an array of char values terminated by a \0 character. In other words, ptr points to the beginning of a substring inside another string. When printing the value of ptr as a string with printf, the character values starting at the index pointed to by ptr are printed, yielding the results listed above.

strtok, strtok_r

The string library also provides functions that divide a string into tokens. A token refers to a subsequence of characters in a string separated by any number of delimiter characters of the programmer’s choosing.

char *strtok(char *str, const char *delim);

// a reentrant version of strtok (reentrant is defined in later chapters):
char *strtok_r(char *str, const char *delim, char **saveptr);

The strtok (or strtok_r) functions find individual tokens within a larger string. For example, setting strtok's delimiters to the set of whitespace characters yields words in a string that originally contains an English sentence. That is, each word in the sentence is a token in the string.

Below is an example program that uses strtok to find individual words as the tokens in an input string. (it can also be copied from here: strtokexample.c).

/*
 * Extract whitespace-delimited tokens from a line of input
 * and print them one per line.
 *
 * to compile:
 *   gcc -g -Wall strtokexample.c
 *
 * example run:
 *   Enter a line of text:        aaaaa             bbbbbbbbb          cccccc
 *
 *   The input line is:
 *         aaaaa             bbbbbbbbb          cccccc
 *   Next token is aaaaa
 *   Next token is bbbbbbbbb
 *   Next token is cccccc
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
     /* whitespace stores the delim string passed to strtok.  The delim
      * string  is initialized to the set of characters that delimit tokens
      * We initialize the delim string to the following set of chars:
      *   ' ': space  '\t': tab  '\f': form feed  '\r': carriage return
      *   '\v': vertical tab  '\n': new line
      * (run "man ascii" to list all ASCII characters)
      *
      * This line shows one way to statically initialize a string variable
      * (using this method the string contents are constant, meaning that they
      *  cannot be modified, which is fine for the way we are using the
      *  whitespace string in this program).
      */
    char *whitespace = " \t\f\r\v\n";  /* Note the space char at beginning */

    char *token;  /* The next token in the line. */
    char *line;   /* The line of text read in that we will tokenize. */

    /* Allocate some space for the user's string on the heap. */
    line = malloc(200 * sizeof(char));
    if (line == NULL) {
        printf("Error: malloc failed\n");
        exit(1);
    }

    /* Read in a line entered by the user from "standard in". */
    printf("Enter a line of text:\n");
    line = fgets(line, 200 * sizeof(char), stdin);
    if (line == NULL) {
        printf("Error: reading input failed, exiting...\n");
        exit(1);
    }
    printf("The input line is:\n%s\n", line);

    /* Divide the string into tokens. */
    token = strtok(line, whitespace);       /* get the first token */
    while (token != NULL) {
        printf("Next token is %s\n", token);
        token = strtok(NULL, whitespace);     /* get the next token */
    }

    free(line);

    return 0;
}

sprintf

The C stdio library also provides functions that manipulate C strings. Perhaps the most useful is the sprintf function, which "prints" into a string rather than printing output to a terminal:

// like printf(), the format string allows for placeholders like %d, %f, etc.
// pass parameters after the format string to fill them in
int sprintf(char *s, const char *format, ...);

sprintf initializes the contents of a string from values of various types. Its parameter format resembles those of printf and scanf. Here are some examples:

char str[64];
float ave = 76.8;
int num = 2;

// initialize str to format string, filling in each placeholder with
// a char representation of its arguments' values
sprintf(str, "%s is %d years old and in grade %d", "Henry", 12, 7);
printf("%s\n", str);  // prints: Henry is 12 years old and in grade 7

sprintf(str, "The average grade on exam %d is %g", num, ave);
printf("%s\n", str);  // prints: The average grade on exam 2 is 76.8

Functions for Individual Character Values

The standard C library (stdlib.h) contains a set of functions for manipulating and testing individual char values, including:

#include <stdlib.h>   // include stdlib and ctypes to use these
#include <ctype.h>

int islower(ch);
int isupper(ch);       // these functions return a non-zero value if the
int isalpha(ch);       // test is TRUE, otherwise they return 0 (FALSE)
int isdigit(ch);
int isalnum(ch);
int ispunct(ch);
int isspace(ch);
char tolower(ch);     // returns ASCII value of lower-case of argument
char toupper(ch);

Here are some examples of their use:

char str[64];
int len, i;

strcpy(str, "I see 20 ZEBRAS, GOATS, and COWS");

if ( islower(str[2]) ){
    printf("%c is lower case\n", str[2]);   // prints: s is lower case
}

len = strlen(str);
for (i = 0; i < len; i++) {
    if ( isupper(str[i]) ) {
        str[i] = tolower(str[i]);
    } else if( isdigit(str[i]) ) {
        str[i] = 'X';
    }
}
printf("%s\n", str);  // prints: i see XX zebras, goats, and cows

Functions to Convert Strings to Other Types

stdlib.h also contains functions to convert between strings and other C types. For example:

#include <stdlib.h>

int atoi(const char *nptr);     // convert a string to an integer
double atof(const char *nptr);  // convert a string to a float

Here’s an example:

printf("%d %g\n", atoi("1234"), atof("4.56"));

For more information about these and other C library functions (including what they do, their parameter format, what they return, and which headers need to be included to use them), see their man pages. For example, to view the strcpy man page, run:

$ man strcpy