2.6. Strings and the String Library
In the previous chapter we introduced arrays and strings in C. In this chapter we discuss dynamically allocated C strings and their use with the C string library. We first give a brief overview of statically declared strings.
2.6.1. C’s Support for Statically Allocated Strings (Arrays of char)
C does not support a separate string type, but a string can be implemented in C programs using an array of char values that is terminated by a special null character value '\0'. The terminating null character identifies the end of the sequence of character values that make up a string. Not every character array is a C string, but every C string is an array of char values.
Because strings frequently appear in programs, C provides libraries with functions for manipulating strings. Programs that use the C string library need to include string.h. Most string library functions require the programmer to allocate space for the array of characters that the functions manipulate.
When printing out the value of a string, use the %s placeholder.
Here’s an example program that uses strings and some string library functions:
#include <stdio.h>
#include <string.h> // include the C string library

int main(void) {
    char str1[10];
    char str2[10];

    str1[0] = 'h';
    str1[1] = 'i';
    str1[2] = '\0'; // explicitly add null terminating character to end

    // strcpy copies the bytes from the source parameter (str1) to the
    // destination parameter (str2) and null terminates the copy.
    strcpy(str2, str1);
    str2[1] = 'o';
    printf("%s %s\n", str1, str2); // prints: hi ho

    return 0;
}
2.6.2. Dynamically Allocating Strings
Arrays of characters can be dynamically allocated (as discussed in the Pointers and Arrays sections). When dynamically allocating space to store a string, it’s important to remember to allocate space in the array for the terminating '\0' character at the end of the string.
The following example program demonstrates statically and dynamically allocated strings (note the value passed to malloc):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int size;
    char str[64];         // statically allocated
    char *new_str = NULL; // for dynamically allocated

    strcpy(str, "Hello");
    size = strlen(str);   // returns 5

    new_str = malloc(sizeof(char) * (size+1)); // need space for '\0'
    if (new_str == NULL) {
        printf("Error: malloc failed! exiting.\n");
        exit(1);
    }

    strcpy(new_str, str);
    printf("%s %s\n", str, new_str); // prints "Hello Hello"

    strcat(str, " There"); // concatenate " There" to the end of str
    printf("%s\n", str);   // prints "Hello There"

    free(new_str); // free malloc'ed space when done
    new_str = NULL;

    return 0;
}
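Because forgetting the extra byte for '\0' is such a common mistake, it can help to wrap the allocate-then-copy steps from the program above in a function. The following is a minimal sketch; dup_string is a hypothetical name, not a C library function (POSIX systems provide a similar function named strdup).

```c
#include <stdlib.h>
#include <string.h>

/* dup_string is a hypothetical helper (not a C library function).
 * It returns a heap-allocated copy of s, or NULL if malloc fails;
 * the caller is responsible for freeing the copy. */
char *dup_string(const char *s) {
    size_t len = strlen(s);       /* number of chars before the '\0' */
    char *copy = malloc(len + 1); /* +1 leaves room for the '\0' */
    if (copy == NULL) {
        return NULL;
    }
    strcpy(copy, s);              /* copies the chars and the '\0' */
    return copy;
}
```

A caller would write char *copy = dup_string(str);, check the result against NULL, and later call free(copy) when done with the copy.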
C String Functions and Destination Memory
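Many C string functions (notably strcpy and strcat) store their results in a destination array supplied by the caller and assume that the destination has enough memory to hold the result, including its terminating '\0'. Before calling these functions, the programmer must make sure that enough space is available at the destination. Failure to allocate enough memory results in undefined behavior, with consequences that range from program crashes to major security vulnerabilities.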
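One way to honor this requirement is to compute the space a result needs before calling strcat. Here is a minimal sketch, assuming a hypothetical helper named append_checked to which the caller passes the destination's total capacity in bytes:

```c
#include <stddef.h>
#include <string.h>

/* append_checked is a hypothetical helper. It appends src to the string
 * already stored in dst only if the combined string, including its '\0',
 * fits in dst_capacity bytes. Returns 0 on success, or -1 (leaving dst
 * unchanged) if there is not enough room. Assumes dst holds a valid string. */
int append_checked(char *dst, size_t dst_capacity, const char *src) {
    size_t needed = strlen(dst) + strlen(src) + 1; /* +1 for the '\0' */
    if (needed > dst_capacity) {
        return -1;
    }
    strcat(dst, src); /* safe: we just checked that the result fits */
    return 0;
}
```

For a statically declared array in the same scope, sizeof(array) gives its capacity; for a malloc'ed string, the caller must pass the size that was given to malloc, since sizeof on a pointer gives only the pointer's size.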
2.6.3. Libraries for Manipulating C Strings and Characters
C provides several libraries with functions for manipulating strings and characters. The string library (string.h) is particularly useful when writing programs that use C strings. The stdlib.h and stdio.h libraries also contain functions for string manipulation, and the ctype.h library contains functions for manipulating individual character values.
When using C string library functions, it’s important to remember that most do not allocate space for the strings they manipulate, nor do they check that you pass in valid strings; your program must allocate space for the strings that the C string library will use. Furthermore, if the library function modifies the passed string, the caller needs to ensure that the string is correctly formatted (that is, it has a terminating '\0' character at the end). Calling string library functions with bad array argument values will often cause a program to crash. The documentation (for example, manual pages) for different library functions specifies whether the library function allocates space or whether the caller is responsible for passing in allocated space to the library function.
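char[] and char * Parameters and char * Return Type
Both statically declared and dynamically allocated arrays of characters can be passed to a char * parameter, because the name of either kind of variable evaluates to the base address of its array of characters. Declaring the parameter as char [] also works for both kinds of arguments, but char * is the more common way to declare a string parameter.
If a function returns a string (its return type is char *), its return value can be assigned to a char * variable, but not to a statically declared array variable, because C does not allow assignment to an array.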
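The following sketch illustrates these rules with two hypothetical functions: count_char takes a string parameter (declared const char * because it only reads the string) and accepts both kinds of arrays, while make_greeting returns a heap-allocated string through a char * return type.

```c
#include <stdlib.h>
#include <string.h>

/* count_char is a hypothetical example of a string parameter.
 * It returns how many times c appears in s. */
int count_char(const char *s, char c) {
    int count = 0;
    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] == c) {
            count++;
        }
    }
    return count;
}

/* make_greeting is a hypothetical example of a char * return type.
 * It returns a heap-allocated string that the caller must free,
 * or NULL if malloc fails. */
char *make_greeting(void) {
    const char *text = "hello";
    char *result = malloc(strlen(text) + 1); /* +1 for the '\0' */
    if (result != NULL) {
        strcpy(result, text);
    }
    return result;
}
```

For example, with char static_str[16] = "banana"; and char *dyn = make_greeting();, both count_char(static_str, 'a') and count_char(dyn, 'l') are valid calls, whereas static_str = make_greeting(); does not compile because static_str is an array.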
strlen, strcpy, strncpy
The string library provides functions for copying strings and finding the length of a string:
// returns the number of characters in the string (not including the null character);
// size_t is an unsigned integer type
size_t strlen(const char *s);
// copies string src to string dst up to and including the first '\0' character in src
// (the caller needs to make sure src is initialized correctly and
// dst has enough space to store a copy of the src string)
// returns the address of the dst string
char *strcpy(char *dst, const char *src);
// like strcpy but copies at most size characters, stopping after the first '\0'
// (this provides some safety to not copy beyond the bounds of the dst
// array if the src string is not well formed or is longer than the
// space available in the dst array); if src is shorter than size,
// the remaining bytes of dst, up to size, are filled with '\0'
char *strncpy(char *dst, const char *src, size_t size);
The strcpy function is unsafe to use in situations when the source string might be longer than the total capacity of the destination string. In this case, one should use strncpy. The size parameter stops strncpy from copying more than size characters from the src string into the dst string. When the length of the src string is greater than or equal to size, strncpy copies the first size characters from src to dst and does not add a null character to the end of dst. As a result, the programmer should explicitly add a null character to the end of dst after calling strncpy.
Here are some example uses of these functions in a program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // include the string library

int main(void) {
    // variable declarations that will be used in examples
    int len, i, ret;
    char str[32];
    char *d_str, *ptr;

    strcpy(str, "Hello There");
    len = strlen(str); // len is 11

    d_str = malloc(sizeof(char) * (len+1));
    if (d_str == NULL) {
        printf("Error: malloc failed\n");
        exit(1);
    }

    strncpy(d_str, str, 5);
    d_str[5] = '\0'; // explicitly add null terminating character to end

    // strlen returns a size_t, so print its result with %zu
    printf("%zu:%s\n", strlen(str), str);     // prints 11:Hello There
    printf("%zu:%s\n", strlen(d_str), d_str); // prints 5:Hello

    free(d_str);
    return 0;
}
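Because the explicit d_str[5] = '\0' step is easy to forget, one option is to wrap strncpy and the terminating assignment together. Below is a minimal sketch of such a wrapper; copy_truncate is a hypothetical name, and it assumes dst_size is the full capacity of dst and is greater than 0.

```c
#include <stddef.h>
#include <string.h>

/* copy_truncate is a hypothetical wrapper around strncpy. It copies at
 * most dst_size - 1 characters of src into dst and always leaves dst
 * null terminated. Requires dst_size > 0. */
void copy_truncate(char *dst, size_t dst_size, const char *src) {
    strncpy(dst, src, dst_size - 1); /* may not add a '\0' if src is long... */
    dst[dst_size - 1] = '\0';        /* ...so always terminate the last byte */
}
```

With a char buf[6], copy_truncate(buf, sizeof(buf), "Hello There") leaves "Hello" in buf, matching the strncpy example above.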
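strlcpy
The strlcpy function is a safer alternative to strncpy that is available on some systems:
// like strncpy but copies up to the first '\0' or size-1 characters
// and null terminates the dest string (if size > 0);
// returns the length of src, so a return value >= size means src was truncated
size_t strlcpy(char *dest, const char *src, size_t size);
Linux’s GNU C library added strlcpy only recently (in version 2.38), so it may be missing on older Linux systems. On systems where strlcpy is available, the following code:
// copy up to 5 chars from str to d_str
strncpy(d_str, str, 5);
d_str[5] = '\0'; // explicitly add null terminating character to end
could be replaced with this call to strlcpy:
// copy up to 5 chars from str to d_str
strlcpy(d_str, str, 6); // strlcpy always adds '\0' to the end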
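On systems without strlcpy, the standard snprintf function (declared in stdio.h) can provide similar behavior: given a buffer size, it writes at most that many bytes, always adds '\0' when the size is greater than 0, and returns the length of the complete string it tried to write. The sketch below uses it for a bounded copy that also reports truncation; copy_or_report is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <stdio.h>

/* copy_or_report is a hypothetical helper. It copies src into dst, whose
 * capacity is dst_size bytes (dst_size > 0), and always null terminates dst.
 * Returns 1 if src was truncated, 0 if it fit, or -1 on an output error. */
int copy_or_report(char *dst, size_t dst_size, const char *src) {
    /* snprintf returns the length of the full string it tried to write */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0) {
        return -1;
    }
    return ((size_t)needed >= dst_size) ? 1 : 0;
}
```

As with strlcpy, comparing the returned length with the buffer size is what lets the caller detect that the copy was cut short.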
strcmp, strncmp
The string library also provides a function to compare two strings. Comparing string variables using the == operator does not compare the characters in the strings; it compares only the base addresses of the two strings. For example, the expression:
if (d_str == str) { ...
compares the base address of the char array in the heap pointed to by d_str to the base address of the str char array allocated on the stack.
To compare the values of the strings, a programmer needs to either write code by hand to compare corresponding element values, or use the strcmp or strncmp functions from the string library:
int strcmp(const char *s1, const char *s2);
// returns 0 if s1 and s2 are the same strings
// a value < 0 if s1 is less than s2
// a value > 0 if s1 is greater than s2
int strncmp(const char *s1, const char *s2, size_t n);
// compare s1 and s2 up to at most n characters
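To illustrate the hand-written option, here is a minimal sketch of a comparison function, my_strcmp (a hypothetical name), that follows the same rules as strcmp: it walks both strings until the characters differ or a '\0' is reached, then returns the difference of the characters at that position. Like the C standard's strcmp, it compares the characters as unsigned char values.

```c
/* my_strcmp is a hypothetical hand-written string comparison.
 * Returns 0 if s1 and s2 hold the same characters, a negative value
 * if s1 sorts before s2, and a positive value if s1 sorts after s2. */
int my_strcmp(const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* advance while the characters match and s1 has not ended */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    /* first mismatch (or both '\0') decides the result */
    return *p1 - *p2;
}
```

Note that two separate arrays holding "hi" have different base addresses, so == reports them as different, while my_strcmp (or strcmp) returns 0 for them.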
The strcmp function compares strings character by character based on their ASCII representation. In other words, it compares the char values in corresponding positions of the two parameter arrays to produce the result of the string comparison, which occasionally yields unintuitive results. For example, the ASCII encoding for the char value 'a' is larger than the encoding for the char value 'Z' (97 versus 90). As a result, strcmp("aaa", "Zoo") returns a positive value indicating that "aaa" is greater than "Zoo", and a call to strcmp("aaa", "zoo") returns a negative value indicating that "aaa" is less than "zoo".
Here are some string comparison examples:
strcpy(str, "alligator");
strcpy(d_str, "Zebra");

ret = strcmp(str, d_str);
if (ret == 0) {
    printf("%s is equal to %s\n", str, d_str);
} else if (ret < 0) {
    printf("%s is less than %s\n", str, d_str);
} else {
    printf("%s is greater than %s\n", str, d_str); // true for these strings
}

ret = strncmp(str, "all", 3); // returns 0: they are equal up to first 3 chars
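Because strcmp returns a negative, zero, or positive value, it fits the comparison-function interface of the C library's qsort function, which makes it easy to sort an array of strings. Here is a minimal sketch (sort_words and compare_strings are hypothetical names). The resulting order follows character codes, so every word starting with an upper-case letter sorts before every word starting with a lower-case letter.

```c
#include <stdlib.h>
#include <string.h>

/* compare_strings is a comparison callback for qsort. qsort passes
 * pointers to the array elements, and each element is a char pointer. */
static int compare_strings(const void *a, const void *b) {
    const char *s1 = *(const char * const *)a;
    const char *s2 = *(const char * const *)b;
    return strcmp(s1, s2);
}

/* sort_words is a hypothetical helper that sorts n strings in place
 * using strcmp's character-code order. */
void sort_words(const char *words[], size_t n) {
    qsort(words, n, sizeof(words[0]), compare_strings);
}
```

For example, sorting the array {"zoo", "aaa", "Zoo"} yields the order "Zoo", "aaa", "zoo".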
strcat, strstr, strchr
String library functions can concatenate strings (note that it’s up to the caller to ensure that the destination string has enough space to store the result):
// append chars from src to end of dst
// returns ptr to dst and adds '\0' to end
char *strcat(char *dst, const char *src);
// append at most size chars from src to end of dst
// returns ptr to dst and adds '\0' to end
// (dst needs room for up to strlen(dst) + size + 1 chars)
char *strncat(char *dst, const char *src, size_t size);
It also provides functions for finding substrings or character values in strings:
// locate a substring inside a string
// (const means that the function doesn't modify the strings)
// returns a pointer to the beginning of substr in string
// returns NULL if substr not in string
char *strstr(const char *string, const char *substr);
// locate a character (c) in the passed string (s)
// (const means that the function doesn't modify s)
// returns a pointer to the first occurrence of the char c in string
// or NULL if c is not in the string
char *strchr(const char *s, int c);
Here are some examples using these functions (we omit some error handling for the sake of readability):
char str[32];
char *ptr;

strcpy(str, "Zebra fish");
strcat(str, " stripes"); // str gets "Zebra fish stripes"
printf("%s\n", str);     // prints: Zebra fish stripes

strncat(str, " are black.", 8);
printf("%s\n", str);     // prints: Zebra fish stripes are bla (spaces count)

ptr = strstr(str, "trip");
if (ptr != NULL) {
    printf("%s\n", ptr); // prints: tripes are bla
}

ptr = strchr(str, 'e');
if (ptr != NULL) {
    printf("%s\n", ptr); // prints: ebra fish stripes are bla
}
Calls to strchr and strstr return the address of the first element in the parameter array with a matching character value or a matching substring value, respectively. This element address is the start of an array of char values terminated by a '\0' character. In other words, ptr points to the beginning of a substring inside another string. When printing the value of ptr as a string with printf, the character values starting at the index pointed to by ptr are printed, yielding the results listed above.
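Because strstr returns a pointer into the original string, a program can resume searching just past a match by passing that pointer (plus the substring's length) back to strstr. The following minimal sketch uses this idea to count non-overlapping occurrences of a substring; count_substr is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* count_substr is a hypothetical helper that counts non-overlapping
 * occurrences of substr in str by calling strstr repeatedly. */
int count_substr(const char *str, const char *substr) {
    size_t sub_len = strlen(substr);
    int count = 0;
    const char *ptr;

    if (sub_len == 0) {
        return 0; /* strstr matches an empty substr everywhere; treat as none */
    }
    ptr = strstr(str, substr);
    while (ptr != NULL) {
        count++;
        ptr = strstr(ptr + sub_len, substr); /* resume just past this match */
    }
    return count;
}
```

For example, count_substr("banana", "an") returns 2, and count_substr("aaaa", "aa") also returns 2 because matches do not overlap.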
strtok, strtok_r
The string library also provides functions that divide a string into tokens. A token refers to a subsequence of characters in a string separated by any number of delimiter characters of the programmer’s choosing.
char *strtok(char *str, const char *delim);
// a reentrant version of strtok (reentrant is defined in later chapters):
char *strtok_r(char *str, const char *delim, char **saveptr);
The strtok (or strtok_r) functions find individual tokens within a larger string. For example, if a string contains an English sentence, setting strtok's delimiters to the set of whitespace characters yields the words of the sentence; that is, each word in the sentence is a token in the string. Note that strtok modifies the string it tokenizes, overwriting delimiter characters with '\0', so it cannot be called on a string literal.
Below is an example program that uses strtok to find individual words as the tokens in an input string (the program is also available as strtokexample.c):
/*
 * Extract whitespace-delimited tokens from a line of input
 * and print them one per line.
 *
 * to compile:
 *   gcc -g -Wall strtokexample.c
 *
 * example run:
 *   Enter a line of text:
 *   aaaaa bbbbbbbbb cccccc
 *   The input line is:
 *   aaaaa bbbbbbbbb cccccc
 *
 *   Next token is aaaaa
 *   Next token is bbbbbbbbb
 *   Next token is cccccc
 */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* whitespace stores the delim string passed to strtok. The delim
     * string is initialized to the set of characters that delimit tokens.
     * We initialize the delim string to the following set of chars:
     *   ' ': space  '\t': tab  '\f': form feed  '\r': carriage return
     *   '\v': vertical tab  '\n': new line
     * (run "man ascii" to list all ASCII characters)
     *
     * This line shows one way to statically initialize a string variable.
     * The string contents are a string literal, which cannot be modified
     * (hence const); that is fine for the way we are using the
     * whitespace string in this program.
     */
    const char *whitespace = " \t\f\r\v\n"; /* Note the space char at beginning */
    char *token; /* The next token in the line. */
    char *line;  /* The line of text read in that we will tokenize. */

    /* Allocate some space for the user's string on the heap. */
    line = malloc(200 * sizeof(char));
    if (line == NULL) {
        printf("Error: malloc failed\n");
        exit(1);
    }

    /* Read in a line entered by the user from "standard in".
     * Check fgets's result without overwriting line, so the
     * allocated space can still be freed on failure. */
    printf("Enter a line of text:\n");
    if (fgets(line, 200 * sizeof(char), stdin) == NULL) {
        printf("Error: reading input failed, exiting...\n");
        free(line);
        exit(1);
    }
    printf("The input line is:\n%s\n", line);

    /* Divide the string into tokens (strtok overwrites delimiters with '\0'). */
    token = strtok(line, whitespace); /* get the first token */
    while (token != NULL) {
        printf("Next token is %s\n", token);
        token = strtok(NULL, whitespace); /* get the next token */
    }

    free(line);
    return 0;
}
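The difference between strtok and strtok_r matters when one tokenizing loop runs inside another. strtok remembers its position in a single hidden variable, so tokenizing a piece inside the loop would overwrite the outer loop's position; strtok_r instead stores each position in a caller-supplied saveptr. The minimal sketch below parses strings such as "a=1;b=2" by splitting each pair inside the loop that splits the pairs. sum_values is a hypothetical helper, and the _POSIX_C_SOURCE definition assumes a POSIX system, since strtok_r is specified by POSIX. Like strtok, strtok_r modifies its input, so the argument must be a writable array rather than a string literal.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations such as strtok_r */
#include <stdlib.h>
#include <string.h>

/* sum_values is a hypothetical helper. It parses text of the form
 * "name=value;name=value;..." and returns the sum of the integer values,
 * skipping pieces that lack a value. text is modified in place. */
int sum_values(char *text) {
    char *outer_save = NULL; /* position within text, for the ';' loop */
    char *inner_save = NULL; /* position within one pair, for '=' */
    int sum = 0;

    char *pair = strtok_r(text, ";", &outer_save);
    while (pair != NULL) {
        char *name = strtok_r(pair, "=", &inner_save);  /* e.g. "a" */
        char *value = strtok_r(NULL, "=", &inner_save); /* e.g. "1" */
        if (name != NULL && value != NULL) {
            sum += atoi(value);
        }
        pair = strtok_r(NULL, ";", &outer_save); /* outer position is intact */
    }
    return sum;
}
```

For example, with char buf[] = "a=1;b=2;c=39";, sum_values(buf) returns 42.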
sprintf
The C stdio library also provides functions that manipulate C strings. Perhaps the most useful is the sprintf function, which "prints" into a string rather than printing output to a terminal:
// like printf(), the format string allows for placeholders like %d, %f, etc.
// pass parameters after the format string to fill them in
int sprintf(char *s, const char *format, ...);
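sprintf initializes the contents of a string from values of various types. Its format parameter resembles the format strings of printf and scanf. Like strcpy, sprintf does not check whether s has enough space for the result, so the caller must make sure that it does. Here are some examples: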
char str[64];
float ave = 76.8;
int num = 2;
// initialize str to format string, filling in each placeholder with
// a char representation of its arguments' values
sprintf(str, "%s is %d years old and in grade %d", "Henry", 12, 7);
printf("%s\n", str); // prints: Henry is 12 years old and in grade 7
sprintf(str, "The average grade on exam %d is %g", num, ave);
printf("%s\n", str); // prints: The average grade on exam 2 is 76.8
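Because sprintf cannot tell how large s is, a long result can overflow the destination array. The standard snprintf function takes the destination's size as an extra argument, never writes more than that many bytes (including the '\0'), and returns the length the full result would need. Here is a minimal sketch based on the first sprintf example above; format_grade is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* format_grade is a hypothetical helper that formats a message into buf,
 * whose capacity is buf_size bytes. It returns the number of characters
 * the full message needs (not counting '\0'); a return value >= buf_size
 * means the message was truncated to fit. */
int format_grade(char *buf, size_t buf_size, const char *name, int age, int grade) {
    return snprintf(buf, buf_size, "%s is %d years old and in grade %d",
                    name, age, grade);
}
```

With a 64-byte array the whole 36-character message fits; with a 10-byte array only its first 9 characters and a '\0' are stored, and the return value (36) reveals the truncation.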
Functions for Individual Character Values
The C character library (ctype.h) contains a set of functions for manipulating and testing individual char values, including:
#include <ctype.h> // include ctype.h to use these
int islower(int ch);
int isupper(int ch); // these functions return a non-zero value if the
int isalpha(int ch); // test is TRUE, otherwise they return 0 (FALSE)
int isdigit(int ch);
int isalnum(int ch);
int ispunct(int ch);
int isspace(int ch);
int tolower(int ch); // returns the lower-case version of an upper-case letter
int toupper(int ch); // returns the upper-case version of a lower-case letter
                     // (other values are returned unchanged)
Here are some examples of their use:
char str[64];
int len, i;

strcpy(str, "I see 20 ZEBRAS, GOATS, and COWS");
if (islower(str[2])) {
    printf("%c is lower case\n", str[2]); // prints: s is lower case
}

len = strlen(str);
for (i = 0; i < len; i++) {
    if (isupper(str[i])) {
        str[i] = tolower(str[i]);
    } else if (isdigit(str[i])) {
        str[i] = 'X';
    }
}
printf("%s\n", str); // prints: i see XX zebras, goats, and cows
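As a further sketch, the hypothetical function count_kinds below tallies the letters, digits, and whitespace characters in a string. It casts each char to unsigned char before testing it, because the ctype.h functions expect an argument representable as an unsigned char (or EOF), and plain char may be signed on some systems.

```c
#include <ctype.h>

/* count_kinds is a hypothetical helper that counts letters, digits, and
 * whitespace characters in s, storing the totals through the pointers. */
void count_kinds(const char *s, int *letters, int *digits, int *spaces) {
    *letters = *digits = *spaces = 0;
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s; /* safe argument for ctype.h */
        if (isalpha(ch)) {
            (*letters)++;
        } else if (isdigit(ch)) {
            (*digits)++;
        } else if (isspace(ch)) {
            (*spaces)++;
        }
    }
}
```

For the string "I see 20 ZEBRAS, GOATS, and COWS", it counts 22 letters, 2 digits, and 6 spaces (the commas fall into none of the three groups).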
Functions to Convert Strings to Other Types
stdlib.h also contains functions to convert between strings and other C types. For example:
#include <stdlib.h>
int atoi(const char *nptr);    // convert a string to an int
double atof(const char *nptr); // convert a string to a double
Here’s an example:
printf("%d %g\n", atoi("1234"), atof("4.56"));
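atoi and atof have no way to report an error: atoi("abc") returns 0, just as atoi("0") does. When input may be malformed, the strtol function (also declared in stdlib.h) can be used instead, because it reports where parsing stopped and sets errno to ERANGE on overflow. Here is a minimal sketch; parse_int is a hypothetical helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* parse_int is a hypothetical helper. It stores the value of s in *out and
 * returns 0 if s is a base-10 integer (optionally preceded by whitespace)
 * that fits in an int; otherwise it returns -1 and leaves *out unchanged. */
int parse_int(const char *s, int *out) {
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return -1; /* no digits, or extra characters after the number */
    }
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
        return -1; /* too large or too small for a long or an int */
    }
    *out = (int)val;
    return 0;
}
```

Unlike atoi, parse_int rejects inputs such as "abc" and "12x" instead of silently returning a number.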
For more information about these and other C library functions (including what they do, their parameter format, what they return, and which headers need to be included to use them), see their man pages. For example, to view the strcpy man page, run:
$ man strcpy