14.6. Thread Safety
So far, we have covered synchronization constructs that programmers can use to ensure that their multithreaded programs are consistent and correct regardless of the number of threads employed. However, it is not always safe to make the assumption that standard C library functions can be used "as is" in the context of any multithreaded application. Not all functions in the C library are thread safe, or capable of being run by multiple threads while guaranteeing a correct result without unintended side effects. To ensure that the programs we write are thread safe, it is important to use synchronization primitives like mutexes and barriers to enforce that multithreaded programs are consistent and correct regardless of how the number of threads varies.
Another closely related concept related to thread safety is re-entrancy. All thread safe code is re-entrant; however, not all re-entrant code is thread safe. A function is re-entrant if it can be re-executed/partially executed by a function without causing issue. By definition, re-entrant code ensures that accesses to the global state of a program always result in that global state remaining consistent. While re-entrancy is often (incorrectly) used as a synonym for thread safety, there are special cases for which re-entrant code is not thread safe.
When writing multithreaded code, verify that the C library functions used are indeed thread safe. Fortunately, the list of thread unsafe C library functions is fairly small. The Open Group kindly maintains a list of thread-unsafe functions.
14.6.1. Fixing Issues of Thread Safety
Synchronization
primitives are the most common way to fix issues related to thread safety.
However, unknowingly using thread-unsafe C library functions can cause subtle
issues. Let’s look at a slightly modified version of our countsElem
function called countElemsStr
, which attempts to count the frequency of digits in a
given string, where each digit is separated by spaces. The following program
has been edited for brevity; the full source of this program is available at:
countElemsStr.c.
/* computes the frequency of all the elements in the input string and stores
* the associated counts of each element in the array called counts. */
void countElemsStr(int *counts, char *input_str) {
int val, i;
char *token;
token = strtok(input_str, " ");
while (token != NULL) {
val = atoi(token);
counts[val] = counts[val] + 1;
token = strtok(NULL, " ");
}
}
/* main function:
* calls countElemsStr on a static string and counts up all the digits in
* that string. */
int main( int argc, char **argv ) {
//lines omitted for brevity, but gets user defined length of string
//fill string with n digits
char *inputString = calloc(length * 2, sizeof(char));
fillString(inputString, length * 2);
countElemsStr(counts, inputString);
return 0;
}
The countElemsStr
function uses the strtok
function (as examined in our
discussion on strings) to parse each digit
(stored in token
) in the string, before converting it to an integer and
making the associated updates in the counts
array.
Compiling and running this program on 100,000 elements yields the following output:
$ gcc -o countElemsStr countElemsStr.c $ ./countElemsStr 100000 1 contents of counts array: 9963 9975 9953 10121 10058 10017 10053 9905 9915 10040
Now, let’s take a look at a multithreaded version of countElemsStr
(full
source of the program viewable here):
/* parallel version of countElemsStr (First cut):
* computes the frequency of all the elements in the input string and stores
* the associated counts of each element in the array called counts
*/
void *countElemsStr(void *args) {
//parse args
struct t_arg *myargs = (struct t_arg *)args;
//omitted for brevity
//local variables
int val, i;
char *token;
int local_counts[MAX] = {0};
//compute local start and end values and chunk size:
//omitted for brevity
//tokenize values
token = strtok(input_str + start, " ");
while (token != NULL) {
val = atoi(token); //convert to an int
local_counts[val] = local_counts[val] + 1; //update associated counts
token = strtok(NULL, " ");
}
pthread_mutex_lock(&mutex);
for (i = 0; i < MAX; i++) {
counts[i] += local_counts[i];
}
pthread_mutex_unlock(&mutex);
return NULL;
}
In this version of the program, each thread processes a separate section
of the string referenced by input_str
. The local_counts
array ensures that
the bulk of the write operations occur to local storage. A mutex is employed
to ensure that no two threads write to the shared variable counts
.
However, compiling and running this program yields the following results:
$ gcc -o countElemsStr_p countElemsStr_p.c -pthread $ ./countElemsStr_p 100000 1 1 contents of counts array: 9963 9975 9953 10121 10058 10017 10053 9905 9915 10040 $ ./countElemsStr_p 100000 1 2 contents of counts array: 498 459 456 450 456 471 446 462 450 463 $ ./countElemsStr_p 100000 1 4 contents of counts array: 5038 4988 4985 5042 5056 5013 5025 5035 4968 5065
Even though mutex locks are used around accesses to the counts
array,
the results from separate runs are radically different. This issue arises
because the countsElemsStr
function is not thread safe, because the
string library function strtok
is not thread safe! Visiting the
OpenGroup
website confirms that strtok
is on the list of thread-unsafe functions.
To fix this issue, it suffices to replace strtok
with its thread-safe
alternative, strtok_r
. In the latter function, a pointer is used as the last
parameter to help the thread keep track of where in the string it is parsing.
Here is the fixed function with strtok_r
(full source code
here (countsElemsStr_p_v2.c):
/* parallel version of countElemsStr (First cut):
* computes the frequency of all the elements in the input string and stores
* the associated counts of each element in the array called counts */
void* countElemsStr(void* args) {
//parse arguments
//omitted for brevity
//local variables
int val, i;
char * token;
int local_counts[MAX] = {0};
char * saveptr; //for saving state of strtok_r
//compute local start and end values and chunk size:
//omitted for brevity
//tokenize values
token = strtok_r(input_str+start, " ", &saveptr);
while (token != NULL) {
val = atoi(token); //convert to an int
local_counts[val] = local_counts[val]+1; //update associated counts
token = strtok_r(NULL, " ", &saveptr);
}
pthread_mutex_lock(&mutex);
for (i = 0; i < MAX; i++) {
counts[i]+=local_counts[i];
}
pthread_mutex_unlock(&mutex);
return NULL;
}
The only change in this version of the code is the declaration of the
character pointer saveptr
and replacing all instances of strtok
with
strtok_r
. Rerunning the code with these changes yields the following
output:
$ gcc -o countElemsStr_p_v2 countElemsStr_p_v2.c -pthread $ ./countElemsStr_p_v2 100000 1 1 contents of counts array: 9963 9975 9953 10121 10058 10017 10053 9905 9915 10040 $ ./countElemsStr_p_v2 100000 1 2 contents of counts array: 9963 9975 9953 10121 10058 10017 10053 9905 9915 10040 $ ./countElemsStr_p_v2 100000 1 4 contents of counts array: 9963 9975 9953 10121 10058 10017 10053 9905 9915 10040
Now the program produces the same result for every run. The use of saveptr
in conjunction with strtok_r
ensures that each thread can independently
track their location when parsing the string.
The takeaway from this section is that one should always check the list of thread-unsafe functions in C when writing multithreaded applications. Doing so can save the programmer a lot of heartache and frustration when writing and debugging threaded applications.