A library implements a collection of functions and definitions that can be used by other programs. A C library consists of two parts:

  1. The application programming interface (API) to the library, which gets defined in one or more header files (.h files) that must be included in C source code files that plan to use the library. The headers define what the library exports to its users. These definitions usually include library function prototypes, and they may also include type, constant, or global variable declarations.

  2. The implementation of the library’s functionality, often made available to programs in a precompiled binary format, that gets linked (added) into the binary executable created by gcc. Precompiled library code might be in an archive file (libsomelib.a) containing several .o files that can be statically linked into the executable file at compile time. Alternatively, it may consist of a shared object file (libsomelib.so) that can be dynamically linked at runtime into a running program.

For example, the C string library implements a set of functions to manipulate C strings. The string.h header file defines its interface, so any program that wants to use string library functions must #include <string.h>. The implementation of the C string library is part of the larger standard C library (libc) that the gcc compiler automatically links into every executable file it creates.

A library’s implementation consists of one or more modules (.c files), and may additionally include header files that are internal to the library implementation; internal header files are not part of the library’s API but are part of well-designed, modular library code. Often times the C source code implementation of a library is not exported to the user of the library. Instead, the library is made available in a precompiled binary form. These binary formats are not executable programs (they cannot be run on their own), but they provide executable code that can be linked into (added into) an executable file by gcc at compilation time.

There are numerous libraries available for C programmers to use. For example, the POSIX thread library (discussed in Chapter 10) enables multi-threaded C programs. C programmers can also implement and use their own libraries (discussed in the next section). Large C programs tend to use many C libraries, some of which gcc links implicitly, and others that require explicit linking with the -l command line option to gcc.

Standard C libraries normally do not need to be explicitly linked in with the -l option, but other libraries do. The man page for a library function often specifies if the library needs to be explicitly linked in when compiling. For example, the POSIX threads library (pthread) and the readline libraries require explicit linking on the gcc command line:

$ gcc -o myprog myprog.c -lpthread -lreadline

Note that the full name of the library file should not be included in the -l argument to gcc; the library files are named something like libpthread.so or libreadline.a, but the lib prefix, and .so and .a suffix of the file names are not included. The actual library file name may also contain version numbers (e.g. libreadline.so.8.0), which are also not included in the -l command line option (-lreadline). By not forcing the user to specify (or even know) the exact name and location of the library files to link in, gcc is free to find the most recent version of a library in a user’s library path. It also allows the compiler to choose to dynamically link when both a shared object (.so) and an archive (.a) version of a library are available. If users want to statically link libraries, then they can explicitly specify static linking in the gcc command line. The --static option provides one method for requesting static linking:

$ gcc -o myprog myprog.c --static -lpthread -lreadline

Compilation Steps

Characterizing C’s program compilation steps will help to illustrate how library code gets linked into an executable binary file. We first present the compilation steps and then discuss (with examples) different types of errors that can occur with compiling programs that use libraries.

The C compiler translates a C source file (e.g., myprog.c) into an executable binary file (e.g., a.out) in four distinct steps (plus a fifth step that occurs at runtime):

  1. Precompiler Step: runs first and expands preprocessor directives: the # that appear in the C program, such as #define and #include. Compilation errors at this step include syntax errors in preprocessor directives and gcc not finding header files associated with #include directives. To view the intermediate results of the precompiler step, pass the -E flag to gcc (the output can be redirected to a file that can be viewed by a text editor):

    $ gcc -E  myprog.c
    $ gcc -E  myprog.c  > out
    $ vim out
  2. Compile Step: runs next and does the bulk of the compiling task. It translates the C program source code (myprog.c) to machine-specific assembly code (myprog.s). Assembly code is a human-readable form of the binary machine code instructions that a computer can execute. Compilation errors at this step include C language syntax errors, and undefined symbol warnings, and errors from missing definitions and function prototypes. To view the intermediate results of the compile step, pass the -S flag to gcc (this option creates a text file named myprog.s with the assembly translation of myprog.c, which can be viewed in a text editor):

    $ gcc -S  myprog.c
    $ vim myprog.s
  3. Assembly Step: converts the assembly code into relocatable binary object code (myprog.o). The resulting object file contains machine code instructions, but it is not a complete executable program that can execute on its own. The gcc compiler on Unix and Linux systems produces binary files in a specific format called ELF (Executable and Linkable Format). To compile up to only this step, pass the -c flag to gcc (this produces a file named myprog.o). Binary files (e.g. a.out and .o files) can be viewed using objdump or similar tools for displaying binary files:

    $ gcc -c  myprog.c
    
    # disassemble functions in myprog.o with objdump:
    $ objdump -d myprog.o
  4. Link Editing Step: runs last and creates a single executable file (a.out) from relocatable binaries (.o) and libraries (.a or .so). In this step, the linker verifies that any references to names (symbols) in a .o file are present in other .o, .a or .so files. For example, the linker will find the printf function in the standard C library (libc.so). If the linker cannot find the definition of a symbol, this step fails with an error stating that a symbol is undefined. Running gcc without flags for partial compilation performs all four steps of compiling a C source code file (myprog.c) to an executable binary file (a.out) that can be run:

    $ gcc myprog.c
    $ ./a.out
    
    # disassemble functions in a.out with objdump:
    $ objdump -d a.out

    If the binary executable file (a.out) statically links in library code (from .a library files), then gcc embeds copies of library functions from the .a file into the resulting a.out file. All calls to library functions by the application are bound to the locations in the a.out file into which the library function is copied. Binding associates a name with a location in the program memory. For example, binding a call to a library function named gofish means replacing the use of the function name to the address in memory of the function (in later chapters we discuss memory addresses in more detail.)

    If, however, the a.out was created by dynamically linking a library (from library shared object, .so, files), then the a.out does not contain a copy of library function code from these libraries. Instead, it contains information about which dynamically linked libraries are needed by the a.out file to run it. Such executables require an additional linking step at runtime.

  5. Runtime Linking Step: if the a.out was linked with shared object files during link editing (step 4), then dynamic library code (in .so files) must be loaded at runtime and linked with the running program. This runtime loading and linking of shared object libraries is called dynamic linking. When a user runs an a.out with shared object dependencies, the system performs dynamic linking before the program begins executing its main function.

    The compiler adds information about shared object dependencies into the a.out file during the link editing compilation step (step 4). When the program starts executing, the dynamic linker examines the list of shared object dependencies and finds and loads the shared object files into the running program. It then updates relocation table entries in the a.out that binds the program’s uses of symbols in shared objects (such as calls to library functions) to their location in the .so file loaded at runtime. Runtime linking reports errors if the dynamic linker cannot find a shared object (.so) file needed by the executable.

    The ldd utility lists an executable file’s shared object dependencies:

    $ ldd a.out

    The GNU debugger (GDB) can examine a running program and show which shared object code is loaded and linked at runtime. We cover GDB in Chapter 3. However, the details of examining the Procedure Lookup Table (PLT), which is used for runtime linking of calls to dynamically linked library functions, is beyond the scope of this textbook.

For more details about the phases of compilation and about tools for examining different phases, see: Compilation Phases.

Several compilation and linking errors can occur due to the programmer forgetting to include library header files or forgetting to explicitly link in library code. Identifying the gcc compiler error or warning associated with each of these errors will help in debugging errors related to using C libraries.

Consider the following C program that makes a call to a function libraryfunc from the mylib library, which is available as a shared object file, libmylib.so:

#include <stdio.h>
#include <mylib.h>

int main(int argc, char *argv[]) {
    int result;
    result = libraryfunc(6, MAX);
    printf("result is %d\n", result);
    return 0;
}

Assume that the header file, mylib.h, contains the following definitions:

#define MAX 10   // a constant exported by the library

// a function exported by the library
extern int libraryfunc(int x, int y);

The extern prefix to the function prototype means that the function’s definition comes from another file — it’s not in the mylib.h file, but instead it’s provided by one of the .c files in the library’s implementation.

Forgetting to include a header file

If the programmer forgets to include mylib.h in their program, then the compiler produces warnings and errors about the program’s use of library functions and constants that it does not know about. For example, if the user compiles their program without #include <mylib.h>, gcc will produce the following output:

# `-g`: add debug information, -c: compile to `.o`
gcc -g -c myprog.c

myprog.c: In function main:
myprog.c:8:12: warning: implicit declaration of function libraryfunc
   result = libraryfunc(6, MAX);
            ^~~~~~~~~~~

myprog.c:8:27: error: MAX undeclared (first use in this function)
   result = libraryfunc(6, MAX);
                           ^~~

The first compiler warning (implicit declaration of function libraryfunc) tells the programmer that the compiler cannot find a function prototype for the libraryfunc function. This is just a compiler warning because gcc will guess that the function’s return type is an integer and will continue compiling the program. However, programmers should not ignore such warnings! They indicate that the program isn’t including a function prototype before its use in the myprog.c file, which is often due to missing including a header file that contains the function prototype.

The second compiler error (MAX undeclared (first use in this function)) follows from a missing constant definition. The compiler cannot guess at the value of the missing constant, so this missing definition fails with an error. This type of "undeclared" message often indicates that a header file defining a constant or global variable is missing or hasn’t been properly included.

If the programmer includes the library header file (as shown in the listing above), but forgets to explicitly link in the library during the link editing step (step 4) of compilation, then gcc indicates this with an undefined reference error:

$ gcc -g myprog.c

In function main:
myprog.c:9: undefined reference to libraryfunc
collect2: error: ld returned 1 exit status

This error originates from ld, the linker component of the compiler. It indicates that the linker cannot find the implementation of the library function libraryfunc that gets called at line 9 in myprog.c. An undefined reference error indicates that a library needs to be explicitly linked into the executable. In this example, specifying -lmylib on the gcc command line will fix the error:

$ gcc -g myprog.c  -lmylib
gcc can’t find header or library files

Compilation will also fail with errors if a library’s header or implementation files are not present in the directories that gcc searches by default. For example, if gcc cannot find the mylib.h file, it will produce an error message like this:

$ gcc -c myprog.c -lmylib
myprog.c:1:10: fatal error: mylib.h: No such file or directory
 #include <mylib.h>
          ^~~~~~~
compilation terminated.

If the linker cannot find a .a or .so version of the library to link in (during step 4 of compilation), gcc will exit with an error like the following:

$ gcc -c myprog.c -lmylib
fail to build the `a.out` file.
/usr/bin/ld: cannot find -lmylib
collect2: error: ld returned 1 exit status

Similarly, if a dynamically linked executable cannot locate a shared object file (e.g., libmylib.so), it will fail to execute at runtime with an error like the following:

$ ./a.out
./a.out: error while loading shared libraries:
        libmylib.so: cannot open shared object file: No such file or directory

To resolve these types of errors, programmers must specify additional options to gcc to indicate where the library’s files can be found. They may also need to modify the LD_LIBRARY_PATH environment variable for the runtime linker to find a library’s .so file.

Library and Include Paths

The compiler automatically searches in standard directory locations for header and library files. For example, systems commonly store standard header files in /usr/include/ and library files in /usr/lib, and gcc automatically looks for headers and libraries in these directories. gcc also automatically searches for header files in the current working directory.

If gcc cannot find a header or a library file, then the user must explicitly provide paths on the command line using -I and -L. For example, suppose that there exists a library named libmylib.so in /home/me/lib, and its header file mylib.h is in /home/me/include. Because gcc knows nothing of those paths by default, it must be explicitly told to include files there to successfully compile a program that uses this library:

$ gcc  -I/home/me/include -o myprog myprog.c -L/home/me/lib -lmylib

To specify the location of a dynamic library (e.g., libmylib.so) when launching a dynamically linked executable, set the LD_LIBRARY_PATH environment variable to include the path to the library. Here’s an example bash command that can be run at a shell prompt or added to a .bashrc file:

export LD_LIBRARY_PATH=/home/me/lib:$LD_LIBRARY_PATH

When the gcc command lines get long, or when an executables require many source and header files, it helps to simplify compilation by using make and a Makefile. Here’s some information about using make and writing Makefiles.