2.9.5. C Libraries: Using, Compiling, and Linking
A library implements a collection of functions and definitions that can be used by other programs. A C library consists of two parts:
- The application programming interface (API) to the library, which gets defined in one or more header files (.h files) that must be included in C source code files that plan to use the library. The headers define what the library exports to its users. These definitions usually include library function prototypes, and they may also include type, constant, or global variable declarations.
- The implementation of the library’s functionality, often made available to programs in a precompiled binary format that gets linked (added) into the binary executable created by gcc. Precompiled library code might be in an archive file (libsomelib.a) containing several .o files that can be statically linked into the executable file at compile time. Alternatively, it may consist of a shared object file (libsomelib.so) that can be dynamically linked at runtime into a running program.
For example, the C string library implements a set of functions to manipulate C strings. The string.h header file defines its interface, so any program that wants to use string library functions must #include <string.h>. The implementation of the C string library is part of the larger standard C library (libc) that the gcc compiler automatically links into every executable file it creates.
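As a minimal sketch of using the string library, the function below calls strlen, strcpy, and strcat after including string.h. The helper name join_names and its parameters are invented for this illustration and are not part of any library.

```c
#include <string.h>   /* prototypes for strlen, strcpy, and strcat */

/* Hypothetical helper: writes "first last" into dest.
 * Returns the length of the result, or -1 if dest is too small. */
int join_names(char *dest, size_t size, const char *first, const char *last) {
    /* room for both strings, the space between them, and the '\0' */
    size_t needed = strlen(first) + 1 + strlen(last) + 1;
    if (needed > size) {
        return -1;
    }
    strcpy(dest, first);
    strcat(dest, " ");
    strcat(dest, last);
    return (int) strlen(dest);
}
```

A file containing this function needs the #include line to compile cleanly, but it needs no -l option at link time because these functions are implemented in libc.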
A library’s implementation consists of one or more modules (.c files), and may additionally include header files that are internal to the library implementation; internal header files are not part of the library’s API but are part of well-designed, modular library code. Often the C source code implementation of a library is not exported to the user of the library. Instead, the library is made available in a precompiled binary form. These binary formats are not executable programs (they cannot be run on their own), but they provide executable code that can be linked into (added into) an executable file by gcc at compilation time.
There are numerous libraries available for C programmers to use. For example, the POSIX thread library (discussed in Chapter 10) enables multithreaded C programs. C programmers can also implement and use their own libraries (discussed in the next section). Large C programs tend to use many C libraries, some of which gcc links implicitly, whereas others require explicit linking with the -l command line option to gcc.
Standard C libraries normally do not need to be explicitly linked in with the -l option, but other libraries do. The documentation for a library function often specifies whether the library needs to be explicitly linked in when compiling. For example, the POSIX threads library (pthread) and the readline library require explicit linking on the gcc command line:
$ gcc -o myprog myprog.c -pthread -lreadline
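Note that linking the POSIX thread library is a special case that does not include the -l prefix: -pthread is a gcc option of its own, and it should be passed when compiling as well as when linking because it also defines the macros that threaded code needs. However, most libraries are explicitly linked into the executable using the -l syntax on the gcc command line.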
Also note that the full name of the library file should not be included in the -l argument to gcc; the library files are named something like libreadline.so or libreadline.a, but the lib prefix and .so or .a suffix of the filenames are not included. The actual library filename may also contain version numbers (e.g., libreadline.so.8.0), which are also not included in the -l command line option (-lreadline). By not forcing the user to specify (or even know) the exact name and location of the library files to link in, gcc is free to find the most recent version of a library in a user’s library path. It also allows the compiler to choose to dynamically link when both a shared object (.so) and an archive (.a) version of a library are available.
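The naming convention described here can be sketched in C. The helper below, lib_filename, is a hypothetical function written for this example; it only formats a name and is not how gcc or the linker is implemented.

```c
#include <stdio.h>

/* Illustrative only: builds the conventional file name for the library
 * named in a -l option, e.g. ("readline", ".so") gives "libreadline.so".
 * Returns 0 on success, or -1 if the result does not fit in out. */
int lib_filename(char *out, size_t size, const char *name, const char *suffix) {
    int n = snprintf(out, size, "lib%s%s", name, suffix);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

For the argument -lreadline, calling lib_filename with "readline" and ".so" fills the buffer with libreadline.so, and with ".a" it produces libreadline.a. Versioned names such as libreadline.so.8.0 are outside the scope of this sketch.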
If users want to statically link libraries, then they can explicitly specify static linking in the gcc command line. The --static option provides one method for requesting static linking:
$ gcc -o myprog myprog.c --static -pthread -lreadline
Compilation Steps
Characterizing C’s program compilation steps will help to illustrate how library code gets linked into an executable binary file. We first present the compilation steps and then discuss (with examples) different types of errors that can occur when compiling programs that use libraries.
The C compiler translates a C source file (e.g., myprog.c) into an executable binary file (e.g., a.out) in four distinct steps (plus a fifth step that occurs at runtime).
1. The precompiler step runs first and expands preprocessor directives: the # directives that appear in the C program, such as #define and #include. Compilation errors at this step include syntax errors in preprocessor directives or gcc not finding header files associated with #include directives. To view the intermediate results of the precompiler step, pass the -E flag to gcc (the output can be redirected to a file that can be viewed by a text editor):
   $ gcc -E myprog.c
   $ gcc -E myprog.c > out
   $ vim out
2. The compile step runs next and does the bulk of the compilation task. It translates the C program source code (myprog.c) to machine-specific assembly code (myprog.s). Assembly code is a human-readable form of the binary machine code instructions that a computer can execute. Compilation errors at this step include C language syntax errors, undefined symbol warnings, and errors from missing definitions and function prototypes. To view the intermediate results of the compile step, pass the -S flag to gcc (this option creates a text file named myprog.s with the assembly translation of myprog.c, which can be viewed in a text editor):
   $ gcc -S myprog.c
   $ vim myprog.s
3. The assembly step converts the assembly code into relocatable binary object code (myprog.o). The resulting object file contains machine code instructions, but it is not a complete executable program that can run on its own. On Linux and most other Unix systems, the gcc compiler produces binary files in a specific format called ELF (Executable and Linkable Format). To stop compilation after this step, pass the -c flag to gcc (this produces a file named myprog.o). Binary files (e.g., a.out and .o files) can be viewed using objdump or similar tools for displaying binary files:
   $ gcc -c myprog.c
   # disassemble functions in myprog.o with objdump:
   $ objdump -d myprog.o
4. The link editing step runs last and creates a single executable file (a.out) from relocatable binaries (.o) and libraries (.a or .so). In this step, the linker verifies that any references to names (symbols) in a .o file are present in other .o, .a, or .so files. For example, the linker will find the printf function in the standard C library (libc.so). If the linker cannot find the definition of a symbol, this step fails with an error stating that a symbol is undefined. Running gcc without flags for partial compilation performs all four steps of compiling a C source code file (myprog.c) to an executable binary file (a.out) that can be run:
   $ gcc myprog.c
   $ ./a.out
   # disassemble functions in a.out with objdump:
   $ objdump -d a.out
   If the binary executable file (a.out) statically links in library code (from .a library files), then gcc embeds copies of library functions from the .a file in the resulting a.out file. All calls to library functions by the application are bound to the locations in the a.out file to which the library function is copied. Binding associates a name with a location in the program memory. For example, binding a call to a library function named gofish means replacing the use of the function name with the address in memory of the function (we discuss memory addresses in more detail in later chapters).
   If, however, the a.out was created by dynamically linking a library (from shared object, .so, files), then a.out does not contain a copy of the library function code from these libraries. Instead, it contains information about which dynamically linked libraries are needed by the a.out file to run it. Such executables require an additional linking step at runtime.
5. The runtime linking step is needed if a.out was linked with shared object files during link editing (step 4). In such cases, the dynamic library code (in .so files) must be loaded at runtime and linked with the running program. This runtime loading and linking of shared object libraries is called dynamic linking. When a user runs an a.out executable with shared object dependencies, the system performs dynamic linking before the program begins executing its main function.
   The compiler adds information about shared object dependencies into the a.out file during the link editing compilation step (step 4). When the program starts executing, the dynamic linker examines the list of shared object dependencies and finds and loads the shared object files into the running program. It then processes the relocation table entries in the a.out file, binding the program’s use of symbols in shared objects (such as calls to library functions) to their locations in the .so file loaded at runtime. Runtime linking reports errors if the dynamic linker cannot find a shared object (.so) file needed by the executable.
   The ldd utility lists an executable file’s shared object dependencies:
   $ ldd a.out
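One way to follow the first four steps is to run them on a small file. The sketch below assumes a file named stages.c; the file name, the macros, and the function are all made up for this illustration.

```c
/* stages.c: a hypothetical file for trying the -E, -S, and -c flags. */
#define SCALE 3                  /* object-like macro, expanded in step 1 */
#define TRIPLE(x) ((x) * SCALE)  /* function-like macro, also expanded in step 1 */

/* After step 1 the return statement reads: return ((n) * 3) + 1;
 * Step 2 translates the function to assembly (stages.s), and step 3
 * assembles it into relocatable object code (stages.o). */
int triple_plus_one(int n) {
    return TRIPLE(n) + 1;
}
```

Running gcc -E stages.c should show the function body with both macros expanded, gcc -S stages.c writes stages.s, and gcc -c stages.c writes stages.o. Because the file defines no main function, running gcc stages.c without -c is expected to fail in the link editing step with an undefined reference to main.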
The GNU debugger (GDB) can examine a running program and show which shared object code is loaded and linked at runtime. We cover GDB in Chapter 3. However, the details of examining the Procedure Linkage Table (PLT), which is used for runtime linking of calls to dynamically linked library functions, are beyond the scope of this textbook.
For more details about the phases of compilation and about tools for examining different phases, see: Compilation Phases.
Common Compilation Errors Related to Compiling and Linking Libraries
Several compilation and linking errors can occur due to the programmer forgetting to include library header files or forgetting to explicitly link in library code. Identifying the gcc compiler error or warning associated with each of these errors will help in debugging errors related to using C libraries.
Consider the following C program that makes a call to a function libraryfunc from the examplelib library, which is available as a shared object file, libexamplelib.so:
#include <stdio.h>
#include <examplelib.h>
int main(int argc, char *argv[]) {
    int result;
    result = libraryfunc(6, MAX);
    printf("result is %d\n", result);
    return 0;
}
Assume that the header file, examplelib.h, contains the definitions in the following example:
#define MAX 10 // a constant exported by the library
// a function exported by the library
extern int libraryfunc(int x, int y);
The extern prefix to the function prototype means that the function’s definition comes from another file: it’s not in the examplelib.h file, but instead it’s provided by one of the .c files in the library’s implementation.
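To make the split between declaration and definition concrete, here is a sketch of what one of the library’s .c files might contain. The file name examplelib.c and the function body are assumptions made for this example; this section does not specify what libraryfunc computes.

```c
/* examplelib.c: a hypothetical implementation file for the library.
 * A real library file would start with #include "examplelib.h"; the
 * header's contents are repeated here so that the sketch stands alone. */
#define MAX 10                          /* constant exported by the header */
extern int libraryfunc(int x, int y);   /* declaration, as in the header */

/* Definition of the exported function. The body is invented for this
 * sketch: it adds its arguments and caps the sum at MAX. */
int libraryfunc(int x, int y) {
    int sum = x + y;
    return (sum > MAX) ? MAX : sum;
}
```

With this invented body, the call libraryfunc(6, MAX) in myprog.c would return 10. The compiler only needs the declaration to compile myprog.c; the definition is needed later, when the linker resolves the call.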
Forgetting to include a header file
If the programmer forgets to include examplelib.h in their program, then the compiler produces warnings and errors about the program’s use of library functions and constants that it does not know about. For example, if the user compiles their program without #include <examplelib.h>, gcc will produce the following output:
# '-g': add debug information, '-c': compile to .o
$ gcc -g -c myprog.c
myprog.c: In function main:
myprog.c:8:12: warning: implicit declaration of function libraryfunc
   result = libraryfunc(6, MAX);
            ^~~~~~~~~~~
myprog.c:8:27: error: MAX undeclared (first use in this function)
   result = libraryfunc(6, MAX);
                           ^~~
The first compiler warning (implicit declaration of function libraryfunc) tells the programmer that the compiler cannot find a function prototype for the libraryfunc function. This is just a compiler warning because gcc will guess that the function’s return type is an integer and will continue compiling the program (GCC 14 and later treat an implicit function declaration as an error by default). However, programmers should not ignore warnings such as these! They indicate that the program isn’t including a function prototype before its use in the myprog.c file, which is often due to not including a header file that contains the function prototype.
The second compiler error (MAX undeclared (first use in this function)) follows from a missing constant definition. The compiler cannot guess at the value of the missing constant, so this missing definition fails with an error. This type of "undeclared" message often indicates that a header file defining a constant or global variable is missing or hasn’t been properly included.
Forgetting to link a library
If the programmer includes the library header file (as shown in the previous listing), but forgets to explicitly link in the library during the link editing step (step 4) of compilation, then gcc indicates this with an undefined reference error:
$ gcc -g myprog.c
In function main:
myprog.c:9: undefined reference to libraryfunc
collect2: error: ld returned 1 exit status
This error originates from ld, the linker component of the compiler. It indicates that the linker cannot find the implementation of the library function libraryfunc that gets called at line 9 in myprog.c.
An undefined reference error usually indicates that a library needs to be explicitly linked into the executable. In this example, specifying -lexamplelib on the gcc command line will fix the error:
$ gcc -g myprog.c -lexamplelib
gcc can’t find header or library files
Compilation will also fail with errors if a library’s header or implementation files are not present in the directories that gcc searches by default. For example, if gcc cannot find the examplelib.h file, it will produce an error message like this:
$ gcc -c myprog.c -lexamplelib
myprog.c:1:10: fatal error: examplelib.h: No such file or directory
 #include <examplelib.h>
          ^~~~~~~
compilation terminated.
If the linker cannot find a .a or .so version of the library to link in during the link editing step of compilation, gcc will exit with an error like the following:
$ gcc myprog.c -lexamplelib
/usr/bin/ld: cannot find -lexamplelib
collect2: error: ld returned 1 exit status
Similarly, if a dynamically linked executable cannot locate a shared object file (e.g., libexamplelib.so), it will fail to execute at runtime with an error like the following:
$ ./a.out
./a.out: error while loading shared libraries: libexamplelib.so: cannot open shared object file: No such file or directory
To resolve these types of errors, programmers must specify additional options to gcc to indicate where the library’s files can be found. They may also need to modify the LD_LIBRARY_PATH environment variable for the runtime linker to find a library’s .so file.
Library and Include Paths
The compiler automatically searches in standard directory locations for header and library files. For example, systems commonly store standard header files in /usr/include, and library files in /usr/lib, and gcc automatically looks for headers and libraries in these directories. For header files named in double quotes (e.g., #include "examplelib.h"), gcc also searches the directory that contains the including source file before it searches the standard locations.
If gcc cannot find a header or a library file, then the user must explicitly provide paths on the command line using -I (for header directories) and -L (for library directories). For example, suppose that a library named libexamplelib.so exists in /home/me/lib, and its header file examplelib.h is in /home/me/include. Because gcc knows nothing of those paths by default, it must be explicitly told to search them to successfully compile a program that uses this library:
$ gcc -I/home/me/include -o myprog myprog.c -L/home/me/lib -lexamplelib
To specify the location of a dynamic library (e.g., libexamplelib.so) when launching a dynamically linked executable, set the LD_LIBRARY_PATH environment variable to include the path to the library. Here’s an example bash command that can be run at a shell prompt or added to a .bashrc file:
export LD_LIBRARY_PATH=/home/me/lib:$LD_LIBRARY_PATH
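The value of LD_LIBRARY_PATH is a list of directories separated by colons. As a rough sketch of how such a list breaks apart, the hypothetical helper below prints each directory on its own line. The function name print_path_dirs is invented for this example, and the code is not the dynamic linker’s actual search logic.

```c
#include <stdio.h>
#include <string.h>

/* Prints each directory in a colon-separated search path, one per line,
 * and returns how many were printed. Empty entries are skipped in this
 * sketch. */
int print_path_dirs(const char *path) {
    int count = 0;
    while (*path != '\0') {
        size_t len = strcspn(path, ":");   /* length up to next ':' or end */
        if (len > 0) {
            printf("%.*s\n", (int) len, path);
            count++;
        }
        path += len;
        if (*path == ':') {
            path++;                        /* step over the separator */
        }
    }
    return count;
}
```

A program could pass the result of getenv("LD_LIBRARY_PATH") from stdlib.h to this function, after checking that the result is not NULL, to see which directories the variable currently names.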
When the gcc command lines get long, or when an executable requires many source and header files, it helps to simplify compilation by using make and a Makefile. Here’s some more information about make and Makefiles.