Dive into Systems
Authors
Suzanne J. Matthews, Ph.D. — West Point
suzanne.matthews@westpoint.edu
Tia Newhall, Ph.D. — Swarthmore College
newhall@cs.swarthmore.edu
Kevin C. Webb, Ph.D. — Swarthmore College
kwebb@cs.swarthmore.edu
Book Version
Dive into Systems — Version 1.2
Copyright
© 2020 Dive into Systems, LLC
License: CC BY-NC-ND 4.0
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International.
Disclaimer
The authors made every effort to ensure that the information in this book was correct. The programs in this book have been included for instructional purposes only. The authors do not offer any warranties with respect to the programs or contents of this book. The authors do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause.
The views expressed in this book are those of the authors and do not reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government.
Acknowledgements
The authors would like to acknowledge the following individuals for helping make Dive into Systems a success:
Formal Reviewers
Each chapter in Dive into Systems was peer-reviewed by several CS professors around the United States. We are extremely grateful to those faculty who served as formal reviewers. Your insight, time, and recommendations have improved the rigor and precision of Dive into Systems. Specifically, we would like to acknowledge the contributions of:
-
Jeannie Albrecht (Williams College) for her review and feedback on Chapter 15.
-
John Barr (Ithaca College) for his review and feedback on chapters 6, 7, and 8, and providing general advice for the x86_64 chapter.
-
Jon Bentley for providing review and feedback on section 5.1, including line-edits.
-
Anu G. Bourgeois (Georgia State University) for her review and feedback on Chapter 4.
-
Martina Barnas (Indiana University Bloomington) for her review and insightful feedback on Chapter 14, especially section 14.4.
-
David Bunde (Knox College) for his review, comments and suggestions on Chapter 14.
-
Stephen Carl (Sewanee: The University of the South) for his careful review and detailed feedback on chapters 6 and 7.
-
Bryan Chin (U.C. San Diego) for his insightful review of the ARM assembly chapter (chapter 9).
-
Amy Csizmar Dalal (Carleton College) for her review and feedback on Chapter 5.
-
Debzani Deb (Winston-Salem State University) for her review and feedback on Chapter 11.
-
Saturnino Garcia (University of San Diego) for his review and feedback on Chapter 5.
-
Tim Haines (University of Wisconsin) for his comments and review of Chapter 3.
-
Bill Jannen (Williams College) for his detailed review and insightful comments on Chapter 11.
-
Ben Marks (Swarthmore College) for comments on chapters 1 and 2.
-
Alexander Mentis (West Point) for insightful comments and line-edits of early drafts of this book.
-
Rick Ord (U.C. San Diego) for his review and suggested corrections for the Preface, and reviewing over 60% (!!) of the book, including chapters 0, 1, 2, 3, 4, 6, 7, 8 and 14. His feedback has helped us keep our notation and code consistent over the different chapters!
-
Joe Politz (U.C. San Diego) for his review and detailed suggestions for strengthening Chapter 12.
-
Brad Richards (University of Puget Sound) for his rapid feedback and suggestions for Chapter 12.
-
Kelly Shaw (Williams College) for her review and suggestions for Chapter 15.
-
Simon Sultana (Fresno Pacific University) for his review and suggested corrections for Chapter 1.
-
Cynthia Taylor (Oberlin College) for her review and suggested corrections of Chapter 13.
-
David Toth (Centre College) for his review and suggested corrections for Chapters 2 and 14.
-
Bryce Wiedenbeck (Davidson College) for his review and suggested corrections for Chapter 4.
-
Daniel Zingaro (University of Toronto Mississauga) for catching so many typos.
Additional Feedback
The following people caught random typos and other sundries. We are grateful for your help in finding typos!
-
Tanya Amert (Denison University)
-
Ihor Beliuha
-
Daniel Canas (Wake Forest University)
-
Chien-Chung Shen (University of Delaware)
-
Vasanta Chaganti (Swarthmore College)
-
Stephen Checkoway (Oberlin College)
-
John DeGood (The College of New Jersey)
-
Joe Errey
-
Artin Farahani
-
Sat Garcia (University of San Diego)
-
Aaron Gember-Jacobson (Colgate University)
-
Stephen Gilbert
-
Arina Kazakova (Swarthmore College)
-
Deborah Knox (The College of New Jersey)
-
Kevin Lahey (Colgate University)
-
Raphael Matchen
-
Sivan Nachaum (Smith College)
-
Aline Normolye (Bryn Mawr College)
-
SaengMoung Park (Swarthmore College)
-
Rodrigo Piovezan (Swarthmore College)
-
Roy Ragsdale (West Point) who gave advice for restructuring the guessing game for the ARM buffer overflow exploit in chapter 9.
-
Zachary Robinson (Swarthmore College)
-
Joel Sommers (Colgate University)
-
Peter Stenger
-
Richard Weiss (Evergreen State College)
-
David Toth (Centre College)
-
Alyssa Zhang (Swarthmore College)
Early Adopters
An alpha release of Dive into Systems was piloted at West Point in Fall 2018; the beta release of the textbook was piloted at West Point and Swarthmore College in Spring 2019. In Fall 2019, Dive into Systems launched its Early Adopter Program, which enabled faculty around the United States to pilot the stable release of Dive into Systems at their institutions. The Early Adopter Program is a huge help to the authors, as it helps us get valuable insight into student and faculty experiences with the textbook. We use the feedback we receive to improve and strengthen the content of Dive into Systems, and are very thankful to everyone who completed our student and faculty surveys.
2019-2020 Early Adopters
The following individuals piloted Dive into Systems as a textbook at their institutions during the Fall 2019-Spring 2020 academic year:
-
John Barr (Ithaca College) - Computer Organization & Assembly Language (Comp 210)
-
Chris Branton (Drury University) - Computer Systems Concepts (CSCI 342)
-
Dick Brown (St. Olaf College) - Hardware Design (CSCI 241)
-
David Bunde (Knox College) - Introduction to Computing Systems (CS 214)
-
Bruce Char (Drexel University) - Systems Programming (CS 283)
-
Vasanta Chaganti (Swarthmore College) - Introduction to Computer Systems (CS 31)
-
Bryan Chin (U.C. San Diego) - Computer Organization and Systems Programming (CSE 30)
-
Stephen Carl (Sewanee: The University of the South) - Computer Systems and Organization (CSci 270)
-
John Dougherty (Haverford College) - Computer Organization (cs240)
-
John Foley (Smith College) - Operating Systems (CSC 262)
-
Elizabeth Johnson (Xavier University) - Programming in C
-
Alexander Kendrowitch (West Point) - Computer Organization (CS380)
-
Bill Kerney (Clovis Community College) - Assembly Programming (CSCI 45)
-
Deborah Knox (The College of New Jersey) - Computer Architecture (CSC 325)
-
Doug MacGregor (Western Colorado University) - Operating Systems/Architecture (CS 330)
-
Jeff Matocha (Ouachita Baptist University) - Computer Organization (CSCI 3093)
-
Keith Muller (U.C. San Diego) - Computer Organization and Systems Programming (CSE 30)
-
Crystal Peng (Park University) - Computer Architecture (CS 319)
-
Leo Porter (U.C. San Diego) - Introduction to Computer Architecture (CSE 141)
-
Lauren Provost (Simmons University) - Computer Architecture and Organization (CS 226)
-
Kathleen Riley (Bryn Mawr College) - Principles of Computer Organization (CMSC B240)
-
Roger Shore (High Point University) - Computer Systems (CSC-2410)
-
Tony Tong (Wheaton College, Norton MA) - Advanced Topics in Computer Science: Parallel and Distributed Computing (COMP 398)
-
Brian Toone (Samford University) - Computer Organization and Architecture (COSC 305)
-
David Toth (Centre College) - Systems Programming (CSC 280)
-
Bryce Wiedenbeck (Davidson College) - Computer Organization (CSC 250)
-
Richard Weiss (The Evergreen State College) - Computer Science Foundations: Computer Architecture (CSF)
Preface
In today’s world, much emphasis is placed on learning to code, and programming is touted as a golden ticket to a successful life. Despite all the code boot camps and programming being taught in elementary schools, the computer itself is often treated as an afterthought — it’s increasingly becoming invisible in the discussions of raising the next generations of computer scientists.
The purpose of this book is to give readers a gentle and accessible introduction to computer systems. To write effective programs, programmers must understand a computer’s underlying subsystems and architecture. However, the expense of modern textbooks often limits their availability to the set of students who can afford them. This free online textbook seeks to make computer systems concepts accessible to everyone. It is targeted toward students with an introductory knowledge of computer science who have some familiarity with Python. If you’re looking for a free book to introduce you to basic computing principles in Python, we encourage you to read How To Think Like a Computer Scientist with Python first.
If you’re ready to proceed, please come in — the water is warm!
What This Book Is About
Our book is titled Dive into Systems and is meant to be a gentle introduction to topics in computer systems, including C programming, architecture fundamentals, assembly language, and multithreading. The ocean metaphor is very fitting for computer systems. As modern life is thought to have risen from the depths of the primordial ocean, so has modern programming risen from the design and construction of early computer architecture. The first programmers studied the hardware diagrams of the first computers to create the first programs.
Yet as life (and computing) began to wander away from the oceans from which they emerged, the ocean began to be perceived as a foreboding and dangerous place, inhabited by monsters. Ancient navigators used to place pictures of sea monsters and other mythical creatures in the uncharted waters. Here be dragons, the text would warn. Likewise, as computing has wandered ever further away from its machine-level origins, computer systems topics have often emerged as personal dragons for many computing students.
In writing this book, we hope to encourage students to take a gentle dive into computer systems topics. Even though the sea may look like a dark and dangerous place from above, there is a beautiful and remarkable world to be discovered for those who choose to peer just below the surface. So too can a student gain a greater appreciation for computing by looking below the code and examining the architectural reef below.
We are not trying to throw you into the open ocean here. Our book assumes only a CS1 knowledge and is designed to be a first exposure to many computer systems topics. We cover topics such as C programming, logic gates, binary, assembly, the memory hierarchy, threading, and parallelism. Our chapters are written to be as independent as possible, with the goal of being widely applicable to a broad range of courses.
Lastly, a major goal for us writing this book is for it to be freely available. We want our book to be a living document, peer reviewed by the computing community, and evolving as our field continues to evolve. If you have feedback for us, please drop us a line. We would love to hear from you!
Ways to Use This Book
Our textbook covers a broad range of topics related to computer systems, specifically targeting intermediate-level courses such as introduction to computer systems or computer organization. It can also be used to provide background reading for upper-level courses such as operating systems, compilers, parallel and distributed computing, and computer architecture.
This textbook is not designed to provide complete coverage of all systems topics. It does not include advanced or full coverage of operating systems, computer architecture, or parallel and distributed computing topics, nor is it designed to be used in place of textbooks devoted to advanced coverage of these topics in upper-level courses. Instead, it focuses on introducing computer systems, common themes in systems in the context of understanding how a computer runs a program, and how to design programs to run efficiently on systems. The topic coverage provides a common knowledge base and skill set for more advanced study in systems topics.
Our book’s topics can be viewed as a vertical slice through a computer. At the lowest layer we discuss binary representation of programs and circuits designed to store and execute programs, building up a simple CPU from basic gates that can execute program instructions. At the next layer we introduce the operating system, focusing on its support for running programs and for managing computer hardware, particularly on the mechanisms of implementing multiprogramming and virtual memory support. At the highest layer, we present the C programming language and how it maps to low-level code, how to design efficient code, compiler optimizations, and parallel computing. A reader of the entire book will gain a basic understanding of how a program written in C (and Pthreads) executes on a computer and, based on this understanding, will know some ways in which they can change the structure of their program to improve its performance.
Although as a whole the book provides a vertical slice through the computer, the book chapters are written as independently as possible so that an instructor can mix and match chapters for their particular needs. The chapter dependency graph is shown below, though individual sections within chapters may not have as deep a dependency hierarchy as the entire chapter.
Summary of Chapter Topics
-
Chapter 0, Introduction: Introduction to computer systems and some tips for reading this book.
-
Chapter 1, Introduction to C Programming: Covers C programming basics, including compiling and running C programs. We assume readers of this book have had an introduction to programming in some programming language. We compare example C syntax to Python syntax so that readers familiar with Python can see how they may translate. However, Python programming experience is not necessary for reading or understanding this chapter.
-
Chapter 2, A Deeper Dive into C: Covers most of the C language, notably pointers and dynamic memory. We also elaborate on topics from Chapter 1 in more detail and discuss some advanced C features.
-
Chapter 3, C Debugging Tools: Covers common C debugging tools (GDB and Valgrind) and illustrates how they can be used to debug a variety of applications.
-
Chapter 4, Binary and Data Representation: Covers encoding data into binary, binary representation of C types, arithmetic operations on binary data, and arithmetic overflow.
-
Chapter 5, Gates, Circuits, and Computer Architecture: Covers the von Neumann architecture from logic gates to the construction of a basic CPU. We characterize clock-driven execution and the stages of instruction execution through arithmetic, storage, and control circuits. We also briefly introduce pipelining, some modern architecture features, and a short history of computer architecture.
-
Chapters 6-10, Assembly Programming: Covers translating C into assembly code from basic arithmetic expressions to functions, the stack, and array and struct access. In three separate chapters we cover assembly from three different instruction set architectures: 32-bit x86, 64-bit x86, and 64-bit ARM.
-
Chapter 11, Storage and the Memory Hierarchy: Covers storage devices, the memory hierarchy and its effects on program performance, locality, caching, and the Cachegrind profiling tool.
-
Chapter 12, Code Optimization: Covers compiler optimizations, designing programs with performance in mind, tips for code optimization, and quantitatively measuring a program’s performance.
-
Chapter 13, Operating Systems: Covers core operating system abstractions and the mechanisms behind them. We primarily focus on processes, virtual memory, and interprocess communication (IPC).
-
Chapter 14, Shared Memory Parallelism: Covers multicore processors, threads and Pthreads programming, synchronization, race conditions, and deadlock. This chapter includes some advanced topics on measuring parallel performance (speed-up, efficiency, Amdahl’s law), thread safety, and cache coherence.
-
Chapter 15, Advanced Parallel Systems and Programming Models: Introduces the basics of distributed memory systems and the Message Passing Interface (MPI), hardware accelerators and CUDA, and cloud computing and MapReduce.
Example Uses of This Book
Dive into Systems can be used as a primary textbook for courses that introduce computer systems topics, or individual chapters can be used to provide background information in courses that cover topics in more depth.
As examples from the authors' two institutions, we have been using it as the primary textbook for two different intermediate-level courses:
-
Introduction To Computer Systems at Swarthmore College. Chapter ordering: 4, 1 (some 3), 5, 6, 7, 10, 2 (more 3), 11, 13, 14.
-
Computer Organization at West Point. Chapter ordering: 1, 4, 2 (some 3), 6, 7, 10, 11, 12, 13, 14, 15.
Additionally, we use individual chapters as background reading in many of our upper-level courses, including:
Upper-level Course Topic | Chapters for Background Readings |
---|---|
Architecture | 5, 11 |
Compilers | 6, 7, 8, 9, 10, 11, 12 |
Database Systems | 11, 14, 15 |
Networking | 4, 13, 14 |
Operating Systems | 11, 13, 14 |
Parallel and Distributed Systems | 11, 13, 14, 15 |
Finally, Chapters 2 and 3 are used as C programming and debugging references in many of our courses.
Available Online
The free online version of our textbook is available at https://diveintosystems.org/.
0. Introduction
Dive into the fabulous world of computer systems! Understanding what a computer system is and how it runs your programs can help you to design code that runs efficiently and that can make the best use of the power of the underlying system. In this book, we take you on a journey through computer systems. You will learn how your program written in a high-level programming language (we use C) executes on a computer. You will learn how program instructions translate into binary and how circuits execute their binary encoding. You will learn how an operating system manages programs running on the system. You will learn how to write programs that can make use of multicore computers. Throughout, you will learn how to evaluate the systems costs associated with program code and how to design programs to run efficiently.
What Is a Computer System?
A computer system combines the computer hardware and special system software that together make the computer usable by users and programs. Specifically, a computer system has the following components (see Figure 1):
-
Input/output (I/O) ports enable the computer to take information from its environment and display it back to the user in some meaningful way.
-
A central processing unit (CPU) runs instructions and computes data and memory addresses.
-
Random access memory (RAM) stores the data and instructions of running programs. The data and instructions in RAM are typically lost when the computer system loses power.
-
Secondary storage devices like hard disks store programs and data even when power is not actively being provided to the computer.
-
An operating system (OS) software layer lies between the hardware of the computer and the software that a user runs on the computer. The OS implements programming abstractions and interfaces that enable users to easily run and interact with programs on the system. It also manages the underlying hardware resources and controls how and when programs execute. The OS implements abstractions, policies, and mechanisms to ensure that multiple programs can simultaneously run on the system in an efficient, protected, and seamless manner.
The first four of these define the computer hardware component of a computer system. The last item (the operating system) represents the main software part of the computer system. There may be additional software layers on top of an OS that provide other interfaces to users of the system (e.g., libraries). However, the OS is the core system software that we focus on in this book.
We focus specifically on computer systems that have the following qualities:
-
They are general purpose, meaning that their function is not tailored to any specific application.
-
They are reprogrammable, meaning that they support running a different program without modifying the computer hardware or system software.
To this end, many devices that may "compute" in some form do not fall into the category of a computer system. Calculators, for example, typically have a processor, limited amounts of memory, and I/O capability. However, calculators typically do not have an operating system (advanced graphing calculators like the TI-89 are a notable exception to this rule), do not have secondary storage, and are not general purpose.
Another example that bears mentioning is the microcontroller, a type of integrated circuit that has many of the same capabilities as a computer. Microcontrollers are often embedded in other devices (such as toys, medical devices, cars, and appliances), where they control a specific automatic function. Although microcontrollers are general purpose and reprogrammable, contain a processor, internal memory, and secondary storage, and are I/O capable, they lack an operating system. A microcontroller is designed to boot and run a single specific program until it loses power. For this reason, a microcontroller does not fit our definition of a computer system.
What Do Modern Computer Systems Look Like?
Now that we have established what a computer system is (and isn’t), let’s discuss what computer systems typically look like. Figure 2 depicts two types of computer hardware systems (excluding peripherals): a desktop computer (left) and a laptop computer (right). A U.S. quarter on each device gives the reader an idea of the size of each unit.
Notice that both contain the same hardware components, though some of the components may have a smaller form factor or be more compact. The DVD/CD bay of the desktop was moved to the side to show the hard drive underneath — the two units are stacked on top of each other. A dedicated power supply helps provide the desktop power.
In contrast, the laptop is flatter and more compact (note that the quarter in this picture appears a bit bigger). The laptop has a battery and its components tend to be smaller. In both the desktop and the laptop, the CPU is obscured by a heavyweight CPU fan, which helps keep the CPU at a reasonable operating temperature. If the components overheat, they can become permanently damaged. Both units have dual inline memory modules (DIMM) for their RAM units. Notice that laptop memory modules are significantly smaller than desktop modules.
In terms of weight and power consumption, desktop computers typically consume 100 - 400 W of power and typically weigh anywhere from 5 to 20 pounds. A laptop typically consumes 50 - 100 W of power and uses an external charger to supplement the battery as needed.
The trend in computer hardware design is toward smaller and more compact devices. Figure 3 depicts a Raspberry Pi single-board computer. A single-board computer (SBC) is a device in which the entirety of the computer is printed on a single circuit board.
The Raspberry Pi SBC contains a system-on-a-chip (SoC) processor with integrated RAM and CPU, which encompasses much of the laptop and desktop hardware shown in Figure 2. Unlike laptop and desktop systems, the Raspberry Pi is roughly the size of a credit card, weighs 1.5 ounces (about the weight of a slice of bread), and consumes about 5 W of power. The SoC technology found on the Raspberry Pi is also commonly found in smartphones. In fact, the smartphone is another example of a computer system!
Lastly, all of the aforementioned computer systems (Raspberry Pi and smartphones included) have multicore processors. In other words, their CPUs are capable of executing multiple programs simultaneously. We refer to this simultaneous execution as parallel execution. Basic multicore programming is covered in Chapter 14 of this book.
All of these different types of computer hardware systems can run one or more general-purpose operating systems, such as macOS, Windows, or Unix. A general-purpose operating system manages the underlying computer hardware and provides an interface for users to run any program on the computer. Together, these different types of computer hardware running different general-purpose operating systems make up a computer system.
What You Will Learn In This Book
By the end of this book, you will know the following:
How a computer runs a program: You will be able to describe, in detail, how a program expressed in a high-level programming language gets executed by the low-level circuitry of the computer hardware. Specifically, you will know:
-
how program data gets encoded into binary and how the hardware performs arithmetic on it
-
how a compiler translates C programs into assembly and binary machine code (assembly is the human-readable form of binary machine code)
-
how a CPU executes binary instructions on binary program data, from basic logic gates to complex circuits that store values, perform arithmetic, and control program execution
-
how the OS implements the interface for users to run programs on the system and how it controls program execution on the system while managing the system’s resources.
How to evaluate systems costs associated with a program’s performance: A program runs slowly for a number of reasons. It could be a bad algorithm choice or simply bad choices on how your program uses system resources. You will understand the memory hierarchy and its effects on program performance, and the operating systems costs associated with program performance. You will also learn some valuable tips for code optimization. Ultimately, you will be able to design programs that use system resources efficiently, and you will know how to evaluate the systems costs associated with program execution.
How to leverage the power of parallel computers with parallel programming: Taking advantage of parallel computing is important in today’s multicore world. You will learn to exploit the multiple cores on your CPU to make your program run faster. You will know the basics of multicore hardware, the OS’s thread abstraction, and issues related to multithreaded parallel program execution. You will have experience with parallel program design and writing multithreaded parallel programs using the POSIX thread library (Pthreads). You will also have an introduction to other types of parallel systems and parallel programming models.
Along the way, you will also learn many other important details about computer systems, including how they are designed and how they work. You will learn important themes in systems design and techniques for evaluating the performance of systems and programs. You’ll also master important skills, including C and assembly programming and debugging.
Getting Started with This Book
A few notes about languages, book notation, and recommendations for getting started reading this book:
Linux, C, and the GNU Compiler
We use the C programming language in examples throughout the book. C is a high-level programming language like Java and Python, but it is less abstracted from the underlying computer system than many other high-level languages. As a result, C is the language of choice for programmers who want more control over how their program executes on the computer system.
The code and examples in this book are compiled using the GNU C Compiler (GCC) and run on the Linux operating system. Although not the most common mainstream OS, Linux is the dominant OS on supercomputing systems and is arguably the most commonly used OS by computer scientists.
Linux is also free and open source, which contributes to its popular use in these settings. A working knowledge of Linux is an asset to all students in computing. Similarly, GCC is arguably the most common C compiler in use today. As a result, we use Linux and GCC in our examples. However, other Unix systems and compilers have similar interfaces and functionality.
In this book, we encourage you to type along with the listed examples. Linux commands appear in blocks like the following:
$
The $ represents the command prompt. If you see a box that looks like
$ uname -a
this is an indication to type uname -a on the command line. Make sure that you don’t type the $ sign!
The output of a command is usually shown directly after the command in a command line listing.
As an example, try typing in uname -a. The output of this command varies from system to system. Sample output for a 64-bit system is shown here.
$ uname -a
Linux Fawkes 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The uname command prints out information about a particular system. The -a flag prints out all relevant information associated with the system in the following order:
-
The kernel name of the system (in this case Linux)
-
The hostname of the machine (e.g., Fawkes)
-
The kernel release (e.g., 4.4.0-171-generic)
-
The kernel version (e.g., #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019)
-
The machine hardware (e.g., x86_64)
-
The type of processor (e.g., x86_64)
-
The hardware platform (e.g., x86_64)
-
The operating system name (e.g., GNU/Linux)
You can learn more about the uname command or any other Linux command by prefacing the command with man, as shown here:
$ man uname
This command brings up the manual page associated with the uname command. To quit out of this interface, press the q key.
While detailed coverage of Linux is beyond the scope of this book, readers can get a good introduction in the online Appendix 2 - Using UNIX. There are also several online resources that can give readers a good overview. One recommendation is "The Linux Command Line" [1].
Other Types of Notation and Callouts
Aside from the command line and code snippets, we use several other types of "callouts" to represent content in this book.
The first is the aside. Asides are meant to provide additional context to the text, usually historical. Here’s a sample aside:
The second type of callout we use in this text is the note. Notes are used to highlight important information, such as the use of certain types of notation or suggestions on how to digest certain information. A sample note is shown below:
How to do the readings in this book
As a student, it is important to do the readings in the textbook. Notice that we say "do" the readings, not simply "read" the readings. To "read" a text typically implies passively imbibing words off a page. We encourage students to take a more active approach. If you see a code example, try typing it in! It’s OK if you type in something wrong, or get errors; that’s the best way to learn! In computing, errors are not failures; they are simply experience.
The last type of callout that students should pay specific attention to is the warning. The authors use warnings to highlight things that are common "gotchas" or a common cause of consternation among our own students. Although all warnings may not be equally valuable to all students, we recommend that you review warnings to avoid common pitfalls whenever possible. A sample warning is shown here:
This book contains puns
The authors (especially the first author) are fond of puns and musical parodies related to computing (and not necessarily good ones). Adverse reactions to the authors' sense of humor may include (but are not limited to) eye-rolling, exasperated sighs, and forehead slapping.
If you are ready to get started, please continue on to the first chapter as we dive into the wonderful world of C. If you already know some C programming, you may want to start with Chapter 4 on binary representation, or continue with more advanced C programming in Chapter 2.
We hope you enjoy your journey with us!
References
[1] William Shotts. "The Linux Command Line", LinuxCommand.org, https://linuxcommand.org/
1. By the C, by the C, by the Beautiful C
"By the Beautiful Sea", Carroll and Atteridge, 1914
This chapter presents an overview of C programming written for students who have some experience programming in another language. It’s specifically written for Python programmers and uses a few Python examples for comparison purposes (Appendix 1 is a version of Chapter 1 for Java programmers). However, it should be useful as an introduction to C programming for anyone with basic programming experience in any language.
C is a high-level programming language like other languages you might know, such as Python, Java, Ruby, or C++. It’s an imperative and a procedural programming language, which means that a C program is expressed as a sequence of statements (steps) for the computer to execute and that C programs are structured as a set of functions (procedures). Every C program must have at least one function, the `main` function, which contains the set of statements that execute when the program begins.
The C programming language is less abstracted from the computer’s machine language than some other languages with which you might be familiar. This means that C doesn’t have support for object-oriented programming (like Python, Java, and C++) or have a rich set of high-level programming abstractions (such as strings, lists, and dictionaries in Python). As a result, if you want to use a dictionary data structure in your C program, you need to implement it yourself, as opposed to just importing the one that is part of the programming language (as in Python).
C’s lack of high-level abstractions might make it seem like a less appealing programming language to use. However, being less abstracted from the underlying machine makes it easier for a C programmer to see and understand the relationship between a program’s code and the computer’s execution of it. C programmers retain more control over how their programs execute on the hardware, and they can write code that runs more efficiently than equivalent code written using the higher-level abstractions provided by other programming languages. In particular, they have more control over how their programs manage memory, which can have a significant impact on performance. Thus, C remains the de facto language for computer systems programming where low-level control and efficiency are crucial.
We use C in this book because of its expressiveness of program control and its relatively straightforward translation to assembly and machine code that a computer executes. This chapter introduces programming in C, beginning with an overview of its features. Chapter 2 then describes C’s features in more detail.
1.1. Getting Started Programming in C
Let’s start by looking at a "hello world" program that includes an example of calling a function from the math library. In Table 1 we compare the C version of this program to the Python version. The C version might be put in a file named `hello.c` (`.c` is the suffix convention for C source code files), whereas the Python version might be in a file named `hello.py`.
Python version (hello.py) | C version (hello.c) |
---|---|
|
|
Notice that both versions of this program have similar structure and language constructs, albeit with different language syntax. In particular:
Comments:
- In Python, multiline comments begin and end with `'''`, and single-line comments begin with `#`.
- In C, multiline comments begin with `/*` and end with `*/`, and single-line comments begin with `//`.
Importing library code:
- In Python, libraries are included (imported) using `import`.
- In C, libraries are included (imported) using `#include`. All `#include` statements appear at the top of the program, outside of function bodies.
Blocks:
- In Python, indentation denotes a block.
- In C, blocks (for example, function, loop, and conditional bodies) start with `{` and end with `}`.
The main function:
- In Python, `def main():` defines the main function.
- In C, `int main(void){ }` defines the main function. The `main` function returns a value of type `int`, which is C’s name for specifying the signed integer type (signed integers are values like -3, 0, 1234). The `main` function returns the `int` value 0 to signify running to completion without error. The `void` means it doesn’t expect to receive a parameter. Future sections show how `main` can take parameters to receive command line arguments.
Statements:
- In Python, each statement is on a separate line.
- In C, each statement ends with a semicolon `;`. In C, statements must be within the body of some function (in `main` in this example).
Output:
- In Python, the `print` function prints a formatted string. Values for the placeholders in the format string follow a `%` symbol in a comma-separated list of values (for example, the value of `sqrt(4)` will be printed in place of the `%f` placeholder in the format string).
- In C, the `printf` function prints a formatted string. Values for the placeholders in the format string are additional arguments separated by commas (for example, the value of `sqrt(4)` will be printed in place of the `%f` placeholder in the format string).
There are a few important differences to note in the C and Python versions of this program:
Indentation: In C, indentation doesn’t have meaning, but it’s good programming style to indent statements based on the nested level of their containing block.
Output: C’s `printf` function doesn’t automatically print a newline character at the end like Python’s `print` function does. As a result, C programmers need to explicitly specify a newline character (`\n`) in the format string when a newline is desired in the output.
The `main` function:
- A C program must have a function named `main`, and its return type must be `int`. This means that the `main` function returns a signed integer type value. Python programs don’t need to name their main function `main`, but they often do by convention.
- The C `main` function has an explicit `return` statement to return an `int` value (by convention, `main` should return `0` if the main function is successfully executed without errors).
- A Python program needs to include an explicit call to its `main` function to run it when the program executes. In C, the `main` function is automatically called when the C program executes.
1.1.1. Compiling and Running C Programs
Python is an interpreted programming language, which means that another program, the Python interpreter, runs Python programs: the Python interpreter acts like a virtual machine on which Python programs are run. To run a Python program, the program source code (`hello.py`) is given as input to the Python interpreter program that runs it. For example (`$` is the Linux shell prompt):
$ python hello.py
The Python interpreter is a program that is in a form that can be run directly on the underlying system (this form is called binary executable) and takes as input the Python program that it runs (Figure 4).
To run a C program, it must first be translated into a form that a computer system can directly execute. A C compiler is a program that translates C source code into a binary executable form that the computer hardware can directly execute. A binary executable consists of a series of 0’s and 1’s in a well-defined format that a computer can run.
For example, to run the C program `hello.c` on a Unix system, the C code must first be compiled by a C compiler (for example, the GNU C compiler, GCC) that produces a binary executable (by default named `a.out`). The binary executable version of the program can then be run directly on the system (Figure 5):
$ gcc hello.c
$ ./a.out
(Note that some C compilers might need to be explicitly told to link in the math library with `-lm`):
$ gcc hello.c -lm
Detailed Steps
In general, the following sequence describes the necessary steps for editing, compiling, and running a C program on a Unix system:
1. Using a text editor (for example, `vim`), write and save your C source code program in a file (e.g., `hello.c`):
$ vim hello.c
2. Compile the source to an executable form, and then run it. The most basic syntax for compiling with `gcc` is:
$ gcc <input_source_file>
If compilation yields no errors, the compiler creates a binary executable file named `a.out`. The compiler also allows you to specify the name of the binary executable file to generate using the `-o` flag:
$ gcc -o <output_executable_file> <input_source_file>
For example, this command instructs `gcc` to compile `hello.c` into an executable file named `hello`:
$ gcc -o hello hello.c
We can invoke the executable program using `./hello`:
$ ./hello
After any change to the C source code (the `hello.c` file), the program must be recompiled with `gcc` to produce a new version of `hello`. If the compiler detects any errors during compilation, the `./hello` file won’t be created/re-created (but beware, an older version of the file from a previous successful compilation might still exist).
Often when compiling with `gcc`, you want to include several command line options. For example, these options enable more compiler warnings and build a binary executable with extra debugging information:
$ gcc -Wall -g -o hello hello.c
Because the `gcc` command line can be long, frequently the `make` utility is used to simplify compiling C programs and for cleaning up files created by `gcc`. Using `make` and writing Makefiles are important skills that you will develop as you build up experience with C programming.
We cover compiling and linking with C library code in more detail at the end of Chapter 2.
1.1.2. Variables and C Numeric Types
Like Python, C uses variables as named storage locations for holding data. Thinking about the scope and type of program variables is important to understand the semantics of what your program will do when you run it. A variable’s scope defines when the variable has meaning (that is, where and when in your program it can be used) and its lifetime (that is, it could persist for the entire run of a program or only during a function activation). A variable’s type defines the range of values that it can represent and how those values will be interpreted when performing operations on its data.
In C, all variables must be declared before they can be used. To declare a variable, use the following syntax:
type_name variable_name;
A variable can have only a single type. The basic C types include `char`, `int`, `float`, and `double`. By convention, C variables should be declared at the beginning of their scope (at the top of a `{ }` block), before any C statements in that scope.
Below is an example C code snippet that shows declarations and uses of variables of some different types. We discuss types and operators in more detail after the example.
{
    /* 1. Define variables in this block's scope at the top of the block. */
    int x;         // declares x to be an int type variable and allocates space for it
    int i, j, k;   // can define multiple variables of the same type like this
    char letter;   // a char stores a single-byte integer value
                   // it is often used to store a single ASCII character
                   // value (the ASCII numeric encoding of a character)
                   // a char in C is a different type than a string in C
    float winpct;  // winpct is declared to be a float type
    double pi;     // the double type is more precise than float
    /* 2. After defining all variables, you can use them in C statements. */
    x = 7;                // x stores 7 (initialize variables before using their value)
    k = x + 2;            // use x's value in an expression
    letter = 'A';         // a single quote is used for single character value
    letter = letter + 1;  // letter stores 'B' (ASCII value one more than 'A')
    pi = 3.1415926;
    winpct = 11 / 2.0;    // winpct gets 5.5, winpct is a float type
    j = 11 / 2;           // j gets 5: int division truncates after the decimal
    x = k % 2;            // % is C's mod operator, so x gets 9 mod 2 (1)
}
Note the semicolons galore. Recall that C statements are delineated by `;`, not line breaks: C expects a semicolon after every statement. You’ll forget some, and `gcc` does not always tell you plainly where you missed a semicolon, even though that might be the only syntax error in your program. In fact, often when you forget a semicolon, the compiler indicates a syntax error on the line after the one with the missing semicolon: the reason is that `gcc` interprets it as part of the statement from the previous line. As you continue to program in C, you’ll learn to correlate `gcc` errors with the specific C syntax mistakes that they describe.
1.1.3. C Types
C supports a small set of built-in data types, and it provides a few ways in which programmers can construct basic collections of types (arrays and structs). From these basic building blocks, a C programmer can build complex data structures.
C defines a set of basic types for storing numeric values. Here are some examples of numeric literal values of different C types:
8 // the int value 8
3.4 // the double value 3.4
'h' // the char value 'h' (its value is 104, the ASCII value of h)
The C `char` type stores a numeric value. However, it’s often used by programmers to store the value of an ASCII character. A character literal value is specified in C as a single character between single quotes.
C doesn’t support a string type, but programmers can create strings from the `char` type and C’s support for constructing arrays of values, which we discuss in later sections. C does, however, support a way of expressing string literal values in programs: a string literal is any sequence of characters between double quotes. C programmers often pass string literals as the format string argument to `printf`:
printf("this is a C string\n");
Python supports strings, but it doesn’t have a `char` type. In C, a string and a `char` are two very different types, and they evaluate differently. This difference is illustrated by contrasting a C string literal that contains one character with a C `char` literal. For example:
'h' // this is a char literal value (its value is 104, the ASCII value of h)
"h" // this is a string literal value (its value is NOT 104, it is not a char)
We discuss C strings and `char` variables in more detail in the Strings section later in this chapter. Here, we’ll mainly focus on C’s numeric types.
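To make the contrast between a `char` and a one-character string concrete, here is a minimal sketch. The helper names (`char_code`, `one_char_string_size`, `show_char_vs_string`) are hypothetical and chosen only for illustration. It relies on two facts: a `char` value is a small integer, and a string literal carries a hidden terminating byte, so even a one-character string occupies two bytes. The value 104 assumes an ASCII system.

```c
#include <stdio.h>

/* Hypothetical helper: returns the numeric (ASCII) value stored in a char. */
int char_code(char c) {
    return c;   // a char is a small integer, so it converts directly to int
}

/* Hypothetical helper: returns the number of bytes in the string literal "h". */
int one_char_string_size(void) {
    return (int) sizeof("h");   // the character 'h' plus a terminating '\0'
}

/* Prints both values so they can be compared side by side. */
void show_char_vs_string(void) {
    printf("'h' as a number: %d\n", char_code('h'));
    printf("bytes in \"h\": %d\n", one_char_string_size());
}
```

Calling `show_char_vs_string()` from `main` prints the two values; the string is the larger of the two even though both "contain" only the letter h.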
C Numeric Types
C supports several different types for storing numeric values. The types differ in the format of the numeric values they represent. For example, the `float` and `double` types can represent real values, `int` represents signed integer values, and `unsigned int` represents unsigned integer values. Real values are positive or negative values with a decimal point, such as `-1.23` or `0.0056`. Signed integers store positive, negative, or zero integer values, such as `-333`, `0`, or `3456`. Unsigned integers store strictly nonnegative integer values, such as `0` or `1234`.
C’s numeric types also differ in the range and precision of the values they can represent. The range or precision of a value depends on the number of bytes associated with its type. Types with more bytes can represent a larger range of values (for integer types), or higher-precision values (for real types), than types with fewer bytes.
Table 2 shows the number of storage bytes, the kind of numeric values stored, and how to declare a variable for a variety of common C numeric types (note that these are typical sizes — the exact number of bytes depends on the hardware architecture).
Type name | Usual size | Values stored | How to declare |
---|---|---|---|
char | 1 byte | integers | char x; |
short | 2 bytes | signed integers | short x; |
int | 4 bytes | signed integers | int x; |
long | 4 or 8 bytes | signed integers | long x; |
long long | 8 bytes | signed integers | long long x; |
float | 4 bytes | signed real numbers | float x; |
double | 8 bytes | signed real numbers | double x; |
C also provides unsigned versions of the integer numeric types (`char`, `short`, `int`, `long`, and `long long`). To declare a variable as unsigned, add the keyword `unsigned` before the type name. For example:
int x; // x is a signed int variable
unsigned int y; // y is an unsigned int variable
The C standard doesn’t specify whether the `char` type is signed or unsigned. As a result, some implementations might implement `char` as signed integer values and others as unsigned. It’s good programming practice to explicitly declare `unsigned char` if you want to use the unsigned version of a `char` variable.
The exact number of bytes for each of the C types might vary from one architecture to the next. The sizes in Table 2 are minimum (and common) sizes for each type. You can print the exact size on a given machine using C’s `sizeof` operator, which takes the name of a type as an argument and evaluates to the number of bytes used to store that type. For example:
printf("number of bytes in an int: %lu\n", sizeof(int));
printf("number of bytes in a short: %lu\n", sizeof(short));
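The `sizeof` operator evaluates to an unsigned integer value (its type, `size_t`, is typically `unsigned long` on 64-bit Linux systems), so in the call to `printf`, use the placeholder `%lu` to print its value (`%zu` is the portable placeholder for `size_t`). On most architectures the output of these statements will be:
number of bytes in an int: 4
number of bytes in a short: 2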
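As a further illustration, the following sketch reports both the size and the range of `int` and `unsigned int` on whatever machine it runs on. The function name `report_int_sizes` is hypothetical. The constants `INT_MIN`, `INT_MAX`, and `UINT_MAX` come from the standard header `limits.h`, and the sketch uses the `%zu` placeholder for `sizeof` results. The printed numbers depend on the machine, so no particular output is assumed.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: prints the size and range of int and unsigned int
 * on the machine where it runs, and returns sizeof(int). */
size_t report_int_sizes(void) {
    printf("int: %zu bytes, range %d to %d\n",
           sizeof(int), INT_MIN, INT_MAX);
    printf("unsigned int: %zu bytes, range 0 to %u\n",
           sizeof(unsigned int), UINT_MAX);
    return sizeof(int);
}
```

The signed and unsigned versions of a type use the same number of bytes; they differ only in how the stored bits are interpreted, which is why the unsigned range starts at 0 and extends higher.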
Arithmetic Operators
Arithmetic operators combine values of numeric types. The resulting type of the operation is based on the types of the operands. For example, if two `int` values are combined with an arithmetic operator, the resulting type is also an integer.
C performs automatic type conversion when an operator combines operands of two different types. For example, if an `int` operand is combined with a `float` operand, the integer operand is first converted to its floating-point equivalent before the operator is applied, and the type of the operation’s result is `float`.
The following arithmetic operators can be used on most numeric type operands:
- add (`+`) and subtract (`-`)
- multiply (`*`), divide (`/`), and mod (`%`):
  The mod operator (`%`) can only take integer-type operands (`int`, `unsigned int`, `short`, and so on).
  If both operands are `int` types, the divide operator (`/`) performs integer division (the resulting value is an `int`, truncating anything beyond the decimal point from the division operation). For example, `8/3` evaluates to `2`.
  If one or both of the operands are `float` (or `double`), `/` performs real division and evaluates to a `float` (or `double`) result. For example, `8 / 3.0` evaluates to approximately `2.666667`.
- assignment (`=`):
  variable = value of expression;  // e.g., x = 3 + 4;
- assignment with update (`+=`, `-=`, `*=`, `/=`, and `%=`):
  variable op= expression;  // e.g., x += 3; is shorthand for x = x + 3;
- increment (`++`) and decrement (`--`):
  variable++;  // e.g., x++; assigns to x the value of x + 1
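The difference between integer and real division is easy to check with a short sketch. The function names below are hypothetical. The cast `(double)` is one way to force a conversion when both values are `int` variables; writing a literal such as `3.0` has the same effect when a constant is involved.

```c
#include <stdio.h>

/* Integer division: both operands are int, so the fraction is discarded. */
int int_divide(int a, int b) {
    return a / b;
}

/* Real division: converting one operand to double makes / produce a double. */
double real_divide(int a, int b) {
    return (double) a / b;
}

/* The mod operator gives the remainder of an integer division. */
int remainder_of(int a, int b) {
    return a % b;
}

/* Prints the three results for the operands 8 and 3. */
void show_division(void) {
    printf("8 / 3   = %d\n", int_divide(8, 3));
    printf("8 / 3.0 = %f\n", real_divide(8, 3));
    printf("8 %% 3   = %d\n", remainder_of(8, 3));   // %% prints a literal %
}
```

Here `int_divide(8, 3)` evaluates to 2, `remainder_of(8, 3)` evaluates to 2, and `real_divide(8, 3)` evaluates to approximately 2.666667.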
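Pre- vs. Post-increment
The operators `variable++` (post-increment) and `++variable` (pre-increment) are both valid, but they’re evaluated slightly differently: `x++` uses the current value of `x` in the enclosing expression and then increments `x`, whereas `++x` increments `x` first and then uses its new value.
In many cases, it doesn’t matter which you use because the value of the incremented or decremented variable isn’t being used in the statement. For example, the statements `x++;` and `++x;` are equivalent (although the first is the most commonly used syntax for this statement).
In some cases, the context affects the outcome (when the value of the incremented or decremented variable is being used in the statement). For example, if `x` is 6, then `y = ++x + 2;` assigns 9 to `y`, whereas `y = x++ + 2;` assigns 8 to `y`; either way, `x` ends up as 7.
Code like the preceding example that uses an arithmetic expression with an increment operator is often hard to read, and it’s easy to get wrong. As a result, it’s generally best to avoid writing code like this; instead, write separate statements for exactly the order you want. For example, if you want to first increment `x` and then assign `x + 1` to `y`, instead of writing `y = ++x + 1;`, write it as two separate statements: `x++;` followed by `y = x + 1;`.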
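A small sketch can make the difference between the pre- and post-increment forms concrete. The function names are hypothetical; each function receives its own copy of `x`, so the three versions can be compared on the same starting value.

```c
/* Pre-increment: x is incremented first, then its new value is used. */
int pre_increment_sum(int x) {
    int y;
    y = ++x + 2;
    return y;
}

/* Post-increment: x's current value is used first, then x is incremented. */
int post_increment_sum(int x) {
    int y;
    y = x++ + 2;
    return y;
}

/* The clearer style: increment in its own statement, then use the value. */
int separate_statements_sum(int x) {
    int y;
    x++;
    y = x + 2;
    return y;
}
```

With a starting value of 6, `pre_increment_sum(6)` and `separate_statements_sum(6)` both evaluate to 9, whereas `post_increment_sum(6)` evaluates to 8. The third version produces the same result as the first but states the order of operations explicitly.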
1.2. Input/Output (printf and scanf)
C’s `printf` function prints values to the terminal, and the `scanf` function reads in values entered by a user. The `printf` and `scanf` functions belong to C’s standard I/O library, which needs to be explicitly included at the top of any `.c` file that uses these functions by using `#include <stdio.h>`. In this section, we introduce the basics of using `printf` and `scanf` in C programs. The "I/O" section in Chapter 2 discusses C’s input and output functions in more detail.
1.2.1. printf
C’s `printf` function is very similar to formatted print in Python, where the caller specifies a format string to print. The format string often contains formatting specifiers, such as special characters that will print tabs (`\t`) or newlines (`\n`), or placeholders for values in the output. Placeholders consist of `%` followed by a type specifier letter (for example, `%d` represents a placeholder for an integer value). For each placeholder in the format string, `printf` expects an additional argument. Table 3 contains an example program in Python and C with formatted output:
Python version | C version |
---|---|
|
|
When run, both versions of this program produce identically formatted output:
Name: Vijay, Info:
    Age: 20     Ht: 5.9
    Year: 3     Dorm: Alice Paul
The main difference between C’s `printf` and Python’s `print` functions is that the Python version implicitly prints a newline character at the end of the output string, but the C version does not. As a result, the C format strings in this example have newline (`\n`) characters at the end to explicitly print a newline character. The syntax for listing the argument values for the placeholders in the format string is also slightly different in C’s `printf` and Python’s `print` functions.
C uses the same formatting placeholders as Python for specifying different types of values. The preceding example demonstrates the following formatting placeholders:
%g: placeholder for a float (or double) value
%d: placeholder for a decimal value (int, short, char)
%s: placeholder for a string value
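The three placeholders above can be combined in a single format string, as in the following sketch. The function `print_student` is hypothetical, and its `const char *` parameter is simply the type C uses for a string argument (strings are covered later in this chapter). The sketch also returns `printf`'s own return value, which is the number of characters printed.

```c
#include <stdio.h>

/* Hypothetical helper: prints one line using the %s, %d, and %g placeholders
 * and returns printf's return value (the number of characters printed). */
int print_student(const char *name, int age, double height) {
    return printf("Name: %s, Age: %d, Ht: %g\n", name, age, height);
}
```

For example, `print_student("Vijay", 20, 5.9)` prints `Name: Vijay, Age: 20, Ht: 5.9` followed by a newline. Each argument after the format string fills the placeholder in the matching position, so the order of the arguments matters.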
C additionally supports the `%c` placeholder for printing a character value. This placeholder is useful when a programmer wants to print the ASCII character associated with a particular numeric encoding. Here’s a C code snippet that prints a `char` as its numeric value (`%d`) and as its character encoding (`%c`):
// Example printing a char value as its decimal representation (%d)
// and as the ASCII character that its value encodes (%c)
char ch;
ch = 'A';
printf("ch value is %d which is the ASCII value of %c\n", ch, ch);
ch = 99;
printf("ch value is %d which is the ASCII value of %c\n", ch, ch);
When run, the program’s output looks like this:
ch value is 65 which is the ASCII value of A
ch value is 99 which is the ASCII value of c
1.2.2. scanf
C’s `scanf` function represents one method for reading in values entered by the user (via the keyboard) and storing them in program variables. The `scanf` function can be a bit picky about the exact format in which the user enters data, which means that it’s not very robust to badly formed user input. In the "I/O" section in Chapter 2, we discuss more robust ways of reading input values from the user. For now, remember that if your program gets into an infinite loop due to badly formed user input, you can always press CTRL-C to terminate it.
Reading input is handled differently in Python and C: Python uses the `input` function to read in a value as a string, and then the program converts the string value to an `int`, whereas C uses `scanf` to read in an `int` value and to store it at the location in memory of an `int` program variable (for example, `&num1`). Table 4 displays example programs for reading user input values in Python and C:
Python version | C version |
---|---|
|
|
When run, both programs read in two values (here, 30 and 67):
Enter a number: 30
Enter another: 67
30 + 67 = 97
Like `printf`, `scanf` takes a format string that specifies the number and types of values to read in (for example, `"%d"` specifies one `int` value). The `scanf` function skips over leading and trailing whitespace as it reads in a numeric value, so its format string only needs to contain a sequence of formatting placeholders, usually with no whitespace or other formatting characters between the placeholders in its format string. The arguments for the placeholders in the format string specify the locations of program variables into which the values read in will be stored. Prefixing the name of a variable with the `&` operator produces the location of that variable in the program’s memory (the memory address of the variable). The "Pointers" section in Chapter 2 discusses the `&` operator in more detail. For now, we use it only in the context of the `scanf` function.
Here’s another `scanf` example, in which the format string has placeholders for two values, the first an `int` and the second a `float`:
int x;
float pi;
// read in an int value followed by a float value ("%d%g")
// store the int value at the memory location of x (&x)
// store the float value at the memory location of pi (&pi)
scanf("%d%g", &x, &pi);
When inputting data to a program via `scanf`, individual numeric input values must be separated by at least one whitespace character. However, because `scanf` skips over additional leading and trailing whitespace characters (for example, spaces, tabs, and newlines), a user could enter input values with any amount of space before or after each input value. For instance, if a user enters the following for the call to `scanf` in the preceding example, `scanf` will read in 8 and store it in the `x` variable, and then read in 3.14 and store it in the `pi` variable:
8 3.14
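The whitespace behavior can be explored without typing at the keyboard by using `sscanf`, a standard library function that accepts the same format strings as `scanf` but reads from a string. The sketch below is an illustration under that substitution, not the approach the chapter's programs take; the function name `parse_int_and_float` is hypothetical, and its pointer parameters play the same role as `&x` and `&pi` in the preceding example. Like `scanf`, `sscanf` returns the number of values it successfully converted, which is one simple way to detect badly formed input.

```c
#include <stdio.h>

/* Hypothetical helper: parses an int followed by a float from a string.
 * sscanf uses the same format strings as scanf but reads from text
 * instead of the keyboard. Returns the number of values converted. */
int parse_int_and_float(const char *text, int *x, float *pi) {
    return sscanf(text, "%d%g", x, pi);
}
```

Given the text `"  8     3.14"`, the function stores 8 and 3.14 and returns 2, no matter how many spaces separate the values. Given the text `"eight"`, it returns 0 because the first value could not be read as an `int`.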
1.3. Conditionals and Loops
Table 5 shows that the syntax and semantics of `if`-`else` statements in C and Python are very similar. The main syntactic difference is that Python uses indentation to indicate "body" statements, whereas C uses curly braces (but you should still use good indentation in your C code).
Python version | C version |
---|---|
|
|
The Python and C syntax for `if`-`else` statements is almost identical with only minor differences. In both, the `else` part is optional. Python and C also support multiway branching by chaining `if` and `else if` statements (`elif` in Python). The following describes the full `if`-`else` C syntax:
// a one-way branch:
if ( <boolean expression> ) {
<true body>
}
// a two-way branch:
if ( <boolean expression> ) {
<true body>
}
else {
<false body>
}
// a multibranch (chaining if-else if-...-else)
// (has one or more 'else if' following the first if):
if ( <boolean expression 1> ) {
<true body>
}
else if ( <boolean expression 2> ) {
// first expression is false, second is true
<true 2 body>
}
else if ( <boolean expression 3> ) {
// first and second expressions are false, third is true
<true 3 body>
}
// ... more else if's ...
else if ( <boolean expression N> ) {
// first N-1 expressions are false, Nth is true
<true N body>
}
else { // the final else part is optional
// if all previous expressions are false
<false body>
}
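As a concrete instance of the multibranch form, the following sketch maps a numeric score to a letter grade. The function name and the grade cutoffs are hypothetical, chosen only to illustrate the chaining. Because the branches are tested in order, each `else if` can assume that all earlier conditions were false.

```c
/* Hypothetical example: maps a numeric score to a letter grade
 * using a chain of if / else if / else branches. */
char letter_grade(int score) {
    char grade;
    if (score >= 90) {
        grade = 'A';
    }
    else if (score >= 80) {
        // reached only when score < 90
        grade = 'B';
    }
    else if (score >= 70) {
        // reached only when score < 80
        grade = 'C';
    }
    else {
        // all previous expressions are false
        grade = 'F';
    }
    return grade;
}
```

For example, `letter_grade(85)` evaluates to `'B'`: the first condition is false, the second is true, and the remaining branches are skipped.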
1.3.1. Boolean Values in C
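Traditionally, C doesn’t provide a Boolean type with true or false values (the C99 standard added one through the `stdbool.h` header, but it is still an integer type underneath). Instead, integer values evaluate to true or false when used in conditional statements. When used in conditional expressions, any integer expression that is: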
- zero (0) evaluates to false
- nonzero (any positive or negative value) evaluates to true
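The zero/nonzero rule can be checked directly by using an integer as the condition of an `if` statement. The function name `is_true` below is hypothetical.

```c
/* Returns 1 if C treats value as true in a condition, and 0 otherwise. */
int is_true(int value) {
    if (value) {       // no comparison needed: the int itself is the condition
        return 1;
    }
    return 0;
}
```

Here `is_true(0)` evaluates to 0, whereas `is_true(42)` and `is_true(-7)` both evaluate to 1, because any nonzero value, positive or negative, counts as true.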
C has a set of relational and logical operators for Boolean expressions.
The relational operators take operand(s) of the same type and evaluate to zero (false) or nonzero (true); in C, a true result is the `int` value 1. The set of relational operators is:
- equality (`==`) and inequality (not equal, `!=`)
- comparison operators: less than (`<`), less than or equal (`<=`), greater than (`>`), and greater than or equal (`>=`)
Here are some C code snippets showing examples of relational operators:
// assume x and y are ints, and have been assigned
// values before this point in the code
if (y < 0) {
    printf("y is negative\n");
} else if (y != 0) {
    printf("y is positive\n");
} else {
    printf("y is zero\n");
}
// set x and y to the larger of the two values
if (x >= y) {
    y = x;
} else {
    x = y;
}
C’s logical operators take integer "Boolean" operand(s) and evaluate to either zero (false) or nonzero (true). The set of logical operators is:
- logical negation (`!`)
- logical and (`&&`): stops evaluating at the first false expression (short-circuiting)
- logical or (`||`): stops evaluating at the first true expression (short-circuiting)
C’s short-circuit logical operator evaluation stops evaluating a logical expression as soon as the result is known. For example, if the first operand to a logical and (`&&`) expression evaluates to false, the result of the `&&` expression must be false. As a result, the second operand’s value need not be evaluated, and it is not evaluated.
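Short-circuiting is more than an optimization: programs often rely on it to guard an operation that would otherwise be unsafe. The following sketch (with a hypothetical function name and parameters) uses the first operand of `&&` to protect a division in the second operand.

```c
/* Returns 1 if total / count is greater than limit, and 0 otherwise.
 * Because && short-circuits, the division is never evaluated when
 * count is 0, so this function cannot divide by zero. */
int average_exceeds(int total, int count, int limit) {
    if ((count != 0) && (total / count > limit)) {
        return 1;
    }
    return 0;
}
```

When `count` is 0, the first operand is false, so `total / count` is skipped and the function returns 0. Swapping the order of the two operands would remove that protection.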
The following is an example of conditional statements in C that use logical operators (it’s always best to use parentheses around complex Boolean expressions to make them easier to read):
if ( (x > 10) && (y >= x) ) {
    printf("y and x are both larger than 10\n");
    x = 13;
} else if ( ((-x) == 10) || (y > x) ) {
    printf("y might be bigger than x\n");
    x = y * x;
} else {
    printf("I have no idea what the relationship between x and y is\n");
}
1.3.2. Loops in C
Like Python, C supports `for` and `while` loops. Additionally, C provides `do`-`while` loops.
while Loops
The `while` loop syntax in C and Python is almost identical, and the behavior is the same. Table 6 shows example programs of `while` loops in C and Python.
Python version | C version |
---|---|
|
|
The `while` loop syntax in C is very similar to that in Python, and both are evaluated in the same way:
while ( <boolean expression> ) {
    <true body>
}
The `while` loop checks the Boolean expression first and executes the body if true. In the preceding example program, the value of the `val` variable will be repeatedly printed in the `while` loop until its value is greater than the value of the `num` variable. If the user enters `10`, the C and Python programs will print:
1
2
4
8
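A loop with this behavior can be sketched as follows. This is an illustration consistent with the output above, not necessarily the authors' listing: it assumes that `val` starts at 1 and doubles on each iteration, and it takes `num` as a parameter rather than reading it from the user. The function name and the returned count are hypothetical additions.

```c
#include <stdio.h>

/* Prints 1, 2, 4, 8, ... while the value is no larger than num.
 * Returns how many values were printed. */
int print_doubles_up_to(int num) {
    int val = 1;
    int count = 0;
    while (val <= num) {      // checked before every execution of the body
        printf("%d\n", val);
        val = val * 2;
        count++;
    }
    return count;
}
```

Calling `print_doubles_up_to(10)` prints 1, 2, 4, and 8 on separate lines and returns 4. Calling it with 0 prints nothing, because the condition is false the first time it is checked.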
C also has a `do`-`while` loop that is similar to its `while` loop, but it executes the loop body first and then checks a condition and repeats executing the loop body for as long as the condition is true. That is, a `do`-`while` loop will always execute the loop body at least one time:
do {
    <body>
} while ( <boolean expression> );
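One situation where the "at least once" behavior is a natural fit is counting the digits of a number, sketched below with a hypothetical function name. Even the value 0 has one digit, so the body must run before the condition is ever checked.

```c
/* Counts the decimal digits in a nonnegative integer.
 * A do-while fits because even 0 has one digit, so the body
 * must run at least once. */
int count_digits(int n) {
    int digits = 0;
    do {
        digits++;
        n = n / 10;    // integer division drops the last digit
    } while (n > 0);
    return digits;
}
```

Here `count_digits(0)` evaluates to 1 and `count_digits(1234)` evaluates to 4. Written with a plain `while (n > 0)` loop, the function would incorrectly report 0 digits for the value 0.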
for Loops
The `for` loop is different in C than it is in Python. In Python, `for` loops are iterations over sequences, whereas in C, `for` loops are more general looping constructs. Table 7 shows example programs that use `for` loops to print all the values between 0 and a user-provided input number:
Python version | C version |
---|---|
|
|
In this example, you can see that the C `for` loop syntax is quite different from the Python `for` loop syntax. It’s also evaluated differently.
The C `for` loop syntax is:
for ( <initialization>; <boolean expression>; <step> ) {
    <body>
}
The `for` loop evaluation rules are:
1. Evaluate initialization one time when first entering the loop.
2. Evaluate the boolean expression. If it’s 0 (false), drop out of the `for` loop (that is, the program is done repeating the loop body statements).
3. Evaluate the statements inside the loop body.
4. Evaluate the step expression.
5. Repeat from step (2).
Here’s a simple example `for` loop to print the values 0, 1, and 2:
int i;
for (i = 0; i < 3; i++) {
    printf("%d\n", i);
}
Executing the `for` loop evaluation rules on the preceding loop yields the following sequence of actions:
(1) eval init: i is set to 0 (i=0)
(2) eval bool expr: i < 3 is true
(3) execute loop body: print the value of i (0)
(4) eval step: i is set to 1 (i++)
(2) eval bool expr: i < 3 is true
(3) execute loop body: print the value of i (1)
(4) eval step: i is set to 2 (i++)
(2) eval bool expr: i < 3 is true
(3) execute loop body: print the value of i (2)
(4) eval step: i is set to 3 (i++)
(2) eval bool expr: i < 3 is false, drop out of the for loop
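The same evaluation rules apply when the loop bound is a variable rather than a constant. The following sketch, with a hypothetical function name, prints every value from 0 through `num` and also accumulates their sum, so the effect of each pass through the body can be verified.

```c
#include <stdio.h>

/* Prints every value from 0 through num and returns their sum. */
int print_and_sum_to(int num) {
    int i;
    int sum = 0;
    for (i = 0; i <= num; i++) {   // init; boolean expression; step
        printf("%d\n", i);
        sum += i;
    }
    return sum;
}
```

Calling `print_and_sum_to(3)` prints 0, 1, 2, and 3 and returns 6. If `num` is negative, the boolean expression is false on its first evaluation, so the body never runs and the function returns 0.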
The following program shows a more complicated `for` loop example (it’s also available to download). Note that although C supports `for` loops with a list of statements for the initialization and step parts, it’s best to keep loops simple (this example illustrates a more complicated `for` loop syntax, but the `for` loop would be easier to read and understand if it were simplified by moving the `j += 10` step statement to the end of the loop body and having just a single step statement, `i += 1`).
/* An example of a more complex for loop which uses multiple variables.
 * (it is unusual to have for loops with multiple statements in the
 * init and step parts, but C supports it and there are times when it
 * is useful...don't go nuts with this just because you can)
 */
#include <stdio.h>
int main(void) {
    int i, j;
    for (i=0, j=0; i < 10; i+=1, j+=10) {
        printf("i+j = %d\n", i+j);
    }
    return 0;
}
// the rules for evaluating a for loop are the same no matter how
// simple or complex each part is:
// (1) evaluate the initialization statements once on the first
//     evaluation of the for loop: i=0 and j=0
// (2) evaluate the boolean condition: i < 10
//     if false (when i is 10), drop out of the for loop
// (3) execute the statements inside the for loop body: printf
// (4) evaluate the step statements: i += 1, j += 10
// (5) repeat, starting at step (2)
In C, for
loops and while
loops are equivalent in power, meaning that any
while
loop can be expressed as a for
loop, and vice versa. The same is not true
in Python, where for
loops are iterations over a sequence of values. As
such, they cannot express some looping behavior that the more general Python
while
loop can express. Indefinite loops are one example that can only be
written as a while
loop in Python.
Consider the following while
loop in C:
int guess = 0;
while (guess != num) {
printf("%d is not the right number\n", guess);
printf("Enter another guess: ");
scanf("%d", &guess);
}
This loop can be translated to an equivalent for
loop in C:
int guess;
for (guess = 0; guess != num; ) {
printf("%d is not the right number\n", guess);
printf("Enter another guess: ");
scanf("%d", &guess);
}
In Python, however, this type of looping behavior can be expressed only by
using a while
loop.
Because for
and while
loops are equally expressive in C, only one looping
construct is needed in the language. However, for
loops are a more natural
language construct for definite loops (like iterating over a range of values),
whereas while
loops are a more natural language construct for indefinite loops
(like repeating until the user enters an even number). As a result, C provides
both to programmers.
1.4. Functions
Functions break code into manageable pieces and reduce code duplication. Functions might take zero or more parameters as input and they return a single value of a specific type. A function declaration or prototype specifies the function’s name, its return type, and its parameter list (the number and types of all the parameters). A function definition includes the code to be executed when the function is called. All functions in C must be declared before they’re called. This can be done by declaring a function prototype or by fully defining the function before calling it:
// function definition format:
// ---------------------------
<return type> <function name> (<parameter list>) {
    <function body>
}

// parameter list format:
// ---------------------
<type> <param1 name>, <type> <param2 name>, ..., <type> <last param name>
Here’s an example function definition. Note that the comments describe what the function does, the details of each parameter (what it’s used for and what it should be passed), and what the function returns:
/* This program computes the larger of two
* values entered by the user.
*/
#include <stdio.h>
/* max: computes the larger of two integer values
* x: one integer value
* y: the other integer value
* returns: the larger of x and y
*/
int max(int x, int y) {
int bigger;
bigger = x;
if (y > x) {
bigger = y;
}
printf(" in max, before return x: %d y: %d\n", x, y);
return bigger;
}
Functions that don’t return a value should specify the void
return type.
Here’s an example of a void
function:
/* prints out the squares from start to stop
* start: the beginning of the range
* stop: the end of the range
*/
void print_table(int start, int stop) {
int i;
for (i = start; i <= stop; i++) {
printf("%d\t", i*i);
}
printf("\n");
}
As in any programming language that supports functions or procedures, a function call invokes a function, passing specific argument values for the particular call. A function is called by its name and is passed arguments, with one argument for each corresponding function parameter. In C, calling a function looks like this:
// function call format:
// ---------------------
function_name(<argument list>);
// argument list format:
// ---------------------
<argument 1 expression>, <argument 2 expression>, ..., <last argument expression>
Arguments to C functions are passed by value: each function parameter is assigned the value of the corresponding argument passed to it in the function call by the caller. Pass by value semantics mean that any change to a parameter’s value in the function (that is, assigning a parameter a new value in the function) is not visible to the caller.
Here are some example function calls to the max
and print_table
functions
listed earlier:
int val1, val2, result;
val1 = 6;
val2 = 10;
/* to call max, pass in two int values, and because max returns an
int value, assign its return value to a local variable (result)
*/
result = max(val1, val2); /* call max with argument values 6 and 10 */
printf("%d\n", result); /* prints out 10 */
result = max(11, 3); /* call max with argument values 11 and 3 */
printf("%d\n", result); /* prints out 11 */
result = max(val1 * 2, val2); /* call max with argument values 12 and 10 */
printf("%d\n", result); /* prints out 12 */
/* print_table does not return a value, but takes two arguments */
print_table(1, 20); /* prints the squares of the values from 1 to 20 */
print_table(val1, val2); /* prints the squares of the values from 6 to 10 */
Here is another example of a full program that shows a call to a
slightly different implementation of the max
function that has an
additional statement to change the value of its parameter (x = y
):
/* max: computes the larger of two int values
* x: one value
* y: the other value
* returns: the larger of x and y
*/
int max(int x, int y) {
int bigger;
bigger = x;
if (y > x) {
bigger = y;
// note: changing the parameter x's value here will not
// change the value of its corresponding argument
x = y;
}
printf(" in max, before return x: %d y: %d\n", x, y);
return bigger;
}
/* main: shows a call to max */
int main(void) {
int a, b, res;
printf("Enter two integer values: ");
scanf("%d%d", &a, &b);
res = max(a, b);
printf("The larger value of %d and %d is %d\n", a, b, res);
return 0;
}
The following output shows what two runs of this program might look like. Note
the difference in the parameter x’s value (printed from inside the max
function) in the two runs. Specifically, notice that changing the value of
parameter x
in the second run does not affect the variable that was passed
in as an argument to max
after the call returns:
$ ./a.out
Enter two integer values: 11 7
  in max, before return x: 11 y: 7
The larger value of 11 and 7 is 11

$ ./a.out
Enter two integer values: 13 100
  in max, before return x: 100 y: 100
The larger value of 13 and 100 is 100
Because arguments are passed by value to functions, the preceding version of
the max
function that changes one of its parameter values behaves identically
to the original version of max
that does not.
1.4.1. The Stack
The execution stack keeps track of the state of active functions in a program. Each function call creates a new stack frame (sometimes called an activation frame or activation record) containing its parameter and local variable values. The frame on the top of the stack is the active frame; it represents the function activation that is currently executing, and only its local variables and parameters are in scope. When a function is called, a new stack frame is created for it (pushed on the top of the stack), and space for its local variables and parameters is allocated in the new frame. When a function returns, its stack frame is removed from the stack (popped from the top of the stack), leaving the caller’s stack frame on the top of the stack.
For the preceding example program, at the point in its execution right before max
executes the return
statement, the execution stack will look like
Figure 6. Recall that the argument values to max
passed by
main
are passed by value, meaning that the parameters to max
, x
and
y
, are assigned the values of their corresponding arguments, a
and b
from
the call in main
. Despite the max
function changing the value of x
, the
change doesn’t affect the value of a
in main
.
The following full program includes two functions and shows examples of calling
them from the main
function. In this program, we declare function prototypes
for max
and print_table
above the main
function so that main
can call them
even though it is defined before they are. The main
function contains the high-level steps
of the full program, and defining it first echoes the top-down design of the
program. This example includes comments describing the parts of the program
that are important to functions and function calls. You can also download and
run the full program.
/* This file shows examples of defining and calling C functions.
* It also demonstrates using scanf().
*/
#include <stdio.h>
/* This is an example of a FUNCTION PROTOTYPE. It declares just the type
* information for a function (the function's name, return type, and parameter
* list). A prototype is used when code in main wants to call the function
* before its full definition appears in the file.
*/
int max(int n1, int n2);
/* A prototype for another function. void is the return type of a function
* that does not return a value
*/
void print_table(int start, int stop);
/* All C programs must have a main function. This function defines what the
* program does when it begins executing, and it's typically used to organize
* the big-picture behavior of the program.
*/
int main(void) {
int x, y, larger;
printf("This program will operate over two int values.\n");
printf("Enter the first value: ");
scanf("%d", &x);
printf("Enter the second value: ");
scanf("%d", &y);
larger = max(x, y);
printf("The larger of %d and %d is %d\n", x, y, larger);
print_table(x, larger);
return 0;
}
/* This is an example of a FUNCTION DEFINITION. It specifies not only the
* function name and type, but it also fully defines the code of its body.
* (Notice, and emulate, the complete function comment!)
*/
/* Computes the max of two integer values.
* n1: the first value
* n2: the other value
* returns: the larger of n1 and n2
*/
int max(int n1, int n2) {
int result;
result = n1;
if (n2 > n1) {
result = n2;
}
return result;
}
/* prints out the squares from start to stop
* start: the beginning of the range
* stop: the end of the range
*/
void print_table(int start, int stop) {
int i;
for (i = start; i <= stop; i++) {
printf("%d\t", i*i);
}
printf("\n");
}
1.5. Arrays and Strings
An array is a C construct that creates an ordered collection of data elements of the same type and associates this collection with a single program variable. Ordered means that each element is in a specific position in the collection of values (that is, there is an element in position 0, position 1, and so on), not that the values are necessarily sorted. Arrays are one of C’s primary mechanisms for grouping multiple data values and referring to them by a single name. Arrays come in several flavors, but the basic form is a one-dimensional array, which is useful for implementing list-like data structures and strings in C.
1.5.1. Introduction to Arrays
C arrays can store multiple data values of the same type. In this chapter, we discuss statically declared arrays, meaning that the total capacity (the maximum number of elements that can be stored in an array) is fixed and is defined when the array variable is declared. In the next chapter, we discuss dynamically allocated arrays and multi-dimensional arrays.
Table 8 shows Python and C versions of a program that
initializes and then prints a collection of integer values. The Python
version uses its built-in list type to store the list of values, whereas the C
version uses an array of int
types to store the collection of values.
In general, Python provides a high-level list interface to the programmer that
hides much of the low-level implementation details. C, on the other hand,
exposes a low-level array implementation to the programmer and leaves it up to
the programmer to implement higher-level functionality. In other words, arrays
enable low-level data storage without higher-level list functionality, such as
len
, append
, insert
, and so on.
Python version | C version |
---|---|
|
|
The C and Python versions of this program have several similarities,
most notably that individual elements can be accessed via indexing,
and that index values start at 0
. That is, both languages refer to the
very first element in a collection as the element at position 0
.
The main differences in the C and Python versions of this program relate to the capacity of the list or array and how their sizes (number of elements) are determined.
my_lst[3] = 100 # Python syntax to set the element in position 3 to 100.
my_lst[0] = 5 # Python syntax to set the first element to 5.
my_arr[3] = 100; // C syntax to set the element in position 3 to 100.
my_arr[0] = 5; // C syntax to set the first element to 5.
In the Python version, the programmer doesn’t need to specify the capacity of
a list in advance: Python automatically increases a list’s capacity as
needed by the program. For example, the Python append
function automatically
increases the size of the Python list and adds the passed value to the end.
In contrast, when declaring an array variable in C, the programmer must specify its type (the type of each value stored in the array) and its total capacity (the maximum number of storage locations). For example:
int arr[10]; // declare an array of 10 ints
char str[20]; // declare an array of 20 chars
The preceding declarations create one variable named arr
, an array of int
values with a total capacity of 10, and another variable named str
, an array
of char
values with a total capacity of 20.
To compute the size of a list (size meaning the total number of values in the
list), Python provides a len
function that returns the size of any list
passed to it. In C, the programmer has to explicitly keep track of the number
of elements in the array (for example, the size
variable in
Table 8).
Another difference that might not be apparent from looking at the Python and C versions of this program is how the Python list and the C array are stored in memory. C dictates the array layout in program memory, whereas Python hides how lists are implemented from the programmer. In C, individual array elements are allocated in consecutive locations in the program’s memory. For example, the third array position is located in memory immediately following the second array position and immediately before the fourth array position.
1.5.2. Array Access Methods
Python provides multiple ways to access elements in its lists. C, however, supports only indexing, as described earlier. Valid index values range from 0 to the capacity of the array minus 1. Here are some examples:
int i, num;
int arr[10]; // declare an array of ints, with a capacity of 10
num = 6; // keep track of how many elements of arr are used
// initialize first 5 elements of arr (at indices 0-4)
for (i=0; i < 5; i++) {
arr[i] = i * 2;
}
arr[5] = 100; // assign the element at index 5 the value 100
This example declares the array with a capacity of 10 (it has 10 elements), but
it only uses the first six (our current collection of values is size 6, not 10).
It’s often the case when using statically declared arrays that some of an
array’s capacity will remain unused. As a result, we need another program
variable to keep track of the actual size (number of elements) in the array
(num
in this example).
Python and C differ in their error-handling approaches when a program attempts
to access an invalid index. Python throws an IndexError
exception if an
invalid index value is used to access elements in a list (for example, indexing
beyond the number of elements in a list). In C, it’s up to the programmer to
ensure that their code uses only valid index values when indexing into arrays. As a
result, for code like the following that accesses an array element beyond the
bounds of the allocated array, the program’s runtime behavior is undefined:
int array[10]; // an array of size 10 has valid indices 0 through 9
array[10] = 100; // 10 is not a valid index into the array
The C compiler is happy to compile code that accesses array positions beyond the bounds of the array; there is no bounds checking by the compiler or at runtime. As a result, running this code can lead to unexpected program behavior (and the behavior might differ from run to run). It can lead to your program crashing, it can change another variable’s value, or it might have no effect on your program’s behavior. In other words, this situation leads to a program bug that might or might not show up as unexpected program behavior. Thus, as a C programmer, it’s up to you to ensure that your array accesses refer to valid positions!
1.5.3. Arrays and Functions
The semantics of passing arrays to functions in C is similar to that of passing
lists to functions in Python: the function can alter the elements in the passed
array or list. Here’s an example function that takes two parameters, an int
array parameter (arr
), and an int
parameter (size
):
void print_array(int arr[], int size) {
int i;
for (i = 0; i < size; i++) {
printf("%d\n", arr[i]);
}
}
The []
after the parameter name tells the compiler that the type of the
parameter arr
is array of int, not int
like the parameter size
. In the
next chapter, we show an alternate syntax for specifying array parameters. The
capacity of the array parameter arr
isn’t specified: arr[]
means that
this function can be called with an array argument of any capacity. Because
there is no way to get an array’s size or capacity just from the array
variable, functions that are passed arrays almost always also have a second
parameter that specifies the array’s size (the size
parameter in the
preceding example).
To call a function that has an array parameter, pass the name of the array as
the argument. Here is a C code snippet with example calls to the print_array
function:
int some[5], more[10], i;
for (i = 0; i < 5; i++) { // initialize the first 5 elements of both arrays
some[i] = i * i;
more[i] = some[i];
}
for (i = 5; i < 10; i++) { // initialize the last 5 elements of "more" array
more[i] = more[i-1] + more[i-2];
}
print_array(some, 5); // prints all 5 values of "some"
print_array(more, 10); // prints all 10 values of "more"
print_array(more, 8); // prints just the first 8 values of "more"
In C, the name of the array variable is equivalent to the base address of the array (that is, the memory location of its 0th element). Due to C’s pass by value function call semantics, when you pass an array to a function, each element of the array is not individually passed to the function. In other words, the function isn’t receiving a copy of each array element. Instead, an array parameter gets the value of the array’s base address. This behavior implies that when a function modifies the elements of an array that was passed as a parameter, the changes will persist when the function returns. For example, consider this C program snippet:
void test(int a[], int size) {
if (size > 3) {
a[3] = 8;
}
size = 2; // changing parameter does NOT change argument
}
int main(void) {
int arr[5], n = 5, i;
for (i = 0; i < n; i++) {
arr[i] = i;
}
printf("%d %d\n", arr[3], n); // prints: 3 5
test(arr, n);
printf("%d %d\n", arr[3], n); // prints: 8 5
return 0;
}
The call in main
to the test
function is passed the argument arr
, whose
value is the base address of the arr
array in memory. The parameter a
in
the test function gets a copy of this base address value. In other words,
parameter a
refers to the same array storage locations as its argument,
arr
. As a result, when the test function changes a value stored in the a
array (a[3] = 8
), it affects the corresponding position in the argument array
(arr[3]
is now 8). The reason is that the value of a
is the base address
of arr
, and the value of arr
is the base address of arr
, so both a
and
arr
refer to the same array (the same storage locations in memory)!
Figure 7 shows the stack contents at the point in the execution just
before the test function returns.
Parameter a
is passed the value of the base address of the array argument
arr
, which means they both refer to the same set of array storage locations
in memory. We indicate this with the arrow from a
to arr
. Values that get
modified by the function test
are highlighted. Changing the value of the
parameter size
does not change the value of its corresponding argument n
,
but changing the value of one of the elements referred to by a
(for example,
a[3] = 8
) does affect the value of the corresponding position in arr
.
1.5.4. Introduction to Strings and the C String Library
Python implements a string type and provides a rich interface for using
strings, but there is no corresponding string type in C. Instead, strings are
implemented as arrays of char
values. Not every character array is used as a
C string, but every C string is a character array.
Recall that arrays in C might be defined with a larger size than a program
ultimately uses. For example, we saw earlier in the section
"Array Access Methods" that
we might declare an array of size 10 but only use the first six positions. This
behavior has important implications for strings: we can’t assume that a
string’s length is equal to that of the array that stores it. For this reason,
strings in C must end with a special character value, the null character
('\0'
), to indicate the end of the string.
Strings that end with a null character are said to be null-terminated.
Although all strings in C should be null-terminated, failing to properly
account for null characters is a common source of errors for novice C
programmers. When using strings, it’s important to keep in mind that your
character arrays must be declared with enough capacity to store each character
value in the string plus the null character ('\0'
). For example, to
store the string "hi"
, you need an array of at least three chars (one to store
'h'
, one to store 'i'
, and one to store '\0'
).
Because strings are commonly used, C provides a string library that contains
functions for manipulating strings. Programs that use these string library
functions need to include the string.h
header.
When printing the value of a string with printf
, use the %s
placeholder in
the format string. The printf
function will print all the characters in the
array argument until it encounters the '\0'
character. Similarly,
string library functions often either locate the end of a string by searching
for the '\0'
character or add a '\0'
character to the end of
any string that they modify.
Here’s an example program that uses strings and string library functions:
#include <stdio.h>
#include <string.h> // include the C string library
int main(void) {
char str1[10];
char str2[10];
int len;
str1[0] = 'h';
str1[1] = 'i';
str1[2] = '\0';
len = strlen(str1);
printf("%s %d\n", str1, len); // prints: hi 2
strcpy(str2, str1); // copies the contents of str1 to str2
printf("%s\n", str2); // prints: hi
strcpy(str2, "hello"); // copy the string "hello" to str2
len = strlen(str2);
printf("%s has %d chars\n", str2, len); // prints: hello has 5 chars
return 0;
}
The strlen
function in the C string library returns the number of characters
in its string argument. A string’s terminating null character doesn’t count as
part of the string’s length, so the call to strlen(str1)
returns 2 (the
length of the string "hi"
). The strcpy
function copies one character at a
time from a source string (the second parameter) to a destination string (the
first parameter) until it reaches a null character in the source.
Note that most C string library functions expect the call to pass in a
character array that has enough capacity for the function to perform its job.
For example, you wouldn’t want to call strcpy
with a destination string that
isn’t large enough to contain the source; doing so will lead to undefined
behavior in your program!
C string library functions also require that string values passed to them are
correctly formed, with a terminating '\0'
character. It’s up to you
as the C programmer to ensure that you pass in valid strings for C library
functions to manipulate. Thus, in the call to strcpy
in the preceding example, if the
source string (str1
) was not initialized to have a terminating '\0'
character, strcpy
would continue beyond the end of the str1
array’s bounds,
leading to undefined behavior that could cause the program to crash.
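Note that the previous example uses the strcpy function, which does not check the capacity of its destination array; as described above, it is up to the caller to ensure that the destination is large enough.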
In the next chapter, we discuss C strings and the C string library in more detail.
1.6. Structs
Arrays and structs are the two ways in which C supports creating collections of data elements. Arrays are used to create an ordered collection of data elements of the same type, whereas structs are used to create a collection of data elements of different types. A C programmer can combine array and struct building blocks in many different ways to create more complex data types and structures. This section introduces structs, and in the next chapter we characterize structs in more detail and show how you can combine them with arrays.
C is not an object-oriented language; thus, it doesn’t support classes. It
does, however, support defining structured types, which are like the data part
of classes. A struct
is a type used to represent a heterogeneous collection
of data; it’s a mechanism for treating a set of different types as a single,
coherent unit. C structs provide a level of abstraction on top of individual
data values, treating them as a single type. For example, a student has a
name, age, grade point average (GPA), and graduation year. A programmer could
define a new struct
type to combine those four data elements into a single
struct student
variable that contains a name value (type char []
, to hold a
string), an age value (type int
), a GPA value (type float
), and a
graduation year value (type int
). A single variable of this struct type can
store all four pieces of data for a particular student; for example, ("Freya",
19, 3.7, 2021).
There are three steps to defining and using struct
types in C programs:
1. Define a new struct type that represents the structure.
2. Declare variables of the new struct type.
3. Use dot (.) notation to access individual field values of the variable.
1.6.1. Defining a Struct Type
A struct type definition should appear outside of any function, typically
near the top of the program’s .c
file. The syntax for defining a new struct
type is the following (struct
is a reserved keyword):
struct <struct_name> {
<field 1 type> <field 1 name>;
<field 2 type> <field 2 name>;
<field 3 type> <field 3 name>;
...
};
Here’s an example of defining a new struct studentT
type for storing student
data:
struct studentT {
char name[64];
int age;
float gpa;
int grad_yr;
};
This struct definition adds a new type to C’s type system, and the type’s name is
struct studentT
. This struct defines four fields, and each field definition
includes the type and name of the field. Note that in this example, the name
field’s type is a character array, for
use
as a string.
1.6.2. Declaring Variables of Struct Types
Once the type has been defined, you can declare variables of the new type,
struct studentT
. Note that unlike the other types we’ve encountered so far
that consist of just a single word (for example, int
, char
, and float
),
the name of our new struct type is two words, struct studentT
.
struct studentT student1, student2; // student1, student2 are struct studentT
1.6.3. Accessing Field Values
To access field values in a struct variable, use dot notation:
<variable name>.<field name>
When accessing structs and their fields, carefully consider the types of the
variables you’re using. Novice C programmers often introduce bugs into their
programs by failing to account for the types of struct fields.
Table 9 shows the types of several expressions surrounding our
struct studentT
type.
Expression | C type |
---|---|
student1 | struct studentT |
student1.age | integer (int) |
student1.name | array of characters (char []) |
student1.name[3] | character (char) |
Here are some examples of assigning a struct studentT
variable’s fields:
// The 'name' field is an array of characters, so we can use the 'strcpy'
// string library function to fill in the array with a string value.
strcpy(student1.name, "Kwame Salter");
// The 'age' field is an integer.
student1.age = 18 + 2;
// The 'gpa' field is a float.
student1.gpa = 3.5;
// The 'grad_yr' field is an int
student1.grad_yr = 2020;
student2.grad_yr = student1.grad_yr;
Figure 8 illustrates the layout of the student1
variable in
memory after the field assignments in the preceding example. Only the struct
variable’s fields (the areas in boxes) are stored in memory. The field names
are labeled on the figure for clarity, but to the C compiler, fields are simply
storage locations or offsets from the start of the struct variable’s memory.
For example, based on the definition of a struct studentT
, the compiler knows
that to access the field named gpa
, it must skip past an array of 64
characters (name
) and one integer (age
). Note that in the figure, the
name
field only depicts the first six characters of the 64-character array.
Variables of struct types are lvalues, meaning they can appear on the left side of an assignment statement. Thus, a struct variable can be assigned the value of another struct variable using a simple assignment statement. The field values of the struct on the right side of the assignment statement are copied to the field values of the struct on the left side of the assignment statement. In other words, the contents of memory of one struct are copied to the memory of the other. Here’s an example of assigning a struct’s values in this way:
student2 = student1; // student2 gets the value of student1
// (student1's field values are copied to
// corresponding field values of student2)
strcpy(student2.name, "Frances Allen"); // change one field value
Figure 9 shows the values of the two student variables after the
assignment statement and call to strcpy
have executed. Note that the figure
depicts the name
fields as the string values they contain rather than the
full array of 64 characters.
C provides a sizeof
operator that takes a type and returns the number of
bytes used by the type. The sizeof
operator can be used on any C type,
including struct types, to see how much memory space a variable of that type
needs. For example, we can print the size of a struct studentT
type:
// Note: sizeof yields a value of type size_t, which the `%zu` format
// placeholder prints.
printf("number of bytes in student struct: %zu\n", sizeof(struct studentT));
When run, this line should print out a value of at least 76 bytes, because 64
characters are in the name
array (1 byte for each char
), 4 bytes for the
int
age
field, 4 bytes for the float
gpa
field, and 4 bytes for the
int
grad_yr
field. The exact number of bytes might be larger than 76 on
some machines.
Here’s a full example program that
defines and demonstrates the use of our struct studentT
type:
#include <stdio.h>
#include <string.h>

// Define a new type: struct studentT
// Note that struct definitions should be outside function bodies.
struct studentT {
    char name[64];
    int age;
    float gpa;
    int grad_yr;
};

int main(void) {
    struct studentT student1, student2;

    strcpy(student1.name, "Kwame Salter");  // name field is a char array
    student1.age = 18 + 2;                  // age field is an int
    student1.gpa = 3.5;                     // gpa field is a float
    student1.grad_yr = 2020;                // grad_yr field is an int

    /* Note: printf doesn't have a format placeholder for printing a
     * struct studentT (a type we defined). Instead, we'll need to
     * individually pass each field to printf. */
    printf("name: %s age: %d gpa: %g, year: %d\n",
           student1.name, student1.age, student1.gpa, student1.grad_yr);

    /* Copy all the field values of student1 into student2. */
    student2 = student1;

    /* Make a few changes to the student2 variable. */
    strcpy(student2.name, "Frances Allen");
    student2.grad_yr = student1.grad_yr + 1;

    /* Print the fields of student2. */
    printf("name: %s age: %d gpa: %g, year: %d\n",
           student2.name, student2.age, student2.gpa, student2.grad_yr);

    /* Print the size of the struct studentT type. */
    printf("number of bytes in student struct: %lu\n", sizeof(struct studentT));

    return 0;
}
When run, this program outputs the following:
name: Kwame Salter age: 20 gpa: 3.5, year: 2020
name: Frances Allen age: 20 gpa: 3.5, year: 2021
number of bytes in student struct: 76
1.6.4. Passing Structs to Functions
In C, arguments of all types are passed by value to functions. Thus, if a function has a struct type parameter, then when called with a struct argument, the argument’s value is passed to its parameter, meaning that the parameter gets a copy of its argument’s value. The value of a struct variable is the contents of its memory, which is why we can assign the fields of one struct to be the same as another struct in a single assignment statement like this:
student2 = student1;
Because the value of a struct variable represents the full contents of its memory, passing a struct as an argument to a function gives the parameter a copy of all the argument struct’s field values. If the function changes the field values of a struct parameter, the changes to the parameter’s field values have no effect on the corresponding field values of the argument. That is, changes to the parameter’s fields only modify values in the parameter’s memory locations for those fields, not in the argument’s memory locations for those fields.
Here’s a full example program using the checkID function that takes a struct parameter:
#include <stdio.h>
#include <string.h>

/* struct type definition: */
struct studentT {
    char name[64];
    int age;
    float gpa;
    int grad_yr;
};

/* function prototype (prototype: a declaration of the
 * checkID function so that main can call it, its full
 * definition is listed after main function in the file):
 */
int checkID(struct studentT s1, int min_age);

int main(void) {
    int can_vote;
    struct studentT student;

    strcpy(student.name, "Ruth");
    student.age = 17;
    student.gpa = 3.5;
    student.grad_yr = 2021;

    can_vote = checkID(student, 18);
    if (can_vote) {
        printf("%s is %d years old and can vote.\n",
               student.name, student.age);
    } else {
        printf("%s is only %d years old and cannot vote.\n",
               student.name, student.age);
    }

    return 0;
}

/* check if a student is at least the min age
 *   s: a student
 *   min_age: a minimum age value to test
 *   returns: 1 if the student is min_age or older, 0 otherwise
 */
int checkID(struct studentT s, int min_age) {
    int ret = 1;  // initialize the return value to 1 (true)

    if (s.age < min_age) {
        ret = 0;  // update the return value to 0 (false)

        // let's try changing the student's age
        s.age = min_age + 1;
    }

    printf("%s is %d years old\n", s.name, s.age);

    return ret;
}
When main calls checkID, the value of the student struct (a copy of the memory contents of all its fields) is passed to the s parameter. When the function changes the value of its parameter’s age field, it doesn’t affect the age field of its argument (student).
This behavior can be seen by running the program, which outputs the following:
Ruth is 19 years old
Ruth is only 17 years old and cannot vote.
The output shows that when checkID prints the age field, it reflects the function’s change to the age field of the parameter s. However, after the function call returns, main prints the age field of student with the same value it had prior to the checkID call. Figure 10 illustrates the contents of the call stack just before the checkID function returns.
Understanding the pass-by-value semantics of struct parameters is particularly important when a struct contains a statically declared array field (like the name field in struct studentT). When such a struct is passed to a function, the struct argument’s entire memory contents, including every array element in the array field, are copied to its parameter. If the parameter struct’s array contents are changed by the function, those changes will not persist after the function returns. This behavior might seem odd given what we know about how arrays are passed to functions, but it’s consistent with the struct-copying behavior described earlier.
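The contrast between an array field inside a struct and a plain array parameter can be sketched as follows. The two helper functions, rename_copy and rename_in_place, are hypothetical names chosen for this illustration; they are not part of the book’s example programs.

```c
#include <string.h>

struct studentT {
    char name[64];
    int age;
    float gpa;
    int grad_yr;
};

/* Hypothetical helper: s is a copy of the caller's struct, including
 * all 64 chars of its name field, so this change is lost on return. */
void rename_copy(struct studentT s) {
    strcpy(s.name, "Changed");
}

/* Hypothetical helper: an array parameter gets the base address of the
 * caller's array, so this change is visible to the caller. */
void rename_in_place(char name[]) {
    strcpy(name, "Changed");
}
```

If a caller sets a student’s name to "Ruth" and calls rename_copy(student), the name is still "Ruth" afterward. Calling rename_in_place(student.name) instead changes the caller’s name field to "Changed".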
1.7. Summary
In this chapter, we introduced many parts of the C programming language by comparing them to similar language constructs in Python, a language that many readers might know. C has similar language features to those of many other high-level imperative and object-oriented programming languages, including variables, loops, conditionals, functions, and I/O. Some key differences between the C and Python features we discussed include C requiring that all variables be declared of a specific type before they’re used, and that C arrays and strings are a lower-level abstraction than Python’s lists and strings. The lower-level abstractions allow a C programmer more control over how their program accesses its memory and thus more control over their program’s efficiency.
In the next chapter, we cover the C programming language in detail. We revisit in more depth the many language features presented in this chapter, and we introduce some new C language features, most notably C pointer variables and support for dynamic memory allocation.
2. A Deeper Dive into C Programming
With many of the basics of C programming covered in the previous chapter, we now dive deeper into the details of C. In this chapter we revisit many of the topics from the previous chapter, such as arrays, strings, and structs, discussing them in more detail. We also introduce C’s pointer variables and dynamic memory allocation. Pointers provide a level of indirection to accessing program state, and dynamic memory allocation allows a program to adjust to changes in size and space needs as it runs, allocating more space as it needs it and freeing space it no longer needs. By understanding how and when to use pointer variables and dynamic memory allocation, a C programmer can design programs that are both powerful and efficient.
We begin with a discussion of the parts of program memory, as this will help in understanding many of the topics presented later. As the chapter progresses, we cover C file I/O and some advanced C topics including library linking and compiling to assembly code.
2.1. Parts of Program Memory and Scope
The following C program shows examples of functions, parameters, and local and global variables (function comments are omitted to shorten this code listing):
/* An example C program with local and global variables */
#include <stdio.h>

int max(int n1, int n2);  /* function prototypes */
int change(int amt);

int g_x;  /* global variable: declared outside function bodies */

int main(void) {
    int x, result;  /* local variables: declared inside function bodies */

    printf("Enter a value: ");
    scanf("%d", &x);
    g_x = 10;  /* global variables can be accessed in any function */

    result = max(g_x, x);
    printf("%d is the largest of %d and %d\n", result, g_x, x);

    result = change(10);
    printf("g_x's value was %d and now is %d\n", result, g_x);

    return 0;
}

int max(int n1, int n2) {  /* function with two parameters */
    int val;  /* local variable */

    val = n1;
    if ( n2 > n1 ) {
        val = n2;
    }
    return val;
}

int change(int amt) {
    int val;

    val = g_x;  /* global variables can be accessed in any function */
    g_x += amt;
    return val;
}
This example shows program variables with different scope. A variable’s scope defines when its name has meaning. In other words, scope defines the set of program code blocks in which a variable is bound to (associated with) a program memory location and can be used by program code.
Declaring a variable outside of any function body creates a global variable. Global variables remain permanently in scope and can be used by any code in the program because they’re always bound to one specific memory location. Every global variable must have a unique name — its name uniquely identifies a specific storage location in program memory for the entire duration of the program.
Local variables and parameters are only in scope inside the function in which they are defined. For example, the amt parameter is in scope only inside the change function. This means that only statements within the change function body can access the amt parameter, and an instance of the amt parameter is bound to a specific memory storage location only within a specific active execution of the function. Space to store a parameter’s value is allocated on the stack when the function gets called, and it is deallocated from the stack when the function returns. Each activation of a function gets its own bindings for its parameters and local variables. Thus, for recursive function calls, each call (or activation) gets a separate stack frame containing space for its parameters and local variables.
Because parameters and local variables are only in scope inside the function that defines them, different functions can use the same names for local variables and parameters. For example, both the change and the max functions have a local variable named val. When code in the max function refers to val, it refers to its local variable val and not to the change function’s local variable val (which is not in scope inside the max function).
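The claim that each activation of a function gets its own parameters and local variables can be illustrated with a short sketch. Both function names below (sum_to and frames_are_distinct) are hypothetical and chosen only for this example.

```c
#include <stddef.h>   /* NULL */

/* Every activation of sum_to has its own n and its own partial, so the
 * deeper recursive calls cannot disturb this activation's values. */
int sum_to(int n) {
    int partial;

    if (n <= 0) {
        return 0;
    }
    partial = sum_to(n - 1);  /* deeper calls use their own n and partial */
    return partial + n;       /* this activation's n is unchanged */
}

/* Calls itself once, passing the address of its local variable val.
 * The second activation compares that address with the address of its
 * own val and returns 1 if they differ. */
int frames_are_distinct(int *callers_val) {
    int val = 0;

    if (callers_val == NULL) {
        return frames_are_distinct(&val);  /* recurse exactly once */
    }
    return callers_val != &val;
}
```

Calling sum_to(4) adds 4 + 3 + 2 + 1 across five separate activations, and frames_are_distinct(NULL) returns 1 because the two activations’ val variables are both live at the same time and therefore occupy different memory locations.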
While there may occasionally be times when using global variables in C programs is necessary, we strongly recommend that you avoid programming with global variables whenever possible. Using only local variables and parameters yields code that’s more modular, more general-purpose, and easier to debug. Also, because a function’s parameters and local variables are only allocated in program memory when the function is active, they may result in more space-efficient programs.
Upon launching a new program, the operating system allocates the new program’s address space. A program’s address space (or memory space) represents storage locations for everything it needs in its execution, namely storage for its instructions and data. A program’s address space can be thought of as an array of addressable bytes; each used address in the program’s address space stores all or part of a program instruction or data value (or some additional state necessary for the program’s execution).
A program’s memory space is divided into several parts, each of which is used to store a different kind of entity in the process’s address space. Figure 11 illustrates the parts of a program’s memory space.
The top of a program’s memory is reserved for use by the operating system, but the remaining parts are usable by the running program. The program’s instructions are stored in the code section of the memory. For example, the program listed above stores instructions for the main, max, and change functions in this region of memory.
Local variables and parameters reside in the portion of memory for the stack. Because the amount of stack space grows and shrinks over the program’s execution as functions are called and returned from, the stack part of memory is typically allocated near the bottom of memory (at the highest memory addresses) to leave space for it to change. Stack storage space for local variables and parameters exists only when the function is active (within the stack frame for the function’s activation on the stack).
Global variables are stored in the data section. Unlike the stack, the data region does not grow or shrink — storage space for globals persists for the entire run of the program.
Finally, the heap portion of memory is the part of a program’s address space associated with dynamic memory allocation. The heap is typically located far from stack memory, and grows into higher addresses as more space is dynamically allocated by the running program.
2.2. C’s Pointer Variables
C’s pointer variables provide a level of indirection to accessing program memory. By understanding how to use pointer variables, a programmer can write C programs that are both powerful and efficient. For example, through pointer variables, a C programmer can:
-
implement functions whose parameters can modify values in the caller’s stack frame
-
dynamically allocate (and deallocate) program memory at runtime when the program needs it
-
efficiently pass large data structures to functions
-
create linked dynamic data structures
-
interpret bytes of program memory in different ways.
In this section we introduce the syntax and semantics of C’s pointer variables and introduce common examples of how to use them in C programs.
2.2.1. Pointer Variables
A pointer variable stores the address of a memory location in which a value of a specific type can be stored. For example, a pointer variable can store the address of an int memory location at which the integer value 12 is stored. The pointer variable points to (refers to) the value. A pointer provides a level of indirection for accessing values stored in memory. Figure 12 illustrates an example of what a pointer variable might look like in memory:
Through the pointer variable, ptr, the value (12) stored in the memory location it points to can be indirectly accessed.
C programs most frequently use pointer variables for:
-
"Pass by pointer" parameters, for writing functions that can modify their argument’s value through a pointer parameter
-
Dynamic memory allocation, for writing programs that allocate (and free) space as the program runs. Dynamic memory is commonly used for dynamically allocating arrays. It is useful when a programmer doesn’t know the size of a data structure at compile time (e.g., the array size depends on user input at runtime). It also enables data structures to be resized as the program runs.
Rules for Using Pointer Variables
The rules for using pointer variables are similar to regular variables, except that you need to think about two types: the type of the pointer variable, and the type stored in the memory address to which the pointer variable points.
-
First, declare a pointer variable using type_name *var_name:
int *ptr;    // stores the memory address of an int (ptr "points to" an int)
char *cptr;  // stores the memory address of a char (cptr "points to" a char)
Pointer Types
Note that although ptr and cptr are both pointers, they refer to different types:
-
The type of ptr is "pointer to int" (int *). It can point to a memory location that stores an int value.
-
The type of cptr is "pointer to char" (char *). It can point to a memory location that stores a char value.
-
Next, initialize the pointer variable (make it point to something). Pointer variables store address values. A pointer should be initialized to store the address of a memory location whose type matches the type to which the pointer variable points. One way to initialize a pointer is to use the address operator (&) with a variable to get the variable’s address value:
int x;
char ch;
ptr = &x;    // ptr gets the address of x, pointer "points to" x
cptr = &ch;  // cptr gets the address of ch, pointer "points to" ch
Figure 13. A program can initialize a pointer by assigning it the address of an existing variable of the appropriate type.
Here’s an example of an invalid pointer initialization due to mismatched types:
cptr = &x;   // ERROR: cptr can hold a char memory location
             // (&x is the address of an int)
Even though the C compiler may allow this type of assignment (with a warning about incompatible types), the behavior of accessing and modifying x through cptr will likely not behave as the programmer expects. Instead, the programmer should use an int * variable to point to an int storage location.
All pointer variables can also be assigned a special value, NULL, which represents an invalid address. While a null pointer (one whose value is NULL) should never be used to access memory, the value NULL is useful for testing a pointer variable to see if it points to a valid memory address. That is, C programmers will commonly check a pointer to ensure that its value isn’t NULL before attempting to access the memory location to which it points. To set a pointer to NULL:
ptr = NULL;
cptr = NULL;
Figure 14. Any pointer can be given the special value NULL, which indicates that it doesn’t refer to any particular address. Null pointers should never be dereferenced.
-
Finally, use the pointer variable: the dereference operator (*) follows a pointer variable to the location in memory that it points to and accesses the value at that location:
/* Assuming an integer named x has already been declared, this code sets the
 * value of x to 8. */
ptr = &x;   /* initialize ptr to the address of x (ptr points to variable x) */
*ptr = 8;   /* the memory location ptr points to is assigned 8 */
Figure 15. Dereferencing a pointer accesses the value to which the pointer refers.
Pointer Examples
Here’s an example sequence of C statements using two pointer variables:
int *ptr1, *ptr2, x, y;
x = 8;
ptr2 = &x; // ptr2 is assigned the address of x
ptr1 = NULL;
*ptr2 = 10; // the memory location ptr2 points to is assigned 10
y = *ptr2 + 3; // y is assigned what ptr2 points to plus 3
ptr1 = ptr2; // ptr1 gets the address value stored in ptr2 (both point to x)
*ptr1 = 100;
ptr1 = &y; // change ptr1's value (change what it points to)
*ptr1 = 80;
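To make it easier to trace this sequence, the sketch below wraps the same statements in a function and annotates each one with the values of x and y after it executes. The struct resultT type and the function name run_pointer_example are hypothetical, introduced only so that the final values can be returned and checked.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical type used only to return both final values. */
struct resultT {
    int x;
    int y;
};

struct resultT run_pointer_example(void) {
    int *ptr1, *ptr2, x, y;
    struct resultT result;

    x = 8;
    ptr2 = &x;       /* ptr2 points to x */
    ptr1 = NULL;     /* ptr1 points to nothing */
    *ptr2 = 10;      /* x is now 10 */
    y = *ptr2 + 3;   /* y is now 13 */
    ptr1 = ptr2;     /* ptr1 and ptr2 both point to x */
    *ptr1 = 100;     /* x is now 100 */
    ptr1 = &y;       /* ptr1 now points to y */
    *ptr1 = 80;      /* y is now 80 */

    result.x = x;    /* 100 */
    result.y = y;    /* 80 */
    return result;
}
```

After the last statement, x holds 100 (last written through ptr1 while it pointed to x) and y holds 80 (written through ptr1 after it was redirected to y).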
When using pointer variables, carefully consider the types of the relevant variables. Drawing pictures of memory (like those shown above) can help with understanding what pointer code is doing. Some common errors involve misusing the dereference operator (*) or the address operator (&). For example:
ptr = 20; // ERROR?: this assigns ptr to point to address 20
ptr = &x;
*ptr = 20; // CORRECT: this assigns 20 to the memory pointed to by ptr
If your program dereferences a pointer variable that does not contain a valid address, the program will most likely crash:
ptr = NULL;
*ptr = 6; // CRASH! program crashes with a segfault (a memory fault)
ptr = 20;
*ptr = 6; // CRASH! segfault (20 is not a valid address)
ptr = x;
*ptr = 6; // likely CRASH or may set some memory location with 6
// (depends on the value of x which is used as an address value)
ptr = &x; // This is probably what the programmer intended
*ptr = 6;
These types of errors exemplify one reason to initialize pointer variables to NULL; a program can then test a pointer’s value for NULL before dereferencing it:
if (ptr != NULL) {
*ptr = 6;
}
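The NULL test can also be packaged in a small function. The following is a minimal sketch; the name store_if_valid and its return-value convention are our own assumptions, not something defined elsewhere in this chapter.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical helper: stores value at the location ptr points to,
 * but only if ptr is not NULL.
 * Returns 1 if the value was stored, 0 if ptr was NULL. */
int store_if_valid(int *ptr, int value) {
    if (ptr != NULL) {
        *ptr = value;
        return 1;
    }
    return 0;
}
```

Note that this test only guards against NULL. It cannot detect a pointer that holds some other invalid address (such as an uninitialized pointer), which is why initializing pointers to NULL is a useful habit.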
2.3. Pointers and Functions
Pointer parameters provide a mechanism through which functions can modify argument values. The commonly used pass by pointer pattern uses a pointer function parameter that gets the value of the address of some storage location passed to it by the caller. For example, the caller could pass the address of one of its local variables. By dereferencing the pointer parameter inside the function, the function can modify the value at the storage location to which it points.
We have already seen similar functionality with array parameters, where an array function parameter gets the value of the base address of the passed array (the parameter refers to the same set of array elements as its argument), and the function can modify the values stored in the array. In general, this same idea can be applied by passing pointer parameters to functions that point to the memory locations in the caller’s scope.
Pass by Value
All arguments in C are passed by value and follow pass-by-value semantics: the parameter gets a copy of its argument value, and modifying the parameter’s value does not change its argument’s value. When passing base type values, like the value of an int variable, the parameter gets a copy of that value, and changing the parameter’s value cannot change the value stored in its argument.
In the pass-by-pointer pattern, the parameter still gets the value of its argument, but it is passed the value of an address. Just like in passing base types, changing a pointer parameter’s value will not change its argument’s value (that is, assigning the parameter to point to a different address will not change the argument’s address value). However, by dereferencing a pointer parameter, the function can change the contents of memory that both the parameter and its argument refer to; through a pointer parameter, a function can modify a variable that is visible to the caller after the function returns.
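The difference between changing a pointer parameter’s value and dereferencing it can be sketched with two small functions. The names retarget and set_through are hypothetical and used only for this illustration.

```c
/* Reassigns the parameter itself. Only the function's local copy of the
 * address changes; the caller's pointer still points where it did. */
void retarget(int *p, int *other) {
    p = other;
    (void)p;   /* avoids a "set but not used" compiler warning */
}

/* Dereferences the parameter. This changes the int that both the
 * parameter and the caller's pointer refer to. */
void set_through(int *p) {
    *p = 100;
}
```

If a caller has int a = 1, b = 2; and int *ptr = &a;, then after retarget(ptr, &b) the caller’s ptr still holds the address of a, and neither a nor b has changed. After set_through(ptr), the value of a is 100.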
Here are the steps for implementing and calling a function with a pass by pointer parameter, with example code snippets showing each step:
-
Declare the function parameter to be a pointer to the variable type:
/* input: an int pointer that stores the address of a memory
 *        location that can store an int value (it points to an int)
 */
int change_value(int *input) {
-
When making the function call, pass in the address of a variable as the argument:
int x;
change_value(&x);
In the preceding example, since the parameter’s type is int *, the address passed must be the address of an int variable.
-
In the body of the function, dereference the pointer parameter to change the argument’s value:
*input = 100;  // the location input points to (x's memory) is assigned 100
Next, let’s examine a larger example program:
#include <stdio.h>

int change_value(int *input);

int main(void) {
    int x;
    int y;

    x = 30;
    y = change_value(&x);
    printf("x: %d y: %d\n", x, y);  // prints x: 100 y: 30

    return 0;
}

/*
 * changes the value of the argument
 *     input: a pointer to the value to change
 *     returns: the original value of the argument
 */
int change_value(int *input) {
    int val;

    val = *input;  /* val gets the value input points to */

    if (val < 100) {
        *input = 100;  /* the value input points to gets 100 */
    } else {
        *input = val * 2;
    }

    return val;
}
When run, the output is:
x: 100 y: 30
Figure 16 shows what the call stack looks like before executing the return in change_value.
The input parameter gets a copy of the value of its argument (the address of x). The value of x is 30 when the function call is made. Inside the change_value function, the parameter is dereferenced to assign the value 100 to the memory location pointed to by the parameter (*input = 100;, meaning "the location input points to gets the value 100"). Since the parameter stores the address of a local variable in the main function’s stack frame, through dereferencing the parameter, the value stored in the caller’s local variable can be changed. When the function returns, the argument’s value reflects the change made to it through the pointer parameter (the value of x in main was changed to 100 by the change_value function through its input parameter).
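Another common use of the pass-by-pointer pattern is a function that exchanges the values of two of the caller’s variables. The following is a minimal sketch; the function name swap_ints is our own choice and does not appear elsewhere in this chapter.

```c
/* Exchanges the int values stored at the two addresses passed in.
 *     a, b: pointers to the two values to swap
 */
void swap_ints(int *a, int *b) {
    int tmp;

    tmp = *a;   /* save the value a points to */
    *a = *b;    /* the location a points to gets the value b points to */
    *b = tmp;   /* the location b points to gets the saved value */
}
```

A caller with int x = 30, y = 100; would call swap_ints(&x, &y); afterward, x holds 100 and y holds 30. Without pointer parameters, the function could only swap its own copies of the two values.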
2.4. Dynamic Memory Allocation
In addition to pass-by-pointer parameters, programs commonly use pointer variables to dynamically allocate memory. Such dynamic memory allocation allows a C program to request more memory as it’s running, and a pointer variable stores the address of the dynamically allocated space. Programs often allocate memory dynamically to tailor the size of an array for a particular run.
Dynamic memory allocation grants flexibility to programs that:
-
do not know the size of arrays or other data structures until runtime (e.g. the size depends on user input)
-
need to allow for a variety of input sizes (not just up to some fixed capacity)
-
want to allocate exactly the size of data structures needed for a particular execution (don’t waste capacity)
-
grow or shrink the sizes of memory allocated as the program runs, reallocating more space when needed and freeing up space when it’s no longer required.
2.4.1. Heap Memory
Every byte of memory in a program’s memory space has an associated address. Everything the program needs to run is in its memory space, and different types of entities reside in different parts of a program’s memory space. For example, the code region contains the program’s instructions, global variables reside in the data region, local variables and parameters occupy the stack, and dynamically allocated memory comes from the heap. Because the stack and the heap grow at runtime (as functions are called and return and as dynamic memory is allocated and freed), they are typically far apart in a program’s address space to leave a large amount of space for each to grow into as the program runs.
Dynamically allocated memory occupies the heap memory region of a program’s address space. When a program dynamically requests memory at runtime, the heap provides a chunk of memory whose address must be assigned to a pointer variable.
Figure 17 illustrates the parts of a running program’s memory with an example of a pointer variable (ptr) on the stack that stores the address of dynamically allocated heap memory (it points to heap memory).
It’s important to remember that heap memory is anonymous memory, where "anonymous" means that addresses in the heap are not bound to variable names. Declaring a named program variable allocates it on the stack or in the data part of program memory. A local or global pointer variable can store the address of an anonymous heap memory location (e.g. a local pointer variable on the stack can point to heap memory), and dereferencing such a pointer enables a program to store data in the heap.
2.4.2. malloc and free
malloc and free are functions in the standard C library (stdlib) that a program can call to allocate and deallocate memory in the heap. Heap memory must be explicitly allocated (malloc’ed) and deallocated (freed) by a C program.
To allocate heap memory, call malloc, passing in the total number of bytes of contiguous heap memory to allocate. Use the sizeof operator to compute the number of bytes to request. For example, to allocate space on the heap to store a single integer, a program could call:
// Determine the size of an integer and allocate that much heap space.
malloc(sizeof(int));
The malloc function returns the base address of the allocated heap memory to the caller (or NULL if an error occurs). Here’s a full example program with a call to malloc to allocate heap space to store a single int value:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p;

    p = malloc(sizeof(int));  // allocate heap memory for storing an int

    if (p != NULL) {
        *p = 6;   // the heap memory p points to gets the value 6
    }
}
The malloc function returns a void * type, which represents a generic pointer to a non-specified type (or to any type). When a program calls malloc and assigns the result to a pointer variable, the program associates the allocated memory with the type of the pointer variable.
Sometimes you may see calls to malloc that explicitly recast its return type from void * to match the type of the pointer variable. For example:
p = (int *) malloc(sizeof(int));
The (int *) before malloc tells the compiler that the void * type returned by malloc will be used as an int * in this call (it recasts the return type of malloc to an int *). We discuss type recasting and the void * type in more detail later in this chapter.
A call to malloc fails if there is not enough free heap memory to satisfy the requested number of bytes to allocate. Usually, malloc failing indicates an error in the program such as passing malloc a very large request, passing a negative number of bytes, or calling malloc in an infinite loop and running out of heap memory. Because any call to malloc can fail, you should always test its return value for NULL (indicating malloc failed) before dereferencing the pointer value. Dereferencing a NULL pointer will cause your program to crash! For example:
int *p;

p = malloc(sizeof(int));
if (p == NULL) {
    printf("Bad malloc error\n");
    exit(1);   // exit the program and indicate error
}
*p = 6;
When a program no longer needs the heap memory it dynamically allocated with malloc, it should explicitly deallocate the memory by calling the free function. It’s also a good idea to set the pointer’s value to NULL after calling free, so that if an error in the program causes it to be accidentally dereferenced after the call to free, the program will crash rather than modify parts of heap memory that have been reallocated by subsequent calls to malloc. Such unintended memory references can result in undefined program behavior that is often very difficult to debug, whereas a null pointer dereference will fail immediately, making it a relatively easy bug to find and to fix.
free(p);
p = NULL;
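Putting the pieces of this section together, the following sketch shows one complete allocate, check, use, and free cycle inside a single function. The function name heap_round_trip and its use of -1 as an error value are assumptions made for this example only.

```c
#include <stdlib.h>   /* malloc, free, NULL */

/* Hypothetical helper: stores value in a heap-allocated int, reads it
 * back, and frees the heap memory.
 * Returns the value read back, or -1 if malloc failed (note that -1 is
 * then ambiguous if the caller passes in -1). */
int heap_round_trip(int value) {
    int *p;
    int result;

    p = malloc(sizeof(int));  /* allocate heap memory for one int */
    if (p == NULL) {
        return -1;            /* malloc failed: do not dereference p */
    }

    *p = value;               /* store into the heap memory */
    result = *p;              /* read it back before freeing */

    free(p);                  /* deallocate the heap memory */
    p = NULL;                 /* p no longer refers to freed memory */

    return result;
}
```

The value is copied into the local variable result before the call to free, because the heap memory p points to must not be accessed after it has been freed.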
2.4.3. Dynamically Allocated Arrays and Strings
C programmers often dynamically allocate memory to store arrays. A successful call to malloc allocates one contiguous chunk of heap memory of the requested size. It returns the address of the start of this chunk of memory to the caller, making the returned address value suitable for the base address of a dynamically allocated array in heap memory.
To dynamically allocate space for an array of elements, pass malloc the total number of bytes in the desired array. That is, the program should request from malloc the total number of bytes in each array element times the number of elements in the array. Pass malloc an expression for the total number of bytes in the form of sizeof(<type>) * <number of elements>. For example:
int *arr;
char *c_arr;
// allocate an array of 20 ints on the heap:
arr = malloc(sizeof(int) * 20);
// allocate an array of 10 chars on the heap:
c_arr = malloc(sizeof(char) * 10);
After the calls to malloc in this example, the int pointer variable arr stores the base address of an array of 20 contiguous integer storage locations in heap memory, and the c_arr char pointer variable stores the base address of an array of 10 contiguous char storage locations in heap memory. Figure 18 depicts what this might look like.
Note that while malloc returns a pointer to dynamically allocated space in heap memory, C programs store the pointer to heap locations on the stack. The pointer variables contain only the base address (the starting address) of the array storage space in the heap. Just like statically declared arrays, the memory locations for dynamically allocated arrays are in contiguous memory locations. While a single call to malloc results in a chunk of memory of the requested number of bytes being allocated, multiple calls to malloc will not result in heap addresses that are contiguous (on most systems). In the example above, the char array elements and the int array elements may be at addresses that are far apart in the heap.
After dynamically allocating heap space for an array, a program can access the array through the pointer variable. Because the pointer variable’s value represents the base address of the array in the heap, we can use the same syntax to access elements in dynamically allocated arrays as we use to access elements in statically declared arrays. Here’s an example:
int i;
int s_array[20];
int *d_array;

d_array = malloc(sizeof(int) * 20);
if (d_array == NULL) {
    printf("Error: malloc failed\n");
    exit(1);
}

for (i=0; i < 20; i++) {
    s_array[i] = i;
    d_array[i] = i;
}

printf("%d %d \n", s_array[3], d_array[3]);  // prints 3 3
It may not be obvious why the same syntax can be used for accessing elements in dynamically allocated arrays as is used in accessing elements in statically declared arrays. However, even though their types are different, the values of s_array and d_array both evaluate to the base address of the array in memory.
Expression | Value | Type |
---|---|---|
s_array | base address of array in memory | (static) array of int |
d_array | base address of array in memory | int pointer (int *) |
Because the names of both variables evaluate to the base address of the array in memory (the address of the first element’s memory), the semantics of the [i] syntax following the name of the variable remain the same for both: [i] dereferences the int storage location at offset i from the base address of the array in memory; that is, it accesses the ith element.
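One place where the difference in types between the two variables becomes visible is the sizeof operator. The sketch below uses two hypothetical helper functions (static_array_bytes and pointer_bytes) to report what sizeof yields for each kind of variable.

```c
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* malloc, free */

/* s_array has type "array of 20 int", so sizeof reports the size of
 * the whole array: 20 * sizeof(int) bytes. */
size_t static_array_bytes(void) {
    int s_array[20];

    return sizeof(s_array);
}

/* d_array has type int *, so sizeof reports the size of the pointer
 * variable itself, not the size of the heap memory it points to. */
size_t pointer_bytes(void) {
    int *d_array;
    size_t n;

    d_array = malloc(sizeof(int) * 20);
    n = sizeof(d_array);
    free(d_array);   /* free(NULL) is harmless if malloc failed */

    return n;
}
```

Because sizeof applied to a pointer says nothing about how much heap memory was allocated, a program that uses dynamically allocated arrays needs to keep track of the number of elements itself (for example, in a separate variable).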
For most purposes, we recommend using the [i] syntax to access the elements of a dynamically allocated array. However, programs can also use the pointer dereferencing syntax (the * operator) to access array elements. For example, placing a * in front of a pointer that refers to a dynamically allocated array will dereference the pointer to access element 0 of the array:
/* these two statements are identical: both put 8 in index 0 */
d_array[0] = 8; // put 8 in index 0 of the d_array
*d_array = 8; // in the location pointed to by d_array store 8
The Arrays section describes arrays in more detail, and the Pointer Arithmetic section discusses accessing array elements through pointer variables.
When a program is finished using a dynamically allocated array, it should call
free
to deallocate the heap space. As mentioned earlier, we recommend
setting the pointer to NULL
after freeing it:
free(arr);
arr = NULL;
free(c_arr);
c_arr = NULL;
free(d_array);
d_array = NULL;
2.4.4. Pointers to Heap Memory and Functions
When passing a dynamically allocated array to a function, the pointer variable
argument’s value is passed to the function (i.e., the base address of the
array in the heap is passed to the function). Thus, when passing either
statically declared or dynamically allocated arrays to functions, the parameter
gets exactly the same value — the base address of the array in memory. As
a result, the same function can be used for statically and dynamically
allocated arrays of the same type, and identical syntax can be used inside the
function for accessing array elements. The parameter declarations int *arr
and int arr[]
are equivalent. However, by convention, the pointer syntax
tends to be used for functions that may be called with dynamically allocated
arrays:
int main(void) {
int *arr1;
arr1 = malloc(sizeof(int) * 10);
if (arr1 == NULL) {
printf("malloc error\n");
exit(1);
}
/* pass the value of arr1 (base address of array in heap) */
init_array(arr1, 10);
...
}
void init_array(int *arr, int size) {
int i;
for (i = 0; i < size; i++) {
arr[i] = i;
}
}
At the point just before returning from the init_array
function, the contents
of memory will look like Figure 19. Note that when main
passes
arr1
to init_array
, it’s passing only the base address of the array. The
array’s large block of contiguous memory remains on the heap, and the function
can access it by dereferencing the arr
pointer parameter. It also passes the
size of the array so that init_array
knows how many elements to access.
2.5. Arrays in C
In the previous chapter we introduced statically declared one-dimensional C arrays and discussed the semantics of passing arrays to functions. In the dynamic memory allocation section of this chapter, we introduced dynamically allocated one-dimensional arrays and discussed the semantics of passing them to functions.
In this section, we take a more in-depth look at arrays in C. We describe both statically and dynamically allocated arrays in more detail and discuss two-dimensional arrays.
2.5.1. Single-Dimensional Arrays
Statically Allocated
Before jumping into new content, we briefly summarize static arrays with an example. See the previous chapter for more detail on statically declared one-dimensional arrays.
Statically declared arrays are allocated either on the stack (for local variables) or in the data region of memory (for global variables). A programmer can declare an array variable by specifying its type (the type stored at each index) and its total capacity (number of elements).
When passing an array to a function, C copies the value of the base address to the parameter. That is, both the parameter and the argument refer to the same memory locations — the parameter pointer points to the argument’s array elements in memory. As a result, modifying the values stored in the array through an array parameter modifies the values stored in the argument array.
Here are some examples of static array declaration and use:
// declare arrays specifying their type and total capacity
float averages[30]; // array of float, 30 elements
char name[20]; // array of char, 20 elements
int i;
// access array elements
for (i = 0; i < 10; i++) {
averages[i] = 0.0 + i;
name[i] = 'a' + i;
}
name[10] = '\0'; // name is being used for storing a C-style string
// prints: 3 d abcdefghij
printf("%g %c %s\n", averages[3], name[3], name);
strcpy(name, "Hello");
printf("%s\n", name); // prints: Hello
Dynamically Allocated
In the Dynamic Memory Allocation section of this chapter, we introduced dynamically allocated one-dimensional arrays, including their access syntax and the syntax and semantics of passing dynamically allocated arrays to functions. Here, we present a short recap of that information with an example.
Calling the malloc
function dynamically allocates an array on the heap at
runtime. The address of the allocated heap space can be assigned to a global
or local pointer variable, which then points to the first element of the array.
To dynamically allocate space, pass malloc
the total number of bytes to
allocate for the array (using the sizeof
operator to get the size of a
specific type). A single call to malloc
allocates a contiguous chunk of heap
space of the requested size. For example:
// declare a pointer variable to point to allocated heap space
int *p_array;
double *d_array;
// call malloc to allocate the appropriate number of bytes for the array
p_array = malloc(sizeof(int) * 50); // allocate 50 ints
d_array = malloc(sizeof(double) * 100); // allocate 100 doubles
// always CHECK RETURN VALUE of functions and HANDLE ERROR return values
if ( (p_array == NULL) || (d_array == NULL) ) {
printf("ERROR: malloc failed!\n");
exit(1);
}
// use [] notation to access array elements
for (i = 0; i < 50; i++) {
p_array[i] = 0;
d_array[i] = 0.0;
}
// free heap space when done using it
free(p_array);
p_array = NULL;
free(d_array);
d_array = NULL;
Array Memory Layout
Whether an array is statically declared or dynamically allocated
via a single call to malloc
, array elements represent contiguous
memory locations (addresses):
array [0]:  base address
array [1]:  next address
array [2]:  next address
  ...            ...
array [99]: last address
The location of element i
is at an offset i
from the base address of the
array. The exact address of the ith element depends on the number of bytes of
the type stored in the array. For example, consider the following array
declarations:
int iarray[6]; // an array of six ints, each of which is four bytes
char carray[4]; // an array of four chars, each of which is one byte
The addresses of their individual array elements might look something like this:
addr   element
----   -------
1230:  iarray[0]
1234:  iarray[1]
1238:  iarray[2]
1242:  iarray[3]
1246:  iarray[4]
1250:  iarray[5]
     ...
1280:  carray[0]
1281:  carray[1]
1282:  carray[2]
1283:  carray[3]
In this example, 1230
is the base address of iarray
and 1280
the
base address of carray
. Note that individual elements of each
array are allocated to contiguous memory addresses: each element
of iarray
stores a 4-byte int
value, so its element addresses differ
by 4, and each element of carray
stores a 1-byte char
value, so its
addresses differ by 1. There is no guarantee that local
variables are allocated to contiguous memory locations on the stack
(hence, there could be a gap in the addresses between the end of
iarray
and the start of carray
, as shown in this example).
Programs often use a constant, rather than a literal numeric value, to define the total capacity of an array. Constants are aliases for C literal values; using them instead of literals makes code easier to read and easier to update. See C Constants to learn more about defining and using C constants.
2.5.2. Two-Dimensional Arrays
C supports multidimensional arrays, but we limit our discussion of multidimensional arrays to two-dimensional (2D) arrays, since 1D and 2D arrays are the most commonly used by C programmers.
Statically Allocated 2D Arrays
To statically declare a multidimensional array variable, specify the size of each dimension. For example:
int matrix[50][100];
short little[10][10];
Here, matrix
is a 2D array of int
values with 50 rows and 100 columns, and
little
is a 2D array of short
values with 10 rows and 10 columns.
To access an individual element, indicate both the row and the column index:
int val;
short num;
val = matrix[3][7]; // get int value in row 3, column 7 of matrix
num = little[8][4]; // get short value in row 8, column 4 of little
Figure 20 illustrates the 2D array as a matrix of integer values, where a specific element in the 2D array is indexed by row and column index values.
Programs often access the elements of a 2D array by iterating with
nested loops. For example, the following nested loop initializes all elements
in matrix
to 0:
int i, j;
for (i = 0; i < 50; i++) { // for each row i
for (j = 0; j < 100; j++) { // iterate over each column element in row i
matrix[i][j] = 0;
}
}
Two-Dimensional Array Parameters
The same rules for passing one-dimensional array arguments to functions apply
to passing two-dimensional array arguments: the parameter gets the value of the
base address of the 2D array (&arr[0][0]
). In other words, the parameter
points to the argument’s array elements and therefore the function can change
values stored in the passed array.
For multidimensional array parameters, you must indicate that the parameter is a multidimensional array, but you can leave the size of the first dimension unspecified (for good generic design). The sizes of other dimensions must be fully specified so that the compiler can generate the correct offsets into the array. Here’s a 2D example:
// a C constant definition: COLS is defined to be the value 100
#define COLS (100)
/*
* init_matrix: initializes the passed matrix elements to the
* product of their index values
* m: a 2D array (the column dimension must be 100)
* rows: the number of rows in the matrix
* return: does not return a value
*/
void init_matrix(int m[][COLS], int rows) {
int i, j;
for (i = 0; i < rows; i++) {
for (j = 0; j < COLS; j++) {
m[i][j] = i*j;
}
}
}
int main(void) {
int matrix[50][COLS];
int bigger[90][COLS];
init_matrix(matrix, 50);
init_matrix(bigger, 90);
...
Both the matrix
and the bigger
arrays can be passed as arguments to the
init_matrix
function because they have the same column dimension as
the parameter definition.
The column dimension must be specified in the parameter definition of a 2D array so that the compiler can calculate the offset from the base address of the 2D array to the start of a particular row of elements. The offset calculation follows from the layout of 2D arrays in memory.
Two-Dimensional Array Memory Layout
Statically allocated 2D arrays are arranged in memory in row-major order, meaning that all of row 0’s elements come first, followed by all of row 1’s elements, and so on. For example, given the following declaration of a 2D array of integers:
int arr[3][4]; // int array with 3 rows and 4 columns
its layout in memory might look like Figure 21.
Note that all array elements are allocated to contiguous memory addresses.
That is, the base address of the 2D array is the memory address of the [0][0]
element (&arr[0][0]
), and subsequent elements are stored contiguously in
row-major order (e.g., the entirety of row 1 is followed immediately by the entirety
of row 2, and so on).
Dynamically Allocated 2D Arrays
Dynamically allocated 2D arrays can be allocated in two ways. For an NxM 2D array, either:
-
Make a single call to
malloc
, allocating one large chunk of heap space to store all NxM array elements.
-
Make multiple calls to
malloc
, allocating an array of arrays. First, allocate a 1D array of N pointers to the element type, with one pointer for each row in the 2D array. Then, allocate N 1D arrays of size M to store the set of column values for each row in the 2D array. Assign the addresses of each of these N arrays to the elements of the first array of N pointers.
The variable declarations, allocation code, and array element access syntax differ depending on which of these two methods a programmer chooses to use.
Method 1: Memory-Efficient Allocation
In this method, a single call to malloc
allocates the total number of bytes
needed to store the NxM array of values. This method has the benefit
of being more memory efficient because the entire space for all NxM
elements will be allocated at once, in contiguous memory locations.
The call to malloc
returns the starting address of the allocated space (the base
address of the array), which (like a 1D array) should be stored in a pointer
variable. In fact, there is no semantic difference between allocating a 1D or
2D array using this method: the call to malloc
returns the starting address of a
contiguously allocated chunk of heap memory of the requested number of bytes.
Because allocation of a 2D array using this method looks just like allocation
for a 1D array, the programmer has to explicitly map 2D row and column indexing
on top of this single chunk of heap memory space (the compiler has no implicit
notion of rows or columns and thus cannot interpret double indexing syntax into
this malloc’ed space).
Here’s an example C code snippet that dynamically allocates a 2D array using method 1:
#define N 3
#define M 4
int main(void) {
int *two_d_array; // the type is a pointer to an int (the element type)
// allocate in a single malloc of N x M int-sized elements:
two_d_array = malloc(sizeof(int) * N * M);
if (two_d_array == NULL) {
printf("ERROR: malloc failed!\n");
exit(1);
}
...
Figure 22 shows an example of allocating a 2D array using this method
and illustrates what memory might look like after the call to malloc
.
Like 1D dynamically allocated arrays, the pointer variable for a 2D array is
allocated on the stack. That pointer is then assigned the value returned by
the call to malloc
, which represents the base address of the contiguous chunk
of NxM int
storage locations in the heap memory.
Because this method uses a single chunk of malloc’ed space for the 2D array,
the memory allocation is as efficient as possible (it only requires one call to
malloc
for the entire 2D array). Element access is also more efficient:
all elements are located close together in contiguous memory,
and each access requires only a single level of indirection from the pointer
variable.
However, the C compiler does not know the difference between a 2D and a
1D array allocation using this method. As a result,
the double indexing syntax ([i][j]
) of statically declared 2D arrays cannot
be used when allocating a 2D array using this method. Instead, the
programmer must explicitly compute the offset into the contiguous chunk
of heap memory using a function of row and column index values ([i*M + j]
,
where M
is the column dimension).
Here’s an example of how a programmer would structure code to initialize all the elements of a 2D array:
// access using [] notation:
// cannot use [i][j] syntax because the compiler has no idea where the
// next row starts within this chunk of heap space, so the programmer
// must explicitly add a function of row and column index values
// (i*M+j) to map their 2D view of the space into the 1D chunk of memory
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
two_d_array[i*M + j] = 0;
}
}
Method 1 (Single malloc) and Function Parameters
The base address of an array of int
types allocated via a single malloc
is
a pointer to an int
, so it can be passed to a function with an (int *
)
parameter. Additionally, the function must be passed row and column dimensions
so that it can correctly compute offsets into the 2D array. For example:
/*
* initialize all elements in a 2D array to 0
* arr: the array
* rows: number of rows
* cols: number of columns
*/
void init2D(int *arr, int rows, int cols) {
int i, j;
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
arr[i*cols + j] = 0;
}
}
}
int main(void) {
int *array;
array = malloc(sizeof(int) * N * M);
if (array != NULL) {
init2D(array, N, M);
}
...
Method 2: The Programmer-Friendly Way
The second method for dynamically allocating a 2D array stores the array as
an array of N 1D arrays (one 1D array per row). It requires N+1 calls to
malloc
: one malloc
for the array of row arrays, and one malloc
for each
of the N rows’
column arrays. As a result, the element locations within a row are
contiguous, but elements are not contiguous across rows of the 2D array.
Allocation and element access are not as efficient as in method 1, and the type
definitions for variables can be a bit more confusing. However, using this
method, a programmer can use double indexing syntax to access individual
elements of the 2D array (the first index is an index into the array of rows,
the second index is an index into the array of column elements within that row).
Here is an example of allocating a 2D array using method 2 (with the error detection and handling code removed for readability):
// the 2D array variable is declared to be `int **` (a pointer to an int *)
// a dynamically allocated array of dynamically allocated int arrays
// (a pointer to pointers to ints)
int **two_d_array;
int i;
// allocate an array of N pointers to ints
// malloc returns the address of this array (a pointer to (int *)'s)
two_d_array = malloc(sizeof(int *) * N);
// for each row, malloc space for its column elements and add it to
// the array of arrays
for (i = 0; i < N; i++) {
// malloc space for row i's M column elements
two_d_array[i] = malloc(sizeof(int) * M);
}
In this example, note the types of the variables and the sizes passed to the
calls to malloc
. To refer to the dynamically allocated 2D array, the
programmer declares a variable (two_d_array
) of type int **
that will store
the address of a dynamically allocated array of int *
element values. Each
element in two_d_array
stores the address of a dynamically allocated array of
int
values (the type of two_d_array[i]
is int *
).
Figure 23 shows what memory might look like after the above
example’s N+1 calls to malloc
.
Note that when using this method, only the elements allocated as part of a
single call to malloc
are contiguous in memory. That is, elements within
each row are contiguous, but elements from different rows (even neighboring
rows) are not.
Once allocated, individual elements of the 2D array can be accessed using
double-indexing notation. The first index specifies an element in the outer
array of int *
pointers (which row), and the second index specifies an
element in the inner int
array (which column within the row).
int i, j;
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
two_d_array[i][j] = 0;
}
}
To understand how double indexing is evaluated, consider the type and value of the following parts of the expression:
two_d_array: an array of int pointers; it stores the base address of an array of (int *) values. Its type is int** (a pointer to int *).
two_d_array[i]: the ith index into the array of arrays; it stores an (int *) value that represents the base address of an array of (int) values. Its type is int*.
two_d_array[i][j]: the jth element pointed to by the ith element of the array of arrays; it stores an int value (the value in row i, column j of the 2D array). Its type is int.
Method 2 (An Array of Arrays) and Function Parameters
The array argument’s type is int **
(a pointer to a pointer to an int
), and
the function parameter matches its argument’s type. Additionally, row and
column sizes should be passed to the function. Because this is a different
type from method 1, the two kinds of arrays cannot share a common function (they are not
the same C type).
Here’s an example function that takes a method 2 (array of arrays) 2D array as a parameter:
/*
* initialize a 2D array
* arr: the array
* rows: number of rows
* cols: number of columns
*/
void init2D_Method2(int **arr, int rows, int cols) {
int i,j;
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
arr[i][j] = 0;
}
}
}
/*
* main: example of calling init2D_Method2
*/
int main(void) {
int **two_d_array;
// some code to allocate the row array and multiple col arrays
// ...
init2D_Method2(two_d_array, N, M);
...
Here, the function implementation can use double-indexing syntax. Unlike
statically declared 2D arrays, both the row and column dimensions need
to be passed as parameters: the rows
parameter specifies the bounds on
the outermost array (the array of row arrays), and the cols
parameter
specifies the bounds on the inner arrays (the arrays of column values for each
row).
2.6. Strings and the String Library
In the previous chapter we introduced arrays and strings in C. In this chapter we discuss dynamically allocated C strings and their use with the C string library. We first give a brief overview of statically declared strings.
2.6.1. C’s Support for Statically Allocated Strings (Arrays of char)
C does not support a separate string type, but a string can be implemented in C
programs using an array of char
values that is terminated by a special null
character value '\0'
. The terminating null character identifies
the end of the sequence of character values that make up a string. Not every
character array is a C string, but every C string is an array of char
values.
Because strings frequently appear in programs, C provides
libraries with functions for manipulating strings. Programs that use
the C string library need to include string.h
. Most string
library functions require the programmer to allocate space for
the array of characters that the functions manipulate.
When printing out the value of a string, use the %s
placeholder.
Here’s an example program that uses strings and some string library functions:
#include <stdio.h>
#include <string.h> // include the C string library
int main(void) {
char str1[10];
char str2[10];
str1[0] = 'h';
str1[1] = 'i';
str1[2] = '\0'; // explicitly add null terminating character to end
// strcpy copies the bytes from the source parameter (str1) to the
// destination parameter (str2) and null terminates the copy.
strcpy(str2, str1);
str2[1] = 'o';
printf("%s %s\n", str1, str2); // prints: hi ho
return 0;
}
2.6.2. Dynamically Allocating Strings
Arrays of characters can be dynamically allocated (as discussed in the
Pointers and
Arrays sections). When dynamically
allocating space to store a string, it’s important to remember to allocate
space in the array for the terminating '\0'
character at the end of the
string.
The following example program demonstrates static and dynamically
allocated strings (note the value passed to malloc
):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
int size;
char str[64]; // statically allocated
char *new_str = NULL; // for dynamically allocated
strcpy(str, "Hello");
size = strlen(str); // returns 5
new_str = malloc(sizeof(char) * (size+1)); // need space for '\0'
if(new_str == NULL) {
printf("Error: malloc failed! exiting.\n");
exit(1);
}
strcpy(new_str, str);
printf("%s %s\n", str, new_str); // prints "Hello Hello"
strcat(str, " There"); // concatenate " There" to the end of str
printf("%s\n", str); // prints "Hello There"
free(new_str); // free malloc'ed space when done
new_str = NULL;
return 0;
}
C String Functions and Destination Memory
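Many C string functions (notably strcpy and strcat) write their results into a destination array supplied by the caller, so the caller must make sure that the destination has enough memory to store the result. Failure to allocate enough memory will yield undefined results that range from
program crashes to major
security vulnerabilities.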
2.6.3. Libraries for Manipulating C Strings and Characters
C provides several libraries with functions for manipulating strings and
characters. The string library (string.h
) is particularly useful
when writing programs that use C strings. The stdlib.h
and stdio.h
libraries also contain functions for string manipulation, and the ctype.h
library contains functions for manipulating individual character values.
When using C string library functions, it’s important to remember that most do
not allocate space for the strings they manipulate, nor do they check that you
pass in valid strings; your program must allocate space for the strings that
the C string library will use. Furthermore, if the library function modifies
the passed string, the caller needs to ensure that the string is correctly
formatted (that is, it has a terminating \0
character at the end). Calling
string library functions with bad array argument values will often cause a
program to crash. The documentation (for example, manual pages) for different
library functions specifies whether the library function allocates space or if
the caller is responsible for passing in allocated space to the library
function.
char[] and char * Parameters and char * Return Type
Both statically declared and dynamically allocated arrays of characters can be
passed to a char * parameter. If a function returns a string, its return type is char *.
strlen, strcpy, strncpy
The string library provides functions for copying strings and finding the length of a string:
// returns the number of characters in the string (not including the null character)
// (the return type, size_t, is an unsigned integer type)
size_t strlen(const char *s);
// copies string src to string dst up until the first '\0' character in src
// (the caller needs to make sure src is initialized correctly and
// dst has enough space to store a copy of the src string)
// returns the address of the dst string
char *strcpy(char *dst, const char *src);
// like strcpy but copies up to the first '\0' or size characters
// (this provides some safety to not copy beyond the bounds of the dst
// array if the src string is not well formed or is longer than the
// space available in the dst array)
char *strncpy(char *dst, const char *src, size_t size);
The strcpy
function is unsafe to use in situations when the source string
might be longer than the total capacity of the destination string. In this
case, one should use strncpy
. The size
parameter stops strncpy
from
copying more than size
characters from the src
string into the dst
string. When the length of the src
string is greater than or equal to
size
, strncpy
copies the first size
characters from src
to dst
and does not add a null character to the end of dst
. As a result,
the programmer should explicitly add a null character to
the end of dst
after calling strncpy
.
Here are some example uses of these functions in a program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // include the string library
int main(void) {
// variable declarations that will be used in examples
int len, i, ret;
char str[32];
char *d_str, *ptr;
strcpy(str, "Hello There");
len = strlen(str); // len is 11
d_str = malloc(sizeof(char) * (len+1));
if (d_str == NULL) {
printf("Error: malloc failed\n");
exit(1);
}
strncpy(d_str, str, 5);
d_str[5] = '\0'; // explicitly add null terminating character to end
printf("%zu:%s\n", strlen(str), str); // prints 11:Hello There (strlen returns a size_t, so use %zu)
printf("%zu:%s\n", strlen(d_str), d_str); // prints 5:Hello
free(d_str);
return 0;
}
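strlcpy
The strlcpy function is like strncpy, except that it always null terminates the destination string:
// like strncpy but copies up to the first '\0' or size-1 characters
// and null terminates the dest string (if size > 0).
// returns the length of the src string (a size_t, not a pointer to dest)
size_t strlcpy(char *dest, const char *src, size_t size);
Linux’s GNU C library added strlcpy in version 2.38, so it is not available on every system. On systems where strlcpy is available, this call to strncpy:
// copy up to 5 chars from str to d_str
strncpy(d_str, str, 5);
d_str[5] = '\0'; // explicitly add null terminating character to end
could be replaced with this call to strlcpy:
// copy up to 5 chars from str to d_str
strlcpy(d_str, str, 6); // strlcpy always adds '\0' to the end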
strcmp, strncmp
The string library also provides a function to compare two strings.
Comparing string variables using the ==
operator does not compare
the characters in the strings — it compares only the base addresses
of the two strings. For example, the expression:
if (d_str == str) { ...
compares the base address of the char
array in the heap pointed to by
d_str
to the base address of the str
char
array allocated on the
stack.
To compare the values of the strings, a programmer needs to
either write code by hand to compare corresponding element values,
or use the strcmp
or strncmp
functions from the string library:
int strcmp(const char *s1, const char *s2);
// returns 0 if s1 and s2 are the same strings
// a value < 0 if s1 is less than s2
// a value > 0 if s1 is greater than s2
int strncmp(const char *s1, const char *s2, size_t n);
// compare s1 and s2 up to at most n characters
The strcmp
function compares strings character by character based on their
ASCII representation. In
other words, it compares the char
values in corresponding positions of the
two parameter arrays to produce the result of the string comparison, which
occasionally yields unintuitive results. For example, the ASCII encoding for
the char
value 'a'
is larger than the encoding for the char
value 'Z'
. As
a result, strcmp("aaa", "Zoo")
returns a positive value indicating that
"aaa"
is greater than "Zoo"
, and a call to strcmp("aaa", "zoo")
returns a
negative value indicating that "aaa"
is less than "zoo"
.
Here are some string comparison examples:
strcpy(str, "alligator");
strcpy(d_str, "Zebra");
ret = strcmp(str,d_str);
if (ret == 0) {
printf("%s is equal to %s\n", str, d_str);
} else if (ret < 0) {
printf("%s is less than %s\n", str, d_str);
} else {
printf("%s is greater than %s\n", str, d_str); // true for these strings
}
ret = strncmp(str, "all", 3); // returns 0: they are equal up to first 3 chars
strcat, strstr, strchr
String library functions can concatenate strings (note that it’s up to the caller to ensure that the destination string has enough space to store the result):
// append chars from src to end of dst
// returns ptr to dst and adds '\0' to end
char *strcat(char *dst, const char *src);
// append the first chars from src to end of dst, up to a maximum of size
// returns ptr to dst and adds '\0' to end
char *strncat(char *dst, const char *src, size_t size);
It also provides functions for finding substrings or character values in strings:
// locate a substring inside a string
// (const means that the function doesn't modify string)
// returns a pointer to the beginning of substr in string
// returns NULL if substr not in string
char *strstr(const char *string, const char *substr);
// locate a character (c) in the passed string (s)
// (const means that the function doesn't modify s)
// returns a pointer to the first occurrence of the char c in string
// or NULL if c is not in the string
char *strchr(const char *s, int c);
Here are some examples using these functions (we omit some error handling for the sake of readability):
char str[32];
char *ptr;
strcpy(str, "Zebra fish");
strcat(str, " stripes"); // str gets "Zebra fish stripes"
printf("%s\n", str); // prints: Zebra fish stripes
strncat(str, " are black.", 8);
printf("%s\n", str); // prints: Zebra fish stripes are bla (spaces count)
ptr = strstr(str, "trip");
if (ptr != NULL) {
printf("%s\n", ptr); // prints: tripes are bla
}
ptr = strchr(str, 'e');
if (ptr != NULL) {
printf("%s\n", ptr); // prints: ebra fish stripes are bla
}
Calls to strchr
and strstr
return the address of the first element in the
parameter array with a matching character value or a matching
substring value, respectively. This element address is the start of an array of char
values terminated by a \0
character. In other words, ptr
points to the
beginning of a substring inside another string. When printing the value of
ptr
as a string with printf
, the character values starting at the index
pointed to by ptr
are printed, yielding the results listed above.
strtok, strtok_r
The string library also provides functions that divide a string into tokens. A token refers to a subsequence of characters in a string separated by any number of delimiter characters of the programmer’s choosing.
char *strtok(char *str, const char *delim);
// a reentrant version of strtok (reentrant is defined in later chapters):
char *strtok_r(char *str, const char *delim, char **saveptr);
The strtok
(or strtok_r
) functions find individual tokens within a larger
string. For example, setting strtok
's delimiters to the set of whitespace
characters yields words in a string that originally contains an English
sentence. That is, each word in the sentence is a token in the string.
Below is an example program that uses strtok to find individual words as the
tokens in an input string (it is also available as strtokexample.c).
/*
* Extract whitespace-delimited tokens from a line of input
* and print them one per line.
*
* to compile:
* gcc -g -Wall strtokexample.c
*
* example run:
* Enter a line of text: aaaaa bbbbbbbbb cccccc
*
* The input line is:
* aaaaa bbbbbbbbb cccccc
* Next token is aaaaa
* Next token is bbbbbbbbb
* Next token is cccccc
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void) {
/* whitespace stores the delim string passed to strtok. The delim
* string is initialized to the set of characters that delimit tokens
* We initialize the delim string to the following set of chars:
* ' ': space '\t': tab '\f': form feed '\r': carriage return
* '\v': vertical tab '\n': new line
* (run "man ascii" to list all ASCII characters)
*
* This line shows one way to statically initialize a string variable
* (using this method the string contents are constant, meaning that they
* cannot be modified, which is fine for the way we are using the
* whitespace string in this program).
*/
char *whitespace = " \t\f\r\v\n"; /* Note the space char at beginning */
char *token; /* The next token in the line. */
char *line; /* The line of text read in that we will tokenize. */
/* Allocate some space for the user's string on the heap. */
line = malloc(200 * sizeof(char));
if (line == NULL) {
printf("Error: malloc failed\n");
exit(1);
}
/* Read in a line entered by the user from "standard in". */
printf("Enter a line of text:\n");
line = fgets(line, 200 * sizeof(char), stdin);
if (line == NULL) {
printf("Error: reading input failed, exiting...\n");
exit(1);
}
printf("The input line is:\n%s\n", line);
/* Divide the string into tokens. */
token = strtok(line, whitespace); /* get the first token */
while (token != NULL) {
printf("Next token is %s\n", token);
token = strtok(NULL, whitespace); /* get the next token */
}
free(line);
return 0;
}
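One detail worth keeping in mind is that strtok modifies the string it tokenizes: it overwrites the delimiter that ends each token with a '\0' character. A function that must leave its input intact can therefore tokenize a copy. The sketch below illustrates this under stated assumptions; count_tokens and MAX_LINE are hypothetical names chosen for this example.

```c
#include <string.h>

#define MAX_LINE 200   /* assumed maximum input length for this sketch */

/* Hypothetical helper: returns the number of whitespace-delimited
 * tokens in text, or -1 if text does not fit in the local buffer. */
int count_tokens(const char *text) {
    char copy[MAX_LINE];
    const char *whitespace = " \t\f\r\v\n";
    char *token;
    int count = 0;

    if (strlen(text) >= sizeof(copy)) {
        return -1;
    }
    /* strtok writes '\0' over delimiters, so tokenize a private copy */
    strcpy(copy, text);

    token = strtok(copy, whitespace);       /* get the first token */
    while (token != NULL) {
        count++;
        token = strtok(NULL, whitespace);   /* get the next token */
    }
    return count;
}
```

Because the function works on copy, the caller's string is unchanged after the call, which also allows passing a string literal.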
sprintf
The C stdio
library also provides functions that manipulate C strings.
Perhaps the most useful is the sprintf
function, which "prints" into a string
rather than printing output to a terminal:
// like printf(), the format string allows for placeholders like %d, %f, etc.
// pass parameters after the format string to fill them in
int sprintf(char *s, const char *format, ...);
sprintf initializes the contents of a string from values of various types.
Its format parameter works like the format strings of printf and scanf. Here are some
examples:
char str[64];
float ave = 76.8;
int num = 2;
// initialize str to format string, filling in each placeholder with
// a char representation of its arguments' values
sprintf(str, "%s is %d years old and in grade %d", "Henry", 12, 7);
printf("%s\n", str); // prints: Henry is 12 years old and in grade 7
sprintf(str, "The average grade on exam %d is %g", num, ave);
printf("%s\n", str); // prints: The average grade on exam 2 is 76.8
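Note that sprintf does not know the size of the destination array, so the caller must make sure the formatted result fits. The standard library's snprintf takes the buffer size as an extra argument and never writes past it. Here is a minimal sketch; the wrapper name describe_student is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats a description into buf, writing at most
 * size bytes (including the terminating '\0').
 * Returns the number of characters the full result needs, not counting
 * the '\0', so a return value >= size means the output was truncated. */
int describe_student(char *buf, size_t size, const char *name, int age) {
    return snprintf(buf, size, "%s is %d years old", name, age);
}
```

With a 64-byte buffer, describe_student(buf, sizeof(buf), "Henry", 12) stores "Henry is 12 years old" and returns 21. With an 8-byte buffer, the same call stores only "Henry i" but still returns 21, which signals the truncation.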
Functions for Individual Character Values
The C library's ctype.h header declares a set of functions
for manipulating and testing individual char values, including:
#include <ctype.h> // include ctype.h to use these
int islower(int ch);
int isupper(int ch); // these functions return a non-zero value if the
int isalpha(int ch); // test is TRUE, otherwise they return 0 (FALSE)
int isdigit(int ch);
int isalnum(int ch);
int ispunct(int ch);
int isspace(int ch);
int tolower(int ch); // returns ASCII value of lower-case of argument
int toupper(int ch); // returns ASCII value of upper-case of argument
Here are some examples of their use:
char str[64];
int len, i;
strcpy(str, "I see 20 ZEBRAS, GOATS, and COWS");
if ( islower(str[2]) ){
printf("%c is lower case\n", str[2]); // prints: s is lower case
}
len = strlen(str);
for (i = 0; i < len; i++) {
if ( isupper(str[i]) ) {
str[i] = tolower(str[i]);
} else if( isdigit(str[i]) ) {
str[i] = 'X';
}
}
printf("%s\n", str); // prints: i see XX zebras, goats, and cows
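These functions take an int argument whose value must be representable as an unsigned char (or be EOF). On systems where char is signed, a common defensive idiom is to cast each char to unsigned char before passing it. The following sketch shows the idiom; count_digits and uppercase_in_place are hypothetical helper names.

```c
#include <ctype.h>

/* Hypothetical helper: counts the decimal digit characters in s. */
int count_digits(const char *s) {
    int count = 0;
    int i;
    for (i = 0; s[i] != '\0'; i++) {
        if (isdigit((unsigned char)s[i])) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: converts every lower-case letter in s to upper case.
 * toupper returns an int, so the result is cast back to char. */
void uppercase_in_place(char *s) {
    int i;
    for (i = 0; s[i] != '\0'; i++) {
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```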
Functions to Convert Strings to Other Types
stdlib.h
also contains functions to convert between
strings and other C types. For example:
#include <stdlib.h>
int atoi(const char *nptr); // convert a string to an integer
double atof(const char *nptr); // convert a string to a double
Here’s an example:
printf("%d %g\n", atoi("1234"), atof("4.56"));
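The atoi and atof functions have no way to report that a string did not contain a valid number. When input needs to be validated, stdlib.h also provides strtol, which reports where the conversion stopped. The sketch below shows one way to use it; the wrapper name parse_int and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the whole string s to an int.
 * Returns 1 and stores the value in *out on success.
 * Returns 0 if s is empty, has trailing non-numeric characters,
 * or holds a value that does not fit in an int. */
int parse_int(const char *s, int *out) {
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);   /* end points just past the digits read */
    if (end == s || *end != '\0') {
        return 0;                /* nothing converted, or leftover chars */
    }
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
        return 0;                /* out of range */
    }
    *out = (int)val;
    return 1;
}
```

For example, parse_int("1234", &n) returns 1 and sets n to 1234, whereas parse_int("12ab", &n) returns 0 and leaves n unchanged.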
For more information about these and other C library functions (including
what they do, their parameter format, what they return, and which
headers need to be included to use them), see their
man pages.
For example, to view the strcpy
man page, run:
$ man strcpy
2.7. C Structs
In the previous chapter we introduced C struct types. In this chapter we dive deeper into C structs, examine statically and dynamically allocated structs, and combine structs and pointers to create more complex data types and data structures.
We begin with a quick overview of statically declared structs. See the previous chapter for more details.
2.7.1. Review of the C struct Type
A struct type represents a heterogeneous collection of data; it’s a mechanism for treating a set of different types as a single, coherent unit.
There are three steps to defining and using struct
types in C programs:
- Define a struct type that defines the field values and their types.
- Declare variables of the struct type.
- Use dot notation to access individual field values in the variable.
In C, structs are lvalues (they can appear on the left-hand side of
an assignment statement). The value of a struct
variable is the contents
of its memory (all of the bytes making up its field values). When
calling functions with struct
parameters, the value of the struct
argument
(a copy of all of the bytes of all of its fields) gets copied to the
struct
function parameter.
When programming with structs, and in particular when combining structs and arrays, it’s
critical to carefully consider the type of every expression. Each field in a struct
represents a
specific type, and the syntax for accessing field values and the semantics of
passing individual field values to functions follow those of their specific
type.
The following full example program
demonstrates defining a struct
type, declaring variables of that type,
accessing field values, and passing structs and individual field values to
functions. (We omit some error handling and comments for readability).
#include <stdio.h>
#include <string.h>
/* define a new struct type (outside function bodies) */
struct studentT {
char name[64];
int age;
float gpa;
int grad_yr;
};
/* function prototypes */
int checkID(struct studentT s1, int min_age);
void changeName(char *old, char *new);
int main(void) {
int can_vote;
// declare variables of struct type:
struct studentT student1, student2;
// access field values using .
strcpy(student1.name, "Ruth");
student1.age = 17;
student1.gpa = 3.5;
student1.grad_yr = 2021;
// structs are lvalues
student2 = student1;
strcpy(student2.name, "Frances");
student2.age = student1.age + 4;
// passing a struct
can_vote = checkID(student1, 18);
printf("%s %d\n", student1.name, can_vote);
can_vote = checkID(student2, 18);
printf("%s %d\n", student2.name, can_vote);
// passing a struct field value
changeName(student2.name, "Kwame");
printf("student 2's name is now %s\n", student2.name);
return 0;
}
int checkID(struct studentT s, int min_age) {
int ret = 1;
if (s.age < min_age) {
ret = 0;
// changes age field IN PARAMETER COPY ONLY
s.age = min_age + 1;
}
return ret;
}
void changeName(char *old, char *new) {
if ((old == NULL) || (new == NULL)) {
return;
}
strcpy(old,new);
}
When run, the program produces:
Ruth 0
Frances 1
student 2's name is now Kwame
When working with structs, it’s particularly important to think about the types
of the struct
and its fields. For example, when passing a struct
to a
function, the parameter gets a copy of the struct’s value (a copy of all bytes
from the argument). Consequently, changes to the parameter’s field values do
not change the argument’s value. This behavior is illustrated in the
preceding program in the
call to checkID
, which modifies the parameter’s age field. The changes in
checkID
have no effect on the corresponding argument’s age field value.
When passing a field of a struct
to a function, the semantics match the type
of the field (the type of the function’s parameter). For example, in the call
to changeName
, the value of the name
field (the base address of the name
array inside the student2
struct) gets copied to the parameter old
, meaning
that the parameter refers to the same set of array elements in memory as its
argument. Thus, changing an element of the array in the function also changes
the element’s value in the argument; the semantics of passing the name
field
match the type of the name
field.
2.7.2. Pointers and Structs
Just like other C types, programmers can declare a variable as a pointer to a
user-defined struct
type. The semantics of using a struct
pointer variable
resemble those of other pointer types such as int *
.
Consider the struct studentT
type introduced in the previous
program example:
struct studentT {
char name[64];
int age;
float gpa;
int grad_yr;
};
A programmer can declare variables of type
struct studentT
or struct studentT *
(a pointer to a struct studentT
):
struct studentT s;
struct studentT *sptr;
// think very carefully about the type of each field when
// accessing it (name is an array of char, age is an int ...)
strcpy(s.name, "Freya");
s.age = 18;
s.gpa = 4.0;
s.grad_yr = 2020;
// malloc space for a struct studentT for sptr to point to:
sptr = malloc(sizeof(struct studentT));
if (sptr == NULL) {
printf("Error: malloc failed\n");
exit(1);
}
Note that the call to malloc initializes sptr to point to a dynamically
allocated struct in heap memory. Using the sizeof operator to compute
malloc’s size request (e.g., sizeof(struct studentT)) ensures that malloc
allocates space for all of the field values in the struct.
To access individual fields in a pointer to a struct
, the pointer variable
first needs to be dereferenced. Based on the rules for
pointer dereferencing, you may be
tempted to access struct
fields like so:
// the grad_yr field of what sptr points to gets 2021:
(*sptr).grad_yr = 2021;
// the age field of what sptr points to gets s.age plus 1:
(*sptr).age = s.age + 1;
However, because pointers to structs are so commonly used, C provides a special
operator (->) that both dereferences a struct pointer and accesses one of its field
values. For example, sptr->grad_yr is equivalent to (*sptr).grad_yr. Here are
some examples of accessing field values using this notation:
// the gpa field of what sptr points to gets 3.5:
sptr->gpa = 3.5;
// the name field of what sptr points to is an array of char
// (can use strcpy to init its value):
strcpy(sptr->name, "Lars");
Figure 24 sketches what the variables s
and sptr
may look like in
memory after the code above executes. Recall that malloc
allocates
memory from the heap, and local variables are allocated on the stack.
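Struct pointers are also how a function can modify its caller's struct. The sketch below contrasts a by-value parameter with a pointer parameter; the function names birthdayCopy and birthday are hypothetical, and the struct definition is repeated so that the example is self-contained.

```c
#include <stddef.h>

struct studentT {
    char name[64];
    int age;
    float gpa;
    int grad_yr;
};

/* The parameter s is a copy of the argument: this change is lost
 * when the function returns. */
void birthdayCopy(struct studentT s) {
    s.age += 1;
}

/* The parameter s holds the address of the caller's struct:
 * this change is visible to the caller. */
void birthday(struct studentT *s) {
    if (s != NULL) {
        s->age += 1;
    }
}
```

Given a struct studentT variable s1 whose age field is 17, calling birthdayCopy(s1) leaves s1.age at 17, whereas calling birthday(&s1) changes it to 18.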
2.7.3. Pointer Fields in Structs
Structs can also be defined to have pointer types as field values. For example:
struct personT {
char *name; // for a dynamically allocated string field
int age;
};
int main(void) {
struct personT p1, *p2;
// need to malloc space for the name field:
p1.name = malloc(sizeof(char) * 8);
strcpy(p1.name, "Zhichen");
p1.age = 22;
// first malloc space for the struct:
p2 = malloc(sizeof(struct personT));
// then malloc space for the name field:
p2->name = malloc(sizeof(char) * 4);
strcpy(p2->name, "Vic");
p2->age = 19;
...
// Note: for strings, we must allocate one extra byte to hold the
// terminating null character that marks the end of the string.
}
In memory, these variables will look like Figure 25 (note which parts are allocated on the stack and which are on the heap).
As structs and the types of their fields increase in complexity, be careful
with their syntax. To access field values appropriately, start from the
outermost variable type and use its type syntax to access individual parts.
For example, the types of the struct
variables shown in Table 11
govern how a programmer should access their fields.
Expression | Type | Field Access Syntax |
---|---|---|
p1 | struct personT | p1.age, p1.name |
p2 | struct personT * | p2->age, p2->name |
Further, knowing the types of field values allows a program to use the correct syntax in accessing them, as shown by the examples in Table 12.
Expression | Type | Example Access Syntax |
---|---|---|
p1.age | int | p1.age = 18; |
p2->age | int | p2->age = 18; |
p1.name | char * | printf("%s", p1.name); |
p2->name | char * | printf("%s", p2->name); |
p1.name[2] | char | p1.name[2] = 'a'; |
p2->name[2] | char | p2->name[2] = 'a'; |
In examining the last example, start by considering the type
of the outermost variable (p2 is a pointer to a struct personT).
Therefore, to access a field value in the struct, the programmer
needs to use -> syntax (p2->name). Next, consider the type of the
name field, which is a char *, used in this program to
point to an array of char values. To access a specific char
storage location through the name field, use array
indexing notation: p2->name[2] = 'a'.
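When a struct has a pointer field that refers to heap memory, allocation takes two steps (the struct, then the field) and deallocation takes the same two steps in reverse order. Here is a minimal sketch of that pattern; the function names person_create and person_free are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct personT {
    char *name;   // for a dynamically allocated string field
    int age;
};

/* Hypothetical helper: allocates a personT and a private copy of name.
 * Returns NULL if either allocation fails. */
struct personT *person_create(const char *name, int age) {
    struct personT *p = malloc(sizeof(struct personT));
    if (p == NULL) {
        return NULL;
    }
    // one extra byte for the terminating null character
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->age = age;
    return p;
}

/* Hypothetical helper: frees the name field first, then the struct.
 * Freeing the struct first would lose the only pointer to the name. */
void person_free(struct personT *p) {
    if (p == NULL) {
        return;
    }
    free(p->name);
    free(p);
}
```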
2.7.4. Arrays of Structs
Arrays, pointers, and structs can be combined to create more complex data structures. Here are some examples of declaring variables of different types of arrays of structs:
struct studentT classroom1[40]; // an array of 40 struct studentT
struct studentT *classroom2; // a pointer to a struct studentT
// (for a dynamically allocated array)
struct studentT *classroom3[40]; // an array of 40 struct studentT *
// (each element stores a struct studentT *)
Again, thinking very carefully about variable and field types is necessary for understanding the syntax and semantics of using these variables in a program. Here are some examples of the correct syntax for accessing these variables:
// classroom1 is an array:
// use indexing to access a particular element
// each element in classroom1 stores a struct studentT:
// use dot notation to access fields
classroom1[3].age = 21;
// classroom2 is a pointer to a struct studentT
// call malloc to dynamically allocate an array
// of 15 studentT structs for it to point to:
classroom2 = malloc(sizeof(struct studentT) * 15);
// each element in array pointed to by classroom2 is a studentT struct
// use [] notation to access an element of the array, and dot notation
// to access a particular field value of the struct at that index:
classroom2[3].grad_yr = 2013;
// classroom3 is an array of struct studentT *
// use [] notation to access a particular element
// call malloc to dynamically allocate a struct for it to point to
classroom3[5] = malloc(sizeof(struct studentT));
// access fields of the struct using -> notation
// set the age field pointed to in element 5 of the classroom3 array to 21
classroom3[5]->age = 21;
A function that takes an array of type struct studentT
as a parameter might
look like this:
void updateAges(struct studentT *classroom, int size) {
int i;
for (i = 0; i < size; i++) {
classroom[i].age += 1;
}
}
A program could pass this function either a statically or dynamically allocated
array of struct studentT
:
updateAges(classroom1, 40);
updateAges(classroom2, 15);
The semantics of passing classroom1
(or classroom2
) to updateAges
match
the semantics of passing a statically declared (or dynamically allocated)
array to a function: the parameter refers to the same set of elements
as the argument, and thus changes to the array’s values within the function affect the
argument’s elements.
Figure 26 shows what the stack might look like for the second call
to the updateAges
function (showing the passed classroom2
array with example
field values for the struct in each of its elements).
As always, the parameter gets a copy of the value of its argument (the memory address of the array in heap memory). Thus, modifying the array’s elements in the function will persist to its argument’s values (both the parameter and the argument refer to the same array in memory).
The updateAges
function cannot be passed the classroom3
array because its type
is not the same as the parameter’s type: classroom3
is
an array of struct studentT *
, not an array of struct studentT
.
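A function that accepts an array of struct studentT * needs a parameter of a matching type and must use -> on each element. The following sketch shows what such a function might look like; the name updateAgesByPointer is hypothetical, and the sketch assumes that unused elements of the array have been set to NULL.

```c
#include <stddef.h>

struct studentT {
    char name[64];
    int age;
    float gpa;
    int grad_yr;
};

/* Hypothetical counterpart to updateAges for an array of pointers.
 * Each element is a struct studentT *, so fields are accessed with ->.
 * Assumes unused elements are NULL; they are skipped. */
void updateAgesByPointer(struct studentT *classroom[], int size) {
    int i;
    for (i = 0; i < size; i++) {
        if (classroom[i] != NULL) {
            classroom[i]->age += 1;
        }
    }
}
```

A call such as updateAgesByPointer(classroom3, 40) would then type-check, provided every element of classroom3 has been initialized to either NULL or the address of a valid struct.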
2.7.5. Self-Referential Structs
A struct can be defined with fields whose type is a pointer to
the same struct
type. These self-referential struct
types can be
used to build linked implementations of data structures, such
as linked lists, trees, and graphs.
The details of these data types and their linked implementations
are beyond the scope of this book. However,
we briefly show one example of how to define and use a
self-referential struct
type to create a linked list in C. Refer
to a textbook on data structures and algorithms for more information
about linked lists.
A linked list is one way to implement a list abstract data type.
A list represents a sequence of elements that are ordered by their position in
the list. In C, a list data structure could be implemented as an
array or as a linked list using a self-referential struct
type
for storing individual nodes in the list.
To build the latter, a programmer would define a node
struct to contain one list
element and a link to the next node in the list. Here’s an example
that could store a linked list of integer values:
struct node {
int data; // used to store a list element's data value
struct node *next; // used to point to the next node in the list
};
Instances of this struct
type can be linked together through the
next
field to create a linked list.
This example code snippet creates a linked list containing three
elements (the list itself is referred to by the head
variable that
points to the first node in the list):
struct node *head, *temp;
int i;
head = NULL; // an empty linked list
head = malloc(sizeof(struct node)); // allocate a node
if (head == NULL) {
printf("Error malloc\n");
exit(1);
}
head->data = 10; // set the data field
head->next = NULL; // set next to NULL (there is no next element)
// add 2 more nodes to the head of the list:
for (i = 0; i < 2; i++) {
temp = malloc(sizeof(struct node)); // allocate a node
if (temp == NULL) {
printf("Error malloc\n");
exit(1);
}
temp->data = i; // set data field
temp->next = head; // set next to point to current first node
head = temp; // change head to point to newly added node
}
Note that the temp
variable temporarily points to a malloc’ed node
that
gets initialized and then added to the beginning of the list by setting its
next
field to point to the node currently pointed to by head
, and then
by changing the head
to point to this new node.
The result of executing this code would look like Figure 27 in memory.
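Once a list is built, a program typically traverses it by following next pointers until reaching NULL, and it eventually frees every node. Here is a minimal sketch of both operations; the function names list_length, list_sum, and list_free are hypothetical.

```c
#include <stdlib.h>

struct node {
    int data;           // used to store a list element's data value
    struct node *next;  // used to point to the next node in the list
};

/* Hypothetical helper: counts the nodes reachable from head. */
int list_length(struct node *head) {
    int count = 0;
    struct node *curr;
    for (curr = head; curr != NULL; curr = curr->next) {
        count++;
    }
    return count;
}

/* Hypothetical helper: adds up the data values in the list. */
int list_sum(struct node *head) {
    int sum = 0;
    struct node *curr;
    for (curr = head; curr != NULL; curr = curr->next) {
        sum += curr->data;
    }
    return sum;
}

/* Hypothetical helper: frees every node. The next pointer is saved
 * before calling free because a freed node must not be accessed. */
void list_free(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

For the three-element list built above (1, 0, 10), list_length(head) evaluates to 3 and list_sum(head) to 11.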
2.8. I/O in C (Standard and File)
C supports many functions for performing standard I/O as well as file I/O. In this section, we discuss some of the most commonly used interfaces for I/O in C.
2.8.1. Standard Input/Output
Every running program begins with three default I/O streams: standard out (stdout
),
standard in (stdin
), and standard error (stderr
). A program can write
(print) output to stdout
and stderr
, and it can read input values
from stdin
. stdin
is usually defined to read in input from the
keyboard, whereas stdout
and stderr
output to the terminal.
The C stdio.h
library provides the printf
function used for printing to
standard out and the scanf
function that can be used to read in values from
standard in. C also has functions to read and write one character at a time
(getchar
and putchar
), as well as other functions and libraries for reading
and writing characters to standard I/O streams. A C program must explicitly
include stdio.h
to call these functions.
You can change the location that a running program’s stdin
, stdout
and/or stderr
read from
or write to. One way to do this is by re-directing one or all of these to read
or write to a file. Here are some example shell commands for redirecting a
program’s stdin
, stdout
, or stderr
to a file ($
is the shell prompt):
# redirect a.out's stdin to read from file infile.txt:
$ ./a.out < infile.txt
# redirect a.out's stdout to print to file outfile.txt:
$ ./a.out > outfile.txt
# redirect a.out's stdout and stderr to file outfile.txt (bash syntax):
$ ./a.out &> outfile.txt
# redirect all three to different files:
# (< redirects stdin, 1> stdout, and 2> stderr):
$ ./a.out < infile.txt 1> outfile.txt 2> errorfile.txt
printf
C’s printf
function resembles formatted print
calls in Python, where the
caller specifies a format string to print. The format string often contains
special format specifiers, including special characters that will print tabs
(\t
) or newlines (\n
), or that specify placeholders for values in the
output (%
followed by a type specifier). When adding placeholders in a
format string passed to printf
, pass their corresponding values as additional
arguments following the format string. Here are some example calls to printf
:
int x = 5, y = 10;
float pi = 3.14;
printf("x is %d and y is %d\n", x, y);
printf("%g \t %s \t %d\n", pi, "hello", y);
When run, these printf
statements output:
x is 5 and y is 10
3.14     hello   10
Note how the tab characters (\t
) get printed in the second call,
and the different formatting placeholders for different types
of values (%g
, %s
, and %d
).
Here’s a set of formatting placeholders for common C types. Note that
placeholders for long
and long long
values include an l
or ll
prefix.
%f, %g: placeholders for a float or double value
%d: placeholder for a decimal value (char, short, int)
%u: placeholder for an unsigned decimal
%c: placeholder for a single character
%s: placeholder for a string value
%p: placeholder to print an address value
%ld: placeholder for a long value
%lu: placeholder for an unsigned long value
%lld: placeholder for a long long value
%llu: placeholder for an unsigned long long value
Here are some examples of their use:
float labs;
int midterm;
labs = 93.8;
midterm = 87;
printf("Hello %s, here are your grades so far:\n", "Tanya");
printf("\t midterm: %d (out of %d)\n", midterm, 100);
printf("\t lab ave: %f\n", labs);
printf("\t final report: %c\n", 'A');
When run, the output will look like this:
Hello Tanya, here are your grades so far:
         midterm: 87 (out of 100)
         lab ave: 93.800003
         final report: A
C also allows you to specify the field width with format placeholders. Here are some examples:
%5.3f: print float value in space 5 chars wide, with 3 places beyond decimal
%20s: print the string value in a field of 20 chars wide, right justified
%-20s: print the string value in a field of 20 chars wide, left justified
%8d: print the int value in a field of 8 chars wide, right justified
%-8d: print the int value in a field of 8 chars wide, left justified
Here’s a larger example that uses field width specifiers with placeholders in the format string:
#include <stdio.h> // library needed for printf
int main(void) {
float x, y;
char ch;
x = 4.50001;
y = 5.199999;
ch = 'a'; // ch stores ASCII value of 'a' (the value 97)
// %.1f: print x and y with one digit after the decimal point
printf("%.1f %.1f\n", x, y);
printf("%6.1f \t %6.1f \t %c\n", x, y, ch);
// ch+1 is 98, the ASCII value of 'b'
printf("%6.1f \t %6.1f \t %c\n", x+1, y+1, ch+1);
printf("%6.1f \t %6.1f \t %c\n", x*20, y*20, ch+2);
return 0;
}
When run, the program output looks like this:
4.5 5.2
   4.5      5.2          a
   5.5      6.2          b
  90.0    104.0          c
Note how the use of tabs and field width in the last three printf
statements result in a tabular output.
Finally, C defines placeholders for displaying values in different representations:
%x: print value in hexadecimal (base 16)
%o: print value in octal (base 8)
%d: print value in signed decimal (base 10)
%u: print value in unsigned decimal (unsigned base 10)
%e: print float or double in scientific notation
(there is no formatting option to display a value in binary)
Here is an example using placeholders to print values in different representations:
int x;
char ch;
x = 26;
ch = 'A';
printf("x is %d in decimal, %x in hexadecimal and %o in octal\n", x, x, x);
printf("ch value is %d which is the ASCII value of %c\n", ch, ch);
When run, the program output looks like this:
x is 26 in decimal, 1a in hexadecimal and 32 in octal
ch value is 65 which is the ASCII value of A
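A program that needs a binary representation can build the string itself by extracting one bit at a time. The following is a minimal sketch under stated assumptions: the function name format_binary and its return convention are hypothetical choices for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes the binary digits of value into buf
 * (most significant digit first, no leading zeros) followed by '\0'.
 * Returns the number of digits written, or -1 if buf is too small. */
int format_binary(unsigned int value, char *buf, size_t size) {
    char digits[sizeof(unsigned int) * CHAR_BIT];
    size_t n = 0;
    size_t i;

    /* collect digits from least significant to most significant */
    do {
        digits[n++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    if (size < n + 1) {
        return -1;
    }
    /* copy them into buf in reverse order */
    for (i = 0; i < n; i++) {
        buf[i] = digits[n - 1 - i];
    }
    buf[n] = '\0';
    return (int)n;
}
```

For the value 26 used above, format_binary stores the string "11010", which can then be printed with the %s placeholder.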
scanf
The scanf
function provides one method for reading in values from stdin
(usually from the user entering them via the keyboard) and storing them in
program variables. The scanf
function is a bit picky about the exact format
in which the user enters data, which can make it sensitive to badly formed user
input.
The arguments to the scanf
function are similar to those of printf
:
scanf
takes a format string that specifies the number and type of input values
to read in, followed by the locations of program variables into which
the values should be stored.
Programs typically combine the address of (&
) operator with a variable name to
produce the location of the variable in the program’s memory — the
memory address of the variable. Here’s an example call to
scanf
that reads in two values (an int
and a float
):
int x;
float pi;
// read in an int value followed by a float value ("%d%g")
// store the int value at the memory location of x (&x)
// store the float value at the memory location of pi (&pi)
scanf("%d%g", &x, &pi);
Individual input values must be separated by at least one whitespace
character (e.g., spaces, tabs, newlines). However, scanf
skips over leading and
trailing whitespace characters as it finds the start and end of each numeric
literal value. As a result, a user could enter the value 8 and 3.14 with any
amount of whitespace before or after the two values (and at least one or more
whitespace characters between), and scanf
will always read in 8 and assign it
to x
and read in 3.14 and assign it to pi
. For example, this input with
lots of spaces between the two values will result in reading in 8 and storing
it in x
, and 3.14 and storing in pi
:
8                   3.14
Programmers often write format strings for scanf
that only consist of
placeholder specifiers without any other characters between them. For reading
in the two numbers above, the format string might look like:
// read in an int and a float separated by at least one white space character
scanf("%d%g",&x, &pi);
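The scanf family of functions returns the number of values successfully read and assigned, which gives a program a way to detect badly formed input. The sketch below uses sscanf, a standard stdio.h variant that reads from a string instead of stdin, so that the behavior is easy to try out; the wrapper name read_int_and_float is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads an int followed by a float from the
 * string input, using the same format string as the scanf call above.
 * Returns the number of values assigned (2 on full success). */
int read_int_and_float(const char *input, int *x, float *f) {
    return sscanf(input, "%d%g", x, f);
}
```

Given the input "8     3.14", the function returns 2. Given "8 abc", it returns 1 because only the int could be read, and given "abc" it returns 0. Checking the return value of scanf in the same way lets a program avoid using variables that were never assigned.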
getchar and putchar
The C functions getchar
and putchar
respectively read or write a single character
value from stdin
and to stdout
. getchar
is particularly useful in
C programs that need to support careful error detection and handling of
badly formed user input (scanf
is not robust in this way).
ch = getchar(); // read in the next char value from stdin
putchar(ch); // write the value of ch to stdout
2.8.2. File Input/Output
The C standard I/O library (stdio.h
) includes a stream interface for file
I/O. A file stores persistent data: data that lives beyond the execution of
the program that created it. A text file represents a stream of characters,
and each open file tracks its current position in the character stream. When
opening a file, the current position starts at the very first character
in the file, and it moves as a result of every character read (or written) to
the file. To read the 10th character in a file, the first 9 characters need to
first be read (or the current position must be explicitly moved to the 10th
character using the fseek
function).
C’s file interface views a file as an input or output stream, and library
functions read from or write to the next position in the file stream. The fprintf
and fscanf
functions serve as the file I/O counterparts to printf
and
scanf
. They use a format string to specify what to write or read, and they include
arguments that provide values or storage for the data that gets written or
read. Similarly, the library provides the fputc
, fgetc
, fputs
, and fgets
functions for reading and writing individual characters or strings to file
streams. Although there are many libraries that support file I/O in C, we only present
the stdio.h
library’s stream interface to text files in detail.
Like the stdin and stdout streams, text files may contain special characters such as
newlines ('\n') and tabs ('\t'). Additionally, upon reaching the end of
a file’s data, C’s I/O library functions return a special end-of-file value
(EOF) that represents the end of the file. Functions reading from a file can
test for EOF to determine when they have reached the end of the file stream.
2.8.3. Using Text Files in C
To read or write a file in C, follow these steps:
- Declare a FILE * variable:
FILE *infile;
FILE *outfile;
These declarations create pointer variables of the library-defined FILE * type. These pointers cannot be dereferenced in an application program. Instead, they refer to a specific file stream when passed to I/O library functions.
- Open the file: associate the variable with an actual file stream by calling fopen. When opening a file, the mode parameter determines whether the program opens it for reading ("r"), writing ("w"), or appending ("a"):
infile = fopen("input.txt", "r"); // relative path name of file, read mode
if (infile == NULL) {
    printf("Error: unable to open file %s\n", "input.txt");
    exit(1);
}
// fopen with absolute path name of file, write mode
outfile = fopen("/home/me/output.txt", "w");
if (outfile == NULL) {
    printf("Error: unable to open outfile\n");
    exit(1);
}
The fopen function returns NULL to report errors, which may occur if it’s given an invalid filename or the user doesn’t have permission to open the specified file (e.g., not having write permissions to the output.txt file).
- Use I/O operations to read, write, or move the current position in the file:
int ch; // EOF is not a char value, but is an int.
// since all char values can be stored in int, use int for ch
ch = getc(infile); // read next char from the infile stream
if (ch != EOF) {
    putc(ch, outfile); // write char value to the outfile stream
}
- Close the file: use fclose to close the file when the program no longer needs it:
fclose(infile);
fclose(outfile);
The stdio
library also provides functions to change the current
position in a file:
// to reset current position to beginning of file
void rewind(FILE *f);
rewind(infile);
// to move to a specific location in the file:
int fseek(FILE *f, long offset, int whence);
fseek(f, 0, SEEK_SET); // seek to the beginning of the file
fseek(f, 3, SEEK_CUR); // seek 3 chars forward from the current position
fseek(f, -3, SEEK_END); // seek 3 chars back from the end of the file
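One common use of fseek is to find out how large a file is by seeking to its end and asking for the current position with ftell, another stdio.h function. The sketch below assumes a system (such as Linux) where the position reported by ftell is a byte offset from the start of the file; the function name stream_size is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the open file f, or -1 on
 * error. The current position is saved and restored, so the caller
 * can continue reading where it left off. */
long stream_size(FILE *f) {
    long saved;
    long size;

    saved = ftell(f);                    /* remember current position */
    if (saved < 0) {
        return -1;
    }
    if (fseek(f, 0, SEEK_END) != 0) {    /* move to the end of the file */
        return -1;
    }
    size = ftell(f);                     /* position at end is the size */
    if (fseek(f, saved, SEEK_SET) != 0) {  /* go back to where we were */
        return -1;
    }
    return size;
}
```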
2.8.4. Standard and File I/O Functions in stdio.h
The C stdio.h
library has many functions for reading and writing to files and
to the standard file-like streams (stdin
, stdout
, and stderr
). These functions
can be classified into character-based, string-based, and formatted I/O
functions. Briefly, here’s some additional details about a subset of these
functions:
// ---------------
// Character Based
// ---------------
// returns the next character in the file stream (EOF is an int value)
int fgetc(FILE *f);
// writes the char value c to the file stream f
// returns the char value written
int fputc(int c, FILE *f);
// pushes the character c back onto the file stream
// at most one char (and not EOF) can be pushed back
int ungetc(int c, FILE *f);
// like fgetc and fputc but for stdin and stdout
int getchar();
int putchar(int c);
// -------------
// String Based
// -------------
// reads at most n-1 characters into the array s stopping if a newline is
// encountered, newline is included in the array which is '\0' terminated
char *fgets(char *s, int n, FILE *f);
// writes the string s (make sure '\0' terminated) to the file stream f
int fputs(char *s, FILE *f);
// ---------
// Formatted
// ---------
// writes the contents of the format string to file stream f
// (with placeholders filled in with subsequent argument values)
// returns the number of characters printed
int fprintf(FILE *f, char *format, ...);
// like fprintf but to stdout
int printf(char *format, ...);
// use fprintf to print to stderr:
fprintf(stderr, "Error return value: %d\n", ret);
// read values specified in the format string from file stream f
// store the read-in values to program storage locations of types
// matching the format string
// returns number of input items converted and assigned
// or EOF on error or if EOF was reached
int fscanf(FILE *f, char *format, ...);
// like fscanf but reads from stdin
int scanf(char *format, ...);
In general, scanf
and fscanf
are sensitive to badly formed input.
However, for file I/O, often programmers can assume that an input file
is well formatted, so fscanf
may be robust enough in such cases.
With scanf
, badly formed user input will often cause a program
to crash. Reading in one character at a time
and including code to test values before converting them to different
types is more robust, but it requires the programmer to implement more
complex I/O functionality.
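As a rough sketch of the more robust approach, the following reads text first and tests it before accepting a converted value. It is a line-at-a-time variant of the idea (using the standard strtol function) rather than a character-at-a-time one, and the helper name parse_int is hypothetical, not part of stdio.h:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an int only if the whole string
 * is a valid, in-range integer. Returns 1 on success, 0 otherwise. */
int parse_int(const char *text, int *result) {
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);   // end points past the last char used

    if (end == text) {                // no digits were found at all
        return 0;
    }
    while (*end == ' ' || *end == '\t' || *end == '\n') {
        end++;                        // tolerate trailing whitespace
    }
    if (*end != '\0') {               // leftover junk, such as "12abc"
        return 0;
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return 0;                     // value does not fit in an int
    }
    *result = (int)value;
    return 1;
}
```

A program might read a line with fgets(buf, sizeof(buf), stdin) and then call parse_int(buf, &num), printing an error message and asking for the input again whenever it returns 0.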
The format string for fscanf
can include the following syntax
specifying different types of values and ways of reading from the
file stream:
%d integer
%f float
%lf double
%c character
%s string, up to first white space
%[...] string, up to first character not in brackets
%[0123456789] would read in digits
%[^...] string, up to first character in brackets
%[^\n] would read everything up to a newline
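To make the placeholder syntax concrete, here is a small sketch that combines a width-limited %[^...] with %d and checks the return value to detect badly formed input. It uses sscanf, which applies the same format rules to a string instead of a file stream. The helper name parse_record, the NAMELEN constant, and the "name:age" input layout are assumptions made for illustration:

```c
#include <stdio.h>

#define NAMELEN 20   /* hypothetical buffer size used by this sketch */

/* Parses a line of the assumed form "name:age".
 * name must point to a buffer of at least NAMELEN chars.
 * Returns 1 if both items were converted, 0 otherwise. */
int parse_record(const char *line, char *name, int *age) {
    // %19[^:] reads up to 19 chars that are not ':' (room left for '\0')
    // the ':' in the format must match the next char of the input
    // %d then reads the integer value
    int items = sscanf(line, "%19[^:]:%d", name, age);
    return items == 2;   // the return value counts converted items
}
```

Comparing the return value against the number of placeholders is what lets the caller tell a well-formed line from one where the conversion stopped early.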
It can be tricky to get the fscanf
format string correct, particularly
when reading a mix of numeric and string or character types from a file.
Here are a few example calls to fscanf
(and one to fprintf
) with different
format strings (let’s assume that the fopen
calls from above have executed
successfully):
int x;
double d;
char c, array[MAX];
// write int & char values to file separated by colon with newline at the end
fprintf(outfile, "%d:%c\n", x, c);
// read an int & char from file where int and char are separated by a comma
fscanf(infile, "%d,%c", &x, &c);
// read a string from a file into array (stops reading at whitespace char)
fscanf(infile,"%s", array);
// read a double and a string up to 24 chars from infile
fscanf(infile, "%lf %24s", &d, array);
// read in a string consisting of only char values in the specified set (0-5)
// stops reading when...
// 20 chars have been read OR
// a character not in the set is reached OR
// the file stream reaches end-of-file (EOF)
fscanf(infile, "%20[012345]", array);
// read in a string; stop when reaching a punctuation mark from the set
fscanf(infile, "%[^.,:!;]", array);
// read in two integer values: store first in long, second in int
// then read in a char value following the int value
// (lx is an additional variable, assumed to be declared as: long lx;)
fscanf(infile, "%ld %d%c", &lx, &x, &c);
In the final example above, the format string explicitly reads in a character
value after a number to ensure that the file stream’s current position gets
properly advanced for any subsequent calls to fscanf
. For example, this
pattern is often used to explicitly read in (and discard) a whitespace
character (like '\n'), to ensure that the next call to fscanf
begins from the
next line in the file. Reading an additional character is necessary if the
next call to fscanf
attempts to read in a character value. Otherwise, having
not consumed the newline, the next call to fscanf
will read the newline rather
than the intended character. If the next call reads in a numeric type value,
then leading whitespace chars are automatically discarded by fscanf
and the
programmer does not need to explicitly read the \n
character from the file
stream.
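The effect of an unconsumed newline can be seen in a short sketch. It uses sscanf on a string so that it does not depend on a file, and the two helper names are hypothetical. The second helper shows a common alternative that is not covered in the text above: a space in the format string, which skips any whitespace before the %c.

```c
#include <stdio.h>

/* "%d%c": the %c reads whatever char directly follows the number.
 * Returns the char read, or '?' if the input did not match. */
char read_without_skip(const char *input) {
    int num;
    char ch;
    if (sscanf(input, "%d%c", &num, &ch) != 2) {
        return '?';
    }
    return ch;
}

/* "%d %c": the space in the format discards whitespace (including '\n')
 * before %c reads the next non-whitespace char.
 * Returns the char read, or '?' if the input did not match. */
char read_with_skip(const char *input) {
    int num;
    char ch;
    if (sscanf(input, "%d %c", &num, &ch) != 2) {
        return '?';
    }
    return ch;
}
```

Given the input "7\nq", the first helper returns the newline character and the second returns 'q'.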
2.9. Some Advanced C Features
Almost all of the C programming language has been presented in previous sections. In this section, we cover a few remaining advanced C language features and some advanced C programming and compiling topics:
-
C constants, the switch statement, enumerated types, and typedef
-
writing and using your own C libraries (and dividing your program into multiple modules (.c and .h files))
2.9.1. Constants, switch, enum, and typedef
Constants, switch statements, enumerated types, and typedef are features of the
C language that are useful for creating more readable and maintainable
code. Constants, enumerated types, and typedefs are used to define aliases for
literal values and types in programs. Switch statements can be used in place
of some chaining if-else if
statements.
C Constants
A constant is an alias for a C literal value. Constants are used in place of the literal value to make code more readable and easier to modify. In C, constants are defined outside of a function body using the following syntax:
#define const_name (literal_value)
Here is an example of a partial program that defines and uses
three constants (N
, PI
, and NAME
):
#include <stdio.h>
#include <stdlib.h>
#define N (20) // N: alias for the literal value 20
#define PI (3.14) // PI: alias for the literal value 3.14
#define NAME ("Sarita") // NAME: alias for the string literal "Sarita"
int main(void) {
int array[N]; // an array of 20 ints
int *d_arr, i;
double area, circ, radius;
radius = 12.3;
area = PI*radius*radius;
circ = 2*PI*radius;
d_arr = malloc(sizeof(int)*N);
if(d_arr == NULL) {
printf("Sorry, %s, malloc failed!\n", NAME);
exit(1);
}
for(i=0; i < N; i++) {
array[i] = i;
d_arr[i] = i*2;
}
...
Using constants makes the code more readable (in an
expression, PI
has more meaning than 3.14
).
Using constants also makes code easier to modify.
For example, to change the bounds of the arrays and the
precision of the value of pi in the above program, the
programmer only needs to change the
constant definitions and recompile; all the code that uses the
constants will use their new values. For example:
#define N (50) // redefine N from 20 to 50
#define PI (3.14159) // redefine PI to higher precision
int main(void) {
int array[N]; // now allocates an array of size 50
...
area = PI*radius*radius; // now uses 3.14159 for PI
d_arr = malloc(sizeof(int)*N); // now mallocs array of 50 ints
...
for(i=0; i < N; i++) { // now iterates over 50 elements
...
It is important to remember that constants are not lvalues—they are aliases for literal values of C types. As a result, their values cannot be changed at runtime like those of variables. The following code, for example, causes a compilation error:
#define N 20
int main(void) {
...
N = 50; // compilation error: `20 = 50` is not valid C
Switch statements
The C switch
statement can be used in place of some, but not all, chaining
if
-else if
code sequences. While switch
doesn’t provide any additional
expressive power to the C programming language, it often yields more concise
code branching sequences. It may also allow the compiler to produce branching
code that executes more efficiently than equivalent chaining if
-else if
code.
The C syntax for a switch
statement looks like:
switch (<expression>) {
case <literal value 1>:
<statements>;
break; // breaks out of switch statement body
case <literal value 2>:
<statements>;
break; // breaks out of switch statement body
...
default: // default label is optional
<statements>;
}
A switch statement is executed as follows:
-
The expression evaluates first.
-
Next, the switch searches for a case literal value that matches the value of the expression.
-
Upon finding a matching case literal, it begins executing the statements that immediately follow it.
-
If no matching case is found, it will begin executing the statements in the default label if one is present.
-
Otherwise, no statements in the body of the switch statement get executed.
A few rules about switch
statements:
-
The value associated with each case must be a constant integer value known at compile time (such as an integer literal, a character literal, or an enumerated type constant); it cannot be an expression that involves variables. The switch expression gets matched for equality only with the constant values associated with each case.
-
Reaching a break statement stops the execution of all remaining statements inside the body of the switch statement. That is, break breaks out of the body of the switch statement and continues execution with the next statement after the entire switch block.
-
The case statement with a matching value marks the starting point into the sequence of C statements that will be executed: execution jumps to a location inside the switch body to start executing code. Thus, if there is no break statement at the end of a particular case, then the statements under the subsequent case statements execute in order until either a break statement is executed or the end of the body of the switch statement is reached.
-
The default label is optional. If present, it is conventionally placed at the end.
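The rules about break and falling through to subsequent case statements can be sketched with a small function in which several cases share one set of statements. The function name days_in_month is a hypothetical example:

```c
/* Returns the number of days in a month (1-12) of a non-leap year,
 * or -1 if month is out of range. */
int days_in_month(int month) {
    int days;

    switch (month) {
    case 4:
    case 6:
    case 9:
    case 11:          // cases 4, 6, and 9 have no break: they fall through
        days = 30;
        break;        // leave the switch body
    case 2:
        days = 28;
        break;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        days = 31;
        break;
    default:          // no case value matched
        days = -1;
    }
    return days;
}
```

Without the break after days = 30, execution would continue into the statements under case 2 and the function would return 28 for a month such as April.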
Here’s an example program with a switch
statement:
#include <stdio.h>
int main(void) {
int num, new_num = 0;
printf("enter a number between 6 and 9: ");
scanf("%d", &num);
switch(num) {
case 6:
new_num = num + 1;
break;
case 7:
new_num = num;
break;
case 8:
new_num = num - 1;
break;
case 9:
new_num = num + 2;
break;
default:
printf("Hey, %d is not between 6 and 9\n", num);
}
printf("num %d new_num %d\n", num, new_num);
return 0;
}
Here are some example runs of this code:
./a.out
enter a number between 6 and 9: 9
num 9 new_num 11
./a.out
enter a number between 6 and 9: 6
num 6 new_num 7
./a.out
enter a number between 6 and 9: 12
Hey, 12 is not between 6 and 9
num 12 new_num 0
Enumerated Types
An enumerated type (enum
) is a way to define a group of related integer
constants. Often switch statements and enumerated types are used together.
The enumerated type should be defined outside of a function body, using
the following syntax (enum
is a keyword in C):
enum type_name {
CONST_1_NAME,
CONST_2_NAME,
...
CONST_N_NAME
};
Note that the constant fields are specified by a comma separated list of names and are not explicitly given values. By default, the first constant in the list is assigned the value 0, the second the value 1, and so on.
Below is an example of defining an enumerated type for the days of the week:
enum days_of_week {
MON,
TUES,
WED,
THURS,
FRI
};
A variable of an enumerated type is declared using the type name
enum type_name
, and the constant values it defines can be used in
expressions. For example:
enum days_of_week day;
day = THURS;
if (day > WED) {
printf("The weekend is arriving soon!\n");
}
An enumerated type is similar to defining a set of constants using #define
like the following:
#define MON 0
#define TUES 1
#define WED 2
#define THURS 3
#define FRI 4
The constant values in the enumerated type can be used in the same way that
constants are used: to make a program easier to read and code easier to update.
However, an enumerated type has the advantage of grouping a set of
related integer constants together. It is also a type definition, so variables
and parameters can be declared to be of an enumerated type, whereas constants are
aliases for literal values. In addition, in an enumerated type the specific
value of each constant is implicitly assigned in sequence starting at 0
, so
the programmer doesn’t have to specify each constant’s value.
Another nice feature of enumerated types is that
it is easy to add or remove constants from the set without
having to change all their values. For example, if the user wanted to add
Saturday and Sunday to the set of days and maintain the relative
ordering of the days, they can add them to the enumerated type
definition without having to explicitly redefine the values of the
others as they would need to do with #define
constant definitions:
enum days_of_week {
SUN, // SUN will now be 0
MON, // MON will now be 1, and so on
TUES,
WED,
THURS,
FRI,
SAT
};
Although values are implicitly assigned to the constants in an enumerated type,
the programmer can also assign specific values to them
using = val
syntax. For example, if the programmer wanted the values of the
days of the week to start at 1 instead of 0, they could do the following:
enum days_of_week {
SUN = 1, // start the sequence at 1
MON, // this is 2 (next value after 1)
TUES, // this is 3, and so on
WED,
THURS,
FRI,
SAT
};
Because an enumerated type defines aliases for a set of int
literal values,
the value of an enumerated type prints out as its int
value and not
as the name of the alias. For example, given the above definition of the
enum days_of_week
, the following prints 3
not the string "TUES"
:
enum days_of_week day;
day = TUES;
printf("Today is %d\n", day);
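If a program needs to print the name of an enumerated constant rather than its int value, one possible approach is a small helper that maps each value to a string. This is only a sketch: the function name day_name is hypothetical, and the enum is repeated here so that the example is self-contained.

```c
enum days_of_week {
    SUN = 1,   // start the sequence at 1
    MON,
    TUES,
    WED,
    THURS,
    FRI,
    SAT
};

/* Hypothetical helper: returns a printable name for an enum value,
 * or "UNKNOWN" if the value is not one of the defined constants. */
const char *day_name(enum days_of_week day) {
    switch (day) {
    case SUN:   return "SUN";
    case MON:   return "MON";
    case TUES:  return "TUES";
    case WED:   return "WED";
    case THURS: return "THURS";
    case FRI:   return "FRI";
    case SAT:   return "SAT";
    default:    return "UNKNOWN";
    }
}
```

With this helper, a call such as printf("Today is %s\n", day_name(day)); prints the name, whereas the %d placeholder continues to print the underlying int value.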
Enumerated types are often used in combination with switch statements
as shown in the example code below.
The example also shows a switch statement with several cases
associated with the same set of statements, and a case statement
that does not have a break
before the next case statement
(when val
is FRI
two printf
statements are executed
before a break
is encountered, and when val
is MON
or WED
only one of the printf
statements is executed
before the break
):
// an int because we are using scanf to assign its value
int val;
printf("enter a value between %d and %d: ", SUN, SAT);
scanf("%d", &val);
switch (val) {
case FRI:
printf("Orchestra practice today\n");
case MON:
case WED:
printf("PSYCH 101 and CS 231 today\n");
break;
case TUES:
case THURS:
printf("Math 311 and HIST 140 today\n");
break;
case SAT:
printf("Day off!\n");
break;
case SUN:
printf("Do weekly pre-readings\n");
break;
default:
printf("Error: %d is not a valid day\n", val);
}
typedef
C provides a way to define a new type that is an alias for an existing
type using the keyword typedef
. Once defined, variables
can be declared using this new alias for the type. This feature
is commonly used to make the program more readable and to use shorter type
names, often for structs and enumerated types.
The following is the format for defining a new type with typedef
:
typedef existing_type_name new_type_alias_name;
Here is an example partial program that uses typedefs:
#include <string.h> // for strcpy
#define MAXNAME (30)
#define MAXCLASS (40)
enum class_year {
FIRST = 1,
SECOND,
JUNIOR,
SENIOR,
POSTGRAD
};
// classYr is an alias for enum class_year
typedef enum class_year classYr;
struct studentT {
char name[MAXNAME];
classYr year; // use classYr type alias for field type
float gpa;
};
// studentT is an alias for struct studentT
typedef struct studentT studentT;
// ull is an alias for unsigned long long
typedef unsigned long long ull;
int main(void) {
// declare variables using typedef type names
studentT class[MAXCLASS];
classYr yr;
ull num;
num = 123456789;
yr = JUNIOR;
strcpy(class[0].name, "Sarita");
class[0].year = SENIOR;
class[0].gpa = 3.75;
...
Because typedef is often used with structs, C provides syntax for combining
a typedef and a struct definition together by prefixing the struct
definition with typedef
and listing the name of the type alias
after the closing }
in the struct definition. For example, the
following defines both a struct type struct studentT
and an alias for
the type named studentT
:
typedef struct studentT {
char name[MAXNAME];
classYr year; // use classYr type alias for field type
float gpa;
} studentT;
This definition is equivalent to doing the typedef separately, after the struct definition, as in the previous example.
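The following sketch shows the combined form in use, including a function whose parameter is declared with the alias. The helper name init_student is hypothetical, and applying the same combined typedef syntax to the enum is a choice made here for brevity:

```c
#include <string.h>

#define MAXNAME (30)

// combined typedef and enum definition: classYr aliases enum class_year
typedef enum class_year {
    FIRST = 1,
    SECOND,
    JUNIOR,
    SENIOR,
    POSTGRAD
} classYr;

// combined typedef and struct definition: studentT aliases struct studentT
typedef struct studentT {
    char name[MAXNAME];
    classYr year;
    float gpa;
} studentT;

/* Hypothetical helper: fills in the fields of the student s points to */
void init_student(studentT *s, const char *name, classYr year, float gpa) {
    strncpy(s->name, name, MAXNAME - 1);  // copy at most MAXNAME-1 chars
    s->name[MAXNAME - 1] = '\0';          // make sure name is terminated
    s->year = year;
    s->gpa = gpa;
}
```

Because studentT and struct studentT name the same type, a variable declared as struct studentT can be passed to init_student by address without any recasting.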
2.9.2. Command Line Arguments
A program can be made more general purpose by reading command line arguments, which are included as part of the command the user enters to run a binary executable program. They specify input values or options that change the runtime behavior of the program. In other words, running the program with different command line argument values changes its behavior from run to run without having to modify the program code and recompile it. For example, if a program takes the name of an input file as a command line argument, a user can run it with any input file, as opposed to a program that refers to a specific input filename in the code.
Any command line arguments the user provides get passed to the main
function
as parameter values. To write a program that takes command line arguments, the
main
function’s definition must include two parameters, argc
and argv
:
int main(int argc, char *argv[]) { ...
Note that the type of the second parameter could also be represented
as char **argv
.
The first parameter, argc, stores the argument count. Its value represents the number of command line arguments passed to the main function (including the name of the program). For example, if the user enters
./a.out 10 11 200
then argc
will hold the value 4 (a.out
counts as the first command line argument,
and 10
, 11
, and 200
as the other three).
The second parameter, argv, stores the argument vector. It contains the
value of each command line argument. Each command line argument gets passed in
as a string value, thus argv
's type is an array of strings (or an array of char
arrays). The argv
array contains argc + 1
elements. The first argc
elements store the command line argument strings, and the last element stores
NULL
, signifying the end of the command line argument list. For example,
in the command line entered above, the argv
array would look like Figure 28:
Often, a program wants to interpret a command line argument passed to main
as
a type other than a string. In the example above, the program may want to
extract the integer value 10
from the string value "10"
of its first
command line argument. C’s standard library provides functions for converting
strings to other types. For example, the atoi
("a to i", for "ASCII to
integer") function converts a string of digit characters to its corresponding
integer value:
int x;
x = atoi(argv[1]); // x gets the int value 10
See the Strings and the String Library section for more information about these functions, and the commandlineargs.c program for another example of C command line arguments.
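As a brief sketch of how argc and argv are typically traversed, the two hypothetical helpers below take the same parameters that main receives. The first converts each argument with atoi, and the second relies on the NULL entry that ends the argv array:

```c
#include <stdlib.h>

/* Hypothetical helper: adds up the integer values of
 * argv[1] through argv[argc-1] */
int sum_int_args(int argc, char *argv[]) {
    int i, total = 0;

    for (i = 1; i < argc; i++) {   // skip argv[0], the program name
        total += atoi(argv[i]);    // atoi returns 0 for non-numeric text
    }
    return total;
}

/* Hypothetical helper: counts arguments by walking argv until
 * reaching its terminating NULL entry */
int count_args(char *argv[]) {
    int n = 0;

    while (argv[n] != NULL) {
        n++;
    }
    return n;
}
```

A main function declared as int main(int argc, char *argv[]) could simply call sum_int_args(argc, argv). For the command line ./a.out 10 11 200, the sum is 221 and count_args(argv) matches argc (4).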
2.9.3. The void * Type and Type Recasting
The C type void *
represents a generic pointer — a pointer to any type, or a
pointer to an unspecified type. C allows for a generic pointer type because
memory addresses on a system are always stored in the same number of bytes
(e.g., addresses are four bytes on 32-bit systems and eight bytes on 64-bit systems).
As a result, every pointer variable requires the same number of storage bytes,
and because they’re all the same size, the compiler can allocate space for a
void *
variable without knowing the type it points to. Here’s an example:
void *gen_ptr;
int x;
char ch;
gen_ptr = &x; // gen_ptr can be assigned the address of an int
gen_ptr = &ch; // or the address of a char (or the address of any type)
Typically, programmers do not declare variables of type void *
as in the
preceding example. Instead, it’s commonly used to specify generic return types
from functions or generic parameters to functions. The void *
type is often
used as a return type by functions that return newly allocated memory that can
be used to store any type (e.g., malloc
). It’s also used as a function
parameter for functions that can take any type of value. In this case,
individual calls to the function pass in a pointer to some specific type, which
can be passed to the function’s void *
parameter because it can store the
address of any type.
Because void *
is a generic pointer type, it cannot be directly dereferenced — the
compiler does not know the size of memory that the address points to.
For example, the address could refer to an int
storage location of
four bytes or it could refer to a char
storage location in memory of one
byte. Therefore, the programmer must explicitly
recast the void *
pointer to a pointer of a specific type before dereferencing
it. Recasting tells the compiler the specific type of pointer
variable, allowing the compiler to generate the correct memory access code for
pointer dereferences.
Here are two examples of void *
use:
-
A call to malloc recasts its void * return type to the specific pointer type of the variable used to store its returned heap memory address:
int *array;
char *str;
array = (int *)malloc(sizeof(int) * 10); // recast void * return value
str = (char *)malloc(sizeof(char) * 20);
*array = 10;
str[0] = 'a';
-
Students often encounter the void * when creating threads. Using a void * parameter type in a thread function allows the thread to take any type of application-specific pointer. The pthread_create function has a parameter for the thread main function and a void * parameter for the argument value that it passes to the thread main function that the newly created thread will execute. The use of the void * parameter makes pthread_create a generic thread creation function; it can be used to point to any type of memory location. For a specific program that calls pthread_create, the programmer knows the type of the argument passed to the void * parameter, so the programmer must recast it to its known type before dereferencing it. In this example, suppose that the address passed to the args parameter contains the address of an integer variable:
/*
 * an application-specific pthread main function
 * must have this function prototype: void *func_name(void *args)
 *
 * any given implementation knows what type is really passed in
 * args: pointer to an int value
 */
void *my_thr_main(void *args) {
    int num;
    // first recast args to an int *, then dereference to get int value
    num = *((int *)args); // num gets 6
    ...
}
int main(void) {
    int ret, x;
    pthread_t tid;
    x = 6;
    // pass the address of int variable (x) to pthread_create's void * param
    // (we recast &x as a (void *) to match the type of pthread_create's param)
    ret = pthread_create(&tid, NULL,
                         my_thr_main,   // a thread main function
                         (void *)(&x)); // &x will be passed to my_thr_main
    // ...
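Another place where void * parameters and recasting appear is the standard library's qsort function, which sorts arrays of any element type by calling a comparison function that receives generic pointers. The sketch below is one possible use; the names compare_ints and sort_ints are hypothetical:

```c
#include <stdlib.h>

/* Comparison function in the form qsort requires: both parameters are
 * const void * because qsort does not know the element type. */
int compare_ints(const void *a, const void *b) {
    // recast each generic pointer to a const int *, then dereference
    int left = *((const int *)a);
    int right = *((const int *)b);

    if (left < right) {
        return -1;
    }
    if (left > right) {
        return 1;
    }
    return 0;
}

/* Hypothetical helper: sorts n ints into ascending order */
void sort_ints(int *values, int n) {
    // the int * argument is passed to qsort's void * parameter
    qsort(values, n, sizeof(int), compare_ints);
}
```

As with the thread main function, only the programmer knows that the addresses passed to compare_ints refer to int values, so the recast happens inside the function before any dereference.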
2.9.4. Pointer Arithmetic
If a pointer variable points to an array, a program can perform arithmetic on the pointer to access any of the array’s elements. In most cases, we recommend against using pointer arithmetic to access array elements: it’s easy to make errors and more difficult to debug when you do. However, occasionally it may be convenient to successively increment a pointer to iterate over an array of elements.
When incremented, a pointer points to the next storage location of the type it
points to. For example, incrementing an integer pointer (int *
) makes it
point to the next int
storage address (the address four bytes beyond its
current value), and incrementing a character pointer makes it point to the next
char
storage address (the address one byte beyond its current value).
In the following example program, we demonstrate how to use pointer arithmetic to manipulate an array. First declare pointer variables whose type matches the array’s element type:
#define N 10
#define M 20
int main(void) {
// array declarations:
char letters[N];
int numbers[N], i, j;
int matrix[N][M];
// declare pointer variables that will access int or char array elements
// using pointer arithmetic (the pointer type must match array element type)
char *cptr = NULL;
int *iptr = NULL;
...
Next, initialize the pointer variables to the base address of the arrays over which they will iterate:
// make the pointer point to the first element in the array
cptr = &(letters[0]); // &(letters[0]) is the address of element 0
iptr = numbers; // the address of element 0 (numbers is &(numbers[0]))
Then, using pointer dereferencing, our program can access the array’s elements. Here, we’re dereferencing to assign a value to an array element and then incrementing the pointer variable by one to advance it to point to the next element:
// initialize letters and numbers arrays through pointer variables
for (i = 0; i < N; i++) {
// dereference each pointer and update the element it currently points to
*cptr = 'a' + i;
*iptr = i * 3;
// use pointer arithmetic to set each pointer to point to the next element
cptr++; // cptr points to the next char address (next element of letters)
iptr++; // iptr points to the next int address (next element of numbers)
}
Note that in this example, the pointer values are incremented inside the loop.
Thus, incrementing their value makes them point to the next element in the
array.
This pattern effectively walks
through each element of an array in the same way that accessing cptr[i]
or
iptr[i]
at each iteration would.
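The semantics of pointer arithmetic and the underlying arithmetic function
The semantics of pointer arithmetic are type independent: changing
any type of pointer’s value by N (for example, ptr = ptr + N) makes it point N storage locations of its type beyond the one it currently points to.
However, the actual arithmetic function that the compiler generates
for a pointer arithmetic expression varies depending on the type of the
pointer variable (depending on the number of bytes the system uses to store
the type to which it points).
For example, incrementing a char pointer increases its value by one, because the next char address is one byte beyond the current one, whereas incrementing an int pointer increases its value by four, because the next int address is four bytes beyond the current one.
A programmer can simply write ptr++ to make a pointer point to the next element; the compiler generates code that adds the appropriate number of bytes for the type the pointer points to.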
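The number of bytes that a pointer's value changes by when it is incremented can be checked with a small sketch. The helper names below are hypothetical; each one recasts the two addresses to char * so that their difference is measured in bytes:

```c
#include <stddef.h>

/* Returns how many bytes apart p and p + 1 are for a char pointer */
size_t char_step_bytes(void) {
    char values[2] = {'a', 'b'};
    char *p = values;
    return (size_t)((p + 1) - p);   // char addresses are byte addresses
}

/* Returns how many bytes apart p and p + 1 are for an int pointer */
size_t int_step_bytes(void) {
    int values[2] = {0, 0};
    int *p = values;
    // compare the two addresses as byte (char) addresses
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Returns how many bytes apart p and p + 1 are for a double pointer */
size_t double_step_bytes(void) {
    double values[2] = {0.0, 0.0};
    double *p = values;
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

On any system, each result equals the sizeof the pointed-to type, which is why the same source-level expression p + 1 always refers to the next element.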
You can see how the above code modified array elements by printing out their values (we show this first using array indexing and then using pointer arithmetic to access each array element’s value):
printf("\n array values using indexing to access: \n");
// see what the code above did:
for (i = 0; i < N; i++) {
printf("letters[%d] = %c, numbers[%d] = %d\n",
i, letters[i], i, numbers[i]);
}
// we could also use pointer arith to print these out:
printf("\n array values using pointer arith to access: \n");
// first: initialize pointers to base address of arrays:
cptr = letters; // letters == &letters[0]
iptr = numbers;
for (i = 0; i < N; i++) {
// dereference pointers to access array element values
printf("letters[%d] = %c, numbers[%d] = %d\n",
i, *cptr, i, *iptr);
// increment pointers to point to the next element
cptr++;
iptr++;
}
Here’s what the output looks like:
array values using indexing to access:
letters[0] = a, numbers[0] = 0
letters[1] = b, numbers[1] = 3
letters[2] = c, numbers[2] = 6
letters[3] = d, numbers[3] = 9
letters[4] = e, numbers[4] = 12
letters[5] = f, numbers[5] = 15
letters[6] = g, numbers[6] = 18
letters[7] = h, numbers[7] = 21
letters[8] = i, numbers[8] = 24
letters[9] = j, numbers[9] = 27

array values using pointer arith to access:
letters[0] = a, numbers[0] = 0
letters[1] = b, numbers[1] = 3
letters[2] = c, numbers[2] = 6
letters[3] = d, numbers[3] = 9
letters[4] = e, numbers[4] = 12
letters[5] = f, numbers[5] = 15
letters[6] = g, numbers[6] = 18
letters[7] = h, numbers[7] = 21
letters[8] = i, numbers[8] = 24
letters[9] = j, numbers[9] = 27
Pointer arithmetic can be used to iterate over any contiguous chunk of memory. Here’s an example using pointer arithmetic to initialize a statically declared 2D array:
// sets matrix (N=10 rows of M=20 ints) to:
// row 0: 0, 1, 2, ..., 19
// row 1: 20, 21, 22, ..., 39
// ...
iptr = &(matrix[0][0]);
for (i = 0; i < N*M; i++) {
*iptr = i;
iptr++;
}
// see what the code above did:
printf("\n 2D array values inited using pointer arith: \n");
for (i = 0; i < N; i++) {
for (j = 0; j < M; j++) {
printf("%3d ", matrix[i][j]);
}
printf("\n");
}
return 0;
}
The output will look like:
2D array values inited using pointer arith:
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
 20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
 40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
 60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199
Pointer arithmetic can access contiguous memory locations in any pattern, starting and ending anywhere in a contiguous chunk of memory. For instance, after initializing a pointer to the address of an array element, its value can be changed by more than one:
iptr = &numbers[2];
*iptr = -13;
iptr += 4;
*iptr = 9999;
After executing the preceding code, printing the numbers
array’s values would
look like this (note that the values at index 2 and index 6 have changed):
numbers[0] = 0
numbers[1] = 3
numbers[2] = -13
numbers[3] = 9
numbers[4] = 12
numbers[5] = 15
numbers[6] = 9999
numbers[7] = 21
numbers[8] = 24
numbers[9] = 27
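Changing a pointer by more than one can also be used inside a loop to visit elements at a regular stride. The following is a minimal sketch with a hypothetical function name, sum_every. It takes care to advance the pointer only while the new address still falls inside the array:

```c
/* Adds up every stride-th element of an array of n ints, starting with
 * element 0. Assumes stride >= 1. */
int sum_every(const int *array, int n, int stride) {
    const int *ptr = array;   // points to the element being visited
    int remaining = n;        // elements from ptr to the end of the array
    int total = 0;

    while (remaining > 0) {
        total += *ptr;
        if (remaining > stride) {
            ptr += stride;    // skip ahead stride elements (still in bounds)
        }
        remaining -= stride;
    }
    return total;
}
```

The bounds check matters because pointer arithmetic does no checking of its own: adding to a pointer past the end of the memory it was initialized from yields an address the program must not dereference.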
Pointer arithmetic works on dynamically allocated arrays too. However,
programmers must be careful working with dynamically allocated
multidimensional arrays. If, for example, a program uses multiple malloc
calls
to dynamically allocate individual rows of a 2D array
(method 2, array of
arrays), then the pointer must be reset to point to the address of the starting
element of every row. Resetting the pointer is necessary because only elements
within a row are located in contiguous memory addresses. On the other hand, if
the 2D array is allocated as a single malloc
of total rows times columns
space (method 1),
then all the rows are in contiguous memory (like in the statically declared 2D
array from the example above). In the latter case, the pointer only needs to
be initialized to point to the base address, and then pointer arithmetic will
correctly access any element in the 2D array.
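The difference between the two allocation methods can be sketched as a pair of functions (the names are hypothetical). The first assumes that the whole matrix came from a single malloc, and the second assumes an array of row pointers in which each row was allocated by its own malloc call:

```c
#include <stdlib.h>

/* Method 1 (single malloc of rows*cols ints): one pointer can walk
 * across the entire matrix because all rows are contiguous. */
void fill_contiguous(int *matrix, int rows, int cols) {
    int i;
    int *ptr = matrix;        // initialize once to the base address

    for (i = 0; i < rows * cols; i++) {
        *ptr = i;
        ptr++;
    }
}

/* Method 2 (array of separately malloc'ed rows): the pointer must be
 * reset at the start of every row because rows are not contiguous. */
void fill_row_by_row(int **matrix, int rows, int cols) {
    int r, c, next = 0;

    for (r = 0; r < rows; r++) {
        int *ptr = matrix[r]; // reset to the first element of row r
        for (c = 0; c < cols; c++) {
            *ptr = next;
            next++;
            ptr++;
        }
    }
}
```

Both functions store the same sequence of values (0, 1, 2, and so on); only the way the pointer is repositioned between rows differs.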
2.9.5. C Libraries: Using, Compiling, and Linking
A library implements a collection of functions and definitions that can be used by other programs. A C library consists of two parts:
-
The application programming interface (API) to the library, which gets defined in one or more header files (.h files) that must be included in C source code files that plan to use the library. The headers define what the library exports to its users. These definitions usually include library function prototypes, and they may also include type, constant, or global variable declarations.
-
The implementation of the library’s functionality, often made available to programs in a precompiled binary format that gets linked (added) into the binary executable created by gcc. Precompiled library code might be in an archive file (libsomelib.a) containing several .o files that can be statically linked into the executable file at compile time. Alternatively, it may consist of a shared object file (libsomelib.so) that can be dynamically linked at runtime into a running program.
For example, the C string library implements a set of functions to manipulate C
strings. The string.h
header file defines its interface, so any program that
wants to use string library functions must #include <string.h>
. The
implementation of the C string library is part of the larger standard C library
(libc
) that the gcc
compiler automatically links into every executable file
it creates.
A library’s implementation consists of one or more modules (.c
files), and
may additionally include header files that are internal to the library
implementation; internal header files are not part of the library’s API but are
part of well-designed, modular library code. Often the C source code
implementation of a library is not exported to the user of the library.
Instead, the library is made available in a precompiled binary form. These
binary formats are not executable programs (they cannot be run on their own),
but they provide executable code that can be linked into (added into) an
executable file by gcc
at compilation time.
There are numerous libraries available for C programmers to use. For example,
the POSIX thread library (discussed in
Chapter
10) enables multithreaded C programs. C programmers can also implement and
use their own libraries (discussed in the
next section). Large C
programs tend to use many C libraries, some of which gcc
links implicitly,
whereas others require explicit linking with the -l
command line option to
gcc
.
Standard C libraries normally do not need to be explicitly linked in with the
-l
option, but other libraries do. The documentation for a library function
often specifies whether the library needs to be explicitly linked in when
compiling. For example, the POSIX threads library (pthread
) and the
readline
library require explicit linking on the gcc
command line:
$ gcc -o myprog myprog.c -pthread -lreadline
Note that linking the POSIX thread library is a special case that does not
include the -l prefix. However, most libraries are explicitly linked into the
executable using the -l syntax on the gcc command line. Also note that the full
name of the library file should not be included in the -l argument to gcc; the
library files are named something like libreadline.so or libreadline.a, but the
lib prefix and the .so or .a suffix of the filenames are not included. The
actual library filename may also contain version numbers (e.g.,
libreadline.so.8.0), which are also not included in the -l command line option
(-lreadline). By not forcing the user to specify (or even know) the exact name
and location of the library files to link in, gcc is free to find the most
recent version of a library in a user’s library path. It also allows the
compiler to choose to dynamically link when both a shared object (.so) and an
archive (.a) version of a library are available.
If users want to statically link libraries, then they can explicitly specify
static linking in the gcc command line. The --static option provides one method
for requesting static linking:
$ gcc -o myprog myprog.c --static -pthread -lreadline
Compilation Steps
Characterizing C’s program compilation steps will help to illustrate how library code gets linked into an executable binary file. We first present the compilation steps and then discuss (with examples) different types of errors that can occur when compiling programs that use libraries.
The C compiler translates a C source file (e.g., myprog.c) into an executable
binary file (e.g., a.out) in four distinct steps (plus a fifth step that occurs
at runtime).
-
The precompiler step runs first and expands preprocessor directives: the # directives that appear in the C program, such as #define and #include. Compilation errors at this step include syntax errors in preprocessor directives or gcc not finding header files associated with #include directives. To view the intermediate results of the precompiler step, pass the -E flag to gcc (the output can be redirected to a file that can be viewed by a text editor):
$ gcc -E myprog.c
$ gcc -E myprog.c > out
$ vim out
-
The compile step runs next and does the bulk of the compilation task. It translates the C program source code (myprog.c) to machine-specific assembly code (myprog.s). Assembly code is a human-readable form of the binary machine code instructions that a computer can execute. Compilation errors at this step include C language syntax errors, undefined symbol warnings, and errors from missing definitions and function prototypes. To view the intermediate results of the compile step, pass the -S flag to gcc (this option creates a text file named myprog.s with the assembly translation of myprog.c, which can be viewed in a text editor):
$ gcc -S myprog.c
$ vim myprog.s
-
The assembly step converts the assembly code into relocatable binary object code (myprog.o). The resulting object file contains machine code instructions, but it is not a complete executable program that can run on its own. The gcc compiler on Unix and Linux systems produces binary files in a specific format called ELF (Executable and Linkable Format). To stop compilation after this step, pass the -c flag to gcc (this produces a file named myprog.o). Binary files (e.g., a.out and .o files) can be viewed using objdump or similar tools for displaying binary files:
$ gcc -c myprog.c
# disassemble functions in myprog.o with objdump:
$ objdump -d myprog.o
-
The link editing step runs last and creates a single executable file (a.out) from relocatable binaries (.o) and libraries (.a or .so). In this step, the linker verifies that any references to names (symbols) in a .o file are present in other .o, .a, or .so files. For example, the linker will find the printf function in the standard C library (libc.so). If the linker cannot find the definition of a symbol, this step fails with an error stating that a symbol is undefined. Running gcc without flags for partial compilation performs all four steps of compiling a C source code file (myprog.c) to an executable binary file (a.out) that can be run:
$ gcc myprog.c
$ ./a.out
# disassemble functions in a.out with objdump:
$ objdump -d a.out
If the binary executable file (a.out) statically links in library code (from .a library files), then gcc embeds copies of library functions from the .a file in the resulting a.out file. All calls to library functions by the application are bound to the locations in the a.out file to which the library function is copied. Binding associates a name with a location in the program memory. For example, binding a call to a library function named gofish means replacing the use of the function name with the address in memory of the function (in later chapters we discuss memory addresses in more detail).
If, however, the a.out was created by dynamically linking a library (from library shared object, .so, files), then a.out does not contain a copy of the library function code from these libraries. Instead, it contains information about which dynamically linked libraries are needed by the a.out file to run it. Such executables require an additional linking step at runtime.
-
The runtime linking step is needed if a.out was linked with shared object files during link editing (step 4). In such cases, the dynamic library code (in .so files) must be loaded at runtime and linked with the running program. This runtime loading and linking of shared object libraries is called dynamic linking. When a user runs an a.out executable with shared object dependencies, the system performs dynamic linking before the program begins executing its main function.
The compiler adds information about shared object dependencies into the a.out file during the link editing compilation step (step 4). When the program starts executing, the dynamic linker examines the list of shared object dependencies and finds and loads the shared object files into the running program. It then updates relocation table entries in the a.out file, binding the program’s use of symbols in shared objects (such as calls to library functions) to their locations in the .so file loaded at runtime. Runtime linking reports errors if the dynamic linker cannot find a shared object (.so) file needed by the executable.
The ldd utility lists an executable file’s shared object dependencies:
$ ldd a.out
The GNU debugger (GDB) can examine a running program and show which shared object code is loaded and linked at runtime. We cover GDB in Chapter 3. However, the details of examining the Procedure Linkage Table (PLT), which is used for runtime linking of calls to dynamically linked library functions, are beyond the scope of this textbook.
For more details about the phases of compilation and about tools for examining different phases, see: Compilation Phases.
Common Compilation Errors Related to Compiling and Linking Libraries
Several compilation and linking errors can occur due to the programmer
forgetting to include library header files or forgetting to explicitly link in
library code. Identifying the gcc compiler error or warning associated with
each of these errors will help in debugging errors related to using C
libraries.
Consider the following C program that makes a call to a function libraryfunc
from the examplelib library, which is available as a shared object file,
libexamplelib.so:
#include <stdio.h>
#include <examplelib.h>
int main(int argc, char *argv[]) {
    int result;
    result = libraryfunc(6, MAX);
    printf("result is %d\n", result);
    return 0;
}
Assume that the header file, examplelib.h, contains the definitions in the
following example:
#define MAX 10 // a constant exported by the library
// a function exported by the library
extern int libraryfunc(int x, int y);
The extern prefix to the function prototype means that the function’s
definition comes from another file: it’s not in the examplelib.h file, but
instead it’s provided by one of the .c files in the library’s implementation.
Forgetting to include a header file
If the programmer forgets to include examplelib.h in their program, then the
compiler produces warnings and errors about the program’s use of library
functions and constants that it does not know about. For example, if the user
compiles their program without #include <examplelib.h>, gcc will produce the
following output:
# '-g': add debug information, '-c': compile to .o
$ gcc -g -c myprog.c
myprog.c: In function main:
myprog.c:8:12: warning: implicit declaration of function libraryfunc
   result = libraryfunc(6, MAX);
            ^~~~~~~~~~~
myprog.c:8:27: error: MAX undeclared (first use in this function)
   result = libraryfunc(6, MAX);
                           ^~~
The first compiler warning (implicit declaration of function libraryfunc) tells
the programmer that the compiler cannot find a function prototype for the
libraryfunc function. This is just a compiler warning because gcc will guess
that the function’s return type is an integer and will continue compiling the
program (GCC 14 and later report it as an error by default). Programmers should
not ignore warnings such as these! They indicate that the program isn’t
including a function prototype before its use in the myprog.c file, which is
often due to not including a header file that contains the function prototype.
The second message, a compiler error (MAX undeclared (first use in this
function)), follows from a missing constant definition. The compiler cannot
guess at the value of the missing constant, so this missing definition fails
with an error. This type of "undeclared" message often indicates that a header
file defining a constant or global variable is missing or hasn’t been properly
included.
Forgetting to link a library
If the programmer includes the library header file (as shown in the previous
listing), but forgets to explicitly link in the library during the link editing
step (step 4) of compilation, then gcc indicates this with an undefined
reference error:
$ gcc -g myprog.c
In function main:
myprog.c:9: undefined reference to libraryfunc
collect2: error: ld returned 1 exit status
This error originates from ld, the linker component of the compiler. It
indicates that the linker cannot find the implementation of the library
function libraryfunc that gets called at line 9 in myprog.c. An undefined
reference error indicates that a library needs to be explicitly linked into the
executable. In this example, specifying -lexamplelib on the gcc command line
will fix the error:
$ gcc -g myprog.c -lexamplelib
gcc can’t find header or library files
Compilation will also fail with errors if a library’s header or implementation
files are not present in the directories that gcc searches by default. For
example, if gcc cannot find the examplelib.h file, it will produce an error
message like this:
$ gcc -c myprog.c -lexamplelib
myprog.c:1:10: fatal error: examplelib.h: No such file or directory
 #include <examplelib.h>
          ^~~~~~~
compilation terminated.
If the linker cannot find a .a or .so version of the library to link in during
the link editing step of compilation, gcc will exit with an error like the
following (note that the -c flag is omitted here, because -c stops compilation
before the linker runs):
$ gcc myprog.c -lexamplelib
/usr/bin/ld: cannot find -lexamplelib
collect2: error: ld returned 1 exit status
Similarly, if a dynamically linked executable cannot locate a shared object
file (e.g., libexamplelib.so), it will fail to execute at runtime with an error
like the following:
$ ./a.out
./a.out: error while loading shared libraries: libexamplelib.so: cannot open shared object file: No such file or directory
To resolve these types of errors, programmers must specify additional options
to gcc to indicate where the library’s files can be found. They may also need
to modify the LD_LIBRARY_PATH environment variable for the runtime linker to
find a library’s .so file.
Library and Include Paths
The compiler automatically searches in standard directory locations for header
and library files. For example, systems commonly store standard header files in
/usr/include, and library files in /usr/lib, and gcc automatically looks for
headers and libraries in these directories; gcc also automatically searches for
header files in the current working directory.
If gcc cannot find a header or a library file, then the user must explicitly
provide paths on the command line using -I and -L. For example, suppose that a
library named libexamplelib.so exists in /home/me/lib, and its header file
examplelib.h is in /home/me/include. Because gcc knows nothing of those paths
by default, it must be explicitly told to include files there to successfully
compile a program that uses this library:
$ gcc -I/home/me/include -o myprog myprog.c -L/home/me/lib -lexamplelib
To specify the location of a dynamic library (e.g., libexamplelib.so) when
launching a dynamically linked executable, set the LD_LIBRARY_PATH environment
variable to include the path to the library. Here’s an example bash command
that can be run at a shell prompt or added to a .bashrc file:
export LD_LIBRARY_PATH=/home/me/lib:$LD_LIBRARY_PATH
When the gcc command lines get long, or when an executable requires many source
and header files, it helps to simplify compilation by using make and a
Makefile. Here’s some more information about make and Makefiles.
2.9.6. Writing and Using Your Own C Libraries
Programmers typically divide large C programs into separate modules (i.e.,
separate .c files) of related functionality. Definitions shared by more than
one module are put in header files (.h files) that are included by the modules
that need them. Similarly, C library code is also implemented in one or more
modules (.c files) and one or more header files (.h files). C programmers often
implement their own C libraries of commonly used functionality. By writing a
library, a programmer implements the functionality once, in the library, and
then can use this functionality in any subsequent C program that they write.
In the Using, Compiling, and Linking Libraries section, we describe how to use, compile, and link C library code into C programs. In this section, we discuss how to write and use your own libraries in C. What we present here also applies to structuring and compiling larger C programs composed of multiple C source and header files.
To create a library in C:
-
Define an interface to the library in a header (.h) file. This header file must be included by any program that wants to use the library.
-
Create an implementation of the library in one or more .c files. This set of function definitions implements the library’s functionality. Some functions may be interface functions that users of the library will call, and others may be internal functions that cannot be called by users of the library (internal functions are part of good modular design of the library’s implementation).
-
Compile a binary form of the library that can be linked into programs that use the library.
The binary form of a library could be directly built from its source file(s) as
part of compiling the application code that uses the library. This method
compiles the library files into .o files and statically links them into the
binary executable. Including libraries this way often applies to library code
that you write for your own use (since you have access to its .c source files),
and it’s also the method to build an executable from multiple .c modules.
Alternatively, a library could be compiled into a binary archive (.a) or a
shared object (.so) file for programs that want to use the library. In these
cases, users of the library often will not have access to the library’s C
source code files, and thus they are not able to directly compile the library
code with application code that uses it. When a program uses such a precompiled
library (e.g., a .a or .so), the library’s code must be explicitly linked into
the executable file using gcc’s -l command line option.
We focus our detailed discussion of writing, compiling, and linking library
code on the case in which the programmer has access to individual library
modules (either the .c or .o files). This focus also applies to designing and
compiling large C programs that are divided into multiple .c and .h files. We
briefly show commands for building archive and shared object forms of
libraries. More information about building these types of library files is
available in the gcc documentation, including the man pages for gcc and ar.
Library Details by Example
In the following, we show some examples of creating and using your own libraries.
Define the library interface:
Header files (.h files) are text files that contain C function prototypes and
other definitions; they represent the interface of a library. A header file
must be included in any application that intends to use the library. For
example, the C standard library header files are usually stored in
/usr/include/ and can be viewed with an editor:
$ vi /usr/include/stdio.h
Here’s an example header file (mylib.h) from a library that contains some
definitions for users of the library.
#ifndef _MYLIB_H_
#define _MYLIB_H_
// a constant definition exported by library:
#define MAX_FOO 20
// a type definition exported by library:
struct foo_struct {
    int x;
    float y;
};
// a global variable exported by library
// "extern" means that this is only a declaration, not a
// definition: it states that a variable named total_times
// of type int is defined in the library implementation and
// is available for use by programs using the library.
// It is unusual for a library to export global variables
// to its users, but if it does, it is important that
// extern appears in the declaration in the .h file
extern int total_times;
// a function prototype for a function exported by library:
// extern means that this function definition exists
// somewhere else.
/*
* This function returns the larger of two float values
* y, z: the two values
* returns the value of the larger one
*/
extern float bigger(float y, float z);
#endif
Header files typically have special "boilerplate" code around their contents:
#ifndef _MYLIB_H_
#define _MYLIB_H_
// header file contents
#endif
This boilerplate code ensures that the compiler’s preprocessor includes the
contents of mylib.h exactly once in any C file that includes it. It is
important to include .h file contents only once to avoid duplicate definition
errors at compile time. Similarly, if you forget to include a .h file in a C
program that uses the library, the compiler will generate an undefined symbol
warning.
The comments in the .h file are part of the interface to the library, written
for users of the library. These comments should be verbose, explaining
definitions and describing what each library function does, what parameter
values it takes, and what it returns. Sometimes a .h file will also include a
top-level comment describing how to use the library.
The keyword extern before the global variable and the function prototype means
that these names are defined somewhere else. It is particularly important to
include extern before any global variables that the library exports, as it
distinguishes a declaration of the variable’s name and type (in the .h file)
from the variable’s definition in the library’s implementation. In the previous
example, the global variable is defined exactly once inside the library, but
it’s exported to library users through its extern declaration in the library’s
.h file.
Implement the library functionality:
Programmers implement libraries in one or more .c files (and sometimes internal
.h files). The implementation includes definitions of all the functions whose
prototypes appear in the .h file as well as other functions that are internal
to its implementation. These internal functions are often defined with the
keyword static, which scopes their availability to the module (.c file) in
which they are defined. The library implementation should also include variable
definitions for any extern global variable declarations in the .h file. Here’s
an example library implementation (mylib.c):
#include <stdlib.h>
// Include the library header file if the implementation needs
// any of its definitions (types or constants, for example.)
// Use " " instead of < > if the mylib.h file is not in a
// default library path with other standard library header
// files (the usual case for library code you write and use.)
#include "mylib.h"
// define the global variable exported by the library
int total_times = 0;
// include function definitions for each library function:
float bigger(float y, float z) {
    total_times++;
    if (y > z) {
        return y;
    }
    return z;
}
Create a binary form of the library:
To create a binary form of the library (a .o file), compile with the -c option:
$ gcc -o mylib.o -c mylib.c
One or more .o files can build an archive (.a) or shared object (.so) version
of the library.
-
To build a static library use the archiver (ar):
ar -rcs libmylib.a mylib.o
-
To build a dynamically linked library, the mylib.o object file(s) in the library must be built with position independent code (using -fPIC). A libmylib.so shared object file can be created from mylib.o by specifying the -shared flag to gcc:
gcc -fPIC -o mylib.o -c mylib.c
gcc -shared -o libmylib.so mylib.o
-
Shared object and archive libraries are often built from multiple .o files, for example (remember that .o files for dynamically linked libraries need to be built using the -fPIC flag):
gcc -shared -o libbiglib.so file1.o file2.o file3.o file4.o
ar -rcs libbiglib.a file1.o file2.o file3.o file4.o
Use and link the library:
In other .c files that use this library:
-
#include its header file, and
-
explicitly link in the implementation (.o file) during compilation.
After including the library header file, your code then can call the library’s
functions (e.g., in myprog.c):
#include <stdio.h>
#include "mylib.h" // include library header file
int main(void) {
    float val1, val2, ret;
    printf("Enter two float values: ");
    scanf("%f%f", &val1, &val2);
    ret = bigger(val1, val2);   // use a library function
    printf("%f is the biggest\n", ret);
    return 0;
}
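#include syntax and the preprocessor
Note that mylib.h is included with double quotes, whereas stdio.h is included with angle brackets. For a header named in angle brackets, the preprocessor searches the include path: any directories given with gcc’s -I option, followed by the standard system directories. For a header named in double quotes, it first looks in the directory containing the file that includes it and then falls back to that same include path. When mylib.h is in a different directory from myprog.c (for example, /home/me/myincludes), pass that directory to gcc with -I:
$ gcc -I/home/me/myincludes -c myprog.c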
-
To compile a program (myprog.c) that uses the library (mylib.o) into a binary executable:
$ gcc -o myprog myprog.c mylib.o
-
Or, if the library’s implementation files are available at compile time, then the program can be built directly from the program and library .c files:
$ gcc -o myprog myprog.c mylib.c
-
Or, if the library is available as an archive or shared object file, then it can be linked in using -l (-lmylib: note that the library name is libmylib.[a,so], but only the mylib part is included in the gcc command line):
$ gcc -o myprog myprog.c -L. -lmylib
The -L. option specifies the path to the libmylib.[so,a] files (the . after the -L indicates that it should search the current directory). By default, gcc will dynamically link a library if it can find a .so version. See the Using C libraries section for more information about linking and link paths.
The program can then be run:
$ ./myprog
If you run the dynamically linked version of myprog, you may encounter an error
that looks like this:
./myprog: error while loading shared libraries: libmylib.so: cannot open shared object file: No such file or directory
This error is saying that the runtime linker cannot find libmylib.so at
runtime. To fix this problem, set your LD_LIBRARY_PATH environment variable to
include the path to the libmylib.so file. Subsequent runs of myprog use the
path you add to LD_LIBRARY_PATH to find the libmylib.so file and load it at
runtime. For example, if libmylib.so is in the /home/me/mylibs/ subdirectory,
run this (just once) at the bash shell prompt to set the LD_LIBRARY_PATH
environment variable:
$ export LD_LIBRARY_PATH=/home/me/mylibs:$LD_LIBRARY_PATH
2.9.7. Compiling C to Assembly, and Compiling and Linking Assembly and C Code
A compiler can compile C code to assembly code, and it can compile assembly
code into a binary form that links into a binary executable program. We use
IA32 assembly and gcc as our example assembly language and compiler, but this
functionality is supported by any C compiler, and most compilers support
compiling to a number of different assembly languages. See Chapter 8 for
details about assembly code and assembly programming.
Consider this very simple C program:
int main(void) {
    int x, y;
    x = 1;
    x = x + 2;
    x = x - 14;
    y = x*100;
    x = x + y * 6;
    return 0;
}
The gcc compiler will compile it into an IA32 assembly text file (.s) using the
-S command line option to specify compiling to assembly and the -m32 command
line option to specify generating IA32 assembly:
$ gcc -m32 -S simpleops.c   # runs the compiler to create a .s text file
This command creates a file named simpleops.s with the compiler’s IA32 assembly
translation of the C code. Because the .s file is a text file, a user can view
it (and edit it) using any text editor. For example:
$ vim simpleops.s
Passing additional compiler flags provides directions to gcc that it should use
certain features or optimizations in its translation of C to IA32 assembly
code.
An assembly code file, either one generated from gcc or one written by hand by
a programmer, can be compiled by gcc into binary machine code form using the -c
option:
$ gcc -m32 -c simpleops.s # compiles to a relocatable object binary file (.o)
The resulting simpleops.o file can then be linked into a binary executable file
(note: this requires that the 32-bit version of the system libraries are
installed on your system):
$ gcc -m32 -o simpleops simpleops.o # creates a 32-bit executable file
This command creates a binary executable file, simpleops, for IA32 (and
x86-64) architectures.
The gcc command line to build an executable file can include .o and .c files
that will be compiled and linked together to create the single binary
executable.
Systems provide utilities that allow users to view binary files. For example,
objdump displays the machine code and assembly code mappings in .o files:
$ objdump -d simpleops.o
This output can be compared to the assembly file:
$ cat simpleops.s
You should see something like this (we’ve annotated some of the assembly code with its corresponding code from the C program):
.file "simpleops.c"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -8(%ebp) # x = 1
addl $2, -8(%ebp) # x = x + 2
subl $14, -8(%ebp) # x = x - 14
movl -8(%ebp), %eax # load x into R[%eax]
imull $100, %eax, %eax # into R[%eax] store result of x*100
movl %eax, -4(%ebp) # y = x*100
movl -4(%ebp), %edx
movl %edx, %eax
addl %eax, %eax
addl %edx, %eax
addl %eax, %eax
addl %eax, -8(%ebp)
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",@progbits
Writing and Compiling Assembly Code
Programmers can write their own assembly code by hand and compile it with gcc
into a binary executable program. For example, to implement a function in
assembly, add code to a .s file and use gcc to compile it. The following
example shows the basic structure of a function in IA32 assembly. Such code
would be written in a file (e.g., myfunc.s) for a function with the prototype
int myfunc(int param);. Functions with more parameters or needing more space
for local variables may differ slightly in their preamble code.
.text # this file contains instruction code
.globl myfunc # myfunc is the name of a function
.type myfunc, @function
myfunc: # the start of the function
pushl %ebp # function preamble:
movl %esp, %ebp # the 1st three instrs set up the stack
subl $16, %esp
# A programmer adds specific IA32 instructions
# here that allocate stack space for any local variables
# and then implements code using parameters and locals to
# perform the functionality of the myfunc function
#
# the return value should be stored in %eax before returning
leave # function return code
ret
A C program that wanted to call this function would need to include its function prototype:
#include <stdio.h>
int myfunc(int param);
int main(void) {
    int ret;
    ret = myfunc(32);
    printf("myfunc(32) is %d\n", ret);
    return 0;
}
The following gcc commands build an executable file (myprog) from myfunc.s and
main.c source files:
$ gcc -m32 -c myfunc.s
$ gcc -m32 -o myprog myfunc.o main.c
2.10. Summary
In this chapter, we covered the C programming language in depth and discussed some advanced C programming topics as well. In the next chapter, we present two very helpful C debugging tools: the GNU GDB debugger for general-purpose C program debugging, and the Valgrind memory debugger for finding memory access errors in C programs. With these programming tools and knowledge of the core C programming language presented in this chapter, a C programmer can design powerful, efficient, and robust software.
3. C Debugging Tools
In this section, we introduce two debugging tools: the GNU debugger (GDB), which is useful for examining a program’s runtime state, and Valgrind (pronounced "Val-grinned"), a popular code profiling suite. Specifically, we introduce Valgrind’s Memcheck tool, which analyzes a program’s memory accesses to detect invalid memory usage, uninitialized memory usage, and memory leaks.
The GDB section includes two sample GDB sessions that illustrate commonly used GDB commands for finding bugs in programs. We also discuss some advanced GDB features, including attaching GDB to a running process, GDB and Makefiles, signal control in GDB, debugging at the assembly code level, and debugging multithreaded Pthreads programs.
The Valgrind section discusses memory access errors and why they can be so difficult to detect. It also includes an example run of Memcheck on a program with some bad memory access errors. The Valgrind suite includes other program profiling and debugging tools, which we cover in later chapters. For example, we cover the cache profiling tool Cachegrind in Chapter 11, and the function call profiling tool Callgrind in Chapter 12.
3.1. Debugging with GDB
GDB can help programmers find and fix bugs in their programs. GDB works with programs compiled in a variety of languages, but we focus on C here. A debugger is a program that controls the execution of another program (the program being debugged) — it allows programmers to see what their programs are doing as they run. Using a debugger can help programmers discover bugs and determine the causes of the bugs they find. Here are some useful actions that GDB can perform:
-
Start a program and step through it line by line
-
Pause the execution of a program when it reaches certain points in its code
-
Pause the execution of a program on user-specified conditions
-
Show the values of variables at the point in execution that a program is paused
-
Continue a program’s execution after a pause
-
Examine the program’s execution state at the point when it crashes
-
Examine the contents of any stack frame on the call stack
GDB users typically set breakpoints in their programs. A breakpoint specifies a point in the program where GDB will pause the program’s execution. When the executing program hits a breakpoint, GDB pauses its execution and allows the user to enter GDB commands to examine program variables and stack contents, step through the execution of the program one line at a time, add new breakpoints, and continue the program’s execution until it hits the next breakpoint.
Many Unix systems also provide the Data Display Debugger (DDD), an easy-to-use GUI wrapper around a command-line debugger program (GDB, for example). The DDD program accepts the same parameters and commands as GDB, but it provides a GUI interface with debugging menu options as well as the command line interface to GDB.
After discussing a few preliminaries about how to get started with GDB, we present two example GDB debugging sessions that introduce commonly used GDB commands in the context of finding different types of bugs. The first session (GDB on badprog.c) shows how to use GDB commands to find logic bugs in a C program. The second session (GDB on segfaulter.c) shows an example of using GDB commands to examine the program execution state at the point when a program crashes in order to discover the cause of the crash.
In the common GDB commands section, we describe commonly used GDB commands in more detail, showing more examples of some commands. In later sections, we discuss some advanced GDB features.
3.1.1. Getting Started with GDB
When debugging a program, it helps to compile it with the -g option, which adds
extra debugging information to the binary executable file. This extra
information helps the debugger find program variables and functions in the
binary executable and enables it to map machine code instructions to lines of C
source code (the form of the program that the C programmer understands). Also,
when compiling for debugging, avoid compiler optimizations (for example, do not
build with -O2). Compiler-optimized code is often very difficult to debug
because sequences of optimized machine code often do not clearly map back to C
source code. Although we cover the use of the -g flag in the following
sections, some users may get better results with the -g3 flag, which can reveal
extra debugging information.
Here is an example gcc command that will build a suitable executable for debugging with GDB:
$ gcc -g myprog.c
To start GDB, invoke it on the executable file. For example:
$ gdb a.out
(gdb)          # the gdb command prompt
When GDB starts, it prints the (gdb) prompt, which allows the user to
enter GDB commands (such as setting breakpoints) before it starts running
the a.out program.
Similarly, to invoke DDD on the executable file:
$ ddd a.out
Sometimes, when a program terminates with an error, the operating system dumps a core file containing information about the state of the program when it crashed. The contents of this core file can be examined in GDB by running GDB with the core file and the executable that generated it:
$ gdb core a.out
(gdb) where       # the where command shows point of crash
3.1.2. Example GDB Sessions
We demonstrate common features of GDB through two example sessions of using GDB to debug programs. The first is an example of using GDB to find and fix two bugs in a program, and the second is an example of using GDB to debug a program that crashes. The set of GDB commands that we demonstrate in these two example sessions includes:
Command | Description |
---|---|
break | Set a breakpoint |
run | Start program running from the beginning |
cont | Continue execution of the program until it hits a breakpoint |
quit | Quit the GDB session |
next | Allow program to execute the next line of C code and then pause it |
step | Allow program to execute the next line of C code; if the next line contains a function call, step into the function and pause |
list | List C source code around pause point or specified point |
print | Print out the value of a program variable (or expression) |
where | Print the call stack |
frame | Move into the context of a specific stack frame |
Example Using GDB to Debug a Program (badprog.c)
The first example GDB session debugs the badprog.c program. This program is
supposed to find the largest value in an array of int values. However, when run,
it incorrectly finds that 17 is the largest value in the array instead of the
correct largest value, which is 60. This example shows how GDB can examine the
program’s runtime state to determine why the program is not computing the
expected result. In particular, this example debugging session reveals two bugs:
-
An error with loop bounds resulting in the program accessing elements beyond the bounds of the array.
-
An error in a function not returning the correct value to its caller.
To examine a program with GDB, first compile the program with -g to add
debugging information to the executable:
$ gcc -g badprog.c
Next, run GDB on the binary executable program (a.out). GDB initializes
and prints the (gdb) prompt, where the user can enter GDB commands:
$ gdb ./a.out
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
...
(gdb)
At this point GDB has not yet started running the program. A common first
debugging step is to set a breakpoint in the main() function to pause the
program’s execution right before it executes the first instruction in main().
The break command sets a breakpoint (a point at which GDB pauses the program)
at a specified location (in this case at the start of the main() function):
(gdb) break main
Breakpoint 1 at 0x8048436: file badprog.c, line 36.
The run command tells GDB to start the program:
(gdb) run
Starting program: ./a.out
If the program takes command line arguments, provide them after
the run command (for example, run 100 200 would run a.out with the command line
arguments 100 and 200).
After entering run, GDB starts the program’s execution at its beginning,
and it runs until it hits a breakpoint. Upon reaching a breakpoint,
GDB pauses the program before executing the line of code at the breakpoint,
and prints out the breakpoint number and source code
line associated with the breakpoint. In this example, GDB pauses the
program just before executing line 36 of the program. It then
prints out the (gdb) prompt and waits for further instructions:
Breakpoint 1, main (argc=1, argv=0x7fffffffe398) at badprog.c:36
36  int main(int argc, char *argv[]) {
(gdb)
Often when a program pauses at a breakpoint, the user wants to see the C source code
around the breakpoint. The GDB list command displays the code surrounding
the breakpoint:
(gdb) list
29    }
30    return 0;
31  }
32
33  /***************************************/
34  int main(int argc, char *argv[]) {
35
36    int arr[5] = { 17, 21, 44, 2, 60 };
37
38    int max = arr[0];
Subsequent calls to list display the next lines of source code following
these. list can also be used with a specific line number (for example, list 11)
or with a function name to list the source code at a specified part of
the program. For example:
(gdb) list findAndReturnMax
12   * array: array of integer values
13   * len: size of the array
14   * max: set to the largest value in the array
15   * returns: 0 on success and non-zero on an error
16   */
17  int findAndReturnMax(int *array1, int len, int max) {
18
19    int i;
20
21    if (!array1 || (len <=0) ) {
The user may want to execute one line of code at a time after hitting
a breakpoint, examining program state after each line is executed.
The GDB next command executes just the very next line of C code.
After the program executes this line of code, GDB pauses the program again.
The print command prints the values of program variables.
Here are a few calls to next and print to show their effects on the next
two lines of execution. Note that the source code line listed after
a next has not yet been executed; it shows the line where the program
is paused, which represents the line that will be executed next:
(gdb) next
36    int arr[5] = { 17, 21, 44, 2, 60 };
(gdb) next
38    int max = arr[0];
(gdb) print max
$3 = 0
(gdb) print arr[3]
$4 = 2
(gdb) next
40    if ( findAndReturnMax(arr, 5, max) != 0 ) {
(gdb) print max
$5 = 17
(gdb)
At this point in the program’s execution, the main function has initialized
its local variables arr and max and is about to make a call
to the findAndReturnMax() function. The GDB next command executes
the next full line of C source code. If
that line includes a function call, the full execution of that function
call and its return is executed as part of a single next command.
A user who wants to observe the execution of the function
should issue GDB’s step command instead of the next command:
step steps into a function call, pausing the program before the first line
of the function is executed.
Because we suspect that the bug in this program is related to
the findAndReturnMax() function, we want to step into the function’s
execution rather than past it. So, when paused at line 40, the step
command will next pause the program at the start of
findAndReturnMax() (alternatively, the user could set a breakpoint
at findAndReturnMax() to pause the program’s execution at that point):
(gdb) next
40    if ( findAndReturnMax(arr, 5, max) != 0 ) {
(gdb) step
findAndReturnMax (array1=0x7fffffffe290, len=5, max=17) at badprog.c:21
21    if (!array1 || (len <=0) ) {
(gdb)
The program is now paused inside the findAndReturnMax function, whose
local variables and parameters are now in scope. The print command
shows their values, and list displays the C source code around
the pause point:
(gdb) print array1[0]
$6 = 17
(gdb) print max
$7 = 17
(gdb) list
16   */
17  int findAndReturnMax(int *array1, int len, int max) {
18
19    int i;
20
21    if (!array1 || (len <=0) ) {
22      return -1;
23    }
24    max = array1[0];
25    for (i=1; i <= len; i++) {
(gdb) list
26      if(max < array1[i]) {
27        max = array1[i];
28      }
29    }
30    return 0;
31  }
32
33  /***************************************/
34  int main(int argc, char *argv[]) {
35
Because we think there is a bug related to this function,
we may want to set a breakpoint inside
the function so that we can examine the runtime state part way through
its execution. In particular, setting a breakpoint on the line
when max is changed may help us see what this function is doing.
We can set a breakpoint at a specific line number in the
program (line 27) and use the cont command to tell GDB to let
the application’s execution continue from its paused point. Only when
the program hits a breakpoint will GDB pause the program and grab control
again, allowing the user to enter other GDB commands.
(gdb) break 27
Breakpoint 2 at 0x555555554789: file badprog.c, line 27.
(gdb) cont
Continuing.
Breakpoint 2, findAndReturnMax (array1=0x...e290,len=5,max=17) at badprog.c:27
27        max = array1[i];
(gdb) print max
$10 = 17
(gdb) print i
$11 = 1
The display command asks GDB to automatically print out the same set of
program variables every time a breakpoint is hit. For example, we will display
the values of i, max, and array1[i] every time the program hits a
breakpoint (in each iteration of the loop in findAndReturnMax()):
(gdb) display i
1: i = 1
(gdb) display max
2: max = 17
(gdb) display array1[i]
3: array1[i] = 21
(gdb) cont
Continuing.
Breakpoint 2, findAndReturnMax (array1=0x7fffffffe290, len=5, max=21) at badprog.c:27
27        max = array1[i];
1: i = 2
2: max = 21
3: array1[i] = 44
(gdb) cont
Continuing.
Breakpoint 2, findAndReturnMax (array1=0x7fffffffe290, len=5, max=21) at badprog.c:27
27        max = array1[i];
1: i = 3
2: max = 44
3: array1[i] = 2
(gdb) cont
Breakpoint 2, findAndReturnMax (array1=0x7fffffffe290, len=5, max=44) at badprog.c:27
27        max = array1[i];
1: i = 4
2: max = 44
3: array1[i] = 60
(gdb) cont
Breakpoint 2, findAndReturnMax (array1=0x7fffffffe290, len=5, max=60) at badprog.c:27
27        max = array1[i];
1: i = 5
2: max = 60
3: array1[i] = 32767
(gdb)
We found our first bug! The value of array1[i] is 32767, a value not
in the passed array, and the value of i is 5, but 5 is not a valid
index into this array. Through GDB we discovered that the for loop bounds
need to be fixed to i < len.
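As a minimal sketch of that first fix, assuming the rest of findAndReturnMax() stays exactly as shown in the list output above, only the loop condition changes. The max parameter is still passed by value here, so this version is not yet expected to make the program print the right answer:

```c
/* Sketch: findAndReturnMax() from the listing with only the loop bound
 * changed. max is still passed by value, so the caller's variable is
 * not updated by this version. */
int findAndReturnMax(int *array1, int len, int max) {
    int i;

    if (!array1 || (len <= 0)) {
        return -1;
    }
    max = array1[0];
    for (i = 1; i < len; i++) {   /* was i <= len, which read array1[len] */
        if (max < array1[i]) {
            max = array1[i];
        }
    }
    return 0;
}
```

Called from main() as findAndReturnMax(arr, 5, max), the loop now stops at index 4, the last valid element of the five-element array.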
At this point, we could exit the GDB session and fix this bug in the code.
To quit a GDB session, type quit:
(gdb) quit
The program is running.  Exit anyway? (y or n) y
$
After fixing this bug, recompiling, and running the program, it still does not
find the correct max value (it still finds that 17 is the max value and not 60).
Based on our previous GDB run, we may suspect that there is an error in
calling or returning from the findAndReturnMax() function. We re-run
the new version of our program in GDB, this time setting a breakpoint at
the entry to the findAndReturnMax() function:
$ gdb ./a.out
...
(gdb) break main
Breakpoint 1 at 0x7c4: file badprog.c, line 36.
(gdb) break findAndReturnMax
Breakpoint 2 at 0x748: file badprog.c, line 21.
(gdb) run
Starting program: ./a.out
Breakpoint 1, main (argc=1, argv=0x7fffffffe398) at badprog.c:36
36  int main(int argc, char *argv[]) {
(gdb) cont
Continuing.
Breakpoint 2, findAndReturnMax (array1=0x7fffffffe290, len=5, max=17) at badprog.c:21
21    if (!array1 || (len <=0) ) {
(gdb)
If we suspect a bug in the arguments or return value of a function, it may be
helpful to examine the contents of the stack. The where (or bt, for
"backtrace") GDB command prints the current state of the stack. In this
example, the main() function is on the bottom of the stack (in frame 1) and is
executing a call to findAndReturnMax() at line 40. The findAndReturnMax()
function is on the top of the stack (in frame 0), and is currently paused at
line 21:
(gdb) where
#0  findAndReturnMax (array1=0x7fffffffe290, len=5, max=17) at badprog.c:21
#1  0x0000555555554810 in main (argc=1, argv=0x7fffffffe398) at badprog.c:40
GDB’s frame command moves into the context of any frame on the stack. Within
each stack frame context, a user can examine the local variables and parameters
in that frame. In this example, we move into stack frame 1 (the caller’s
context) and print out the values of the arguments that the main() function
passes to findAndReturnMax() (for example, arr and max):
(gdb) frame 1
#1  0x0000555555554810 in main (argc=1, argv=0x7fffffffe398) at badprog.c:40
40    if ( findAndReturnMax(arr, 5, max) != 0 ) {
(gdb) print arr
$1 = {17, 21, 44, 2, 60}
(gdb) print max
$2 = 17
(gdb)
The argument values look fine, so let’s check the findAndReturnMax()
function’s return value.
To do this, we add a breakpoint right before findAndReturnMax()
returns to see what value it computes for its max:
(gdb) break 30
Breakpoint 3 at 0x5555555547ae: file badprog.c, line 30.
(gdb) cont
Continuing.
Breakpoint 3, findAndReturnMax (array1=0x7fffffffe290, len=5, max=60) at badprog.c:30
30    return 0;
(gdb) print max
$3 = 60
This shows that the function has found the correct max value (60). Let’s
execute the next few lines of code and see what value the main() function
receives:
(gdb) next
31  }
(gdb) next
main (argc=1, argv=0x7fffffffe398) at badprog.c:44
44    printf("max value in the array is %d\n", max);
(gdb) where
#0  main (argc=1, argv=0x7fffffffe398) at badprog.c:44
(gdb) print max
$4 = 17
We found the second bug! The findAndReturnMax() function identifies the
correct largest value in the passed array (60), but it doesn’t
return that value back to the main() function.
To fix this error, we need to either change findAndReturnMax() to return
its value of max or add a "pass-by-pointer" parameter that the
function will use to modify the value of the main() function’s max
local variable.
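One way the "pass-by-pointer" option might look is sketched below. This is an illustration rather than the book's final code: the function name and parameter order follow the listing, the max parameter becomes an int *, and the extra null check on max is an added assumption:

```c
/* Sketch of the "pass-by-pointer" fix: max is now an int * so that the
 * function can store its result in the caller's variable. The check
 * for a null max pointer is an addition not present in the listing. */
int findAndReturnMax(int *array1, int len, int *max) {
    int i;

    if (!array1 || !max || (len <= 0)) {
        return -1;
    }
    *max = array1[0];
    for (i = 1; i < len; i++) {
        if (*max < array1[i]) {
            *max = array1[i];   /* updates the caller's variable */
        }
    }
    return 0;
}
```

With this signature, main() would pass the address of its local variable, as in findAndReturnMax(arr, 5, &max), and then print max after the call returns.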
Example Using GDB to Debug a Program That Crashes (segfaulter.c)
The second example GDB session (run on the segfaulter.c program) demonstrates how GDB behaves when a program crashes and how we can use GDB to help discover why the crash occurs.
In this example, we just run the segfaulter program in GDB and let it crash:
$ gcc -g -o segfaulter segfaulter.c
$ gdb ./segfaulter
(gdb) run
Starting program: ./segfaulter
Program received signal SIGSEGV, Segmentation fault.
0x00005555555546f5 in initfunc (array=0x0, len=100) at segfaulter.c:14
14      array[i] = i;
As soon as the program crashes, GDB pauses the program’s execution
at the point it crashes and grabs control.
GDB allows a user to issue commands to examine the
program’s runtime state at the point of the program crash, often leading
to discovering why the program crashed and how to fix the cause of the crash.
The GDB where and list commands are particularly useful for determining where
a program crashes:
(gdb) where
#0  0x00005555555546f5 in initfunc (array=0x0, len=100) at segfaulter.c:14
#1  0x00005555555547a0 in main (argc=1, argv=0x7fffffffe378) at segfaulter.c:37
(gdb) list
9   int initfunc(int *array, int len) {
10
11    int i;
12
13    for(i=1; i <= len; i++) {
14      array[i] = i;
15    }
16    return 0;
17  }
18
This output tells us that the program crashes on line 14, in the
initfunc() function. Examining the values of the parameters and
local variables on line 14 may tell us why it crashes:
(gdb) print i
$2 = 1
(gdb) print array[i]
Cannot access memory at address 0x4
The value of i seems fine, but we see an error when trying to access
index i of array. Let’s print out the value of array (the
value of the base address of the array) to see if that tells us anything:
(gdb) print array
$3 = (int *) 0x0
We have found the cause of the crash! The base address of the array is
zero (or NULL), and we know that dereferencing a null pointer
(via array[i]) causes programs to crash.
Let’s see if we can figure out why the array parameter is NULL by looking in
the caller’s stack frame:
(gdb) frame 1
#1  0x00005555555547a0 in main (argc=1, argv=0x7fffffffe378) at segfaulter.c:37
37    if(initfunc(arr, 100) != 0 ) {
(gdb) list
32  int main(int argc, char *argv[]) {
33
34    int *arr = NULL;
35    int max = 6;
36
37    if(initfunc(arr, 100) != 0 ) {
38      printf("init error\n");
39      exit(1);
40    }
41
(gdb) print arr
$4 = (int *) 0x0
(gdb)
Moving into the caller’s stack frame and printing out the value of the
arguments main() passes to initfunc() shows that the main() function
passes a null pointer to the initfunc() function. In other words, the
user forgot to allocate the arr array prior to the call to initfunc().
The fix is to use the malloc() function to allocate some space to arr
at line 34.
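A hedged sketch of that fix follows. The helper make_initialized_array() is hypothetical (it stands in for the allocation that main() would perform at line 34), and the sketch also assumes that initfunc() should fill indices 0 through len-1; the loop shown in the GDB listing (i=1; i <= len) would write one element past the end of a len-element array even after the allocation is added:

```c
#include <stdlib.h>

/* initfunc() as it might look once the array is valid. The null check
 * and the 0-based loop bounds are assumptions of this sketch. */
int initfunc(int *array, int len) {
    int i;

    if (!array || len <= 0) {
        return -1;
    }
    for (i = 0; i < len; i++) {
        array[i] = i;
    }
    return 0;
}

/* Hypothetical helper: allocate len ints on the heap and initialize
 * them. Returns NULL on failure; the caller must free() the result. */
int *make_initialized_array(int len) {
    int *arr;

    if (len <= 0) {
        return NULL;
    }
    arr = malloc(sizeof(int) * len);
    if (!arr) {
        return NULL;
    }
    if (initfunc(arr, len) != 0) {
        free(arr);
        return NULL;
    }
    return arr;
}
```

In main(), the equivalent change would be to replace the NULL initialization of arr with a malloc() call for 100 int values, check the result, and free(arr) before exiting.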
These two example GDB sessions illustrate commonly used commands for finding bugs in programs. In the next section, we discuss these and other GDB commands in more detail.
3.2. GDB Commands in Detail
In this section, we list common GDB commands and show some of their features with examples. We first discuss some common keyboard shortcuts that make GDB even easier to use.
3.2.1. Keyboard Shortcuts in GDB
GDB supports command line completion. A user can enter a unique prefix
of a command and hit the TAB key, and GDB will try to complete the command
line. Also, a unique short abbreviation can be used
to issue many common GDB commands. For example, rather than entering the
command print x, a user can just enter p x to print out the value of x,
or l can be used for the list command, or n for next.
The up and down arrow keys scroll through previous GDB command lines, eliminating the need to retype them each time.
Hitting the RETURN key at the GDB prompt executes the most
recent previous command. This is particularly useful when
stepping through the execution with a sequence of next or step
commands; just press RETURN and GDB repeats the command, executing the next
line.
3.2.2. Common GDB Commands
We summarize GDB’s most common commands here, grouping them by similar
functionality: commands for controlling program execution; commands
for examining the point in the program’s execution; commands
for setting and controlling breakpoints; and commands for printing program
state and evaluating expressions. The GDB help command provides
information about all GDB commands:
-
help: Help documentation for topics and GDB commands.
help <topic or command>   Shows help available for topic or command
help breakpoints          Shows help information about breakpoints
help print                Shows help information about print command
Commands for Execution Control Flow
-
break: Set a breakpoint.
break <func-name>        Set breakpoint at start of function <func-name>
break <line>             Set breakpoint at line number <line>
break <filename:><line>  Set breakpoint at <line> in file <filename>
break main               Set breakpoint at beginning of main
break 13                 Set breakpoint at line 13
break gofish.c:34        Set breakpoint at line 34 in gofish.c
break main.c:34          Set breakpoint at line 34 in main.c
Specifying a line in a specific file (as in break gofish.c:34) allows a user to set breakpoints in C programs that span several C source code files (.c files). This feature is particularly useful when the breakpoint being set is not in the same file as the code at the pause point of the program.
-
run: Start running the debugged program from the beginning.
run <command line arguments>
run              Run with no command line arguments
run 2 40 100     Run with 3 command line arguments: 2, 40, 100
-
continue (cont): Continue execution from a breakpoint.
continue
-
step (s): Execute the next line(s) of the program’s C source code, stepping into a function if a function call is executed on the line(s).
step             Execute next line (stepping into a function)
step <count>     Executes next <count> lines of program code
step 10          Executes the next 10 lines (stepping into functions)
In the case of the step <count> command, if a line contains a function call, lines of the called function are counted in the count total of lines to step through. Thus, step <count> may result in the program pausing inside a function that was called from the pause point at which the step <count> command was issued.
-
next (n): Similar to the step command, but it treats a function call as a single line. In other words, when the next line contains a function call, next does not step into the execution of the function but pauses the program after the function call returns (pausing the program at the next line in the code following the one with the function call).
next             Execute the next line
next <count>     Executes next <count> lines
-
until: Execute the program until it reaches the specified source code line number.
until <line>     Executes until hit line number <line>
-
quit: Exit GDB.
quit
Commands for Examining the Execution Point and Listing Program Code
-
list: List program source code.
list                 Lists next few lines of program source code
list <line>          Lists lines around line number <line> of program
list <start>,<end>   Lists line numbers <start> through <end>
list <func-name>     Lists lines around beginning of function <func-name>
list 30,100          List source code lines 30 to 100
-
where (backtrace, bt): Show the contents of the stack (the sequence of function calls at the current point in the program’s execution). The where command is helpful for pinpointing the location of a program crash and for examining state at the interface between function calls and returns, such as argument values passed to functions.
where
-
frame <frame-num>: Move into the context of stack frame number <frame-num>. As a default, the program is paused in the context of frame 0, the frame at the top of the stack. The frame command can be used to move into the context of another stack frame. Typically, GDB users move into another stack frame to print out the values of parameters and local variables of another function.
frame <frame-num>    Sets current stack frame to <frame-num>
info frame           Show state about current stack frame
frame 3              Move into stack frame 3's context (0 is top frame)
Commands for Setting and Manipulating Breakpoints
-
break: Set a breakpoint (this command is explained in more detail in the Commands for Execution Control Flow section above).
break <func-name>    Set a breakpoint at start of a function
break <line>         Set a breakpoint at a line number
break main           Set a breakpoint at start of main
break 12             Set a breakpoint at line 12
break file.c:34      Set a breakpoint at line 34 of file.c
-
enable, disable, ignore, delete, clear: Enable, disable, ignore for some number of times, or delete one or more breakpoints. The delete command deletes a breakpoint by its number. In contrast, the clear command deletes a breakpoint at a particular location in the source code.
disable <bnums ...>    Disable one or more breakpoints
enable <bnums ...>     Enable one or more breakpoints
ignore <bpnum> <num>   Don't pause at breakpoint <bpnum> the next <num> times it's hit
delete <bpnum>         Delete breakpoint number <bpnum>
delete                 Deletes all breakpoints
clear <line>           Delete breakpoint at line <line>
clear <func-name>      Delete breakpoint at function <func-name>
info break             List breakpoint info (including breakpoint bnums)
disable 3              Disable breakpoint number 3
ignore 2 5             Ignore the next 5 times breakpoint 2 is hit
enable 3               Enable breakpoint number 3
delete 1               Delete breakpoint number 1
clear 124              Delete breakpoint at source code line 124
-
condition: Set conditions on breakpoints. A conditional breakpoint is one that only transfers control to GDB when a certain condition is true. It can be used to pause at a breakpoint inside a loop only after some number of iterations (by adding a condition on the loop counter variable), or to pause the program at a breakpoint only when the value of a variable has an interesting value for debugging purposes (avoiding pausing the program at other times).
condition <bpnum> <exp>   Sets breakpoint number <bpnum> to break only when expression <exp> is true
break 28                  Set breakpoint at line 28 (in function play)
info break                Lists information about all breakpoints
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080483a3 in play at gofish.c:28
condition 1 (i > 1000)    Set condition on breakpoint 1
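To experiment with conditional breakpoints, a loop with a counter named i is enough. The function below is a hypothetical stand-in for the play() function mentioned in the example (the real gofish.c is not shown in this section), so its body and return value are assumptions made purely for illustration:

```c
/* Hypothetical stand-in for play(): sums the values 0..n-1 in a loop
 * whose counter is named i, matching the condition (i > 1000). */
long play(int n) {
    long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        total += i;   /* a plain breakpoint here is hit n times */
    }
    return total;
}
```

With a breakpoint set on the total += i line and a condition of (i > 1000) attached to it, GDB would let the first 1001 iterations run and pause for the first time when i is 1001, assuming the program was compiled with -g and without optimization.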
Commands for Examining and Evaluating Program State and Expressions
-
print (p): Display the value of an expression. Although GDB users typically print the value of a program variable, GDB will print the value of any C expression (even expressions that are not in the program code). The print command supports printing in different formats and supports operands in different numeric representations.
print <exp>        Display the value of expression <exp>
p i                print the value of i
p i+3              print the value of (i+3)
To print in different formats:
print <exp>        Print value of the expression in the default format for its type
print/x <exp>      Print value of the expression in hexadecimal
print/t <exp>      Print value of the expression in binary
print/d <exp>      Print value of the expression as signed int
print/c <exp>      Print value of the expression as a character
print (int)<exp>   Print value of the expression as an int
print/x 123        Prints 0x7b
print/t 123        Prints 1111011
print/d 0x1c       Prints 28
print/c 99         Prints 'c'
print (int)'c'     Prints 99
To specify different numeric representations in the expression (the default for numbers is decimal representation):
0x prefix for hex: 0x1c
0b prefix for binary: 0b101
print 0b101        Prints 5 (default format is decimal)
print 0b101 + 3    Prints 8
print 0x12 + 2     Prints 20 (hex 12 is 18 in decimal)
print/x 0x12 + 2   Prints 0x14 (decimal 20 in hexadecimal format)
Sometimes, expressions may require explicit type casting to inform print how to interpret them. For example, here, casting an address value to a specific type (int *) is necessary before the address can be dereferenced (otherwise, GDB does not know how to dereference the address):
print *(int *)0x8ff4bc10   Print int value at address 0x8ff4bc10
When using print to display the value of a dereferenced pointer variable, type casting is not necessary, because GDB knows the type of the pointer variable and knows how to dereference its value. For example, if ptr is declared as an int *, the int value it points to can be displayed like this:
print *ptr         Print the int value pointed to by ptr
To print out a value stored in a hardware register:
print $eax         Print the value stored in the eax register
-
display: Automatically display the value of an expression upon reaching a breakpoint. The expression syntax is the same as the print command.
display <exp>      Display value of <exp> at every breakpoint
display i
display array[i]
-
x (examine memory): Display the contents of a memory location. This command is similar to print, but it interprets its argument as an address value that it dereferences to print the value stored at the address.
x <memory address expression>
x 0x5678           Examine the contents of memory location 0x5678
x ptr              Examine the contents of memory that ptr points to
x &temp            Can specify the address of a variable (this command is equivalent to: print temp)
Like print, x can display values in different formats (for example, as an int, a char, or a string).
Examine’s Formatting Is Sticky
Sticky formatting means that GDB remembers the current format setting, and applies it to subsequent calls to x that do not specify formatting. For example, if the user enters the command x/c, all subsequent executions of x without formatting will use the /c format. As a result, formatting options only need to be explicitly specified with an x command when the user desires changes in the memory address units, repetition, or display format of the most recent call to x.
In general, x takes up to three formatting arguments (x/nfu <memory address>); the order in which they are listed does not matter:
1. n: the repeat count (a positive integer value)
2. f: the display format (s: string, i: instruction, x: hex, d: decimal, t: binary, a: address, …)
3. u: the units format (number of bytes) (b: byte, h: 2 bytes, w: 4 bytes, g: 8 bytes)
Here are some examples (assume s1 = "Hello There" is at memory address 0x40062d):
x/d ptr            Print value stored at what ptr points to, in decimal
x/a &ptr           Print value stored at address of ptr, as an address
x/wx &temp         Print 4-byte value at address of temp, in hexadecimal
x/10dh 0x1234      Print 10 short values starting at address 0x1234, in decimal
x/4c s1            Examine the first 4 chars in s1
0x40062d 72 'H' 101 'e' 108 'l' 108 'l'
x/s s1             Examine memory location associated with var s1 as a string
0x40062d "Hello There"
x/wd s1            Examine the memory location assoc with var s1 as an int
                   (because formatting is sticky, need to explicitly set
                   units to word (w) after x/s command sets units to byte)
0x40062d 72
x/8d s1            Examine ASCII values of the first 8 chars of s1
0x40062d: 72 101 108 108 111 32 84 104
-
whatis: Show the type of an expression.
whatis <exp>       Display the data type of an expression
whatis (x + 3.4)   Displays: type = double
-
set: Assign/change the value of a program variable, or assign a value to be stored at a specific memory address, or in a specific machine register.
set <variable> = <exp>   Sets variable <variable> to expression <exp>
set x = 123 * y          Set var x's value to (123 * y)
-
info: Lists information about program state and debugger state. There are a large number of info options for obtaining information about the program’s current execution state and about the debugger. A few examples include:
help info          Shows all the info options
help status        Lists more info and show commands
info locals        Shows local variables in current stack frame
info args          Shows the argument variables of current stack frame
info break         Shows breakpoints
info frame         Shows information about the current stack frame
info registers     Shows register values
info breakpoints   Shows the status of all breakpoints
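The numeric results quoted in the print and x examples above can be cross-checked in plain C. The sketch below is only an illustration: to_binary() and show_formats() are hypothetical helper names, and the code assumes unsigned int is at most 32 bits wide. It reproduces the hexadecimal, binary, decimal, and character forms shown for print, plus the ASCII values of the first eight characters of "Hello There" shown for x/8d:

```c
#include <stdio.h>
#include <string.h>

/* Write the binary digits of v (no leading zeros) into buf, which must
 * hold at least 33 bytes. Assumes unsigned int is at most 32 bits.
 * This mirrors what print/t displays. */
void to_binary(unsigned int v, char *buf) {
    char tmp[33];
    int n = 0, i;

    do {
        tmp[n++] = (char)('0' + (v & 1u));   /* lowest bit first */
        v >>= 1;
    } while (v != 0);
    for (i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];             /* reverse into buf */
    }
    buf[n] = '\0';
}

/* Print the values that the print and x examples report. */
void show_formats(void) {
    const char *s1 = "Hello There";
    char bits[33];
    size_t i;

    to_binary(123u, bits);
    printf("print/x 123  -> %#x\n", 123u);
    printf("print/t 123  -> %s\n", bits);
    printf("print/d 0x1c -> %d\n", 0x1c);
    printf("print/c 99   -> '%c'\n", 99);
    printf("x/8d s1      ->");
    for (i = 0; i < 8 && i < strlen(s1); i++) {
        printf(" %d", s1[i]);                /* ASCII value of each char */
    }
    printf("\n");
}
```

Calling show_formats() from a small main() and comparing its output with the GDB examples is a quick way to confirm how each format letter interprets the same underlying bits.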
For more information about these and other GDB commands, see the GDB man page
(man gdb) and the GNU Debugger homepage.
3.3. Debugging Memory with Valgrind
Valgrind’s Memcheck debugging tool highlights heap memory errors in
programs. Heap memory is the part of a running program’s memory that is
dynamically allocated by calls to malloc() and freed by calls to free() in C
programs. The types of memory errors that Valgrind finds include:
-
Reading (getting) a value from uninitialized memory. For example:
int *ptr, x;
ptr = malloc(sizeof(int) * 10);
x = ptr[3];    // reading from uninitialized memory
-
Reading (getting) or writing (setting) a value at an unallocated memory location, which often indicates an array out-of-bounds error. For example:
ptr[11] = 100;  // writing to unallocated memory (valid indices are 0-9)
x = ptr[11];    // reading from unallocated memory
-
Freeing already freed memory. For example:
free(ptr);
free(ptr);  // freeing the same pointer a second time
-
Memory leaks. A memory leak is a chunk of allocated heap memory space that is not referred to by any pointer variable in the program, and thus it cannot be freed. That is, a memory leak occurs when a program loses the address of an allocated chunk of heap space. For example:
ptr = malloc(sizeof(int) * 10);
ptr = malloc(sizeof(int) * 5);  // memory leak of first malloc of 10 ints
Memory leaks can eventually cause the program to run out of heap memory
space, resulting in subsequent calls to malloc() failing.
The other types of memory access errors, such as invalid reads
and writes, can lead to the program crashing or can result in some program
memory contents being modified in seemingly mysterious ways.
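For contrast, the four fragments above can be rewritten so that none of the error patterns occurs. The following is a minimal sketch, not code from the book: heap_demo() and its out parameter are hypothetical names introduced only so that the result can be observed by a caller:

```c
#include <stdlib.h>

/* Sketch: the four error patterns above, rewritten to avoid each one.
 * Stores a computed value in *out; returns 0 on success, -1 if an
 * allocation fails. */
int heap_demo(int *out) {
    int *ptr, x, i;

    ptr = malloc(sizeof(int) * 10);
    if (!ptr) {
        return -1;
    }
    for (i = 0; i < 10; i++) {
        ptr[i] = 0;            /* initialize before any read */
    }
    x = ptr[3];                /* reads initialized memory */
    ptr[9] = 100;              /* last valid index of a 10-element array */
    x += ptr[9];

    free(ptr);                 /* release the first block before reusing ptr */
    ptr = malloc(sizeof(int) * 5);
    if (!ptr) {
        return -1;
    }
    ptr[0] = x;
    *out = ptr[0];
    free(ptr);                 /* freed exactly once */
    ptr = NULL;                /* free(NULL) is a no-op, guarding against a stray second free */
    return 0;
}
```

The design choice worth noting is that every malloc() is paired with exactly one free(), and the pointer is only reassigned after the block it refers to has been released.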
Memory access errors are some of the most difficult bugs to find in programs. Often a memory access error does not immediately result in a noticeable error in the program’s execution. Instead, it may trigger an error that occurs later in the execution, often in a part of the program that seemingly has little to do with the source of the error. At other times, a program with a memory access error may run correctly on some inputs and crash on other inputs, making the cause of the error difficult to find and fix.
Using Valgrind helps a programmer identify these heap memory access errors, which are difficult to find and fix, saving significant amounts of debugging time and effort. Valgrind also assists the programmer in identifying any lurking heap memory errors that were not discovered in the testing and debugging of their code.
3.3.1. An Example Program with a Heap Memory Access Error
As an example of how difficult it can be to discover and fix programs with
memory access errors, consider the following small program
(bigfish.c). This program exhibits a "write to unallocated heap memory" error
in the second for loop, when it assigns values beyond the bounds of the bigfish
array (note: the listing includes source code line numbers, and the
print_array() function definition is not shown, but it behaves as described):
 1  #include <stdio.h>
 2  #include <stdlib.h>
 3
 4  /* print size elms of array p with name name */
 5  void print_array(int *p, int size, char *name) ;
 6
 7  int main(int argc, char *argv[]) {
 8      int *bigfish, *littlefish, i;
 9
10      // allocate space for two int arrays
11      bigfish = (int *)malloc(sizeof(int) * 10);
12      littlefish = (int *)malloc(sizeof(int) * 10);
13      if (!bigfish || !littlefish) {
14          printf("Error: malloc failed\n");
15          exit(1);
16      }
17      for (i=0; i < 10; i++) {
18          bigfish[i] = 10 + i;
19          littlefish[i] = i;
20      }
21      print_array(bigfish,10, "bigfish");
22      print_array(littlefish,10, "littlefish");
23
24      // here is a heap memory access error
25      // (write beyond bounds of allocated memory):
26      for (i=0; i < 13; i++) {
27          bigfish[i] = 66 + i;
28      }
29      printf("\nafter loop:\n");
30      print_array(bigfish,10, "bigfish");
31      print_array(littlefish,10, "littlefish");
32
33      free(bigfish);
34      free(littlefish); // program will crash here
35      return 0;
36  }
In the main()
function, the second for
loop causes a heap memory access error when it
writes to three indices beyond the bounds of the bigfish
array (to indices
10, 11, and 12). The program does not crash at the point where the error
occurs (at the execution of the second for
loop); instead, it crashes later in its
execution at the call to free(littlefish)
:
bigfish: 10 11 12 13 14 15 16 17 18 19
littlefish: 0 1 2 3 4 5 6 7 8 9

after loop:
bigfish: 66 67 68 69 70 71 72 73 74 75
littlefish: 78 1 2 3 4 5 6 7 8 9
Segmentation fault (core dumped)
Running this program in GDB indicates that the program crashes with a segfault
at the call to free(littlefish)
. Crashing at this point may make the
programmer suspect that there is a bug with accesses to the littlefish
array.
However, the cause of the error is due to writes to the bigfish
array and
has nothing to do with errors in how the program accesses the littlefish
array.
The most likely reason that the program crashes is that the for
loop goes beyond the
bounds of the bigfish
array and overwrites memory between
the heap memory location of the last allocated element of bigfish
and the
first allocated element of littlefish
. The heap memory locations between
the two (and right before the first element of littlefish
) are used
by malloc()
to store meta-data about the heap memory allocated
for the littlefish
array. Internally, the free()
function uses this
meta-data to determine how much heap memory to free. The modifications
to indices 10
and 11
of bigfish
overwrite these meta-data
values, resulting in the program crash on the call to free(littlefish)
.
We note, however, that not all implementations of the malloc()
function use this strategy.
Because the program includes code to print
out littlefish
after the memory access error to bigfish
,
the cause of the error may be more obvious to the programmer:
the second for
loop is somehow modifying the contents of the
littlefish
array (its element 0 value "mysteriously" changes
from 0
to 78
after the loop). However,
even in this very small program, it may be difficult to find the real
error: if the program didn’t
print out littlefish
after the second for
loop with the memory
access error, or if the for
loop upper bound was 12
instead of 13
,
there would be no visible mysterious change to program variable values that
could help a programmer see that there is an error with how the program
accesses the bigfish
array.
In larger programs, a memory access error of this type could be in a very different part of the program code than the part that crashes. There also may be no logical association between variables used to access heap memory that has been corrupted and the variables that were used to erroneously overwrite that same memory; instead, their only association is that they happen to refer to memory addresses that are allocated close together in the heap. Note that this situation can vary from run to run of a program and that such behavior is often hidden from the programmer. Similarly, sometimes bad memory accesses will have no noticeable effect on a run of the program, making these errors hard to discover. When a program seems to run fine for some input but crashes on other input, this is often a sign of a memory access error in the program.
Tools like Valgrind can save days of debugging time by
quickly pointing programmers to the source and type of heap memory
access errors in their code. In the previous program, Valgrind
delineates the point where the error occurs (when
the program accesses elements beyond the bounds of the bigfish
array).
The Valgrind error message includes the type of error,
the point in the program where the error occurs, and where
in the program the heap memory near the bad memory access was allocated.
For example, here is the information Valgrind will display when
the program executes line 27 (some details from the actual Valgrind
error message are omitted):
Invalid write
 at main (bigfish.c:27)
 Address is 0 bytes after a block of size 40 alloc'd
 by main (bigfish.c:11)
This Valgrind error message says that the program is writing to invalid
(unallocated) heap memory at line 27 and that this invalid memory is located
immediately after a block of memory that was allocated at line 11, indicating
that the loop is accessing some elements beyond the bounds of the allocated memory
in heap space to which bigfish
points.
A potential fix to this bug is to
either increase the number of bytes passed to malloc()
or change the
second for
loop bounds to avoid writing beyond the bounds of the allocated
heap memory space.
In addition to finding memory access errors in heap memory, Valgrind can also find some errors with stack memory accesses, such as using uninitialized local variables or trying to access stack memory locations that are beyond the bounds of the current stack. However, Valgrind does not detect stack memory access errors at the same granularity as it does with heap memory, and it does not detect memory access errors with global data memory.
A program can have memory access errors with stack and global memory that Valgrind cannot find. However, these errors result in erroneous program behavior or program crashing that is similar to the behavior that can occur with heap memory access errors. For example, overwriting memory locations beyond the bounds of a statically declared array on the stack may result in "mysteriously" changing the values of other local variables or may overwrite state saved on the stack that is used for returning from a function call, leading to a crash when the function returns. Experience using Valgrind for heap memory errors can help a programmer identify and fix similar errors with accesses to stack and global memory.
3.3.2. How to Use Memcheck
We illustrate some of the main features of Valgrind’s Memcheck memory analysis tool on
an example program, valgrindbadprog.c,
which contains several bad memory access errors (comments in the code describe
the type of error). Valgrind runs the Memcheck tool by default; we depend on
this default behavior in the code snippets that follow. You can explicitly specify the
Memcheck tool by using the --tool=memcheck
option. In later sections, we will
invoke other Valgrind profiling tools by invoking the --tool
option.
To run Memcheck, first compile the valgrindbadprog.c
program with the -g
flag to add debugging
information to the executable (e.g., a.out
) file. Then, run the
executable with valgrind
. Note that for non-interactive programs, it may be helpful to
redirect Valgrind’s output to a file for viewing after the program exits:
$ gcc -g valgrindbadprog.c
$ valgrind -v ./a.out

# re-direct valgrind (and a.out) output to file 'output.txt'
$ valgrind -v ./a.out >& output.txt

# view program and valgrind output saved to out file
$ vim output.txt
Valgrind’s Memcheck tool prints out memory access errors and warnings as they occur
during the program’s execution. At the end of the program’s execution,
Memcheck also prints out a summary about any memory leaks in the program.
Even though memory leaks are important to fix, the other types of memory access errors
are much more critical to a program’s correctness. As a result, unless
memory leaks are causing a program to run out of heap memory space and
crash, a programmer should focus first on fixing these other types of
memory access errors before considering memory leaks. To view details
of individual memory leaks, use the --leak-check=yes
option.
When first using Valgrind, its output may seem a bit difficult
to parse. However, the output all follows the same basic format, and
once you know this format, it’s easier to understand the
information that Valgrind is displaying about heap memory
access errors and warnings. Here is an example Valgrind error from
a run of the valgrindbadprog.c
program:
==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
Each line of Valgrind output is prefixed with the process’s ID (PID) number (31059 in this example):
==31059==
Most Valgrind errors and warnings have the following format:
-
The type of error or warning.
-
Where the error occurred (a stack trace at the point in the program’s execution when the error occurs.)
-
Where heap memory around the error was allocated (usually the memory allocation related to the error.)
In the preceding example error, the first line indicates an invalid write to memory (writing to unallocated memory in the heap — a very bad error!):
==31059== Invalid write of size 1
The next few lines show the stack trace where the error occurred.
These indicate an invalid write occurred at line 29 in
function foo()
, which was called from function main()
at line 56:
==31059== Invalid write of size 1
==31059==    at 0x4006C5: foo (valgrindbadprog.c:29)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
The remaining lines indicate where the heap space near the
invalid write was allocated in the program.
This section of Valgrind’s output says that
the invalid write was immediately after (0 bytes after
) a block of 5
bytes of heap memory space that was allocated by a call to malloc()
at line 18 in function foo()
, called by main()
at line 56:
==31059==  Address 0x52045c5 is 0 bytes after a block of size 5 alloc'd
==31059==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/...)
==31059==    by 0x400660: foo (valgrindbadprog.c:18)
==31059==    by 0x40079A: main (valgrindbadprog.c:56)
The information from this error identifies that there is an unallocated heap memory write error in the program, and it directs the user to specific parts of the program where the error occurs (line 29) and where memory around the error was allocated (line 18). By looking at these points in the program, the programmer may see the cause of and the fix for the error:
18 c = (char *)malloc(sizeof(char) * 5);
...
22 strcpy(c, "cccc");
...
28 for (i = 0; i <= 5; i++) {
29 c[i] = str[i];
30 }
The cause is that the for
loop executes one time too many,
accessing c[5]
, which is beyond the end of array c
.
The fix is to either change the loop bounds at line 28 or to
allocate a larger array at line 18.
If examining the code around a Valgrind error is not sufficient
for a programmer to understand or fix the error, using GDB
might be helpful. Setting breakpoints around the points in the code
associated with the Valgrind errors can help a programmer
evaluate the program’s runtime state and understand the cause of
the Valgrind error. For example, by putting a breakpoint at line 29 and printing
the values of i
and str
, the programmer can see the
array out-of-bounds error when i
is 5. In this case, the
combination of using Valgrind and GDB helps the programmer
determine how to fix the memory access bugs that Valgrind finds.
Although this chapter has focused on Valgrind’s default Memcheck tool, we characterize some of Valgrind’s other capabilities later in the book, including the Cachegrind cache profiling tool (Chapter 11), the Callgrind code profiling tool (Chapter 12), and the Massif memory profiling tool (Chapter 12). For more information about using Valgrind, see the Valgrind homepage, and its online manual.
3.4. Advanced GDB Features
This section presents advanced GDB features, some of which may make sense only after reading the Operating Systems chapter.
3.4.1. GDB and make
GDB accepts the make
command to rebuild an executable during a debugging
session, and if the build is successful it will run the newly built program
(when issued the run
command).
(gdb) make
(gdb) run
Building from within GDB is convenient for a user who has set many breakpoints
and has fixed one bug but wants to continue the debugging session. In this
case, rather than quitting GDB, recompiling, restarting GDB with the new
executable, and resetting all the breakpoints, a GDB user can run make
and
start debugging the new version of the program with all the breakpoints still
set. Keep in mind, however, that modifying the C source and recompiling by
running make
from within GDB may result in the breakpoints not being at the
same logical location in the new version of the program as in the old version
if source code lines have been added or deleted. When this problem occurs,
either exit GDB and restart the GDB session on the new executable, or use
disable
or delete
to disable or delete the old breakpoints and then break
to
set new breakpoints at the correct locations in the newly compiled version of
the program.
3.4.2. Attaching GDB to a Running Process
GDB supports debugging a program that is already running (rather than starting a program to run from within a GDB session) by attaching GDB to a running process. To do this, the user needs to get the process ID (PID) value:
-
Get the process’s PID using the ps shell command:
# ps to get process's PID (lists all processes started in current shell):
$ ps

# list all processes and pipe through grep for just those named a.out:
$ ps -A | grep a.out
  PID TTY          TIME CMD
12345 pts/3    00:00:00 a.out
-
Start GDB and attach it to the specific running process (with PID 12345):
# gdb <executable> <pid>
$ gdb a.out 12345
(gdb)

# OR alternative syntax: gdb attach <pid> <executable>
$ gdb attach 12345 a.out
(gdb)
Attaching GDB to a process pauses it, and the user can issue GDB commands before continuing its execution.
Alternatively, a program can explicitly pause itself to wait for debugging by
calling kill(getpid(), SIGSTOP)
(as in the
attach_example.c example). When the
program pauses at this point, a programmer can attach GDB to the process to
debug it.
Regardless of how a program pauses, after GDB attaches and the user enters some
GDB commands, the program’s execution continues from its attach point using
cont
. If cont
doesn’t work, GDB may need to explicitly send the process a
SIGCONT
signal in order to continue its execution:
(gdb) signal SIGCONT
3.4.3. Following a Process on a Fork
When GDB debugs a program that calls the fork()
function to create a
new child process, GDB can be set to follow (to debug) either the parent
process or the child process, leaving the execution of the other
process unaffected by GDB. By default, GDB follows the parent after
a call to fork()
. To set GDB to follow the child process, instead,
use the set follow-fork-mode
command:
(gdb) set follow-fork-mode child   # Set gdb to follow child on fork
(gdb) set follow-fork-mode parent  # Set gdb to follow parent on fork
(gdb) show follow-fork-mode        # Display gdb's follow mode
Setting breakpoints at fork()
calls in the program is
useful when the user wants to change this behavior during a GDB
session.
The attach_example.c example shows one
way to "follow" both processes on a fork: GDB follows the parent process after
the fork, and the child sends itself a SIGSTOP
signal to explicitly pause
after the fork, allowing the programmer to attach a second GDB process to the
child before it continues.
3.4.4. Signal Control
The GDB process can send signals to the target process it is debugging and can handle signals received by the target process.
GDB can send signals to the process it debugs using the signal
command:
(gdb) signal SIGCONT
(gdb) signal SIGALRM
...
Sometimes a user would like GDB to perform some action when a signal is received
by the debugged process. For example, if a program tries to access memory with
a misaligned memory address for the type it is accessing, it receives a
SIGBUS
signal and usually exits. The default behavior of GDB on a SIGBUS
is
also to let the process exit. If, however, you want GDB to examine the program
state when it receives a SIGBUS
, you can specify that GDB handle the SIGBUS
signal differently using the handle command (the info signal command shows
additional information about how GDB handles signals received by the process
during debugging):
(gdb) handle SIGBUS stop    # if program gets a SIGBUS, gdb gets control
(gdb) info signal           # list info on all signals
(gdb) info signal SIGALRM   # list info just for the SIGALRM signal
3.4.5. DDD Settings and Bug Fixes
Running DDD creates a .ddd
directory in your home directory, which it uses to
store its settings so that users don’t need to reset all their preferences
from scratch on each invocation. Some examples of saved settings include sizes
of subwindows, menu display options, and enabling windows to view register
values and assembly code.
Sometimes DDD hangs on startup with a "Waiting until GDB ready" message.
This often indicates an error in its saved settings files. The easiest
way to fix this is to remove the .ddd directory (you will lose all your saved
settings and need to reset them when it starts up again):
$ rm -rf ~/.ddd  # Be careful when entering this command!
$ ddd ./a.out
3.5. Debugging Assembly Code
In addition to high-level C and C++ debugging, GDB can debug a program at its assembly code level. Doing so enables GDB to list disassembled code sequences from functions, set breakpoints at the assembly instruction level, step through program execution one assembly instruction at a time, and examine the values stored in machine registers and in stack and heap memory addresses at runtime. We use IA32 as the example assembly language in this section, but the GDB commands presented here apply to any assembly language that GCC supports. We note that readers may find this subsection most useful after reading more about assembly code in later chapters.
We use the following short C program as an example:
int main(void) {
    int x, y;

    x = 1;
    x = x + 2;
    x = x - 14;
    y = x * 100;
    x = x + y * 6;

    return 0;
}
To compile to an IA32 executable, use the -m32
flag:
$ gcc -m32 -o simpleops simpleops.c
Optionally, compiling with gcc
's -fno-asynchronous-unwind-tables
command line option generates IA32 code that’s a bit easier for the
programmer to read and understand:
$ gcc -m32 -fno-asynchronous-unwind-tables -o simpleops simpleops.c
3.5.1. Using GDB to Examine Binary Code
In this section we show some example GDB commands to debug the short C program at the assembly code level. The following table summarizes many of the commands this section demonstrates:
GDB command | Description |
---|---|
break sum | Set a breakpoint at the beginning of the function sum |
break *0x0804851a | Set a breakpoint at memory address 0x0804851a |
disass main | Disassemble the main function |
ni | Execute the next instruction |
si | Step into a function call (step instruction) |
info registers | List the register contents |
p $eax | Print the value stored in register %eax |
p *(int *)($ebp+8) | Print out the value of an int at an address (%ebp+8) |
x/d $ebp+8 | Examine the contents of memory at an address |
First, compile the program to an IA32 executable and run GDB on the resulting
program, simpleops:
$ gcc -m32 -fno-asynchronous-unwind-tables -o simpleops simpleops.c
$ gdb ./simpleops
Then, set a breakpoint in main
, and then start running the program
with the run
command:
(gdb) break main
(gdb) run
The disass
command disassembles (lists the assembly code associated
with) parts of the program. For example, to view the assembly
instructions of the main function:
(gdb) disass main # Disassemble the main function
GDB allows a programmer to set breakpoints at individual assembly instructions by dereferencing the memory address of the instruction:
(gdb) break *0x080483c1 # Set breakpoint at instruction at 0x080483c1
Use si or ni to execute the program one assembly instruction at a time:
(gdb) ni # Execute the next instruction
(gdb) si # Execute next instruction; if it is a call instruction,
# then step into the function
The si
command steps into function calls, meaning that GDB will pause the
program at the first instruction of the called function. The ni
command skips over them, meaning that GDB will pause the program at
the next instruction following the call instruction (after the function
executes and returns to the caller).
The programmer can print values stored in machine registers using the print
command and the name of the register prefixed by $
:
(gdb) print $eax # print the value stored in register eax
The display
command automatically displays values upon reaching a breakpoint:
(gdb) display $eax
(gdb) display $edx
The info registers
command shows all of the values stored in the machine registers:
(gdb) info registers
3.5.2. Using DDD to Debug at the Assembly Level
The DDD debugger provides a graphical interface on top of another debugger (GDB in this case). It provides a nice interface for displaying assembly code, viewing registers, and stepping through IA32 instruction execution. Because DDD has separate windows for displaying disassembled code, register values, and the GDB command prompt, it’s often easier to use than GDB when debugging at the assembly code level.
To debug with DDD, substitute ddd
for gdb
:
$ ddd ./simpleops
The GDB prompt appears in the bottom window, where it accepts GDB commands at the prompt. Although it provides menu options and buttons for some GDB commands, often the GDB prompt at the bottom is easier to use.
To display the assembly code view of a program in DDD, select the View → Machine Code Window menu option. That option creates a new subwindow with a listing of the program’s assembly code (you will likely want to resize this window to make it larger).
To view all of the program’s register values in a separate window, enable the Status → Registers menu option.
3.5.3. GDB Assembly Code Debugging Commands and Examples
Here are some details and examples of GDB commands that are
useful for debugging at the assembly code level (see the
Common GDB Commands section for
more details about some of these commands, particularly for the
print
and x
formatting options):
-
disass: Disassemble code for a function or range of addresses.
disass <func_name>     # Lists assembly code for function
disass <start>,<end>   # Lists assembly instructions between start & end address

disass main            # Disassemble main function
disass 0x1234,0x1248   # Disassemble instructions between addr 0x1234 & 0x1248
-
break: Set a breakpoint at an instruction address.
break *0x80dbef10  # Sets breakpoint at the instruction at address 0x80dbef10
-
stepi (si), nexti (ni):
stepi, si          # Execute next machine code instruction,
                   # stepping into function call if it is a call instr
nexti, ni          # Execute next machine code instruction,
                   # treating function call as a single instruction
-
info registers: Lists all the register values.
-
print: Display the value of an expression.
print $eax                # Print the value stored in the eax register
print *(int *)0x8ff4bc10  # Print int value stored at memory addr 0x8ff4bc10
-
x: Display the contents of the memory location given an address. Remember that the format of x is sticky, so it needs to be explicitly changed.
(gdb) x $ebp-4      # Examine memory at address: (contents of register ebp)-4
                    # if the location stores an address x/a, an int x/wd, ...

(gdb) x/s 0x40062d  # Examine the memory location 0x40062d as a string
0x40062d   "Hello There"

(gdb) x/4c 0x40062d # Examine the first 4 char memory locations
                    # starting at address 0x40062d
0x40062d   72 'H'  101 'e' 108 'l' 108 'l'

(gdb) x/d 0x40062d  # Examine the memory location 0x40062d in decimal
0x40062d   72       # NOTE: units is 1 byte, set by previous x/4c command

(gdb) x/wd 0x400000 # Examine memory location 0x400000 as 4 bytes in decimal
0x400000   100      # NOTE: units was 1 byte set, need to reset to w
-
set: Set the contents of memory locations and registers.
set $eax = 10                 # Set the value of register eax to 10
set $esp = $esp + 4           # Pop a 4-byte value off the stack
set *(int *)0x8ff4bc10 = 44   # Store 44 at address 0x8ff4bc10
-
display: Print an expression each time a breakpoint is hit.
display $eax         # Display value of register eax
3.5.4. Quick Summary of Common Commands for Assembly Debugging
$ ddd ./a.out
(gdb) break main
(gdb) run

(gdb) disass main         # Disassemble the main function
(gdb) break sum           # Set a breakpoint at the beginning of a function
(gdb) cont                # Continue execution of the program
(gdb) break *0x0804851a   # Set a breakpoint at memory address 0x0804851a
(gdb) ni                  # Execute the next instruction
(gdb) si                  # Step into a function call (step instruction)
(gdb) info registers      # List the register contents
(gdb) p $eax              # Print the value stored in register %eax
(gdb) p *(int *)($ebp+8)  # Print out value of an int at addr (%ebp+8)
(gdb) x/d $ebp+8          # Examine the contents of memory at the given
                          # address (/d: prints the value as an int)
(gdb) x/s 0x0800004       # Examine contents of memory at address as a string
(gdb) x/wd 0xff5634       # After x/s, the unit size is 1 byte, so if want
                          # to examine as an int specify both the width w & d
3.6. Debugging Multithreaded Programs with GDB
Debugging multithreaded programs can be tricky due to the multiple streams of execution and due to interactions between the concurrently executing threads. In general, here are some things to make debugging multithreaded programs a bit easier:
-
When possible, try to debug a version of the program with as few threads as possible.
-
When adding debugging printf statements to the code, print out the executing thread’s ID to identify which thread is printing and end the line with a \n.
-
Limit the amount of debug output by having only one of the threads print its information and common information. For example, if each thread stores its logical ID in a local variable named my_tid, a conditional statement on the value of my_tid can be used to limit printing debug output to one thread, as illustrated in the following example:
if (my_tid == 1) {
    printf("Tid:%d: value of count is %d and my i is %d\n", my_tid, count, i);
    fflush(stdout);
}
3.6.1. GDB and Pthreads
The GDB debugger has specific support for debugging threaded programs, including setting breakpoints for individual threads and examining the stacks of individual threads. One thing to note when debugging Pthreads programs in GDB is that there are at least three identifiers for each thread:
-
The Pthreads library’s ID for the thread (its pthread_t value).
-
The operating system’s lightweight process (LWP) ID value for the thread. This ID is used in part for the OS to keep track of this thread for scheduling purposes.
-
The GDB ID for the thread. This is the ID to use when specifying a specific thread in GDB commands.
The specific relationship between thread IDs can differ from one OS and Pthreads library implementation to another, but on most systems there is a one-to-one-to-one correspondence between a Pthreads ID, an LWP ID, and a GDB thread ID.
We present a few GDB basics for debugging threaded programs in GDB. See the following for more information about debugging threaded programs in GDB.
3.6.2. GDB Thread-Specific Commands:
-
Enable printing thread start and exit events:
set print thread-events
-
List all existing threads in the program (the GDB thread number is the first value listed and the thread that hit the breakpoint is denoted with an *):
info threads
-
Switch to a specific thread’s execution context (for example, to examine its stack when executing where) by specifying the thread's GDB thread ID:
thread <threadno>

thread 12        # Switch to thread 12's execution context
where            # Thread 12's stack trace
-
Set a breakpoint for just a particular thread. Other threads executing at the point in the code where the breakpoint is set will not trigger the breakpoint to pause the program and print the GDB prompt:
break <where> thread <threadno>

break foo thread 12    # Break when thread 12 executes function foo
-
Apply a specific GDB command to all or to a subset of threads by adding the prefix thread apply <threadno | all> to a GDB command, where threadno refers to the GDB thread ID:
thread apply <threadno|all> command
This doesn’t work for every GDB command, setting breakpoints in particular, so use this syntax instead for setting thread-specific breakpoints:
break <where> thread <threadno>
Upon reaching a breakpoint, by default, GDB pauses all threads until the user
types cont
. The user can change the behavior to request that GDB only pause
the threads that hit a breakpoint, allowing other threads to continue executing.
3.6.3. Examples:
We show some GDB commands and output from a GDB run on a multithreaded executable compiled from the file racecond.c.
This errant program lacks synchronization around accesses to the shared
variable count
. As a result, different runs of the program produce different
final values for count
, indicating a race condition. For example, here are
two runs of the program with five threads that produce different results:
./a.out 5
hello I'm thread 0 with pthread_id 139673141077760
hello I'm thread 3 with pthread_id 139673115899648
hello I'm thread 4 with pthread_id 139673107506944
hello I'm thread 1 with pthread_id 139673132685056
hello I'm thread 2 with pthread_id 139673124292352
count = 159276966

./a.out 5
hello I'm thread 0 with pthread_id 140580986918656
hello I'm thread 1 with pthread_id 140580978525952
hello I'm thread 3 with pthread_id 140580961740544
hello I'm thread 2 with pthread_id 140580970133248
hello I'm thread 4 with pthread_id 140580953347840
count = 132356636
The fix is to put accesses to count
inside a critical section,
using a pthread_mutex_t
variable.
If the user was not able to see this fix by examining the C code alone,
running in GDB and putting breakpoints around accesses
to the count
variable may help the programmer discover the problem.
Here are some example commands from a GDB run of this program:
(gdb) break worker_loop   # Set a breakpoint for all spawned threads
(gdb) break 77 thread 4   # Set a breakpoint just for thread 4
(gdb) info threads        # List information about all threads
(gdb) where               # List stack of thread that hit the breakpoint
(gdb) print i             # List values of its local variable i
(gdb) thread 2            # Switch to different thread's (2) context
(gdb) print i             # List thread 2's local variables i
Shown in the example that follows is partial output of a GDB run of the racecond
program with 3 threads (run 3
), showing examples of
GDB thread commands in the context of a GDB debugging session.
The main thread is always GDB thread number 1, and the three spawned threads are
GDB threads 2 to 4.
When debugging multithreaded programs, the GDB user must keep track of which
threads exist when issuing commands. For example, when the breakpoint in
main
is hit, only thread 1 (the main thread) exists. As a result, the GDB
user must wait until threads are created before setting a breakpoint for only a
specific thread (this example shows setting a breakpoint for thread 4 only at
line 77 in the program). In viewing this output, note when breakpoints are
set and deleted, and note the value of each thread’s local variable i
when
thread contexts are switched with GDB’s thread
command:
$ gcc -g racecond.c -pthread
$ gdb ./a.out
(gdb) break main
Breakpoint 1 at 0x919: file racecond.c, line 28.
(gdb) run 3
Starting program: ...
[Thread debugging using libthread_db enabled] ...

Breakpoint 1, main (argc=2, argv=0x7fffffffe388) at racecond.c:28
28	    if (argc != 2) {
(gdb) list 76
71	  myid = *((int *)arg);
72
73	  printf("hello I'm thread %d with pthread_id %lu\n",
74	      myid, pthread_self());
75
76	  for (i = 0; i < 10000; i++) {
77	      count += i;
78	  }
79
80	  return (void *)0;
(gdb) break 76
Breakpoint 2 at 0x555555554b06: file racecond.c, line 76.
(gdb) cont
Continuing.
[New Thread 0x7ffff77c4700 (LWP 5833)]
hello I'm thread 0 with pthread_id 140737345505024
[New Thread 0x7ffff6fc3700 (LWP 5834)]
hello I'm thread 1 with pthread_id 140737337112320
[New Thread 0x7ffff67c2700 (LWP 5835)]
[Switching to Thread 0x7ffff77c4700 (LWP 5833)]

Thread 2 "a.out" hit Breakpoint 2, worker_loop (arg=0x555555757280) at racecond.c:76
76	  for (i = 0; i < 10000; i++) {
(gdb) delete 2
(gdb) break 77 thread 4
Breakpoint 3 at 0x555555554b0f: file racecond.c, line 77.
(gdb) cont
Continuing.
hello I'm thread 2 with pthread_id 140737328719616
[Switching to Thread 0x7ffff67c2700 (LWP 5835)]

Thread 4 "a.out" hit Breakpoint 3, worker_loop (arg=0x555555757288) at racecond.c:77
77	      count += i;
(gdb) print i
$2 = 0
(gdb) cont
Continuing.
[Switching to Thread 0x7ffff67c2700 (LWP 5835)]

Thread 4 "a.out" hit Breakpoint 3, worker_loop (arg=0x555555757288) at racecond.c:77
77	      count += i;
(gdb) print i
$4 = 1
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6fc3700 (LWP 5834))]
#0  0x0000555555554b12 in worker_loop (arg=0x555555757284) at racecond.c:77
77	      count += i;
(gdb) print i
$5 = 0
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff77c4700 (LWP 5833))]
#0  worker_loop (arg=0x555555757280) at racecond.c:77
77	      count += i;
(gdb) print i
$6 = 1
3.7. Summary
This chapter concludes our coverage of the C programming language. Compared to other high-level programming languages, C is a relatively small programming language with a few basic constructs from which a programmer builds their program. Because C language abstractions are closer to the underlying machine code executed by the computer, a C programmer can write code that runs much more efficiently than equivalent code written using the higher-level abstractions provided by other programming languages. In particular, a C programmer has much more control over how their program uses memory, which can have a significant impact on the program’s performance. C is the language of computer systems programming where low-level control and efficiency are crucial.
In subsequent chapters we use C examples to illustrate how a computer system is designed to run a program.
4. Binary and Data Representation
From simple stone tablets and cave paintings to written words and phonograph grooves, humans have perpetually sought to record and store information. In this chapter, we’ll characterize how the latest of humanity’s big storage breakthroughs, digital computing, represents information. We also illustrate how to interpret meaning from digital data.
Modern computers utilize a variety of media for storing information (e.g., magnetic disks, optical discs, flash memory, tapes, and simple electrical circuits). We characterize storage devices later in Chapter 11; however, for this discussion, the medium is largely irrelevant — whether there’s a laser scanning the surface of a DVD or a disk head gliding over a magnetic platter, the output from the storage device is ultimately a sequence of electrical signals. To simplify the circuitry, each signal is binary, meaning it can take only one of two states: the absence of a voltage (interpreted as zero) and the presence of a voltage (one). This chapter explores how systems encode information into binary, regardless of the original storage medium.
In binary, each signal corresponds to one bit (binary digit) of information: a zero or a one. It may be surprising that all data can be represented using just zeros and ones. Of course, as the complexity of information increases, so does the number of bits needed to represent it. Luckily, the number of unique values doubles for each additional bit in a bit sequence, so a sequence of N bits can represent $2^N$ unique values.
Figure 29 illustrates the growth in the number of representable values as the length of a bit sequence increases. A single bit can represent two values: 0 and 1. Two bits can represent four values: both of the one-bit values with a leading 0 (00 and 01), and both of the one-bit values with a leading 1 (10 and 11). The same pattern applies for any additional bit that extends an existing bit sequence: the new bit can be a 0 or 1, and in either case, the remaining bits represent the same range of values they did prior to the new bit being added. Thus, adding additional bits exponentially increases the number of values the new sequence can represent.
Because a single bit doesn’t represent much information, storage systems commonly group bits into longer sequences for storing more interesting values. The most ubiquitous grouping is a byte, which is a collection of eight bits. One byte represents $2^8 = 256$ unique values (0-255), enough to enumerate the letters and common punctuation symbols of the English language. Bytes are the smallest unit of addressable memory in a computer system, meaning a program can’t ask for fewer than eight bits to store a variable.
Modern CPUs also typically define a word as either 32 bits or 64 bits, depending on the design of the hardware. The size of a word determines the "default" size a system’s hardware uses to move data from one component to another (e.g., between memory and registers). These larger sequences are necessary for storing numbers, since programs often need to count higher than 255!
If you’ve programmed in C, you know that you must
declare a variable
before using it. Such declarations inform the C compiler of two important
properties regarding the variable’s binary representation: the number of bits
to allocate for it, and the way in which the program intends to interpret those
bits. Conceptually, the number of bits is straightforward, as the compiler
simply looks up how many bits
are associated with the declared type (e.g., a char
is one byte) and
associates that amount of memory with the variable. The interpretation of a
sequence of bits is much more conceptually interesting. All data in a
computer’s memory is stored as bits, but bits have no inherent meaning. For
example, even with just a single bit, you could interpret the bit’s two values
in many different ways: up and down, black and white, yes and no, on and off,
etc.
Extending the length of a bit sequence expands the range of its
interpretations. For example, a char
variable uses the American Standard
Code for Information Interchange (ASCII) encoding standard, which defines how
an eight-bit binary value corresponds to English letters and punctuation symbols.
Table 13 shows a small subset of the ASCII standard (for a full reference,
run man ascii
on the command line). There’s no special reason why the character
'X'
needs to correspond to 01011000, so don’t bother memorizing the table.
What matters is that every program storing letters agrees on their bit sequence
interpretation, which is why ASCII is defined by a standards committee.
Binary value | Character interpretation | Binary value | Character interpretation |
---|---|---|---|
01010111 | W | 00100000 | space |
01011000 | X | 00100001 | ! |
01011001 | Y | 00100010 | " |
01011010 | Z | 00100011 | # |
Any information can be encoded in binary, including rich data like graphics and audio. For example, suppose that an image encoding scheme defines 00, 01, 10, and 11 to correspond to the colors white, orange, blue, and black. Figure 30 illustrates how we might use this simple two-bit encoding strategy to draw a crude image of a fish using only 12 bytes. In part a, each cell of the image equates to one two-bit sequence. Parts b and c show the corresponding binary encoding as two-bit and byte sequences, respectively. Although this example encoding scheme is simplified for learning purposes, the general idea is similar to what real graphics systems use, albeit with many more bits for a wider range of colors.
Having just introduced two encoding schemes, the same bit sequence, 01011010,
might mean the character 'Z'
to a text editor, whereas a graphics program
might interpret it as part of a fish’s tail fin. Which interpretation is
correct depends on the context. Despite the underlying bits being the same,
humans often find some interpretations much easier to comprehend than others
(e.g., perceiving the fish as colored cells rather than a table of bytes).
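To make the idea concrete, here is a small C sketch that interprets one byte in two ways: as a character, and as four two-bit image cells using the color scheme described above. The function names and the packing order (four cells per byte, leftmost cell in the highest two bits) are assumptions made for illustration.

```c
#include <stdio.h>

/* Color names for the two-bit image encoding: 00, 01, 10, 11. */
static const char *const COLOR_NAMES[4] = { "white", "orange", "blue", "black" };

/* Returns the two-bit value (0-3) of cell `index` within `byte`.
 * Assumption: cell 0 occupies the two highest-order bits of the byte. */
unsigned cell_value(unsigned char byte, int index) {
    int shift = 6 - 2 * index;
    return (byte >> shift) & 0x3u;
}

/* Returns the color name for cell `index` (0-3) within `byte`. */
const char *cell_color(unsigned char byte, int index) {
    return COLOR_NAMES[cell_value(byte, index)];
}

/* Prints the same byte as a character and as four image cells. */
void show_interpretations(unsigned char byte) {
    printf("as a character: %c\n", byte);
    printf("as image cells:");
    for (int i = 0; i < 4; i++) {
        printf(" %s", cell_color(byte, i));
    }
    printf("\n");
}
```

Calling show_interpretations(0x5A), where 0x5A is the bit sequence 01011010, prints the character Z followed by the cells orange, orange, blue, and blue. The bits never change; only the code that reads them does.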
The remainder of this chapter largely deals with representing and manipulating binary numbers, but the overall point bears repeating: all information is stored in a computer’s memory as 0’s and 1’s, and it’s up to programs or the people running them to interpret the meaning of those bits.
4.1. Number Bases and Unsigned Integers
Having seen that binary sequences can be interpreted in all sorts of nonnumerical ways, let’s turn our attention to numbers. Specifically, we’ll start with unsigned numbers, which can be interpreted as zero or positive, but they can never be negative (they have no sign).
4.1.1. Decimal Numbers
Rather than starting with binary, let’s first examine a number system we’re already comfortable using, the decimal number system, which uses a base of 10. Base 10 implies two important properties for the interpretation and representation of decimal values.
-
Any individual digit in a base 10 number stores one of 10 unique values (0-9). To store a value larger than 9, the value must carry to an additional digit to the left. For example, if one digit starts at its maximum value (9) and we add 1 to it, the result requires two digits (9 + 1 = 10). The same pattern holds for any digit, regardless of its position within a number (e.g., 5080 + 20 = 5100).
-
The position of each digit in the number determines how important that digit is to the overall value of the number. Labeling the digits from right to left as $d_0$, $d_1$, $d_2$, etc., each successive digit contributes a factor of ten more than the one before it. For example, take the value 8425 (Figure 31).
For the example value 8425, the 5 in the "ones" place contributes 5 ($5 \times 10^0$). The 2 in the "tens" place contributes 20 ($2 \times 10^1$). The 4 in the "hundreds" place contributes 400 ($4 \times 10^2$), and, finally, the 8 in the "thousands" place contributes 8000 ($8 \times 10^3$). More formally, one could express 8425 as

$$(8 \times 10^3) + (4 \times 10^2) + (2 \times 10^1) + (5 \times 10^0)$$
This pattern of increasing exponents applied to a base of 10 is the reason why it’s called a base 10 number system. Assigning position numbers to digits from right to left starting with $d_0$ implies that each digit $d_i$ contributes $d_i \times 10^i$ to the overall value. Thus, the overall value of any N-digit decimal number can be expressed as:

$$(d_{N-1} \times 10^{N-1}) + (d_{N-2} \times 10^{N-2}) + \dots + (d_2 \times 10^2) + (d_1 \times 10^1) + (d_0 \times 10^0)$$
Fortunately, as we’ll soon see, a very similar pattern applies to other number systems.
Distinguishing Number Bases
Now that we’re about to introduce a second number system, one potential problem is a lack of clarity regarding how to interpret a number. For example, consider the value 1000. It’s not immediately obvious whether you should interpret that number as a decimal value (i.e., one thousand) or a binary value (i.e., eight, for reasons explained soon). To help clarify, the remainder of this chapter will explicitly attach a prefix to all nondecimal numbers. We’ll soon introduce binary, for which the prefix is 0b, and hexadecimal, which uses a prefix of 0x. Therefore, if you see 1000, you should assume it’s a decimal "one thousand", and if you see 0b1000, you should interpret it as a binary number, in this case the value "eight".
4.1.2. Unsigned Binary Numbers
While you may never have considered the specific formula describing decimal numbers as powers of 10, the concept of { ones, tens, hundreds, etc. } places should hopefully feel comfortable. Luckily, similar terminology applies to other number systems, like binary. Of course, the base is different in other number systems, so each digit position contributes a different amount to its numerical value.
A binary number system uses a base of 2 instead of decimal’s 10. Analyzing it the same way that we just did for decimal reveals several parallels (with 2 substituted for 10):
-
Any individual bit in a base 2 number stores one of two unique values (0 or 1). To store a value larger than 1, the binary encoding must carry to an additional bit to the left. For example, if one bit starts at its maximum value (1) and we add 1 to it, the result requires two bits (1 + 1 = 0b10). The same pattern holds for any bit, regardless of its position within a number (e.g., 0b100100 + 0b100 = 0b101000).
-
The position of each bit in the number determines how important that bit is to the numerical value of the number. Labeling the digits from right to left as $d_0$, $d_1$, $d_2$, etc., each successive bit contributes a factor of two more than the one before it.
The first point implies that counting in binary follows the same pattern as decimal: by simply enumerating the values and adding digits (bits). Because this section focuses on unsigned numbers (zero and positives only), it’s natural to start counting from zero. Table 14 shows how to count the first few natural numbers in binary. As you can see from the table, counting in binary quickly increases the number of digits. Intuitively, this growth makes sense, since each binary digit (two possible values) represents less information than a decimal digit (10 possible values).
Binary value | Decimal value |
---|---|
0 | 0 |
1 | 1 |
10 | 2 |
11 | 3 |
100 | 4 |
101 | 5 |
… | … |
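The second point about labeling digits looks really familiar! In fact, it’s so similar to decimal that it leads to a nearly identical formula for interpreting a binary number. Simply replace the 10 at the base of each exponent with a 2:

$$(d_{N-1} \times 2^{N-1}) + (d_{N-2} \times 2^{N-2}) + \dots + (d_2 \times 2^2) + (d_1 \times 2^1) + (d_0 \times 2^0)$$

Applying this formula yields the unsigned interpretation of any binary number. For example, take 0b1000:

$$(1 \times 2^3) + (0 \times 2^2) + (0 \times 2^1) + (0 \times 2^0) = 8 + 0 + 0 + 0 = 8$$

Here’s a longer one-byte example, 0b10110100:

$$(1 \times 2^7) + (0 \times 2^6) + (1 \times 2^5) + (1 \times 2^4) + (0 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (0 \times 2^0) = 128 + 0 + 32 + 16 + 0 + 4 + 0 + 0 = 180$$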
4.1.3. Hexadecimal
Thus far, we’ve examined two number systems, decimal and binary. Decimal is notable due to its comfort for humans, whereas binary matches the way data is stored in hardware. It’s important to note that they are equivalent in their expressive power. That is, there’s no number you can represent in one system that you can’t represent in the other. Given their equivalence, it may surprise you that we’re going to discuss one more number system: the base 16 hexadecimal system.
With two perfectly good number systems, you may wonder why we need another. The answer is primarily convenience. As shown in Table 14, binary bit sequences quickly grow to a large number of digits. Humans tend to have a tough time making sense of long sequences containing only 0’s and 1’s. And whereas decimal is more compact, its base of 10 is a mismatch with binary’s base 2.
Decimal doesn’t easily capture the range that can be expressed using a fixed number of bits. For example, suppose that an old computer uses 16-bit memory addresses. Its valid addresses range from 0b0000000000000000 to 0b1111111111111111. Represented in decimal, the addresses range from 0 to 65535. Clearly, the decimal representations are more compact than the long binary sequences, but unless you memorize their conversions, it’s more difficult to reason about the decimal numbers. Both problems only get worse on modern devices, which use 32- or 64-bit addresses!
These long bit sequences are where hexadecimal’s base 16 shines. The large base allows each digit to represent enough information for hexadecimal numbers to be compact. Furthermore, because the base is itself a power of two ($2^4 = 16$), it’s easy to map hexadecimal to binary, and vice versa. For the sake of completeness, let’s analyze hexadecimal in the same way as decimal and binary:
-
Any individual digit in a base 16 number stores one of 16 unique values. Having more than 10 values presents a new challenge for hexadecimal — traditional base 10 digits stop at a maximum value of 9. By convention, hexadecimal uses letters to represent values larger than 9, with A for 10, B for 11, up to F for 15. Like the other systems, to store a value larger than 15, the number must carry to an additional digit to the left. For example, if one digit starts at its maximum value (F) and we add 1 to it, the result requires two digits (0xF + 0x1 = 0x10; note that we use 0x to indicate hexadecimal numbers).
-
The position of each digit in the number determines how important that digit is to the numerical value of the number. Labeling the digits from right to left as $d_0$, $d_1$, $d_2$, etc., each successive digit contributes a factor of 16 more than the one before it.
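Unsurprisingly, the same trusty formula for interpreting a number applies to hexadecimal, with 16 as the base:

$$(d_{N-1} \times 16^{N-1}) + (d_{N-2} \times 16^{N-2}) + \dots + (d_2 \times 16^2) + (d_1 \times 16^1) + (d_0 \times 16^0)$$

For example, to determine the decimal value of 0x23C8:

$$(2 \times 16^3) + (3 \times 16^2) + (12 \times 16^1) + (8 \times 16^0) = 8192 + 768 + 192 + 8 = 9160$$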
Hexadecimal Misconception
You may not encounter hexadecimal numbers frequently as you’re first learning about systems programming. In fact, the only context where you’re likely to find them is in representing memory addresses. For example, if you print the address of a variable using the %p (pointer) format code for printf, you’ll typically get hexadecimal output.
Many students begin to equate memory addresses (e.g., C pointer variables) with hexadecimal. While you may get used to seeing addresses represented that way, keep in mind that they are still stored using binary in the hardware, just like all other data!
4.1.4. Storage Limitations
Conceptually, there are infinitely many unsigned integers. In practice, a programmer must choose how many bits to dedicate to a variable prior to storing it, for a variety of reasons:
-
Before storing a value, a program must allocate storage space for it. In C, declaring a variable tells the compiler how much memory it needs based on its type.
-
Hardware storage devices have finite capacity. Whereas a system’s main memory is typically large and unlikely to be a limiting factor, storage locations inside the CPU that are used as temporary "scratch space" (i.e., registers) are more constrained. A CPU uses registers that are limited to its word size (typically 32 or 64 bits, depending on the CPU architecture).
-
Programs often move data from one storage device to another (e.g., between CPU registers and main memory). As values get larger, storage devices need more wires to communicate signals between them. Hence, expanding storage increases the complexity of the hardware and leaves less physical space for other components.
The number of bits used to store an integer dictates the range of its representable values. Figure 32 depicts how we might conceptualize infinite and finite unsigned integer storage spaces.
Attempting to store a larger value to a variable than the variable’s size allows is known as integer overflow. This chapter defers the details of overflow to a later section. For now, think of it like a car’s odometer that "rolls over" back to zero if it attempts to increase beyond its maximum value. Similarly, subtracting one from zero yields the maximum value.
At this point, a natural question to ask about unsigned binary is "What’s the largest positive value that N bits can store?" In other words, given a sequence of N bits that are all 1, what value does the sequence represent? Reasoning about this question informally, the analysis in the previous section shows that N bits yield $2^N$ unique bit sequences. Since one of those sequences must represent the number 0, that leaves $2^N - 1$ positive values ranging from 1 to $2^N - 1$. Thus, the maximum value for an unsigned binary number of N bits must be $2^N - 1$.
For example, 8 bits provide $2^8 = 256$ unique sequences. One of those sequences, 0b00000000, is reserved for 0, leaving 255 sequences for storing positive values. Therefore, an 8-bit variable represents the positive values 1 through 255, the largest of which is 255.
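The following C sketch illustrates both ideas from this section: the maximum value of an N-bit unsigned number, and the odometer-style "roll over" at the ends of an 8-bit range. The function names are hypothetical and chosen only for this example; uint8_t and uint64_t are the fixed-width unsigned types from stdint.h.

```c
#include <stdint.h>

/* Largest unsigned value that fits in `nbits` bits: 2^nbits - 1.
 * This sketch handles 0 <= nbits <= 64. */
uint64_t max_unsigned(unsigned nbits) {
    if (nbits >= 64) {
        return UINT64_MAX;  /* shifting a 64-bit value by 64 is undefined in C */
    }
    return (UINT64_C(1) << nbits) - 1;
}

/* Adds one to an 8-bit unsigned value; 255 wraps around to 0. */
uint8_t increment8(uint8_t value) {
    return (uint8_t)(value + 1);
}

/* Subtracts one from an 8-bit unsigned value; 0 wraps around to 255. */
uint8_t decrement8(uint8_t value) {
    return (uint8_t)(value - 1);
}
```

For instance, max_unsigned(8) returns 255 and max_unsigned(16) returns 65535, while increment8(255) returns 0 and decrement8(0) returns 255.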
4.2. Converting Between Bases
You’re likely to encounter each of the three number bases we’ve introduced in this chapter in different contexts. In some cases, you may need to convert from one base to another. This section starts by showing how to convert between binary and hexadecimal, since those two map easily to each other. After that, we’ll explore conversions to and from decimal.
4.2.1. Converting Between Binary and Hexadecimal
Because the bases for both binary and hexadecimal are powers of 2, converting between the two is relatively straightforward. Specifically, each hexadecimal digit holds one of 16 unique values, and four bits also represent $2^4 = 16$ unique values, making their expressive power equivalent. Table 15 enumerates the one-to-one mapping between any sequence of four bits and any single hexadecimal digit.
Binary | Hexadecimal | Binary | Hexadecimal |
---|---|---|---|
0000 | 0 | 1000 | 8 |
0001 | 1 | 1001 | 9 |
0010 | 2 | 1010 | A |
0011 | 3 | 1011 | B |
0100 | 4 | 1100 | C |
0101 | 5 | 1101 | D |
0110 | 6 | 1110 | E |
0111 | 7 | 1111 | F |
Note that the content of Table 15 is equivalent to simply counting from 0 to 15 in both number systems, so there’s no need to memorize it. Armed with this mapping, you can convert any number of consecutive bits or hex digits in either direction:
-
To convert 0xB491 to binary, simply substitute the corresponding binary value for each hexadecimal digit:
   B    4    9    1
1011 0100 1001 0001  ->  0b1011010010010001
-
To convert 0b1111011001 to hexadecimal, first divide the bits into chunks of four, from right to left. If the leftmost chunk doesn’t have four bits, pad it with leading zeros. Then, substitute the corresponding hexadecimal value for each chunk:
1111011001  ->  11 1101 1001  ->  0011 1101 1001
                                  ^ padding
0011 1101 1001
   3    D    9  ->  0x3D9
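The two substitution procedures above can be sketched in C as follows. This is an illustrative sketch rather than a standard library facility: the function names, the string-based interface (digits without the 0x or 0b prefix), and the error convention (0 for success, -1 for failure) are all assumptions.

```c
#include <string.h>

/* The sixteen four-bit patterns from Table 15, indexed by digit value. */
static const char *const NIBBLES[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

/* Expands each hex digit of `hex` into four bits, written to `out`.
 * Returns 0 on success, -1 on an invalid digit or a too-small buffer. */
int hex_to_binary(const char *hex, char *out, size_t size) {
    size_t n = strlen(hex);
    if (size < 4 * n + 1) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        char c = hex[i];
        int v;
        if (c >= '0' && c <= '9')      v = c - '0';
        else if (c >= 'A' && c <= 'F') v = 10 + (c - 'A');
        else if (c >= 'a' && c <= 'f') v = 10 + (c - 'a');
        else return -1;
        memcpy(out + 4 * i, NIBBLES[v], 4);
    }
    out[4 * n] = '\0';
    return 0;
}

/* Groups `bits` into chunks of four from the right, padding the leftmost
 * chunk with leading zeros, and writes one hex digit per chunk to `out`.
 * Returns 0 on success, -1 on invalid input or a too-small buffer. */
int binary_to_hex(const char *bits, char *out, size_t size) {
    size_t n = strlen(bits);
    size_t ndigits = (n + 3) / 4;
    size_t pad = 4 * ndigits - n;  /* number of implied leading zeros */
    if (n == 0 || size < ndigits + 1) {
        return -1;
    }
    for (size_t d = 0; d < ndigits; d++) {
        int v = 0;
        for (size_t k = 0; k < 4; k++) {
            size_t pos = 4 * d + k;  /* position within the padded sequence */
            int bit = 0;
            if (pos >= pad) {
                char c = bits[pos - pad];
                if (c != '0' && c != '1') {
                    return -1;
                }
                bit = c - '0';
            }
            v = 2 * v + bit;
        }
        out[d] = "0123456789ABCDEF"[v];
    }
    out[ndigits] = '\0';
    return 0;
}
```

Given the inputs from the examples, hex_to_binary("B491", ...) produces 1011010010010001 and binary_to_hex("1111011001", ...) produces 3D9.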
4.2.2. Converting to Decimal
Fortunately, converting values to decimal is what we’ve been doing throughout previous sections of this chapter. Given a number in any base B, labeling the digits from right to left as $d_0$, $d_1$, $d_2$, etc. enables a general formula for converting values to decimal:

$$(d_{N-1} \times B^{N-1}) + (d_{N-2} \times B^{N-2}) + \dots + (d_2 \times B^2) + (d_1 \times B^1) + (d_0 \times B^0)$$
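As a sketch of how the general conversion to decimal might look in code, the C function below sums each digit multiplied by its place value, working from the rightmost digit. The name to_decimal, the string interface (most significant digit first, no prefix), and the restriction to bases 2 through 16 are assumptions for illustration; the function returns 0 on invalid input and does not check for overflow.

```c
#include <string.h>

/* Returns the value of one digit character (0-15), or -1 if invalid. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return 10 + (c - 'A');
    if (c >= 'a' && c <= 'f') return 10 + (c - 'a');
    return -1;
}

/* Interprets `digits` as an unsigned number in base `base` (2..16) by
 * adding d_i * base^i for every digit, starting with d_0 on the right. */
unsigned long to_decimal(const char *digits, unsigned base) {
    size_t n = strlen(digits);
    unsigned long total = 0;
    unsigned long place = 1;  /* base^0 */

    if (base < 2 || base > 16) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        int d = digit_value(digits[n - 1 - i]);  /* digit d_i */
        if (d < 0 || (unsigned)d >= base) {
            return 0;  /* not a valid digit in this base */
        }
        total += (unsigned long)d * place;
        place *= base;  /* advance to the next place value */
    }
    return total;
}
```

Using the values from earlier in the chapter, to_decimal("23C8", 16) returns 9160, to_decimal("10110100", 2) returns 180, and to_decimal("8425", 10) returns 8425.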
4.2.3. Converting from Decimal
Converting from decimal to other systems requires a little more work. Informally, the goal is to do the reverse of the previous formula: determine the value of each digit such that, based on the position of the digit, adding each term results in the source decimal number. It may help to think about each digit in the target base system in the same way that we described the places (the "ones" place, the "tens" place, etc.) for decimal. For example, consider converting from decimal to hexadecimal. Each digit of a hexadecimal number corresponds to an increasingly large power of 16, and Table 16 lists the first few powers.
$16^4$ | $16^3$ | $16^2$ | $16^1$ | $16^0$ |
---|---|---|---|---|
65536 | 4096 | 256 | 16 | 1 |
For example, to convert 9742 to hexadecimal, consider:
-
How many multiples of 65536 fit into 9742? (In other words, what is the value of the "65536’s" place?)
The resulting hexadecimal value doesn’t need any multiples of 65536, since the value (9742) is smaller than 65536, so $d_4$ should be set to 0. Note that by the same logic, all higher-numbered digits will also be 0, because each digit would contribute values even larger than 65536. Thus far, the result contains only:
$d_4$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
---|---|---|---|---|
0 |  |  |  |  |
-
How many multiples of 4096 fit into 9742? (In other words, what is the value of the "4096’s" place?)
4096 fits into 9742 twice (2 × 4096 = 8192), so the value of $d_3$ should be 2. Thus, $d_3$ will contribute 8192 to the overall value, so the result must still account for 9742 - 8192 = 1550.
$d_4$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
---|---|---|---|---|
0 | 2 |  |  |  |
-
How many multiples of 256 fit into 1550? (In other words, what is the value of the "256’s" place?)
256 fits into 1550 six times (6 × 256 = 1536), so the value of $d_2$ should be 6, leaving 1550 - 1536 = 14.
$d_4$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
---|---|---|---|---|
0 | 2 | 6 |  |  |
-
How many multiples of 16 fit into 14? (In other words, what is the value of the "sixteens" place?)
None, so $d_1$ must be 0.
$d_4$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
---|---|---|---|---|
0 | 2 | 6 | 0 |  |
-
Finally, how many multiples of 1 fit into 14? (In other words, what is the value of the "ones" place?)
The answer is 14, of course, which hexadecimal represents with the digit E.
$d_4$ | $d_3$ | $d_2$ | $d_1$ | $d_0$ |
---|---|---|---|---|
0 | 2 | 6 | 0 | E |
Thus, decimal 9742 corresponds to 0x260E.
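The place-by-place procedure above can be sketched in C as follows. The function name, the string output, and the error convention are assumptions made for this example. Instead of starting from a fixed table of powers, the sketch first finds the largest power of the base that fits in the value (which skips the leading zero in the $d_4$ position), then asks "how many multiples fit?" at each place.

```c
#include <stddef.h>

/* Converts `value` to base `base` (2..16), writing digits to `out` from the
 * most significant place to the least. Returns 0 on success, -1 on a bad
 * base or a too-small buffer. */
int to_base_by_places(unsigned long value, unsigned base, char *out, size_t size) {
    static const char DIGITS[] = "0123456789ABCDEF";
    unsigned long place = 1;
    size_t len = 0;

    if (base < 2 || base > 16) {
        return -1;
    }
    /* Find the largest power of the base that is no bigger than the value. */
    while (value / place >= base) {
        place *= base;
    }
    while (place > 0) {
        unsigned long digit = value / place;  /* how many multiples fit here */
        if (len + 1 >= size) {
            return -1;
        }
        out[len++] = DIGITS[digit];
        value -= digit * place;  /* the amount still left to account for */
        place /= base;           /* move one place to the right */
    }
    out[len] = '\0';
    return 0;
}
```

For the example above, to_base_by_places(9742, 16, ...) produces the string 260E.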
Decimal to Binary: Powers of Two
The same procedure works for binary, as well (or any other number system), provided that you use powers of the appropriate base. Table 17 lists the first few powers of two, which will help to convert the example decimal value 422 to binary.
$2^8$ | $2^7$ | $2^6$ | $2^5$ | $2^4$ | $2^3$ | $2^2$ | $2^1$ | $2^0$ |
---|---|---|---|---|---|---|---|---|
256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Because an individual bit is only allowed to store a 0 or 1, the question is no longer "How many multiples of each power fit within a value?" when converting to binary. Instead, ask a simpler question: "Does the next power of two fit?" For example, in converting 422:
-
256 fits into 422, so $d_8$ should be a 1. That leaves 422 - 256 = 166.
-
128 fits into 166, so $d_7$ should be a 1. That leaves 166 - 128 = 38.
-
64 does not fit into 38, so $d_6$ should be a 0.
-
32 fits into 38, so $d_5$ should be a 1. That leaves 38 - 32 = 6.
-
16 does not fit into 6, so $d_4$ should be a 0.
-
8 does not fit into 6, so $d_3$ should be a 0.
-
4 fits into 6, so $d_2$ should be a 1. That leaves 6 - 4 = 2.
-
2 fits into 2, so $d_1$ should be a 1. That leaves 2 - 2 = 0. (Note: upon reaching 0, all remaining digits will always be 0.)
-
1 does not fit into 0, so $d_0$ should be a 0.
Thus, decimal 422 corresponds to 0b110100110.
Decimal to Binary: Repeated Division
The method we just described generally works well for students who are familiar with the relevant powers of two (e.g., for 422, the converter must recognize that it should start at $d_8$ because $2^9 = 512$ is too large).
An alternative method doesn’t require knowing powers of two. Instead, this method builds a binary result by checking the parity (even or odd) status of a decimal number and repeatedly dividing it by two (rounding halves down) to determine each successive bit. Note that it builds the resulting bit sequence from right to left. If the decimal value is even, the next bit should be a zero; if it’s odd, the next bit should be a one. When the division reaches zero, the conversion is complete.
For example, when converting 422:
-
422 is even, so $d_0$ should be a 0. (This is the rightmost bit.)
-
422 / 2 = 211, which is odd, so $d_1$ should be a 1.
-
211 / 2 = 105, which is odd, so $d_2$ should be a 1.
-
105 / 2 = 52, which is even, so $d_3$ should be a 0.
-
52 / 2 = 26, which is even, so $d_4$ should be a 0.
-
26 / 2 = 13, which is odd, so $d_5$ should be a 1.
-
13 / 2 = 6, which is even, so $d_6$ should be a 0.
-
6 / 2 = 3, which is odd, so $d_7$ should be a 1.
-
3 / 2 = 1, which is odd, so $d_8$ should be a 1.
-
1 / 2 = 0, so any digit numbered nine or above will be 0, and the algorithm terminates.
As expected, this method produces the same binary sequence: 0b110100110.
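The repeated-division steps above translate fairly directly into C. In the sketch below, the function name and the string output are assumptions for illustration. Because the method produces bits from right to left, the sketch collects them in a temporary buffer and then reverses them into the output.

```c
#include <limits.h>
#include <stddef.h>

/* Writes the binary representation of `value` to `out` using parity checks
 * and repeated division by two. Returns 0 on success, -1 if `out` (of
 * `size` bytes) is too small to hold the bits and the terminator. */
int to_binary_by_division(unsigned long value, char *out, size_t size) {
    char reversed[sizeof(unsigned long) * CHAR_BIT];  /* bits, d_0 first */
    size_t len = 0;

    do {
        reversed[len++] = (value % 2 == 0) ? '0' : '1';  /* even -> 0, odd -> 1 */
        value /= 2;  /* integer division rounds halves down */
    } while (value > 0);

    if (size < len + 1) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        out[i] = reversed[len - 1 - i];  /* leftmost bit is the last one found */
    }
    out[len] = '\0';
    return 0;
}
```

For the value 422, this sketch produces 110100110, matching the result of the walk-through.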
4.3. Signed Binary Integers
So far, we’ve limited the discussion of binary numbers to unsigned (strictly non-negative) integers. This section presents an alternative interpretation of binary that incorporates negative numbers. Given that a variable has finite storage space, a signed binary encoding must distinguish between negative values, zero, and positive values. Manipulating signed numbers additionally requires a procedure for negating a number.
A signed binary encoding must divide bit sequences between negative and non-negative values. In practice, systems designers build general-purpose systems, so a 50% / 50% split is a good middle-of-the-road choice. Therefore, the signed number encodings that this chapter presents represent an equal number of negative and non-negative values.
Non-Negative versus Positive
Note that there’s a subtle but important difference between non-negative and positive. The set of strictly positive values excludes zero, whereas the non-negative set includes zero. Even after dividing the available bit sequences 50% / 50% between negative and non-negative values, one of the non-negative values must still be reserved for zero. Thus, with a fixed number of bits, a number system may end up representing more negative values than positive values (e.g., in the two’s complement system).
Signed number encodings use one bit to distinguish between the sets of negative numbers and non-negative numbers. By convention, the leftmost bit indicates whether a number is negative (1) or non-negative (0). This leftmost bit is known as the high-order bit or the most significant bit.
This chapter presents two potential signed binary encodings — signed magnitude and two’s complement. Even though only one of these encodings (two’s complement) is still used in practice, comparing them will help to illustrate their important characteristics.
4.3.1. Signed Magnitude
The signed magnitude representation treats the high-order bit exclusively as a sign bit. That is, whether the high-order bit is a 0 or a 1 does not affect the absolute value of the number; it only determines whether the value is positive (high-order bit 0) or negative (high-order bit 1). Compared to two’s complement, signed magnitude makes the decimal conversion and negation procedures relatively straightforward:
-
To compute a decimal value for an N-bit signed magnitude sequence, compute the value of digits $d_0$ through $d_{N-2}$ using the familiar unsigned method. Then, check the most significant bit, $d_{N-1}$: if it’s 1, the value is negative; otherwise it isn’t.
-
To negate a value, simply flip the most significant bit to change its sign.
Negation Misconception
Signed magnitude is presented purely for pedagogical purposes. Although it was used by some machines in the past (e.g., IBM’s 7090 in the 1960s), no modern systems use signed magnitude to represent integers (although a similar mechanism is part of the standard for storing floating-point values). Unless you’re explicitly asked to consider signed magnitude, you should not assume that flipping the first bit of a binary number will negate that number’s value on a modern system.
Figure 33 shows how four-bit signed magnitude sequences correspond to decimal values. At first glance, signed magnitude might seem attractive due to its simplicity. Unfortunately, it suffers from two major drawbacks that make it unappealing. The first is that it presents two representations of zero. For example, with four bits, signed magnitude represents both zero (0b0000) and negative zero (0b1000). Consequently, it poses a challenge to hardware designers because the hardware will need to account for two possible binary sequences that are numerically equal despite having different bit values. The hardware designer’s job is much easier with just one way of representing such an important number.
The other drawback of signed magnitude is that it exhibits an inconvenient discontinuity between negative values and zero. While we’ll cover overflow in more detail later, adding 1 to the four-bit sequence 0b1111 "rolls over" back to 0b0000. With signed magnitude, this effect means 0b1111 (-7) + 1 might be mistaken for 0 rather than the expected -6. This problem is solvable, but the solution again complicates the design of the hardware, essentially turning any transition between negative and non-negative integers into a special case that requires extra care.
For these reasons, signed magnitude has largely disappeared in practice, and two’s complement reigns supreme.
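For illustration only, the signed magnitude rules described in this section might be sketched in C as shown below. Because C itself does not provide a signed magnitude integer type on modern systems, the sketch stores the bit pattern in an unsigned variable and decodes it by hand. The function names are hypothetical, and the sketch assumes widths between 2 and 32 bits (returning 0 or the unchanged pattern otherwise).

```c
/* Decodes the low `nbits` bits of `bits` as a signed magnitude value. */
int signed_magnitude_value(unsigned bits, unsigned nbits) {
    if (nbits < 2 || nbits > 32) {
        return 0;
    }
    unsigned sign_mask = 1u << (nbits - 1);          /* isolates d_(N-1) */
    int magnitude = (int)(bits & (sign_mask - 1u));  /* d_0 through d_(N-2) */
    return (bits & sign_mask) ? -magnitude : magnitude;
}

/* Negates a signed magnitude pattern by flipping only its sign bit. */
unsigned signed_magnitude_negate(unsigned bits, unsigned nbits) {
    if (nbits < 2 || nbits > 32) {
        return bits;
    }
    return bits ^ (1u << (nbits - 1));
}
```

With four bits, signed_magnitude_value(0xF, 4) returns -7, and both 0x0 and 0x8 decode to 0, which demonstrates the two representations of zero. Negating 0x5 (5) yields 0xD, which decodes to -5.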
4.3.2. Two’s Complement
Two’s complement encoding solves signed magnitude’s problems in an elegant way. Like signed magnitude, the high-order bit of a two’s complement number indicates whether or not the value should be interpreted as negative. In contrast though, the high-order bit also affects the value of the number. So, how can it do both?
Computing a decimal value for an N-bit two’s complement number is similar to the familiar unsigned method, except the high-order bit’s contribution to the overall value is negated. That is, for an N-bit two’s complement sequence, instead of the first bit contributing d_{N-1} × 2^{N-1} to the sum, it contributes -d_{N-1} × 2^{N-1} (note the negative sign). Therefore, if the most significant bit is a 1, the overall value will be negative because that first bit contributes the largest absolute value to the sum. Otherwise, the first bit contributes nothing to the sum, and the result is non-negative. The full formula is:
-(d_{N-1} × 2^{N-1}) + (d_{N-2} × 2^{N-2}) + … + (d_1 × 2^1) + (d_0 × 2^0)
Figure 34 illustrates the layout of four-bit sequences in two’s complement. This definition encodes just one representation of zero — a sequence of bits that are all 0’s. With only a single zero sequence, two’s complement represents one more negative value than positive. Using four-bit sequences as an example, two’s complement represents a minimum value of 0b1000 (-8), but a maximum value of only 0b0111 (7). Fortunately, this quirk doesn’t hinder hardware design and rarely causes problems for applications.
Compared to signed magnitude, two’s complement also simplifies the transition between negative numbers and zero. Regardless of the number of bits used to store it, a two’s complement number consisting of all ones will always hold the value -1. Adding 1 to a bit sequence of all 1’s "rolls over" to zero, which makes two’s complement convenient, since -1 + 1 should produce zero.
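To make the interpretation rule concrete, here is a minimal sketch in C. The function name twos_complement_value and its parameters are hypothetical, chosen only for illustration; it assumes a bit width between 1 and 32 and simply applies the formula above, subtracting the high-order bit's weight and adding all the others.

```c
#include <stdint.h>

/* Interpret the low n bits of `bits` as an n-bit two's complement value.
 * Hypothetical helper for illustration; assumes 1 <= n <= 32. */
int64_t twos_complement_value(uint32_t bits, int n) {
    int64_t value = 0;
    for (int i = 0; i < n; i++) {
        int64_t digit = (bits >> i) & 1u;   /* d_i: either 0 or 1 */
        int64_t weight = (int64_t)1 << i;   /* 2^i */
        if (i == n - 1) {
            value -= digit * weight;        /* high-order bit counts negatively */
        } else {
            value += digit * weight;        /* all other bits count positively */
        }
    }
    return value;
}
```

With four bits, twos_complement_value(0x8, 4) yields -8 and twos_complement_value(0x7, 4) yields 7, matching the minimum and maximum described above. A sequence of all ones yields -1 for any width.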
Negation
Negating a two’s complement number is slightly trickier than negating a signed magnitude value. To negate an N-bit value, determine its complement with respect to 2^N (this is where the encoding’s name comes from). In other words, to negate an N-bit value X, find a bit sequence Y (X's complement) such that X + Y = 2^N.
Fortunately, there’s a quick shortcut for negating a two’s complement number in practice: flip all the bits and add one. For example, to negate the eight-bit value 13, first determine the binary value of 13. Because 13 is the sum of 8, 4, and 1, set the bits in positions 3, 2, and 0:
00001101 (decimal 13)
Next, "flip the bits" (change all zeros to ones, and vice versa):
11110010
Finally, adding one yields 0b11110011. Sure enough, applying the formula for interpreting a two’s complement bit sequence shows that the value is -13:
0b11110011 = -128 + 64 + 32 + 16 + 0 + 0 + 2 + 1 = -13
If you’re curious as to why this seemingly magical shortcut works, consider the eight-bit negation of 13 more formally. To find 13’s complement, solve 0b00001101 (13) + Y = 0b100000000 (2^8, which requires an extra bit to represent). The equation can be rearranged as Y = 0b100000000 - 0b00001101. This is clearly now a subtraction problem:
  100000000  (256)
-  00001101  (13)
While such a subtraction might seem daunting, we can express it in a way that’s easier to compute as (0b011111111 + 1) - 0b00001101. Note that this change simply expresses 2^8 (256) as (255 + 1). After that change, the arithmetic looks like:
  011111111  (255)
+  00000001  (1)
-  00001101  (13)
As it turns out, for any bit value b, 1 - b is equivalent to "flipping" that bit. Thus, the entire subtraction in the preceding example can be reduced to just flipping all the bits of the lower number. All that’s left is to add the remaining +1 from expressing 256 as 255 + 1. Putting it all together, we can simply flip a value’s bits and add one to compute its complement!
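The shortcut translates almost directly into C. The following is a small sketch for eight-bit values; the name negate8 is hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Negate an eight-bit two's complement bit pattern with the
 * "flip the bits and add one" shortcut. Hypothetical helper. */
uint8_t negate8(uint8_t x) {
    unsigned flipped = ~(unsigned)x;   /* flip every bit */
    return (uint8_t)(flipped + 1u);    /* add one, keep only the low 8 bits */
}
```

Here negate8(13) produces the pattern 0xF3 (0b11110011), the encoding of -13 worked out above, and applying negate8 to that result returns 13 again.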
C Programming With Signed versus Unsigned Integers
In addition to allocating space, declaring variables in C also tells the
compiler how you’d like the variable to be interpreted. When you declare an
The distinction is also relevant to C in other places, like the
Even though this code passes |
Sign Extension
Occasionally, you may find yourself wanting to perform an arithmetic operation on two numbers that are stored using different numbers of bits. For example, in C you may want to add a 32-bit int and a 16-bit short. In such cases, the smaller number needs to be sign extended, which is a fancy way of saying that its most significant bit gets repeated as many times as necessary to extend the length of the bit sequence to the target length. Though the compiler will take care of wrangling the bits for you in C, it’s still helpful to understand how the process works.
For example, to extend the four-bit sequence 0b0110 (6) to an eight-bit sequence, take the high-order bit (0) and prepend it four times to produce the extended value: 0b00000110 (still 6). Extending 0b1011 (-5) to an eight-bit sequence similarly takes the high-order bit (this time, 1) and prepends it four times to the resulting extended value: 0b11111011 (still -5). To verify the correctness, consider how the value changes after adding each new bit:
0b1011 = -8 + 0 + 2 + 1 = -5
0b11011 = -16 + 8 + 0 + 2 + 1 = -5
0b111011 = -32 + 16 + 8 + 0 + 2 + 1 = -5
0b1111011 = -64 + 32 + 16 + 8 + 0 + 2 + 1 = -5
0b11111011 = -128 + 64 + 32 + 16 + 8 + 0 + 2 + 1 = -5
As evidenced by the examples, numbers that are non-negative (high-order bit of zero) remain non-negative after adding zeros to the front. Likewise, negatives (high-order bit of one) remain negative after prepending ones to extended values.
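Although the C compiler performs sign extension automatically, doing it by hand can clarify the mechanics. The sketch below extends a narrow bit sequence to eight bits; sign_extend_to_8 is a hypothetical name, and the function assumes the source width is between 1 and 8 bits.

```c
#include <stdint.h>

/* Sign extend the low `from_bits` bits of `value` to eight bits by
 * copying the high-order bit into every new position.
 * Hypothetical helper; assumes 1 <= from_bits <= 8. */
uint8_t sign_extend_to_8(uint8_t value, int from_bits) {
    uint8_t low_mask = (uint8_t)((1u << from_bits) - 1u);  /* ones in the original positions */
    uint8_t result = (uint8_t)(value & low_mask);
    unsigned sign_bit = (value >> (from_bits - 1)) & 1u;   /* original high-order bit */
    if (sign_bit) {
        result = (uint8_t)(result | (uint8_t)~low_mask);   /* prepend ones */
    }
    return result;                                         /* otherwise zeros are prepended */
}
```

Using the examples from the text, sign_extend_to_8(0x6, 4) gives 0x06 (0b00000110) and sign_extend_to_8(0xB, 4) gives 0xFB (0b11111011).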
Unsigned Zero Extension
For an unsigned value (e.g., a C variable explicitly declared with an |
4.4. Binary Integer Arithmetic
Having presented binary representations for unsigned and signed integers, we’re ready to use them in arithmetic operations. Fortunately, due to their encoding, it does not matter to the arithmetic procedures whether we choose to interpret the operands or result as signed or unsigned. This observation is great news for hardware designers because it allows them to build one set of hardware components that can be shared for both unsigned and signed operations. The hardware chapter describes the circuitry for performing arithmetic in more detail.
Luckily, the same pencil-and-paper algorithms you learned in grade school for performing arithmetic on decimal numbers also work for binary numbers. Though the hardware might not compute them in exactly the same way, you should at least be able to make sense of the calculations.
4.4.1. Addition
Recall that in a binary number, each digit holds only 0 or 1. Consequently, when adding two bits that are both 1, the result carries out to the next digit (e.g., 1 + 1 = 0b10, which requires two bits to represent). In practice, programs add multibit variables, where the result of one digit’s carry out influences the next digit by carrying in.
In general, when summing digits from two binary numbers (A and B), there are eight possible outcomes depending on the values of DigitA, DigitB, and a Carryin from the previous digit. Table 18 enumerates the eight possibilities that may result from adding one pair of bits. The Carryin column refers to a carry feeding into the sum from the previous digit, and the Carryout column indicates whether adding the pair of digits will feed a carry out to the next digit.
Inputs: DigitA, DigitB, Carryin. Outputs: Result (Sum), Carryout.
DigitA | DigitB | Carryin | Result (Sum) | Carryout |
---|---|---|---|---|
0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 1 | 0 |
0 | 1 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 1 |
1 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 1 |
1 | 1 | 0 | 0 | 1 |
1 | 1 | 1 | 1 | 1 |
Consider the addition of two four-bit binary numbers. Start by lining up the numbers so that their corresponding digits match vertically, and then sum each corresponding digit in order, from the low-order digit (d0) to the high-order digit (d3). For example, adding 0b0010 + 0b1011:
Problem Setup:
    0010
  + 1011
Worked Example:
     1      <- Carry the 1 from digit 1 into digit 2
    0010
  + 1011
  Result: 1101
The example shows a 1 carrying from d1 into d2. This situation is analogous to adding two decimal digits that sum to a value larger than 9. For example, when adding 5 + 8 = 13, the resulting ones place contains 3, and a 1 carries into the tens place.
The first operand (0b0010) has a leading 0, so it represents 2 for both two’s complement and unsigned interpretations. The second operand (0b1011) represents -5 if interpreted as a signed two’s complement value. Otherwise, it represents 11 if interpreted as an unsigned value. Fortunately, the interpretation of the operands doesn’t affect the steps for computing the result. That is, the computed result (0b1101) represents either 13 (unsigned: 2 + 11) or -3 (signed: 2 + -5), both of which are correct depending on the interpretation of the second operand.
More generally, a four-bit sequence represents values in the range [0, 15] when interpreted as unsigned. When interpreted as signed, it represents the range [-8, 7]. In the previous example, the result fits within the representable range either way, but we may not always be so lucky. For example, when adding 0b1100 (unsigned 12) + 0b0111 (7), the answer should be 19, but four bits can’t represent 19:
Problem Setup:
    1100
  + 0111
Worked Example:
   11       <- Carry a 1 from: digit 2 into digit 3, and
    1100       digit 3 out of the overall value
  + 0111
  Result: 0011
  Carry out: 1
Note that the addition in this example carries a 1 from the most significant bit, a condition known as a carry out for the overall arithmetic operation. In this example, the carry out suggests that the arithmetic output needs an extra bit to store the intended result. However, when performing four-bit arithmetic, there’s nowhere to put the carry out’s extra bit, so the hardware simply drops or truncates it, leaving 0b0011 as the result. Of course, if the goal was to add 12 + 7, a result of 3 is likely to be surprising. The surprise is a consequence of overflow. We’ll explore how to detect overflow and why it produces the results that it does in a later section.
Multibit adder circuits also support a carry in that behaves like a carry into the rightmost digit (that is, it serves as the Carryin input for d0). The carry in isn’t useful when performing addition: it’s implicitly set to 0, which is why it doesn’t appear in the preceding example. However, the carry in does become relevant for other operations that use adder circuitry, most notably subtraction.
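The digit-by-digit procedure can be sketched in C as follows. This is only an illustration of the pencil-and-paper method, not a description of how real adder hardware is built; the name add4 and its parameters are hypothetical.

```c
#include <stdint.h>

/* Add two four-bit values one digit at a time, from d0 up to d3.
 * carry_in feeds into d0; *carry_out receives the carry out of d3.
 * Hypothetical helper for illustration. */
uint8_t add4(uint8_t a, uint8_t b, int carry_in, int *carry_out) {
    uint8_t result = 0;
    int carry = carry_in;
    for (int i = 0; i < 4; i++) {
        int digit_a = (a >> i) & 1;
        int digit_b = (b >> i) & 1;
        int sum = digit_a + digit_b + carry;             /* 0, 1, 2, or 3 */
        result = (uint8_t)(result | ((sum & 1) << i));   /* Result (Sum) column */
        carry = sum >> 1;                                /* Carryout column */
    }
    *carry_out = carry;
    return result;
}
```

With a carry in of 0, adding 0b0010 and 0b1011 this way yields 0b1101 with no carry out, while adding 0b1100 and 0b0111 yields 0b0011 with a carry out of 1, as in the two worked examples.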
4.4.2. Subtraction
Subtraction combines two familiar operations: negation and addition. In other words, subtracting 7 - 3 is equivalent to expressing the operation as 7 + (-3). This portrayal of subtraction aligns well with how the hardware behaves — a CPU already contains circuits for negation and addition, so it makes sense to reuse those circuits rather than build an entirely new subtractor. Recall that a simple procedure to negate a binary number is to flip the bits and add one.
Consider the example 0b0111 (7) - 0b0011 (3), which starts by sending the 3 to a bit-flipping circuit. To get the "plus one," it takes advantage of the carry in to the adder circuit. That is, rather than carrying from one digit to another, subtraction feeds a carry in to d0 of the adder. Setting the carry in to 1 increases the resulting "ones place" value by one, which is exactly what it needs to get the "plus one" part of the negation. Putting it all together, the example would look like the following:
Problem Setup:
    0111
  - 0011
Converted to Addition:
       1 (carry in)
    0111
  + 1100 (bits flipped)
Worked Example:
       1 (carry in)
    0111
  + 1100 (bits flipped)
  Result: 0100
  Carry out: 1
While the full result of the addition carries into an extra digit, the truncated result (0b0100) represents the expected result (4). Unlike the previous addition example, a carry out from the high-order bit is not necessarily indicative of an overflow problem for subtraction.
Performing subtraction as negation followed by addition also works when subtracting a negative value. For example, 7 - (-3) produces 10:
Problem Setup:
    0111
  - 1101
Converted to Addition:
       1 (carry in)
    0111
  + 0010 (bits flipped)
Worked Example:
       1 (carry in)
    0111
  + 0010 (bits flipped)
  Result: 1010
  Carry out: 0
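Note that 0b1010 equals 10 only under an unsigned reading of the result; interpreted as a four-bit two’s complement value it is -6, because 10 lies outside the signed range [-8, 7]. We further explore the implications of carrying out (or not) in the overflow section.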
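As a rough sketch of the same idea in C, the function below subtracts four-bit values by flipping the second operand's bits and adding with a carry in of 1. The name sub4 is hypothetical.

```c
#include <stdint.h>

/* Compute a - b on four-bit values as a + (flipped b) + 1.
 * *carry_out receives the carry out of d3. Hypothetical helper. */
uint8_t sub4(uint8_t a, uint8_t b, int *carry_out) {
    unsigned flipped = ~(unsigned)b & 0xFu;      /* bit-flipping step */
    unsigned sum = (a & 0xFu) + flipped + 1u;    /* the 1 is the carry in */
    *carry_out = (int)((sum >> 4) & 1u);
    return (uint8_t)(sum & 0xFu);                /* truncate to four bits */
}
```

For 0b0111 - 0b0011 this produces 0b0100 with a carry out of 1, and for 0b0111 - 0b1101 it produces 0b1010 with a carry out of 0, matching the two worked examples.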
4.4.3. Multiplication and Division
This section briefly describes binary multiplication and division with integers. In particular, it shows methods for computing results by hand and does not reflect the behavior of modern hardware. This description is not meant to be comprehensive, as the remainder of the chapter focuses primarily on addition and subtraction.
Multiplication
To multiply binary numbers, use the common pencil-and-paper strategy of considering one digit at a time and adding the results. For example, multiplying 0b0101 (5) and 0b0011 (3) is equivalent to summing:
-
the result of multiplying d0 by 0b101 (5): 0b0101 (5)
-
the result of multiplying d1 by 0b101 (5) and shifting the result to the left by one digit: 0b1010 (10).
    0101         0101         0101
  x 0011   =   x    1   +   x   10   =   101 + 1010 = 1111 (15)
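The by-hand method amounts to a "shift and add" loop. The sketch below illustrates that loop in C for unsigned values; the name multiply_shift_add is hypothetical, and, as the section notes, this does not reflect how modern hardware multiplies.

```c
/* Multiply two unsigned values by examining one digit of b at a time.
 * For each digit of b that is 1, add a copy of a shifted left by that
 * digit's position. Hypothetical helper; the product wraps around if
 * it does not fit in an unsigned int. */
unsigned multiply_shift_add(unsigned a, unsigned b) {
    unsigned product = 0;
    for (unsigned i = 0; b != 0; i++, b >>= 1) {
        if (b & 1u) {
            product += a << i;   /* a times 2^i */
        }
    }
    return product;
}
```

For the example in the text, multiply_shift_add(5, 3) sums 0b0101 and 0b1010 to produce 15.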
(Integer) Division
Unlike the other operations just described, division has the potential to produce a non-integral result. The primary thing to keep in mind when dividing integers is that in many languages (for example, C and Java) the fractional portion of the result gets truncated. Python 2 also discards the fractional portion, although it rounds toward negative infinity rather than toward zero, which matters only when the result is negative. Otherwise, binary division uses the same long form method that most students learn in grade school. For example, here’s how computing 11 / 3 produces an integer result of 3:
      ____
  11 |1011

      00__      11 (3) doesn't fit into 1 (1) or 10 (2),
  11 |1011      so the first two digits of the result are 00.

      001_      11 (3) fits into 101 (5) once.
  11 |1011
      101       101 (5) - 11 (3) leaves 10 (2).
     - 11
       10

      0011
  11 |1011      11 (3) fits into 101 (5) once again.
       101
At this point, the arithmetic has produced the expected integer result, 0011 (3), and the hardware truncates any fractional parts. If you’re interested in determining the integral remainder, use the modulus operator (%); for example, 11 % 3 = 2.
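The long division steps can be sketched in C as a loop that brings down one digit at a time. The name divide_long and its parameters are hypothetical, and the sketch assumes a nonzero divisor; like the pencil-and-paper version, it is for illustration and does not reflect how hardware divides.

```c
#include <stdint.h>

/* Binary long division on 32-bit unsigned values. Returns the integer
 * quotient and stores the remainder in *remainder.
 * Hypothetical helper; assumes divisor != 0. */
uint32_t divide_long(uint32_t dividend, uint32_t divisor, uint32_t *remainder) {
    uint32_t quotient = 0;
    uint64_t rem = 0;   /* wider than 32 bits so the shift below cannot lose a bit */
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);   /* bring down the next digit */
        if (rem >= divisor) {                        /* the divisor "fits" */
            rem -= divisor;
            quotient |= (uint32_t)1 << i;            /* record a 1 in the result */
        }
    }
    *remainder = (uint32_t)rem;
    return quotient;
}
```

Dividing 11 by 3 this way produces a quotient of 3 and a remainder of 2, the same values that C's / and % operators give for 11 / 3 and 11 % 3.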
4.5. Integer Overflow
Although the number of integers is mathematically infinite, in practice, numeric types in a computer’s memory occupy a fixed number of bits. As we’ve hinted throughout this chapter, using a fixed number of bits implies that programs might be unable to represent values that they’d like to store. For example, the discussion of addition showed that adding two legitimate values can produce a result that can’t be represented. A computation that lacks the storage to represent its result has overflowed.
4.5.1. Odometer Analogy
To characterize overflow, consider an example from the non-computing world: a car’s odometer. An odometer counts the number of miles a car has driven, and whether it’s digital or analog, it can display only so many (base 10) digits. If the car drives more miles than the odometer can represent, the odometer "rolls over" back to zero, since the true value can’t be expressed. For example, with a standard six-digit odometer, the maximum value it represents is 999999. Driving just one additional mile should display 1000000, but like the overflowing addition example, the 1 carries out from the six available digits, leaving only 000000.
For simplicity, let’s continue analyzing an odometer that’s limited to just one decimal digit. That is, the odometer represents the range [0, 9], so after every 10 miles the odometer resets back to zero. Illustrating the odometer’s range visually, it might look like Figure 35.
Because a one-digit odometer rolls over upon reaching 10, drawing a circular shape emphasizes the discontinuity at the top of the circle (and only at the top). Specifically, by adding one to any value other than nine, the result lands on the expected value. On the other hand, adding one to nine jumps to a value that doesn’t naturally follow it (zero). More generally, when performing any arithmetic that crosses the discontinuity between nine and zero, the computation will overflow. For example, consider adding 8 + 4, as in Figure 36.
Here, the sum yields 2 instead of the expected 12. Note that many other values added to 8 (for example, 8 + 14) would also land on two, with the only difference being that the computations would take additional trips around the circle. Consequently, it doesn’t matter whether the car drives 2, 12, or 152 miles — in the end, the odometer will read 2 regardless.
Any device that behaves like an odometer performs modular arithmetic. In this case, all arithmetic is modular with respect to a modulus of 10, since one decimal digit represents only 10 values. Therefore, given any number of miles traveled, we can compute what the odometer will read by dividing the distance by 10 and taking the remainder as the result. If the odometer had two decimal digits instead of one, the modulus would change to 100, since it could represent a larger range of values: [0, 99]. Similarly, clocks perform modular arithmetic with an hour modulus of 12.
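A short sketch of the odometer's behavior in C follows. The function odometer_reading is hypothetical; it assumes a small number of digits so that the modulus fits in an unsigned long.

```c
/* Compute what an odometer with the given number of decimal digits
 * displays after traveling `miles` miles. Hypothetical helper. */
unsigned long odometer_reading(unsigned long miles, unsigned digits) {
    unsigned long modulus = 1;
    for (unsigned i = 0; i < digits; i++) {
        modulus *= 10;           /* one digit: 10, two digits: 100, ... */
    }
    return miles % modulus;      /* the remainder is what the odometer shows */
}
```

With one digit, traveling 12 or 152 miles both leave the odometer reading 2. With six digits, traveling 1000000 miles leaves it reading 0.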
4.5.2. Binary Integer Overflow
Having seen a familiar form of overflow, let’s turn to binary number encodings. Recall that N bits of storage represent 2^N unique bit sequences and that those sequences can be interpreted in different ways (as unsigned or signed). Some operations that yield correct results under one interpretation may exhibit overflow according to the other, so the hardware needs to recognize overflow differently for each.
For example, suppose that a machine is using four-bit sequences to compute 0b0010 (2) - 0b0101 (5). Running this operation through the subtraction procedure produces a binary result of 0b1101. Interpreting this result as a signed value produces -3 (-8 + 4 + 1), the expected result for 2 - 5 without overflow. Alternatively, interpreting it as an unsigned value yields 13 (8 + 4 + 1), which is incorrect and clearly indicative of overflow. Scrutinizing this example further, it instinctively makes some sense — the result should be negative, and a signed interpretation allows for negatives, whereas unsigned does not.
Unsigned Overflow
Unsigned numbers behave similarly to the decimal odometer examples given that both represent only non-negative values. N bits represent unsigned values in the range [0, 2^N - 1], making all arithmetic modular with respect to 2^N. Figure 37 illustrates an arrangement of the unsigned interpretations of four-bit sequences into a modular space.
Given that unsigned interpretations can’t hold negative values, the discontinuity again sits between the maximum value and zero. Therefore, unsigned overflow results from any operation that crosses the divide between 2^N - 1 and 0. Stated more plainly, if performing addition (which should make the result larger) produces a smaller result, the addition caused unsigned overflow. Symmetrically, if performing subtraction (which should make the result smaller) produces a larger result, the subtraction caused unsigned overflow.
As a shortcut for detecting unsigned overflow for addition and subtraction, recall the carry out and carry in bits of those operations. A carry out is a carry from the most significant bit in the result of the computation. When set, a carry in increments the value of the result by carrying one into the least significant bit of the arithmetic operation. The carry in is only set to 1 for subtraction as part of the negation procedure.
The shortcut for unsigned arithmetic is: the carry out must match the carry in, otherwise the operation causes overflow. Intuitively, this shortcut works because:
-
For addition (carry in = 0), the result should be larger than (or equal to) the first operand. However, if the sum requires an extra bit of storage (carry out = 1), truncating that extra bit from the sum yields a smaller result (overflow). For example, in the unsigned four-bit number space, adding 0b1100 (12) + 0b1101 (13) requires five bits to store the result 0b11001 (25). When truncated to only four bits, the result represents 0b1001 (9), which is smaller than the operands (therefore, overflow).
-
For subtraction (carry in = 1), the result should be smaller than (or equal to) the first operand. Because subtraction executes as a combination of addition and negation, the addition subproblem should produce a smaller result. The only way addition can end up with a smaller value is by truncating its sum (carry out = 1). If it doesn’t require truncation (carry out = 0), the subtraction yields a larger result (overflow).
Let’s examine two examples of four-bit subtraction: one that overflows, and one that doesn’t. First, consider 0b0111 (7) - 0b1001 (9). The subtraction procedure treats this computation as:
Problem Setup:
    0111
  - 1001
Converted to Addition:
       1 (carry in)
    0111
  + 0110 (bits flipped)
Worked Example:
       1 (carry in)
    0111
  + 0110 (bits flipped)
  Result: 1110
  Carry out: 0
The computation did not carry out of d3, so no truncation occurs and the carry in (1) fails to match the carry out (0). The result, 0b1110 (14), is larger than either operand and thus clearly incorrect for 7 - 9 (overflow).
Next, consider 0b0111 (7) - 0b0101 (5). The subtraction procedure treats this computation as:
Problem Setup:
    0111
  - 0101
Converted to Addition:
       1 (carry in)
    0111
  + 1010 (bits flipped)
Worked Example:
       1 (carry in)
    0111
  + 1010 (bits flipped)
  Result: 0010
  Carry out: 1
The computation carries out a bit to d4, causing the carry in (1) to match the carry out (1). The truncated result, 0b0010 (2), correctly represents the expected outcome of the subtraction operation (no overflow).
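The carry-matching shortcut can be sketched in C for four-bit values as follows. All three function names are hypothetical and exist only to illustrate the rule; real hardware exposes this information through status flags instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry out of d3 when adding two four-bit values plus a carry in. */
static int carry_out4(uint8_t a, uint8_t b, int carry_in) {
    unsigned sum = (a & 0xFu) + (b & 0xFu) + (unsigned)carry_in;
    return (int)((sum >> 4) & 1u);
}

/* Addition uses a carry in of 0, so a carry out of 1 signals overflow. */
bool unsigned_add_overflows4(uint8_t a, uint8_t b) {
    return carry_out4(a, b, 0) != 0;
}

/* Subtraction flips b's bits and uses a carry in of 1, so a carry out
 * of 0 signals overflow. */
bool unsigned_sub_overflows4(uint8_t a, uint8_t b) {
    uint8_t flipped = (uint8_t)(~(unsigned)b & 0xFu);
    return carry_out4(a, flipped, 1) != 1;
}
```

Applied to the examples above, 12 + 13 and 7 - 9 both report overflow, whereas 7 - 5 does not.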
Signed Overflow
The same intuition behind overflow applies to signed binary interpretations: there exists a discontinuity in the modular number space. However, because a signed interpretation allows for negatives, the discontinuity doesn’t occur around 0. Recall that two’s complement "rolls over" cleanly from -1 (0b1111…111) to 0 (0b0000…000). Thus, the discontinuity exists at the other end of the number space, where the largest positive value and smallest negative value meet.
Figure 38 shows an arrangement of the signed interpretations of four-bit sequences into a modular space. Note that half the values are negative, the other half are non-negative, and that the discontinuity lies at the min/max divide between them.
When performing signed arithmetic, it’s always safe to generate a result that moves closer to zero. That is, any operation that reduces the absolute value of the result cannot overflow, because the overflow discontinuity resides where the magnitude of the representable values is the largest.
Consequently, systems detect overflow in signed addition and subtraction by comparing the most significant bit of the operands with the most significant bit of the result. For subtraction, first rearrange the arithmetic in terms of addition (e.g., rewrite 5 - 2 as 5 + -2).
-
If the addition’s operands have different high-order bit values (i.e., one operand is negative and the other is non-negative), there can be no signed overflow, because the absolute value of the result can be no larger than the larger of the two operands’ absolute values. The result is moving toward zero.
-
If the addition’s operands have the same high-order bit value (i.e., both are positive or both are negative), a correct result must also have the same high-order bit value. Thus, when adding two operands with the same sign, a signed overflow occurs if the result’s sign differs from that of the operands.
Consider the following four-bit signed binary examples:
-
5 - 4 is equivalent to 5 + -4. The first operand (5) is positive, whereas the second (-4) is negative, so the result must be moving toward zero where no overflow is possible.
-
4 + 2 (both positive) yields 6 (also positive), so no overflow occurs.
-
-5 - 1 is equivalent to -5 + -1 (both negative) and yields -6 (also negative), so no overflow occurs.
-
4 + 5 (both positive) yields -7 (negative). Because the operands have the same sign and it doesn’t match the result’s sign, this operation overflows.
-
-3 - 8 is equivalent to -3 + -8 (both negative) and yields 5 (positive). Because the operands have the same sign and it doesn’t match the result’s sign, this operation overflows.
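The sign-comparison rule can be sketched in C for four-bit patterns as shown below. The name signed_add_overflows4 is hypothetical, and the function expects operands already expressed as an addition (for subtraction, negate the second operand first, as described above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Report whether adding two four-bit two's complement bit patterns
 * overflows, by comparing the high-order bits of the operands and the
 * truncated result. Hypothetical helper for illustration. */
bool signed_add_overflows4(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(((a & 0xFu) + (b & 0xFu)) & 0xFu);  /* truncate to four bits */
    int sign_a = (a >> 3) & 1;
    int sign_b = (b >> 3) & 1;
    int sign_sum = (sum >> 3) & 1;
    /* Overflow requires operands with the same sign and a result
     * whose sign differs from theirs. */
    return sign_a == sign_b && sign_sum != sign_a;
}
```

Using the bit patterns for the examples above, 0b0100 + 0b0101 (4 + 5) and 0b1101 + 0b1000 (-3 + -8) report overflow, whereas 0b0101 + 0b1100 (5 + -4), 0b0100 + 0b0010 (4 + 2), and 0b1011 + 0b1111 (-5 + -1) do not.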
4.5.3. Overflow Summary
In general, integer overflow occurs when an arithmetic operation moves between the minimum and maximum values that its result can represent. If you’re ever in doubt about the rules for signed versus unsigned overflow, consider the minimum and maximum values of an N-bit sequence:
-
The minimum unsigned value is 0 (because unsigned encodings can’t represent negative numbers) and the maximum unsigned value is 2^N - 1 (because one bit sequence is reserved for zero). Therefore the discontinuity is between 2^N - 1 and 0.
-
The minimum signed value is -2^{N-1} (because half of the sequences are reserved for negative values) and the maximum is 2^{N-1} - 1 (because in the other half, one value is reserved for zero). Therefore, the discontinuity is between 2^{N-1} - 1 and -2^{N-1}.
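These bounds are easy to compute for a given width. The sketch below uses hypothetical function names and assumes N is between 1 and 62 so that every intermediate value fits in a 64-bit integer.

```c
#include <stdint.h>

/* Range limits for N-bit integers. Hypothetical helpers; each
 * assumes 1 <= n <= 62. */
uint64_t unsigned_max(int n) {
    return (UINT64_C(1) << n) - 1;         /* 2^N - 1 */
}

int64_t signed_min(int n) {
    return -(INT64_C(1) << (n - 1));       /* -2^(N-1) */
}

int64_t signed_max(int n) {
    return (INT64_C(1) << (n - 1)) - 1;    /* 2^(N-1) - 1 */
}
```

For four bits, these produce 15, -8, and 7, the same limits used in the examples throughout this section.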
4.5.4. Overflow Consequences
While you may not run into integer overflow frequently, overflows have the potential to break programs in notable (and potentially devastating) ways.
For example, in 2014, PSY’s popular Gangnam Style music video threatened to overflow the 32-bit counter that YouTube used to track video hits. As a result, YouTube switched to using a 64-bit counter.
Another relatively harmless example shows up in the 1980 arcade game Pac-Man. The game’s developers used an unsigned eight-bit value to track the player’s progress through the game’s levels. As a result, if an expert player makes it beyond level 255 (the maximum value of an eight-bit unsigned integer), half of the board ends up glitching significantly, as shown in Figure 39.
A much more tragic example of overflow appears in the history of the Therac-25 radiation therapy machine of the mid 1980s. The Therac-25 suffered from several design problems, including one that incremented a truth flag variable rather than setting it to a constant. After enough uses, the flag overflowed, causing it to erroneously roll over to zero (false) and bypass safety mechanisms. The Therac-25 ultimately caused serious harm to (and in some cases killed) six patients.
4.6. Bitwise Operators
In addition to the standard arithmetic operations described earlier, CPUs also support operations that are uncommon outside of binary. These bitwise operators directly apply the behavior of logic gates to bit sequences, making them straightforward to implement efficiently in hardware. Unlike addition and subtraction, which programmers typically use to manipulate a variable’s numerical interpretation, programmers commonly use bitwise operators to modify specific bits in a variable. For example, a program might encode a certain bit position in a variable to hold a true/false meaning, and bitwise operations allow the program to manipulate the variable’s individual bits to change that specific bit.
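As a brief sketch of that true/false use case, the helpers below set, clear, and test a single bit of an eight-bit variable using the operators covered in the rest of this section. The function names are hypothetical, and each assumes a bit position between 0 and 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set the bit at position pos to 1, leaving the other bits unchanged. */
uint8_t set_flag(uint8_t v, int pos) {
    return (uint8_t)(v | (1u << pos));
}

/* Clear the bit at position pos to 0, leaving the other bits unchanged. */
uint8_t clear_flag(uint8_t v, int pos) {
    return (uint8_t)(v & ~(1u << pos));
}

/* Report whether the bit at position pos is 1. */
bool test_flag(uint8_t v, int pos) {
    return ((v >> pos) & 1u) != 0;
}
```

For instance, set_flag(0, 3) produces 0b00001000, and clear_flag(0xFF, 3) produces 0b11110111.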
4.6.1. Bitwise AND
The bitwise AND operator (&) evaluates two input bit sequences. For each digit of the inputs, it outputs a 1 in the corresponding position of the output if both inputs are 1 in that position. Otherwise, it outputs a 0 for the digit. Table 19 shows the truth table for the bitwise AND of two values, A and B.
A | B | A & B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
For example, to bitwise AND 0b011010 with 0b110110, start by lining up the two sequences. Checking vertically through each digit, set the result of the column to 1 if both digits are 1. Otherwise, set the result of the column to 0:
        011010
    AND 110110   Only digits 1 and 4 are 1's in BOTH inputs, so
Result: 010010   those are the only digits set to 1 in the output.
To perform a bitwise AND in C, place C’s bitwise AND operator (&) between two operand variables. Here’s the same example again, performed in C:
int x = 26;
int y = 54;
printf("Result: %d\n", x & y); // Prints 18
Bitwise Operations versus Logical Truth Operations
Be careful not to conflate bitwise operators with logical truth operators. Despite having similar names (AND, OR, NOT, etc.), the two are not the same:
Note that C often uses similar (but slightly different) operators to distinguish
between the two. For example, you can indicate bitwise AND and bitwise OR
using a single |
4.6.2. Bitwise OR
The bitwise OR operator (|) behaves like the bitwise AND operator except that it outputs a 1 for a digit if either or both of the inputs is 1 in the corresponding position. Otherwise, it outputs a 0 for the digit. Table 20 shows the truth table for the bitwise OR of two values, A and B.
A | B | A \| B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
For example, to bitwise OR 0b011010 with 0b110110, start by lining up the two sequences. Checking vertically through each digit, set the result of the column to 1 if either digit is 1:
        011010
     OR 110110   Only digit 0 contains a 0 in both inputs, so it's
Result: 111110   the only digit not set to 1 in the result.
To perform a bitwise OR in C, place C’s bitwise OR operator (|) between two operands. Here’s the same example again, performed in C:
int x = 26;
int y = 54;
printf("Result: %d\n", x | y); // Prints 62
4.6.3. Bitwise XOR (Exclusive OR)
The bitwise XOR operator (^) behaves like the bitwise OR operator except that it outputs a 1 for a digit only if exactly one (but not both) of the inputs is 1 in the corresponding position. Otherwise, it outputs a 0 for the digit. Table 21 shows the truth table for the bitwise XOR of two values, A and B.
A | B | A ^ B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
For example, to bitwise XOR 0b011010 with 0b110110, start by lining up the two sequences. Checking vertically through each digit, set the result of the column to 1 if only one digit is 1:
        011010
    XOR 110110   Digits 2, 3, and 5 contain a 1 in exactly one of
Result: 101100   the two inputs.
To perform a bitwise XOR in C, place C’s bitwise XOR operator (^) between two operands. Here’s the same example again, performed in C:
int x = 26;
int y = 54;
printf("Result: %d\n", x ^ y); // Prints 44
4.6.4. Bitwise NOT
The bitwise NOT operator (~) operates on just one operand. For each bit in the sequence, it simply flips the bit such that a zero becomes a one or vice versa. Table 22 shows the truth table for the bitwise NOT operator.
A | ~ A |
---|---|
0 | 1 |
1 | 0 |
For example, to bitwise NOT 0b011010, invert the value of each bit:
    NOT 011010
Result: 100101
To perform a bitwise NOT in C, place a tilde character (~) in front of an operand. Here’s the same example again, performed in C:
int x = 26;
printf("Result: %d\n", ~x); // Prints -27
Bitwise NOT vs. Negation
Note that all modern systems represent integers using two’s complement, so bitwise NOT isn’t quite the same as negation. Bitwise NOT only flips the bits and doesn’t add one.
4.6.5. Bit Shifting
Another important bitwise operation involves shifting the position of an operand’s bits either to the left (<<) or to the right (>>). Both the left and right shifting operators take two operands: the bit sequence to shift and the number of places it should be shifted.
Shifting Left
Shifting a sequence to the left by N places moves each of its bits to the left N times, appending new zeros to the right side of the sequence. For example, shifting the eight-bit sequence 0b00101101 to the left by two produces 0b10110100. The two zeros at the right are appended to the end of the sequence, since the result still needs to be an eight-bit sequence.
In the absence of overflow, shifting to the left increases the value of the result because bits move toward digits that contribute larger powers of two to the value of the number. However, with a fixed number of bits, any bits that shift into positions beyond the maximum capacity of the number get truncated. For example, shifting the eight-bit sequence 0b11110101 (unsigned interpretation 245) to the left by one produces 0b11101010 (unsigned interpretation 234). Here, the truncation of the high-order bit that shifted out makes the result smaller.
To perform a left bit shift in C, place two less-than characters (<<) between a value and the number of places to shift that value:
int x = 13; // 13 is 0b00001101
printf("Result: %d\n", x << 3); // Prints 104 (0b01101000)
Shifting Right
Shifting to the right is similar to left shifting — any bits that are shifted out of a variable’s capacity (e.g., off the end to the right) disappear due to truncation. However, right shifting introduces an additional consideration: the new bits prepended to the left side of the result may need to be either all zeros or all ones depending on the type of the variable being shifted and its high-order bit value. Conceptually, the choice to prepend zeros or ones resembles that of sign extension. Thus, there exist two distinct variants of right shifting:
-
A logical right shift always prepends zeros to the high-order bits of the result. Logical shifting is used to shift unsigned variables, since a leading 1 in the most significant bit of an unsigned value isn’t intended to mean that the value is negative. For example, shifting 0b10110011 to the right by two using a logical shift yields 0b00101100.
-
An arithmetic right shift prepends a copy of the shifted value’s most significant bit into each of the new bit positions. Arithmetic shifting applies to signed variables, for which it’s important to preserve the signedness of the high-order bits. For example, shifting 0b10110011 to the right by two using an arithmetic shift yields 0b11101100.
Fortunately, when programming in C, you don’t typically need to worry about the distinction if you’ve declared your variables properly. If your program includes a right shift operator (>>), virtually every C compiler will automatically perform the appropriate type of shifting according to the type of the variable being shifted. That is, if the variable was declared with the unsigned qualifier, the compiler will perform a logical shift. Otherwise, it will perform an arithmetic shift.
C Right Shift Example Program
You can test the behavior of right shifting with a small example program like this one:
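This program declares two 32-bit integers: one as an unsigned integer and one as a signed integer. It shifts each to the right and prints the results in hexadecimal:
$ ./a.out
000FF000
FFFFF000
Because a leading 1 doesn’t indicate "negative" for the unsigned value, the compiler shifts it logically and prepends zeros. The signed value is shifted arithmetically, so copies of its leading 1 fill the high-order bits.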
4.7. Integer Byte Order
So far, this chapter has described several schemes for encoding numbers with bits, but it hasn’t mentioned how the values are organized in memory. For modern systems, the smallest addressable unit of memory is a byte, which consists of eight bits. Consequently, to store a one-byte value (e.g., a variable of type char) starting at address X, you don’t really have any options: just store the byte at location X.
However, for multibyte values (e.g., variables of type short or int), the hardware has more options for assigning a value’s bytes to memory addresses. For example, consider a two-byte short variable s whose bytes are labeled A (containing the high-order bits of s) and B (containing the low-order bits of s). When a system is asked to store a short like s at address X (i.e., in addresses X and X+1), it must define which byte of the variable (A or B) should occupy which address (X or X+1). Figure 40 shows the two options for storing s in memory.
The byte order (or endianness) of a system defines how its hardware assigns the bytes of a multibyte variable to consecutive memory addresses. Although byte order is rarely an issue for programs that only run on a single system, it might appear surprising if one of your programs attempts to print bytes one at a time or if you’re examining variables with a debugger.
For example, consider the following program:
#include <stdio.h>

int main(int argc, char **argv) {
    // Initialize a four-byte integer with easily distinguishable byte values
    int value = 0xAABBCCDD;

    // Initialize a character pointer to the address of the integer.
    unsigned char *p = (unsigned char *) &value;

    // For each byte in the integer, print its memory address and value.
    size_t i;
    for (i = 0; i < sizeof(value); i++) {
        printf("Address: %p, Value: %02hhX\n", (void *) p, *p);
        p += 1;
    }

    return 0;
}
This program allocates a four-byte integer and initializes the bytes, in order from most to least significant, to the hexadecimal values 0xAA, 0xBB, 0xCC, and 0xDD. It then prints the bytes one at a time starting from the base address of the integer. You’d be forgiven for expecting the bytes to print in alphabetical order. However, commonly used CPU architectures (e.g., x86 and most ARM hardware) store the bytes in the opposite order, so the example program prints them in reverse:
$ ./a.out
Address: 0x7ffc0a234928, Value: DD
Address: 0x7ffc0a234929, Value: CC
Address: 0x7ffc0a23492a, Value: BB
Address: 0x7ffc0a23492b, Value: AA
x86 CPUs store integers in a little-endian format: from the least-significant byte ("little end") to the most-significant byte in consecutive addresses. Other CPU architectures are big-endian and store multibyte integers in the opposite order. Figure 41 depicts a four-byte integer in the (a) big-endian and (b) little-endian layouts.
The seemingly strange "endian" terminology originates from Jonathan Swift’s satirical novel Gulliver’s Travels (1726) [1]. In the story, Gulliver finds himself among two empires of six-inch-tall people who are fighting a war over the proper method for breaking eggs. The "big-endian" empire of Blefuscu cracks the large end of their eggs, whereas people in the "little-endian" empire of Lilliput crack the small end.
In the computing world, whether a system is big-endian or little-endian typically affects only programs that communicate across machines (e.g., over a network). When communicating data between systems, both systems must agree on the byte order for the receiver to properly interpret the value. In 1980, Danny Cohen authored a note to the Internet Engineering Task Force (IETF) titled On Holy Wars and a Plea for Peace [2]. In that note, Cohen adopts Swift’s "endian" terminology and suggests that the IETF adopt a standard byte order for network transmissions. The IETF eventually adopted big-endian as the "network byte order" standard.
The C language provides two libraries that allow a program to reorder an integer’s bytes [3,4] for communication purposes.
4.7.1. References
-
Jonathan Swift. Gulliver’s Travels. http://www.gutenberg.org/ebooks/829
-
Danny Cohen. On Holy Wars and a Plea for Peace. https://www.ietf.org/rfc/ien/ien137.txt
4.8. Real Numbers in Binary
While this chapter mainly focuses on binary integer representations, programmers often need to store real numbers, too. Storing real numbers is inherently difficult, and no binary encoding represents real values with perfect precision. That is, for any binary encoding of real numbers, there exist values that cannot be represented exactly. Irrational values like pi clearly can’t be represented precisely, since their representation never terminates. Given a fixed number of bits, binary encodings still can’t represent some rational values within their range.
Unlike integers, which are countably infinite, the set of real numbers is uncountable. In other words, even for a narrow range of real values (e.g., between zero and one), the set of values within that range is so large that we can’t even begin to enumerate them. Thus, real number encodings typically store only approximations of values that have been truncated to a predetermined number of bits. Given enough bits, the approximations are typically precise enough for most purposes, but be careful when writing applications that cannot tolerate rounding.
The remainder of this section briefly describes two methods for representing real numbers in binary: fixed-point, which extends the binary integer format, and floating-point, which represents a large range of values at the cost of some extra complexity.
4.8.1. Fixed-Point Representation
In a fixed-point representation, the position of a value’s binary point remains fixed and cannot be changed. Like a decimal point in a decimal number, the binary point indicates where the fractional portion of the number begins. The fixed-point encoding rules resemble the unsigned integer representation, with one major exception: the digits after the binary point represent two raised to negative powers. For example, consider the eight-bit sequence 0b000101.10 in which the first six bits represent whole numbers, and the remaining two bits represent the fractional part. Figure 42 labels the digit positions and their individual interpretations.
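Applying the formula for converting 0b000101.10 to decimal shows:
$$(0 \times 2^{5}) + (0 \times 2^{4}) + (0 \times 2^{3}) + (1 \times 2^{2}) + (0 \times 2^{1}) + (1 \times 2^{0}) + (1 \times 2^{-1}) + (0 \times 2^{-2}) = 4 + 1 + 0.5 = 5.5$$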
More generally, with two bits after the binary point, the fractional portion of a number holds one of four sequences: 00 (.00), 01 (.25), 10 (.50), or 11 (.75). Thus, two fractional bits allow a fixed-point number to represent fractional values that are precise to 0.25 ($2^{-2}$). Adding a third bit increases the precision to 0.125 ($2^{-3}$), and the pattern continues similarly, with N bits after the binary point enabling $2^{-N}$ precision.
Because the number of bits after the binary point remains fixed, some computations with fully precise operands may produce a result that requires truncation (rounding). Consider the same eight-bit fixed-point encoding from the previous example. It precisely represents both 0.75 (0b000000.11) and 2 (0b000010.00). However, it cannot precisely represent the result of dividing 0.75 by 2: the computation should produce 0.375, but storing it would require a third bit after the binary point (0b000000.011). Truncating the rightmost 1 enables the result to fit within the specified format, but it yields a rounded result of 0.75 / 2 = 0.25. In this example, the rounding is egregious due to the small number of bits involved, but even longer bit sequences will require truncation at some point.
Even worse, rounding errors compound over the course of intermediate calculations, and in some cases the result of a sequence of computations might vary according to the order in which they’re performed. For example, consider two arithmetic sequences under the same eight-bit fixed-point encoding described earlier:
-
(0.75 / 2) * 3 = 0.75
-
(0.75 * 3) / 2 = 1.00
Note that the only difference between the two is the order of the multiplication and division operations. If no rounding were necessary, both computations should produce the same result (1.125). However, due to truncation occurring at different locations in the arithmetic, they produce different results:
-
Proceeding from left to right, the intermediate result (0.75 / 2) gets rounded to 0.25 and ultimately produces 0.75 when multiplied by 3.
-
Proceeding from left to right, the intermediate computation (0.75 * 3) precisely yields 2.25 without any rounding. Dividing 2.25 by 2 rounds to a final result of 1.
In this example, just one additional bit for the $2^{-3}$ place allows the example to succeed with full precision, but the fixed-point position we chose only allowed for two bits after the binary point. All the while, the high-order bits of the operands went entirely unused (digits d2 through d5 were never set to 1). At the cost of extra complexity, an alternative representation (floating-point) allows the full range of bits to contribute to a value regardless of the split between whole and fractional parts.
4.8.2. Floating-Point Representation
In a floating-point representation, a value’s binary point is not fixed into a predefined position. That is, the interpretation of a binary sequence must encode how it’s representing the split between the whole and fractional parts of a value. While the position of the binary point could be encoded in many possible ways, this section focuses on just one, the Institute of Electrical and Electronics Engineers (IEEE) standard 754. Almost all modern hardware follows the IEEE 754 standard to represent floating-point values.
Figure 43 illustrates the IEEE 754 interpretation of a 32-bit floating-point number (C’s float type). The standard partitions the bits into three regions:
-
The low-order 23 bits (digits d22 through d0) represent the significand (sometimes called the mantissa). As the largest region of bits, the significand serves as the foundation for the value, which ultimately gets altered by multiplying it according to the other bit regions. When interpreting the significand, its value implicitly follows a 1 and binary point. The fractional portion behaves like the fixed-point representation described in the previous section.
For example, if the bits of the significand contain 0b110000…0000, the first bit represents 0.5 ($1 \times 2^{-1}$), the second bit represents 0.25 ($1 \times 2^{-2}$), and all the remaining bits are zeros, so they don’t affect the value. Thus, the significand contributes 1 + (0.5 + 0.25), or 1.75.
-
The next eight bits (digits d30 through d23) represent the exponent, which scales the significand’s value to provide a wide representable range. The significand gets multiplied by $2^{\text{exponent} - 127}$, where the 127 is a bias that enables the float to represent both very large and very small values.
-
The final high-order bit (digit d31) represents the sign bit, which encodes whether the value is positive (0) or negative (1).
As an example, consider decoding the bit sequence 0b11000001101101000000000000000000. The significand portion is 01101000000000000000000, which represents $2^{-2} + 2^{-3} + 2^{-5} = 0.40625$, so the significand region contributes 1.40625. The exponent is 10000011, which represents the decimal value 131, so the exponent contributes a factor of $2^{131 - 127}$ (16). Finally, the sign bit is 1, so the sequence represents a negative value. Putting it all together, the bit sequence represents:
1.40625 × 16 × -1 = -22.5
While clearly more complex than the fixed-point scheme described earlier, the IEEE floating-point standard provides additional flexibility for representing a wide range of values. Despite the flexibility, a floating-point format with a constant number of bits still can’t precisely represent every possible value. That is, like fixed-point, rounding problems similarly affect floating-point encodings.
4.8.3. Rounding Consequences
While rounding isn’t likely to ruin most of the programs you write, real number rounding errors have occasionally caused some high-profile system failures. During the Gulf War in 1991, a rounding error caused an American Patriot missile battery to fail to intercept an Iraqi missile. The missile killed 28 soldiers and left many others wounded. In 1996, the European Space Agency’s Ariane 5 rocket exploded 39 seconds after liftoff on its first launch. The rocket, which borrowed much of its code from the Ariane 4, triggered an overflow when attempting to convert a floating-point value into an integer value.
4.9. Summary
This chapter examined how modern computers represent information using bits and bytes. An important takeaway is that a computer’s memory stores all information as binary 0’s and 1’s — it’s up to programs or the people running them to interpret the meaning of those bits. This chapter primarily focused on integer representations, beginning with unsigned (non-negative) integers before considering signed integers.
Computer hardware supports a variety of operations on integers, including the familiar addition, subtraction, multiplication, and division. Systems also provide bitwise operations like bitwise AND, OR, NOT, and shifting. When performing any operation, consider the number of bits used to represent the operands and result. If the storage space allocated to the result isn’t large enough, an overflow may misrepresent the resulting value.
Finally, this chapter explored common schemes for representing real numbers in binary, including the IEEE 754 standard. Note that when representing floating-point values, we sacrifice precision for increased flexibility (i.e., the ability to move the binary point).
4.10. Exercises
-
What are the decimal and hexadecimal representations for the value 0b01001010?
-
What are the binary and hexadecimal representations for the value 389?
-
As a five-armed creature, Sally the starfish prefers to represent numbers using a base 5 number system. If Sally gives you the base 5 number 1423, what is the equivalent decimal value?
Solutions
-
0b01001010 in decimal is:
$$2^{6} + 2^{3} + 2^{1} = 64 + 8 + 2 = 74$$
In hexadecimal, it’s:
0100 1010
   4    A -> 0x4A
-
Converting 389 to binary…
Using powers of two:
-
256 fits into 389, so d8 should be a 1. That leaves 389 - 256 = 133.
-
128 fits into 133, so d7 should be a 1. That leaves 133 - 128 = 5.
-
64 does not fit into 5, so d6 should be a 0.
-
32 does not fit into 5, so d5 should be a 0.
-
16 does not fit into 5, so d4 should be a 0.
-
8 does not fit into 5, so d3 should be a 0.
-
4 fits into 5, so d2 should be a 1. That leaves 5 - 4 = 1.
-
2 does not fit into 1, so d1 should be a 0.
-
1 fits into 1, so d0 should be a 1. That leaves 1 - 1 = 0.
Thus, decimal 389 corresponds to 0b110000101.
Using repeated division:
-
389 is odd, so d0 should be a 1.
-
389 / 2 = 194, which is even, so d1 should be a 0.
-
194 / 2 = 97, which is odd, so d2 should be a 1.
-
97 / 2 = 48, which is even, so d3 should be a 0.
-
48 / 2 = 24, which is even, so d4 should be a 0.
-
24 / 2 = 12, which is even, so d5 should be a 0.
-
12 / 2 = 6, which is even, so d6 should be a 0.
-
6 / 2 = 3, which is odd, so d7 should be a 1.
-
3 / 2 = 1, which is odd, so d8 should be a 1.
-
1 / 2 = 0, so any digit numbered nine or above will be 0.
Thus, decimal 389 corresponds to 0b110000101.
Converting to hexadecimal:
0001 1000 0101
   1    8    5 -> 0x185
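-
1423 in base 5 converted to decimal is:
$$(1 \times 5^{3}) + (4 \times 5^{2}) + (2 \times 5^{1}) + (3 \times 5^{0}) = 125 + 100 + 10 + 3 = 238$$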
5. What von Neumann Knew: Computer Architecture
The term computer architecture may refer to the entire hardware level of the computer. However, it is often used to refer to the design and implementation of the digital processor part of the computer hardware, and we focus on the computer processor architecture in this chapter.
The central processing unit (CPU, or processor) is the part of the computer that executes program instructions on program data. Program instructions and data are stored in the computer’s random access memory (RAM). A particular digital processor implements a specific instruction set architecture (ISA), which defines the set of instructions and their binary encoding, the set of CPU registers, and the effects of executing instructions on the state of the processor. There are many different ISAs, including SPARC, MIPS, ARM, ARC, PowerPC, and x86 (the latter including IA32 and x86-64). A microarchitecture defines the circuitry of an implementation of a specific ISA. Microarchitecture implementations of the same ISA can differ as long as they implement the ISA definition. For example, Intel and AMD produce different microprocessor implementations of the IA32 ISA.
Some ISAs define a reduced instruction set computer (RISC), and others define a complex instruction set computer (CISC). RISC ISAs have a small set of basic instructions that each execute quickly; each instruction executes in about a single processor clock cycle, and compilers combine sequences of several basic RISC instructions to implement higher-level functionality. In contrast, a CISC ISA’s instructions provide higher-level functionality than RISC instructions. CISC architectures also define a larger set of instructions than RISC, support more complicated addressing modes (ways to express the memory locations of program data), and support variable-length instructions. A single CISC instruction may perform a sequence of low-level functionality and may take several processor clock cycles to execute. This same functionality would require multiple instructions on a RISC architecture.
All modern processors, regardless of their ISA, adhere to the von Neumann architecture model. The general-purpose design of the von Neumann architecture allows it to execute any type of program. It uses a stored-program model, meaning that the program instructions reside in computer memory along with program data, and both are inputs to the processor.
This chapter introduces the von Neumann architecture and the ancestry and components that underpin modern computer architecture. We build an example digital processor (CPU) based on the von Neumann architecture model, design a CPU from digital circuits that are constructed from logic gate building blocks, and demonstrate how the CPU executes program instructions.
5.1. The Origin of Modern Computing Architectures
When tracing the ancestry of modern computing architecture, it is tempting to consider that modern computers are part of a linear chain of successive transmutations, with each machine simply an improvement of the one that previously existed. While this view of inherited improvements in computer design may hold true for certain classes of architecture (consider the iterative improvements of the iPhone X from the original iPhone), the root of the architectural tree is much less defined.
From the 1700s until the early 1900s, mathematicians served as the first human computers for calculations related to applications of science and engineering [1]. The word "computer" originally referred to "one who computes". Women mathematicians often served in the role of computer. In fact, the use of women as human computers was so pervasive that computational complexity was measured in "kilo-girls", or the amount of work a thousand human computers could complete in one hour [2]. Women were widely considered to be better at doing mathematical calculations than men, as they tended to be more methodical. Women were not allowed to hold the position of engineer. As such, they were relegated to more "menial" work, such as computing complex calculations.
The first general-purpose digital computer, the Analytical Engine, was designed by British mathematician Charles Babbage, who is credited by some as the father of the computer. The Analytical Engine was an extension of his original invention, the Difference Engine, a mechanical calculator that was capable of calculating polynomial functions. Ada Lovelace, who perhaps should be known as the mother of computing, was the very first person to develop a computer program and the first to publish an algorithm that could be computed using Charles Babbage’s Analytical Engine. Her notes include her recognition of the general-purpose nature of the Analytical Engine: "[t]he Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform." [3] However, unlike modern computers, the Analytical Engine was a mechanical device and was only partially built. Most of the designers of what became the direct forerunners to the modern computer were unaware of the work of Babbage and Lovelace when they developed their own machines.
Thus, it is perhaps more accurate to think about modern computer architecture rising out of a primordial soup of ideas and innovations that arose in the 1930s and 1940s. For example, in 1937, Claude Shannon, a student at MIT, wrote what would go on to be perhaps the most influential master’s thesis of all time. Drawing upon the work of George Boole (the mathematician who developed Boolean algebra), Shannon showed that Boolean logic could be applied to circuits and could be used to develop electrical switches. This would lead to the development of the binary computing system, and much of future digital circuit design. While men would design many early electronic computers, women (who were not allowed to be engineers) became programming pioneers, leading the design and development of many early software innovations, such as programming languages, compilers, algorithms, and operating systems.
A comprehensive discussion of the rise of computer architecture is not possible in this book (see Turing’s Cathedral [4] by George Dyson and The Innovators [6] by Walter Isaacson for more detailed coverage); however, we briefly enumerate several significant innovations that occurred in the 1930s and 1940s that were instrumental in the rise of modern computer architecture.
5.1.1. The Turing Machine
In 1937, British mathematician Alan Turing proposed [7] the "Logical Computing Machine", a theoretical computer. Turing used this machine to prove that there exists no solution to the decision problem (in German, the Entscheidungsproblem), posed by the mathematicians David Hilbert and Wilhelm Ackermann in 1928. The decision problem asks for an algorithm that takes a statement as input and determines whether the statement is universally valid. Turing proved that no such algorithm exists by showing that the halting problem (will machine X halt on input y?) was undecidable for Turing’s machine. As part of this proof, Turing described a universal machine that is capable of performing the tasks of any other computing machine. Alonzo Church, Turing’s dissertation advisor at Princeton University, was the first to refer to the logical computing machine as the Turing machine, and its universal form as the universal Turing machine.
Turing later returned to England and served his country as part of the code breaking unit in Bletchley Park during World War II. He was instrumental in the design and construction of the Bombe, an electromechanical device that helped break the cipher produced by the Enigma machine, which was commonly used by Nazi Germany to protect sensitive communication during World War II.
After the war, Turing designed the automatic computing engine (ACE). The ACE was a stored-program computer, meaning that both the program instructions and its data are loaded into the computer memory and run by the general-purpose computer. His paper, published in 1946, is perhaps the most detailed description of such a computer [8].
5.1.2. Early Electronic Computers
World War II accelerated much of the development of early computers. However, due to the classified nature of military operations in World War II, many of the details of innovations that occurred as a result of the frenetic activity during the war were not publicly acknowledged until years later. A good example of this is Colossus, a machine designed by British engineer Tommy Flowers to help break the Lorenz cipher, which was used by Nazi Germany to encode high-level intelligence communication. Some of Alan Turing’s work aided in its design. Built in 1943, Colossus is arguably the first programmable, digital, and fully electronic computer. However, it was a special-purpose computer, designed specifically for code breaking. The Women’s Royal Naval Service (WRNS, known as the "Wrens") served as operators of Colossus. In spite of the General Report of the Tunny [14] noting that several of the Wrens showed ability in cryptographic work, none of them were given the position of cryptographer, and instead were delegated more menial Colossus operation tasks [5,15].
On the other side of the Atlantic, American scientists and engineers were hard at work creating computers of their own. Harvard professor Howard Aiken (who was also a Naval Commander in the U.S. Navy Reserves) designed the Mark I, an electromechanical, general-purpose programmable computer. Built in 1944, it aided in the design of the atomic bomb. Aiken built his computer largely unaware of Turing’s work and was motivated by the goal of bringing Charles Babbage’s analytical engine to life [6]. A key feature of the Mark I was that it was fully automatic and able to run for days without human intervention [6]. This would be a foundational feature in future computer design.
Meanwhile, American engineers John Mauchly and Presper Eckert of the University of Pennsylvania designed and built the Electronic Numerical Integrator and Computer (ENIAC) in 1945. ENIAC is arguably the forerunner of modern computers. It was digital (though it used decimal rather than binary), fully electronic, programmable, and general purpose. While the original version of ENIAC did not have stored-program capabilities, this feature was built into it before the end of the decade. ENIAC was financed and built for the U.S. Army’s Ballistic Research Laboratory and was designed primarily to calculate ballistic trajectories. Later, it would be used to aid in the design of the hydrogen bomb.
As men were drafted into the armed forces during World War II, women were hired to help in the war effort as human computers. With the arrival of the first electronic computers, women became the first programmers, as programming was considered secretarial work. It should come as no surprise that many of the early innovations in programming, such as the first compiler, the notion of modularizing programs, debugging, and assembly language, are credited to women inventors. Grace Hopper, for example, developed the first high-level and machine-independent programming language (COBOL) and its compiler. Hopper was also a programmer for the Mark I and wrote the book that described its operation.
The ENIAC programmers were six women: Jean Jennings Bartik, Betty Snyder Holberton, Kay McNulty Mauchly, Frances Bilas Spence, Marlyn Wescoff Meltzer, and Ruth Lichterman Teitelbaum. Unlike the Wrens, the ENIAC women were given a great deal of autonomy in their task; given just the wiring diagrams of ENIAC, they were told to figure out how it worked and how to program it. In addition to their innovation in solving how to program (and debug) one of the world’s first electronic general-purpose computers, the ENIAC programmers also developed the idea of algorithmic flow charts, and developed important programming concepts such as subroutines and nesting. Like Grace Hopper, Jean Jennings Bartik and Betty Snyder Holberton would go on to have long careers in computing, and are some of the early computing pioneers. Unfortunately, the full extent of women’s contributions in early computing is not known. Unable to advance, many women left the field after World War II. To learn more about early women programmers, we encourage readers to check out Recoding Gender [11] by Janet Abbate, Top Secret Rosies, a PBS documentary [12] directed by LeAnn Erickson, and "The Computers" by Kathy Kleiman [13].
The British and the Americans were not the only ones interested in the potential of computers. In Germany, Konrad Zuse developed the first electromechanical general-purpose digital programmable computer, the Z3, which was completed in 1941. Zuse came up with his design independently of the work of Turing and others. Notably, Zuse’s design used binary (rather than decimal), the first computer of its kind to use the binary system. However, the Z3 was destroyed during aerial bombing of Berlin, and Zuse was unable to continue his work until 1950. His work largely went unrecognized until years later. He is widely considered the father of computing in Germany.
5.1.3. So What Did von Neumann Know?
From our discussion of the origin of modern computer architecture, it is apparent that in the 1930s and 1940s there were several innovations that led to the rise of the computer as we know it today. In 1945, John von Neumann published a paper, "First draft of a report on the EDVAC" [9], which describes an architecture on which modern computers are based. EDVAC was the successor of ENIAC. It differed from ENIAC in that it was a binary computer instead of decimal, and it was a stored-program computer. Today, this description of EDVAC’s architectural design is known as the von Neumann architecture.
The von Neumann architecture describes a general-purpose computer, one that is designed to run any program. It also uses a stored-program model, meaning that program instructions and data are both loaded onto the computer to run. In the von Neumann model there is no distinction between instructions and data; both are loaded into the computer’s internal memory, and program instructions are fetched from memory and executed by the computer’s functional units that execute program instructions on program data.
John von Neumann’s contributions weave in and out of several of the previous stories in computing. A Hungarian mathematician, he was a professor at both the Institute for Advanced Study and Princeton University, and he served as an early mentor to Alan Turing. Later, von Neumann became a research scientist on the Manhattan Project, which led him to Howard Aiken and the Mark I; he would later serve as a consultant on the ENIAC project, and correspond regularly with Eckert and Mauchly. His famous paper describing EDVAC came from his work on the Electronic Discrete Variable Automatic Computer (EDVAC), proposed to the U.S. Army by Eckert and Mauchly, and built at the University of Pennsylvania. EDVAC included several architectural design innovations that form the foundation of almost all modern computers: it was general purpose, used the binary numeric system, had internal memory, and was fully electronic. In large part because von Neumann was the sole author of the paper9, the architectural design the paper describes is primarily credited to von Neumann and has become known as the von Neumann architecture. It should be noted that Turing described in great detail the design of a similar machine in 1946. However, since von Neumann’s paper was published before Turing’s, von Neumann received the chief credit for these innovations.
Regardless of who "really" invented the von Neumann architecture, von Neumann’s own contributions should not be diminished. He was a brilliant mathematician and scientist. His contributions to mathematics range from set theory to quantum mechanics and game theory. In computing, he is also regarded as the inventor of the merge sort algorithm. Walter Isaacson, in his book The Innovators, argued that one of von Neumann’s greatest strengths lay in his ability to collaborate widely and to intuitively see the importance of novel concepts6. A lot of the early designers of the computer worked in isolation from one another. Isaacson argues that by witnessing the slowness of the Mark I computer, von Neumann was able to intuitively realize the value of a truly electronic computer, and the need to store and modify programs in memory. It could therefore be argued that von Neumann, even more than Eckert and Mauchly, grasped and fully appreciated the power of a fully electronic stored-program computer6.
5.1.4. References
1. David Alan Grier. "When Computers Were Human". Princeton University Press, 2005.
2. Megan Garber. "Computing Power Used to be Measured in 'Kilo-Girls'". The Atlantic, October 16, 2013. https://www.theatlantic.com/technology/archive/2013/10/computing-power-used-to-be-measured-in-kilo-girls/280633/
3. Betty Alexandra Toole. "Ada, The Enchantress of Numbers". Strawberry Press, 1998.
4. George Dyson. "Turing’s Cathedral: The Origins of the Digital Universe". Pantheon, 2012.
5. Jack Copeland. "Colossus: The Secrets of Bletchley Park’s Code-breaking Computers".
6. Walter Isaacson. "The Innovators: How a Group of Inventors, Hackers, Geniuses and Geeks Created the Digital Revolution". Simon and Schuster, 2014.
7. Alan M. Turing. "On computable numbers, with an application to the Entscheidungsproblem". Proceedings of the London Mathematical Society 2(1), pp. 230-265, 1937.
8. Brian Carpenter and Robert Doran. "The other Turing Machine". The Computer Journal 20(3), pp. 269-279, 1977.
9. John von Neumann. "First Draft of a Report on the EDVAC (1945)". Reprinted in IEEE Annals of the History of Computing 4, pp. 27-75, 1993.
10. Arthur Burks, Herman Goldstine, John von Neumann. "Preliminary discussion of the logical design of an electronic computing instrument (1946)". Reprinted in The Origins of Digital Computers (Springer), pp. 399-413, 1982.
11. Janet Abbate. "Recoding Gender: Women’s Changing Participation in Computing". MIT Press, 2012.
12. LeAnn Erickson. "Top Secret Rosies: The Female Computers of World War II". Public Broadcasting System, 2010.
13. Kathy Kleiman. "The Computers". http://eniacprogrammers.org/
14. "Breaking Teleprinter Ciphers at Bletchley Park: An edition of I.J. Good, D. Michie and G. Timms: General Report on Tunny with Emphasis on Statistical Methods (1945)". Editors: Reeds, Diffie, Fields. Wiley, 2015.
15. Janet Abbate. "Recoding Gender". MIT Press, 2012.
5.2. The von Neumann Architecture
The von Neumann architecture serves as the foundation for most modern computers. In this section, we briefly characterize the architecture’s major components.
The von Neumann architecture (depicted in Figure 44) consists of five main components:
-
The processing unit executes program instructions.
-
The control unit drives program instruction execution on the processing unit. Together, the processing and control units make up the CPU.
-
The memory unit stores program data and instructions.
-
The input unit(s) load program data and instructions on the computer and initiate program execution.
-
The output unit(s) store or receive program results.
Buses connect the units, and are used by the units to send control and data information to one another. A bus is a communication channel that transfers binary values between communication endpoints (the senders and receivers of the values). For example, a data bus that connects the memory unit and the CPU could be implemented as 32 parallel wires that together transfer a 4-byte value, with 1 bit transferred on each wire. Typically, architectures have separate buses for sending data, memory addresses, and control between units. The units use the control bus to send control signals that request or notify other units of actions, the address bus to send the memory address of a read or write request to the memory unit, and the data bus to transfer data between units.
5.2.1. The CPU
The control and processing units together implement the CPU, which is the part of the computer that executes program instructions on program data.
5.2.2. The Processing Unit
The processing unit of the von Neumann machine consists of two parts. The first is the arithmetic/logic unit (ALU), which performs mathematical operations such as addition, subtraction, and logical or, to name a few. Modern ALUs typically perform a large set of arithmetic operations. The second part of the processing unit is a set of registers. A register is a small, fast unit of storage used to hold program data and the instructions that are being executed by the ALU. Crucially, there is no distinction between instructions and data in the von Neumann architecture. For all intents and purposes, instructions are data. Each register is therefore capable of holding one data word.
5.2.3. The Control Unit
The control unit drives the execution of program instructions by loading them from memory and feeding instruction operands and operations through the processing unit. The control unit also includes some storage to keep track of execution state and to determine its next action to take: the program counter (PC) keeps the memory address of the next instruction to execute, and the instruction register (IR) stores the instruction, loaded from memory, that is currently being executed.
5.2.4. The Memory Unit
Internal memory is a key innovation of the von Neumann architecture. It provides program data storage that is close to the processing unit, significantly reducing the amount of time to perform calculations. The memory unit stores both program data and program instructions — storing program instructions is a key part of the stored-program model of the von Neumann architecture.
The size of memory varies from system to system. However, a system’s ISA limits the range of addresses that it can express. In modern systems, the smallest addressable unit of memory is one byte (8 bits), and thus each address corresponds to a unique memory location for one byte of storage. As a result, 32-bit architectures typically support a maximum address space size of 2^32 bytes, which corresponds to 4 gibibytes (GiB) of addressable memory.
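The address-space arithmetic can be checked with a short Python sketch. It assumes byte-addressable memory (one unique address per byte), as described above; the function name `address_space_bytes` and the constant `GIB` are illustrative names, not part of any ISA.

```python
# Assumption: memory is byte-addressable, so each address names one byte.
GIB = 2 ** 30  # number of bytes in one gibibyte


def address_space_bytes(address_bits):
    """Return how many distinct byte addresses fit in address_bits bits."""
    return 2 ** address_bits


print(address_space_bytes(32))         # 4294967296
print(address_space_bytes(32) // GIB)  # 4
```

Calling the same function with other widths (for example 16 or 64) shows how quickly the addressable range grows with each added address bit.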
The term memory sometimes refers to an entire hierarchy of storage in the system. It can include registers in the processing unit as well as secondary storage devices like hard disk drives (HDD) or solid-state drives (SSD). In the Storage and Memory Hierarchy Chapter, we discuss the memory hierarchy in detail. For now, we use the term "memory" interchangeably with internal random access memory (RAM) — memory that can be accessed by the central processing unit. RAM storage is random access because all RAM storage locations (addresses) can be accessed directly. It is useful to think of RAM as a linear array of addresses, where each address corresponds to one byte of memory.
5.2.5. The Input and Output (I/O) Units
While the control, processing, and memory units form the foundation of the computer, the input and output units enable it to interact with the outside world. In particular, they provide mechanisms for loading a program’s instructions and data into memory, storing its data outside of memory, and displaying its results to users.
The input unit consists of the set of devices that enable a user or program to get data from the outside world into the computer. The most common forms of input devices today are the keyboard and mouse. Cameras and microphones are other examples.
The output unit consists of the set of devices that relay results of computation from the computer back to the outside world or that store results outside internal memory. For example, the monitor is a common output device. Other output devices include speakers and haptics.
Some modern devices, such as the touchscreen, act as both input and output, enabling users to both input and receive data from a single unified device.
Solid-state and hard drives are another example of devices that act as both input and output devices. These storage devices act as input devices when they store program executable files that the operating system loads into computer memory to run, and they act as output devices when they store files to which program results are written.
5.2.6. The von Neumann Machine in Action: Executing a Program
The five units that make up the von Neumann architecture work together to implement a fetch-decode-execute-store cycle of actions that together execute program instructions. This cycle starts with a program’s first instruction, and is repeated until the program exits:
-
The control unit fetches the next instruction from memory. The control unit has a special register, the program counter (PC), that contains the address of the next instruction to fetch. It places that address on the address bus and places a read command on the control bus to the memory unit. The memory unit then reads the bytes stored at the specified address and sends them to the control unit on the data bus. The instruction register (IR) stores the bytes of the instruction received from the memory unit. The control unit also increments the PC’s value to store the address of the new next instruction to fetch.
-
The control unit decodes the instruction stored in the IR. It decodes the instruction bits that encode which operation to perform and the bits that encode where the operands are located. The instruction bits are decoded based on the ISA’s definition of the encoding of its instructions. The control unit also fetches the data operand values from their locations (from CPU registers, memory, or encoded in the instruction bits), as input to the processing unit.
-
The processing unit executes the instruction. The ALU performs the instruction operation on instruction data operands.
-
The control unit stores the result to memory. The result of the processing unit’s execution of the instruction is stored to memory. The control unit writes the result to memory by placing the result value on the data bus, placing the address of the storage location on the address bus, and placing a write command on the control bus. When received, the memory unit writes the value to memory at the specified address.
The input and output units are not directly involved in the execution of program instructions. Instead, they participate in the program’s execution by loading a program’s instructions and data and by storing or displaying the results of the program’s computation.
Figure 45 and Figure 46 show the four phases of instruction execution by the von Neumann architecture for an example addition instruction whose operands are stored in CPU registers. In the fetch phase, the control unit reads the instruction at the memory address stored in the PC (1234). It sends the address on the address bus, and a READ command on the control bus. The memory unit receives the request, reads the value at address 1234, and sends it to the control unit on the data bus. The control unit places the instruction bytes in the IR register and updates the PC with the address of the next instruction (1238 in this example). In the decode phase, the control unit feeds bits from the instruction that specify which operation to perform to the processing unit’s ALU, and uses instruction bits that specify which registers store operands to read operand values from the processing unit’s registers into the ALU (the operand values are 3 and 4 in this example). In the execute phase, the ALU part of the processing unit executes the operation on the operands to produce the result (3 + 4 is 7). Finally, in the store phase the control unit writes the result (7) from the processing unit to the memory unit. The memory address (5678) is sent on the address bus, a WRITE command is sent on the control bus, and the data value to store (7) is sent on the data bus. The memory unit receives this request and stores 7 at memory address 5678. In this example, we assume that the memory address to store the result is encoded in the instruction bits.
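The four phases can also be traced in a small Python sketch of the same example. This is a simplified model under stated assumptions, not a description of real hardware: memory is modeled as a dictionary from address to value, the instruction is stored as a tuple rather than as encoded bits, and the register names `r1` and `r2` and the function `run_one_instruction` are hypothetical. The addresses (1234, 1238, 5678) and operand values (3 and 4) are taken from the example above.

```python
INSTRUCTION_SIZE = 4  # bytes; consistent with the PC moving from 1234 to 1238


def run_one_instruction(memory, registers, pc):
    """Run one fetch-decode-execute-store cycle and return the updated PC."""
    # Fetch: read the instruction at the address held in the PC into the IR,
    # then advance the PC to the next instruction.
    ir = memory[pc]
    pc += INSTRUCTION_SIZE

    # Decode: split the instruction into its operation, operand locations,
    # and destination address, then read the operand values from registers.
    operation, src1, src2, dest_address = ir
    operand1 = registers[src1]
    operand2 = registers[src2]

    # Execute: the ALU applies the operation to the operands.
    if operation == "ADD":
        result = operand1 + operand2
    else:
        raise ValueError(f"unsupported operation: {operation}")

    # Store: write the result to memory at the destination address.
    memory[dest_address] = result
    return pc


# Hypothetical machine state matching the worked example.
memory = {1234: ("ADD", "r1", "r2", 5678)}
registers = {"r1": 3, "r2": 4}

next_pc = run_one_instruction(memory, registers, 1234)
print(next_pc)       # 1238
print(memory[5678])  # 7
```

A real control unit repeats this cycle for every instruction; wrapping the call in a loop that stops on a halt instruction would model that, at the cost of inventing more of the encoding.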
5.3. Logic Gates
Logic gates are the building blocks of the digital circuitry that implements arithmetic, control, and storage functionality in a digital computer. Designing complicated digital circuits involves employing a high degree of abstraction: a designer creates simple circuits that implement basic functionality from a small set of basic logic gates; these simple circuits, abstracted from their implementation, are used as the building blocks for creating more complicated circuits (simple circuits are combined together to create new circuits with more complicated functionality); these more complicated circuits may be further abstracted and used as a building block for creating even more complicated functionality; and so on to build complete processing, storage, and control components of a processor.
5.3.1. Basic Logic Gates
At the lowest level, all circuits are built from linking logic gates together. Logic gates implement boolean operations on boolean operands (0 or 1). AND, OR, and NOT form a complete set of logic gates from which any circuit can be constructed. A logic gate has one (NOT) or two (AND and OR) binary input values and produces a binary output value that is the bitwise logical operation on its input. For example, an input value of 0 to a NOT gate outputs 1 (1 is NOT(0)). A truth table for a logical operation lists the operation’s value for each permutation of inputs. Table 23 shows the truth tables for the AND, OR, and NOT logic gates.
A | B | A AND B | A OR B | NOT A |
---|---|---|---|---|
0 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 1 | 1 |
1 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | 1 | 0 |
Figure 47 shows how computer architects represent these gates in circuit drawings.
A multi-bit version of a logic gate (for M-bit input and output) is a very simple circuit constructed using M one-bit logic gates. Individual bits of the M-bit input value are each input into a different one-bit gate that produces the corresponding output bit of the M-bit result. For example, Figure 48 shows a 4-bit AND circuit built from four 1-bit AND gates.
This type of very simple circuit, one that just expands input and output bit width for a logic gate, is often referred to as an M-bit gate for a particular value of M specifying the input and output bit width (number of bits).
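As a rough software analogy, the basic gates and an M-bit gate can be modeled in Python. The function names (`AND`, `OR`, `NOT`, `and_m_bit`) are illustrative choices, and bits are represented as the integers 0 and 1; this sketches the logic only, not how gates are physically built.

```python
def AND(a, b):
    """1-bit AND gate: output is 1 only when both inputs are 1."""
    return a & b


def OR(a, b):
    """1-bit OR gate: output is 1 when at least one input is 1."""
    return a | b


def NOT(a):
    """1-bit NOT gate: output is the opposite of the input."""
    return 1 - a


def and_m_bit(a_bits, b_bits):
    """M-bit AND: feed each pair of corresponding bits through its own 1-bit AND gate."""
    if len(a_bits) != len(b_bits):
        raise ValueError("inputs must have the same bit width")
    return [AND(a, b) for a, b in zip(a_bits, b_bits)]


# A 4-bit AND built from four 1-bit AND gates.
print(and_m_bit([1, 1, 0, 0], [1, 0, 1, 0]))  # [1, 0, 0, 0]
```

Each output bit depends only on the input bits at the same position, which is why the M-bit version needs no connections between its 1-bit gates.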
5.3.2. Other Logic Gates
Even though the set of logic gates consisting of AND, OR, and NOT is sufficient for implementing any circuit, there are other basic logic gates that are often used to construct digital circuits. These additional logic gates include NAND (the negation of A AND B), NOR (the negation of A OR B), and XOR (exclusive OR). Their truth tables are shown in Table 24.
A | B | A NAND B | A NOR B | A XOR B |
---|---|---|---|---|
0 | 0 | 1 | 1 | 0 |
0 | 1 | 1 | 0 | 1 |
1 | 0 | 1 | 0 | 1 |
1 | 1 | 0 | 0 | 0 |
The NAND, NOR, and XOR gates appear in circuit drawings, as shown in Figure 49.
The circle on the end of the NAND and NOR gates represents negation or NOT. For example, the NOR gate looks like an OR gate with a circle on the end, representing the fact that NOR is the negation of OR.
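Because AND, OR, and NOT are sufficient for any circuit, these three gates can each be expressed in terms of them. The Python sketch below shows one possible composition (the function names are illustrative, and the XOR expression is one of several equivalent forms) and prints the resulting truth table.

```python
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


def NAND(a, b):
    """Negation of A AND B."""
    return NOT(AND(a, b))


def NOR(a, b):
    """Negation of A OR B."""
    return NOT(OR(a, b))


def XOR(a, b):
    """Exclusive OR: 1 when exactly one of the inputs is 1."""
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


# Print one row per combination of inputs: A, B, NAND, NOR, XOR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, NAND(a, b), NOR(a, b), XOR(a, b))
```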
5.4. Circuits
Digital circuits implement core functionality of the architecture. They implement the Instruction Set Architecture (ISA) in hardware, and also implement storage and control functionality throughout the system. Designing digital circuits involves applying multiple levels of abstraction: circuits implementing complex functionality are built from smaller circuits that implement partial functionality, which are built from even simpler circuits, and so on down to the basic logic gate building blocks of all digital circuits. Figure 51 illustrates a circuit abstracted from its implementation. The circuit is represented as a black box labeled with its functionality or name and with only its input and output shown, hiding the details of its internal implementation.
There are three main categories of circuit building blocks: arithmetic/logic, control, and storage circuits. A processor integrated circuit, for example, contains all three types of subcircuits: its register set uses storage circuits; its core functionality for implementing arithmetic and logic functions uses arithmetic and logic circuits; and control circuits are used throughout the processor to drive the execution of instructions and to control loading and storing values in its registers.
In this section, we discuss these three types of circuit, showing how to design a basic circuit from logic gates, and then how to build larger circuits from basic circuits and logic gates.
5.4.1. Arithmetic and Logic Circuits
Arithmetic and logic circuits implement the arithmetic and logic instructions of an ISA that together make up the arithmetic logic unit (ALU) of the processor. Arithmetic and logic circuits also implement parts of other functionality in the CPU. For example, arithmetic circuits are used to increment the program counter (PC) as part of the first step of instruction execution, and they are used to calculate memory addresses by combining instruction operand bits and register values.
Circuit design often starts with implementing a 1-bit version of a simple circuit from logic gates. This 1-bit circuit is then used as a building block for implementing M-bit versions of the circuit. The steps for designing a 1-bit circuit from basic logic gates are:
-
Design the truth table for the circuit: determine the number of inputs and outputs, and add a table entry for every permutation of input bit(s) that specifies the value of the output bit(s).
-
Using the truth table, write an expression for when each circuit output is 1 in terms of its input values combined with AND, OR, NOT.
-
Translate the expression into a sequence of logic gates, where each gate gets its inputs from either an input to the circuit or from the output of a preceding logic gate.
We follow these steps to implement a single-bit equals circuit: bitwise equals (A == B) outputs 1 when the values of A and B are the same, and it outputs 0 otherwise.
First, design the truth table for the circuit:
A | B | A == B output |
---|---|---|
0 | 0 | 1 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
Next, write expressions for when A == B is 1 in terms of A and B combined with AND, OR, and NOT. First, consider each row whose output is 1 separately, starting with the first row in the truth table:
A | B | A == B |
---|---|---|
0 | 0 | 1 |
For the input values in this row, construct a conjunction of expressions of its inputs that evaluate to 1. A conjunction combines subexpressions that evaluate to 0 or 1 with AND, and is itself 1 only when both of its subexpressions evaluate to 1. Start by expressing when each input evaluates to 1:
NOT(A)    # is 1 when A is 0
NOT(B)    # is 1 when B is 0
Then, create their conjunction (combine them with AND) to yield an expression for when this row of the truth table evaluates to 1:
NOT(A) AND NOT(B) # is 1 when A and B are both 0
We do the same thing for the last row in the truth table, whose output is also 1:
A | B | A == B |
---|---|---|
1 | 1 | 1 |
A AND B # is 1 when A and B are both 1
Finally, create a disjunction (an OR) of each conjunction corresponding to a row in the truth table that evaluates to 1:
(NOT(A) AND NOT(B)) OR (A AND B) # is 1 when A and B are both 0 or both 1
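The row-by-row procedure above is mechanical enough to sketch in Python. The helper below is a hypothetical illustration (the name `sum_of_products` and the dictionary representation of a truth table are assumptions): it builds one conjunction per row whose output is 1 and joins the conjunctions with OR, without attempting any simplification.

```python
def sum_of_products(truth_table, input_names):
    """Build an AND/OR/NOT expression that is 1 exactly on the rows whose output is 1.

    truth_table maps a tuple of input bits to the output bit for that row.
    """
    terms = []
    for inputs, output in truth_table.items():
        if output != 1:
            continue  # only rows that evaluate to 1 contribute a conjunction
        # An input that is 0 in this row appears negated; an input that is 1 appears as-is.
        literals = [name if bit == 1 else f"NOT({name})"
                    for name, bit in zip(input_names, inputs)]
        terms.append("(" + " AND ".join(literals) + ")")
    # If no row outputs 1, the circuit's output is the constant 0.
    return " OR ".join(terms) if terms else "0"


equals_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(sum_of_products(equals_table, ["A", "B"]))
# (NOT(A) AND NOT(B)) OR (A AND B)
```

The same helper works for truth tables with more than two inputs, since each row simply contributes one conjunction over all of the inputs.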
At this point we have an expression for A == B that can be translated to a circuit. At this step, circuit designers employ techniques to simplify the expression to create a minimal equivalent expression (one that corresponds to the fewest operators and/or shortest path length of gates through the circuit). Designers must take great care when minimizing a circuit design to ensure the equivalence of the translated expression. There are formal methods for circuit minimization that are beyond the scope of our coverage, but we will employ a few heuristics as we develop circuits.
For our example, we directly translate the preceding expression to a circuit. We may be tempted to replace (NOT(A) AND NOT(B)) with (A NAND B), but note that these two expressions are not equivalent: they do not evaluate the same for all permutations of A and B. For example, when A is 1 and B is 0, (NOT(A) AND NOT(B)) is 0 but (A NAND B) is 1.
To translate the expression to a circuit, start from the innermost expression and work outward (the innermost will be the first gates, whose outputs will be inputs to subsequent gates). The first set of gates correspond to any negation of input values (NOT gates of inputs A and B). Next, for each conjunction, create parts of the circuit feeding input values into an AND gate. The AND gate outputs are then fed into OR gate(s) representing the disjunction. The resulting circuit is shown in Figure 52.
To verify the correctness of this circuit, simulate all possible permutations of input values A and B through the circuit and verify that the output of the circuit matches its corresponding row in the truth table for (A == B). For example, if A is 0 and B is 0, the two NOT gates negate their values before being fed through the top AND gate, so the input to this AND gate is (1, 1), resulting in an output of 1, which is the top input value to the OR gate. The values of A and B (0, 0) are fed directly through the bottom AND gate, resulting in output of 0 from the bottom AND gate, which is the lower input to the OR gate. The OR gate thus receives input values (1, 0) and outputs the value 1. So, when A and B are both 0, the circuit correctly outputs 1. Figure 53 illustrates this example.
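This exhaustive check can be mimicked in software. The Python sketch below models the circuit gate by gate (the names `equals_1bit`, `top_and`, and `bottom_and` are illustrative) and compares its output against every row of the truth table for (A == B).

```python
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


def equals_1bit(a, b):
    """Gate-level model of (NOT(A) AND NOT(B)) OR (A AND B)."""
    top_and = AND(NOT(a), NOT(b))  # 1 only when A and B are both 0
    bottom_and = AND(a, b)         # 1 only when A and B are both 1
    return OR(top_and, bottom_and)


# Expected outputs, taken from the truth table for A == B.
truth_table = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# Collect every input pair for which the circuit disagrees with the table.
mismatches = [(a, b) for (a, b), expected in truth_table.items()
              if equals_1bit(a, b) != expected]
print(mismatches)  # []
```

An empty list means the circuit matches the truth table on all four input combinations. With only two inputs there are just four cases, so checking every one is practical.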
Viewing the implementation of a 1-bit equality circuit as a unit allows it to be abstracted from its implementation, and thus it can be more easily used as a building block for other circuits. We represent this abstraction of the 1-bit equality circuit (shown in Figure 54) as a box with its two inputs labeled A and B and its single output labeled A == B. The internal gates that implement the 1-bit equality circuit are hidden in this abstracted view of the circuit.
Single-bit versions of NAND, NOR, and XOR circuits can be constructed similarly, using only AND, OR, and NOT gates, starting with their truth tables (Table 26) and applying the same steps as the 1-bit equality circuit.
A | B | A NAND B | A NOR B | A XOR B |
---|---|---|---|---|
0 | 0 | 1 | 1 | 0 |
0 | 1 | 1 | 0 | 1 |
1 | 0 | 1 | 0 | 1 |
1 | 1 | 0 | 0 | 0 |
Multibit versions of these circuits are constructed from multiple single-bit versions of the circuits in a similar way to how the 4-bit AND gate was constructed from four 1-bit AND gates.
Arithmetic Circuits
Arithmetic circuits are constructed using exactly the same method as we used for constructing the logic circuits. For example, to construct a 1-bit adder circuit, start with the truth table for single-bit addition, which has two input values, A and B, and two output values, one for the SUM of A and B, and another output for overflow or CARRY OUT. Table 27 shows the resulting truth table for 1-bit add.
A | B | SUM | CARRY OUT |
---|---|---|---|
0 | 0 | 0 | 0 |
0 | 1 | 1 | 0 |
1 | 0 | 1 | 0 |
1 | 1 | 0 | 1 |
In the next step, for each output, SUM and CARRY OUT, create logical expressions of when the output value is 1. These expressions are expressed as disjunctions of per-row conjunctions of input values:
SUM: (NOT(A) AND B) OR (A AND NOT(B))    # 1 when exactly one of A or B is 1
CARRY OUT: A AND B                       # 1 when both A and B are 1
The expression for CARRY OUT cannot be simplified. However, the expression for SUM is more complicated and can be simplified, leading to a simpler circuit design. The first thing to note is that the SUM output can also be expressed as (A XOR B). If we have an XOR gate or circuit, expressing SUM as (A XOR B) results in a simpler adder circuit design. If not, the expression using AND, OR, and NOT is used and implemented using AND, OR, and NOT gates.
Let’s assume that we have an XOR gate that we can use for implementing the 1-bit adder circuit. The resulting circuit is shown in Figure 55.
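A minimal Python model of this two-input adder, assuming an XOR gate is available, might look like the following. The function name `adder_1bit` is an illustrative choice; the sketch also includes the longer AND/OR/NOT form of SUM so the two expressions can be compared.

```python
def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NOT(a):
    return 1 - a


def XOR(a, b):
    return a ^ b


def adder_1bit(a, b):
    """Two-input 1-bit adder: returns (SUM, CARRY OUT)."""
    return XOR(a, b), AND(a, b)


def sum_from_basic_gates(a, b):
    """SUM written with only AND, OR, and NOT, as derived from the truth table."""
    return OR(AND(NOT(a), b), AND(a, NOT(b)))


# Print one row per combination of inputs: A, B, SUM, CARRY OUT.
for a in (0, 1):
    for b in (0, 1):
        total, carry_out = adder_1bit(a, b)
        print(a, b, total, carry_out)
```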
The 1-bit adder circuit can be used as a building block for more complicated circuits. For example, we may want to create N-bit adder circuits for performing addition on values of different sizes (e.g. 1-byte, 2-byte, or 4-byte adder circuits). However, creating an N-bit adder circuit from N 1-bit adder circuits requires more care than creating an N-bit logic circuit from N 1-bit logic circuits.
When performing a multibit addition (or subtraction), individual bits are summed in order from the least significant bit to the most significant bit. As this bitwise addition proceeds, if the sum of the ith bits results in a carry out value of 1, then an additional 1 is added with the two (i+1)st bits. In other words, the carry out of the ith bit adder circuit is an input value to the (i+1)st bit adder circuit.
Thus, to implement a multibit adder circuit, we need a new 1-bit adder circuit that has three inputs: A, B, and CARRY IN. To do this, follow the steps above for creating a 1-bit adder circuit, with three inputs (A, B, CARRY IN) and two outputs (SUM and CARRY OUT), starting with the truth table for all possible permutations of its three inputs. We leave the design of this circuit as an exercise for the reader, but we show its abstraction as a 1-bit adder circuit in Figure 56.
Using this version of a 1-bit adder circuit as a building block, we can construct an N-bit adder circuit by feeding corresponding operand bits through individual 1-bit adder circuits, feeding the CARRY OUT value from the ith 1-bit adder circuit into the CARRY IN value of the (i+1)st 1-bit adder circuit. The 1-bit adder circuit for the 0th bits receives a value of 0 for its CARRY IN from another part of the CPU circuitry that decodes the ADD instruction.
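The carry chaining can be sketched in Python as follows. To leave the gate-level design of the three-input adder as an exercise, this sketch treats it as a black box that computes its outputs arithmetically rather than from gates. The function names and the least-significant-bit-first list representation are assumptions made for illustration.

```python
def adder_1bit_with_carry(a, b, carry_in):
    """Black-box 1-bit adder with CARRY IN: returns (SUM, CARRY OUT).

    Computed arithmetically here; the gate-level design is not shown.
    """
    total = a + b + carry_in
    return total % 2, total // 2


def add_n_bit(a_bits, b_bits):
    """Add two N-bit values given as bit lists with index 0 as the least significant bit."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same bit width")
    carry = 0  # the adder for bit 0 receives a CARRY IN of 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The CARRY OUT of bit i becomes the CARRY IN of bit i+1.
        s, carry = adder_1bit_with_carry(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Convert a non-negative integer to a bit list, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert a least-significant-bit-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = add_n_bit(to_bits(6, 4), to_bits(7, 4))
print(from_bits(sum_bits), carry_out)  # 13 0

sum_bits, carry_out = add_n_bit(to_bits(15, 4), to_bits(1, 4))
print(from_bits(sum_bits), carry_out)  # 0 1
```

In the second call the 4-bit result wraps around to 0 and the final CARRY OUT is 1, which shows how the last carry signals that the true sum did not fit in N bits.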
This type of N-bit adder circuit, built from N 1-bit adder circuits, is called a ripple carry adder, shown in Figure 57. The SUM result ripples or propagates through the circuit from the low-order to the high-order bits. Only after bit 0 of the SUM and CARRY OUT values are computed will the bit 1 of the SUM and CARRY OUT be correct