1.1. Getting Started Programming in C

Let’s start by looking at a "hello world" program that includes an example of calling a function from the math library. We compare the C version of this program to the Python version. The C version might be put in a file named hello.c (.c is the suffix convention used for C source code files), whereas the Python version might be in a file named hello.py.

Table 1. Comparison of the syntax for a small program in Python and C. Both the C version and Python version are available for download.
Python version (hello.py):

'''
    The Hello World Program in Python
'''

# Python math library
from math import *


# main function definition:
def main():
    # statements on their own line
    print("Hello World")
    print("sqrt(4) is %f" % (sqrt(4)))

# call the main function:
main()

C version (hello.c):

/*
    The Hello World Program in C
 */

/* C math and I/O libraries */
#include <math.h>
#include <stdio.h>

/* main function definition: */
int main() {
    // statements end in a semicolon (;)
    printf("Hello World\n");
    printf("sqrt(4) is %f\n", sqrt(4));

    return 0;  // main returns value 0
}

The first thing to notice is that the C and Python versions of this program have similar structure and language constructs albeit with different language syntax. In particular:

Comments:

  • In Python: multi-line comments are commonly written as string literals that begin and end with ''', and single-line comments begin with #.

  • In C: multi-line comments begin with /* and end with */, and single line comments begin with //.

Importing Library Code:

  • In Python: libraries are included (imported) using import.

  • In C: libraries are included (imported) using #include. By convention, #include directives appear at the top of the program, outside of function bodies.

Blocks:

  • In Python: indentation is used to denote a block.

  • In C: blocks (e.g. function, loop, and conditional bodies) start with { and end with }.

The main function:

  • In Python: def main(): defines the main function.

  • In C: int main(){ } defines the main function. int is the return type of the main function, which means that main returns an integer value. int is C’s name for specifying the signed integer type (signed integers are values like -3, 0, 1234). main returns the int value 0 to signify running to completion without error.

Statements:

  • In Python: each statement is on a separate line.

  • In C: each statement ends with a semicolon ;. In C, statements must be within the body of some function (in main in this example).

Output:

  • In Python: the print function is used to print out a formatted string. Values for the placeholders in the format string follow a % symbol in a comma-separated list of values (e.g., the value of sqrt(4) will be printed in place of the %f placeholder in the format string).

  • In C: the printf function is used to print out a formatted string. Values for the placeholders in the format string are additional arguments separated by commas (e.g. the value of sqrt(4) will be printed in place of the %f placeholder in the format string).

There are a few important differences to note in the C and Python versions of this program:

Indentation: In C, indentation does not have meaning, but it is good programming style to indent statements based on the nesting level of their containing block.

Output: C’s printf function does not automatically print a newline character at the end like Python’s print function does. As a result, C programmers need to explicitly specify a newline character (\n) in the format string when a newline is desired in the output.

main function:

  • A C program must have a function named main, and its return type must be int. This means that the main function returns a signed integer type value. Python programs do not need to name their main function main, but they often do by convention.

  • The C main function has an explicit return statement to return an int value (by convention, main should return 0 if the main function is successfully executed without errors).

  • A Python program needs to include an explicit call to its main function to run it when the program is executed. In C, the main function is called automatically when the program is executed.

1.1.1. Compiling and Running C programs

Python is an interpreted programming language, which means that another program, the Python interpreter, runs Python programs; the Python interpreter acts like a virtual machine on which Python programs are run. To run a Python program, the program source code (hello.py) is given as input to the Python interpreter program that runs it. For example ($ is the Linux shell prompt):

$ python hello.py

The Python interpreter is a program that is in a form that can be run directly on the underlying system (this form is called a binary executable), and it takes as input the Python program that it runs.

Figure 1. A Python program is directly executed by the Python interpreter, which is a binary executable program that is run on the underlying system (OS and hardware).

To run a C program, it must first be translated into a form that a computer system can execute directly. A C compiler is a program that translates C source code into a binary executable form that can be directly executed by the computer hardware. A binary executable consists of a series of 0’s and 1’s in a well-defined format that a computer can run.

For example, to run the C program hello.c on a Unix system, the C code must first be compiled by a C compiler (gcc) that produces a binary executable (by default named a.out). The binary executable version of the program then can be run directly on the system:

$ gcc hello.c
$ ./a.out

(Note: some C compilers may need to be explicitly told to link in the math library with the -lm option):

$ gcc hello.c -lm

Figure 2. The C compiler (gcc) builds C source code into a binary executable file (a.out). The underlying system (OS and hardware) directly executes the a.out file to run the program.

Detailed Steps:

In general, the following sequence describes the necessary steps for editing, compiling, and running a C program on a Unix system:

  1. First, using a text editor (e.g. vim) write and save your C source code program in a file (e.g. hello.c):

    $ vim hello.c

  2. Next, compile the source to an executable form, and then run the executable form of your program. The most basic syntax for compiling with the GNU C compiler (gcc) is:

    $ gcc <input_source_file>

If compilation yields no errors, the compiler creates a binary executable file named a.out. The compiler also allows you to specify the name of the binary executable file to generate using the -o flag:

$ gcc -o <output_executable_file> <input_source_file>

For example, this command instructs gcc to compile hello.c into an executable file named hello:

$ gcc -o hello hello.c

We can invoke the executable program using ./hello:

$ ./hello

After any change to the C source code (the hello.c file), the program must be recompiled with gcc to produce a new version of hello. If the compiler detects any errors during compilation, the ./hello file will not be created or recreated (but beware, an older version of the file from a previous successful compile may still exist).

Often when compiling with gcc, you want to include several command line options. For example, these options enable more compiler warnings and build a binary executable with extra debugging information:

$ gcc -Wall -g -o hello hello.c

Because the gcc command line can be long, the make utility is often used to simplify compiling C programs and to clean up files created by gcc. Using make and writing Makefiles is an important skill that you will develop as you build up experience with C programming.

We cover compiling and linking with C library code in more detail at the end of Chapter 2.

1.1.2. Variables and C Numeric Types

Like Python, C uses variables as named storage locations for holding data. Thinking about the scope and type of program variables is important for understanding the semantics of what your program will do when you run it. A variable’s scope defines when the variable has meaning (i.e., where and when in your program it can be used) and its lifetime (i.e., it could persist for the entire run of a program or only during a function activation). A variable’s type defines the range of values that it can represent and how those values will be interpreted when performing operations on its data.

In C, all variables must be declared before they can be used. To declare a variable, use the following syntax:

type_name variable_name;

A variable can only have a single type. The basic C types include char, int, float, and double. By convention, C variables should be declared at the beginning of their scope (top of a { } block) before any C statements in that scope.

Below is an example C code snippet that shows declaring and using variables of some different types. We discuss types and operators in more detail following this example.

vars.c
{
    /* 1. Define all variables in this block's scope at the top of the block. */

    int x;   // declares x to be an int type variable and allocates space for it

    int i, j, k;  // can define multiple variables of the same type like this

    char letter;  // a char stores a single byte integer value
                  // it is often used to store a single ASCII character
                  // value (it stores the ASCII numeric encoding of a character)
                  // a char in C is a different type than a string in C

    float winpct; // winpct is declared to be a float type
    double pi;    // the double type is more precise than float

    /* 2. After defining all variables, you can use them in C statements. */

    x = 7;        // x stores 7 (initialize variables before using their value)
    k = x + 2;    // use x's value in an expression

    letter = 'A';        // a single quote is used for single character value
    letter = letter + 1; // letter stores 'B' (its ASCII value is one more than 'A')

    pi = 3.1415926;

    winpct = 11 / 2.0; // winpct gets 5.5, winpct is a float type
    j = 11 / 2;        // j gets 5: int division truncates anything after the decimal
    x = k % 2;         // % is C's mod operator, so x gets 9 mod 2 (1)
}

In the example above, note the semicolons galore. Recall that C statements are terminated by ;, not by line breaks: C expects a semicolon after every statement. You’ll forget some, and gcc’s error message does not always say "you missed a semicolon", even though that might be the only thing wrong with your program. In fact, often when you forget a semicolon, the compiler reports a syntax error on the line after the one with the missing semicolon, because it interprets that line as part of the statement from the previous line. As you program more in C, you will learn to correlate gcc errors with the specific C syntax mistakes that they describe.

1.1.3. C Types

C supports a small set of built-in data types, and it provides a few ways in which programmers can construct basic collections of types (arrays and structs). From these basic building blocks, a C programmer can build complex data structures.

C defines a set of basic types for storing numeric values. Here are some examples of numeric literal values of different C types:

8     // the int value 8
3.4   // the double value 3.4
'h'   // the char value 'h' (its value is 104, the ASCII value of h)

The C char type stores a numeric value. However, it is often used by programmers to store the value of an ASCII character. A character literal value is specified in C as a single character between single quotes.

C does not support a string type, but programmers can create strings from C’s char type and its support for constructing arrays of values, which we discuss in later sections. C does, however, support a way of expressing string literal values in programs — a string literal is any sequence of characters between double quotes. C programmers often pass string literals as the format string argument to printf:

printf("this is a C string\n");

Python supports strings, but it does not have a char type. In C, a string and a char are two very different types, and they evaluate differently. This difference is illustrated by comparing a C string literal that contains one character to a C char literal. For example:

'h'  // this is a char literal value   (its value is 104, the ASCII value of h)
"h"  // this is a string literal value (its value is NOT 104, it is not a char)

We discuss C strings and char variables in more detail in the Strings section. Here, we focus mainly on C’s numeric types.

C Numeric Types

C supports several different types for storing numeric values. The types differ by the format of the numeric values they represent. For example, the float and double types can be used to represent real values, int to represent signed integer values, and unsigned int to represent unsigned integer values. Real values are positive or negative values with a decimal point, such as -1.23 or 0.0056. Signed integers are positive, negative, and zero integer values, such as -333, 0, or 3456. Unsigned integers are strictly non-negative integer values, such as 0 or 1234.

C’s numeric types also differ in the range and precision of values they can represent. The range or precision of a value depends on the number of bytes associated with its type. Types with more bytes can represent a larger range of values (for integer types), or higher precision values (for real types), than can types with fewer bytes.

Table 2 shows the number of bytes of storage, the kind of numeric values stored, and how to declare a variable for a variety of common C numeric types:

Table 2. C Numeric Types. (NOTE: these are typical sizes — the exact number of bytes depends on the hardware architecture!)
Type Name    Usual Size     Values Stored         How to declare
char         1 byte         integers              char x;
short        2 bytes        signed integers       short x;
int          4 bytes        signed integers       int x;
long         4 or 8 bytes   signed integers       long x;
long long    8 bytes        signed integers       long long x;
float        4 bytes        signed real numbers   float x;
double       8 bytes        signed real numbers   double x;

C also provides unsigned versions of the integer numeric types (char, short, int, long, and long long). To declare a variable as unsigned, put the keyword unsigned before the type name. For example:

int x;           // x is a signed int variable
unsigned int y;  // y is an unsigned int variable

The C standard does not specify if the char type is signed or unsigned. As a result, some implementations may implement char as signed integer values and others as unsigned. It is good programming practice to explicitly declare unsigned char if you want the unsigned version of a char variable.

The exact number of bytes of C types may vary from one architecture to the next. The sizes in the table above are common sizes for each type. You can print out the exact size on a given machine using C’s sizeof operator. The sizeof operator takes the name of a type (or an expression) as an argument and evaluates to the number of bytes used to store that type. For example:

printf("number of bytes in an int: %zu\n", sizeof(int));
printf("number of bytes in a short: %zu\n", sizeof(short));

sizeof evaluates to a value of the unsigned integer type size_t, so in the call to printf, use the placeholder %zu to print its value (on many 64-bit systems size_t is the same as unsigned long, so %lu also works there). On most architectures the output of these statements will be:

number of bytes in an int: 4
number of bytes in a short: 2

Arithmetic Operators

Arithmetic operators are used to combine values of numeric types. The resulting type of the operation is based on the type of the operands. For example, if two int values are combined with an arithmetic operator, the resulting type is also an integer.

C performs automatic type conversion when an operator combines operands of two different types. For example, if an int operand is combined with a float operand, the integer operand is first converted to its floating point equivalent before the operator is applied, and the type of the operation’s result is float.

The following arithmetic operators can be used on most numeric type operands:

  • add (+) and subtract (-)

  • multiply (*), divide (/) and mod (%):

    The mod operator, %, can only take integer type operands (int, unsigned int, short, etc.)

    If both operands are int types, then the divide operator, /, does integer division (the resulting value is an int, truncating anything beyond the decimal point from the division operation). For example, 8 / 3 evaluates to 2.

    If one or both of the operands are float (or double), / does real division and evaluates to a float (or double) result. For example, 8 / 3.0 evaluates to approximately 2.666667.

  • assignment (=):

    variable = value of expression;   (e.g., x = 3 + 4;)

  • assignment with update (+=, -=, *=, /=, and %=):

    variable op= expression;   (e.g., x += 3; is shorthand for x = x + 3;)

  • increment (++) and decrement (--):

    variable++;   (e.g., x++; assigns x the value x + 1)

Pre- vs. Post-increment

++variable and variable++ are both valid, but they are evaluated slightly differently:

  • ++x: increment x first, then use its value.

  • x++: use x’s value first, then increment it.

In many cases it doesn’t matter which you use because the value of the incremented or decremented variable is not being used in the statement. For example, these two statements are equivalent (although the first is the commonly used syntax for this statement):

x++;
++x;

In some cases though, context affects the outcome (when the value of the incremented or decremented variable is being used in the statement). For example:

x = 6;
y = ++x + 1;  // y is assigned 8: increment x first, then evaluate (x+1) (8)

x = 6;
y = x++ + 1;  // y is assigned 7: evaluate x+1 first (7), then increment x

Code like the example above that uses an arithmetic expression with an increment operator is often hard to read, and it’s easy to get wrong. As a result, it’s generally better to avoid writing code like this, instead writing separate statements for exactly the order you want. For example, if you want to first increment x and then assign x+1 to y, just write it as two separate statements.

Instead of this:

y = ++x + 1;

Write it as two separate statements:

x++;
y = x + 1;