17.8. Archiving and Compressing Files

The tar (tape archive) utility is used to package up a set of files into a single file (an archive file or a tar file). The tar command is also used to extract the files out of a tar file. An archive file (or tar file) contains all the file data from the files that make up the tar file plus additional meta-data that describe its contents. The meta-data are necessary for tar to extract the individual file(s) out of a tar file.

Tar files are often used to distribute a set of files or to transfer a set of files from one machine to another. They may also be used to store backups of sets of files (file archives). For example, a user who wants to transfer a set of files from one system to another would first run tar with the -c command line option to create a single tar file containing all of the files, and then use scp to copy the resulting tar file to the remote machine. On the remote machine, they would then run tar with -x to extract the files from the tar file.

The tar command has many command line options for configuring its behavior in different ways. Table 1 lists some examples.

Table 1. tar command line options

Option               Description
-c                   create a tar file from a set of files
-x                   extract, or untar, a tar file
-t                   list the contents of a tar file
-v                   verbose mode: print information as the command runs
-f <tar file name>   specify the name of the tar file

To create a tar file you need to use -c, specify the name of the tar output file with -f and then give the list of files (and directories) to include in the tar file (full directory contents will be included).

Here is an example of how to create a tar file from three regular files (main.c, process.c, Makefile) and the contents of a directory (inputfiles/):

$ tar -c -v -f mytarfile.tar  main.c process.c Makefile inputfiles/

-v is the verbose command line option. It lists some information about the files being tar’ed up as tar executes.

The command line options to tar can also be specified using an alternate, more compact, syntax (this is often used in tar examples you see):

$ tar -cvf mytarfile.tar  main.c process.c Makefile inputfiles/

$ tar cvf mytarfile.tar  main.c process.c Makefile inputfiles/

Alternatively, you can create a directory and copy the files you want to tar into it, and generate a tar file from that directory. For example:

$ mkdir mytardir
$ cp *.c mytardir/.
$ cp -r inputfiles mytardir/.
$ tar cvf mytarfile.tar mytardir

To list the files in a tar file, use the -t command line option:

$ tar -tvf mytarfile.tar

To extract files from a tar file, use the -x command line option:

$ tar xvf mytarfile.tar
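
The create, list, and extract steps above can be combined into a short round trip. The sketch below is illustrative only: the scratch directory, the file names (demo.tar, src/, out/), and the file contents are made up for the example. It relies on the fact that tar extracts into the current directory, so it runs the extraction from inside the destination directory.

```shell
#!/bin/sh
# Round trip: create a tar file, list it, extract it elsewhere, and compare.
# All names here (demo.tar, src/, out/) are hypothetical.
set -e
work=$(mktemp -d)            # scratch directory so no real files are touched
cd "$work"

mkdir -p src/inputfiles out
echo 'int main(void) { return 0; }' > src/main.c
echo 'sample input' > src/inputfiles/in1.txt

tar -cf demo.tar src         # -c: create, -f: name of the tar file
tar -tf demo.tar             # -t: list the contents without extracting

# tar extracts into the current directory, so run it from inside out/
(cd out && tar -xf ../demo.tar)

# diff -r exits with status 0 only if the two directory trees match
diff -r src out/src && echo "round trip OK"
```

Comparing the extracted tree against the original with diff -r is one simple way to check that a tar file contains everything that was expected before the original files are deleted or the tar file is sent elsewhere.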

17.8.1. file compression

File compression is often used along with tar to reduce the size of the tar file. This reduces both the storage space required for the tar file and the amount of data sent over the network when the tar file is copied to a remote machine using scp. The gzip and gunzip commands compress and uncompress a file using gzip compression, and the bzip2 and bunzip2 commands compress and uncompress a file using bzip2 compression. Here are some example calls:

$ ls -l
-rw-r--r-- 1 sarita users 40840 Apr 12 16:55 file

$ gzip file        # compress using gzip
$ ls -l
-rw-r--r-- 1 sarita users 13258 Apr 12 16:55 file.gz

$ gunzip file.gz   # uncompress a gzipped file
$ ls -l
-rw-r--r-- 1 sarita users 40840 Apr 12 16:55 file

$ bzip2 file        # compress using bzip2
$ ls -l
-rw-r--r-- 1 sarita users 12568 Apr 12 16:55 file.bz2

$ bunzip2 file.bz2   # uncompress a bzip2ed file
$ ls -l
-rw-r--r-- 1 sarita users 40840 Apr 12 16:55 file
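
As the listings above show, gzip and bzip2 replace the original file with the compressed one by default. The following sketch shows one way to keep the original and to confirm that compression loses no data, using the -c option to write to standard output. The file name and its generated contents are hypothetical, and the bzip2 part only runs if bzip2 happens to be installed.

```shell
#!/bin/sh
# Compress a copy of a file, compare sizes, and check the round trip.
# The file name and its contents are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Build a repetitive text file so that compression has something to remove.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > file

gzip -c file > file.gz       # -c writes to stdout, so the original is kept
ls -l file file.gz           # compare the sizes of the two files

# Uncompress to stdout and compare byte-for-byte with the original.
gunzip -c file.gz | cmp - file && echo "gzip round trip OK"

# Same idea with bzip2, if it is installed on this system.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -c file > file.bz2
    bunzip2 -c file.bz2 | cmp - file && echo "bzip2 round trip OK"
fi
```

How much smaller the compressed file is depends heavily on the contents: repetitive text like this sample compresses well, while data that is already compressed (for example, many image and video formats) may barely shrink.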

tar and file compression

File compression is often used on tar files. A user can run file compression utilities on a tar file after tar creates it and run utilities to uncompress the tar file before running tar to extract files from the tar file. However, because file compression is so commonly used with tar files, tar has command line options to create compressed tar files (and to extract files from compressed tar files) using different compression algorithms (for example, -j uses bzip2, and -z uses gzip).

Here are some examples of creating a compressed tar file in one step by passing tar either the -z option (gzip) or the -j option (bzip2):

# use -z option to compress tar file using gzip:
$ tar -czvf myproj.tar.gz myproj/
$ ls
  myproj/ myproj.tar.gz


# or use -j option to compress tar file using bzip2:
$ tar -cjvf myproj.tar.bz2 myproj/
$ ls
  myproj/ myproj.tar.bz2

Alternatively, here is an example of separating the tar file creation and compression into two steps (create the tar file, then compress it using gzip or bzip2):

$ tar cvf myproj.tar myproj/   # create a tar file from myproj
$ ls
  myproj/ myproj.tar

$ gzip myproj.tar     # gzip the tar file
$ ls
  myproj/ myproj.tar.gz

# OR could use bzip2
$ bzip2 myproj.tar     # bzip2 the tar file
$ ls
  myproj/ myproj.tar.bz2

You can also use the one-step or two-step method to extract files from a compressed tar file. For example, here is the single step way (-z specifies to uncompress the file first using gzip):

$ tar -xzvf myproj.tar.gz
$ ls
  myproj/ myproj.tar.gz
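
The -z option can also be combined with -t to inspect a compressed tar file without extracting it, and tar accepts member names after the tar file name to extract only those files. The sketch below illustrates both under some assumptions: the myproj/ directory and its two files are hypothetical stand-ins created in a scratch directory, and the member name must be written exactly as it appears in the listing.

```shell
#!/bin/sh
# Inspect a gzip-compressed tar file, then extract a single member from it.
# myproj/ and its files are hypothetical stand-ins for a real project.
set -e
work=$(mktemp -d)
cd "$work"

mkdir myproj
echo 'all: main' > myproj/Makefile
echo 'int main(void) { return 0; }' > myproj/main.c
tar -czf myproj.tar.gz myproj
rm -r myproj                          # keep only the compressed tar file

tar -tzf myproj.tar.gz                # list contents; nothing is extracted

# Extract one member by giving its path exactly as it appears in the listing.
tar -xzf myproj.tar.gz myproj/main.c
ls myproj                             # only main.c was extracted
```

Listing first is a cheap way to see which top-level directory (if any) the files will be extracted into before anything is written to disk.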

Alternatively, here is the two-step method to extract files from a compressed tar file (first uncompress with gunzip or bunzip2, then untar):

$ ls
  myproj.tar.gz
$ gunzip myproj.tar.gz
$ ls
  myproj.tar
$ tar -xvf myproj.tar
$ ls
  myproj/ myproj.tar


# or using bunzip2 for bzip'ed file
$ ls
  myproj.tar.bz2
$ bunzip2 myproj.tar.bz2
$ ls
  myproj.tar
$ tar -xvf myproj.tar
$ ls
  myproj/ myproj.tar
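
Note that in the two-step method gunzip and bunzip2 replace the compressed file with the uncompressed one. A variation, sketched below, pipes the uncompressed data straight into tar so that the compressed tar file is left in place. It is a sketch under stated assumptions: the myproj/ directory and its file are hypothetical and created in a scratch directory, and it uses "-f -" to tell tar to read the tar file from standard input.

```shell
#!/bin/sh
# Uncompress and untar in a pipeline, leaving the compressed tar file in place.
# myproj/ and input1.txt are hypothetical sample data.
set -e
work=$(mktemp -d)
cd "$work"

mkdir myproj
echo 'sample' > myproj/input1.txt
tar -cf myproj.tar myproj
gzip myproj.tar                       # leaves only myproj.tar.gz
rm -r myproj

# gunzip -c writes the uncompressed tar data to stdout;
# "-f -" tells tar to read the tar file from stdin.
gunzip -c myproj.tar.gz | tar -xf -
ls                                    # both myproj/ and myproj.tar.gz remain
```

For a gzip-compressed tar file this does the same work as tar -xzf, but it makes the two stages explicit, and no uncompressed .tar file is ever written to disk.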

17.8.2. putting it all together

Below is an example of using tar to transfer a directory’s contents from one machine (laptop) to a remote machine (ada.cs.college.edu). It uses the scp command to remotely copy the file between the two machines (see Section 17.3 for more information about scp).

  1. First, the user creates the compressed tar file on laptop (note: we show the two-step method to create and compress the tar file in order to show the listing of the files in the tar file, but the compressed tar file could also be created in one step using the -czvf command line options):

    laptop$  ls myproj/
      Makefile input1.txt input2.txt main.c mux.c mux.h
    
    laptop$  tar -cvf myproj.tar myproj/   # create a tar file from myproj
    laptop$  ls
      myproj/ myproj.tar
    
    laptop$  tar -tf myproj.tar            # list tar file contents
       myproj/
       myproj/Makefile
       myproj/input1.txt
       myproj/input2.txt
       myproj/main.c
       myproj/mux.c
       myproj/mux.h
    
    laptop$ gzip myproj.tar              # compress the file
  2. Then, they scp the single compressed tar file to the remote machine:

    laptop$ scp myproj.tar.gz sam@ada.cs.college.edu:.   # scp single tar file
      sam@ada.cs.college.edu's password:
  3. The user can then ssh into the remote machine and untar the myproj.tar.gz file to extract the files:

    laptop$ ssh sam@ada.cs.college.edu
      sam@ada.cs.college.edu's password:
    
    ada$ ls
     classes/ letters/  myproj.tar.gz projects/
    
    ada$ tar -xzvf myproj.tar.gz       # extract contents of tar file
    
    ada$ ls
     classes/ letters/  myproj/  myproj.tar.gz projects/

If the user wanted to untar the directory somewhere other than in their home directory, they could use mv to move the directory where they want it (or they could have moved the tar file to where they want it before untaring it). For example, to move it to be a subdirectory of the projects directory:

ada$ ls
   classes/ letters/  myproj/  myproj.tar.gz projects/
ada$ mv myproj projects/.
ada$ ls
   classes/ letters/  myproj.tar.gz projects/
ada$ cd projects
ada$ ls
   myproj/

Often after untaring a tar file, the user removes the tar file:

ada$ rm ~/myproj.tar.gz
ada$ ls ~
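
As an alternative to extracting in the home directory and then moving the result with mv, tar can extract directly into another directory with the -C option, which both GNU tar and BSD tar support. The sketch below is a minimal illustration: it recreates hypothetical myproj/ and projects/ directories in a scratch area rather than on a real remote machine.

```shell
#!/bin/sh
# Extract a compressed tar file directly into another directory with -C.
# The directory names are hypothetical and live in a scratch directory.
set -e
work=$(mktemp -d)
cd "$work"

mkdir myproj projects
echo 'int main(void) { return 0; }' > myproj/main.c
tar -czf myproj.tar.gz myproj
rm -r myproj

# -C changes to the given directory before extracting;
# the directory must already exist.
tar -xzf myproj.tar.gz -C projects
ls projects                           # myproj is here, with no mv needed
```

Because the destination directory must exist before tar runs, a script would typically create it first with mkdir -p.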

17.8.3. References

For more information see: