Compiling on a cluster

Overview

Teaching: 15 min
Exercises: 5 min

Questions

How do I compile C and FORTRAN programs?

What compilers can I use?

Objectives

Be able to compile a basic C or FORTRAN program

An important note

Please see the section called Setup for the demo files for this workshop.

What Is A Compiled Language

A compiled language is one that requires a program called a compiler to convert human-readable source code into machine code that the computer can execute directly. Some popular compiled languages are C, C++, and FORTRAN.

After the program source code is compiled once, it doesn’t need to be compiled again until the program is changed – the computer can now very quickly load up the program and run the instructions as many times as we would like it to.

There is another type of programming language called an interpreted language. With these languages, a program called an interpreter loads and executes the source code every time you run it (usually one line at a time). Examples of popular interpreted languages are Python, JavaScript, and Ruby.

Why Use A Compiled Language?

The most obvious reason is speed – machine code generated by a compiler usually runs much faster than source code that has to be fed through an interpreter at run time, one line at a time.

Compiled languages also allow for tighter, more efficient memory management. You can allocate exactly as much memory as your program needs. But you also have the power to shoot yourself in the foot – to stay efficient, you must release the memory you allocate once you are done with it. (You don’t have to worry about cleaning it up when the program ends; the operating system will do this for you.)

Compilers can (often automatically) optimize the machine code generated to access special instructions of the chip you are running on to make your code run even faster. They can even reorder and rethink pieces of code to manage memory more efficiently.

Why Not Use A Compiled Language?

Compiled languages are often harder to program in, partly because of the memory management required and partly because languages such as C and FORTRAN make you spell out more details. For example, you will need to define the type of every variable you declare (e.g. character, integer, floating-point number). With most interpreted languages, you usually don’t have to worry about this.

Machine code generated by a compiler on one system is not guaranteed to run on another computer – CPU features sometimes differ between systems, and code compiled for one may not run on the other. Even source code may need to be altered between different operating systems, CPU architectures, or compilers. In general, interpreted code is more portable and usually runs the same way on different systems.

Setting up the environment to compile programs

The Intel compiler is licensed on all of the Compute Canada clusters, and it is the preferred compiler to use. This is because Intel designed the CPU chips in our cluster, so their compilers will create better optimized code to run on these chips.

We do not have a license for the Intel compiler on our training cluster, so we need to use the free compiler gcc. In order to use the compiler, we need to load it into our work environment:

module load gcc

We can also add this line to the file .bashrc in our home directory so that the right compiler is loaded every time we log in.

Compiling our first C program

Now that we have a compiler loaded, let’s compile a simple C program:

gcc -o hello hello.c

Note: on a cluster with the Intel compiler installed, we could compile the same program using:

icc -o hello hello.c

Running our compiled C program

We can run the program with:

./hello

Hello World

Notice that we didn’t just type hello to run our new program. Here, the dot (.) at the beginning of the command refers to the current directory we are in, and tells our shell where to find the hello program. If we did not do this, the shell might not be able to find our program, or it could even find a different hello program and run that instead. For example, if there were a command called hello in a standard location such as /usr/bin, the shell would run that one.

Compiling and running a FORTRAN program

Compiling FORTRAN code is very similar:

gfortran -o hello_fortran hello.f90

Run the compiled FORTRAN program with:

./hello_fortran

Hello World

Note: the -o flag tells the compiler where to write the compiled machine code. If you leave out the -o flag, the machine code will be written to a file called a.out. This compiled program can be run with:

./a.out

Note: as programs get more complex, it often makes sense to break up programs into multiple source files. These files will need to be linked together into a running program. At this point, it makes sense to use a Makefile (or some other build system) to manage the compiling and linking of your program.

Submitting our program to the scheduler

While it’s fine to run small programs on the login node of a compute cluster, most work should be submitted through the scheduler. Let’s create a Slurm submission script called submit-hello-job.sh. Open an editor (e.g. nano) and type (or copy/paste) the following contents:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00

./hello

We can now submit the job using the sbatch command:

sbatch submit-hello-job.sh

Submitted batch job 14

The job ID number (14 in this case) is important, and we will need it to look at the job output.

This job will likely run very quickly, but if you are fast enough you can see it in the queue (either in a running or a pending state):

squeue -u YOUR_USER_NAME

(Replace YOUR_USER_NAME with the username issued to you by your instructor.)

When our job is finished, we can find the program’s output in the Slurm output file (in this case, slurm-14.out):

cat slurm-14.out

Hello World

Note that our training cluster does not use accounts like a Compute Canada cluster does. On a Compute Canada cluster, we can specify our accounting group with a line in our submission script like the following:

#SBATCH --account=def-whatever

(Replace def-whatever with an accounting group name that you’re authorized to use.)

Key Points

Compilers turn instructions written in a programming language into machine code that the computer can execute
