Compiling on a cluster
Overview
Teaching: 15 min
Exercises: 5 min
Questions
How do I compile C and FORTRAN programs?
What compilers can I use?
Objectives
Be able to compile a basic C or FORTRAN program
An important note
Please see the section called Setup for the demo files for this workshop.
What Is A Compiled Language
A compiled language is one which requires a program called a compiler to
convert human readable instructions into machine code that the
computer can directly understand. Some popular compiled languages are
C, C++, and FORTRAN.
After the program source code is compiled once, it doesn’t need to be compiled again until the program is changed – the computer can now very quickly load up the program and run the instructions as many times as we would like it to.
There is another type of programming language called an interpreted language.
With these languages, there is a program called an interpreter that loads
and executes the source code of the program every time you run it
(usually one line at a time). Examples of popular interpreted languages are
Python, JavaScript, and Ruby.
Why Use A Compiled Language?
The most obvious reason is speed: the machine code generated by a compiler usually runs much faster than source code that has to be fed through an interpreter at run time, one line at a time.
Compiled languages also allow for tighter, more efficient memory management. You can allocate exactly as much memory as your program needs. But you also have the power to shoot yourself in the foot: to stay efficient, you must free the memory you allocate once you are done with it. (You don’t have to worry about cleaning it up when the program ends; the operating system will do this for you.)
Compilers can (often automatically) optimize the machine code generated to access special instructions of the chip you are running on to make your code run even faster. They can even reorder and rethink pieces of code to manage memory more efficiently.
Why Not Use A Compiled Language?
Compiled languages are harder to program, mostly due to the memory management and type declarations required. For example, you will need to define the type of every variable you declare (e.g. character, integer, floating-point number). With most interpreted languages, you don’t have to worry about this.
Machine code generated by a compiler on one system is not guaranteed to run on another computer – the CPU chip features will sometimes be different between systems, so the compiled code won’t run. Even source code may need to be altered between different operating systems/CPU architectures/compilers. In general, interpreted code is very portable, and just runs the same on different systems.
Setting up the environment to compile programs
The Intel compiler is licensed on all of the Compute Canada clusters, and it is the preferred compiler to use. This is because Intel designed the CPU chips in our cluster, so their compilers will create better optimized code to run on these chips.
We do not have a license for the Intel
compiler on our training cluster, so we need to use the free compiler gcc.
In order to use the compiler, we need to load it into our work environment:
module load gcc
We can also add this command to the file .bashrc in our home directory so that the
right compiler is chosen every time we log in.
Compiling our first C program
Now that we have a compiler loaded, let’s compile a simple C program:
gcc -o hello hello.c
Note: on a cluster with the Intel compiler installed, we could compile the same program using:
icc -o hello hello.c
Running our compiled C program
We can run the program with:
./hello
Hello World
Notice that we didn’t just type hello to run our new program. Here the dot (.) at the beginning of the command refers to the current directory we are in, and tells our shell where to find the hello program. If we did not do this, the shell might not be able to find our program, or it could even find a different hello program and run that instead, for example a command called hello in the standard path /usr/bin.
Compiling and running a FORTRAN program
Compiling FORTRAN code is very similar:
gfortran -o hello_fortran hello.f90
Run the compiled FORTRAN program with:
./hello_fortran
Hello World
Note: the -o flag tells the compiler where to output the compiled machine code. If you exclude the -o flag, the machine code will be written to a file called a.out. This compiled program can be run with:
./a.out
Note: as programs get more complex, it often makes sense to break up
programs into multiple source files. These files will need to be linked
together into a running program. At this point, it makes sense to use a
Makefile (or some other build system) to manage the compiling and
linking of your program.
Submitting our program to the scheduler
While it’s fine to run small programs on the login node of a compute cluster, most work should be submitted through the scheduler. Let’s create a Slurm submission script called submit-hello-job.sh. Open an editor (e.g. Nano) and type (or copy/paste) the following contents:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00
./hello
We can now submit the job using the sbatch command:
sbatch submit-hello-job.sh
Submitted batch job 14
The job id number (14 in this case) is important and we will need it to look at the job output.
This job will likely run very quickly, but if you are fast enough you can see it in the queue (either in a running or a pending state):
squeue -u YOUR_USER_NAME
(Replace YOUR_USER_NAME with the username issued to you by your instructor.)
When our job is finished, we can find the output from the program by looking in the Slurm output file (in my case, this is called slurm-14.out):
cat slurm-14.out
Hello world!
Note that our training cluster does not use accounts like a Compute Canada cluster does. On a Compute Canada cluster, we can specify our accounting group with a line in our submission script like the following:
#SBATCH --account=def-whatever
(Replace def-whatever with an accounting group name that you’re authorized to use.)
Key Points
Compilers turn instructions in programming languages into machine code that the computer can execute