Cluster (Torque)

The computing cluster

We have a cluster for parallel computing at CEREMADE. You can download the slides of a presentation from November 2017 here: cluster2017.pdf. In addition, a set of "test" examples (the examples presented on this page) can be downloaded: tutoCluster.zip.

Description of the structure

The nodes

The cluster consists of 8 nodes (machines named clust1, clust2, etc.) with the following configurations:

  • clust1: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust2: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust3: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust4: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust5: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust6: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust7: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust8: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

That's a total of 320 CPUs!

For the ERC MDFT project:

  • clust9: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  • clust10: 40 CPU(s), Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

TORQUE and the cluster machine

Submitted calculations are managed by the TORQUE service. A dedicated front-end machine called "cluster" has been set up: you submit your calculation through it, requesting the time and resources you need, and TORQUE schedules it on the nodes.
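
In outline, and reusing the user name and directories from the examples on this page, a typical session looks like the following sketch (each step is detailed in the sections below):

# 1. copy the code from your workstation to the front end
scp -r /home/chupin/pi/ chupin@cluster:~/
# 2. connect to the front end
ssh chupin@cluster
# 3. on the front end, submit the job described by a PBS script
cd ~/pi/
qsub submission.pbs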

Copy files to cluster

An example using scp:

scp -r /home/chupin/pi/ chupin@cluster:~/
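
The same command can be used in the other direction to retrieve results once a job has finished (a sketch, with a hypothetical results/ subdirectory):

# copy a results directory from the front end back to the workstation
scp -r chupin@cluster:~/pi/results/ /home/chupin/pi/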

Connecting to the cluster machine

You connect to the cluster machine with ssh:

ssh username@cluster

Submitting a calculation

To submit a calculation, you must write a PBS file that tells TORQUE the required resources and the commands to execute.

PBS scripts are bash scripts that contain PBS commands as comments.

Building the PBS script

Minimal example of PBS commands

A minimal example of PBS commands is given below. All the options are described at the end of the page.

#!/bin/bash
# Submission.pbs file
#PBS -N calculationPiChupin
#PBS -l walltime=0:01:10
#PBS -l nodes=2:ppn=8
#PBS -M chupin@ceremade.dauphine.fr
#PBS -m e

The walltime

If the walltime is not specified, it is set to 9999 hours. So, when you do not know the execution time of your code a priori, it is better not to specify the walltime in the PBS script. If the code finishes before the time specified by the walltime, the job is removed from the job queue and the reserved resources are freed. On the other hand, if the time specified by the walltime is reached, the job is killed abruptly.
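
For example, to request at most 12 hours of computation time (using the HH:MM:SS format described in the PBS options at the end of the page), one would write:

# request at most 12 hours of computation time
#PBS -l walltime=12:00:00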

Some PBS environment variables

The following PBS script:

#!/bin/bash
#PBS -N TestEnvironment
#PBS -l walltime=4:00:30
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -M chupin@ceremade.dauphine.fr
##PBS -m e
#PBS -o $PBS_JOBNAME$PBS_JOBID.o

### directory management
echo Arrival directory `pwd`
# directory from which the job was submitted
echo Working directory $PBS_O_WORKDIR
# we move to that directory
cd $PBS_O_WORKDIR

### Some information that may be useful
echo Host `hostname`
echo The list of CPUs
hostfile=`cat $PBS_NODEFILE`
echo $hostfile

### Total number of CPUs
NPROCS=`wc -l < $PBS_NODEFILE`
echo It has been allocated $NPROCS cpus

### Setting the environment variable for OpenMP
# $PBS_NUM_PPN is the number of CPUs per requested node
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS
echo This job has $OMP_NUM_THREADS cpus

will produce the following result:

Arrival directory /home/users/chupin
Working directory /home/users/chupin/tutoCluster/PBSWORKDIR
Host clust9
The list of CPUs
clust9 clust9 clust9 clust9 clust9 clust9 clust9 clust9
It has been allocated 8 cpus
This job has 8 cpus

Execution of the calculation program

To launch a job from the cluster machine, use:

chupin@cluster:~/pi/> qsub submission.pbs

Other TORQUE tools to view, cancel, stop, etc. a job are described at the end of the page.

Full examples

Example 1: Example using MPI

Suppose we have an MPI code pi2.c, compiled with mpic++. We place the pi2 executable on the cluster machine (via scp, for example), along with the following file (here named submission.pbs):

#!/bin/bash
# Submission.pbs file
#PBS -N calculationPiChupin
#PBS -l walltime=0:01:10
#PBS -l nodes=2:ppn=8

# For OpenMP export 
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

# we move to the TORQUE directory
cd $PBS_O_WORKDIR
# we execute the pi2 program
mpiexec -machinefile $PBS_NODEFILE -mca btl ^openib --mca btl_tcp_if_include bond0 pi2

This is a bash script whose comments starting with #PBS are commands for TORQUE (PBS). Here, the job name is calculationPiChupin, we ask for at most 1 minute and 10 seconds of computation time (the walltime), and 16 CPUs: 2 "nodes" with 8 processors each (it is TORQUE which manages the choice of machines and CPUs).

Once these files are on the "cluster" machine, in a directory in its home, we submit the job using the following command:

chupin@cluster:~/pi/> qsub submission.pbs

Example 2: Example using Matlab

Matlab is installed on all nodes of the cluster, so it can be used there. Suppose that our working directory contains a script script.m that we want to run.

#!/bin/bash
# Submission.pbs file
#PBS -N calculationPiChupin
#PBS -l walltime=1:0:0:0
#PBS -l nodes=1:ppn=20

# For OpenMP export 
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

# we move to the TORQUE directory
cd $PBS_O_WORKDIR
# we run the matlab program but without a graphical interface
matlab -nodisplay -nodesktop -r "run('script.m'); exit"

This is a bash script whose comments starting with #PBS are commands for TORQUE (PBS). Here, the job name is calculationPiChupin, and the walltime sets the maximum requested computation time.

Warning: here we request 20 CPUs on a single "node" (it is TORQUE which manages the choice of machines and CPUs). To exploit several of the 40 threads available on a node, Matlab itself must do the multithreading, and it is run without a graphical interface.

Warning: on some accounts, the matlab command is not directly accessible, and you have to specify the full path of the executable:

/usr/local/bin/matlab -nodisplay -nodesktop -r "run('script.m'); exit"

Once these files are on the "cluster" machine, in a directory in its home, we submit the job using the following command:

chupin@cluster:~/codematlab/> qsub submission.pbs

Example 3: Example using OpenMP

Let's consider a C++ code compute_pi.cpp that uses the omp.h header and thus OpenMP directives for the compiler.

Such a code must be compiled as follows:

g++ -o compute_pi -fopenmp compute_pi.cpp 

Once this is done, we create a PBS script that can look like:

#!/bin/bash
#PBS -N calculationPiOpenMP
#PBS -l walltime=0:01:10
#PBS -l nodes=1:ppn=24
#PBS -j oe
#PBS -M chupin@ceremade.dauphine.fr
#PBS -m e
#PBS -o pi.o$PBS_JOBID

# For OpenMP export
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

# we move to the TORQUE directory
cd $PBS_O_WORKDIR
# execute the program
./compute_pi

Once these files are on the "cluster" machine, in a directory in its home, we submit the job using the following command:

chupin@cluster:~/pi/> qsub submission.pbs

Example 4: Example using Python

Consider a Python script. numpy and scipy use OpenMP to parallelize some functions.

A PBS script to run this script on the computing cluster may look like:

#!/bin/bash
# Submission.pbs file
#PBS -N ExPython
#PBS -l walltime=12:01:10
#PBS -l nodes=1:ppn=8

# For OpenMP export 
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

# we move to the TORQUE directory
cd $PBS_O_WORKDIR
python3 script.py

Once these files are on the "cluster" machine, in a directory in its home, we submit the job using the following command:

chupin@cluster:~/pythonEx/> qsub submission.pbs

PBS options

Option Description
#PBS -N <name> Sets the name of the "job" to <name> instead of the name of the submission script.
#PBS -l nodes=<value>:ppn=<value> Sets the number of nodes (nodes) and the number of processors per node (ppn). The nodes can also be specified by name, which lets you deal with a heterogeneous architecture like ours: #PBS -l nodes=clust1:ppn=24+clust9:ppn=24+clust4:ppn=8
#PBS -l walltime=<time> Maximum requested time. <time> must be in the format HH:MM:SS.
#PBS -o <filename> Redirects standard output to the file <filename> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS containing the "job" id.
#PBS -e <filename> Redirects error output to the file <filename> instead of <job script>.e$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS containing the "job" id.
#PBS -j {oe,eo} Merges standard output and error output into the error file (eo) or the standard output file (oe).
#PBS -m {a,b,e} Sends an email when the job is aborted (option a, abort), when the job starts (option b, begin), and when the job finishes (option e, end).
#PBS -M <email> Specifies the email address to which notifications are sent.
#PBS -S <shell> Specifies the shell used to interpret the submission script.
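
As an illustration, a header combining several of these options might look like the following sketch (the job name, output file, and email address are placeholders, not values from this page):

#!/bin/bash
#PBS -N myJob
#PBS -l nodes=1:ppn=8
#PBS -l walltime=2:00:00
#PBS -j oe
#PBS -o myJob.o$PBS_JOBID
#PBS -m e
#PBS -M user@ceremade.dauphine.fr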

Note

You can specify exactly which nodes you want to use, with a different number of processors on each, with the option:

#PBS -l nodes=1:ppn=5:clust8

This explicitly selects the clust8 node, with 5 threads on that node. Of course, this is not advisable: it is normally TORQUE that manages the job distribution.

PBS environment variables

Variable Name Description Example
PBS_JOBID The job identifier (calculation) 12345.cluster
PBS_JOBNAME Job name defined with -N option my_job
PBS_NODEFILE Name of a file created by TORQUE containing the list of nodes used. Each node appears as many times as the number of threads (cores) requested with the ppn option. /var/spool/pbs/aux/12345.cluster
PBS_O_HOST Name of the host on which qsub was run (here, the front end) cluster
PBS_O_WORKDIR Directory from which the job is submitted /home/user/chupin/scripts_pbs
PBS_SERVER Name of the machine on which the TORQUE server runs cluster
PBS_VERSION PBS version TORQUE-2.5.3
PBS_NUM_NODES Number of nodes requested for the job (with -l nodes=20:ppn=8, this is 20) 20
PBS_NUM_PPN Number of threads (cores) per node requested for the job (with -l nodes=20:ppn=8, this is 8) 8
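
As a small sketch (not taken from the original examples), these variables can be combined inside a PBS script, for instance to record where and how a job ran:

# inside a PBS script: move to the submission directory and log some job information
cd $PBS_O_WORKDIR
echo "Job $PBS_JOBNAME ($PBS_JOBID), submitted from $PBS_O_HOST: $PBS_NUM_NODES node(s), $PBS_NUM_PPN core(s) per node" > info_$PBS_JOBID.txt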

Manage jobs

To list the calculations launched on the cluster, we use the pbstop program:

chupin@cluster:~/pi/> pbstop

that produces something like:

Usage Totals: 0/320 Procs, 0/8 Nodes, 0/1 Running Jobs
Node States: 8 free

 CPU 0
        1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
        ------------------------------------------------------------
clust1  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
clust1  . . . . . . . . . .
        ------------------------------------------------------------

(the same display is repeated for clust2 through clust8)

  Job#  Username  Queue    Jobname    Nodes   S  Elapsed/Requested
  150   chupin    jobq     PythonDIIS     1   Q       --/   04:00

 [?] unknown  [@] busy  [*] down  [.] idle  [%] offline  [!] other

Other programs

Other programs are available to manage calculations launched with qsub. In particular (usage examples are given after the list):

  • qsub, which submits a calculation.

  • qstat, which shows the status of a job. The ID given in the Job# column of pbstop must be provided:

    chupin@cluster:~/pi/> qstat 150
  • qhold, which puts a job on hold. The ID given in the Job# column of pbstop must be provided.

  • qrls, which releases a held job. The ID given in the Job# column of pbstop must be provided.

  • qdel, which deletes a job. The ID given in the Job# column of pbstop must be provided.
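
For example, taking the job id 150 shown in the pbstop output above, typical usage looks like this:

qstat 150    # show the status of job 150
qhold 150    # put job 150 on hold
qrls 150     # release the held job 150
qdel 150     # delete (cancel) job 150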