Job management

Submitting jobs

There are several ways to submit a job on the cluster. We have detailed the most common usage below.

Hint

If you belong to more than one account, you should specify which account to use with the option --account=<account>
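
For example, to submit the batch script myjob (as used in the examples below) under a specific account:

sbatch --account=<account> myjob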

Batch mode

You can submit a batch job in two ways: either by specifying the options directly on the command line, or with the help of a script.

Simple example to launch a batch job using 16 cores on the default partition:

sbatch --ntasks=16 myjob

Hint

The default partition is debug-EL7

Launch a multithreaded job using two threads per task (this allocates a full node in this case):

sbatch --ntasks=8 --cpus-per-task=2 myjob

Hint

The default number of cpus per task is 1

Launch a job specifying the partition (see Partitions and limits) and max time:

sbatch --ntasks=16 --partition=debug-EL7 --time=0-00:05:00 myjob

Hint

Your job has a better chance of being scheduled quickly if you specify an accurate maximum execution time (i.e. not simply the permitted maximum).

Example sbatch script:

#!/bin/sh
#SBATCH -J jobname              # job name
#SBATCH -e jobname-error.e%j    # file for standard error (%j is the job id)
#SBATCH -o jobname-out.o%j      # file for standard output
#SBATCH -n 48                   # number of tasks
#SBATCH -c 1                    # number of CPUs per task
#SBATCH -p parallel-EL7         # partition
#SBATCH -t 48:00:00             # maximum run time (HH:MM:SS)
srun jobname | tee stdout.txt

To submit your job do as follows:

sbatch yoursbatch.sh

You can submit two batch jobs that are dependent. The dependent job will only start when the first one has terminated successfully:

sbatch --ntasks=10 --time=10 pre_process.bash
Submitted batch job nnn
sbatch --ntasks=128 --time=60 --dependency=afterok:nnn do_work.bash
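
If you script the submission, you can capture the job id automatically; a minimal sketch using sbatch's --parsable option (which prints only the job id):

# submit the first job and capture its job id
jid=$(sbatch --parsable --ntasks=10 --time=10 pre_process.bash)
# the second job starts only if the first one ends successfully
sbatch --ntasks=128 --time=60 --dependency=afterok:${jid} do_work.bash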

Monothreaded jobs

When you submit a job on Baobab, the minimum resources allocated to you are a full node, unless you use one of the mono* partitions. In that case, as the node may be shared with other users, it's wise to specify the memory you need per core. If you don't, the default memory allocated to one core is 3GB.

Let's say you want to launch one job that needs 1GB per core:

#!/bin/bash

#SBATCH --partition=mono-shared-EL7
#SBATCH --time=05:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1000 # in MB

srun ./yourprog

Adapt this example to fit your needs. If you need to scale this solution to a larger number of similar tasks, see Job array.

Multithreaded jobs

When you have a program that needs more than one core per task (OpenMP, STATA, etc.), you can adapt the Monothreaded jobs example by adding one line:

#SBATCH --cpus-per-task=x

where x is between 1 and 32. We have only one node with 32 cores; be sure that you really need it, as you may have to wait many days for it to become available.
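
For example, a minimal sketch of the adapted script for an OpenMP program (the program name ./yourprog and the value of 4 threads are placeholders):

#!/bin/bash

#SBATCH --partition=mono-shared-EL7
#SBATCH --time=05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4      # x = 4 cores for this single task
#SBATCH --mem-per-cpu=1000     # in MB

# tell the OpenMP runtime how many threads it may use
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./yourprog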

If you want to use all the cores of a node but you don't know the characteristics of the node in advance, you can use this sbatch script:

#!/bin/sh
#SBATCH --job-name=test
#SBATCH --time=00:15:00
#SBATCH --partition=shared-EL7
#SBATCH --output=slurm-%J.out
#SBATCH --exclusive
#SBATCH --ntasks=1

# We want to have a node with minimum
# 2 sockets, 4 cores per socket and 1 thread per core
# nb_cores = 2 * 4 * 1 = 8 cores
# if you want more cores, you can increase the number of cores per socket to
# 6 or 8 to have 12 or 16 cores.
#SBATCH --extra-node-info=2:4:1

# We want to have at least 12GB of RAM on this node
#SBATCH --mem=12000

# run one task which uses all the CPUs of the node
srun --cpu_bind=mask_cpu:0xffffffff ./mySoft

Interactive

If you need to do some debugging or testing, you can start an interactive session on a compute node. For example, let's say you want to start a session on the debug-EL7 partition for a duration of 15 minutes, using all the CPUs:

salloc -n1 -c 16 --partition=debug-EL7 --time=15:00 srun -n1 -N1 --pty $SHELL

The login prompt changes to reflect that you are now on a compute node:

[sagon@nodexxx ~]$

When done, you can stop the session like this:

exit

Job array

SLURM natively supports the notion of job arrays. A job array is useful when you have many similar jobs to launch and just want to give a different parameter to each job.

Here is an example for launching a monothreaded job n times:

#!/bin/bash

#SBATCH --partition=mono-shared-EL7
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1000 # in MB
#SBATCH -o myjob-%A_%a.out

srun echo "I'm task_id " ${SLURM_ARRAY_TASK_ID} " on node " $(hostname)

When you launch it, you may specify how many cores ONE instance of your array needs with --ntasks if you are using MPI, or --cpus-per-task otherwise. You must specify the array size and offset by giving the start and stop of the array index. The maximum size of an array is currently set to 1000.

Example to launch an array of 100 jobs, each one using one core:

sbatch --array=1-100 mybatch.sh
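
As an illustration of passing a different parameter to each array task, here is a sketch adapted from the script above, where the array index selects an input file (the program ./yourprog and the file names input_<N>.dat are hypothetical):

#!/bin/bash

#SBATCH --partition=mono-shared-EL7
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1000 # in MB
#SBATCH -o myjob-%A_%a.out

# each array task processes a different (hypothetical) input file
srun ./yourprog input_${SLURM_ARRAY_TASK_ID}.dat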

Master/Slave

You can run different job steps within a single allocation. This may be useful, for example, for a master/slave program.

Example if you want to launch a master program on core 0 and four slave tasks on cores 1-4:

cat master.conf
#TaskID Program Arguments
0 masterprogram
1-4 slaveprogram --rank=%o

srun --ntasks=5 --multi-prog master.conf

Use %t and %o to obtain task id and offset.
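
As an illustration (a sketch with hypothetical program names), a configuration file using both placeholders:

#TaskID Program Arguments
0 ./controller
1-4 ./worker --task=%t --offset=%o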

Monitor job

If you want to know at what time your job could theoretically start, you can proceed like this:

srun --test-only -n100 -p parallel-EL7 -t 10 hostname

To see the queue:

squeue
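
To list only your own jobs, you can filter the queue by user:

squeue -u $USER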

To see when your job is scheduled to start (it could start earlier):

scontrol show jobid 5543

Notification of job events

You can receive an email at the address used for Account registration when a given event occurs during your job's life:

--mail-type=<type> BEGIN, END, FAIL, REQUEUE, and ALL (any state change).

This option can be used with sbatch and srun.
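
For example, to be notified when the job ends or fails (myjob being a batch script as in the examples above):

sbatch --mail-type=END,FAIL myjob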

If your job gets killed, one of the reasons could be that you used too much memory. To check whether this is the case, you can have a look at dmesg:

dmesg
Memory cgroup out of memory: Kill process 62310 (cavity3d) score 69 or sacrifice child

Memory and CPU usage

While your job is still running, you can see how much memory/CPU it is using with sstat.

sstat --format=AveCPU,MaxRSS,JobID,NodeList -j <yourjobid>

If your job is no longer running, you can use sacct to see statistics about it.

sacct --format=Start,AveCPU,State,MaxRSS,JobID,NodeList,ReqMem --units=G -j <yourjobid>

If you want other information, please see the sacct manpage.

Report and statistics with sreport

To get reporting about your past jobs, you can use sreport (https://slurm.schedmd.com/sreport.html).

Here are some examples that can give you a starting point:

To get the number of jobs run by you ($USER) in 2018 (dates in yyyy-mm-dd format):

[brero@login2 ~]$ sreport job sizesbyaccount user=$USER PrintJobCount start=2018-01-01 end=2019-01-01

--------------------------------------------------------------------------------
Job Sizes 2018-01-01T00:00:00 - 2018-12-31T23:59:59 (31536000 secs)
Units are in number of jobs ran
--------------------------------------------------------------------------------
  Cluster   Account     0-49 CPUs   50-249 CPUs  250-499 CPUs  500-999 CPUs  >= 1000 CPUs % of cluster
--------- --------- ------------- ------------- ------------- ------------- ------------- ------------
   baobab      root           180            40             4            15             0      100.00%

You can see how many jobs were run (grouped by allocated CPUs). You can also see that I had to specify an extra day for the end date (end=2019-01-01) in order to cover the whole year (Job Sizes 2018-01-01T00:00:00 - 2018-12-31T23:59:59).

You can also check how much CPU time (in seconds) you have used on the cluster since 2019-09-01:

[brero@login2 ~]$ sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 -t Seconds
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2019-09-01T00:00:00 - 2019-09-09T23:59:59 (64800 secs)
Usage reported in CPU Seconds
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name     Used   Energy
--------- --------------- --------- --------------- -------- --------
   baobab        rossigno     brero   BRERO Massimo     1159        0

In this example, I added the -t parameter to specify that I want the time in Seconds, but I can also ask for Minutes or Hours.

Please note:
  • By default, the time is in CPU Minutes
  • It can take up to an hour for Slurm to update this information in its database
  • If you specify neither a start nor an end date, yesterday's date will be used.
  • The CPU time is the time that was allocated to you; it doesn't matter whether the CPUs were actually used or not. So let's say you ask for a 15-minute allocation, do nothing for 3 minutes, then run 1 CPU at 100% for 4 minutes and exit the allocation: 7 minutes will be added to your CPU time.

Tip: If you absolutely need a report that includes your jobs that ran today, you can override the default end date by forcing tomorrow's date:

sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 end=$(date --date="tomorrow" +%Y-%m-%d) -t seconds

Memory

When you submit a job, the usable memory you have is 3GB per core. If you are running a job which requires more memory per core, you can specify it like this:

--mem-per-cpu=1000 # in MB

If you have requested a full node, you still need to specify how much memory you need:

--mem=60000 # 60GB

This is the case even if you request a partition such as bigmem!

You can also specify a value of 0 to request all the memory of the node:

--mem=0
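
Putting this together, a minimal sketch of a script that requests a full node and all of its memory (the partition and the program ./yourprog are placeholders taken from the examples above):

#!/bin/bash

#SBATCH --partition=shared-EL7
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --exclusive    # request a full node
#SBATCH --mem=0        # request all the memory of the node

srun ./yourprog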

Job priority

The priority is determined by various factors such as usage history, job age and job size.

The scheduler uses backfill to maximize cluster usage. To benefit from backfill allocation, you can specify a minimum and a maximum execution time when you submit a job. If the resources for the maximum execution time are not available, the scheduler will try to lower the execution time until it reaches the minimum execution time. In the following example, we submit a job specifying that we ideally want a two-day execution time, with a one-day minimum:

srun --ntasks=128 --time-min=1-00:00:00 --time=2-00:00:00 --partition parallel-EL7 myjob

Attention

Be sure that you have some kind of checkpointing activated, as your job may be terminated anywhere between the minimum and maximum execution time.

Cancel a job

If you made a mistake, you can cancel a pending job or a running job (only yours). If you want to cancel a particular job, you need to know its job id (you can see it on the web interface or using squeue).

Cancel a job using its job id:

scancel jobid

Cancel a job and send a custom signal (for example to force the binary to generate a checkpoint):

scancel --signal=signal_name jobid

Cancel all the jobs belonging to one user that are in state pending:

scancel --user=meyerx --state=pending