Wednesday, January 7, 2015

SLURM tutorial : Basic commands


Main website for learning SLURM


http://slurm.schedmd.com/tutorials.html

Submit a job with a name and an output file name (command-line options override the matching #SBATCH parameters in the script header)

sbatch -J job1 -o job1.out --partition=batch myscript.sh
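When submissions are scripted, it helps to capture the job ID that sbatch prints. A minimal sketch, assuming sbatch's --parsable option (which prints only the job ID, followed by ;cluster on multi-cluster setups). The helper name submit_job is made up for this example.

```shell
# submit_job NAME SCRIPT [extra sbatch options...]
# Hypothetical helper: submits SCRIPT under job name NAME, sends output to
# NAME.out, and prints only the job ID.
submit_job() {
    local name="$1" script="$2"
    shift 2
    local out
    # --parsable makes sbatch print "jobid" or "jobid;cluster"
    out=$(sbatch --parsable -J "$name" -o "${name}.out" "$@" "$script") || return 1
    # Strip the optional ";cluster" suffix
    echo "${out%%;*}"
}
```

Example use: jobid=$(submit_job job1 myscript.sh --partition=batch)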

 

Basic shell script for job

#!/bin/sh
#
#SBATCH --job-name=testJob
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=dragon-default
#
# Display all variables set by slurm
env | grep "^SLURM" | sort

#
cd /projects/dragon/FANTOM5/processed_data_feature

## All commands for the job go here

# Print the start time, then create a working directory
date
mkdir -p t1

How to submit a batch job

sbatch myscript.sh

How to check the list of jobs of a user

squeue -u user1
squeue -u user1 -l  # long format, shows more details
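For a quick summary instead of the full list, the squeue output can be counted per state. A small sketch, assuming the -h (no header) and -o "%T" (job state only) options of squeue. The function name jobs_by_state is hypothetical.

```shell
# jobs_by_state USER
# Prints one line per job state with the number of jobs in that state.
jobs_by_state() {
    # -h drops the header line; -o "%T" prints only the state (e.g. RUNNING)
    squeue -h -u "$1" -o "%T" | sort | uniq -c
}
```

Example use: jobs_by_state user1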
 

How to check the full details and status of a job

scontrol show job=JOBID
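scontrol generally only knows about jobs that are queued, running, or recently finished. For older jobs, sacct reads the accounting records, provided accounting is enabled on your cluster. A minimal sketch of both; the function names are made up for illustration.

```shell
# job_state JOBID: print just the JobState field from scontrol's output
job_state() {
    # scontrol prints space-separated Key=Value pairs; put one pair per line
    scontrol show job "$1" | tr ' ' '\n' | sed -n 's/^JobState=//p'
}

# job_history JOBID: accounting summary for a running or finished job
job_history() {
    sacct -j "$1" --format=JobID,JobName,State,Elapsed,ExitCode
}
```

Example use: job_state 1234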

 

How to get an interactive shell on a compute node. Useful when all your batch jobs are pending and you need to run something right away



srun --pty --time=5:00:00 bash


How to kill a job


  •  Cancel job 1234 along with all of its steps:
    •               scancel 1234
  •  Send SIGKILL to all steps of job 1235, but do not cancel the job itself:
    •               scancel --signal=KILL 1235
  •  Send SIGUSR1 to the batch shell processes of job 1236:
    •               scancel --signal=USR1 --batch 1236
  •  Cancel all pending jobs belonging to user "bob" in partition "debug":
    •               scancel --state=PENDING --user=bob --partition=debug
  •  Cancel only array ID 4 of job array 1237:
    •               scancel 1237_4
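Before a bulk cancel it is worth listing what will be removed. A small sketch that combines squeue and scancel with the same filters; the function name cancel_pending is hypothetical, and %i and %j are the squeue format codes for job ID and job name.

```shell
# cancel_pending USER PARTITION
# Lists the user's pending jobs in the partition, then cancels them.
cancel_pending() {
    echo "Pending jobs for $1 in partition $2:"
    squeue -h -u "$1" -p "$2" -t PENDING -o "%i %j"
    scancel --state=PENDING --user="$1" --partition="$2"
}
```

Example use: cancel_pending bob debug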

How to start a node in interactive mode


srun --pty --nodes=1 --exclusive --partition=interactive bash -l

How to start a GUI on the cluster


You need an X server running on your local machine to access the GUI.

1. Log in to the cluster

 ssh -Y username@login.cbrc.kaust.edu.sa

2. Start an interactive job, for example to get a whole node in the interactive queue:

srun --pty --nodes=1 --exclusive --partition=interactive bash -l

3. To use the MATLAB GUI, for example, run the following two commands:

module load matlab
matlab &
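If the GUI does not open, a common cause is that X11 forwarding did not carry through to your shell. A small check, assuming the forwarded display is exposed through the standard DISPLAY environment variable; the function name have_display is made up for this example.

```shell
# have_display: succeed if an X display appears to be available
have_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set; reconnect with ssh -Y and check your local X server" >&2
        return 1
    fi
}
```

Example use: run have_display before starting matlab.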

Full list of SLURM commands [ Source: http://www.tchpc.tcd.ie/node/129 ]

Man pages exist for all SLURM daemons, commands, and API functions. The command option --help also provides a brief summary of options. Note that the command options are all case sensitive.
  • sacct is used to report job or job step accounting information about active or completed jobs.
  • salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
  • sattach is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
  • sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
  • sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
  • scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
  • scontrol is the administrative tool used to view and/or modify SLURM state. Note that many scontrol commands can only be executed as user root.
  • sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
  • smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.
  • squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
  • srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
  • strigger is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
  • sview is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by SLURM.

Handling multiple jobs with a job array:


 http://slurm.schedmd.com/job_array.html
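
A minimal sketch of a job array script, assuming ten input files named input_1.txt to input_10.txt (hypothetical names). Each array task receives its index in SLURM_ARRAY_TASK_ID. The default of 1 is there only so the script can be tried outside SLURM. In the output file name, %A is the array's job ID and %a is the task index.

```shell
#!/bin/sh
#SBATCH --job-name=arrayJob
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --array=1-10
#SBATCH --output=arrayJob_%A_%a.out

# input_for_task ID: map an array index to its (hypothetical) input file
input_for_task() {
    echo "input_$1.txt"
}

# Fall back to index 1 when not running under SLURM
task_id="${SLURM_ARRAY_TASK_ID:-1}"
infile=$(input_for_task "$task_id")
echo "Task $task_id will process $infile"
```

Submit it with sbatch as usual. A single task can then be cancelled with scancel JOBID_INDEX, for example scancel 1237_4.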