Main website for learning SLURM
http://slurm.schedmd.com/tutorials.html
Submit a job with a name and output-file name (this overrides the corresponding parameters in the shell script header):
sbatch -J job1 -o job1.out --partition=batch myscript.sh
Basic shell script for a job
#!/bin/sh
#
#SBATCH --job-name=testJob
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=dragon-default
#
# Display all variables set by slurm
env | grep "^SLURM" | sort
#
cd /projects/dragon/FANTOM5/processed_data_feature
## All my commands for job will go here
date
mkdir t1
How to submit a batch job
sbatch myscript.sh
How to check the list of jobs of a user
squeue -u user1
squeue -u user1 -l # shows detailed output
How to check the whole history and status of a job
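No command was given here; sacct (described in the command list below) reports accounting information for active or completed jobs. A sketch, with 1234 and user1 as placeholder values:

```shell
sacct -j 1234                                      # status and history of job 1234
sacct -j 1234 --format=JobID,State,Elapsed,MaxRSS  # select specific fields
sacct -u user1 --starttime=2016-01-01              # all jobs of a user since a date
```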
How to use one particular node in interactive mode (useful when your batch jobs are pending and you need to run something immediately)
srun --pty --time=5:00:00 bash
How to kill a job
- Cancel job 1234 along with all of its steps:
- scancel 1234
- Send SIGKILL to all steps of job 1235, but do not cancel the job itself:
- scancel --signal=KILL 1235
- Send SIGUSR1 to the batch shell processes of job 1236:
- scancel --signal=USR1 --batch 1236
- Cancel all pending jobs belonging to user "bob" in partition "debug":
- scancel --state=PENDING --user=bob --partition=debug
- Cancel only array task ID 4 of job array 1237:
- scancel 1237_4
How to start a node in interactive mode
srun --pty --nodes=1 --exclusive --partition=interactive bash -l
How to start a GUI in cluster
You need an X server to access the GUI.
1. Login to cluster
ssh -Y username@login.cbrc.kaust.edu.sa
2. Start an interactive job, for example to get a whole node in the interactive queue:
srun --pty --nodes=1 --exclusive --partition=interactive bash -l
3. Suppose you want to use MATLAB in GUI mode; then run the following two commands:
module load matlab
matlab &
Full list of SLURM commands [ Source: http://www.tchpc.tcd.ie/node/129 ]
Man pages exist for all SLURM daemons, commands, and API functions. The --help option also provides a brief summary of options. Note that the command options are all case insensitive.
- sacct is used to report job or job step accounting information about active or completed jobs.
- salloc is used to allocate resources for a job in real time. Typically this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.
- sattach is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.
- sbatch is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
- sbcast is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
- scancel is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
- scontrol is the administrative tool used to view and/or modify SLURM state. Note that many scontrol commands can only be executed as user root.
- sinfo reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options.
- smap reports state information for jobs, partitions, and nodes managed by SLURM, but graphically displays the information to reflect network topology.
- squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
- srun is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.
- strigger is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.
- sview is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by SLURM.
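A few common invocations of the commands listed above (the partition, job ID, and node name are illustrative placeholders):

```shell
sinfo                        # state of all partitions and nodes
sinfo -p batch               # only the batch partition
squeue -p batch              # jobs in the batch partition
scontrol show job 1234       # full details of job 1234
scontrol show node node001   # full details of one node
```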
Handling multiple jobs by Job Array:
http://slurm.schedmd.com/job_array.html
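A minimal job-array script as a starting point (the job name, array range, and echoed command are illustrative). Each array task runs the same script with a different value of SLURM_ARRAY_TASK_ID:

```shell
#!/bin/sh
#SBATCH --job-name=arrayJob
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --array=1-10                 # run 10 tasks, indices 1..10
#SBATCH --output=arrayJob_%A_%a.out  # %A = job ID, %a = array task index
#
# Each task sees its own index in SLURM_ARRAY_TASK_ID
echo "Processing chunk ${SLURM_ARRAY_TASK_ID}"
```

Submit it with sbatch as usual; a single task can be cancelled with scancel using the jobid_index form shown in the scancel examples above.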