Submitting Jobs via Slurm
There are two ways you can submit jobs to a cluster: by using workflows or through any terminal or command-line interface. For the workflows option, please see Running Workflows.
After you’ve started a cluster, log in to the controller with your preferred method. The quickest way to submit a job is to transfer your file(s) to the cluster, then run the command sbatch.
In this example, we submitted the file demo_test1.sbatch with sbatch:
[demo@democluster-60 ~]$ ls
demo_test1.sbatch
[demo@democluster-60 ~]$ sbatch demo_test1.sbatch
Submitted batch job 2
After submitting a job, you can watch its progress with the command watch squeue, which will update every two seconds with the job's status in the ST column:
Every 2.0s: squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 test.part test demo CF 0:08 2 demo-democluster-00060-1-[0001-0002]
You can also use watch 'sinfo;echo;squeue' if you want to see general cluster information in addition to your job's progress:
Every 2.0s: sinfo; echo; squeue
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test.partition1* up infinite 2 mix# demo-democluster-00060-1-[0001-0002]
test.partition1* up infinite 3 idle~ demo-democluster-00060-1-[0003-0005]
test.partition2 up infinite 5 idle~ demo-democluster-00060-2-[0001-0005]
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 test.part test demo CF 0:26 2 demo-democluster-00060-1-[0001-0002]
When using watch squeue or watch 'sinfo;echo;squeue', the ST column will show CF while the node(s) configure. All of the rows beneath JOBID will clear when your job is finished:
Every 2.0s: sinfo; echo; squeue
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test.partition1* up infinite 2 idle% demo-democluster-00060-1-[0001-0002]
test.partition1* up infinite 3 idle~ demo-democluster-00060-1-[0003-0005]
test.partition2 up infinite 5 idle~ demo-democluster-00060-2-[0001-0005]
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Once the job is finished, you can check its output with cat followed by the file name. Our file demo_test1.sbatch included instructions to send our completed job's data to an std.out file and any errors to an std.err file:
[demo@democluster-60 ~]$ ls
demo_test1.sbatch std.err std.out
[demo@democluster-60 ~]$ cat std.err
[demo@democluster-60 ~]$ cat std.out
demo-democluster-00060-1-0001
demo-democluster-00060-1-0001
demo-democluster-00060-1-0002
demo-democluster-00060-1-0002
Using cat std.err didn’t return anything because the job executed without errors.
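The contents of demo_test1.sbatch aren't shown above. As a minimal sketch (not the actual file), a script that produces output like this could request two nodes with two tasks per node and redirect output and errors to std.out and std.err:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --output=std.out
#SBATCH --error=std.err

# Print the hostname of the node running each task;
# with 2 nodes x 2 tasks per node, std.out ends up with four lines
srun hostname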
Common Slurm Commands
This section gives a quick overview of the commands you’ll use most often when interacting with clusters. You can use any of these commands in any terminal after logging in to a controller node.
Because ACTIVATE uses Slurm to manage jobs, you can use any of Slurm's system commands. For an extensive list of those options, see Slurm's command guide. You can also enter man in front of any command (such as man sacct) to see its description and a list of other available commands in Slurm's manual pages.
When we say “job ID” in this section, we mean the job ID that Slurm assigns to your work, which will appear when you run many of these commands. ID numbers in the Workflow Monitor and the jobs folder on the ACTIVATE platform act as a separate identifier that helps us track how many jobs we’ve ever run on the platform.
Using any of the job-submission commands in this section (salloc, sbatch, or srun) will generate a new Slurm job ID.
Fault tolerance describes how well an infrastructure remains functional or online even when there are service disruptions caused by outages or natural disasters.
On ACTIVATE, cluster deletions are queue-based for fault tolerance.
The cluster startup process has no retries for fault tolerance, but the logs are visible so users can see any problems that occur.
For compute node startup requests, fault tolerance is implemented with retries via Slurm (by default, there is a new startup attempt approximately every 20 minutes).
Job Management
salloc
salloc retrieves resources for your job without executing any tasks.
Using this command retrieves resources before you need them by signaling the system to reserve a specified number of nodes. For example, salloc -N 2 will reserve two compute nodes, for a total of three nodes, including the controller.
salloc is useful if you’re sharing a cluster with other users in your organization. Once a job finishes, the allocated nodes remain reserved for you until you disconnect from the cluster, so your wait times are shorter: another user can’t take over your allocated nodes, and you won’t have to wait for more nodes to become available or for them to start once they are.
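For instance, a hypothetical session that reserves two compute nodes and runs a command inside the allocation might look something like this (the job ID and node names are illustrative, and some salloc messages are omitted):
[demo@democluster-60 ~]$ salloc -N 2
salloc: Granted job allocation 5
[demo@democluster-60 ~]$ srun hostname
demo-democluster-00060-1-0001
demo-democluster-00060-1-0002
[demo@democluster-60 ~]$ exit
salloc: Relinquishing job allocation 5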
sbatch
sbatch submits a job script that will execute later. You can also configure nodes with sbatch by adding these options:
- --ntasks-per-node to specify the number of tasks (CPUs) per node
- -t to specify the maximum amount of time you want these resources to run, in the format 0:0:0 for hours, minutes, and seconds
For example, sbatch --ntasks-per-node=5 -t 3:0:0 demo_test1.sbatch would run the file demo_test1.sbatch and request 5 tasks per node for a maximum run time of 3 hours. (Options must appear before the script name; anything after it is passed to the script as an argument.)
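You can also set the same options inside the batch script itself as #SBATCH directives, the same pattern used by the notification example later on this page. A minimal sketch, assuming the same 5 tasks per node and 3-hour limit:
#!/bin/bash
#SBATCH --ntasks-per-node=5
#SBATCH -t 3:0:0

# Commands below run with the resources requested by the directives above
srun hostname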
srun
srun executes a job script. You can use the same options from salloc and sbatch with srun:
- -N to specify the number of nodes
- --ntasks-per-node to specify the number of tasks (CPUs) per node
- -t to specify the maximum amount of time you want these resources to run
For example, srun -N 1 --pty bash would request 1 compute node and open a pseudoterminal, creating an interactive command-line session.
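A hypothetical interactive session might look like this (the node name follows the earlier examples, and your prompt will vary):
[demo@democluster-60 ~]$ srun -N 1 --pty bash
[demo@demo-democluster-00060-1-0001 ~]$ hostname
demo-democluster-00060-1-0001
[demo@demo-democluster-00060-1-0001 ~]$ exit
[demo@democluster-60 ~]$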
scancel
scancel paired with a job ID ends a pending or running job or job step. For example:
[demo@democluster-60 ~]$ sbatch demo_test1.sbatch
Submitted batch job 6
[demo@democluster-60 ~]$ scancel
scancel: error: No job identification provided
[demo@democluster-60 ~]$ scancel 6
If you cancel a job, it will disappear from your queue.
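If you have several jobs queued, scancel also accepts Slurm's standard --user flag to cancel all of your jobs at once; for example (username is illustrative):
[demo@democluster-60 ~]$ scancel --user=demo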
Cluster Management
sinfo
sinfo shows information about the nodes and partitions you’re using. By default, sinfo displays partition names, availability, time limit, the number of nodes, state, and the node’s ID number (which is displayed as username-democluster-00019-1-[0001-0005]).
- Please note that if you enter sinfo without setting up partitions, you’ll receive the error message slurm_load_partitions: Unable to contact slurm controller (connect failure).
squeue
squeue shows a list of running and pending jobs. By default, squeue shows the job ID number, partition, username, job status, number of nodes, and node names for all queued and running jobs. You can also use these options to adjust squeue’s output:
- --user to see only one user’s jobs, such as --user=yourPWusername (see the example below)
- --long to show non-abbreviated information and add the timelimit field
- --start to estimate a job’s start time
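For example, a hypothetical call filtered to a single user might return output like this (values are illustrative):
[demo@democluster-60 ~]$ squeue --user=demo
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 test.part test demo R 1:02 2 demo-democluster-00060-1-[0001-0002]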
Notification Management
This section applies only to cloud clusters, not on-premises clusters.
By default, cloud clusters will send job start/finish notifications to ACTIVATE. You can change that setting or add it as an email notification by following the steps in Managing Notifications.
To enable additional job status notifications, you can also pair the flag --mail-type with the commands salloc, sbatch, or srun. For example, the command sbatch --mail-type=FAIL exampleScript.sbatch will send a notification if your job fails to start or complete.
You can add multiple notification events to the --mail-type flag at once and separate them with commas: sbatch --mail-type=BEGIN,END exampleScript.sbatch
Alternatively, you can add tags inside a Slurm batch file, as seen in this example:
#!/bin/bash
#SBATCH --mail-type=BEGIN,END
echo "Hello, World!"
Both methods above work equally well. The primary difference is that passing the flag and notification event(s) on the command line will override any settings inside your batch script, but will not write anything into the file.
Notification Events
The table below lists the events currently supported by the --mail-type flag.
| Type | Notification Event |
|---|---|
| ALL | equivalent to BEGIN,END,FAIL,INVALID_DEPEND,REQUEUE,STAGE_OUT |
| NONE | does not send notifications; this is the default |
| BEGIN | job start |
| END | job end |
| FAIL | job failure |
| REQUEUE | job is requeued |
| INVALID_DEPEND | a job’s dependency cannot be satisfied, so the job will not run |
| STAGE_OUT | when a job has completed or been cancelled, but has not yet released its resources |
| TIME_LIMIT_50 | when a job reaches 50% of its walltime* limit |
| TIME_LIMIT_80 | when a job reaches 80% of its walltime* limit |
| TIME_LIMIT_90 | when a job reaches 90% of its walltime* limit |
| TIME_LIMIT | when a job reaches its walltime* limit |
| ARRAY_TASKS | sends the other selected notifications for each array task instead of for the array as a whole; without this option, BEGIN, END, and FAIL notify once for the full array rather than once per individual array task |
*The walltime limit is the user-set limit for how long a job can run.
Please note that walltime limits are infinite by default. A walltime limit can be added when starting a job.
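For example, a hypothetical batch script that sets a two-hour walltime limit with -t and asks for notifications as the limit approaches:
#!/bin/bash
#SBATCH -t 2:0:0
#SBATCH --mail-type=TIME_LIMIT_80,TIME_LIMIT

echo "Hello, World!"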
Troubleshooting
sacct
sacct shows a summary of users as well as completed and running jobs. Using this command will display a table with a job’s ID number, name, partition, status, exit code, whose account it’s running on, and how many CPUs it’s using.
For troubleshooting purposes, the State and ExitCode fields from running sacct are especially useful for determining whether a node has failed and, if so, why. If you reach out to us for help, one of our support engineers may ask you for the information you see after running sacct.
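A hypothetical sacct listing might look like the following (column widths and values are illustrative); the State and ExitCode columns are usually the first things to check:
[demo@democluster-60 ~]$ sacct
JobID    JobName    Partition   Account  AllocCPUS      State ExitCode
-------- ---------- ----------- -------- ---------- ---------- --------
2        demo_test+ test.part+  demo              4  COMPLETED      0:0
6        demo_test+ test.part+  demo              4 CANCELLED+      0:0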
scontrol
scontrol can delegate commands to specific job IDs and nodes. Please note that many scontrol commands can only be executed as user root. You can use these commands with a job ID to adjust scontrol’s output:
- suspend to pause a job's processes
- resume to continue a job's processes
- hold to make a job a lower priority, putting it “on hold” so higher-priority jobs will run first
- release to remove a job from the hold list
- show job to get detailed information about a job
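For example, a hypothetical sequence that holds, releases, and then inspects a pending job (job ID 7 is illustrative):
[demo@democluster-60 ~]$ scontrol hold 7
[demo@democluster-60 ~]$ scontrol release 7
[demo@democluster-60 ~]$ scontrol show job 7
The final command prints the job's full record, including its state, requested resources, node list, and time limit.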