Submitting Jobs via Slurm
There are two ways you can submit jobs to a cluster: by using workflows or through any terminal or command-line interface. For the workflows option, please see Running Workflows.
After you’ve started a cluster, log in to the controller with your preferred method. The quickest way to submit a job is to transfer your file(s) to the cluster, then run the command sbatch.
In this example, we submitted the file demo_test1.sbatch with sbatch:
[demo@democluster-60 ~]$ ls
demo_test1.sbatch
[demo@democluster-60 ~]$ sbatch demo_test1.sbatch
Submitted batch job 2
After submitting a job, you can watch its progress with the command watch squeue, which will update every two seconds with the job’s status in the ST column:
Every 2.0s: squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 test.part test demo CF 0:08 2 demo-democluster-00060-1-[0001-0002]
You can also use watch 'sinfo;echo;squeue' if you want to see general cluster information in addition to your job’s progress:
Every 2.0s: sinfo; echo; squeue
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test.partition1* up infinite 2 mix# demo-democluster-00060-1-[0001-0002]
test.partition1* up infinite 3 idle~ demo-democluster-00060-1-[0003-0005]
test.partition2 up infinite 5 idle~ demo-democluster-00060-2-[0001-0005]
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4 test.part test demo CF 0:26 2 demo-democluster-00060-1-[0001-0002]
When using watch squeue or watch 'sinfo;echo;squeue', the ST column will show CF while the node(s) configure. All of the rows beneath JOBID will clear when your job is finished:
Every 2.0s: sinfo; echo; squeue
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test.partition1* up infinite 2 idle% demo-democluster-00060-1-[0001-0002]
test.partition1* up infinite 3 idle~ demo-democluster-00060-1-[0003-0005]
test.partition2 up infinite 5 idle~ demo-democluster-00060-2-[0001-0005]
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Once the job is finished, you can check its output with cat file_name. Our file demo_test1.sbatch included instructions to send our completed job’s data to an std.out file and any errors to an std.err file:
[demo@democluster-60 ~]$ ls
demo_test1.sbatch std.err std.out
[demo@democluster-60 ~]$ cat std.err
[demo@democluster-60 ~]$ cat std.out
demo-democluster-00060-1-0001
demo-democluster-00060-1-0001
demo-democluster-00060-1-0002
demo-democluster-00060-1-0002
Using cat std.err didn’t return anything because the job executed without errors.
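For reference, the exact contents of demo_test1.sbatch aren’t shown in this guide, but a batch script that produces output like the one above might look something like this minimal sketch (the node and task counts are assumptions based on the sample output):
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --output=std.out
#SBATCH --error=std.err
# Print the hostname of each allocated node, once per task
srun hostname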
Common Slurm Commands
This section gives a quick overview of the commands you’ll use most often when interacting with clusters. You can use any of these commands in any terminal after logging in to a controller node.
Because ACTIVATE uses Slurm to manage jobs, you can use any of Slurm’s standard commands. For an extensive list of those options, see Slurm’s command guide. You can also enter man in front of any command (such as man sacct) to see its description and a list of other available commands in Slurm’s virtual manual.
When we say “job ID” in this section, we mean the job ID that Slurm assigns to your work, which will appear when running many of these commands. ID numbers in the Workflow Monitor and the jobs folder on the ACTIVATE platform act as a separate identifier that helps track how many jobs we’ve ever run on the platform.
Submitting work with any of the job management commands in this section (salloc, sbatch, or srun) will generate a new Slurm job ID.
Fault tolerance is defined as how well an infrastructure remains functional or online when there are service disruptions, such as outages or natural disasters.
On ACTIVATE, cluster deletions are queue-based for fault tolerance.
The cluster startup process has no retries for fault tolerance, but the logs are visible so users can see any problems that occur.
For compute node startup requests, fault tolerance is implemented with retries via Slurm (by default, there is a new startup attempt approximately every 20 minutes).
Job Management
salloc
salloc allocates resources for your job without executing any tasks.
Using this command reserves a specified number of nodes before you need them. For example, salloc -N 2 will reserve two compute nodes, for a total of three nodes, including the controller.
salloc is useful if you’re sharing a cluster with other users in your organization. Once a job is finished, the allocated nodes remain reserved for your use until you disconnect from the cluster, which keeps your wait times short: another user cannot take control of your allocated nodes, so you won’t have to wait for more nodes to become available or for them to start once they’re available.
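As a rough sketch of how an salloc session might look (the job number and node names below are illustrative, not captured from a real cluster):
[demo@democluster-60 ~]$ salloc -N 2
salloc: Granted job allocation 5
[demo@democluster-60 ~]$ srun hostname
demo-democluster-00060-1-0001
demo-democluster-00060-1-0002
[demo@democluster-60 ~]$ exit
salloc: Relinquishing job allocation 5
Exiting the shell releases the allocation and returns the reserved nodes to the pool.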
sbatch
sbatch submits a job script that will execute later. You can also configure nodes with sbatch by adding these options:
- --ntasks-per-node to specify the number of CPUs
- -t to specify the maximum amount of time you want these resources to run, in the format 0:0:0 for hours, minutes, and seconds
For example, sbatch --ntasks-per-node 5 -t 3:0:0 demo_test1.sbatch would run the file demo_test1.sbatch and request 5 CPUs for 3 hours of maximum run time. Note that sbatch options must come before the script name; anything after the script name is passed to the script itself.
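You can also place the same options inside the script itself as #SBATCH directives so you don’t have to retype them on the command line. A minimal sketch (the echo command is just a placeholder for your own workload):
#!/bin/bash
#SBATCH --ntasks-per-node=5
#SBATCH -t 3:0:0
# Replace this line with the commands you want to run on the compute node(s)
echo "Starting job on $(hostname)"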
srun
srun executes a job script. You can use the same options from salloc and sbatch with srun:
- -N to specify the number of nodes
- --ntasks-per-node to specify the number of CPUs
- -t to specify the maximum amount of time you want these resources to run
For example, srun -N 1 --pty bash would request 1 compute node and open a pseudoterminal, creating an interactive command-line session.
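As a further illustrative sketch (the node names match the sample cluster above), running a simple command across two nodes might look like this:
[demo@democluster-60 ~]$ srun -N 2 hostname
demo-democluster-00060-1-0001
demo-democluster-00060-1-0002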
scancel
scancel paired with a job ID ends a pending or running job or job step. For example:
[demo@democluster-60 ~]$ sbatch demo_test1.sbatch
Submitted batch job 6
[demo@democluster-60 ~]$ scancel
scancel: error: No job identification provided
[demo@democluster-60 ~]$ scancel 6
If you cancel a job, it will disappear from your queue.
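If you need to cancel several jobs at once, scancel also accepts filters such as --user; for example, the following would cancel all of the demo user’s jobs (substitute your own username):
[demo@democluster-60 ~]$ scancel --user=demo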
Cluster Management
sinfo
sinfo shows information about the nodes and partitions you’re using. By default, sinfo displays partition names, availability, time limit, the number of nodes, state, and the node’s ID number (which is displayed as username-democluster-00019-1-[0001-0005]).
- Please note that if you enter sinfo without setting up partitions, you’ll receive the error message slurm_load_partitions: Unable to contact slurm controller (connect failure).
squeue
squeue shows a list of running and pending jobs. By default, squeue shows the job ID number, partition, username, job status, number of nodes, and node names for all queued and running jobs. You can also use these options to adjust squeue’s output (see the examples after this list):
- --user to see only one user’s jobs, such as --user=yourPWusername
- --long to show non-abbreviated information and add the field timelimit
- --start to estimate a job’s start time
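For example, you might use these flags like so (the username demo matches the sample session above; substitute your own):
[demo@democluster-60 ~]$ squeue --user=demo --long
[demo@democluster-60 ~]$ squeue --start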
Notification Management
This section applies only to cloud clusters, not on-premises clusters.
By default, cloud clusters will send job start/finish notifications to ACTIVATE. You can change that setting or add it as an email notification by following the steps in Managing Notifications.
To enable additional job status notifications, you can also pair the flag --mail-type with the commands salloc, sbatch, or srun. For example, the command sbatch --mail-type=FAIL exampleScript.sbatch will send a notification if your job fails to start or complete.
You can add multiple notification events to the --mail-type flag at once by separating them with commas: sbatch --mail-type=BEGIN,END exampleScript.sbatch
Alternatively, you can add the same settings as #SBATCH directives inside a Slurm batch file, as seen in this example:
#!/bin/bash
#SBATCH --mail-type=BEGIN,END
echo "Hello, World!"
Both methods above work equally well.
The primary difference is that passing the flag and notification event(s) on the command line will override any settings inside your batch script, but will not cause anything to be written into the file.
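For instance, if the script above were saved as exampleScript.sbatch, the command below would send only a FAIL notification for this submission; the BEGIN and END settings inside the file would be overridden, but the file itself would be left unchanged (an illustrative sketch):
[demo@democluster-60 ~]$ sbatch --mail-type=FAIL exampleScript.sbatch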
Notification Events
The table below lists the events currently supported by the --mail-type flag.
Type | Notification Event |
---|---|
ALL | equivalent to BEGIN,END,FAIL,INVALID_DEPEND,REQUEUE,STAGE_OUT |
NONE | does not send notifications; this is the default |
BEGIN | job start |
END | job end |
FAIL | job failure |
REQUEUE | job is requeued |
INVALID_DEPEND | a job’s dependency cannot be satisfied, so the job will not run |
STAGE_OUT | when a job has completed or been cancelled, but has not yet released its resources |
TIME_LIMIT_50 | when a job reaches 50% of its walltime* limit |
TIME_LIMIT_80 | when a job reaches 80% of its walltime* limit |
TIME_LIMIT_90 | when a job reaches 90% of its walltime* limit |
TIME_LIMIT | when a job reaches its walltime* limit |
ARRAY_TASKS | sends the selected notifications for each array task instead of for the array as a whole; without this option, BEGIN, END, and FAIL notifications are only sent once for the full array instead of once per array task |
*The walltime limit is the user-set limit for how long a job can run.
Please note that walltime limits are infinite by default. A walltime limit can be added when starting a job.
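For example, to set a two-hour walltime limit and be notified as the job approaches and reaches it, a submission might look like this (an illustrative sketch reusing the demo file from earlier):
[demo@democluster-60 ~]$ sbatch -t 2:0:0 --mail-type=TIME_LIMIT_80,TIME_LIMIT demo_test1.sbatch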
Troubleshooting
sacct
sacct shows a summary of users as well as completed and running jobs. Using this command will display a table with a job’s ID number, name, partition, status, exit code, whose account it’s running on, and how many CPUs it’s using.
For troubleshooting purposes, the State and ExitCode fields from running sacct are especially useful for determining whether a node has failed and, if so, why. If you reach out to us for help, one of our support engineers may ask you for the information you see after running sacct.
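If you want to control which columns appear, sacct also accepts a --format option listing standard Slurm accounting fields; for example:
[demo@democluster-60 ~]$ sacct --format=JobID,JobName,Partition,State,ExitCode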
scontrol
scontrol can delegate commands to specific job IDs and nodes. Please note that many scontrol commands can only be executed as the root user. You can pair these subcommands with a job ID (see the examples after this list):
- suspend to pause a job’s processes
- resume to continue a job’s processes
- hold to make a job a lower priority, putting it “on hold” so higher priority jobs will run first
- release to remove a job from the hold list
- show job to get detailed information about a job
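For example, using a job ID such as 6 from the scancel section above (an illustrative sketch; substitute one of your own job IDs):
[demo@democluster-60 ~]$ scontrol hold 6
[demo@democluster-60 ~]$ scontrol release 6
[demo@democluster-60 ~]$ scontrol show job 6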