SLURM Workflows
Repeating jobs with job arrays
SLURM has a feature that helps when the goal is to repeat the same job many times: job arrays. The sbatch option to use is --array.
Example usage:
#!/usr/bin/env bash
#SBATCH --job-name='job-array-example'
#SBATCH --time=0-00:05:00
#SBATCH --ntasks=1
#SBATCH --array=0-49
#SBATCH --output=slurm-%A_%a.out
#SBATCH --error=slurm-%A_%a.err
# make sure that the output directory "results" exists
mkdir -p results
# this constructs job-000, job-001, ..., from the SLURM_ARRAY_TASK_ID
name=$(printf "job-%03d" "${SLURM_ARRAY_TASK_ID}")
input_file="input/${name}.txt"
result_file="results/${name}.txt"
# step 1: convert to grayscale and threshold
./process "${input_file}" "${result_file}"
(Adapted from here)
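To run it, submit the script once and SLURM expands it into 50 array tasks; a minimal usage sketch, assuming the script above is saved as job_array.sh:

sbatch job_array.sh        # one submission creates array tasks 0-49
squeue -u "$USER"          # array tasks appear as <jobid>_<taskindex>

Each task writes its own slurm-<jobid>_<taskindex>.out and .err files, thanks to the %A (master job ID) and %a (array task index) patterns in the --output and --error options.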
Job chaining using SLURM dependencies
Performing all the steps of a workflow from scratch in a single SLURM job might waste resources if the steps differ in the amount of resources they need.
Jobs can be chained using SLURM dependencies:
- Use the sbatch --dependency option to launch the dependent jobs
- Use sbatch --parsable and command substitution to automatically obtain the job ID of the "upstream" jobs, as sketched below
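A minimal sketch of such a chain, assuming two hypothetical batch scripts preprocess.sh and analyze.sh for the two steps:

#!/usr/bin/env bash
# submit the first step; --parsable makes sbatch print only the job ID
jobid_pre=$(sbatch --parsable preprocess.sh)

# submit the second step so that it starts only after the first one succeeds
jobid_ana=$(sbatch --parsable --dependency=afterok:"${jobid_pre}" analyze.sh)

echo "submitted jobs ${jobid_pre} and ${jobid_ana}"

Here afterok starts the dependent job only if the upstream job finished successfully; SLURM also offers other dependency types such as afterany and afternotok.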
Job chaining by recursive sbatch invocations
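The idea is that a job script resubmits itself when it is done, which helps e.g. when a long computation has to be split into restarts that each fit within the wall-time limit. A minimal sketch, assuming the script is saved as chain_step.sh and that ./simulate is a hypothetical restartable program:

#!/usr/bin/env bash
#SBATCH --job-name='chain-example'
#SBATCH --time=0-00:05:00
#SBATCH --ntasks=1

# the current step is passed as the first argument; defaults to 0
step="${1:-0}"
max_steps=5

# do this step's work (hypothetical restartable program)
./simulate --step "${step}"

# resubmit this script for the next step until max_steps is reached;
# assumes the job runs from the directory containing chain_step.sh
if [ "${step}" -lt "${max_steps}" ]; then
    sbatch chain_step.sh $((step + 1))
fi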
Workflow managers
If your workflows are sufficiently complicated, managing the complexity inside a bash script using SLURM dependencies might be limiting.
What happens if a step in your workflow fails? What happens if some of your input data changes? Which steps do you need to run again?
Tools like Snakemake or Nextflow (but there are many others) might make your life easier in the case of complex workflows.