SLURM Workflows

Repeating jobs with job arrays

SLURM has a feature that helps when the goal is to repeat the same job many times: job arrays.

The sbatch option to use is --array. Example usage:

#!/usr/bin/env bash
#SBATCH --job-name='job-array-example'
#SBATCH --time=0-00:05:00
#SBATCH --ntasks=1

#SBATCH --array=0-49
#SBATCH --output=slurm-%A_%a.out
#SBATCH --error=slurm-%A_%a.err

# make sure that the "results" directory exists
mkdir -p results

# this constructs job-000, job-001, ..., from the SLURM_ARRAY_TASK_ID
name=$(printf "job-%03d" "${SLURM_ARRAY_TASK_ID}")

input_file="input/${name}.txt"
result_file="results/${name}.txt"

# run the processing step on this task's input file
./process "${input_file}" "${result_file}"

(Adapted from here)
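Submitting the script works like for any other batch job; the file name job-array-example.sh below is just a hypothetical name for the script above:

sbatch job-array-example.sh

SLURM then schedules 50 independent tasks (indices 0 to 49). In the --output and --error patterns, %A expands to the array job ID and %a to the task index, so each task writes its own log files.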

Job chaining using SLURM dependencies

Performing all the steps of a workflow from scratch in a single SLURM job might waste resources if the steps differ in the amount of resources they need.

Jobs can be chained using SLURM dependencies:

  • Use the sbatch --dependency option to launch a job only after the jobs it depends on have completed

  • Use sbatch --parsable together with command substitution to capture the job IDs of the “upstream” jobs automatically (see the sketch below)
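A minimal sketch of this pattern, assuming two hypothetical batch scripts preprocess.sh and analyze.sh:

#!/usr/bin/env bash

# --parsable makes sbatch print only the job ID, so it can be captured
# with command substitution
jobid=$(sbatch --parsable preprocess.sh)

# afterok: start the downstream job only once the upstream job has
# finished successfully
sbatch --dependency=afterok:"${jobid}" analyze.sh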

Job chaining by recursive sbatch invocations
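Another option is to let a job submit the next job itself as its last action. A minimal sketch, assuming the script is saved as the hypothetical file chunk.sh and that the work leaves a marker file all_done when everything is finished:

#!/usr/bin/env bash
#SBATCH --job-name=chunk
#SBATCH --time=0-00:10:00

# stop the chain once the (hypothetical) marker file exists
if [[ -f all_done ]]; then
    exit 0
fi

# ... run one chunk of the work here ...

# resubmit this same script so the next chunk runs as a new job
sbatch chunk.sh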

Workflow managers

If your workflows are sufficiently complicated, managing the complexity inside a bash script using SLURM dependencies might become limiting.

What happens if a step in your workflow fails? What happens if some of your input data changes? Which steps do you need to run again?

Tools like snakemake or nextflow (but there are many others) might make your life easier when workflows become complex.