Running Jobs on Compute Nodes Using SLURM

TLDR

Use the SLURM commands srun and sbatch to run your computational jobs on the compute nodes. For details, see the SLURM docs.

To perform heavy computations on Turing, you'll use the SLURM job scheduler to run tasks on the compute nodes.

🎯 What is SLURM?

SLURM (originally short for Simple Linux Utility for Resource Management) is a workload manager that handles job scheduling and resource allocation on the cluster.

If the login node is the library's front desk, SLURM is the librarian who organizes and manages access to resources. To use the cluster's computational resources, you submit a request to SLURM, which evaluates it and allocates resources as they become available. This helps keep the cluster's overall utilization high while meeting each user's needs as effectively as possible.

🏃‍♀️ Running Jobs with SLURM

Interactive Jobs with `sinteractive`

Purpose: Start an interactive session on a compute node to try things out.
Usage:

sinteractive

What It Does: Allocates resources and provides a shell prompt on a compute node for you to run commands interactively.

sinteractive vs srun

sinteractive is a thin wrapper around srun --pty /bin/bash that simplifies resource requests.
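
To make the relationship concrete, here is a minimal sketch of the kind of srun command such a wrapper might build. The resource values (2 CPUs, 4G of memory, one hour) are hypothetical and are not sinteractive's actual defaults on Turing. The snippet only prints the command so you can review it before running it on a login node.

```shell
# Hypothetical resource values; adjust to your needs and your cluster's limits.
CPUS=2
MEM=4G
TIME=01:00:00

# Build an srun command that requests an interactive shell on a compute node.
# --pty attaches your terminal to the shell started on the allocated node.
CMD="srun --cpus-per-task=$CPUS --mem=$MEM --time=$TIME --pty /bin/bash"

# Print the command for review instead of executing it.
echo "$CMD"
```

Running the printed command on a login node should wait for the requested resources and then drop you into a shell on a compute node; typing exit ends the session and releases the allocation.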

Batch Jobs with `sbatch`

Purpose: Run jobs non-interactively in the background, which frees your terminal and makes execution repeatable.
Usage:

sbatch your_script.sh

What It Does: Queues your job script for execution when resources become available.
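
As a rough illustration of what your_script.sh could contain, the sketch below is a minimal job script. The job name, time limit, and resource values are assumptions chosen for the example, and Turing may require additional options (such as a partition or account) that are not shown here.

```shell
#!/bin/bash
# Lines starting with #SBATCH are read by sbatch as resource requests.
# All values below are hypothetical; adjust them for your job.
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=hello_%j.out

# Everything below runs on the compute node once the job starts.
# uname -n prints the name of the node the script is running on.
MESSAGE="Hello from $(uname -n)"
echo "$MESSAGE"
```

With --output set this way, the job's output is written to a file named after the job ID (%j is replaced by the ID that sbatch prints when you submit).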

We'll cover writing submission scripts in detail in the next section.

🤔 Why Use SLURM?

Running your work through SLURM provides:

Efficient Resource Use: Optimizes cluster performance by allocating appropriate resources.
Fair Scheduling: Ensures equitable resource distribution among all users.
Job Management: Provides tools to monitor and control your jobs effectively.
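
For the job management point, the standard SLURM commands squeue and scancel are the usual starting place. The sketch below is hedged: it checks whether squeue is available before calling it, so it is safe to try on a machine without SLURM, and the job ID 12345 is a hypothetical placeholder.

```shell
# List your own pending and running jobs, if SLURM is available here.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
    STATUS="slurm"
else
    echo "squeue not found; run this on the cluster's login node."
    STATUS="no-slurm"
fi

# To cancel a job, pass its job ID to scancel.
# 12345 is a hypothetical ID; use the one shown by squeue or sbatch.
# scancel 12345
```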