Running Jobs on Compute Nodes Using SLURM
TLDR
Use SLURM commands (srun, sbatch) to run your computational jobs on compute nodes. See the SLURM docs.
To perform heavy computations on Turing, you'll use the SLURM job scheduler to run tasks on the compute nodes.
🎯 What is SLURM?
SLURM (Simple Linux Utility for Resource Management) is a workload manager that handles job scheduling and resource allocation on the cluster.
If the login node is the library's front desk, SLURM acts as the librarian who organizes and manages resource access. SLURM ensures you receive the computational resources you need on the cluster. To access these resources, you must submit requests through SLURM, which evaluates and allocates resources based on availability and efficiency. This process helps maximize the cluster's overall performance while meeting user needs as effectively as possible.
🏃♀️ Running Jobs with SLURM
Interactive Jobs with sinteractive
- Purpose: Start an interactive session on a compute node to try things out.
- Usage:
- What It Does: Allocates resources and provides a shell prompt on a compute node for you to run commands interactively.
sinteractive vs srun
sinteractive is a thin wrapper around srun --pty /bin/bash that simplifies resource requests.
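As a minimal sketch, an interactive request in the plain srun --pty /bin/bash form might look like the following. The resource values (2 CPUs, 4 GB of memory, 30 minutes) are placeholder assumptions, not site defaults, and the options that sinteractive itself accepts on Turing may differ. The snippet only builds and prints the command; the last line is left commented out so that nothing is launched by accident.

```shell
# Placeholder resource request; adjust these to what your task needs.
cpus=2
mem=4G
walltime=00:30:00

# Assemble an srun command that opens an interactive shell on a compute node.
# --cpus-per-task, --mem, --time and --pty are standard srun options.
cmd="srun --cpus-per-task=$cpus --mem=$mem --time=$walltime --pty /bin/bash"

# Print the command so it can be reviewed first.
echo "$cmd"

# On a login node, uncomment the next line to start the session:
# $cmd
```

Once the session starts, the prompt should show a compute node's hostname rather than the login node's; typing exit ends the session and releases the resources.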
Batch Jobs with sbatch
- Purpose: Run jobs non-interactively in the background, which frees your terminal and makes runs repeatable.
- Usage:
- What It Does: Queues your job script for execution when resources become available.
We'll cover writing submission scripts in detail in the next section.
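Until then, a small sketch may help make the workflow concrete. It writes a tiny job script and submits it with sbatch. The file name hello_job.sh, the job name, and all resource values are hypothetical placeholders; your cluster may also require options not shown here, such as a partition or account.

```shell
# Write a minimal job script. Every value below is a placeholder.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello_%j.out
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G

# Report which compute node ran the job.
echo "Hello from $(hostname)"
EOF

# Submit only where sbatch is available, i.e. on the cluster.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

In the output file name, %j is replaced by the job ID, so each run writes to its own file. On success, sbatch prints the assigned job ID and returns immediately; the job itself starts once the requested resources are free.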
🤔 Why Use SLURM?
Running your work through SLURM provides:
- Efficient Resource Use: Optimizes cluster performance by allocating appropriate resources.
- Fair Scheduling: Ensures equitable resource distribution among all users.
- Job Management: Provides tools to monitor and control your jobs effectively.