Writing a SLURM Submission Script
TLDR
Create a submission script with #SBATCH directives to specify resources, and include the commands to run your job.
Submitting jobs to the compute nodes on Turing requires a SLURM submission script. This script tells the scheduler what resources your job needs and what commands to execute.
📝 Understanding Submission Scripts
A submission script has two main parts:
- Resource Specifications: #SBATCH directives that describe the resources and properties required for your job.
- Job Commands: The actual commands or scripts that will be executed on the compute nodes.
🔹 Basic Submission Script
In the simplest case, you could omit all #SBATCH options, but it's recommended to include some basic directives to ensure your job runs effectively.
Here is an example of a basic submission script:
#!/bin/bash
#SBATCH -N 1 # (1)
#SBATCH -n 2 # (2)
#SBATCH --mem=8g # (3)
#SBATCH -J "Hello World Job" # (4)
#SBATCH -p short # (5)
#SBATCH -t 12:00:00 # (6)
echo "Hello World" # (7)!
1. #SBATCH -N 1: Request 1 node for the job.
2. #SBATCH -n 2: Request 2 CPU cores.
3. #SBATCH --mem=8g: Request 8 GiB of memory.
4. #SBATCH -J "Hello World Job": Set the job name to "Hello World Job".
5. #SBATCH -p short: Submit the job to the short partition.
6. #SBATCH -t 12:00:00: Set the maximum runtime to 12 hours. If the job hasn't completed within this time, it will be terminated.
7. echo "Hello World": The script content that will be executed on the compute node.
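Once the script is saved to a file (hello_world.sh below is just a placeholder name), submit it with sbatch and monitor it with squeue. A minimal sketch using standard SLURM commands:

```bash
# Submit the script to the scheduler; sbatch prints the assigned job ID.
sbatch hello_world.sh

# List your queued and running jobs.
squeue -u $USER

# Cancel a job if necessary, replacing <jobid> with the ID printed by sbatch.
scancel <jobid>
```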
Using Turing in a Class
If you are using Turing as part of a class, you must submit your jobs to the academic partition. Jobs submitted as part of a class are also limited to one GPU at a time.
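For example, the directives for a class job might look like the following sketch (the time limit and script name are placeholders, not course requirements):

```bash
#!/bin/bash
#SBATCH -p academic       # class jobs must use the academic partition
#SBATCH --gres=gpu:1      # class jobs are limited to one GPU at a time
#SBATCH -t 02:00:00       # placeholder time limit; adjust to your assignment
python assignment.py      # hypothetical script name
```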
Available Partitions
- short: For jobs requiring less than 24 hours of runtime. This should be your default unless you need to use academic for a class. If your job can't run in 24 hours, consider requesting more resources; if the required resources would be too large, use the long partition.
- long: For jobs requiring more than 24 hours, with a default runtime of 3 days. Can be extended up to a maximum of 1 week (see the sketch after this list).
- academic: Reserved for students using Turing as part of a class.
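If your job needs the long partition, request it and the extended runtime explicitly. A minimal sketch, assuming a 5-day limit (within the 1-week maximum); the days-hours:minutes:seconds time format is standard SLURM syntax:

```bash
#!/bin/bash
#SBATCH -p long              # submit to the long partition
#SBATCH -t 5-00:00:00        # 5 days, in days-hours:minutes:seconds format
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=16g            # example values; size these to your workload
./run_long_job.sh            # hypothetical job command
```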
🔹 Submission Script for GPU Use
If your job requires GPUs, you'll need to include additional directives in your submission script.
Example GPU submission script:
#!/bin/bash
#SBATCH -N 1 # (1)
#SBATCH -n 8 # (2)
#SBATCH --mem=8g # (3)
#SBATCH -J "Example GPU Job" # (4)
#SBATCH -p short # (5)
#SBATCH -t 12:00:00 # (6)
#SBATCH --gres=gpu:2 # (7)
#SBATCH -C "A100|V100" # (8)
module load python # (9)
module load cuda/12.2 # (10)
python my_script_name.py # (11)
1. #SBATCH -N 1: Request 1 node for the job.
2. #SBATCH -n 8: Request 8 CPU cores.
3. #SBATCH --mem=8g: Request 8 GiB of memory.
4. #SBATCH -J "Example GPU Job": Set the job name to "Example GPU Job".
5. #SBATCH -p short: Submit the job to the short partition.
6. #SBATCH -t 12:00:00: Set the maximum runtime to 12 hours.
7. #SBATCH --gres=gpu:2: Request 2 GPUs.
8. #SBATCH -C "A100|V100": [Optional] Constrain the job to specific GPU types, here A100 or V100.
9. module load python: Load the latest stable version of Python. For more information, see Software.
10. module load cuda/12.2: Load the CUDA 12.2 toolkit, providing access to the required GPU drivers.
11. python my_script_name.py: Run your Python script.
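You can also confirm which GPUs were allocated before starting your workload. A short sketch that could be appended to the script above (SLURM typically sets CUDA_VISIBLE_DEVICES for GPU jobs, and nvidia-smi is assumed to be available on the GPU nodes):

```bash
# SLURM typically restricts the job to its allocated GPUs via CUDA_VISIBLE_DEVICES.
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"

# List the GPU devices visible to this job (assumes nvidia-smi is installed on GPU nodes).
nvidia-smi
```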
Available GPUs
H200, A100-80G, H100, L40S, A100, V100, P100, A30
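To target a specific model from this list, use the same -C constraint shown in the example above. A minimal sketch, assuming you want a single H100:

```bash
#SBATCH --gres=gpu:1      # request one GPU
#SBATCH -C H100           # constrain the job to nodes with H100 GPUs
```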
⚠️ Important Notes
- Resource Requests: Be mindful of the resources you request. Overestimating can lead to longer wait times; underestimating can cause your job to fail. See the sketch after these notes for checking what a finished job actually used.
- Time Limits: Setting the --time (-t) directive helps the scheduler optimize resource allocation.
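To right-size future requests, review what a completed job actually used. A sketch using SLURM's accounting tools (seff is a contributed utility and may not be installed on every system):

```bash
# Summarize CPU and memory efficiency for a completed job (if seff is available).
seff <jobid>

# Query the accounting records for elapsed time and peak memory use.
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,ReqMem,State
```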
🤔 Why Use Submission Scripts?
- Automation: Scripts allow you to run complex jobs without manual intervention.
- Reproducibility: Easily rerun jobs with consistent settings.
- Resource Management: Specify exactly what your job needs, helping the scheduler optimize cluster usage.