Writing a SLURM Submission Script
TLDR
Create a submission script with #SBATCH directives to specify resources, and include the commands to run your job.
Submitting jobs to the compute nodes on Turing requires a SLURM submission script. This script tells the scheduler what resources your job needs and what commands to execute.
Why Use Submission Scripts?
Submission scripts let you clearly specify the resources your job needs without long command-line arguments, and make your jobs reproducible and significantly easier to manage.
Understanding Submission Scripts
A submission script has two main parts:
- Resource Specifications: #SBATCH directives that describe the resources and properties required for your job.
- Job Commands: The actual commands or scripts that will be executed on the compute nodes.
This information is expanded upon in our SLURM Guide.
Basic Submission Script
In the simplest case, you could omit all #SBATCH options, but it's recommended to include some basic directives to ensure your job runs effectively.
Here is an example of a basic submission script:
#!/bin/bash
#SBATCH --nodes 1 # (1)
#SBATCH --cpus-per-task 2 # (2)
#SBATCH --mem=8g # (3)
#SBATCH --job-name "Hello World Job" # (4)
#SBATCH --partition short # (5)
#SBATCH --time 0-12:00:00 # (6)
echo "Hello World" # (7)
1. #SBATCH --nodes 1: Request 1 node for the job.
2. #SBATCH --cpus-per-task 2: Request 2 CPU cores per process.
3. #SBATCH --mem=8g: Request 8 GB of memory.
4. #SBATCH --job-name "Hello World Job": Set the job name to "Hello World Job".
5. #SBATCH --partition short: Submit the job to the short partition.
6. #SBATCH --time 0-12:00:00: Set the maximum runtime to 12 hours. If the job hasn't completed within this time, it will be terminated.
7. echo "Hello World": The script content that will be executed on the compute node.
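Once written, the script is handed to the scheduler with sbatch. A minimal sketch of the workflow, assuming the script above is saved as hello.sh (a hypothetical filename):

```shell
# Save the example script (hello.sh is a hypothetical name; any name works):
cat > hello.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name "Hello World Job"
#SBATCH --partition short
echo "Hello World"
EOF

# On the cluster, submit it to the scheduler:
#   sbatch hello.sh
# Check its status in the queue:
#   squeue -u $USER
# By default, output is written to slurm-<jobid>.out in the submission directory.
```

Because #SBATCH lines are ordinary comments to bash, the same script also runs unchanged outside SLURM, which makes it easy to test locally before submitting.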
Using Turing in a Class
If you are using Turing as part of a class, you must submit your jobs to the academic partition. Jobs submitted as part of a class are also limited to one GPU at a time.
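As a sketch, the relevant directives for a class job might look like the following (the --gres line is only needed if your class job uses a GPU):

```shell
#SBATCH --partition academic   # required for jobs submitted as part of a class
#SBATCH --gres=gpu:1           # class jobs are limited to one GPU at a time
```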
Available Partitions
- short: For jobs requiring less than 24 hours of runtime. This should be your default unless you need academic for a class. If your job can't finish within 24 hours, first consider requesting more resources; if the required resources would be too large, use the long partition instead.
- long: For jobs requiring more than 24 hours, with a default runtime of 3 days, extendable up to a maximum of 1 week.
- academic: Reserved for students using Turing as part of a class.
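For example, a job expected to run for roughly five days could request the long partition explicitly (a sketch; adjust the time to your workload):

```shell
#SBATCH --partition long
#SBATCH --time 5-00:00:00   # 5 days: beyond the 3-day default, within the 1-week maximum
```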
Submission Script for GPU Use
If your job requires GPUs, you'll need to include additional directives in your submission script.
Example GPU submission script:
#!/bin/bash
#SBATCH --nodes 1 # (1)
#SBATCH --cpus-per-task 8 # (2)
#SBATCH --mem=8g # (3)
#SBATCH --job-name "Example GPU Job" # (4)
#SBATCH --partition short # (5)
#SBATCH --time 0-12:00:00 # (6)
#SBATCH --gres=gpu:2 # (7)
#SBATCH --constraint="A100|V100" # (8)
module load python # (9)
module load cuda/12.2 # (10)
python my_script_name.py # (11)
1. #SBATCH --nodes 1: Request 1 node for the job.
2. #SBATCH --cpus-per-task 8: Request 8 CPU cores per process.
3. #SBATCH --mem=8g: Request 8 GB of memory.
4. #SBATCH --job-name "Example GPU Job": Set the job name to "Example GPU Job".
5. #SBATCH --partition short: Submit the job to the short partition.
6. #SBATCH --time 0-12:00:00: Set the maximum runtime to 12 hours.
7. #SBATCH --gres=gpu:2: Request 2 GPUs.
8. #SBATCH --constraint="A100|V100": [Optional] Restrict the job to A100 or V100 GPUs.
9. module load python: Load the latest stable version of Python. For more information, see Software.
10. module load cuda/12.2: Load the CUDA 12.2 toolkit, providing access to the required GPU drivers.
11. python my_script_name.py: Run your Python script.
Available GPUs can be found in our About Page.
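Inside a GPU job script, it can be useful to confirm which GPUs were actually allocated before the real work starts. A sketch, assuming nvidia-smi is available on the GPU nodes:

```shell
# List the GPUs SLURM allocated to this job
nvidia-smi
# SLURM restricts the job to its allocated devices via this variable
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```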
Important Notes
- Resource Requests: Be mindful of the resources you request. Overestimating can lead to longer wait times; underestimating can cause your job to fail.
- Time Limits: Setting the --time directive helps the scheduler optimize resource allocation.
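One way to right-size future requests is to inspect what a finished job actually used. A sketch using standard SLURM accounting tools (availability depends on the cluster's configuration; <jobid> is a placeholder for a real job ID):

```shell
# Summarize CPU and memory efficiency for a completed job
seff <jobid>
# Or query the accounting records directly
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,AllocCPUS,State
```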