Requesting Resources

TLDR

Remember, Turing is a shared resource, so only use what you absolutely need!

Resource Caps

Depending on which partition your job runs on, there is a cap on the total resources you can request. These limits apply per user, so they can be consumed by a single large job or spread across several smaller jobs.

You can view them by running sacctmgr show qos format={Name,MaxTRESPU} -p

As of April 2026, the resource limits are as follows:

| Partition | CPUs | GPUs | RAM  | Time Limit |
|-----------|------|------|------|------------|
| Academic  | 64   | 2    | 250G | 48 hours   |
| Short     | 1024 | 12   | 8T   | 24 hours   |
| Long      | 256  | 4    | 3T   | 7 days     |

While it may be tempting to simply request the maximum, requesting more resources than you need increases the time it takes for your job to even begin. To get 1024 CPU cores on the short partition, enough nodes would have to be free of everyone else's jobs to provide that many cores, which could take weeks. Even requesting 256 could take many hours, and odds are you won't be using all of them anyway!
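
If you want a rough sense of how busy a partition is before submitting, sinfo shows node states per partition (the partition name below is a placeholder; use the names Turing actually defines):

```bash
# Nodes in a partition grouped by state (idle, mix, alloc, down, ...).
sinfo -p short

# One-line summary per partition: allocated/idle/other/total node counts.
sinfo -s
```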

Remember that the Turing cluster is a shared resource; only request what you need. With that said, we would rather you use a lot of resources to finish your job quickly than use fewer resources and hold nodes for extended periods of time.

Please note that, unless otherwise specified, requested resources are per node: requesting 2 nodes and 2 GPUs results in a total of 4 GPUs.
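
For example, here is a minimal sketch of a batch script that requests 2 nodes with 2 GPUs each, i.e. 4 GPUs total (the job name, partition, and program are placeholders; adapt the numbers to what your job actually needs):

```bash
#!/bin/bash
#SBATCH --job-name=example       # placeholder name
#SBATCH --partition=short        # assumed partition name
#SBATCH --nodes=2                # 2 nodes...
#SBATCH --gres=gpu:2             # ...with 2 GPUs per node = 4 GPUs total
#SBATCH --cpus-per-task=8        # CPU cores per task
#SBATCH --mem=64G                # memory per node
#SBATCH --time=04:00:00          # wall-clock limit; must fit within the partition's time limit

srun ./my_program                # placeholder executable
```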

Common Pitfalls

  • Requesting many more GPUs than necessary to run your model
    • This can actually slow your model down, in addition to increasing your wait time. Most default GPU optimizations split your model across the GPUs. This is great for fitting large models that a single GPU can't hold, but it also increases communication bottlenecks as data is passed from one GPU to another.
  • Requesting more CPUs than necessary
    • Some programs do need many CPUs to parallelize computation, but unless your code is genuinely CPU-bound, you won't benefit from the spare computational power.
  • Requesting large amounts of RAM
    • This is arguably the hardest resource to balance. If you hit out-of-memory (OOM) errors and your code itself isn't at fault, requesting more RAM is the right fix. Our advice is to start small and increase gradually until the errors stop; you can also check how much memory a finished job actually used, as shown below.
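
One way to calibrate memory requests is to check what a completed job actually used; a minimal sketch with sacct (the job ID is a placeholder):

```bash
# MaxRSS is the peak memory each job step actually used;
# compare it against what you asked for (ReqMem).
sacct -j 123456 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
```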

Solutions

Determining What You Need

The first step to requesting resources is getting a good estimate of what you actually need. If you plan to run an LLM locally, consider the size of the model and of your data. If you are running other software, look at the minimum specifications in its documentation. From there, increase your requested resources until you are satisfied with the performance on Turing.
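
As a rough back-of-the-envelope example: a 7-billion-parameter model stored in 16-bit precision needs about 7 billion × 2 bytes ≈ 14 GB of GPU memory for its weights alone, before activations, any key-value cache, or your data. So one GPU with comfortably more than 14 GB of memory is usually a more sensible starting point than several GPUs requested up front.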

Checking Performance

If you aren't sure how well your project is utilizing the requested resources, you can track it by using the following command:

  • seff <job id>: reports CPU and memory usage and efficiency for a job (most meaningful once the job has finished).

If seff reports that you are only using 10% of your CPUs, cut back on the number you request!
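
A minimal sketch (the job ID is a placeholder):

```bash
# CPU and memory efficiency for a completed job.
seff 123456

# GPUs are not covered by seff; one option, assuming a recent Slurm version,
# is to attach to a running job's allocation and check utilization there.
srun --jobid=123456 --overlap nvidia-smi
```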

Fully Utilizing Your Resources

Make sure your code is set up to use the resources you request, to the degree your project needs. For example, while PyTorch-based tooling can automatically detect multiple GPUs and spread a loaded model across them given a basic device specification, that alone does not make all of the GPUs compute at the same time; the model is split, not the work, so the extra GPUs can slow your code down instead of speeding it up. To actually benefit from several GPUs, the work itself has to be parallelized, as in the sketch below. The same goes for CPUs: check whether your software lets you set a thread or worker count, and match it to what you requested.
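
As one illustration (not Turing-specific guidance), the sketch below launches one worker process per GPU with PyTorch's torchrun so that both GPUs compute simultaneously. Here train.py is a placeholder for a script that already uses torch.distributed / DistributedDataParallel, and the partition and module lines are assumptions you should adapt:

```bash
#!/bin/bash
#SBATCH --partition=short        # assumed partition name
#SBATCH --nodes=1
#SBATCH --gres=gpu:2             # 2 GPUs on this node
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G                # memory per node
#SBATCH --time=08:00:00

# Placeholder environment setup; load whatever Python/CUDA modules Turing provides.
# module load python

# One process per GPU (data parallelism), so both GPUs work at the same time
# instead of a single process splitting the model across them and idling one.
torchrun --standalone --nproc_per_node=2 train.py
```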