Storing Data

TLDR

Short-term files that can be regenerated or re-downloaded, or that are only needed briefly, can be stored in /scratch/.
Files you rarely access but need to keep for a long time should be stored in /archive/.
Everything else can be stored in your home directory, which is where you are placed when you log in.

Where to Put Your Data

On Turing, there are three places where you can put your data:

Home Directories: Used for general project data storage
Scratch: Used for short-term, temporary file storage
Archive: Used for bulk file storage

When to use Home Directories

Home directories are located at /home/${USER}/, and are the default directory you are placed in when you log in to Turing. If you aren't sure where your data should go, put it here! Home directories are frequently backed up, and snapshots are taken daily.

When to use Scratch

Scratch directories are located at /scratch/${USER}/ and should be used for large amounts of data or for files that don't need to be backed up. For example, a simulation job might write regular checkpoints so that it can resume after an error. Another common use is storing large, publicly available datasets that can be re-downloaded if needed. If these files are created in your home directory, they will be snapshotted and backed up, potentially for years. If you know your data doesn't need that level of protection, use scratch. Because scratch is meant for transient storage, we may delete or archive data on scratch that hasn't been used in a while.
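As an illustration of the checkpointing use case, here is a minimal Python sketch that keeps a job's checkpoints under /scratch/${USER}/. The helper names (scratch_dir, save_checkpoint, load_latest_checkpoint), the pickle format, and the checkpoint file naming are hypothetical choices for this example, not a Turing-provided API; it assumes your login name matches your scratch directory name, as the path above suggests.

```python
import getpass
import os
import pickle
from pathlib import Path


def scratch_dir(user=None, root="/scratch"):
    """Return the per-user scratch directory, e.g. /scratch/<username>."""
    user = user or getpass.getuser()
    return Path(root) / user


def save_checkpoint(state, step, ckpt_dir):
    """Pickle `state` to ckpt_dir/checkpoint_<step>.pkl and return the path."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{step:06d}.pkl"
    tmp = path.with_suffix(".tmp")
    # Write to a temporary file first, then rename it into place, so a crash
    # mid-write never leaves a truncated checkpoint behind.
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)
    return path


def load_latest_checkpoint(ckpt_dir):
    """Return the newest checkpoint's state, or None if there is none yet."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return None
    # Zero-padded step numbers make lexical order match numeric order
    # (up to 999,999 steps).
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pkl"))
    if not ckpts:
        return None
    with open(ckpts[-1], "rb") as f:
        return pickle.load(f)
```

In a job you might set ckpt_dir = scratch_dir() / "my_sim" / "checkpoints" (where "my_sim" is a hypothetical project name), call load_latest_checkpoint(ckpt_dir) at startup to resume, and call save_checkpoint periodically. Since these files can be regenerated by rerunning the job, losing them to scratch cleanup costs time but not data.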

When to use Archive

Archive is located at /archive/${USER}/ and is used for long-term bulk storage. It is slower than home directories but has much more available space. Data here is snapshotted just like home directories. It is a good place for old projects or datasets you aren't actively working on but might need in the future. However, data stored here should not be used directly by running jobs, so archive is only available on the login nodes, not on the compute or GPU nodes. If you need archived data in a job, first move it to your home directory or copy it to scratch. Because archive is meant for long-term storage, we may bundle files that haven't been used in a while into a compressed zip file to save space.
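Copying archived data to scratch before a job could look like the following minimal Python sketch. The function name stage_from_archive and its parameters are hypothetical; the only assumptions about Turing are the /archive/${USER}/ and /scratch/${USER}/ layouts described above. It must be run on a login node, since archive is not visible from compute or GPU nodes.

```python
import getpass
import shutil
from pathlib import Path


def stage_from_archive(relpath, user=None, archive_root="/archive",
                       scratch_root="/scratch"):
    """Copy /archive/<user>/<relpath> to /scratch/<user>/<relpath>.

    Run this on a login node, because archive is not available on
    compute or GPU nodes. Returns the destination path on scratch.
    """
    user = user or getpass.getuser()
    src = Path(archive_root) / user / relpath
    dst = Path(scratch_root) / user / relpath
    if not src.exists():
        raise FileNotFoundError(f"not found in archive: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        # dirs_exist_ok (Python 3.8+) lets a re-run refresh an earlier copy.
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        # copy2 also preserves timestamps and permission bits.
        shutil.copy2(src, dst)
    return dst
```

For example, stage_from_archive("datasets/old_run") would copy a hypothetical /archive/${USER}/datasets/old_run directory to /scratch/${USER}/datasets/old_run, and the job would then read from the returned scratch path. Copying rather than moving leaves the snapshotted original in archive untouched.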