Storing Data
TLDR
- Short-term files that can be auto-created, re-downloaded, or are only needed for a short time can be stored in /scratch/.
- Long-term files that you do not need to access often but need to keep for a long time should be stored in /archive/.
- Everything else can be stored in your home directory, which is where you are placed when you log in.
Where to Put Your Data
On Turing, there are three places where you can store your data:
- Home Directories: Used for general project data storage
- Scratch: Used for short-term, temporary file storage
- Archive: Used for bulk file storage
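The rules of thumb from the TLDR can be summarized as a small Python helper. This is only a sketch: the function name `choose_storage` and its boolean parameters are hypothetical, and the returned paths simply follow the /home/${USER}/, /scratch/${USER}/ and /archive/${USER}/ locations described on this page.

```python
from pathlib import Path


def choose_storage(user: str, *, regenerable: bool, short_lived: bool,
                   rarely_accessed: bool) -> Path:
    """Suggest a storage root for some data, following the TLDR rules.

    Hypothetical helper; a rule of thumb only, not an official policy check.
    """
    if regenerable or short_lived:
        # Can be auto-created, re-downloaded, or is only needed briefly.
        return Path("/scratch") / user
    if rarely_accessed:
        # Must be kept long-term but is not accessed often.
        return Path("/archive") / user
    # Everything else, including anything you aren't sure about.
    return Path("/home") / user


# Example: a public dataset you could fetch again belongs on scratch.
print(choose_storage("alice", regenerable=True, short_lived=False,
                     rarely_accessed=False))  # /scratch/alice
```

Note that the home directory is the fallback: if neither the scratch nor the archive condition clearly applies, the helper suggests home, matching the advice below.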
When to use Home Directories
Home directories are located at /home/${USER}/, and are the default directory
you are placed in when you log in to Turing. If you aren't sure where your data
should go, put it here! Home directories are frequently backed up, and snapshots
are taken daily.
When to use Scratch
Scratch directories are located at /scratch/${USER}/, and should be used for
storing large amounts of data or files that don't need to be backed up. For
example, a simulation job might write regular checkpoints so that it can resume
after an error. Another common use case is storing large, publicly available
datasets that can be re-downloaded if needed. If these files are created in
your home directory, they will be snapshotted and backed up, potentially for
years. If you know your data doesn't need that level of protection, use
scratch. Because scratch is meant for transient storage, we may delete or
archive data on scratch that hasn't been used in a while.
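A minimal sketch of the checkpoint pattern described above, in Python. It assumes a hypothetical layout of /scratch/${USER}/<job_name>/checkpoint.json and a JSON checkpoint format; `checkpoint_path`, `run_simulation` and the other helpers are illustrative names, not Turing-provided tools, and the loop body stands in for real simulation work.

```python
import json
import os
from pathlib import Path
from typing import Optional


def checkpoint_path(job_name: str, scratch_root: Optional[Path] = None) -> Path:
    """Hypothetical layout: /scratch/$USER/<job_name>/checkpoint.json."""
    if scratch_root is None:
        scratch_root = Path("/scratch") / os.environ["USER"]
    return scratch_root / job_name / "checkpoint.json"


def load_checkpoint(path: Path) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists
    (for example, because old data on scratch was cleaned up)."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "state": 0.0}


def save_checkpoint(path: Path, ckpt: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(ckpt))
    # Rename over the old file so a crash never leaves a half-written checkpoint.
    os.replace(tmp, path)


def run_simulation(path: Path, total_steps: int, every: int = 10) -> dict:
    ckpt = load_checkpoint(path)
    for step in range(ckpt["step"], total_steps):
        ckpt["state"] += 1.0  # stand-in for real simulation work
        ckpt["step"] = step + 1
        if ckpt["step"] % every == 0:
            save_checkpoint(path, ckpt)
    save_checkpoint(path, ckpt)  # always save the final state
    return ckpt
```

Because scratch is not backed up, the job treats a missing checkpoint as "start from step 0" rather than as an error. Writing to a temporary file and renaming it with `os.replace` means an interrupted job leaves either the previous checkpoint or the new one, never a partial file.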
When to use Archive
Archive is located at /archive/${USER}/, and is used for long-term bulk
storage. It is slower than home directories but has much more available
space. Data here is snapshotted just like home directories. It is a perfect
place to put old projects or datasets you aren't actively working on, but
might need in the future. However, data stored here should not be used
directly by running jobs; for this reason, archive is only available on the
login nodes, not on the compute or GPU nodes. If you need to use archived data
in a job, please move it to your home directory or copy it to scratch first.
Because archive is meant for long-term storage, we may bundle files that
haven't been used in a while into a compressed zip file to save space.
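As a sketch of the "copy it to scratch first" step, the following Python function copies one project directory from archive to scratch. Run it on a login node, since archive is not available on the compute or GPU nodes. The function name `stage_from_archive` and the /archive/${USER}/<project> layout are assumptions for illustration, and the sketch does not handle files that have already been bundled into a zip file.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def stage_from_archive(project: str, user: Optional[str] = None,
                       archive_root: Path = Path("/archive"),
                       scratch_root: Path = Path("/scratch")) -> Path:
    """Copy <archive_root>/<user>/<project> to <scratch_root>/<user>/<project>.

    Hypothetical helper. Run it on a login node: archive is not mounted
    on compute or GPU nodes.
    """
    user = user or os.environ["USER"]
    src = archive_root / user / project
    dst = scratch_root / user / project
    if not src.is_dir():
        raise FileNotFoundError(f"no archived project at {src}")
    # dirs_exist_ok (Python 3.8+) lets a re-run refresh an earlier partial copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Once the copy finishes, point your job at the returned scratch path rather than at /archive/. Copying (rather than moving) leaves the archived original in place, so the scratch copy can be deleted or cleaned up without losing anything.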