TACC Lonestar6 Instructions

This is a quick-start guide for beginners on the Texas Advanced Computing Center (TACC) Lonestar6 High Performance Computing (HPC) system. For more detailed information, please refer to the official website.

First, you need a TACC account: Log In or Create Account.

Remember your username and password. You only have access to a system once you have been added to a project with an allocation on it. This document uses the Lonestar6 system.

0. Cheat Sheet (for Returning Users)

  1. Login: ssh <username>@ls6.tacc.utexas.edu
  2. Go to work directory: cd $WORK
  3. File Transfer: Use scp or rsync
  4. Environment setup: source ~/.bashrc
  5. Interactive Job Session: srun --partition=gpu-a100-dev --nodes=1 --time=00:30:00 --ntasks=1 --pty bash

1. Access the System

The ssh command (SSH protocol) is the standard way to connect to Lonestar6 (ls6.tacc.utexas.edu). SSH also includes support for the file transfer utilities scp and sftp.

The Linux command line:

localhost$ ssh <username>@ls6.tacc.utexas.edu

The above command will rotate connections across all available login nodes, login1-login3, and route your connection to one of them. To connect to a specific login node, use its full domain name:

localhost$ ssh <username>@login2.ls6.tacc.utexas.edu

To connect with X11 support on Lonestar6 (usually required for applications with graphical user interfaces), use the -X or -Y switch:

localhost$ ssh -X <username>@ls6.tacc.utexas.edu

To report a connection problem, execute the ssh command with the -vvv option and include the verbose output when submitting a help ticket. Do not run the ssh-keygen command on Lonestar6.
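For example, you can capture the verbose output to a file and attach it to the ticket (the log file name here is just an illustration):

localhost$ ssh -vvv <username>@ls6.tacc.utexas.edu 2> ssh_debug.log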

SSH Config Example:

Host TACC
    HostName ls6.tacc.utexas.edu
    User <username>
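With an entry like this in ~/.ssh/config (the host alias TACC is arbitrary; pick any name you like), the login command shortens to:

localhost$ ssh TACC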

When connecting, you’ll be asked to enter your password and TACC Token Code.

To access the system:

1) If not using ssh-keys, please enter your TACC password at the password prompt
2) At the TACC Token prompt, enter your 6-digit code followed by <return>.

If you are facing issues logging in, please use our login wizard at
https://accounts.tacc.utexas.edu/login_support to troubleshoot.

(<username>@ls6.tacc.utexas.edu) Password: <your password>
(<username>@ls6.tacc.utexas.edu) TACC Token Code: <Duo Mobile Token>

2. Working Directory

Lonestar6’s startup mechanisms define corresponding account-level environment variables $HOME, $SCRATCH and $WORK that store the paths to directories that you own on each of these file systems.

Your home directory $HOME does not have much space, so change to your account-specific work directory $WORK once you log in.

$ cd $WORK
$ pwd
/work/<number>/<username>/ls6
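As a quick sanity check, you can print all three directory variables at once; the exact paths below are illustrative and will differ for your account:

$ echo $HOME $SCRATCH
/home1/<number>/<username> /scratch/<number>/<username>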
| File System | Quota | Key Features |
| --- | --- | --- |
| $HOME | 10 GB, 200,000 files | Not intended for parallel or high-intensity file operations. NFS file system. Backed up regularly. Overall capacity 7 TB. Not purged. |
| $WORK | 1 TB, 3,000,000 files across all TACC systems | Not intended for high-intensity file operations or jobs involving very large files. Lustre file system on the Global Shared File System that is mounted on most TACC systems. See the Stockyard system description for more information. Defaults: 1 stripe, 1 MB stripe size. Not backed up. Not purged. |
| $SCRATCH | none | Overall capacity 8 PB. Defaults: 4 targets, 512 KB chunk size. Not backed up. Files are subject to purge if access time* is more than 10 days old. |
| /tmp on nodes | 288 GB | Data purged at the end of each job. Access is local to the node. Data in /tmp is not shared across nodes. |
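Your allocation balances and disk quotas are also summarized each time you log in. On most TACC systems you can reprint that summary with the script below; treat the path as an assumption and check the user guide if it is missing:

$ /usr/local/etc/taccinfo    # reprints the login summary of project balances and disk quotas, if present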

3. File Transfer

You can transfer files between Lonestar6 and Linux-based systems using either scp or rsync. Both scp and rsync are available in the Mac Terminal app. Windows SSH clients typically include scp-based file transfer capabilities.

Using scp

scp <local_file> <username>@ls6.tacc.utexas.edu:$WORK
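Transfers in the other direction use the same syntax with the remote path first; the file name below is only a placeholder:

scp <username>@ls6.tacc.utexas.edu:/work/<number>/<username>/ls6/<file> <local_dir>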

Using rsync (for large or multiple files)

rsync -av <local_dir> <username>@ls6.tacc.utexas.edu:$WORK

For a more user-friendly experience, consider using a client application like Termius, or pulling files from services such as GitHub or Dropbox. For example:

$ git clone <github repository link>
$ # Or
$ wget -O <file name> <dropbox link>

4. Environment Setup

Put all your customizations in ~/.bashrc. Take mine as an example: anything that defaults to a location under $HOME is redirected to $WORK.

export PYTHONPATH="$WORK/python-packages:$PYTHONPATH"
export NLTK_DATA=$WORK/python-packages/nltk_data
export HF_HOME=$WORK/huggingface_cache/
export PATH=$WORK/python3.11/bin:$PATH
export PIP_CACHE_DIR=$WORK/.cache/pip

After making these changes, run source ~/.bashrc to apply them.
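If you install your own Python packages, keeping them under $WORK is consistent with the settings above. A minimal sketch using a virtual environment (the directory and package names are illustrative):

$ python3 -m venv $WORK/venvs/myproject        # create the environment under $WORK
$ source $WORK/venvs/myproject/bin/activate    # activate it for this shell
(myproject) $ pip install numpy                # packages go into the venv under $WORK; downloads are cached in $PIP_CACHE_DIR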

Take your time with this step to ensure your environment is set up correctly. If needed, consider using AI tools for assistance.

5. Running Jobs

Lonestar6 uses the Simple Linux Utility for Resource Management (Slurm) batch environment.

| Queue Name | Min/Max Nodes per Job (assoc'd cores)* | Max Job Duration | Max Nodes per User | Max Jobs per User | Charge Rate (per node-hour) |
| --- | --- | --- | --- | --- | --- |
| development | 4 nodes (512 cores) | 2 hours | 6 | 1 | 1 SU |
| gpu-a100 | 8 nodes (1024 cores) | 48 hours | 12 | 8 | 4 SUs |
| gpu-a100-dev | 2 nodes (256 cores) | 2 hours | 2 | 1 | 4 SUs |
| gpu-a100-small** | 1 node | 48 hours | 2 | 2 | 1.5 SUs |
| gpu-h100 | 1 node | 48 hours | 1 | 1 | 6 SUs |
| large | 65/256 nodes (65536 cores) | 48 hours | 256 | 1 | 1 SU |
| normal | 1/64 nodes (8192 cores) | 48 hours | 75 | 20 | 1 SU |
| vm-small** | 1/1 node (16 cores) | 48 hours | 4 | 4 | 0.143 SU |

* Access to the large queue is restricted. To request more nodes than are available in the normal queue, submit a consulting (help desk) ticket through the TACC User Portal. Include in your request reasonable evidence of your readiness to run under the conditions you’re requesting. In most cases this should include your own strong or weak scaling results from Lonestar6.

** The gpu-a100-small and vm-small queues contain virtual nodes with fewer resources (cores) than the nodes in the other queues.

Copy and customize the following scripts to specify and refine your job’s requirements.

  • specify the maximum run time with the -t option.
  • specify number of nodes needed with the -N option
  • specify total number of MPI tasks with the -n option
  • specify the project to be charged with the -A option.

An example of an interactive job session:

$ srun --partition=gpu-a100-dev --nodes=1 --time=00:30:00 --ntasks=1 --pty bash
$ # Or simply
$ srun -p gpu-a100-dev -N 1 -t 00:30:00 -n 1 --pty bash
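Batch jobs are submitted with sbatch instead. Below is a minimal sketch of a job script; the job name, file names, and <project> allocation are placeholders you must replace:

#!/bin/bash
#SBATCH -J myjob              # job name
#SBATCH -o myjob.%j.out       # standard output file (%j expands to the job ID)
#SBATCH -e myjob.%j.err       # standard error file
#SBATCH -p gpu-a100-dev       # queue (partition)
#SBATCH -N 1                  # number of nodes (-N)
#SBATCH -n 1                  # total number of MPI tasks (-n)
#SBATCH -t 00:30:00           # maximum run time (-t)
#SBATCH -A <project>          # project to charge (-A)

cd $WORK
python3 <your_script>.py

Save it as, say, job.slurm and submit it with sbatch job.slurm; Slurm prints the job ID on submission.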

Slurm Cheat Sheet

Basic Commands

| Command | Description |
| --- | --- |
| sinfo | Show partitions and node status. |
| squeue | List queued and running jobs. |
| squeue -u $USER | Show your jobs. |
| sbatch script.sh | Submit a batch job. |
| srun | Run a command or script interactively. |
| scancel JOBID | Cancel a job. |
| sacct | Show completed jobs. |
| scontrol show job JOBID | Detailed job info. |
| scontrol show node NODENAME | Node details. |
| seff JOBID | Job efficiency details (if available). |

Job Monitoring

| Command | Description |
| --- | --- |
| squeue | List all jobs. |
| squeue -u $USER | Your jobs only. |
| scontrol show job JOBID | Detailed job info. |
| sacct -u $USER | Completed job history. |
| sacct -j JOBID --format=JobID,JobName,State,Elapsed | Summary for a job. |
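A typical monitoring sequence, using the hypothetical job.slurm script from above and a placeholder job ID:

$ sbatch job.slurm                 # prints "Submitted batch job <JOBID>"
$ squeue -u $USER                  # state PD = pending, R = running
$ scancel <JOBID>                  # cancel the job if something looks wrong
$ sacct -j <JOBID> --format=JobID,JobName,State,Elapsed   # summary after it finishes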

Resource Specifications

| Option | Description |
| --- | --- |
| --nodes=2 | Request 2 nodes. |
| --ntasks=4 | Request 4 tasks (MPI). |
| --cpus-per-task=2 | Request 2 CPU cores per task (OpenMP). |
| --gres=gpu:2 | Request 2 GPUs per node. |
| --mem=16G | Request 16 GB RAM per node. |
| --time=02:00:00 | Max runtime of 2 hours. |
| --partition=gpu-a100 | Specify partition/queue. |
| --account=your_account | Charge to a specific account. |
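Options given on the sbatch command line override the #SBATCH directives inside the script, which is handy for one-off changes; the values below are illustrative:

$ sbatch --time=02:00:00 --nodes=2 --ntasks=4 job.slurm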

References

  1. Lonestar6 User Guide
  2. Slurm Workload Manager