Introduction to HPC Etiquette

=Tips and tricks covering proper use and calling of various programs at OSC, how to minimize queue time, and how to police your peers.=

HPC Etiquette on OSC

 * Max jobs in queue (individual user) is 1000
 * Max jobs running (individual user) is 128
 * These jobs may use up to 2048 processor cores
 * On Glenn, a typical node possesses 8 cores, with 24 GB of memory allotted, so calling 8 processors gives 3 GB of memory to each process
 * On Oakley, a typical node possesses 12 cores, with 48 GB of memory allotted
 * Max jobs running (group/project) is 192
 * As a group, you may use no more than 2048 processor cores (on Glenn that's 256 nodes, on Oakley it's around 170-171 nodes)
 * Check any and all submission scripts for accuracy before submitting. Watch as jobs are submitted if possible; if you see more jobs than you expect, kill your submission script.
 * Keep a qdel script on hand just in case (qdel deletes submitted jobs by their id number)
 * I use a basic BASH script where I supply the first number of a job id, then have it iterate through the following job id numbers
 * You cannot delete jobs you didn't submit, so don't worry if you iterate over id numbers that don't belong to you
 * By default:
 * Serial jobs can ask for walltime up to 168 hours
 * Parallel jobs can ask for walltime up to 96 hours
 * Email OSC support if you need longer walltime; they are very helpful people
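The node specs above determine how much memory each process gets when you request a full node; a quick sketch of the arithmetic (core and memory numbers taken from the bullets above):

```bash
#!/bin/bash
# Memory per process on a fully packed node.
# Specs from above: Glenn = 8 cores / 24 GB, Oakley = 12 cores / 48 GB.
for node in "Glenn 8 24" "Oakley 12 48"; do
  set -- $node                       # $1 = name, $2 = cores, $3 = GB of memory
  echo "$1: $(( $3 / $2 )) GB per process"
done
# Glenn: 3 GB per process
# Oakley: 4 GB per process
```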

HPC Efficiency on OSC

 * When possible, ask for all the processors on a node
 * Occupy as few nodes as possible, unless memory issues require you to populate nodes more sparsely
 * Running out of memory? Try asking for more nodes and fewer processors per node (important for us when using NWChem)
 * Note that this may significantly increase your queue time
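The packed-versus-sparse trade-off above shows up directly in the PBS resource request; a sketch for Oakley (the walltime and process counts are placeholders, not a recommendation):

```bash
# Packed: 12 processes share one Oakley node (48 GB across 12 cores, ~4 GB each).
#PBS -l walltime=10:00:00,nodes=1:ppn=12

# Sparse: the same 12 processes spread over two nodes, doubling the memory
# available per process (~8 GB each); useful for NWChem, but may queue longer.
#PBS -l walltime=10:00:00,nodes=2:ppn=6
```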

Example submission files and scripts for OSC

**kill_jobs.sh**
code format="bash"
#!/bin/bash

# Usage:
#   kill_jobs.sh ID_number
#   ID_number (the x's): xxxxxxxx.oak-batch

INIT=$1

for i in {0..1000}; do
  qdel $( echo "$INIT + $i" | bc ).oak-batch
done

code

**uccsd_pes.sh**
code format="bash"
#!/bin/bash

FILES="b c e f"
dr=0.20

# Move ion back along x-vector exclusively

for i in $FILES; do
  for ii in {0..20}; do
    if [ $ii -eq 0 ]; then
      perl -pe 's/X.XXXXXX/'$( echo "0.000000 - $ii*$dr" | bc ).000000'/g' < ${i}_orig.gjf > $i-$ii.tmp
    else
      perl -pe 's/X.XXXXXX/'$( echo "scale=6; 0.000000 - $ii*$dr" | bc )'/g' < ${i}_orig.gjf > $i-$ii.tmp
    fi
    perl -pe 's/NUM/'$ii'/g' < $i-$ii.tmp > $i-$ii.gjf
    rm $i-$ii.tmp
    cp OSC-subjobs $i-$ii.tmp
    perl -pe 's/xyz/'$i-$ii'/g' < $i-$ii.tmp > $i-$ii.pbs
    qsub $i-$ii.pbs
    rm $i-$ii.pbs $i-$ii.tmp
  done
done

code

**OSC-submit**
code format="bash"
#PBS -N xyz
#PBS -l walltime=10:00:00,nodes=1:ppn=12
#PBS -S /bin/bash
#PBS -j oe

RUNHOME=/nfs/14/ucn0939/TRAVIS/pes_forces

trap "cd $RUNHOME;mkdir $PBS_JOBID;cp -R $TMPDIR/* $PBS_JOBID" TERM

st=`date +%s`
echo "Beginning job $PBS_JOBID execution on $HOSTNAME at" `date`

module load gaussian/g09c01

cd $RUNHOME/xyz
mkdir xyz

# copy input files to tmpdir
cp xyz.gjf $TMPDIR

# do optimizations and copy files back
cd $TMPDIR
g09 xyz.gjf
rm -f Gau*
formchk xyz.chk
sleep 20
/nfs/14/ucn0939/AIMAll/aimqb.ish -naat=1 -bim=proaim -iasmesh=fine -boaq=high -briaq=1.5 -mir=15.0 -ehren=1 -source=0 -feynman=true -magprops=
mv $TMPDIR/* $RUNHOME/xyz

# print run time
en=`date +%s`
t=`bc <<< $en-$st`
echo "Job ending " `date`
echo "Used $t seconds."

code

TPP 08/27/2012