Troubleshooting

Here are some tips if you’re running into trouble on the Wharton HPC3.

Investigating Failed Jobs

If one or more of your jobs have failed, there are a couple of ways to find out why.

Log Files

Take a look at your output files, which by default are named JOBNAME.oJOBID.TASKID (that is, JOBNAME + .o + JOBID + . + TASKID). Look for typos, missing packages or libraries, and similar errors.
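As a rough illustration of the check described above, the following Python sketch builds the default output file name and lists the lines of a log that look like errors. The function names and the keyword pattern are hypothetical (they are not part of any HPC3 tooling), so adjust the pattern to the messages your own software prints.

```python
import re

# Hypothetical keyword pattern (an assumption, not an official list):
# extend it with the messages your own software prints on failure.
ERROR_PATTERN = re.compile(
    r"error|not found|no module named|cannot open|killed",
    re.IGNORECASE,
)


def default_log_name(job_name, job_id, task_id):
    """Build the default output file name: JOBNAME + .o + JOBID + . + TASKID."""
    return f"{job_name}.o{job_id}.{task_id}"


def suspicious_lines(log_text):
    """Return (line_number, line) pairs whose text matches ERROR_PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]
```

To use it, read the file named by default_log_name(...) from the directory where the job wrote its output, pass the text to suspicious_lines, and read the flagged lines in context.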

qacct

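Examine the output from qacct -j JOBID.TASKID (that is, JOBID + . + TASKID). Look at the ru_maxrss value: for a default RAM job (2GB), a value greater than 2097152 (2 x 1024 x 1024, which is 2GB expressed in kilobytes) suggests the job used more memory than it was allotted. If you’ve requested more than the default job RAM, compare against N x 1024 x 1024 instead, where N is the number of GB you requested.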
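The comparison above can be sketched in Python as follows. This is a minimal example, not an HPC3 tool: the function names are hypothetical, and it assumes that qacct prints ru_maxrss as a plain integer on a line of the form "ru_maxrss <value>". If your qacct output formats the value differently, the parser returns None rather than guessing.

```python
def max_rss_threshold(requested_gb=2):
    """Threshold from the text above: N x 1024 x 1024, with N in GB (default 2)."""
    return requested_gb * 1024 * 1024


def parse_ru_maxrss(qacct_text):
    """Return the first ru_maxrss value in qacct output, or None if absent.

    Assumption: the value is a plain integer on a two-field line such as
    'ru_maxrss    3000000'.
    """
    for line in qacct_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ru_maxrss":
            try:
                return int(fields[1])
            except ValueError:
                return None  # Unexpected format; do not guess.
    return None


def exceeded_memory(qacct_text, requested_gb=2):
    """True if ru_maxrss is present and above the threshold for requested_gb."""
    value = parse_ru_maxrss(qacct_text)
    return value is not None and value > max_rss_threshold(requested_gb)
```

One way to use it is to save the qacct output to a file (or capture it with Python's subprocess module) and pass the text to exceeded_memory, setting requested_gb to the amount of RAM you requested for the job.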

Reporting Trouble

If you’ve read through these tips and the Tools Page for the software package you are running and still have an issue, please send an e-mail to research-computing@wharton.upenn.edu. Include as many details as you can, particularly:

  1. an example JOB ID (the best detail for us!) and TASK ID (if it was an array job and not all tasks were affected)
  2. the exact commands you were running when you saw the trouble
  3. any errors (feel free to copy/paste) that you received