Troubleshooting

Here are some tips if you’re running into trouble on the Wharton HPCC.

Interactive (qrsh, qlogin) Errors

If you try to qlogin or qrsh, and receive the following error:

[Image: qlogin error message]

This generally means that the queue is busy, a common occurrence. Add the '-now no' option to your qlogin or qrsh command, like:

$ qlogin -now no
$ qrsh -now no stata

Note that with qrsh, the '-now no' option belongs to 'qrsh', not to the command you're running ('stata' in this example), so it must come before that command.

Investigating Failed Jobs

If a job or jobs have failed, you can explore why in a couple of ways.

Log Files

Take a look at your output files, which by default are named JOBNAME + .o + JOBID + . + TASKID. Look for typos, missing packages or libraries, etc.
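As a rough sketch of the naming pattern above, the helpers below build a log file name and scan it for common failure messages. The function names (job_log_name, scan_log), the example job name and IDs, and the search patterns are illustrative assumptions, not part of the cluster's tooling; adjust the patterns to whatever your software actually prints.

```shell
# Hypothetical helpers; names and search patterns are illustrative only.

# Build the default output file name: JOBNAME + .o + JOBID + . + TASKID
job_log_name() {
    printf '%s.o%s.%s\n' "$1" "$2" "$3"
}

# Print matching lines (with line numbers) for some common failure messages.
scan_log() {
    grep -i -n -E 'error|not found|no such file|cannot' "$1"
}
```

For example, with a made-up job named myjob, job ID 123456 and task ID 7, you might run: scan_log "$(job_log_name myjob 123456 7)"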

qacct

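Examine the output from qacct -j JOBID + . + TASKID. Look for a ru_maxrss above 5242880 (kilobytes, i.e. 5 x 1024 x 1024) for a default RAM job (5GB), or above N x 1024 x 1024 (where N is the number of GB you requested) if you've requested more than the default job RAM. A value over the limit suggests the job ran out of memory.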
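The threshold arithmetic above can be sketched in shell as follows. This assumes ru_maxrss is reported in kilobytes, which is what the N x 1024 x 1024 formula implies for N GB; the function names (ram_limit_kb, exceeded_ram) are hypothetical, and you would read the ru_maxrss value off your own qacct output.

```shell
# Hypothetical helpers; the names are illustrative, not part of Grid Engine.
# Assumption: ru_maxrss values are in kilobytes.

# Print the memory limit in KB for a request of N GB (default: 5).
ram_limit_kb() {
    local gb="${1:-5}"
    echo $(( gb * 1024 * 1024 ))
}

# Succeed (exit status 0) if a ru_maxrss value exceeds the limit for N GB.
exceeded_ram() {
    local maxrss_kb="$1" gb="${2:-5}"
    [ "$maxrss_kb" -gt "$(ram_limit_kb "$gb")" ]
}
```

For example, exceeded_ram 6000000 succeeds for a default 5GB job, while exceeded_ram 6000000 8 does not, since an 8GB request allows up to 8388608 KB.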

Reporting Trouble

If you’ve read through these tips and the Tools Page for the particular software package that you are running, and still have an issue, please send an e-mail to research-computing@wharton.upenn.edu, and include as many details as you can think of, particularly:

  1. an example JOB ID (the best detail for us!) and TASK ID (if an array job)
  2. the exact commands you were running when you saw the trouble
  3. any errors (feel free to copy/paste) that you received