High Performance Computing

Whatever your research computing needs, Wharton Computing’s Research and Innovation team is dedicated to helping you get the job done.

High Performance Computing

The Wharton School HPC Cluster is a 32-node, 512-core Linux cluster environment designed to support the school’s academic research mission. It is managed collaboratively by Wharton Computing’s Research and Innovation and Core Services teams.

Don’t have a Wharton HPC account? Apply using the HPC Account Application.

Current HPC Cluster Status

Cloud HPC

Need to scale beyond our on-campus resources? Work in an isolated environment? Control your own services and costs? Cloud HPC resources may be for you. Contact research-computing@wharton.upenn.edu for further details.

Overview

Wharton’s HPC Cluster (HPCC) provides access to advanced computational research hardware and software for Wharton faculty, faculty collaborators and research assistants, and Wharton doctoral candidates. It is designed for simple and parallel processing across a large set of tightly integrated hardware with dedicated networking and storage. For more information about the hardware, please see the Hardware page. HPCC users have access to a range of scientific, mathematical, and analytic software, including Matlab, Mathematica, R, Stata, SAS, and more. MySQL server access can also be provided. The HPCC also has Fortran, C, and C++ compilers in both GNU and Intel versions. For more information about the available software packages and compilers, please see the Software page.

Access

An HPC account is required for access. If you have not applied for one, please fill out the HPC Account Application, and we will get you started! Once you have an account, the pages below explain how to reach the software and job scheduler and how to copy files.

  • Access: details on connecting to the HPCC via SSH (X forwarding available), VNC, or PC SAS Connect
  • Storage: details about the storage environment, including quotas and supported file transfer methods

Using the Software

If you need help with general UNIX/Linux commands, here’s a handy Unix Command Reference Sheet. You might also look at our Training Basics Page for more in-depth training options. To understand how to run the software, it helps to understand the architecture of the HPCC, which lets many people run computational software at the same time. When you connect, you log in to a “login” or “head” node (in our case, hpcc.wharton.upenn.edu). That node runs special scheduling software that executes the research software you choose on a number of behind-the-scenes “compute” nodes. This brings up the first important, and possibly confusing, point:

By design, computational software is not installed on hpcc.wharton.upenn.edu, the login node. This prevents errant or computationally intensive programs from blocking everyone’s access to the cluster through the login node. You must use the cluster ‘q’ commands (such as qsub, qrsh, and qlogin) to run any research software.

There are two ways to use the software on the cluster. Both execute your commands on the compute nodes:

  • Using the queuing system (submitting scripts) is the preferred method for running computations on the cluster. With scripts, many simultaneous computations can be run without any user interaction.
  • Interactive usage involves running the software manually on a compute node. If you are new to the cluster, this is probably how you have used research software in the past; however, it does not fully exploit the power and utility of the cluster. This method is good for running a single simple job, learning the software, testing, and debugging.
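To make the point about simultaneous computations concrete, here is a minimal shell sketch that submits one batch job per parameter value instead of running each one by hand. It assumes a job script (hypothetically named myjob.sh) that reads a seed from its first argument, and a Grid Engine-style qsub, which passes any arguments after the script name on to the job script. Where qsub is not found (for example, on your own laptop), it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Minimal sketch: one batch job per parameter value.
# Hypothetical names: myjob.sh (your job script) and the seed values 1-4.
set -euo pipefail

JOB_SCRIPT="myjob.sh"

submit_all() {
  local seed
  for seed in 1 2 3 4; do
    if command -v qsub >/dev/null 2>&1; then
      # On the cluster: arguments after the script name reach the job script as $1, $2, ...
      qsub "$JOB_SCRIPT" "$seed"
    else
      # Elsewhere: print the command instead of submitting it (dry run)
      echo "qsub $JOB_SCRIPT $seed"
    fi
  done
}

submit_all
```

Each submitted job then runs on whichever compute node the scheduler picks, with no further input from you.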

Each software package has its own startup command, depending on whether it offers a text or graphical mode (and whether your connection supports the graphical interface). Once you understand the basics, see the Tools Page for the specific software you want to use (Matlab, SAS, Stata, etc.) for more information.

Using the Queuing System (Script Submission)

 Job Queue Submission Overview

For more in-depth reading on this topic, see Job Management. Simply stated, running calculations on the cluster without user interaction involves:

  • Connect to the cluster using the instructions from Access
  • Place your software commands in a software script file
  • Create a job script that calls the software with your software script file (or pipe commands straight to qsub with echo 'commands' | qsub)
  • Use the qsub command to submit your job script to the cluster scheduling queues
  • When resources are available (generally immediately), the scheduler launches your job script on an available compute node (server)
  • While your job script runs, its output is written to the location specified in your job script or, by default, to the directory from which you submitted the job
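The steps above can be sketched as a single shell script. This is a minimal sketch, not an official Wharton template: it assumes the scheduler is Grid Engine (qrsh and qlogin are Grid Engine commands) and that R’s Rscript command is available on the compute nodes. The file names myanalysis.R and myjob.sh and the job name are hypothetical; check Job Management and the R page for the supported startup command.

```shell
#!/usr/bin/env bash
# Sketch of the job submission steps, assuming Grid Engine's qsub and Rscript.
# File names myanalysis.R and myjob.sh are hypothetical.
set -euo pipefail

# Step 2: put the software commands in a software script file
cat > myanalysis.R <<'EOF'
x <- rnorm(1000)
print(summary(x))
EOF

# Step 3: a job script that calls the software on that file.
# Lines starting with "#$" are read by qsub as options; bash treats them as comments.
cat > myjob.sh <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -N myanalysis
#$ -cwd
#$ -j y
Rscript myanalysis.R
EOF

# Step 4: submit (only works on the cluster login node)
if command -v qsub >/dev/null 2>&1; then
  qsub myjob.sh
  # One-line alternative without a job script file:
  # echo 'Rscript myanalysis.R' | qsub -cwd -N myanalysis
else
  echo "qsub not found; run this on the cluster login node"
fi
```

Here -S picks the shell that interprets the job script, -N names the job, -cwd runs it in the submission directory (so Rscript finds myanalysis.R), and -j y merges error output into the regular output file.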

Again, please see Job Management for greater details and example processes.


dos2unix

If you copy files from Windows using SFTP, set your client to ASCII mode for both your software scripts and your job scripts; otherwise your job may not run as expected, because UNIX and Windows use different end-of-line characters. If you transfer files with Windows File Sharing, or a file already on the cluster needs fixing, run dos2unix filename to convert the file’s line endings for Linux.
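A quick way to check for and repair the problem is sketched here as a shell script. It assumes bash and GNU grep; the file name myjob.sh is hypothetical, and the script creates it with Windows line endings only so there is something to fix. If dos2unix is not on your PATH, it falls back to the standard tr command, which deletes every carriage return in the file.

```shell
#!/usr/bin/env bash
# Sketch: detect and fix Windows (CRLF) line endings in a job script.
# myjob.sh is a hypothetical file; the demo creates it with CRLF endings.
set -euo pipefail

printf '#!/bin/bash\r\necho hello\r\n' > myjob.sh

# Count lines that still end in a carriage return (grep exits 1 on zero matches)
crlf_lines() {
  grep -c $'\r$' "$1" || true
}

echo "CRLF lines before: $(crlf_lines myjob.sh)"

if command -v dos2unix >/dev/null 2>&1; then
  dos2unix myjob.sh   # converts the file in place
else
  # Fallback: delete every carriage return character
  tr -d '\r' < myjob.sh > myjob.sh.tmp && mv myjob.sh.tmp myjob.sh
fi

echo "CRLF lines after: $(crlf_lines myjob.sh)"
```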

Interactive Access

Courtesy rule for all interactive access (qrsh and qlogin): if you won’t be actively working for more than a few minutes, please log out. If we see a session that has been open but idle for more than an hour, we may kill the interactive job, and you may lose any unsaved work.

To run software interactively on the cluster:

  • Connect to the cluster using the instructions from Access
  • Run: qrsh program
  • For example, run qrsh sas -nodms (text mode) or qrsh sas (graphical mode). Each package is documented on its own software page (Mathematica, Matlab, Stata, etc.)
  • If there is an available compute node, the scheduler will automatically connect you to it and start the software
  • Behind the scenes, you have submitted a job that starts your program interactively; the scheduler runs it as soon as resources allow

If you are debugging code while writing it, it may be more convenient to stay logged in to a compute node than to log in again for each command. Run qlogin, which keeps you logged in to a compute node until you type exit. While logged in this way, you can run the program directly (stata, sas, matlab, mathematica, python, etc.).

Tip: There are very few reasons to use qlogin anymore. Instead, we recommend running exactly the command you planned to run under qlogin, with ‘qrsh’ added to the front. For example:

qrsh f90 -o mybinary mycode.f

That said, even qrsh can be ‘abused’ by starting a job (such as an interactive Matlab session) and then not actively using it. See the courtesy rule above: we may kill a session that has been open but idle for more than an hour. If you’d like to work with our team on an efficient workflow for your project, please e-mail research-computing@wharton.upenn.edu to schedule a time to discuss it. We can help you plan and script your work so that your cores stay busy as much as possible, without tying up cores that could be used by you (idle qlogin sessions could be running jobs!) or by others.

More Documentation

For starters, please see Job Management.

Most Unix commands and MPI routines have manual (man) pages associated with them to provide usage information. To view a man page, execute:

man command

This documentation covers only some of the available features and ways to use the software. Better (and more up-to-date) documentation is often available on the web, but some of it describes software that looks similar yet is actually unrelated.

And of course: please let us know if you can’t find the documentation you need.

Reporting Problems or Comments

Please send any problems, questions, or comments to research-computing@wharton.upenn.edu.