The Wharton High-Performance Computing Cluster (HPCC) is a powerful environment for running research code: code that may require a long run time, a lot of memory, or numerous iterations. By default, research code on the HPCC runs as a job with access to a single CPU core. By specifying a parallel environment, however, jobs can take advantage of more than one core, either across nodes (MPI) or within a single node using shared memory (OpenMP).
Let’s take a look at running a simple multiprocessing job with OpenMP.
Once logged into the HPCC, copy the demo code to your home directory:
cd ~
cp -r /usr/local/demo/OpenMP/C .
cd C
Take a look at the source code in hello_openmp.c for an idea of what’s actually going on. Essentially, it’s a simple “hello world” program that reports which processor core each of the job’s threads of execution ran on; a rough sketch of that kind of program is shown below.
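If you’d rather not open the file, here is a minimal sketch of that kind of OpenMP “hello world” (an approximation for illustration, not necessarily the exact demo source): each thread prints its OpenMP thread number and the CPU core it is currently running on (using the Linux-specific sched_getcpu()), and thread 0 reports the size of the thread team.

#define _GNU_SOURCE   /* needed for sched_getcpu() on Linux */
#include <stdio.h>
#include <sched.h>
#include <omp.h>

int main(void)
{
    int nthreads = 0;

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  /* this thread's ID within the team */
        int cpu = sched_getcpu();        /* CPU core the thread is running on */

        printf("Hello World from thread %d on cpu %d\n", tid, cpu);

        /* let thread 0 record the team size for the summary line */
        if (tid == 0)
            nthreads = omp_get_num_threads();
    }

    printf("Number of threads = %d\n", nthreads);
    return 0;
}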
Let’s compile the code and give it a try on four cores:
qrsh gcc -fopenmp -lgomp hello_openmp.c -o hello_openmp
qsub -pe openmp 4 hello_openmp.sh
Once the job completes, we should have an output file in the current directory – HelloOMP.oXXXXXX – where the X’s are the job number. The output should look something like this:
Hello World from thread 3 on cpu 8
Hello World from thread 2 on cpu 0
Hello World from thread 1 on cpu 12
Hello World from thread 0 on cpu 10
Number of threads = 4
*NOTE: This is not just for custom C code written with OpenMP! (E.g., Matlab has many functions that benefit from multithreaded computation, and Python uses optimized routines for its numpy/scipy modules.)