Improved Support for Parallel IPython on Wharton HPCC

You may have seen my previous post on Parallel IPython with Univa Grid Engine (SGE). It involved a fair amount of step-by-step configuration and large chunks of boilerplate code. That process has now been simplified, and it is now much easier to run Python functions across a hundred CPU cores on HPCC.

All of the prerequisite setup can now be done with a single one-time command on HPCC: setup-ipython-parallel.sh

The ipcode.py example from the previous post now becomes…
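The original code is not reproduced here, but a minimal sketch of the simplified pattern might look like the following. The start() and stop() helpers are illustrative assumptions, not the exact demo code; see the demo directory on HPCC for the real version.

```python
# ipcode.py -- a minimal sketch of the simplified pattern, not the exact
# demo code. The start()/stop() helpers here are illustrative assumptions.

def square(x):
    # The work we want to fan out across the engines.
    return x ** 2

def start():
    # Hypothetical helper: connect to the controller and engines that the
    # job script launched, and return a client handle.
    import ipyparallel as ipp
    return ipp.Client()

def stop(client):
    # Hypothetical helper: shut everything down when the work is done.
    client.shutdown(hub=True)

def main():
    client = start()
    dview = client[:]  # a DirectView across all allocated engines
    results = dview.map_sync(square, range(32))
    print(results)
    stop(client)

if __name__ == "__main__":
    main()
```

The point of the pattern is that all the cluster plumbing lives in start() and stop(), so the body of main() reads like ordinary Python.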

Having a start and stop function makes things much simpler. And the ipcode.sh job script…
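The job script shrinks to a few lines as well. A sketch under assumptions: the job name, parallel environment name, and slot count below are placeholders, not HPCC's actual configuration.

```shell
#!/bin/bash
# ipcode.sh -- illustrative SGE job script; directive values are assumptions.
#$ -N ipcode
#$ -cwd
#$ -pe openmp 9   # hypothetical PE name: 1 slot for the controller + 8 engines

python ipcode.py
```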

That’s great. But what if we actually want to do something more than map a simple lambda math operation across all of our allocated CPU cores? IPython provides dview functions for this purpose, to push out module imports, variables and functions to the engines. Here is a more complete ipcode.py example, illustrating how to push everything out and then iterate a given function by mapping lists of input parameters.
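A hedged sketch of what that fuller ipcode.py might look like: the names workername, z and xyz come from this post, but the client setup and the arithmetic inside xyz are illustrative assumptions.

```python
# A sketch of the fuller ipcode.py pattern, not the exact demo code.
import os

z = 10  # a variable the engines will need

def xyz(x, y, z):
    # Helper the engines will need; the arithmetic here is illustrative.
    return x * y + z

def workername(x, y):
    # Runs on each engine. Needs os, z and xyz to have been pushed out first.
    pid = os.getpid()
    return (pid, xyz(x, y, z))

def main():
    import ipyparallel as ipp
    client = ipp.Client()
    dview = client[:]
    # Push out the module import, the variable, and the helper function.
    dview.execute("import os")
    dview.push({"z": z, "xyz": xyz})
    xs = list(range(32))
    ys = list(range(32))
    results = dview.map_sync(workername, xs, ys)
    print(results)

if __name__ == "__main__":
    main()
```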

In the above code we are mapping the workername function over two parameters, x and y. The parameters are each a list of length 32, so the function will be run 32 times. Notice that this function needs more than just x and y to complete: we also need to import the os module on the engines, and push out the variable z and the function xyz. Done, done and done. The code is ready to go.

With 8 engines and 32 function calls, each engine is getting 4 iterations. From here, we could scale up our engine count and build larger parameter lists. By default, this can get us up to 98 engines on HPCC, since the limit is 100 cores per project (1x ipcode, 1x ipcontroller and 98x ipengines).

Please see the ipcode.py and ipcode.sh examples in /usr/local/demo/Python/IPython/ on HPCC for the code from this post. Make a copy of the directory and give it a go, with the following commands:
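For example (the copy destination and the qsub invocation below are assumptions; adjust to taste):

```shell
# Copy the demo directory somewhere writable, then submit the job.
cp -r /usr/local/demo/Python/IPython ~/ipython-demo
cd ~/ipython-demo
qsub ipcode.sh
```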

As a specialist in Linux and high-performance computing, Burris enjoys enabling faculty within The Wharton School of the University of Pennsylvania by providing effective research computing resources. Burris has been involved in research computing since 2001. Current projects find Burris working with HPC, big data, cloud computing and grid technologies. His favorite languages are Python and BASH. In his free time, he enjoys bad cinema, video editing, synthesizers and bicycling.