You may have seen my previous post on Parallel IPython with Univa Grid Engine (SGE). It involved a fair bit of step-by-step configuration and large chunks of boilerplate code. Well, that has now been simplified. It is now much easier to run Python functions across a hundred CPU cores on HPCC.
All of the prerequisite setup can now be done with a single one-time command on HPCC: setup-ipython-parallel.sh
The ipcode.py example from the previous post now becomes…
#!/usr/bin/env python

### start ipython cluster
import hpcc
rc, p = hpcc.start()

### do some work ala https://ipython.org/ipython-doc/3/parallel/parallel_multiengine.html#quick-and-easy-parallelism
dview = rc[:]
serial_result = map(lambda x: x**10, range(32))
parallel_result = dview.map_sync(lambda x: x**10, range(32))
print serial_result
print serial_result == parallel_result

### stop ipython cluster
hpcc.stop(p)
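The hpcc module does the heavy lifting here. Its actual implementation lives on HPCC, but as a rough sketch of what a helper like this might wrap (the profile name, default engine count, and polling loop below are my assumptions, not the real HPCC code):

# hpcc.py -- hypothetical sketch, NOT the actual HPCC module
import subprocess
import sys
import time
from IPython.parallel import Client

def start(n=None, profile="sge"):
    # engine count from the command line (ipcode.sh passes ${1:-"8"})
    if n is None:
        n = int(sys.argv[1]) if len(sys.argv) > 1 else 8
    # launch the controller and SGE-backed engines in the background
    p = subprocess.Popen(["ipcluster", "start", "--n=%d" % n, "--profile=%s" % profile])
    # connect a client once the controller has written its connection file
    rc = None
    while rc is None:
        try:
            rc = Client(profile=profile)
        except Exception:
            time.sleep(2)  # controller not ready yet
    # wait for all engines to register with the controller
    while len(rc.ids) < n:
        time.sleep(2)
    return rc, p

def stop(p, profile="sge"):
    # shut down the engines and controller, then reap the ipcluster process
    subprocess.call(["ipcluster", "stop", "--profile=%s" % profile])
    p.wait()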
Having a start and stop function makes things much simpler. And the ipcode.sh job script…
#!/bin/bash
#$ -N ipcode
#$ -j y

workon ipython-parallel      # load parallel environment
ipython ipcode.py ${1:-"8"}  # run 8 engines by default
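Note that the job script forwards its first argument to ipcode.py, so (assuming hpcc.start() reads the engine count from sys.argv, as in the sketch above) you can request a different engine count at submission time, e.g. qsub ipcode.sh 16.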
That’s great. But what if we actually want to do something besides mapping a simple lambda math operation across all our allocated CPU cores? IPython provides dview functions for this purpose, to push module imports, variables and functions out to the engines. Here is a more complete ipcode.py example, illustrating how to push everything out and then iterate on a given function by mapping lists of input parameters.
#!/usr/bin/env python

### start ipython cluster
import hpcc
rc, p = hpcc.start()

# example function for engines
def xyz(x, y, z):
    return x + y + z

# example function for iterations
def workername(x, y):
    myhost = os.uname()[1]
    result = myhost + " " + str(x) + "+" + str(y) + "+" + str(z) + "=" + str(xyz(x, y, z))
    return result

# use all engines
dview = rc[:]

# push out imports
with dview.sync_imports():
    import os

# push out variables
dview.push(dict(z=1), block=True)

# push out functions
dview.push(dict(xyz=xyz), block=True)

# build iterable lists of parameters
x = range(32)
y = [2] * 32

# iterate and parallel map the function
parallel_result = dview.map_sync(workername, x, y)
print parallel_result

### stop ipython cluster
hpcc.stop(p)
In the above code we are mapping the workername function with two parameters, x and y. The parameters are each a list of length 32, so the function will be run 32 times. Notice that this function needs more than just x and y to complete: we also need to import the os module, push out the variable z, and push out the function xyz. Done, done and done. The code is ready to go.
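If you want to sanity-check that the pushes actually landed before mapping, a quick probe like this should work (a small sketch reusing the dview from above; the expected outputs assume 8 engines):

# confirm the pushed variable and function exist on every engine
print dview.pull('z', block=True)     # one value per engine: [1, 1, 1, 1, 1, 1, 1, 1]
print dview.apply_sync(xyz, 1, 2, 3)  # runs xyz on every engine: [6, 6, 6, 6, 6, 6, 6, 6]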
With 8 engines and 32 function calls, each engine is getting 4 iterations. From here, we could scale up our engine count and build larger parameter lists. By default, this can get us up to 98 engines on HPCC, since the limit is 100 cores per project (1x ipcode, 1x ipcontroller and 98x ipengines).
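For example, keeping the same 4 iterations per engine at a larger scale only requires growing the parameter lists to match (a sketch; sizing the lists to a multiple of the engine count just keeps the load even, since map_sync will split lists of any length):

n = len(rc.ids)    # number of running engines, e.g. 98
x = range(n * 4)   # 4 function calls per engine
y = [2] * len(x)
parallel_result = dview.map_sync(workername, x, y)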
Please see the ipcode.py and ipcode.sh examples in /usr/local/demo/Python/IPython/ on HPCC for the code from this post. Make a copy of the directory and give it a go, with the following commands:
setup-ipython-parallel.sh
cp -r /usr/local/demo/Python/IPython ~
cd IPython
qsub ipcode.sh
qstat
cat ipcode.o*