Parallel IPython with Univa Grid Engine (SGE)

I frequently sing the praises of Python as the go-to language for research programming. Python is extremely fast, in terms of human development time. There are typically more CPU cores available versus research assistants with specific programming skills. On that note, Python is also extremely fast, in terms of compute time, thanks to the already optimized and compiled C/C++ code in modules such as numpy and scipy.

Enter IPython (soon to be rename Jupyter). IPython provides a number of improvements over vanilla Python. My favorites are the improved interactive shell for the terminal and web-enabled “notebooks” that allow a better way of quickly developing code.

IPython also allows for easily scaling up code in a parallel fashion. When working with IPython, one can start a “cluster” of “engines.” These engines are managed by a “controller.” Typically, one would launch one engine per CPU on a system. The engines then accept work from the “client” IPython shell. This is all automated with the IPython ipcluster command on a personal system. But how do we take advantage of this feature in an HPC environment that has many many more CPUs behind a job scheduler? When submitting non-interactive jobs to a job queue, we cannot predict when and what compute nodes of the cluster will be provided. Furthermore, we cannot have different jobs stomping on the setup and tear-down of our cluster of IPython engines.

So, we need a scriptable setup that allows for dynamic resources and simultaneous, non-conflicting clusters of engines. IPython does provide “profile” support for PBS and SGE job schedulers. I have adapted an SGE profile setup and a virtualenv to facilitate a single-script solution on the Wharton HPCC system.

#!/usr/bin/env python                                                                                                                             
### a single script solution to running parallel IPython with Univa Grid Engine (SGE)
### 2015-06-29 Gavin Burris, Wharton Research Computing

Let’s first configure the profile and virtualenv to make this possible, with the following terminal commands.

### create a virtualenv:
# qlogin
# virtualenv ~/projectA
# cd ~/projectA && source ~/projectA/bin/activate
# pip install ipython zmq virtualenvwrapper --upgrade
# ipython profile create sge --parallel
# echo "c.IPEngineApp.wait_for_url_file = 60" >>~/.ipython/profile_sge/
# echo "c.EngineFactory.timeout = 20" >>~/.ipython/profile_sge/
# echo "c.HubFactory.ip = '*'" >>~/.ipython/profile_sge/
# echo "c.HubFactory.engine_ip = '*'" >>~/.ipython/profile_sge/
# echo "c.IPClusterStart.controller_launcher_class = 'SGE'" >>~/.ipython/profile_sge/
# echo "c.IPClusterEngines.engine_launcher_class = 'SGE'" >>~/.ipython/profile_sge/
# exit

Once we have set the profile and virtualenv in our HPCC account, we get to the meat of the python script. Most of this code is wrapper for starting and stopping the IPython cluster engines in a fail-proof way.

### required modules
import subprocess
import os
import sys
import time
from IPython.parallel import Client 

### start IPython cluster
def waitforp(mesg):
    while True:
        line = p.stderr.readline()
        if mesg in line:
jobname = os.environ['JOB_NAME']
jobid = str(os.environ['JOB_ID'])
cluster = jobname + jobid
engines = sys.argv[1]
pcmd = "ipcluster", "start", "--profile=sge", "--cluster-id=" + cluster, "-n", engines
print "Starting IPython cluster", cluster, "with", engines, "engines:", ' '.join(pcmd)
p = subprocess.Popen(pcmd, env=os.environ.copy(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
waitforp("Engines appear to have started successfully")
rc = None
while rc is None:
        rc = Client(profile='sge', cluster_id=cluster)
        print "Waiting for ipcontroller..."
while len(rc.ids) < int(engines):
    print "Waiting for ipengines..."
    #print rc.ids
dview = rc[:] # use all engines

### do some work ala
serial_result = map(lambda x:x**10, range(32))
parallel_result = dview.map_sync(lambda x: x**10, range(32))
print serial_result
print serial_result==parallel_result

### stop IPython cluster
qcmd = "ipcluster", "stop", "--profile=sge", "--cluster-id=" + cluster
print "Stopping IPython cluster", cluster + ":", ' '.join(qcmd)
q = subprocess.Popen(qcmd, env=os.environ.copy(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
waitforp("Removing pid file")

Once the above is saved to the script, we can submit it to grid engine. This script is a template for whatever work is required in the above “do some work” section. In this example, we are simply comparing the results returned from a local versus parallel run of the lambda function that takes x to the power of ten. Please see the IPython Parallel Multiengine documentation for different methods of dividing work across engines, including farming out python function calls.

This single script can then be submitted to the job queue with qsub. Note the “8” in the command line. This tells our script to run eight engines. This one ipcode client script, one ipcontroller and eight ipengines is a total of ten slots in the queue.

### submit job with:
# echo "source ~/projectA/bin/activate && ipython 8" |qsub -j y -N ipcode
# OR 
# qsub 8

Please see /usr/local/demo/Python/IPython/ on HPCC for the demo code from this post.

As a specialist in Linux and high-performance computing, Burris enjoys enabling faculty within The Wharton School of the University of Pennsylvania by providing effective research computing resources. Burris has been involved in research computing since 2001. Current projects find Burris working with HPC, big data, cloud computing and grid technologies. His favorite languages are Python and BASH. In his free time, he enjoys bad cinema, video editing, synthesizers and bicycling.