Python

Python on the HPCC

Python versions 2.7 and 3.6 are available on the HPCC. The default python command invokes version 2.7. To use version 3.6, first enable it with the command source /opt/rh/rh-python36/enable. We highly recommend using 3.6, and falling back to 2.7 only if a required module does not support Python 3.
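To confirm which interpreter a script or job is actually using (for example, after running the enable command), you can put a small check at the top of the script. This is a minimal sketch. The function name check_python_version is hypothetical, and the code is written so it also runs under the default 2.7 interpreter.

```python
import sys


# Hypothetical helper name; not part of any HPCC tool.
def check_python_version(minimum=(3, 6)):
    """Return the running version as 'X.Y.Z' and warn if it is older than `minimum`."""
    version = "%d.%d.%d" % sys.version_info[:3]
    if sys.version_info[:2] < tuple(minimum):
        # Goes to stderr, which lands in the job's output file when -j y is used.
        sys.stderr.write("Warning: running Python %s; expected %d.%d or newer\n"
                         % ((version,) + tuple(minimum)))
    return version


if __name__ == "__main__":
    print("Python %s at %s" % (check_python_version(), sys.executable))
```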

Submitting a Basic Python Job

NOTE: You’ll need to qlogin to a compute node first to install new modules or do any other Python work. Please do not run code on the login nodes.

Create a .py file

Create a file named myfile.py containing your program, for example:

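import random

# Draw 10 samples from a standard normal distribution (mean 0, std dev 1)
rlist = []
for i in range(10):
    rlist.append(random.gauss(0, 1))

# Write the list to a file; "with" closes the file when the block ends
with open('mylist.txt', 'w') as outfile:
    outfile.write(str(rlist) + "\n")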

Create a job script

Create a job script file (myfile.sh in this example), for example:

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
python myfile.py
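Inside a running job, Grid Engine exports environment variables such as JOB_ID and JOB_NAME (JOB_NAME comes from the -N line). The sketch below uses JOB_ID so that each run writes its own output file instead of overwriting mylist.txt. The helper name and the file-naming scheme are hypothetical. When the code runs outside Grid Engine, for example on your own machine, the variables are absent, so it falls back to "interactive".

```python
import os


# Hypothetical helper and naming scheme, for illustration only.
def job_output_name(stem="mylist", ext="txt"):
    """Build an output filename that includes the Grid Engine job ID when available."""
    # JOB_ID is set by Grid Engine for jobs; it is absent outside Grid Engine.
    job_id = os.environ.get("JOB_ID", "interactive")
    return "%s.%s.%s" % (stem, job_id, ext)


if __name__ == "__main__":
    print("job name: %s" % os.environ.get("JOB_NAME", "(none)"))
    print("writing to %s" % job_output_name())
```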

Submitting the job script

Submit the job with:

qsub myfile.sh
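Once the job has finished, you can read the list written by myfile.py back into Python. The file holds the printed form of a list of floats, so ast.literal_eval can parse it safely (unlike eval). This is a sketch that assumes mylist.txt is in the current directory; read_results is a hypothetical helper.

```python
import ast


# Hypothetical helper name, for illustration only.
def read_results(path="mylist.txt"):
    """Parse the list of floats that myfile.py wrote as str(rlist)."""
    with open(path) as infile:
        text = infile.read()
    # Keep everything up to the closing bracket, ignoring any trailing newline.
    text = text[: text.rindex("]") + 1]
    return [float(v) for v in ast.literal_eval(text)]


if __name__ == "__main__":
    results = read_results()
    print("read %d values, mean %.3f" % (len(results), sum(results) / len(results)))
```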

Setting up your own Python environment

While the above may be fine for extremely simple code, you’ll generally need to install some modules to do Python work. Here’s how to set things up properly on the HPCC for your Python projects.

NOTE: Remember that for each qlogin session and each job script, Python 3 (if used) must be re-enabled and your virtualenv (if used) re-activated. You can add these source commands to your job script, or to the MPI SELECTION section of your ~/.bashrc file, between the if and fi lines. A ‘complete’ example:
## START MPI SELECTION
if [[ $(hostname -s | grep "^hpcc[0-9]*$") ]]; then
    module load mpi/openmpi-x86_64
    module load gcc/gcc-9.2.0
    source /opt/rh/rh-python36/enable
    unset PYTHONPATH
fi
## END MPI SELECTION
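After logging in again (or inside a job), you can check that this .bashrc section took effect by asking Python directly. The enable script should put the rh-python36 interpreter first on PATH, and PYTHONPATH should be unset. This is a sketch: environment_report is a hypothetical helper, and the /opt/rh prefix check assumes the install location used in the enable command above.

```python
import os
import sys


# Hypothetical helper name, for illustration only.
def environment_report():
    """Collect the facts that the .bashrc MPI SELECTION block is meant to set up."""
    return {
        # Inside an activated virtualenv this points into the venv instead.
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # Assumes the Software Collections path used in the enable command.
        "from_rh_python36": sys.executable.startswith("/opt/rh/rh-python36"),
        # The .bashrc block runs `unset PYTHONPATH`, so this should be None.
        "pythonpath": os.environ.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%-16s %s" % (key, value))
```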

Python provides functions and features through what are called modules. The recommended way to install Python modules is with the pip command inside a virtualenv-created directory. ‘virtualenv’ creates a self-contained directory that holds a set of Python modules. This lets you keep a different set of modules, possibly with different versions, for each project.
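To tell from inside a script whether a virtual environment is active, you can compare sys.prefix with the base interpreter’s prefix. This is a minimal sketch; in_virtualenv is a hypothetical name. It covers older virtualenv releases, which set sys.real_prefix, and newer virtualenv and the standard venv module, which make sys.prefix differ from sys.base_prefix.

```python
import sys


# Hypothetical helper name, for illustration only.
def in_virtualenv():
    """Return True if this interpreter is running inside a virtualenv or venv."""
    # Older virtualenv releases (before 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    # Python 2.7 has no base_prefix, so fall back to sys.prefix there.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active: %s (prefix %s)" % (in_virtualenv(), sys.prefix))
```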

Log on to a Compute Node

Please do all Python work on a compute node:

qlogin -now no

Enable Python 3

If you didn’t enable Python 3 in your ~/.bashrc file as described above (highly recommended!), run:

source /opt/rh/rh-python36/enable

Create a Project Folder

If you don’t already have a directory for your project, create one. The command below creates a ‘projectA’ directory in your home directory (~ is shorthand for your home directory; you can also use $HOME).

mkdir ~/projectA

Create a Project Virtual Environment

Change to the project directory and create a virtual environment. I like the name ‘venv36’, where ‘venv’ means ‘virtual environment’ and ‘36’ tells me the Python version it runs under:

cd ~/projectA
virtualenv venv36
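Python 3 also ships the standard-library venv module, which creates an equivalent environment without the separate virtualenv command (from the shell: python3 -m venv venv36). The sketch below does the same thing from Python. make_env is a hypothetical wrapper around venv.create, and it assumes Python 3 has been enabled.

```python
import os
import venv


# Hypothetical wrapper name, for illustration only.
def make_env(path="venv36", with_pip=True):
    """Create a virtual environment at `path` and return the path to its python."""
    # with_pip=True runs ensurepip so that `pip install` works right away.
    venv.create(path, with_pip=with_pip)
    return os.path.join(path, "bin", "python")


if __name__ == "__main__":
    print("created", make_env())
```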

Activate the virtualenv

Now you’re ready to ‘activate’ the virtual environment and ‘do stuff’, whether that’s running code or installing modules.

source venv36/bin/activate

Install modules into the active virtualenv

pip install pandas matplotlib
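To confirm that the modules installed into the active virtualenv can be imported, and to see their versions, you can run a quick check like the one below. This is a sketch; module_versions is a hypothetical helper. It reads each module’s __version__ attribute where one exists.

```python
import importlib


# Hypothetical helper name, for illustration only.
def module_versions(names):
    """Import each module and report its __version__, or the error if it fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # A missing module often means the virtualenv is not active.
            report[name] = "NOT AVAILABLE (%s)" % exc
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, version in sorted(module_versions(["pandas", "matplotlib"]).items()):
        print("%-12s %s" % (name, version))
```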

Now you can work interactively in Python (generally just for simple testing), or log out of the compute node and run batch jobs with qsub:

logout

Submitting a Virtual Environment Python Job (qsub)

Create or Modify a job script

Create a script file (myfile.sh in this example) in the projectA directory. The only difference between this script and the simple script at the top is the line that activates the virtual environment (source):

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
source venv36/bin/activate
python myfile.py

Run it like any normal qsub job:

qsub myfile.sh

A note about standard output aka print

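Under Grid Engine, a job’s standard output is written to a file rather than a terminal, so Python buffers it. Print statements may therefore not appear in the job’s output file until the buffer fills or the job ends. If you would like to see print statements from a running job as they occur, for example debug or progress messages, flush them explicitly: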

print(f"print some text here with a {variable}", flush=True)
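If many print calls need flushing, a small wrapper avoids repeating flush=True. The sketch below uses functools.partial and requires Python 3 (flush= needs 3.3 or later, and f-strings need 3.6). log and run are hypothetical names. Alternatively, running the script as python -u myfile.py in the job script turns off output buffering for the whole run.

```python
import functools
import time

# Hypothetical name: every call to log() behaves like print(..., flush=True),
# so the job's output file is updated immediately instead of when the buffer fills.
log = functools.partial(print, flush=True)


def run(steps=3, delay=0.0):
    """Hypothetical example loop that reports progress after each step."""
    for step in range(1, steps + 1):
        time.sleep(delay)
        log(f"step {step} of {steps} done")


if __name__ == "__main__":
    run(delay=1.0)
```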