Python

Python on the HPCC

Python 2.7 and several 3.x versions are available on the HPCC compute servers. By default, the python command invokes version 2.7. To use a Python 3 version on a compute server, first enable it with the module load python/VERSION command, for example ‘module load python/3.10.4’ to load Python 3.10.4 (currently the latest version on the Wharton HPCC systems). We highly recommend using Python 3.8+, falling back to 2.7 only if a required module does not support Python 3 (very rare at this point), as both Python 2.7 and Python 3.6 have reached end of life and are no longer supported.

NOTE: All examples below assume that you will be using Python 3
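
Because the plain python command still starts 2.7 unless a module has been loaded, it can help to have a script stop early when it is started under the wrong interpreter. The following is a minimal sketch, not part of the HPCC setup itself: MIN_VERSION and check_python_version are illustrative names, and the message is built with str.format rather than an f-string so that the file still parses under Python 2.7.

```python
import sys

# Hypothetical minimum, chosen to match the "3.8+" recommendation above.
MIN_VERSION = (3, 8)


def check_python_version(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


if not check_python_version():
    found = ".".join(str(part) for part in sys.version_info[:3])
    needed = ".".join(str(part) for part in MIN_VERSION)
    # sys.exit with a string prints it to stderr and exits with status 1.
    sys.exit(
        "Python {}+ is required, but this is Python {}. "
        "Did you run 'module load python/VERSION'?".format(needed, found)
    )
```

Placed at the top of a script such as myfile.py, a guard like this turns a forgotten module load into an immediate, readable error in the job output.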

Submitting a Basic Python Job

NOTE: You’ll need to qlogin to a compute node first, whether you are installing new modules or running Python interactively. Please do not run code on the login nodes.

Create a .py file

Create a myfile.py file with program content, for example:

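import random

# Draw 10 samples from a normal distribution (mean 0, standard deviation 1)
rlist = []
for i in range(10):
    rlist.append(random.gauss(0, 1))

# Write the list to a file; 'with' closes the file when the block ends
with open('mylist.txt', 'w') as outfile:
    outfile.write(str(rlist) + "\n")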

Create a job script

Create a script file (myfile.sh in this example), for example:

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
module load python/3.10.4  # <- load a recent version of Python 3
python myfile.py

Submitting the job script

Submit the job with:

qsub myfile.sh

Setting up your own Python environment

While the above may be okay for extremely simple code, generally you’ll need to install some Python modules to do your Python work. Here’s how to set things up properly in the HPCC for your Python projects.

NOTE: Remember that with each qlogin or job script, Python 3 (if used) must be re-enabled and your virtualenv (if used) re-activated. These module and source commands can be added to your job script or to your ~/.bashrc file, in the MPI SELECTION section, between the if and fi lines. A ‘complete’ example:
## START MPI SELECTION
if [[ $(hostname -s | grep "^hpcc[0-9]*$") ]]; then
    module load mpi/openmpi-x86_64
    module load python/3.10.4
    module load gcc/11.3.0   # <- also a more recent compiler!
fi
## END MPI SELECTION

Python provides functions and features via what are called modules. The recommended way to install one or more Python modules is with the pip command inside a virtual environment (virtualenv). A virtual environment is a self-contained directory that holds its own set of Python modules. In this way, you can keep a different set of modules, possibly at different versions, for each project.
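
To make the idea of a self-contained environment concrete, the sketch below reports which interpreter and environment a script is actually running under. It assumes an environment created with python -m venv, where sys.prefix and sys.base_prefix differ; in_virtualenv is an illustrative name, not an HPCC or Python built-in.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("environment:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```

Running this from a job script is a quick way to confirm that the module load and activation lines took effect before the real work starts.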

Log on to a Compute Node

Please do all Python work on a compute node:

qlogin -now no

Enable Python 3

If you didn’t set up Python 3 in your ~/.bashrc file as described above (highly recommended!), you will need to load it now:

module load python/3.10.4

We also recommend that you load a more recent compiler, as many Python modules will not install properly without one:

module load gcc/11.3.0

Create a Project Folder

If you don’t already have a directory specifically for your project, create one. The command below creates a ‘projectA’ directory in your home directory (substitute your own project name). The ~ is shorthand for ‘my home directory’; you can also use $HOME.

mkdir ~/projectA

Create a Project Virtual Environment

Change your directory to the project directory, and create a virtual environment. I like ‘venv310’, where ‘venv’ means ‘virtual environment’, and ‘310’ tells me the Python version I’m running under:

cd ~/projectA
python -m venv venv310

Activate the virtualenv

Now you’re ready to ‘activate’ the virtual environment and ‘do stuff’, whether that’s running code or installing modules.

source venv310/bin/activate

Update the virtualenv

We highly recommend that you initially update a few packages when you create a new virtual environment.

python -m pip install -U pip
python -m pip install -U setuptools wheel

Install the modules that you need into the active virtualenv

python -m pip install pandas matplotlib
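
After installing, it can be worth confirming that the packages landed in the active environment. The sketch below uses the standard library's importlib.metadata (available from Python 3.8); the REQUIRED list and the installed_versions function are illustrative names that simply mirror the install command above.

```python
from importlib import metadata

# Hypothetical list: the two packages installed in the example above.
REQUIRED = ["pandas", "matplotlib"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package shows as not installed, the usual cause is that the virtual environment was not activated before running pip.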

Now you can work interactively in Python (generally just for simple testing), or log out of the compute host, so you can run batch jobs with qsub:

logout

Submitting a Virtual Environment Python Job (qsub)

Create or Modify a job script

Create a script file (myfile.sh in this example) in the projectA directory. Note that the only difference between this script and the basic script at the top is the virtual environment activation (source) line:

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
module load python/3.10.4
source venv310/bin/activate
python myfile.py

Run it like any normal qsub job:

qsub myfile.sh

A note about standard output aka print

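Output from a job running under Grid Engine can be buffered: Python does not flush standard output line by line when it is written to a file rather than a terminal, so print output may appear late. If you would like to see print statements from a running job as they occur, say for debug or progress messages, either add flush=True to your print calls: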

print(f"print some text here with a {variable}", flush=True)

OR add the ‘-u’ option to your python command in your job script:

python -u myfile.py
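
If a script has many print calls, adding flush=True to each one gets tedious. One possible variation, sketched here as an assumption rather than an HPCC recommendation, is to define a flushing print once with functools.partial and use it for progress messages; log and run_steps are hypothetical names, and time.sleep stands in for real work.

```python
import functools
import time

# Hypothetical helper: print() with flush=True already applied.
log = functools.partial(print, flush=True)


def run_steps(n_steps, delay=0.0):
    """Report progress after each step and return the number of steps completed."""
    completed = 0
    for step in range(1, n_steps + 1):
        time.sleep(delay)  # stand-in for the real work of this step
        completed += 1
        log(f"finished step {step} of {n_steps}")
    return completed


run_steps(3)
```

Each message is written to the job's output file as soon as the step finishes, without needing the -u option on the python command.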