Python

Python on the HPCC

Several versions of Python are available on the HPCC compute servers. To reduce reproducibility issues, no Python version is enabled by default, so the python command does not work until you load one. To use Python 3 on a compute server, first enable it with the module load python/VERSION command, for example:

module load python/3.11.5

… to load Python 3.11.5 (currently the latest version on the Wharton HPCC systems). There is a ‘default’ built-in v3.6, but it is for system operations only; do not use it for your own work!
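To confirm which interpreter a session or job is actually using after the module load, a short check like the following may help. It is a minimal sketch: the helper names describe_interpreter and is_at_least are hypothetical, and the 3.11 threshold is only an assumption taken from the version used on this page.

```python
import sys


def describe_interpreter():
    """Return the interpreter path and its (major, minor, micro) version."""
    return sys.executable, tuple(sys.version_info[:3])


def is_at_least(required, actual=None):
    """Return True when the running (or given) version is at least `required`."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual >= tuple(required)


if __name__ == "__main__":
    path, version = describe_interpreter()
    print(f"Running {path}, Python {'.'.join(map(str, version))}")
    # Assumed threshold: the 3.11 series loaded in the example above.
    if not is_at_least((3, 11)):
        print("Older than 3.11; was 'module load python/3.11.5' run first?")
```

If the reported version is 3.6, the system Python is probably being picked up instead of the loaded module.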

NOTE: All examples below assume that you will be using Python 3

Submitting a Basic Python Job

NOTE: You’ll need to qlogin to a compute node first, whether you want to install new modules or run Python interactively. Please do not run code on the login nodes.

Create a .py file

Create a myfile.py file with program content, for example:

import random

rlist = []
for i in range(10):
    rlist.append(random.gauss(0,1))

# Write the list to a file; the with block closes the file when done
with open('mylist.txt', 'w') as outfile:
    outfile.write(str(rlist) + "\n")
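The example above stores the list as its Python string representation, so a later script can read it back with ast.literal_eval. The sketch below wraps both steps in two hypothetical helpers (write_gaussian_list and read_list are illustrative names, not part of the HPCC setup) and adds an optional seed, on the assumption that you want repeatable numbers between runs.

```python
import ast
import random


def write_gaussian_list(path, n=10, seed=None):
    """Write n standard-normal draws to `path` as a Python list literal."""
    rng = random.Random(seed)  # a fixed seed gives the same numbers each run
    values = [rng.gauss(0, 1) for _ in range(n)]
    with open(path, "w") as outfile:
        outfile.write(str(values) + "\n")
    return values


def read_list(path):
    """Read back a list written by write_gaussian_list."""
    with open(path) as infile:
        # literal_eval parses literals only; it does not execute code.
        return ast.literal_eval(infile.read().strip())
```

Calling write_gaussian_list('mylist.txt', seed=42) followed by read_list('mylist.txt') should return the same ten values that were written.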

Create a job script

Create a job script file (myfile.sh in this example), for example:

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
module load python/3.11.5  # <- load a recent version of Python 3
python myfile.py

Submitting the job script

Submit the job with:

qsub myfile.sh
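If the same script is submitted several times, each run overwrites the previous output file. One possible way around this, sketched below, is to build the file name from the JOB_ID and SGE_TASK_ID environment variables that Grid Engine normally sets for a running job. The helper job_output_name is hypothetical, and the sketch assumes your scheduler exports these variables in the usual way.

```python
import os


def job_output_name(stem, suffix=".txt", environ=None):
    """Build an output file name that includes the Grid Engine job ID, if any."""
    environ = os.environ if environ is None else environ
    job_id = environ.get("JOB_ID")
    if not job_id:
        # Not running under the scheduler (e.g. an interactive test).
        return f"{stem}{suffix}"
    task_id = environ.get("SGE_TASK_ID", "")
    if task_id.isdigit():
        # Array job: include the task number as well.
        return f"{stem}.{job_id}.{task_id}{suffix}"
    # Non-array jobs typically report a non-numeric task ID such as "undefined".
    return f"{stem}.{job_id}{suffix}"


if __name__ == "__main__":
    print(job_output_name("mylist"))
```

In myfile.py, the result could replace the fixed 'mylist.txt' name passed to open().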

Setting up your own Python environment

While the above may be okay for extremely simple code, generally you’ll need to install some Python modules to do your Python work. Here’s how to set things up properly in the HPCC for your Python projects.

NOTE: Remember that with each qlogin or job script, Python 3 (if used) must be re-enabled and your virtualenv (if used) re-activated. These module and source commands can be added to your job script or to your ~/.bashrc file, in the MODULE SELECTION section, between the if and fi lines. A ‘complete’ example:
## START MODULE SELECTION
if [[ $(hostname -s | grep "^hpcc[0-9]*$") ]]; then
    module load python/3.11.5
    module load gcc/12.2.0   # <- also a more recent compiler!
fi
## END MODULE SELECTION

Python provides functions and features via what are called modules. The recommended way to install one or more Python modules is with the pip command within a virtualenv-created directory. ‘virtualenv’ creates a self-contained directory that will hold a set of python modules. In this way, you may organize a different set of modules, maybe with different versions, for different projects.

Log on to a Compute Node

Please do all Python work on a compute node:

qlogin

Enable Python 3

If you didn’t set up Python 3 in your ~/.bashrc file as described above (highly recommended!), you will need to load it now:

module load python/3.11.5

We also recommend that you load a more recent compiler, as many Python modules will not install properly without one:

module load gcc/12.2.0

Create a Project Folder

If you don’t already have a directory specifically for your project, create one. The command below creates a ‘projectA’ directory (change this to your own project name!) in your home directory (~ is shorthand for ‘my home directory’ … you can also use $HOME).

mkdir ~/projectA

Create a Project Virtual Environment

Change to the project directory, and create a virtual environment. I like ‘venv3115’, where ‘venv’ means ‘virtual environment’, and ‘3115’ tells me the Python version (3.11.5) I’m running under:

cd ~/projectA
python -m venv venv3115

Activate the virtualenv

Now you’re ready to ‘activate’ the virtual environment, and ‘do stuff’, whether that’s run code, or install modules.

source venv3115/bin/activate
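A quick way to check from inside Python that the activation took effect is to compare sys.prefix with sys.base_prefix, which differ when the interpreter is running from a venv-created environment. This is a small sketch; in_virtualenv is a hypothetical helper name, and its two optional arguments exist only so the logic can be tried with made-up paths.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a virtual environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was made from.
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the source command first.")
```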

Update the virtualenv

We highly recommend that you initially update a few packages when you create a new virtual environment.

python -m pip install -U pip
python -m pip install -U setuptools wheel

Install the modules that you need into the active virtualenv

python -m pip install pandas matplotlib
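To see which versions ended up in the active environment, for example to record them alongside your results, the standard library importlib.metadata module can be queried from Python. The sketch below uses a hypothetical helper, installed_versions, and reports None for anything that is not installed rather than raising an error.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pandas", "matplotlib"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside and outside the virtualenv is a simple way to confirm that the modules were installed where you expected.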

Now you can work interactively in Python (generally just for simple testing), or log out of the compute host, so you can run batch jobs with qsub:

logout

Submitting a Virtual Environment Python Job (qsub)

Create or Modify a job script

Create a job script file (myfile.sh in this example) in the projectA directory. Note that the only difference between this script and the basic script at the top is the virtual environment activation (source) line:

#!/bin/bash
#$ -N jobname
#$ -j y    # join output and error
module load python/3.11.5
source venv3115/bin/activate
python myfile.py

Run it like any normal qsub job:

qsub myfile.sh

A note about standard output aka print

Output from a Grid Engine job can be buffered, so print statements may not show up in the job’s output file until some time after they run. If you would like to see print statements from a running job as they occur, say for debug or progress messages, please do one of the following:

print(f"print some text here with a {variable}", flush=True)

OR add the ‘-u’ option to your python command in your job script:

python -u myfile.py
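If a script prints many progress messages, it may be more convenient to wrap the flush in one small function instead of adding flush=True to every print call. The following is only a sketch; log_progress is a hypothetical helper, and the timestamp format is an arbitrary choice.

```python
import sys
import time


def log_progress(message, stream=None):
    """Print a timestamped message and flush it immediately."""
    stream = sys.stdout if stream is None else stream
    stamp = time.strftime("%H:%M:%S")
    # flush=True pushes the text out right away instead of waiting for the
    # buffer to fill, so it appears in the job output file as it happens.
    print(f"[{stamp}] {message}", file=stream, flush=True)


if __name__ == "__main__":
    for step in range(3):
        log_progress(f"finished step {step}")
```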