Python on the HPC3
NOTE: We do not set a default version of Python on the cluster!
Several versions of Python are available on the HPC3 compute servers. To reduce reproducibility issues, we don’t set a default version for our users. To set your Python version on a compute server, first enable it with the module load python/VERSION command, for example:
module load python/3.12.3
… to load Python 3.12.3 (the current latest version on the Wharton HPC3 systems). See Setting Up Your Own Python Environment, below, for further details. While there is a ‘default’ builtin v3.6, that is for system operations use only!
NOTE: All examples below assume that you will be using Python 3.
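Because no default version is set, it can help to have a script confirm which interpreter it is running under before doing real work. The following is a minimal sketch; the helper name `require_python` and the (3, 8) minimum are illustrative assumptions, not site policy.

```python
import sys


def require_python(minimum=(3, 8), version_info=None):
    """Return True if the interpreter version is at least `minimum`.

    `require_python` and the (3, 8) default are hypothetical choices
    made for this example only.
    """
    current = tuple((version_info or sys.version_info)[:2])
    return current >= tuple(minimum)


if __name__ == "__main__":
    # Show which interpreter and version are actually in use
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    if not require_python():
        print("Warning: old Python detected; did you run 'module load python/VERSION'?")
```

If the reported version is the built-in 3.6, the module load step was most likely skipped in the current session.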
Submitting a Basic Python Job
NOTE: You’ll first need to qlogin to a compute node before installing new modules or running Python interactively. Please do not run code on the login nodes.
Create a .py file
Create a myfile.py file with program content, for example:
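import random

# Draw 10 samples from a standard normal distribution
rlist = []
for i in range(10):
    rlist.append(random.gauss(0, 1))

# Write the list to a file; 'with' closes the file when done
with open('mylist.txt', 'w') as outfile:
    outfile.write(str(rlist) + "\n")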
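The example writes the `str()` form of a Python list to mylist.txt, so the result can be read back in a later step with `ast.literal_eval`. This is a small sketch; `load_list` is a hypothetical helper name, and it assumes the file contains exactly one list literal.

```python
import ast


def load_list(path="mylist.txt"):
    """Parse a file that holds the str() form of a list of numbers."""
    with open(path) as infile:
        # literal_eval only accepts Python literals, so it is safer than eval
        return ast.literal_eval(infile.read().strip())
```

For larger results, a structured format such as CSV or JSON is usually easier to work with than a printed list.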
Create a job script
Create a script file (myfile.sh in this example), for example:
#!/bin/bash
#$ -N jobname
#$ -j y                      # join output and error
module load python/3.12.3    # <- load a recent version of Python 3
module load gcc/12.2.0       # <- also a more recent compiler!
python myfile.py
Submitting the job script
Submit the job with:
qsub myfile.sh
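Once a job is running, it is often useful to record which job produced a given output file. The sketch below assumes the scheduler exports the usual Grid Engine variables `JOB_ID`, `JOB_NAME`, and `NSLOTS` into the job environment; `job_info` is a hypothetical helper, and it falls back to "unknown" when a variable is absent (for example, when run outside a job).

```python
import os


def job_info(env=None):
    """Collect scheduler-provided variables, using "unknown" when absent.

    The variable names are assumed to follow standard Grid Engine
    conventions; adjust them if your environment differs.
    """
    env = os.environ if env is None else env
    names = ("JOB_ID", "JOB_NAME", "NSLOTS")
    return {name: env.get(name, "unknown") for name in names}


if __name__ == "__main__":
    for name, value in job_info().items():
        print(f"{name}={value}", flush=True)
```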
Setting up your own Python environment
While the above may be okay for extremely simple code, generally you’ll need to install some Python modules to do your Python work. Here’s how to set things up properly in the HPC3 for your Python projects.
NOTE: Remember that with each qlogin or job script, Python 3 (if used) must be re-enabled and your virtualenv (if used) re-activated. These module and source commands can be added to your job script or ~/.bashrc file, in the MODULE SELECTION section, between the if and fi lines. A ‘complete’ example:
## START MODULE SELECTION
if [[ ! $(echo $TAGNAME |egrep "^hpc3-(login|desktop|qmaster)*$") ]]; then
    module load python/3.12.3
    module load gcc/12.2.0    # <- also a more recent compiler!
fi
## END MODULE SELECTION
Python provides functions and features via what are called modules. The recommended way to install one or more Python modules is with the pip command within a virtualenv-created directory. ‘virtualenv’ creates a self-contained directory that will hold a set of python modules. In this way, you may organize a different set of modules, maybe with different versions, for different projects.
Log on to a Compute Node
Please do all Python work on a compute node:
qlogin
Enable Python 3
If you didn’t set up Python 3 in your ~/.bashrc file as described above (highly recommended!), you will need to load it now:
module load python/3.12.3
We also recommend that you load a more recent compiler, as many Python modules will not install properly without one:
module load gcc/12.2.0
Create a Project Folder
If you don’t already have a directory specifically for your project, create one. The below command creates a ‘projectA’ directory (modify for your project name!!) in your home directory (~ is shorthand for ‘my home directory’ … you can also use $HOME).
mkdir ~/projectA
Create a Project Virtual Environment
Change your directory to the project directory, and create a virtual environment. I like ‘venv3123’, where ‘venv’ means ‘virtual environment’, and ‘3123’ tells me the Python version (3.12.3) I’m running under:
cd ~/projectA
python -m venv venv3123
Activate the virtualenv
Now you’re ready to ‘activate’ the virtual environment, and ‘do stuff’, whether that’s run code, or install modules.
source venv3123/bin/activate
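To confirm that activation worked, Python itself can report whether it is running inside a virtual environment: in a venv, `sys.prefix` points at the environment directory while `sys.base_prefix` points at the underlying installation. A minimal sketch, with `in_virtualenv` as a hypothetical helper name:

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Active prefix:", sys.prefix)
    print("Inside a virtual environment:", in_virtualenv())
```

The same check can be placed at the top of a job's Python script to catch a forgotten source line early.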
Update the virtualenv
We highly recommend that you initially update a few packages when you create a new virtual environment.
python -m pip install -U pip
python -m pip install -U setuptools wheel
Install the modules that you need into the active virtualenv
python -m pip install pandas matplotlib
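After installing, it is worth checking that the packages are visible from the active environment. The sketch below uses the standard library's `importlib.metadata` to look up installed versions; `installed_versions` is a hypothetical helper, and the package names simply mirror the install command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pandas", "matplotlib"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A package reported as missing usually means the virtual environment was not activated before running pip.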
Now you can work interactively in Python (generally just for simple testing), or log out of the compute host, so you can run batch jobs with qsub:
logout
Submitting a Virtual Environment Python Job (qsub)
Create or Modify a job script
Create a script file (myfile.sh in this example) in the projectA directory. The key difference between this script and the basic script at the top is the virtual environment activation (source) line:
#!/bin/bash
#$ -N jobname
#$ -j y                      # join output and error
module load python/3.12.3
source venv3123/bin/activate
python myfile.py
Run it like any normal qsub job:
qsub myfile.sh
Jupyter on the HPC3
Running Jupyter on the HPC3 requires some setup. See our Tools > Jupyter Page for instructions.
A note about standard output aka print
Output from a Grid Engine job can be buffered, so print statements may not appear until well after they run. If you would like to see print statements from a running job as they occur, say for debug or progress messages, please do one of the following. Pass flush=True to print:
print(f"print some text here with a {variable}", flush=True)
OR add the ‘-u’ option to your python command in your job script:
python -u myfile.py
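A third option, if you would rather not touch every print call or the job script, is to switch standard output to line buffering once at the top of the program. This is a sketch using `sys.stdout.reconfigure` (available in Python 3.7 and later); `enable_line_buffering` is a hypothetical helper name.

```python
import sys


def enable_line_buffering(stream=None):
    """Switch a text stream to line buffering; return True on success."""
    stream = sys.stdout if stream is None else stream
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # The stream is not a standard text wrapper, so leave it alone
        return False
    reconfigure(line_buffering=True)
    return True


if __name__ == "__main__":
    enable_line_buffering()
    for step in range(3):
        # Each line is flushed at its newline, without flush=True
        print(f"step {step} done")
```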