NOTE!! We no longer set a default version of R on the cluster! Because results can differ between versions of R (and other software products), you must select a version yourself.
NOTE2!! Due to a serious security vulnerability in all versions of R prior to 4.4.0 (the latest version at the time), we upgraded R across the HPC3 cluster and disabled use of the older installed versions at 2PM on 2024-05-01.
About R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
More information can be found at: http://www.r-project.org/
R on the HPC3
R is installed on all HPC3 compute hosts.
R (and GCC) Versions
We no longer set a default version of R in the cluster!
To list the available versions:
qlogin
module avail R
-------------------------------------- /etc/modulefiles ---------------------------------------
R/4.4.1
module avail gcc
--------------------------------------------------- /etc/modulefiles ---------------------------------------------------
gcc/12.2.0
Note: more recent versions may also be available.
To use these versions (along with the recommended GCC compiler) in every session, modify your ~/.bashrc file as follows:
## START MODULE SELECTION
if [[ ! $(echo "$TAGNAME" | grep -E "^hpc3-(login|desktop|qmaster)*$") ]]; then
    # ... other modules ...
    module load R/4.4.1
    module load gcc/12.2.0
fi
## END MODULE SELECTION
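The ‘if’ test above skips module loading on login, desktop, and qmaster hosts, based on the TAGNAME variable. A minimal sketch of the same check as a reusable function, assuming TAGNAME is set as it is on HPC3; the function name is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: succeed (return 0) if modules should be loaded on the
# host with the given tag (defaults to $TAGNAME), i.e. the tag does NOT match
# the login/desktop/qmaster pattern used in ~/.bashrc.
should_load_modules() {
  local tag="${1:-$TAGNAME}"
  if echo "$tag" | grep -Eq '^hpc3-(login|desktop|qmaster)*$'; then
    return 1   # login, desktop, or qmaster host: skip module loading
  fi
  return 0     # any other host (e.g. a compute node): load modules
}
```

Usage in ~/.bashrc would then read ‘if should_load_modules; then module load R/4.4.1; fi’.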
If you don’t want to permanently set your R and GCC versions for all sessions (qlogin or qsub), for example because different projects need different versions, you can instead run the same ‘module load’ commands after you qlogin, or put them in your qsub job script before the line that runs R.
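For per-project versions, one option is a small shell function, defined in a project's job script, that loads a pinned pair of versions. A minimal sketch; the function name is hypothetical, and its defaults are simply the versions listed above:

```bash
#!/bin/bash
# Hypothetical helper: load a specific R and GCC version for this session/job.
# Usage: load_r_env [R_VERSION] [GCC_VERSION]
load_r_env() {
  local r_version="${1:-4.4.1}"
  local gcc_version="${2:-12.2.0}"
  module load "R/${r_version}" || return 1
  module load "gcc/${gcc_version}"
}
```

For example, a job script could call ‘load_r_env 4.4.1 12.2.0’ before the line that runs Rscript, so each project records the versions it was run with.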
Submitting R Jobs
Create R Commands File
Create a .R file with your commands, for example:
D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
D
# Sort on x
indexes <- order(D$x)
D[indexes,]
# Print out sorted dataset, sorted in reverse by y
D[rev(order(D$y)),]
Create R Job Script
Create a .sh file with at least the following contents:
#!/bin/bash
Rscript --no-save your-commands-file.R
Submit R Job
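Submit the job script with qsub, e.g. ‘qsub your-job-script.sh’ (substitute the name of your .sh file). More information: HPC3 Job Management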
Interactive R Sessions
- Graphical RStudio: via our HPC3 Desktop environment
qlogin
module load R/4.4.1     # <- if this isn't in your ~/.bashrc file
module load gcc/12.2.0  # <- if this isn't in your ~/.bashrc file
module load rstudio
rstudio
- Textual:
qrsh R --no-save
Installing R Packages
NOTE: if you haven’t set your version of R permanently (see above), you will need to select a version with ‘module load’ after you qlogin, before doing your installs.
Here’s how to install R packages in your home directory / shared workspace.
Configuration Tips
Add the following to your ~/.Rprofile:
# auto-repo (no need to select a CRAN mirror)
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})
# use all threads in your session (NSLOTS falls back to 1 if it is unset)
options(Ncpus = strtoi(Sys.getenv('NSLOTS', unset = '1')) * 2)
Manual Installation
- log on to a compute node with qlogin
- download the package with wget (if it’s not a CRAN package)
- install (different depending on whether it’s CRAN or not)
Here are two examples (copy/pasteable, except for the package URL in wget and the package name in R CMD INSTALL):
If it IS a CRAN package:
qlogin
R --no-save
...
> install.packages("anRpackage")
If it’s the first time you have installed packages, or a new version of R has been installed, you will see something like:
Installing package into ‘/usr/local/lib/R/library-X.X.X’
(as ‘lib’ is unspecified)
Warning in install.packages("anRpackage") :
  'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/X.X’
to install packages into? (yes/No/cancel) yes
Just answer ‘yes’ to both questions; the default locations are recommended.
If it’s NOT a CRAN package (be careful of your sources, of course!):
qlogin
wget 'http://some.server.edu/anRpackage_version.tar.gz'
R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz
exit
Start R normally, and you should now be able to use the new package.
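The R CMD INSTALL step above assumes R_LIBS points at your personal library. A minimal sketch that stops with a clear message if R_LIBS is not set, and uses the first entry if R_LIBS holds a colon-separated list; the function name is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: install a downloaded source tarball into your personal
# R library (the first directory listed in $R_LIBS).
install_r_tarball() {
  local tarball="$1"
  local lib="${R_LIBS%%:*}"   # first entry of a colon-separated list
  if [ -z "$lib" ]; then
    echo "R_LIBS is not set" >&2
    return 1
  fi
  if [ ! -f "$tarball" ]; then
    echo "No such file: $tarball" >&2
    return 1
  fi
  mkdir -p "$lib"
  R CMD INSTALL -l "$lib" "$tarball"
}
```

Usage, after the wget step: ‘install_r_tarball anRpackage_version.tar.gz’.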
Automatic Installation of CRAN Packages in Your Code
Warning: do not use this code in an array (-t X-X) job! The tasks will try to install to the same place at the same time. Run a separate single job first to install the packages, or do a manual install as above.
NOTE: you will need to install at least one package by hand (manually) first, to set up your personal R library. See above.
This code tests whether each of your required packages (listed in the myPKGs vector) is already installed, either system-wide or in your personal R library, and installs any that are missing. It needs to go above your ‘library()’ calls in your R code:
# Vector of packages to be checked and installed if not already
myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')
# check which packages are already installed, and collect the missing ones
InstalledPKGs <- rownames(installed.packages())
InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]
# install any needed packages
if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs)
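To follow the array-job warning above, one option is to submit the package-install step as its own single job and have the array job wait for it with Grid Engine's -hold_jid option. A minimal sketch; the function name, job name, script names, and task range are hypothetical, and holding on a job name will also wait on any other job of yours with that name:

```bash
#!/bin/bash
# Hypothetical helper: submit a one-off install job, then an array job that
# is held until the install job has finished.
# Usage: submit_install_then_array INSTALL_SCRIPT ARRAY_SCRIPT TASK_RANGE
submit_install_then_array() {
  local install_script="$1" array_script="$2" tasks="$3"
  local name="r_pkg_install"
  qsub -N "$name" "$install_script" || return 1
  # -hold_jid accepts a job name or ID; -t sets the array task range (e.g. 1-10)
  qsub -hold_jid "$name" -t "$tasks" "$array_script"
}
```

For example, ‘submit_install_then_array install.sh analysis.sh 1-10’, where install.sh runs the package-check code once and analysis.sh is the array job.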
To get a list of the packages in all of the ‘library()’ and ‘require()’ lines in your .R code, mostly formatted for copy/paste into the myPKGs line above (you will still need to add the opening and closing quotes), try something like:
grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
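The one-liner above joins names with double quotes but leaves off the outer quotes, and it also picks up commented-out calls. A slightly more careful sketch, assuming each ‘library()’ or ‘require()’ call names one package in its first argument and that GNU grep is available (for \b); the function name is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print the unique packages loaded with library() or
# require() in the given .R files, as a quoted, comma-separated list.
# Whole-line comments are skipped; trailing comments are not.
list_r_pkgs() {
  cat "$@" \
    | grep -v '^[[:space:]]*#' \
    | grep -oE '\b(library|require)\([^),]+' \
    | sed -E 's/^(library|require)\(//; s/["'\'' ]//g' \
    | LC_ALL=C sort -u \
    | sed "s/.*/'&'/" \
    | paste -sd, - \
    | sed 's/,/, /g'
}
```

Usage: ‘list_r_pkgs *.R’ prints something like 'dplyr', 'ggplot2', ready to paste inside c( ... ) on the myPKGs line.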
General Package Installation Troubleshooting Tips
- Most package installation issues are related to the R or GCC versions you are using. Try upgrading your R and GCC versions; see “R (and GCC) Versions” above.
- If your package installation fails on a missing library (.so or .h file), you may need to load additional software. In a qlogin session, use the “module avail” command to see whether a matching module exists (a common one is ‘gdal’), and load it with ‘module load’ before installing.
Specific Package Notes for Wharton’s HPC3
Certain commonly used packages are tricky to install on the HPC3. Below are some that we have helped our users install, and how to install them on the HPC3. If you run into difficulties with other packages not listed here, just let us know which package you are trying to install, and we will help.
rstan Package
rstan is one of the trickier packages to install, because it needs specific C++ compiler settings and a recent C/C++ compiler. A recent version of R is recommended as well.
Edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load …’ lines:
if [[ ! $(echo "$TAGNAME" | grep -E "^hpc3-(login|desktop|qmaster)*$") ]]; then
    module load mpi/openmpi-x86_64
    module load R/4.4.1
    module load gcc/12.2.0
fi
Next, set two specific R ‘Makevars’ variables before installation. These go in your ~/.R/Makevars file, which should look like:
CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++
To do that, from the command line, do:
mkdir -p ~/.R
echo -e "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC\nCXX14=g++" >> ~/.R/Makevars
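Because ‘>>’ appends, running the echo command above twice leaves duplicate lines in ~/.R/Makevars. A minimal sketch that adds each setting only if it is missing; the function name is hypothetical, and the optional path argument exists only so it can be tried on a scratch file first:

```bash
#!/bin/bash
# Hypothetical helper: add the rstan Makevars settings only if they are not
# already present, so re-running it does not duplicate lines.
# Usage: add_rstan_makevars [PATH]   (PATH defaults to ~/.R/Makevars)
add_rstan_makevars() {
  local makevars="${1:-$HOME/.R/Makevars}"
  mkdir -p "$(dirname "$makevars")"
  touch "$makevars"
  local line
  for line in 'CXX14FLAGS=-O3 -march=native -mtune=native -fPIC' 'CXX14=g++'; do
    # -x: match the whole line; -F: fixed string, so '+' is taken literally
    grep -qxF "$line" "$makevars" || printf '%s\n' "$line" >> "$makevars"
  done
}
```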
Now you are ready to log on to an HPC3 compute node and install rstan:
qlogin -l m_mem_free=6G
R --no-save
> install.packages("rstan", repos = "https://cloud.r-project.org/")
Warning in install.packages("rstan", repos = "https://cloud.r-project.org/") :
  'lib = "/usr/local/R-4.4.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.4’
to install packages into? (yes/No/cancel) yes
This will take a long time (~30 minutes). Once installation is complete, you can test that rstan is working correctly:
> library(rstan)
> options(mc.cores = strtoi(Sys.getenv('NSLOTS')) * 2)
> rstan_options(auto_write = TRUE)
> example(stan_model, package = "rstan", run.dontrun = TRUE)
Please Note the ‘options(mc.cores = strtoi(Sys.getenv('NSLOTS')) * 2)’ line!! For best performance, ALWAYS set cores this way on Wharton’s HPC3 systems. Do not use ‘detectCores()’: it reports all of the cores on the node rather than the ones allocated to your job, so it will be inaccurate and degrade your performance. By default, NSLOTS = 1. To use more than 1 core for a job, you will need to request it at job start, either at the command line:
qsub -pe openmp 4
Or in your job script:
#$ -pe openmp 4
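The same NSLOTS value is available to the shell commands in your job script. A minimal sketch that reads it with a fallback of 1 when it is unset or not a positive integer (for example, when testing a script outside the scheduler); the function name is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print the number of cores granted to this job.
# Falls back to 1 if NSLOTS is unset, non-numeric, or zero.
job_cores() {
  local n="${NSLOTS:-}"
  case "$n" in
    ''|*[!0-9]*) n=1 ;;   # empty or contains a non-digit
  esac
  n=$((10#$n))            # strip leading zeros
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}
```

For example, ‘export OMP_NUM_THREADS="$(job_cores)"’ before the Rscript line would cap OpenMP-based libraries at the slots you requested.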
Parallel R
Some parts of parallel R can be tricky on the cluster; tips are below. See also our demo code in /usr/local/demo/R/parallel, available on all cluster systems.
Rmpi Installation
Edit your ~/.bashrc file and change the MPI version to:
module load mpi/openmpi-4.1.6
Setting Cores
Instead of manually setting the number of cores in your parallel code, always use:
cores <- strtoi(Sys.getenv('NSLOTS'))