R

NOTE!! We no longer set a default version of R on the cluster!

NOTE2!! Due to a serious vulnerability in all versions of R prior to the latest release (4.4.0 at the time of writing), we have upgraded R across the HPC3 cluster and will disable the older installed versions at 2PM on 2024-05-01.

Because results can differ between versions of R (and of other software), which hurts reproducibility, we no longer set a default version of R.

About R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

More information can be found at: http://www.r-project.org/

R on the HPC3

R is installed on all HPC3 compute hosts.

R (and GCC) Versions

We no longer set a default version of R on the cluster, so you must load one yourself.

To list the available versions:

qlogin
module avail R 
-------------------------------------- /etc/modulefiles ---------------------------------------
R/4.4.0

module avail gcc
--------------------------------------------------- /etc/modulefiles ---------------------------------------------------
gcc/12.2.0

Note: more recent versions may also be available.

To use these newer versions (and the recommended latest GCC compiler), modify your ~/.bashrc file as follows:

## START MODULE SELECTION
# load these modules except on the hpc3-login, hpc3-desktop and hpc3-qmaster hosts
if [[ ! $(echo $TAGNAME |egrep "^hpc3-(login|desktop|qmaster)*$") ]]; then
    # ... other modules ...
    module load R/4.4.0
    module load gcc/12.2.0
fi
## END MODULE SELECTION

If you don’t want to set your R & GCC versions permanently for all sessions (qlogin or qsub), for example because different projects need different versions over the long term, run the same module load commands after qlogin instead, or put them in your qsub job script before the line that runs R.
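Because there is no default version, a script can also check at startup that it is running under the R version you meant to load. A minimal sketch, assuming each project pins one exact version; EXPECTED_R and r_version_matches() are illustrative names, not something HPC3 provides:

```r
# Hypothetical: the exact R version this project was developed with
EXPECTED_R <- "4.4.0"

# TRUE when the running R version string equals 'expected' exactly
r_version_matches <- function(expected) {
  identical(as.character(getRversion()), expected)
}

if (!r_version_matches(EXPECTED_R)) {
  # warning() lets the job keep running; use stop() instead if a
  # version mismatch should abort the job
  warning(sprintf(
    "Expected R %s but this session runs R %s; check your 'module load R/...' line.",
    EXPECTED_R, as.character(getRversion())
  ))
}
```

Put this at the top of the .R file your job runs, so a version mismatch shows up in the job output rather than as silently different results.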

Submitting R Jobs

Create R Commands File

Create a .R file with your commands, for example:

D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
D

# Sort on x
indexes <- order(D$x)
D[indexes,]

# Print out the dataset, sorted in reverse (descending) order by y
D[rev(order(D$y)),]

Create R Job Script

Create a .sh file with at least the following contents:

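#!/bin/bash

# load R first, unless your ~/.bashrc already does (there is no default R version)
module load R/4.4.0

Rscript --no-save your-commands-file.R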

Submit R Job

Submit your job script with qsub, for example:

qsub your-job-script.sh

More information: HPC3 Job Management

Interactive R Sessions

  • Graphical: via our HPC3 Desktop environment (recommended)
  • Textual:
    qrsh R --no-save

Installing R Packages

NOTE: if you haven’t set your version of R permanently (see above), you will need to load an R module after you qlogin, before installing packages.

Here’s how to install R packages in your home directory / shared workspace.

Configuration Tips

Add the following to your ~/.Rprofile:

# set the CRAN mirror automatically (no need to select one)
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})

# build packages in parallel during install.packages(), using all threads in your session
options(Ncpus = strtoi(Sys.getenv('NSLOTS'))*2)
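The Ncpus line assumes NSLOTS is set. Where it is not, Sys.getenv() returns an empty string and strtoi() does not give a usable count. A more defensive variant, assuming that falling back to a single slot is acceptable when NSLOTS is missing or invalid; the .nslots() helper name is illustrative (the leading dot keeps it out of ls() output):

```r
# Hypothetical helper: number of slots granted to this session.
# Falls back to 1 when NSLOTS is unset, empty, or not a positive integer.
.nslots <- function() {
  raw <- Sys.getenv("NSLOTS")            # "" when NSLOTS is unset
  n <- if (nzchar(raw)) strtoi(raw, base = 10L) else NA_integer_
  if (is.na(n) || n < 1L) 1L else n
}

# same rule as above: 2 install threads per granted slot
options(Ncpus = .nslots() * 2L)
```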

Manual Installation

  • log on to a compute node with qlogin
  • download the package with wget (if it’s not a CRAN package)
  • install it (the steps differ for CRAN and non-CRAN packages)

Here are two examples (copy/pasteable, except for the package URL in wget and the package name in install.packages() or R CMD INSTALL):

If it IS a CRAN package:

qlogin
R --no-save
...
> install.packages("anRpackage")

If this is the first time you have installed packages, or a new version of R has been installed, you will see something like:

Installing package into ‘/usr/local/lib/R/library-X.X.X’
(as ‘lib’ is unspecified)
Warning in install.packages("anRpackage") :
  'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/X.X’
to install packages into? (yes/No/cancel) yes

Just answer ‘yes’ to both questions; the default locations are recommended.

If it’s NOT a CRAN package (be careful of your sources, of course!):

qlogin
wget 'http://some.server.edu/anRpackage_version.tar.gz'
R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz
exit

Start R normally, and you should now be able to use the new package.
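You can also install a downloaded non-CRAN tarball from inside R instead of with R CMD INSTALL: install.packages() treats its first argument as a local file when repos = NULL and type = "source". A sketch, assuming your personal library is the first writable directory in .libPaths(); personal_lib() and install_tarball() are illustrative names:

```r
# Hypothetical helper: first writable library in .libPaths(). Assumes your
# personal library already exists (install one CRAN package by hand first).
personal_lib <- function() {
  libs <- .libPaths()
  writable <- libs[file.access(libs, mode = 2) == 0]
  if (length(writable) == 0) stop("No writable R library found.")
  writable[1]
}

# Hypothetical helper: install a downloaded source tarball into 'lib'.
install_tarball <- function(tarball, lib = personal_lib()) {
  if (!file.exists(tarball)) stop("File not found: ", tarball)
  # repos = NULL with type = "source" makes install.packages() treat
  # 'tarball' as a local file path instead of a package name
  install.packages(tarball, lib = lib, repos = NULL, type = "source")
}
```

After the wget step, start R in the same directory and run install_tarball("anRpackage_version.tar.gz").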

Automatic Installation of CRAN Packages in Your Code

Warning: do not use this code in an array (-t X-X) job! All of the tasks will try to install to the same location at the same time. Run a separate single job first to install the packages, or do a manual install as above.

NOTE: you will need to install at least one package by hand (manually) first, to set up your personal R library. See above.

The code below checks whether each of your required packages (in the myPKGs vector) is already installed, either system-wide or in your personal R library, and installs any that are missing. It needs to go above the ‘library()’ calls in your R code:

# Vector of packages to check, and install if not already installed
myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')

# find which packages are not installed yet (system-wide or in your personal library)
InstalledPKGs <- rownames(installed.packages())
InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]

# install any missing packages
if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs)
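Once any missing packages are installed, the library() calls themselves can be driven from the same myPKGs vector. A sketch; the two base packages below are placeholders so the example runs as-is, and in your code you would reuse your own myPKGs:

```r
# Placeholder vector: both are base-R packages. Reuse your own myPKGs.
myPKGs <- c("stats", "utils")

# Attach each package; require() returns FALSE (with a warning) instead of
# stopping, so every failure can be reported at once
loaded <- vapply(myPKGs, function(pkg) {
  suppressPackageStartupMessages(require(pkg, character.only = TRUE))
}, logical(1))

if (!all(loaded)) {
  stop("Could not load: ", paste(names(loaded)[!loaded], collapse = ", "))
}
```

character.only = TRUE is what lets library() and require() take the package name from a variable rather than treating pkg itself as the name.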

If you want a list of the packages named in all of the ‘library()’ and ‘require()’ lines in your .R code, mostly formatted for copy/paste into the myPKGs line above, try something like:

grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
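The same list can be built from inside R, printed as a complete c(...) vector. A sketch; find_pkgs() is an illustrative name, and it assumes the package name is the first argument and sits on the same line as the call (calls split across lines are not handled):

```r
# Hypothetical helper: package names from simple library()/require() calls
find_pkgs <- function(files = list.files(pattern = "\\.R$")) {
  code <- as.character(unlist(lapply(files, readLines, warn = FALSE)))
  # match e.g. 'library(dplyr' or 'require("ggplot2"' (stops at ')' or ',')
  hits <- regmatches(code, regexpr("(library|require)\\([^),]+", code))
  # drop the function name, the '(' and any quotes or spaces
  pkgs <- gsub("^(library|require)\\(|[\"' ]", "", hits)
  unique(pkgs)
}

# Print a c(...) vector ready to paste into the myPKGs line
dput(find_pkgs())
```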

General Package Installation Troubleshooting Tips

  • Most package installation issues are related to the R or GCC versions you are using. Upgrade your R & GCC versions, see “R (and GCC) Versions” above.
  • If your package installation fails on a missing library (.so or .h file), you may need to load additional software. In a qlogin session, use the “module avail” command to see whether a matching module exists (a common one is ‘gdal’).

Specific Package Notes for Wharton’s HPC3

Certain commonly used packages are tricky to install on the HPC3. Here are some that we have helped our users install, and how to install them on the HPC3. If you run into difficulties with other packages not listed here, just let us know which package you are trying to install, and we will help.

rstan Package

rstan is one of the trickier packages to install, because it needs specific C++ compiler settings and a more recent C/C++ compiler. A more recent R version is recommended, as well.

Edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ...’ lines:

if [[ ! $(echo $TAGNAME |egrep "^hpc3-(login|desktop|qmaster)*$") ]]; then
    module load mpi/openmpi-x86_64
    module load R/4.4.0
    module load gcc/12.2.0
fi

Then you need to set two Makevars variables before installation. These go in your ~/.R/Makevars file, which should look like:

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

To do that, from the command line, do:

mkdir -p ~/.R
echo -e "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC\nCXX14=g++" >> ~/.R/Makevars

Now you are ready to log on to an HPC3 compute node, and install rstan.

qlogin -l m_mem_free=6G
R --no-save
> install.packages("rstan", repos = "https://cloud.r-project.org/")
Warning in install.packages("rstan", repos = "https://cloud.r-project.org/") :
'lib = "/usr/local/R-4.4.0/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.4’
to install packages into? (yes/No/cancel) yes

This will take a long time (~30 minutes). Once installation is complete, you can test that rstan is working correctly:

> library(rstan)
> options(mc.cores = strtoi(Sys.getenv('NSLOTS')) * 2)
> rstan_options(auto_write = TRUE)
> example(stan_model, package = "rstan", run.dontrun = TRUE)

Please Note the ‘options(mc.cores = strtoi(Sys.getenv('NSLOTS')) * 2)’ line!! For best performance, ALWAYS set cores this way on Wharton’s HPC3 systems. Do not use ‘detectCores()’: it reports every core on the compute node rather than the number allocated to your job, so your code will compete for cores it was not given and run more slowly. By default, NSLOTS = 1. To use more than 1 core for a job, you will need to request it at job start, either at the command line:

qsub -pe openmp 4

Or in your job script:

#$ -pe openmp 4

Parallel R

Some parts of parallel R can be tricky on the cluster; here are some tips. See also our demo code in /usr/local/demo/R/parallel, available on all cluster systems.

Rmpi Installation

Edit your ~/.bashrc file and change the MPI version to:

module load mpi/openmpi-4.1.6

Setting Cores

Instead of manually setting the number of cores in your parallel code, always use:

cores <- strtoi(Sys.getenv('NSLOTS'))
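A minimal sketch of that advice with mclapply() from the parallel package, which forks worker processes on Linux. The fallback to one core when NSLOTS is unset or invalid is an assumption added so the same script also runs outside a job, and the squaring function is a placeholder workload:

```r
library(parallel)

# Cores granted by the scheduler; fall back to 1 outside a job (assumption)
cores <- strtoi(Sys.getenv("NSLOTS", unset = "1"), base = 10L)
if (is.na(cores) || cores < 1L) cores <- 1L

# Placeholder workload: square 1..8 across 'cores' forked worker processes
res <- mclapply(1:8, function(i) i^2, mc.cores = cores)
print(unlist(res))
```

Request the matching number of slots with -pe openmp N (see above) so that NSLOTS, and therefore cores, reflects what the job was actually given.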