R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

More information can be found at: http://www.r-project.org/

R on the HPCC

R is installed on all HPCC compute hosts.

R (and GCC) Versions

We no longer set a default version of R in the cluster!

To list the available versions:

qlogin -now no
module avail R 
-------------------------------------- /etc/modulefiles ---------------------------------------
R/4.2.2 R/4.3.1

module avail gcc
--------------------------------------------------- /etc/modulefiles ---------------------------------------------------

Note: there may be more-recent versions, as well.

To use a newer version (and the recommended latest GCC compiler), please modify your ~/.bashrc file as follows:

if [[ $(hostname -s | grep -e "^hpcc[0-9]*$" -e "^aws-") ]]; then
    ... other modules ...
    module load R/4.3.1
    module load gcc/12.2.0
fi

If you don’t want to permanently set your R and GCC versions for all sessions (qlogin or qsub), which can be useful over time when different projects need different versions, you can instead run those ‘module load’ commands after qlogin, or within your qsub job script file before you run R.
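
As a minimal sketch of the per-session approach, the helper below loads a list of modules and stops if a load reports failure. The function name load_project_modules is hypothetical (it is not something installed on the HPCC); the module versions are the ones shown above.

```shell
# Hypothetical helper: load only the modules a given project needs.
# Intended to be run after qlogin, or near the top of a qsub job script.
load_project_modules() {
    # 'module' is a shell function provided by the cluster environment.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available on this host" >&2
        return 1
    fi
    for m in "$@"; do
        module load "$m" || return 1
    done
}

# Example call (uncomment on a compute node):
# load_project_modules R/4.3.1 gcc/12.2.0
```

Keeping one such call per project makes it easy to see which R and GCC versions a given analysis was run with.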

Submitting R Jobs

Create R Commands File

Create a .R file with your commands, for example:

D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))

# Sort on x
indexes <- order(D$x)
print(D[indexes, ])

# Print out dataset, sorted in reverse by y
print(D[order(D$y, decreasing = TRUE), ])

Create R Job Script

Create a .sh file with at least the following contents:


Rscript --no-save your-commands-file.R
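
For illustration, a slightly fuller job script might look like the sketch below. The job name, the -N, -j y and -cwd options, and the file name your-job-script.sh are assumptions chosen for the example, not HPCC requirements. The block writes the script to disk and checks its shell syntax without running it.

```shell
# Write a hypothetical job script; adjust names and versions to your project.
cat > your-job-script.sh <<'EOF'
#!/bin/bash
# Scheduler options: job name, merge stderr into stdout, run from the
# directory the job was submitted from.
#$ -N my_r_job
#$ -j y
#$ -cwd

# Load the R version this job should use.
module load R/4.3.1

Rscript --no-save your-commands-file.R
EOF

# Parse the script without executing it, to catch shell syntax errors early.
bash -n your-job-script.sh && echo "syntax OK"
```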

Submit R Job
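
A job script is normally handed to the scheduler with qsub, and qstat then shows its state. Here is a minimal sketch, assuming the script is named your-job-script.sh (a placeholder) and wrapping both steps in a hypothetical helper:

```shell
# Hypothetical helper: submit a job script, then list your queued/running jobs.
submit_and_check() {
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a cluster host" >&2
        return 1
    fi
    qsub "$1" && qstat -u "${USER:-$(id -un)}"
}

# Example call (uncomment on the cluster):
# submit_and_check your-job-script.sh
```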

More information: HPCC Job Management

Interactive R Sessions

  • Graphical: via our HPCC Desktop environment (recommended)
  • Textual:
    qrsh R --no-save

Installing R Packages

NOTE (2018-02-14): we no longer install all CRAN and BioConductor packages in the cluster!

Here’s how to install R packages in your home directory / shared workspace.

Configuration Tips

Add the following to your ~/.Rprofile:

# auto-repo (no need to select)
local({r <- getOption("repos")
       r["CRAN"] <- "https://cloud.r-project.org"
       options(repos = r)})

# use all threads in your session
options(Ncpus = strtoi(Sys.getenv('NSLOTS'))*2)

Manual Installation

  • log on to a compute node with qlogin
  • download the package with wget (if it’s not a CRAN package)
  • install (different depending on whether it’s CRAN or not)

Here are two examples (Copy/Pastable except for the package URL in wget and package name in R CMD INSTALL):

If it IS a CRAN package:

qlogin -now no
R --no-save
> install.packages("anRpackage")

If it’s the first time you have installed packages, or there’s a new version installed, you will see something like:

Installing package into ‘/usr/local/lib/R/library-X.X.X’
(as ‘lib’ is unspecified)
Warning in install.packages("anRpackage") :
  'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
to install packages into? (yes/No/cancel) yes

Answer ‘yes’ to both questions; accepting the default locations is advised.

If it’s NOT a CRAN package (be careful of your sources, of course!):

qlogin -now no
wget 'http://some.server.edu/anRpackage_version.tar.gz'
R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz

Start R normally, and you should now be able to use the new package.
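
One thing to watch in the non-CRAN example: if $R_LIBS is empty, the -l option will likely take the tarball name as the library directory. Here is a minimal sketch of a guard, assuming R_LIBS holds your personal library path (possibly as the first entry of a colon-separated list); install_tarball is a hypothetical helper name.

```shell
# Hypothetical helper: install a source tarball into the first R_LIBS entry.
install_tarball() {
    tarball="$1"
    lib="${R_LIBS:-}"
    lib="${lib%%:*}"   # keep the first entry if R_LIBS is a colon-separated list

    if [ -z "$lib" ]; then
        echo "R_LIBS is not set; set it to your personal library first" >&2
        return 1
    fi
    if [ ! -f "$tarball" ]; then
        echo "no such file: $tarball" >&2
        return 1
    fi

    mkdir -p "$lib" || return 1
    R CMD INSTALL -l "$lib" "$tarball"
}

# Example call (uncomment after downloading the package):
# install_tarball anRpackage_version.tar.gz
```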

Automatic Installation of CRAN Packages in Your Code

Warning: do not use this code in an array (-t X-X) job! The tasks will try to install to the same place at the same time. Run a separate single job first to install the packages, or do a manual install as above.

The code below tests whether each of your required packages (in the myPKGs vector) is already installed (either system-wide or in your personal R library), and installs any that are missing. It needs to go above the ‘library()’ calls in your R code:

# Vector of packages to be checked and installed if not already
myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')

# check to see if each package is installed, and if not add it to a list to install
InstalledPKGs <- names(installed.packages()[,'Package'])
InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]

# install any needed packages
if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs)

To get a list of the packages named in the ‘library()’ and ‘require()’ lines of your .R code, mostly formatted for copy/paste into the myPKGs line above, try something like:

grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
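
Returning to the array-job warning above: one way to run the install as a separate single job first is to make the array job wait on it with the scheduler's -hold_jid option. Here is a minimal sketch, assuming two hypothetical scripts (install-packages.sh runs the install code once, your-array-job.sh does the real work) and a hypothetical helper name:

```shell
# Hypothetical helper: run an install job once, then an array job that waits for it.
submit_after_install() {
    install_script="$1"; array_script="$2"; tasks="$3"

    # -terse makes qsub print only the job ID, which is easy to capture.
    install_id=$(qsub -terse "$install_script") || return 1

    # -hold_jid keeps the array tasks queued until the install job has finished.
    qsub -hold_jid "$install_id" -t "$tasks" "$array_script"
}

# Example call (uncomment on the cluster):
# submit_after_install install-packages.sh your-array-job.sh 1-10
```

Note that the hold is released when the install job ends, whether or not it succeeded, so it is still worth checking the install job's output.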

General Package Installation Troubleshooting Tips

  • Most package installation issues are related to the R or GCC versions you are using. Upgrade your R and GCC versions; see “R (and GCC) Versions” above.
  • If your package installation is failing on a missing library (.so or .h file), you may need to load additional software. In a qlogin session, use the “module avail” command to see whether an obvious candidate module exists (a common one is ‘gdal’).
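
To search the module list for a particular library, something like the following sketch can help. It assumes, as is common with environment modules, that ‘module avail’ writes its listing to stderr; find_module is a hypothetical helper name.

```shell
# Hypothetical helper: case-insensitive search of the available modules.
find_module() {
    # Merge stderr into stdout so grep can see the listing.
    module avail 2>&1 | grep -i -- "$1"
}

# Example call (uncomment in a qlogin session):
# find_module gdal
```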

Specific Package Notes for Wharton’s HPCC

Certain commonly-used packages are tricky to install on the HPCC. Here are some that we have helped our users install, and how to install them in the HPCC. If you run into difficulties with other packages not listed here, just let us know what package you are trying to install, and we will help.

rstan Package

rstan is one of the trickier packages to install, because it needs specific C++ compiler variables and a more recent C/C++ compiler version. A more recent R version is recommended as well.

Edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ...’ lines:

if [[ $(hostname -s | grep -e "^aws-" -e "^hpcc[0-9]*$") ]]; then
    module load mpi/openmpi-x86_64
    module load R/4.3.1
    module load gcc/12.2.0
fi

Then you need to set two specific R ‘Makevars’ variables before installation. They go in your ~/.R/Makevars file, which should look like:

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

To do that, from the command line, do:

mkdir -p ~/.R
echo -e "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC\nCXX14=g++" >> ~/.R/Makevars
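
Note that the echo above appends with >>, so running it twice leaves duplicate lines in ~/.R/Makevars. A minimal sketch of an idempotent alternative follows; ensure_makevar is a hypothetical helper, not part of R, and it only recognizes plain NAME=value lines.

```shell
# Hypothetical helper: add NAME=VALUE to a Makevars file only if NAME is not set there yet.
ensure_makevar() {
    file="$1"; name="$2"; value="$3"

    mkdir -p "$(dirname "$file")" || return 1
    touch "$file" || return 1

    # Append only when no existing line assigns this exact variable name.
    if ! grep -q "^${name}[[:space:]]*=" "$file"; then
        printf '%s=%s\n' "$name" "$value" >> "$file"
    fi
}

# Example calls (uncomment to apply to your own Makevars):
# ensure_makevar ~/.R/Makevars CXX14FLAGS "-O3 -march=native -mtune=native -fPIC"
# ensure_makevar ~/.R/Makevars CXX14 "g++"
```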

Now you are ready to log on to an HPCC compute node, and install rstan.

qlogin -now no
R --no-save
> install.packages("rstan", repos = "https://cloud.r-project.org/")
Warning in install.packages("rstan", repos = "https://cloud.r-project.org/") :
'lib = "/usr/local/R-4.3.1/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
to install packages into? (yes/No/cancel) yes

This will take a long time (~30 minutes). Once installation is complete, you can test that rstan is working correctly:

> library(rstan)
> options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))
> rstan_options(auto_write = TRUE)
> example(stan_model, package = "rstan", run.dontrun = TRUE)

Please note the ‘options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))’ line! For best performance, ALWAYS set the number of cores that way on Wharton’s HPCC systems. Do not use ‘detectCores()’: it counts the cores on the whole host rather than the cores allocated to your job, so it will be inaccurate and will degrade your performance. By default, OMP_NUM_THREADS = 1. To use more than 1 core for a job, you will need to request them at job start, either at the command line:

qsub -pe openmp 4

Or in your job script:

#$ -pe openmp 4

Parallel R

Some pieces of Parallel R can be tricky in the cluster. Tips here! See our Demo Code in /usr/local/demo/R/parallel on all cluster systems.

Rmpi Installation

Edit your ~/.bashrc file and change the MPI version to:

module load mpi/openmpi-4.1.6

Setting Cores

Instead of manually setting the number of cores in your parallel code, always use:

cores <- strtoi(Sys.getenv('NSLOTS'))
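
When a script is tested outside the scheduler, NSLOTS is typically unset, and the R line above then has no usable value to work with. Here is a minimal shell sketch that a job or test script could run before starting R, assuming a fallback of 1 core is acceptable (job_cores is a hypothetical helper name):

```shell
# Hypothetical helper: print the scheduler's slot count, or 1 if it is missing or invalid.
job_cores() {
    case "${NSLOTS:-}" in
        ''|0|*[!0-9]*) echo 1 ;;          # unset, empty, zero, or not a plain integer
        *)             echo "$NSLOTS" ;;  # value provided by the scheduler
    esac
}

# Export it so that Sys.getenv('NSLOTS') in R sees the same value.
NSLOTS="$(job_cores)"
export NSLOTS
```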