R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

More information can be found at: http://www.r-project.org/

R on the HPCC

R is installed on all HPCC compute hosts.

R (and GCC) Versions

R version 3.6.0 and GCC version 4.8.5 are the current system defaults, but both are now quite old, and many R packages no longer install properly with that combination of versions. We highly recommend using newer versions of R and GCC!

To list the available versions:

qlogin -now no
module avail R 
-------------------------------------- /etc/modulefiles ---------------------------------------
R/R-3.5.1 R/R-3.6.1 R/R-3.6.3 R/R-4.0.2 R/R-4.0.4 R/R-4.1.2 R/R-4.2.2

module avail gcc

-------------------------------------- /etc/modulefiles ---------------------------------------
gcc/gcc-10.3.0 gcc/gcc-11.3.0 gcc/gcc-7.5.0 gcc/gcc-9.2.0
gcc/gcc-11.1.0 gcc/gcc-6.3.0 gcc/gcc-8.2.0

Note: there may be more-recent versions, as well.

To use a newer version of R (and the recommended latest GCC compiler), please modify your ~/.bashrc file as follows:

## START MPI SELECTION
if [[ $(hostname -s | grep -e "^hpcc[0-9]*$" -e "^aws-") ]]; then
    ... other modules ...
    module load R/R-4.2.2
    module load gcc/gcc-11.3.0
fi
## END MPI SELECTION

If you would rather not set your R & GCC versions permanently for all sessions (qlogin or qsub), for example because different projects need different versions, you can instead run those ‘module load’ commands after qlogin, or within your qsub job script file before you run R.
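
For a per-session or per-job setup, the same two module load commands can be wrapped in a small helper. This is only a sketch: the function name load_r_toolchain is hypothetical, the default module names are taken from the module listing above, and it assumes the module command is defined in the shell that runs it.

```shell
# Load a chosen R and GCC module for the current session or job only.
# load_r_toolchain is a hypothetical helper name; the defaults below are
# versions shown in the module listing above.
load_r_toolchain() {
    local r_module="${1:-R/R-4.2.2}"
    local gcc_module="${2:-gcc/gcc-11.3.0}"

    # 'module' is usually a shell function; bail out if it is not defined.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi

    module load "$r_module" && module load "$gcc_module"
}

# Example: call this after qlogin, or near the top of a qsub job script,
# before running R:
#   load_r_toolchain R/R-4.1.2 gcc/gcc-10.3.0
```

Called with no arguments, the helper loads the defaults; passing two module names lets each project pick its own versions.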

Submitting R Jobs

Create R Commands File

Create a .R file with your commands, for example:

D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
D

# Sort on x
indexes <- order(D$x)
D[indexes,]

# Print out sorted dataset, sorted in reverse by y
D[rev(order(D$y)),]

Create R Job Script

Create a .sh file with at least the following contents:

#!/bin/bash

Rscript --no-save your-commands-file.R

Submit R Job

More information: HPCC Job Management
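
Putting the three steps together, a minimal sketch might look like the following. The scratch directory and the job script name run-r-job.sh are hypothetical; your-commands-file.R is the placeholder used above. The qsub call is skipped when qsub is not on the PATH, so the sketch can be tried anywhere.

```shell
# Work in a scratch directory (hypothetical; use your project directory).
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Step 1: the R commands file.
cat > your-commands-file.R <<'EOF'
D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
# Sort on x
D[order(D$x),]
EOF

# Step 2: the job script (run-r-job.sh is a hypothetical name).
cat > run-r-job.sh <<'EOF'
#!/bin/bash

Rscript --no-save your-commands-file.R
EOF

# Step 3: submit the job script to the scheduler, if it is available here.
if command -v qsub >/dev/null 2>&1; then
    qsub run-r-job.sh
else
    echo "qsub not found; run this on the HPCC to submit the job"
fi
```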

Interactive R Sessions

  • Graphical: via our HPCC Desktop environment (recommended)
  • Textual:
    qrsh R --no-save

Installing R Packages

NOTE (2018-02-14): we no longer install all CRAN and BioConductor packages in the cluster!

Here’s how to install R packages in your home directory / shared workspace.

Manual Installation

  • log on to a compute node with qlogin
  • download the package with wget (if it’s not a CRAN package)
  • install (different depending on whether it’s CRAN or not)

Here are two examples (Copy/Pastable except for the package URL in wget and package name in R CMD INSTALL):

If it IS a CRAN package:

qlogin -now no
R --no-save
...
> install.packages("anRpackage")

If it’s the first time you have installed packages, or there’s a new version installed, you will see something like:

Installing package into ‘/usr/local/lib/R/library-X.X.X’
(as ‘lib’ is unspecified)
Warning in install.packages("anRpackage") :
  'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-redhat-linux-gnu-library/X.X’
to install packages into? (yes/No/cancel) yes

Answer ‘yes’ to both questions; the default locations are recommended.

If it’s NOT a CRAN package (be careful of your sources, of course!):

qlogin -now no
wget 'http://some.server.edu/anRpackage_version.tar.gz'
R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz
exit

Start R normally, and you should now be able to use the new package.
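
The R CMD INSTALL line above depends on $R_LIBS naming a writable library directory. A guarded version follows, as a sketch only: install_local_pkg is a hypothetical helper, and it assumes that R_LIBS, when set, lists your personal library first.

```shell
# Install a downloaded source tarball into the first directory in $R_LIBS.
# install_local_pkg is a hypothetical helper, not an HPCC-provided command.
install_local_pkg() {
    local tarball="$1"
    # R_LIBS may hold several colon-separated paths; use the first one.
    local libs="${R_LIBS:-}"
    local lib="${libs%%:*}"

    if [ -z "$lib" ]; then
        echo "R_LIBS is not set; cannot choose a library directory" >&2
        return 1
    fi
    if [ ! -f "$tarball" ]; then
        echo "package file not found: $tarball" >&2
        return 1
    fi

    mkdir -p "$lib" && R CMD INSTALL -l "$lib" "$tarball"
}

# Example (run inside a qlogin session):
#   install_local_pkg anRpackage_version.tar.gz
```

Checking both values first avoids the case where an empty $R_LIBS makes R treat the tarball name as the library path.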

Automatic Installation of CRAN Packages in Your Code

Warning: do not use this code in an array (-t X-X) job! The tasks will try to install to the same place at the same time. Run a separate single job first to install the packages, or do a manual install as above.

The code below tests whether each of your required packages (listed in the myPKGs vector) is already installed, either system-wide or in your personal R library, and installs any that are missing. It needs to be above your ‘library()’ calls in your R code:

# Vector of packages to be checked, and installed if not already present
myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')

# check to see if each package is installed, and if not add it to a list to install
InstalledPKGs <- names(installed.packages()[,'Package'])
InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]

# install any needed packages; batch jobs cannot prompt for a CRAN mirror, so name one
if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs, repos = "https://cloud.r-project.org/")

And if you want to get a list of all of the ‘library()’ and ‘require()’ lines in your .R code, mostly formatted for copy / paste into the myPKGs line, above, try something like:

grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
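
The one-liner above leaves the outer quotes to be added by hand. A variant that prints a complete myPKGs line is sketched below. The sample file and its contents are made up for illustration, and the pipeline assumes one library() or require() call per line with the package name as the only argument.

```shell
# Sample input (hypothetical) so the pipeline can be tried anywhere.
workdir="$(mktemp -d)"
cat > "$workdir/analysis.R" <<'EOF'
library(ggplot2)
require("dplyr")
library(ggplot2)
x <- 1
EOF

# Pull the package names out of library()/require() lines, strip quotes,
# de-duplicate, and format the result as an R character vector.
pkg_line="$(
    grep -h -E '^[[:space:]]*(library|require)\(' "$workdir"/*.R \
        | awk -F'[()]' '{ print $2 }' \
        | tr -d "\"' " \
        | sort -u \
        | awk -v q="'" '
            BEGIN { printf "myPKGs <- c(" }
            { printf "%s%s%s%s", (NR > 1 ? ", " : ""), q, $1, q }
            END { print ")" }'
)"
echo "$pkg_line"
```

For the sample file this prints myPKGs <- c('dplyr', 'ggplot2'). Calls with extra arguments, such as library(pkg, lib.loc = ...), would need additional handling.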

General Package Installation Troubleshooting Tips

  • Most package installation issues are related to the R or GCC versions you are using. Upgrade your R & GCC versions; see “R (and GCC) Versions” above.
  • If your package installation is failing on a missing library (.so or .h file), you may need to load additional software. In a qlogin session, use the “module avail” command to see whether a matching module is available (a common one is ‘gdal’).

Specific Package Notes for Wharton’s HPCC

Certain commonly-used packages are tricky to install on the HPCC. Here are some that we have helped our users install, and how to install them in the HPCC. If you run into difficulties with other packages not listed here, just let us know what package you are trying to install, and we will help.

rstan Package

rstan is one of the trickier packages to install, because it needs specific C++ build variables and a more recent C/C++ compiler. A more recent R version is recommended as well.

Edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ...’ lines:

if [[ $(hostname -s | grep -e "^aws-" -e "^hpcc[0-9]*$") ]]; then
    module load mpi/openmpi-x86_64
    module load R/R-4.0.4
    module load gcc/gcc-9.2.0
fi

Next, set two R ‘Makevars’ variables before installation. They go in your ~/.R/Makevars file, which should look like:

CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++

To do that, from the command line, do:

mkdir -p ~/.R
echo -e "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC\nCXX14=g++" >> ~/.R/Makevars
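
Because the echo command above appends to the file, running it twice leaves duplicate lines in Makevars. A repeatable alternative is sketched here: add_makevar is a hypothetical helper that appends a line only when that exact line is not already in the file.

```shell
# Append a line to a Makevars file unless that exact line is already there.
# add_makevar is a hypothetical helper name.
add_makevar() {
    local file="$1" line="$2"
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example, using the two settings shown above:
#   add_makevar ~/.R/Makevars "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC"
#   add_makevar ~/.R/Makevars "CXX14=g++"
```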

Now you are ready to log on to an HPCC compute node, and install rstan.

qlogin -now no
R --no-save
> install.packages("rstan", repos = "https://cloud.r-project.org/")
Warning in install.packages("rstan", repos = "https://cloud.r-project.org/") :
'lib = "/usr/local/R-4.0.4/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.0’
to install packages into? (yes/No/cancel) yes

This will take a long time (~30 minutes). Once installation is complete, you can test that rstan is working correctly:

> library(rstan)
> options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))
> rstan_options(auto_write = TRUE)
> example(stan_model, package = "rstan", run.dontrun = TRUE)

Please note the ‘options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))’ line! For best performance, ALWAYS set the core count this way on Wharton’s HPCC systems. Do not use ‘detectCores()’, as its result will be inaccurate and will degrade your performance. By default, OMP_NUM_THREADS = 1. To use more than one core for a job, you will need to request them at job start, either at the command line:

qsub -pe openmp 4

Or in your job script:

#$ -pe openmp 4
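
Combining the core request with a job script, a minimal sketch could look like this. The file name your-commands-file.R is the placeholder used earlier on this page, and the fallback to 1 mirrors the default described above. The guard around Rscript is only there so the sketch is harmless outside the cluster.

```shell
#!/bin/bash
#$ -pe openmp 4

# The scheduler is expected to set OMP_NUM_THREADS to the number of cores
# granted; fall back to 1 (the default noted above) when it is unset.
cores="${OMP_NUM_THREADS:-1}"
echo "Running with ${cores} core(s)"

# R reads the same variable via Sys.getenv('OMP_NUM_THREADS').
if command -v Rscript >/dev/null 2>&1 && [ -f your-commands-file.R ]; then
    Rscript --no-save your-commands-file.R
fi
```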

sf Package

sf requires some very specific libraries to be available both when installing it and when using it, as well as a newer C/C++ version. To set that up, edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ...’ lines:

if [[ $(hostname -s | grep -e "^aws-" -e "^hpcc[0-9]*$") ]]; then
    module load mpi/openmpi-x86_64
    module load R/R-4.0.4
    module load gcc/gcc-9.2.0
    module load gdal
fi

That will load all required library paths, and then you can install the sf package, per normal installation methods:

qlogin -now no
R --no-save
> install.packages("sf")

This can take a while, as sf has a number of dependencies to install prior to its own installation. After installation, to run some example code, also install the sp package:

> install.packages("sp")

Then the examples at https://cran.r-project.org/web/packages/sf/sf.pdf should run correctly.