R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
More information can be found at: http://www.r-project.org/
R on the HPCC
R is installed on all HPCC compute hosts.
R (and GCC) Versions
R version 3.6.0 and GCC version 4.8.5 are the current system defaults, but both are now quite old. Many R packages no longer install properly with that combination. We highly recommend using newer versions of R and GCC!
To list the available versions:
    qlogin -now no
    module avail R
    -------------------------------------- /etc/modulefiles ---------------------------------------
    R/R-3.5.1 R/R-3.6.1 R/R-3.6.3 R/R-4.0.2 R/R-4.0.4 R/R-4.1.2 R/R-4.2.2
    module avail gcc
    -------------------------------------- /etc/modulefiles ---------------------------------------
    gcc/gcc-10.3.0 gcc/gcc-11.3.0 gcc/gcc-7.5.0 gcc/gcc-9.2.0 gcc/gcc-11.1.0 gcc/gcc-6.3.0 gcc/gcc-8.2.0
Note: there may be more-recent versions, as well.
To use a newer version of R (and the recommended latest GCC compiler), modify your ~/.bashrc file as follows:
    ## START MPI SELECTION
    if [[ $(hostname -s | grep -e "^hpcc[0-9]*$" -e "^aws-") ]]; then
        ... other modules ...
        module load R/R-4.2.2
        module load gcc/gcc-11.3.0
    fi
    ## END MPI SELECTION
If you don’t want to set your R & GCC versions permanently for all sessions (qlogin or qsub), for example because different projects need different versions, you can instead run those ‘module load’ commands after qlogin, or in your qsub job script before you run R.
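As a minimal sketch of the per-job approach, the commands below write a job script that loads its own R and GCC modules before running R. The file name r-job.sh is a hypothetical placeholder, the versions are taken from the module list above, and the sketch assumes the ‘module’ command is available inside batch jobs on your system.

```shell
# Write a hypothetical job script, r-job.sh, that selects its own R and GCC
# versions instead of relying on ~/.bashrc.
cat > r-job.sh <<'EOF'
#!/bin/bash
# Load the R and GCC versions this project needs (assumed to be installed).
module load R/R-4.2.2
module load gcc/gcc-11.3.0
# Run the R commands file with the versions loaded above.
Rscript --no-save your-commands-file.R
EOF
chmod +x r-job.sh
```

A second project can keep its own copy of the script with different version numbers, so neither project depends on what ~/.bashrc loads.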
Submitting R Jobs
Create R Commands File
Create a .R file with your commands, for example:
    D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
    D
    # Sort on x
    indexes <- order(D$x)
    D[indexes,]
    # Print out sorted dataset, sorted in reverse by y
    D[rev(order(D$y)),]
Create R Job Script
Create a .sh file with at least the following contents:
    #!/bin/bash
    Rscript --no-save your-commands-file.R
Submit R Job
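The submit command itself is not shown in this section. As a hedged sketch, assuming jobs are submitted with qsub (the command this page uses elsewhere), a small wrapper can check that the job script exists before handing it to the scheduler. The function name submit_r_job and the DRY_RUN variable are hypothetical helpers for illustration, not part of the HPCC tooling.

```shell
# Submit a job script with qsub after checking that the file exists.
# With DRY_RUN set to a non-empty value, print the command instead of running it.
submit_r_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "job script not found: $script" >&2
        return 1
    fi
    ${DRY_RUN:+echo} qsub "$script"
}
```

For example, ‘submit_r_job your-job-script.sh’ submits the script, while ‘DRY_RUN=1 submit_r_job your-job-script.sh’ only prints the qsub command that would run.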
More information: HPCC Job Management
Interactive R Sessions
- Graphical: via our HPCC Desktop environment (recommended)
- Textual:
qrsh R --no-save
Installing R Packages
NOTE (2018-02-14): we no longer install all CRAN and BioConductor packages in the cluster!
Here’s how to install R packages in your home directory / shared workspace.
Manual Installation
- log on to a compute node with qlogin
- download the package with wget (if it’s not a CRAN package)
- install (different depending on whether it’s CRAN or not)
Here are two examples (Copy/Pastable except for the package URL in wget and package name in R CMD INSTALL):
If it IS a CRAN package:
    qlogin -now no
    R --no-save
    ...
    > install.packages("anRpackage")
If it’s the first time you have installed packages, or there’s a new version installed, you will see something like:
    Installing package into ‘/usr/local/lib/R/library-X.X.X’
    (as ‘lib’ is unspecified)
    Warning in install.packages("anRpackage") :
      'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
    Would you like to use a personal library instead? (yes/No/cancel) yes
    Would you like to create a personal library
    ‘~/R/x86_64-redhat-linux-gnu-library/X.X’
    to install packages into? (yes/No/cancel) yes
Answer ‘yes’ to both questions; the default locations are recommended.
If it’s NOT a CRAN package (be careful of your sources, of course!):
    qlogin -now no
    wget 'http://some.server.edu/anRpackage_version.tar.gz'
    R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz
    exit
Start R normally, and you should now be able to use the new package.
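The non-CRAN install command above passes $R_LIBS to the -l option, so it only works when that variable names a usable library directory. The following is a minimal sketch of a pre-flight check; the function name check_r_libs is hypothetical, and it assumes R_LIBS holds a single directory rather than a colon-separated list.

```shell
# Succeed only if R_LIBS is set and points at an existing, writable directory.
# Assumption: R_LIBS contains one directory, not a colon-separated list.
check_r_libs() {
    if [ -z "${R_LIBS:-}" ]; then
        echo "R_LIBS is not set; choose a library directory first" >&2
        return 1
    fi
    if [ ! -d "$R_LIBS" ] || [ ! -w "$R_LIBS" ]; then
        echo "R_LIBS is not a writable directory: $R_LIBS" >&2
        return 1
    fi
}
```

It could be chained in front of the install, for example ‘check_r_libs && R CMD INSTALL -l "$R_LIBS" anRpackage_version.tar.gz’, so the install is skipped with a clear message when the library path is missing.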
Automatic Installation of CRAN Packages in Your Code
Warning: do not use this code in an array (-t X-X) job! The tasks will try to install to the same place at the same time. Run a separate single job first to install the packages, or do a manual install as above.
The code below tests whether each of your required packages (in the myPKGs vector) is already installed, either system wide or in your personal R library, and installs any that are missing. It needs to be above the ‘library()’ calls in your R code:
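One way to follow that advice is to submit the install as its own job and make the array job wait for it. The sketch below assumes a Grid Engine style qsub that supports the -N, -hold_jid, and -t options; the function name, the job name install-pkgs, the script names, and the DRY_RUN variable are hypothetical placeholders.

```shell
# Submit a single install job, then an array job held until the install job ends.
# With DRY_RUN set to a non-empty value, print the commands instead of running them.
submit_install_then_array() {
    local install_script="$1" array_script="$2" tasks="$3"
    ${DRY_RUN:+echo} qsub -N install-pkgs "$install_script" &&
        ${DRY_RUN:+echo} qsub -hold_jid install-pkgs -t "$tasks" "$array_script"
}
```

For example, ‘submit_install_then_array install-packages.sh your-job-script.sh 1-10’ queues the install first, and the ten array tasks start only after it has finished, so no two tasks write to the library at once.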
    # Array of packages to be checked and installed if not already
    myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')

    # check to see if each package is installed, and if not add it to a list to install
    InstalledPKGs <- names(installed.packages()[,'Package'])
    InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]

    # install any needed packages
    if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs)
And if you want to get a list of all of the ‘library()’ and ‘require()’ lines in your .R code, mostly formatted for copy / paste into the myPKGs line, above, try something like:
grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
General Package Installation Troubleshooting Tips
- Most package installation issues are related to the R or GCC versions you are using. Upgrade your R & GCC versions; see “R (and GCC) Versions” above.
- If your package installation is failing on a missing library (.so or .h file), you may need to load additional software. In a qlogin session, use the “module avail” command to see whether a matching module is available (a common one is ‘gdal’).
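As a small sketch of the second tip, the helper below lists the modules whose names match a library name, so a missing ‘gdal’ header can be traced to a module to load. The function name find_module is hypothetical, and the stderr redirect assumes an Environment Modules version that prints its listing to stderr.

```shell
# List available modules matching a name, e.g. the library an install failed on.
find_module() {
    local name="$1"
    if ! type module >/dev/null 2>&1; then
        echo "the module command is not available in this shell" >&2
        return 1
    fi
    # Some Environment Modules versions print listings to stderr,
    # so merge stderr into stdout to make the output easy to search.
    module avail "$name" 2>&1
}
```

In a qlogin session, ‘find_module gdal’ shows the matching modules; load the one you need with ‘module load’ and retry the installation.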
Specific Package Notes for Wharton’s HPCC
Certain commonly-used packages are tricky to install on the HPCC. Here are some that we have helped our users install, and how to install them in the HPCC. If you run into difficulties with other packages not listed here, just let us know what package you are trying to install, and we will help.
rstan Package
rstan is one of the trickier packages to install, because it needs specific C++ compiler variables and a more recent C/C++ compiler. A more recent R version is recommended as well.
Edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ..’ lines:
    if [[ $(hostname -s | grep -e "^aws-" -e "^hpcc[0-9]*$") ]]; then
        module load mpi/openmpi-x86_64
        module load R/R-4.0.4
        module load gcc/gcc-9.2.0
    fi
Then set two R ‘Makevars’ variables before installation. They go in your ~/.R/Makevars file, which should look like:
    CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
    CXX14=g++
To do that, from the command line, do:
    mkdir -p ~/.R
    echo -e "CXX14FLAGS=-O3 -march=native -mtune=native -fPIC\nCXX14=g++" >> ~/.R/Makevars
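Because the command above appends to the file, running it more than once leaves duplicate lines in ~/.R/Makevars. A minimal sketch of a repeatable version follows; the function name ensure_rstan_makevars is hypothetical, and it treats any existing CXX14 line as a sign that the settings are already in place.

```shell
# Add the two rstan settings to a Makevars file only if CXX14 is not set there yet.
# The optional argument is the file to edit; it defaults to ~/.R/Makevars.
ensure_rstan_makevars() {
    local makevars="${1:-$HOME/.R/Makevars}"
    mkdir -p "$(dirname "$makevars")"
    touch "$makevars"
    if ! grep -q '^CXX14[[:space:]]*=' "$makevars"; then
        printf '%s\n' \
            'CXX14FLAGS=-O3 -march=native -mtune=native -fPIC' \
            'CXX14=g++' >> "$makevars"
    fi
}
```

Calling ‘ensure_rstan_makevars’ with no argument edits ~/.R/Makevars, and calling it again leaves the file unchanged.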
Now you are ready to log on to an HPCC compute node, and install rstan.
    qlogin -now no
    R --no-save
    > install.packages("rstan", repos = "https://cloud.r-project.org/")
    Warning in install.packages("rstan", repos = "https://cloud.r-project.org/") :
      'lib = "/usr/local/R-4.0.4/lib64/R/library"' is not writable
    Would you like to use a personal library instead? (yes/No/cancel) yes
    Would you like to create a personal library
    ‘~/R/x86_64-pc-linux-gnu-library/4.0’
    to install packages into? (yes/No/cancel) yes
This will take a long time (~30 minutes). Once installation is complete, you can test that rstan is working correctly:
    > library(rstan)
    > options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))
    > rstan_options(auto_write = TRUE)
    > example(stan_model, package = "rstan", run.dontrun = TRUE)
Please note the ‘options(mc.cores = strtoi(Sys.getenv('OMP_NUM_THREADS')))’ line! For best performance, ALWAYS use that option to set the number of cores on Wharton’s HPCC systems. Do not use ‘detectCores()’: it reports every core on the host rather than the cores allocated to your job, so it will be inaccurate and degrade your performance. By default, OMP_NUM_THREADS = 1. To use more than 1 core for a job, you will need to request it at job start, either at the command line:
qsub -pe openmp 4
Or in your job script:
#$ -pe openmp 4
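Since the R code reads OMP_NUM_THREADS directly, a job script can guard against an unset or malformed value before R starts. The sketch below falls back to 1, the default stated above; the function name job_cores is a hypothetical helper.

```shell
# Print the core count for this job: OMP_NUM_THREADS when it is a positive
# integer, otherwise 1 (the default described above).
job_cores() {
    local cores="${OMP_NUM_THREADS:-1}"
    case "$cores" in
        *[!0-9]*|0) cores=1 ;;
    esac
    echo "$cores"
}
```

In a job script that requests cores with ‘#$ -pe openmp 4’, adding ‘export OMP_NUM_THREADS="$(job_cores)"’ before the Rscript line ensures that strtoi(Sys.getenv('OMP_NUM_THREADS')) in R always receives a usable number.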
sf Package
sf requires some very specific libraries to be available, both when installing and when using it, as well as a newer C/C++ compiler. To set that up, edit your ~/.bashrc file (‘nano ~/.bashrc’ is an easy way to do this), and add (or uncomment) the following ‘module load ..’ lines:
    if [[ $(hostname -s | grep -e "^aws-" -e "^hpcc[0-9]*$") ]]; then
        module load mpi/openmpi-x86_64
        module load R/R-4.0.4
        module load gcc/gcc-9.2.0
        module load gdal
    fi
That will load all required library paths, and then you can install the sf package using the normal installation method:
    qlogin -now no
    R --no-save
    > install.packages("sf")
This can take a while, as sf has a number of dependencies to install before its own installation. After installation, to run some example code, also install the sp package:
> install.packages("sp")
Then the examples at https://cran.r-project.org/web/packages/sf/sf.pdf should run correctly.