R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
More information can be found at: http://www.r-project.org/
R on the HPCC
R is installed on all HPCC compute hosts.
R Versions
Version 3.5.1 is the current system default. R 3.6.1 and 3.6.3 are also installed. To use one of these newer versions, modify your ~/.bashrc file as follows (the example loads 3.6.3):
    ## START MPI SELECTION
    if [[ $(hostname -s | grep -e "^hpcc[0-9]*$" -e "^aws-") ]]; then
        ... other modules ...
        module load R/R-3.6.3
        module load gcc/gcc-9.2.0
    fi
    ## END MPI SELECTION
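After editing ~/.bashrc and opening a new login shell, it can help to confirm which R is actually picked up. The following is a minimal sketch, not part of the HPCC documentation: the helper name report_r_version is hypothetical, and it only assumes that Rscript, if installed, is on your PATH.

```shell
# Hypothetical helper: print the path and version of the Rscript found on PATH.
report_r_version() {
    if command -v Rscript >/dev/null 2>&1; then
        echo "Rscript found at: $(command -v Rscript)"
        # Some R versions print the version banner to stderr, so merge the streams.
        Rscript --version 2>&1
    else
        echo "Rscript not found on PATH"
    fi
}

report_r_version
```

If the reported version is still the system default, the module lines may not have been read by the current shell.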
Submitting R Jobs
Create R Commands File
Create a .R file with your commands, for example:
    D <- data.frame(x=c(1,2,3,1), y=c(7,19,2,2))
    D

    # Sort on x
    indexes <- order(D$x)
    D[indexes,]

    # Print out sorted dataset, sorted in reverse by y
    D[rev(order(D$y)),]
Create R Job Script
Create a .sh file with at least the following contents:
    #!/bin/bash
    Rscript --no-save your-commands-file.R
Submit R Job
More information: HPCC Job Management
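The submit command itself is not shown on this page. As a rough sketch, assuming the cluster runs a Grid Engine style scheduler (suggested by the qrsh and qlogin commands used elsewhere on this page) and that qsub is the submit command, the steps above could look like the following. The file name r-job.sh is hypothetical; check the HPCC Job Management page for the options your jobs actually need.

```shell
# Sketch only: the file name is hypothetical, and qsub is assumed
# to be the scheduler's submit command.
job_script="r-job.sh"

# Write the minimal job script described above.
cat > "$job_script" <<'EOF'
#!/bin/bash
Rscript --no-save your-commands-file.R
EOF

# Submit only if a qsub command exists on this host.
if command -v qsub >/dev/null 2>&1; then
    # -cwd asks Grid Engine to run the job from the current directory.
    qsub -cwd "$job_script"
else
    echo "qsub not found; run this on a host where the scheduler is available" >&2
fi
```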
Interactive R Sessions
- Graphical: via our HPCC Desktop environment (recommended)
- Textual:
qrsh R --no-save
Installing R Packages
NOTE (2018-02-14): we no longer install all CRAN and Bioconductor packages on the cluster!
Here’s how to install R packages in your home directory / shared workspace.
Manual Installation
- log on to a compute node with qlogin
- download the package with wget (if it’s not a CRAN package)
- install (different depending on whether it’s CRAN or not)
Here are two examples (Copy/Pastable except for the package URL in wget and package name in R CMD INSTALL):
If it IS a CRAN package:
    qlogin -now no
    R --no-save
    ...
    > install.packages("anRpackage")
If it’s the first time you have installed packages, or there’s a new version installed, you will see something like:
    Installing package into ‘/usr/local/lib/R/library-X.X.X’
    (as ‘lib’ is unspecified)
    Warning in install.packages("anRpackage") :
      'lib = "/usr/local/lib/R/library-X.X.X"' is not writable
    Would you like to use a personal library instead? (yes/No/cancel) yes
    Would you like to create a personal library
    ‘~/R/x86_64-redhat-linux-gnu-library/X.X’
    to install packages into? (yes/No/cancel) yes
Answer ‘yes’ to both questions; accepting the default library location is recommended.
If it’s NOT a CRAN package (be careful of your sources, of course!):
    qlogin -now no
    wget 'http://some.server.edu/anRpackage_version.tar.gz'
    R CMD INSTALL -l $R_LIBS anRpackage_version.tar.gz
    exit
Start R normally, and you should now be able to use the new package.
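The non-CRAN example passes $R_LIBS to the -l option of R CMD INSTALL, so it depends on that environment variable being set in your session. If it is empty, -l is left without the directory it expects. A minimal sketch of a check to run first follows; the helper name check_r_libs is hypothetical, and it assumes R_LIBS holds a single directory rather than a colon-separated list.

```shell
# Hypothetical helper: succeed only if R_LIBS names an existing, writable directory.
# Assumes R_LIBS contains a single path, not a colon-separated list.
check_r_libs() {
    if [ -z "${R_LIBS:-}" ]; then
        echo "R_LIBS is not set" >&2
        return 1
    fi
    if [ ! -d "$R_LIBS" ] || [ ! -w "$R_LIBS" ]; then
        echo "R_LIBS is not a writable directory: $R_LIBS" >&2
        return 1
    fi
    return 0
}

# Usage (commented out so the sketch has no side effects):
# check_r_libs && R CMD INSTALL -l "$R_LIBS" anRpackage_version.tar.gz
```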
Automatic Installation of CRAN Packages in Your Code
Warning: do not use this code in an array (-t X-X) job! The tasks will all try to install to the same place at the same time. Run a separate single job first to install the packages, or do a manual install as above.
The code below tests whether each of your required packages (listed in the myPKGs vector) is already installed, either system-wide or in your personal R library, and installs any that are missing. Place it above the ‘library()’ calls in your R code:
    # Array of packages to be checked and installed if not already
    myPKGs <- c('PKG1', 'PKG2', ..., 'PKGN')

    # check to see if each package is installed, and if not add it to a list to install
    InstalledPKGs <- names(installed.packages()[,'Package'])
    InstallThesePKGs <- myPKGs[!myPKGs %in% InstalledPKGs]

    # install any needed packages
    if (length(InstallThesePKGs) > 0) install.packages(InstallThesePKGs)
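The array-job restriction can also be checked in the job script before R is started. The following is a sketch under one assumption: the scheduler is Grid Engine, which sets SGE_TASK_ID to the task number inside array tasks and to "undefined" for ordinary jobs. The helper name safe_to_install is hypothetical.

```shell
# Hypothetical helper: succeed when this job is NOT an array task.
# Assumes Grid Engine semantics: SGE_TASK_ID is a number inside array tasks
# and "undefined" (or unset, outside the scheduler) otherwise.
safe_to_install() {
    case "${SGE_TASK_ID:-undefined}" in
        undefined) return 0 ;;
        *) return 1 ;;
    esac
}

if safe_to_install; then
    echo "single job: package installation may run here"
else
    echo "array task ${SGE_TASK_ID}: skipping package installation" >&2
fi
```

A job script could use this check to decide whether to run a separate install-only R script before the main analysis.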
To list the packages named in the ‘library()’ and ‘require()’ calls in your .R files, mostly formatted for copy / paste into the myPKGs line above, try something like:
grep -e library -e require *.R | awk -F'[()]' '{print $2}' | sed ':a;N;$!ba;s/\n/" , "/g'
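To see what "mostly formatted" means in practice, here is a small worked run of that pipeline on a made-up file. The file name demo.R and its contents are purely illustrative, and the sed expression assumes GNU sed, as found on typical Linux hosts.

```shell
# Work in a scratch directory so the *.R glob only sees the sample file.
workdir=$(mktemp -d)
cat > "$workdir/demo.R" <<'EOF'
library(ggplot2)
require(dplyr)
x <- 1
library(data.table)
EOF

# Same pipeline as above: keep the text between the parentheses,
# then join the lines with '" , "'.
pkgs=$(cd "$workdir" && grep -e library -e require *.R \
    | awk -F'[()]' '{print $2}' \
    | sed ':a;N;$!ba;s/\n/" , "/g')

echo "$pkgs"
# prints: ggplot2" , "dplyr" , "data.table

rm -rf "$workdir"
```

The opening and closing quotes are not produced, so they still have to be added by hand when pasting into c(...).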