Rclone

rclone is a command-line program for copying files and directories to and from a large number of cloud storage services, including Box, Dropbox, S3, OneDrive, Google Drive, and many others. It is a straightforward, solid, yet powerful tool for our clustered environment, letting you copy files up to and down from your favorite storage service.

rclone is installed on Wharton’s HPC compute systems only, and can be run from any compute server, either in a qlogin session or in a job script.

Configuration

Before you begin copying files, you will need to configure rclone. It can be a bit challenging, but it is worth the effort.

Local Install

First, you’ll need to install rclone on your local computer. We will use this local copy of rclone to authorize your HPC3 configuration (that is, to create a token). Download the rclone build that matches your operating system from https://rclone.org/downloads/ and install it. On macOS, you will likely need to grant permission for rclone to run.
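During configuration, rclone recommends using the same rclone version on the machine with the web browser as on the remote machine. A minimal sketch for checking this, assuming a bash shell is available on both sides; the function name rclone_version is hypothetical and only used for illustration:

```shell
#!/bin/bash
# Print only the version string (for example "v1.41") from 'rclone version'.
# rclone_version is a hypothetical helper name used for this sketch.
rclone_version() {
    # The first line of 'rclone version' looks like "rclone v1.41";
    # the second whitespace-separated field is the version itself.
    rclone version | awk 'NR == 1 { print $2 }'
}
```

Run rclone_version on your local computer and again in a qlogin session, and compare the two results. A small mismatch is usually not a problem, but it is worth knowing about if authorization misbehaves.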

HPC3 Configuration

After you install rclone on your local computer, log on to the Wharton HPC3 via SSH, and then:

$ qlogin
...
$ rclone config
2018/05/15 14:48:06 NOTICE: Config file "/your/homedir/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Dropbox
Type of storage to configure.
Choose a number from below, or type in your own value
  1 / Alias for a existing remote
   \ "alias"
...
 10 / Dropbox
   \ "dropbox"
...
Storage> 10
OAuth Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id> ENTER
OAuth Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret> ENTER
Edit advanced config?
y) Yes
n) No (default)
y/n> ENTER
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n> n    # <- NOTE THE NON-DEFAULT 'n' HERE!!!

For this to work, you will need rclone available on a machine that has a web browser available.

For more help and alternate methods see: https://rclone.org/remote_setup/

Execute the following on the machine with the web browser (same rclone
version recommended):

rclone authorize "dropbox"

Then paste the result.

Enter a string value. Press Enter for the default ("").
config_token>

Do what it says: on your local computer, run rclone authorize "REMOTE_TYPE", something like this:

$ rclone authorize "dropbox"
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth?state=HTWPUpbZDCBL8YxL-Defyw
Log in and authorize rclone for access
Waiting for code...
Got code
Paste the following into your remote machine --->
{"access_token":"MYCODEHERE","token_type":"bearer","expiry":"0001-01-01T00:00:00Z"}
<---End paste

Copy the entire curly-bracketed {"access_token": ... } line from your local terminal, and paste it at the waiting config_token> prompt on the HPC3.

That's it for setup!
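To confirm that the new remote actually landed in your configuration, you can ask rclone to list its remotes. A small sketch, assuming the remote was named Dropbox as in the session above; has_remote is a hypothetical helper name, while 'rclone listremotes' is a standard rclone subcommand:

```shell
#!/bin/bash
# Succeed if a remote with the given name appears in 'rclone listremotes'.
# has_remote is a hypothetical helper name used for this sketch.
has_remote() {
    local name="$1"
    # 'rclone listremotes' prints one configured remote per line, each
    # followed by a colon (for example "Dropbox:").
    rclone listremotes | grep -Fxq -- "${name}:"
}
```

For example, 'has_remote Dropbox && rclone lsd Dropbox:' lists the top-level folders of the remote only if it is configured, which doubles as a quick test that the token works.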

Reauthorization

Reauthorization is similar to the above, except that instead of creating a new (n) remote, you edit (e) your existing remote. You will need rclone installed locally; then, on the HPC3:

$ qlogin
$ rclone config
Current remotes:

Name Type
==== ====
Box box
...

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> e
Choose a number from below, or type in an existing value
1 > Box
...
remote> 1
--------------------
[Box]
type = box
token = {"access_token":"MYTOKEN_SECRET"}
--------------------
Edit remote

Press ENTER through all prompts until:

Already have a token - refresh?
y) Yes (default)
n) No
y/n>
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine

y) Yes (default)
n) No
y/n> n

Then do the same thing you did locally when initially configuring rclone: run rclone authorize "dropbox" (or whatever is prompted for your cloud storage solution) on your local computer, and copy and paste the token.
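If you are not sure whether a remote needs reauthorizing, a quick listing is a cheap way to find out. This is only a sketch: check_remote is a hypothetical helper name, and a failure can also mean a network problem rather than an expired token.

```shell
#!/bin/bash
# Report whether a remote still responds, as a hint that its token is usable.
# check_remote is a hypothetical helper name used for this sketch.
check_remote() {
    local name="$1"
    # 'rclone lsd' lists the top-level directories of the remote; a failure
    # here may mean the token needs refreshing (or that the network is down).
    if rclone lsd "${name}:" > /dev/null 2>&1; then
        echo "${name}: OK"
    else
        echo "${name}: failed, consider reauthorizing"
        return 1
    fi
}
```

For example, run 'check_remote Box' in a qlogin session before starting a long job that depends on the remote.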

Syncing Files

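While I say 'syncing' and 'sync', please use rclone's 'copy' command rather than its 'sync' command for best results: 'copy' never deletes anything at the destination, whereas 'sync' makes the destination identical to the source and deletes files that are not in the source. The best documentation on usage is rclone's own documentation. The video above demos a few commands as well.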

I recommend that you have a single folder within your Box / Dropbox dedicated to your HPCC research files. For example, I have an HPCC folder in my Box account. On the HPCC, I want to locate it at ~/Box/HPCC, so I copy it down from Box:

$ qlogin
$ rclone copy Box:HPCC ~/Box/HPCC -u -v
2018/05/15 15:05:46 INFO : Local file system at /home/wcit/hughmac/Box/HPCC: Modify window is 1s
2018/05/15 15:05:46 INFO : Local file system at /home/wcit/hughmac/Box/HPCC: Waiting for checks to finish
2018/05/15 15:05:46 INFO : Local file system at /home/wcit/hughmac/Box/HPCC: Waiting for transfers to finish
2018/05/15 15:05:47 INFO : README.md: Copied (new)
2018/05/15 15:05:47 INFO : Waiting for deletions to finish
2018/05/15 15:05:47 INFO :
Transferred: 80 Bytes (37 Bytes/s)
Errors: 0
Checks: 0
Transferred: 1
Elapsed time: 2.1s
$ find ~/Box
/home/wcit/hughmac/Box
/home/wcit/hughmac/Box/HPCC
/home/wcit/hughmac/Box/HPCC/README.md
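Before a large transfer, it can be reassuring to see what rclone would do without moving any data. A minimal sketch using rclone's --dry-run flag, with the short flags from the command above spelled out in long form; preview_copy is a hypothetical helper name:

```shell
#!/bin/bash
# Preview a copy without transferring anything.
# preview_copy is a hypothetical helper name used for this sketch.
preview_copy() {
    local src="$1" dst="$2"
    # --update (-u) skips files that are newer on the destination,
    # --verbose (-v) prints what is happening, and
    # --dry-run (-n) reports what would be copied without copying it.
    rclone copy "$src" "$dst" --update --verbose --dry-run
}
```

For example, 'preview_copy Box:HPCC ~/Box/HPCC' shows which files the copy-down above would fetch. Drop --dry-run (or run the original command) to do the real transfer.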

I have created command line 'aliases' to assist with this process. In my ~/.bashrc file, I put:

alias boxdn='rclone copy Box:HPCC ~/Box/HPCC -u -v'
alias boxup='rclone copy ~/Box/HPCC Box:HPCC -u -v'
shopt -s expand_aliases   # <- this is so that your aliases are expanded in job scripts
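As an aside, shell functions are a possible alternative to the aliases above. Functions do not need 'shopt -s expand_aliases' and can accept extra flags, although they still have to be defined in the shell that runs them (for example via ~/.bashrc). A sketch, reusing the same names and paths:

```shell
#!/bin/bash
# Shell-function equivalents of the boxdn / boxup aliases.
# "$@" passes along any extra flags, for example 'boxup --dry-run'.
boxdn() { rclone copy Box:HPCC "$HOME/Box/HPCC" -u -v "$@"; }
boxup() { rclone copy "$HOME/Box/HPCC" Box:HPCC -u -v "$@"; }
```

Use these instead of the aliases, not alongside them: bash gets confused if a function is defined under a name that is already an alias in the same shell.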

After logging out and back in, I can copy my ~/Box/HPCC directory up to the HPCC directory in my Box cloud account by typing 'boxup', and copy in the other direction with 'boxdn'. And when I run a job script, I can just add 'boxup' on the line after the work is done and the output is written, and rclone will copy my files to my Box account!

For example:

#!/bin/bash
#$ -N myjob
#$ -j y
python output_data_to_Box_HPCC_dir.py
boxup
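In a job script like this one, the upload runs even if the work step fails, which may copy partial output to the cloud. One way to guard against that is to upload only when the work command exits successfully. A sketch under that assumption; run_then_upload is a hypothetical helper name, and the paths are the ones used throughout this page:

```shell
#!/bin/bash
# Run a command and upload the results only if it succeeded.
# run_then_upload is a hypothetical helper name used for this sketch.
run_then_upload() {
    # "$@" is the work command, e.g. python output_data_to_Box_HPCC_dir.py
    if "$@"; then
        rclone copy "$HOME/Box/HPCC" Box:HPCC -u -v
    else
        local rc=$?
        echo "work step failed with exit code ${rc}; skipping upload" >&2
        return "$rc"
    fi
}
```

In the job script, the last two lines would then become a single line: 'run_then_upload python output_data_to_Box_HPCC_dir.py'.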

TIP: if you're running in interactive mode, most software products have a way to run a system or OS command. Stata, for example, uses 'shell' or '!', like so:

. ! rclone copy ~/Box/HPCC Box:HPCC -u -v
2018/05/15 15:25:31 INFO  : box root 'HPCC': Modify window is 1s
2018/05/15 15:25:31 INFO  : box root 'HPCC': Waiting for checks to finish
2018/05/15 15:25:31 INFO  : box root 'HPCC': Waiting for transfers to finish
2018/05/15 15:25:31 INFO  : Waiting for deletions to finish
2018/05/15 15:25:31 INFO  :
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 1
Transferred:            0
Elapsed time:        1.2s

* use some data in Stata
. sysuse auto.dta
(1978 Automobile Data)

* save the data to Box directory
. save ~/Box/HPCC/auto.dta
file ~/Box/HPCC/auto.dta saved

* copy the directory back up to Box
. ! rclone copy ~/Box/HPCC Box:HPCC -u

Unfortunately, aliases aren't 'active' in Stata's 'shell' command; I don't know why (let me know if you do!). You could write a do-file instead, like:

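* drop any earlier definitions so the do-file can be re-run in one session
capture program drop boxdn
program boxdn
    ! rclone copy Box:HPCC ~/Box/HPCC -u -v
end
capture program drop boxup
program boxup
    ! rclone copy ~/Box/HPCC Box:HPCC -u -v
end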

Stash it somewhere (like '~/ado/rclone.do'), and call it from Stata like:

. run ~/ado/rclone.do
. boxup
2018/05/15 15:39:30 INFO : box root 'HPCC': Modify window is 1s
2018/05/15 15:39:31 INFO : box root 'HPCC': Waiting for checks to finish
2018/05/15 15:39:31 INFO : box root 'HPCC': Waiting for transfers to finish
2018/05/15 15:39:31 INFO : Waiting for deletions to finish
2018/05/15 15:39:31 INFO :
Transferred: 0 Bytes (0 Bytes/s)
Errors: 0
Checks: 1
Transferred: 0
Elapsed time: 1.1s
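Another option for programs that cannot see bash aliases is a small executable wrapper script, since any program that can run a system command can run a script by its path. A sketch that generates such a wrapper; make_wrapper and the ~/bin location are assumptions for illustration, not something set up on the cluster for you:

```shell
#!/bin/bash
# Write a tiny executable wrapper script so that programs which cannot see
# bash aliases (such as Stata's 'shell') can still run the copy by name.
# make_wrapper and its arguments are hypothetical names for this sketch.
make_wrapper() {
    local path="$1" src="$2" dst="$3"
    mkdir -p "$(dirname "$path")"
    # The unquoted heredoc expands $src and $dst now, while \$@ is written
    # out literally so the wrapper forwards its own arguments later.
    cat > "$path" <<EOF
#!/bin/bash
# Generated wrapper: copy $src to $dst, passing along any extra flags.
exec rclone copy "$src" "$dst" -u -v "\$@"
EOF
    chmod +x "$path"
}
```

For example, 'make_wrapper "$HOME/bin/boxup" "$HOME/Box/HPCC" Box:HPCC' creates the script once, and from Stata you could then run '! ~/bin/boxup'. Pass full paths built from $HOME rather than '~', because a tilde inside the quoted paths of the generated script would not be expanded.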

For more advanced use, I recommend looking through rclone's own documentation.