Data Protection

Wharton’s user data space (/home and /data) is built on redundant, network-attached storage, making it a reliable, safe place to store your data and code.

Snapshots

Our networked file systems include automated hourly and daily ‘snapshots’ of all user data space, providing our users with multiple user-accessible copies of all of their data. We snapshot:

  • daily and always have the last seven (7) daily snapshots available
  • weekly and always have the last five (5) weekly snapshots available

Reserve Snapshots

In addition to the user-accessible snapshots, we sync daily to a secondary filesystem, which is snapshotted on the same schedule. This copy exists for disaster recovery, in case our primary network storage has problems that we cannot resolve.

Cloud Backups

We are in the process of creating third-tier backups to AWS S3 and will update this post when they are in production. These backups are archival in nature and will likely contain copies of weekly snapshots for at least six (6) months (26 weeks).

Recovering Files

Depending on what you need to recover, there are two basic paths to file recovery in our environment.

Self Service

You can perform most file recovery yourself, which is the fastest and most selective option.

Our snapshot system creates a .snapshot directory in each departmental directory (/home/fnce, /home/hcmg, etc.). Because the name begins with a ‘.’ (dot), the directory is hidden when you run ‘ls’ in the departmental directory. Trust us: it is there.
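
If you would rather check than trust, asking for the directory by name works even though a plain listing does not show it. The following is a minimal sketch, assuming a bash-compatible shell on the HPCC; the function name show_snapshot_dir is hypothetical and not part of our environment.

```shell
# show_snapshot_dir DEPT_DIR
# Hypothetical helper: confirm that the hidden .snapshot directory exists
# under a departmental directory by naming it explicitly.
show_snapshot_dir() {
    local dept_dir=$1
    if [ -d "$dept_dir/.snapshot" ]; then
        # -d lists the directory entry itself rather than its contents
        ls -ld "$dept_dir/.snapshot"
    else
        echo "no .snapshot directory under $dept_dir" >&2
        return 1
    fi
}
```

For example, ‘show_snapshot_dir /home/fnce’ should print one line describing /home/fnce/.snapshot if snapshots are available for that department.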

To explore and recover files, log on to the HPCC with your SSH client and ‘cd’ into your departmental .snapshot directory:

$ cd ../.snapshot
$ ls -1
daily.2016-03-24_0010
daily.2016-03-25_0010
daily.2016-03-26_0010
daily.2016-03-27_0010
daily.2016-03-28_0010
daily.2016-03-29_0010
daily.2016-03-30_0010
daily.2016-03-31_0010
daily.2016-04-01_0010
daily.2016-04-02_0010
hourly.2016-04-02_0605
hourly.2016-04-02_0705
hourly.2016-04-02_0805
hourly.2016-04-02_0905
$ cd daily.2016-04-01_0010/hughmac
$ ls *.sh
script1.sh    script2.sh   script3.sh
$ cp script2.sh ~

That’s it! You may also be able to recover files with an SFTP client. Remember that the leading ‘.’ (dot) hides the .snapshot directory from normal listings, so you may need to enter its path directly. Trust that it’s there!
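
When you are not sure which snapshot holds the version you want, it can help to list every snapshot copy of a file before restoring one. The following is a sketch only, assuming a bash-compatible shell; the function names list_snapshot_versions and restore_from_snapshot and the ‘.restored’ suffix are hypothetical, and the paths in the usage note are the ones from the session above.

```shell
# list_snapshot_versions SNAPSHOT_DIR RELATIVE_PATH
# Hypothetical helper: show size and modification time of every snapshot
# copy of one file, so you can pick the version to restore.
list_snapshot_versions() {
    local snapdir=$1 relpath=$2 snap
    for snap in "$snapdir"/*/; do
        # Skip snapshots taken before the file existed or after it was removed
        if [ -f "$snap$relpath" ]; then
            ls -l "$snap$relpath"
        fi
    done
}

# restore_from_snapshot SNAPSHOT_FILE DEST_DIR
# Hypothetical helper: copy a snapshot file into DEST_DIR under a new name
# (original name plus ".restored") so the current file is never overwritten.
restore_from_snapshot() {
    local src=$1 destdir=$2
    local dest="$destdir/$(basename "$src").restored"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # -p preserves the snapshot copy's timestamps and mode
    cp -p "$src" "$dest"
}
```

For example, ‘list_snapshot_versions /home/fnce/.snapshot hughmac/script2.sh’ lists each snapshot copy of script2.sh, and ‘restore_from_snapshot /home/fnce/.snapshot/daily.2016-04-01_0010/hughmac/script2.sh ~’ writes it to ~/script2.sh.restored. Restoring under a new name is a deliberate choice here: it lets you compare the recovered copy with the current file before replacing anything.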

NOTE: Permissions in .snapshot directories are identical to those in your ‘normal’ directories. Only those with proper permissions can browse and restore your files.

Assisted Recovery (Tape)

If the Self Service method (above) isn’t adequate, generally because the files have been out of the user space for more than 10 days, please contact research-computing@wharton.upenn.edu with as much detail as you can provide. The most important details are the path and name of the files and when they were last in your user space.

Tape recovery can take some time: it generally requires requesting tapes from off site, loading them, scanning them (which is relatively slow), and writing the recovered files back out. We thank you for your patience!

Other Methods

We recommend that you also use a Repository Service (Version Control) or Dropbox syncing to manage your code. Both of these methods provide the ability to recover multiple file versions, along with other valuable features.