Storage

Data Protection

All user data (/home and /data) is protected by snapshots and traditional tape backups. Please see our Data Protection Page for details on the infrastructure and on how to recover files.

Quotas

The Grid has storage connected to each compute node to work with files. Your main area for file storage is your “home” directory: /home/department/username (sometimes abbreviated as ~, also available as the $HOME variable). You can get the full path to your home directory using the command: echo $HOME

While the Grid has dedicated storage (in addition to WRDS data access), resources are not unlimited. Therefore, each user, and each department as a whole, has a quota (limit) on the total size of files stored. The default quotas are:

  • Per User: 100GB (software quotas)
  • Per Department: 1TB (hard volume size)

RAs and Co-Authors share this storage with their sponsor. These limits can be adjusted above the defaults by special request and chargeback.

To check on your quota usage, use the following command: quota
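
As a rough cross-check on the figure that quota reports, the sketch below turns a size in kilobytes into a percentage of the quota. The function name usage_percent and the 1024-based unit conversion are assumptions made for illustration; the quota command remains the authoritative source, and its accounting may differ from what du measures.

```shell
# Default per-user quota taken from the list above; change it if yours was raised.
DEFAULT_QUOTA_GB=100

# usage_percent <used-kilobytes> [quota-gigabytes]
# Prints the whole-number percentage of the quota that is in use.
usage_percent() {
    used_kb=$1
    quota_gb=${2:-$DEFAULT_QUOTA_GB}
    # Assumption: 1GB = 1024 * 1024 KB.
    echo $(( used_kb * 100 / (quota_gb * 1024 * 1024) ))
}

# Hypothetical usage: du -sk prints the size of a directory tree in KB.
# Left commented out because scanning a large home directory can take a while.
# usage_percent "$(du -sk "$HOME" | cut -f1)"
```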

Daily Warning E-mails

All users’ quotas are checked daily. If usage is above 96% (the default), a warning message with details is e-mailed to the primary user and to all group members (RAs, Co-Authors) associated with that user, at their @wharton.upenn.edu addresses.

Adjusting Warning Triggers

Primary users can adjust the percentage of their quota at which these messages are sent, and who receives them, by creating or editing a .quotacheck file in their home directory. The primary user and all group members (RAs, Co-Authors) will always be notified via their @wharton.upenn.edu e-mail addresses, but you can add other recipients and/or change the warning level, as shown below:

WARN_LEVEL 85
NOTIFY someotheruser@gmail.com,myphonenumber@myphonecarrier.com
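
One way to create the file from the command line is sketched below. The helper name write_quotacheck and its optional directory argument are hypothetical. The file layout simply copies the two-line example above, so any other keys or syntax the checker might accept are not covered here.

```shell
# write_quotacheck <warn-level> <comma-separated-addresses> [directory]
# Writes a .quotacheck file in the layout shown above.
# The directory defaults to your home directory.
write_quotacheck() {
    target_dir=${3:-$HOME}
    cat > "$target_dir/.quotacheck" <<EOF
WARN_LEVEL $1
NOTIFY $2
EOF
}

# Hypothetical usage (overwrites any existing ~/.quotacheck):
# write_quotacheck 85 someotheruser@gmail.com
```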

Increasing Quotas

If you find that you need more than the default 100GB of storage space, please get in touch with the HPC Admin team with details about your needs. Generally this is a fee-based service, chargeable annually to a Penn budget code, but depending on your requirements and status we do award Storage Grants as well.

Size       Cost
< 1TB      $0.44/GB/year
1TB-9TB    $400/TB/year
10TB+      $350/TB/year
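
To get a feel for the numbers before contacting the HPC Admin team, the sketch below estimates an annual charge from the rates above. It rests on several assumptions that the table does not state: 1TB is treated as 1000GB, the rate for a tier applies to the entire requested size, and sizes between 9TB and 10TB are billed at the 1TB-9TB rate. The function name annual_cost_usd is hypothetical, and the actual charge is whatever the HPC Admin team quotes.

```shell
# annual_cost_usd <size-in-GB>
# Prints an estimated yearly cost in dollars, under the assumptions above.
annual_cost_usd() {
    awk -v gb="$1" 'BEGIN {
        if (gb < 1000)        cost = gb * 0.44           # < 1TB:   $0.44/GB/year
        else if (gb < 10000)  cost = (gb / 1000) * 400   # 1TB-9TB: $400/TB/year
        else                  cost = (gb / 1000) * 350   # 10TB+:   $350/TB/year
        printf "%.2f\n", cost
    }'
}

# Hypothetical usage:
# annual_cost_usd 500     # 500GB
# annual_cost_usd 2000    # 2TB
```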

 

Transferring Files

There are two currently supported file transfer protocols for moving files on and off the Grid:

  • SSH-FTP (SSH File Transfer Protocol, not Secure FTP)
  • SMB (Windows file sharing)

SSH-FTP is implemented in the scp command, which is already installed on Mac OS X and Linux, and in many file transfer clients (available for any OS). SMB is commonly known as the Windows file sharing protocol, but is also implemented in other operating systems such as Mac OS X and Linux.

Please Note: By design, you can only move files to and from your home directory. Each of the compute nodes has access to this directory.

For either protocol, your username and password are your Wharton domain credentials.

Windows File Sharing

The HPCC has Samba/CIFS file sharing enabled for on-campus (including VPN) file system browsing and usage.

On Windows

For Wharton DOMAIN-joined systems (usually departmental desktops only):

Type \\hpcc.wharton.upenn.edu\username\ in the address bar of a Windows Explorer window

For non-DOMAIN-joined systems:

Start > right-click Computer > Map network drive… and check the ‘Connect using different credentials’ option
Specify “username@wharton.upenn.edu” (without the quotes) as your username, NOT just “username” (or some local username)

On Apple OS X

From Finder, open the Go menu > Connect to Server (or Cmd-K)
Address: smb://hpcc.wharton.upenn.edu/username

On Linux

There are enough flavors of Linux that it’s difficult to document the ‘right way’. Essentially, connect to smb://hpcc.wharton.upenn.edu/username, and if you need assistance just let us know.
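
As one possible starting point on Linux, the sketch below builds the share path and shows two common ways to connect, both left as comments because which tools are installed varies by distribution. The helper name grid_share is hypothetical. Passing the username in the username@wharton.upenn.edu form mirrors the Windows instructions and is an assumption here; your client may expect a different form.

```shell
GRID_HOST="hpcc.wharton.upenn.edu"

# grid_share <username>
# Prints the share path in the //host/share form used by Samba tools.
grid_share() {
    printf '//%s/%s\n' "$GRID_HOST" "$1"
}

# Interactive, FTP-style session (requires the smbclient package):
#   smbclient "$(grid_share username)" -U "username@wharton.upenn.edu"
#
# Desktops that use GVfs (GNOME and others) can mount the same share with:
#   gio mount "smb:$(grid_share username)"
```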

SSH-FTP

If you are using a file transfer client (such as SecureCRT), use the following settings:

  • File Transfer Protocol (if selectable): SSH-FTP
  • Host: hpcc.wharton.upenn.edu

The first time you log in, you may receive a message similar to “Host key not found from list of known hosts. Are you sure you want to continue connecting?” Answer yes to make the connection. You should not receive this message on subsequent connections.

Please Note: if you are transferring a file from Windows, you should set the transfer mode to ASCII for program and job script files; otherwise line breaks will not translate correctly. If you find a file with incorrect line breaks from Windows, you can use the dos2unix filename command to fix it.
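
To check whether a file needs this fix before converting it, something like the following sketch may help. The function names has_crlf and fix_line_endings are hypothetical. The tr fallback is an assumption for systems without dos2unix, and it strips every carriage return in the file, so it is only suitable for text files.

```shell
# has_crlf <file>
# Succeeds if the file contains at least one carriage return (Windows line ending).
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# fix_line_endings <file>
# Converts the file in place, using dos2unix when it is available.
fix_line_endings() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix "$1"
    else
        # Fallback: delete all carriage returns, then replace the original file.
        tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Hypothetical usage:
# if has_crlf myjob.sh; then fix_line_endings myjob.sh; fi
```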

If you are using the command scp, the format is: scp source-filename target-filename. When referencing a remote file, you must use the full syntax for the file (username@remote-system:/remote-filename). Some examples:

  • From your local computer, copy a file from your local computer to the Grid: scp local-file username@hpcc.wharton.upenn.edu:/home/department/username/grid-file
  • From your local computer, copy a file from the Grid to your local computer: scp username@hpcc.wharton.upenn.edu:/home/department/username/grid-file local-file
  • From the Grid (if you have SSH set up properly on your local system), copy a file from your local computer to the Grid: scp local-username@your-system:/local-file grid-file

Please Note: if you are transferring a large number of files between your local computer and the Grid, it is much more efficient to tar or zip them into a single file, then untar or unzip that file once it is transferred. For more details, check out the manual pages for the commands: man tar / man zip / man unzip.
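
The bundle, transfer, and unpack workflow can be sketched as below. The function names bundle_dir and unbundle and the file names in the usage comments are hypothetical. Gzip compression (the z option) is one reasonable choice, not a requirement.

```shell
# bundle_dir <directory> <archive.tar.gz>
# Packs a directory into a single compressed archive (run on the sending side).
bundle_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# unbundle <archive.tar.gz> <target-directory>
# Unpacks an archive into the target directory (run on the receiving side).
unbundle() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Hypothetical session:
#   bundle_dir ./results results.tar.gz
#   scp results.tar.gz username@hpcc.wharton.upenn.edu:/home/department/username/
# then, after logging in to the Grid:
#   unbundle ~/results.tar.gz ~/project
```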

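Advanced Users: you may be able to improve the scp transfer rate by choosing a different encryption method (cipher) with the -c option rather than using the default. Historically the blowfish cipher was used for this: scp -c blowfish local-file username@remote-system:/remote-file. Recent OpenSSH releases no longer include blowfish, so if that command is rejected, run ssh -Q cipher to list the ciphers your client supports.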