Cloud Bursting

The Wharton High Performance Computing Cluster (HPCC) now offers cloud bursting, thanks to Univa UniCloud. Cloud bursting is a hybrid model that augments our local compute resources with those of Amazon EC2. With UniCloud, the HPCC system becomes a springboard to simultaneously launch as many computationally intensive jobs as required. This is all done behind the scenes, from the normal Wharton HPCC login, without the overhead of a completely different workflow.

Cloud bursting may be a good fit if you have…

  • a short-term, high-throughput job load
  • a need for GPGPU / CUDA programming with GPU accelerators / coprocessors
  • a number of long-running but low-CPU jobs, such as web scraping over an extended period
  • existing large data stores in Amazon S3

 

Setup

Cloud bursting requires a bit of setup ‘behind the scenes’. Your account, and those of the team members you choose, will be configured to access a new, dedicated job queue on the HPCC that serves as an on-demand computing resource for your team. Please contact research-computing@wharton.upenn.edu if you would like to get started.

Usage

Once the new job queue and your accounts have been configured, you can run HPCC jobs in the cloud by replacing the familiar qsub and qlogin commands with the custom qsub-aws and qlogin-aws commands. Job management otherwise works as normal, with a few differences…

  • Your team has exclusive access to the cloud compute nodes, as many as you need up to a pre-configured maximum that you request
  • We recommend staging and working with files in the /tmp directory for fast, local IO on cloud instances
  • If no cloud jobs are currently running, expect a wait of up to ten minutes for the first instance to start and for jobs to begin running
  • Amazon instances are shut down automatically after sitting idle for one billable hour, to conserve budget
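
As a small illustration of the command swap described above, the sketch below wraps job submission in a shell function that sends a job either to the cloud queue or to the local cluster. The function name submit and the variable USE_CLOUD are hypothetical, and the sketch assumes that qsub-aws accepts the same arguments as qsub.

```shell
# Hypothetical helper: pick the cloud or local submit command.
# Assumption: qsub-aws takes the same arguments as qsub.
submit() {
    if [ "${USE_CLOUD:-0}" = "1" ]; then
        qsub-aws "$@"    # run in the dedicated cloud queue
    else
        qsub "$@"        # run on the local HPCC nodes
    fi
}
```

With this in place, USE_CLOUD=1 submit myjob.sh would send the (hypothetical) script myjob.sh to the cloud queue, while submit myjob.sh alone keeps it on the local cluster.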

Example Job

Example of using S3 and $TMP for fast file IO

If your cloud jobs read or write a significant amount of data, those files should sit on storage local to the node doing the IO. We highly recommend using the /tmp directory or the $TMP variable in job scripts on cloud instances. Because /tmp is temporary and will not survive a reboot, anything you want to keep must be stashed in AWS Simple Storage Service (S3) buckets. Sync files down from a bucket before computation, then sync the resulting files up to a bucket when done. Your account will be pre-configured to use the AWS command-line tool (aws cli).

Work in the local temporary directory:

Name a bucket:

Make a bucket in Amazon S3:

Sync mydata directory on HPCC up to bucket:

List contents of bucket:

List contents of mydata directory object in bucket:

Sync mydata down from bucket to temporary directory on cloud node (in a job script, or by hand before starting jobs):
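
The steps listed above can be sketched as a few shell helpers around the standard aws s3 subcommands mb, sync, and ls. This is an illustration under assumptions, not the exact commands configured for your account: the bucket name s3://example-team-bucket and the helper function names are hypothetical placeholders.

```shell
# Sketch only: the bucket name and helper names are hypothetical.
BUCKET="s3://example-team-bucket"   # assumed name; bucket names must be globally unique
WORKDIR="${TMP:-/tmp}"              # node-local scratch: $TMP if set, otherwise /tmp

# Make the bucket in Amazon S3 (needed once).
make_bucket() { aws s3 mb "$BUCKET"; }

# Sync a local directory up to a same-named prefix in the bucket.
push_data() { aws s3 sync "$1" "$BUCKET/$(basename "$1")"; }

# List the bucket, or the contents of one prefix inside it.
list_bucket() { aws s3 ls "$BUCKET/${1:+$1/}"; }

# Sync a prefix down from the bucket into the local scratch directory.
pull_data() { aws s3 sync "$BUCKET/$1" "$WORKDIR/$1"; }
```

A typical sequence would be make_bucket and push_data mydata on the HPCC login node, list_bucket mydata to check the upload, then cd "$WORKDIR" and pull_data mydata at the top of a job script.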

When working in /tmp, remember to copy small results back to the home directory, and/or sync large data files up to a bucket. Please see the aws s3 documentation for more options.

Cost

Cost is based on instance-hours and data transfer OUT (results). We currently recommend only doing compute-heavy jobs with little data, so data transfer (OUT) costs should be minimal.

Instance Types

While AWS EC2 has a large number of instance types, we generally recommend one of four, based on use case: c4.8xlarge, r3.8xlarge, g2.2xlarge, and g2.8xlarge. If you need or prefer something different for your AWS nodes, it’s trivial to accommodate you. A good ‘complete list’ is HERE.

Here’s a simple chart to help you estimate the cost of running work in your AWS queue:

Instance Type   vCPU*   RAM     RAM / vCPU (GB)   Local Storage   GPUs   AWS Price ($/hr)   Your Price ($/hr)**
c4.8xlarge      36      60GB    1.66              0GB             0      $1.675             $1.675
r3.8xlarge      32      244GB   7.625             640GB           0      $2.660             $2.660
g2.2xlarge      8       15GB    1.875             60GB            1      $0.650             $0.650
g2.8xlarge      32      60GB    1.875             240GB           4      $2.600             $2.600

* The number of physical cores is actually half the number of vCPUs, because AWS EC2 instances have ‘hyperthreading’ turned on. This means that:

  1. if you use the total number of vCPUs on an instance, your jobs will generally take roughly 1.85x as long to run
  2. while the jobs take longer (1.85x), you are running 2x the number of jobs on an instance (server), so this is the most cost-efficient option
  3. if speed is the important thing and you’re willing to pay the small premium, we can set your nodes to use vCPU/2 per instance. Just ask!

** Currently the UPenn account that you sign up for this service with will bear the entire cost of your work in AWS. We are working on getting some grant funding (matching funds) to supplement your funds, but at this time (2016-09-16) this program is not in place.

Cost Calculation Example

I have 1000 non-GPU jobs I want to run in AWS. I have run a few of them in the local cluster to determine RAM usage and run time. Each uses <1GB of RAM and takes 9 hours locally (qacct -j JOBID | grep -e maxvmem -e "^wallclock"), and time is not as important to me (I don’t need them to finish in 9 hours). So:

  • since <1GB RAM per job is < 60 GB/36 vCPU (1.66GB)*, the c4.8xlarge instance type will be most cost effective
  • 1000 jobs / 36 vCPU = 27.78 servers … round up to 28 instances (servers) needed
  • since we’re willing to use all 36 vCPUs, the time will be roughly 9 x 1.85**, or 16.65 hours / job
  • since 28 instances is below our queue maximum of 32, we can run them all at once … round the time up to the next whole billable hour: 17 hours for the run
  • final calculation: $1.675/hr/instance * 28 instances * 17 hours = $797.30

* If RAM > 1.66GB/job but < 3.33GB/job, instead of moving the job to the r3.8xlarge we would recommend running 18 jobs on an instance (NOT using the hyperthreading). Above 3.33GB/job we can continue to adjust the jobs/instance count downward, and at some point it will be more cost effective to move to the r3.8xlarge. See the table below.
** Keep in mind that the 1.85x slowdown factor might not be accurate. We run different processors locally than AWS runs in its servers, so times may vary. If you only have $800 for the project, or your project is dramatically larger, we recommend a test run (by you or by us) of just the first 36 jobs on one instance to find out actual times, for a more accurate estimate.
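
The arithmetic in this example can be repeated for other job counts with a small shell function. This is a rough sketch, not an official calculator: the function name estimate_cost is hypothetical, and it assumes that all jobs run at once in a single wave (the instance count stays within your queue maximum) and that time is billed in whole hours, as in the example above.

```shell
# estimate_cost JOBS JOBS_PER_INSTANCE HOURS_PER_JOB PRICE_PER_HOUR
# Hypothetical helper; assumes one wave of jobs and whole-hour billing.
estimate_cost() {
    awk -v jobs="$1" -v slots="$2" -v hours="$3" -v price="$4" '
        # Round a positive number up to the next whole number.
        function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
        BEGIN {
            instances = ceil(jobs / slots)   # servers needed to hold every job
            billed    = ceil(hours)          # billable hours per server
            printf "%d instances x %d hours x $%.3f/hr = $%.2f\n",
                   instances, billed, price, instances * billed * price
        }'
}

# The example above: 1000 jobs, 36 per c4.8xlarge, 9 x 1.85 = 16.65 hours each.
estimate_cost 1000 36 16.65 1.675
```

For the numbers above this prints 28 instances x 17 hours x $1.675/hr = $797.30, matching the final calculation.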

RAM per Job Calculations Table

NOTE: keep in mind that the time or number of servers (thus the cost) will go up as jobs/server goes down

                  c4.8xlarge    r3.8xlarge
cost/server/hr    $1.68         $2.66
RAM/server        60GB          244GB

              c4.8xlarge                   r3.8xlarge
jobs/server   cost/job/hr   RAM/job (GB)   cost/job/hr   RAM/job (GB)
1             $1.68         60.0           $2.66         244.0
2             $0.84         30.0           $1.33         122.0
3             $0.56         20.0           $0.89         81.3
4             $0.42         15.0           $0.67         61.0
5             $0.34         12.0           $0.53         48.8
6             $0.28         10.0           $0.44         40.7
7             $0.24         8.6            $0.38         34.9
8             $0.21         7.5            $0.33         30.5
9             $0.19         6.7            $0.30         27.1
10            $0.17         6.0            $0.27         24.4
11            $0.15         5.5            $0.24         22.2
12            $0.14         5.0            $0.22         20.3
13            $0.13         4.6            $0.20         18.8
14            $0.12         4.3            $0.19         17.4
15            $0.11         4.0            $0.18         16.3
16            $0.10         3.8            $0.17         15.3
17            $0.10         3.5            $0.16         14.4
18            $0.09         3.3            $0.15         13.6
19            $0.09         3.2            $0.14         12.8
20            $0.08         3.0            $0.13         12.2
21            $0.08         2.9            $0.13         11.6
22            $0.08         2.7            $0.12         11.1
23            $0.07         2.6            $0.12         10.6
24            $0.07         2.5            $0.11         10.2
25            $0.07         2.4            $0.11         9.8
26            $0.06         2.3            $0.10         9.4
27            $0.06         2.2            $0.10         9.0
28            $0.06         2.1            $0.10         8.7
29            $0.06         2.1            $0.09         8.4
30            $0.06         2.0            $0.09         8.1
31            $0.05         1.9            $0.09         7.9
32            $0.05         1.9            $0.08         7.6
33            $0.05         1.8            $0.08
34            $0.05         1.8            $0.08
35            $0.05         1.7            $0.08
36            $0.05         1.7            $0.07
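
Each row of this table is the per-server price and RAM divided by the number of jobs per server. As a hedged sketch, the same rows can be recomputed for a different price or instance size with a short shell function; the name per_job_table is hypothetical, and rounding of values that fall exactly on a half cent may differ slightly from the table.

```shell
# per_job_table PRICE_PER_HOUR RAM_GB MAX_JOBS
# Hypothetical helper. Prints one line per jobs/server value:
#   jobs/server  cost/job/hr  RAM/job (GB)
per_job_table() {
    awk -v price="$1" -v ram="$2" -v max="$3" '
        BEGIN {
            for (n = 1; n <= max; n++)
                printf "%d $%.2f %.1f\n", n, price / n, ram / n
        }'
}

# First three r3.8xlarge rows ($2.66/hr, 244GB RAM).
per_job_table 2.66 244 3
```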