AWS Athena ODBC Connections with Pennkey Login on the HPCC

If you desire to interact with this data directly using R, Python, Stata or Matlab using an ODBC connection please contact us for access.

Before you request access you should create a Upenn-Bitbucket account if you don’t already have one.

  1. Visit https://bitbucket.org/ and login using your <pennkey username>@upenn.edu
  2. Visit https://bitbucket.org/account/settings/ssh-keys/ and add a new key
  3. Copy the result of cat ~/.ssh/id_dsa.pub  from the HPCC and paste in the new key input prompt on Bitbucket

After we have granted you read-only access to the aws-federated-auth repo you can follow these instructions. They are designed to work on the HPCC.

qlogin -now no
mkdir -p ~/.aws && touch credentials
git clone git@bitbucket.org:codeforpenn/aws-federated-auth.git
cd aws-federated-auth/
source /opt/rh/rh-python36/enable
virtualenv venv36
source venv36/bin/activate
pip install -U pip
pip install -r requirements.txt
python aws-federated-auth.py --list

and log in with your Pennkey and Password(2FA if enable).

Your output should include a line like(not that in most cases the rolename will be “<your pennkey>-role”):

wharton-tweet-collection-<rolename>                                     3600 063038240025   <rolename>

You can now run a command to generate Session tokens that last up to 12 hours.

aws-federated-auth]$ python aws-federated-auth.py --rolename <rolename from above> --duration 43200

This will add a new line to your ~/.aws/credentials file, but most importantly copy the PROFILE NAME; you’ll need it later. It should be something like 063038240025-zamechek-role .

You will now need to define a new ODBC connection in your ~/.odbc.ini file.

vi ~/.odbc.ini

and add the following:

[AthenaConnection]
Description             = Twitter AWS Athena Connector
Workgroup               = <pennkey>-workgroup
Driver                  = /usr/local/athenaodbc/lib/64/libathenaodbc_sb64.so
AwsRegion               = us-east-1
S3OutputLocation        = s3://edu-upenn-wharton-randa-tweet-enhanced-results-<pennkey>
AuthenticationType      = IAM Profile
AWSProfile              = <the PROFILE NAME you copied ealier>

You should now be able to connect to this database using the instructions here. Feel free to test your connection:

isql AthenaConnection
SQL> SELECT id FROM "transformed"."transformed" limit 10 /* note, do not include a semicolon in this environment */
+---------------------+
| id                  |
+---------------------+
| 435708300923904000  |
| 1111078960366944256 |
| 1111078960383692800 |
| 1111078960392081408 |
| 1111078960375521280 |
| 1111078960379629568 |
| 1111078960387887104 |
| 1111078960404676608 |
| 1111078960383905794 |
| 1111078960396410880 |
+---------------------+
SQLRowCount returns -1
10 rows fetched

You can reauthenticate by running:

qlogin -now no
cd aws-federated-auth/ 
source /opt/rh/rh-python36/enable
source venv36/bin/activate
python aws-federated-auth.py --rolename <rolename from above> --duration 43200

Happy querying.