AWS Athena ODBC Connections with Pennkey Login on the HPCC

If you desire to interact with this data directly using R, Python, Stata or Matlab using an ODBC connection please contact us for access.

To get started you can run the following code on the HPCC.

qlogin
aws-federated-auth --list

and log in with your Pennkey and Password, with 2FA.

Your output should include a line like(not that in most cases the rolename will be “<your pennkey>-role”):

wharton-tweet-collection-<rolename>                                     3600 063038240025   <rolename>

You can now run a command to generate Session tokens that last up to 12 hours.

$aws-federated-auth --rolename <rolename from above>

This will add a new line to your ~/.aws/credentials file, but most importantly copy the PROFILE NAME; you’ll need it later. It should be something like 063038240025-zamechek-role .

You will now need to define a new ODBC connection in your ~/.odbc.ini file.

vi ~/.odbc.ini

and add the following:

[AthenaConnection]
Description             = Twitter AWS Athena Connector
Workgroup               = <pennkey>-workgroup
Driver                  = /usr/local/athenaodbc/lib/64/libathenaodbc_sb64.so
AwsRegion               = us-east-1
S3OutputLocation        = s3://edu-upenn-wharton-randa-tweet-enhanced-results-<pennkey>
AuthenticationType      = IAM Profile
AWSProfile              = <the PROFILE NAME you copied ealier>

You should now be able to connect to this database using the instructions here. Feel free to test your connection:

isql AthenaConnection
SQL> SELECT id FROM "transformed"."transformed" limit 10 /* note, do not include a semicolon in this environment */
+---------------------+
| id                  |
+---------------------+
| 435708300923904000  |
| 1111078960366944256 |
| 1111078960383692800 |
| 1111078960392081408 |
| 1111078960375521280 |
| 1111078960379629568 |
| 1111078960387887104 |
| 1111078960404676608 |
| 1111078960383905794 |
| 1111078960396410880 |
+---------------------+
SQLRowCount returns -1
10 rows fetched

You can reauthenticate by running:

qlogin
aws-federated-auth --rolename <rolename from above>

Working with AWSWrangler and Pandas

You can use AWS Data Wrangler to interact with the database. Just setup a Python virtual environment.

qlogin
aws-federated-auth
# login in with your pennkey and take note of the profile name
module load python/3.10.8
module load gcc/11.3.0   # <- also a more recent compiler!
python -m venv venv
source venv/bin/activate
pip install -U pip
pip install -U wheel setuptools
pip install -U awswrangler ipython boto3
export AWS_PROFILE=<PROFILE NAME FROM aws-federated-auth STEP>
ipython

In [1]: import awswrangler as wr

In [2]: df=wr.athena.read_sql_query(
...: sql='SELECT * FROM "transformed" limit 10;',
...: database="transformed", workgroup="LIKELY YOUR PENNKEY-GROUP", ctas_approach=False)

In [3]: df.head()
Out[3]:
created_at timestamp_ms id extended_full_text ... year month day hour
0 Sun Aug 25 22:28:32 +0000 2019 1566772112658 1165752849667624961 RT @removed: 有開車開到一半被主人抓去後車廂插的經驗嗎⋯我有,你們呢?... ... 2019 08 25 22
1 Sun Aug 25 22:28:32 +0000 2019 1566772112659 1165752849671888901 RT @trvpism: the goal is to make yourself happ... ... 2019 08 25 22
2 Sun Aug 25 22:28:32 +0000 2019 1566772112666 1165752849701249025 @CaseyHill23 Ariana grande, shes her own color ... 2019 08 25 22
3 Sun Aug 25 22:28:32 +0000 2019 1566772112665 1165752849697034250 RT @aputariaavulsa: mood https://t.co/T7CouFtvET ... 2019 08 25 22
4 Sun Aug 25 22:28:32 +0000 2019 1566772112660 1165752849676083201 RT @leylikamerdin: Cenazemize, cenazelerimize ... ... 2019 08 25 22

[5 rows x 60 columns]

Happy querying.