If you desire to interact with this data directly using R, Python, Stata or Matlab using an ODBC connection please contact us for access.
Before you request access you should create a Upenn-Bitbucket account if you don’t already have one.
- Visit https://bitbucket.org/ and login using your <pennkey username>@upenn.edu
- Visit https://bitbucket.org/account/settings/ssh-keys/ and add a new key
- Copy the result of cat ~/.ssh/id_dsa.pub from the HPCC and paste in the new key input prompt on Bitbucket
After we have granted you read-only access to the aws-federated-auth repo you can follow these instructions. They are designed to work on the HPCC.
qlogin -now no mkdir -p ~/.aws && touch credentials mv credentials ~/.aws/ git clone git@bitbucket.org:codeforpenn/aws-federated-auth.git cd aws-federated-auth/ source /opt/rh/rh-python36/enable virtualenv venv36 source venv36/bin/activate pip install -U pip pip install -r requirements.txt python aws-federated-auth.py --list
and log in with your Pennkey and Password(2FA if enable).
Your output should include a line like(not that in most cases the rolename will be “<your pennkey>-role”):
wharton-tweet-collection-<rolename> 3600 063038240025 <rolename>
You can now run a command to generate Session tokens that last up to 12 hours.
aws-federated-auth]$ python aws-federated-auth.py --rolename <rolename from above> --duration 43200
This will add a new line to your ~/.aws/credentials file, but most importantly copy the PROFILE NAME; you’ll need it later. It should be something like 063038240025-zamechek-role .
You will now need to define a new ODBC connection in your ~/.odbc.ini file.
vi ~/.odbc.ini
and add the following:
[AthenaConnection] Description = Twitter AWS Athena Connector Workgroup = <pennkey>-workgroup Driver = /usr/local/athenaodbc/lib/64/libathenaodbc_sb64.so AwsRegion = us-east-1 S3OutputLocation = s3://edu-upenn-wharton-randa-tweet-enhanced-results-<pennkey> AuthenticationType = IAM Profile AWSProfile = <the PROFILE NAME you copied ealier>
You should now be able to connect to this database using the instructions here. Feel free to test your connection:
isql AthenaConnection SQL> SELECT id FROM "transformed"."transformed" limit 10 /* note, do not include a semicolon in this environment */ +---------------------+ | id | +---------------------+ | 435708300923904000 | | 1111078960366944256 | | 1111078960383692800 | | 1111078960392081408 | | 1111078960375521280 | | 1111078960379629568 | | 1111078960387887104 | | 1111078960404676608 | | 1111078960383905794 | | 1111078960396410880 | +---------------------+ SQLRowCount returns -1 10 rows fetched
You can reauthenticate by running:
qlogin -now no cd aws-federated-auth/ source /opt/rh/rh-python36/enable source venv36/bin/activate python aws-federated-auth.py --rolename <rolename from above> --duration 43200
Happy querying.