WRDS Data Directly from Python, R, and MATLAB

Using Wharton Research Computing’s new SAS/SHARE service, you can query WRDS datasets directly from your favorite research software, including Python, R, and MATLAB.

Getting data out of the WRDS suite of data products and into the software you normally use for your research can be challenging. Many users who are unfamiliar with the SAS programming language spend a lot of time struggling with it just to extract a subset of a WRDS dataset, save that subset to their home directory (wasting disk space), and then import it into their software platform of choice for analysis.

There is a better way!

With guidance from the WRDS team, Wharton Research Computing now runs its own SAS/SHARE server, which allows WRDS data to be queried directly with standard database queries. So instead of a multi-step process spanning several software packages, you can work with WRDS data as objects in your language of choice.

Please note: due to its size, the NYSE TAQ dataset is not currently available through R at WRDS.
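To make the "standard database queries" idea concrete, here is a minimal sketch of the pattern: send a query and receive only the subset you need as an in-memory object, with no intermediate file. It does not connect to SAS/SHARE. Python's built-in sqlite3 module stands in for the server, and the table name, columns, and values are invented purely for illustration.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table plays the role of a WRDS dataset.
# The table name, columns, and values are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, year INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("AAA", 2014, 10.0), ("AAA", 2015, 12.5), ("BBB", 2015, 7.25)],
)


def fetch_subset(connection, year):
    """Return only the rows for one year as a list of tuples."""
    cursor = connection.execute(
        "SELECT ticker, price FROM prices WHERE year = ? ORDER BY ticker",
        (year,),
    )
    return cursor.fetchall()


# The subset arrives directly as a Python object; nothing is written to disk.
rows_2015 = fetch_subset(conn, 2015)
print(rows_2015)
```

The filtering happens in the query itself, so only the rows you asked for ever reach your session. The real connection details for each language are covered in the sections that follow.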

Prerequisite:

Encrypted Wharton Password

To work securely with SAS/SHARE in any language, you’ll want to use an encrypted version of your Wharton password instead of your normal text-based password. It is a violation of university policy to store your ‘normal’ Wharton text-based password in a file, so this step is required if you wish to use the SAS/SHARE service in a script file. Fortunately, it’s easy!

Log onto the HPCC login node at hpcc.wharton.upenn.edu and type:

This will launch the SAS software on a compute node, in interactive mode. After SAS loads, at the 1? prompt, enter:

Replace my fake password with your real password. Obviously, make sure you are in a private setting when you type this so that no one will be able to observe your real password.

You will be returned a SAS encrypted password, similar to:

COPY that entire line, ‘{SAS002}’ and all, and save it to a secure location. NOTE: your encrypted password may wrap onto more than one line. If it does, remove the line break when you save it. When you are done, press Ctrl-C on your keyboard to break out of and close the running SAS session.
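If the encrypted password did wrap across lines when you copied it, a few lines of Python can rejoin it before you save it. This is only a sketch: the function name and the sample value are hypothetical, and it assumes the encrypted string contains no meaningful whitespace. The only detail taken from the steps above is the ‘{SAS002}’ prefix.

```python
def normalize_encrypted_password(pasted):
    """Join a pasted SAS-encrypted password into a single line."""
    # Drop line breaks and any stray spaces a terminal added at the wrap point.
    joined = "".join(line.strip() for line in pasted.splitlines())
    if not joined.startswith("{SAS002}"):
        raise ValueError("expected the value to start with {SAS002}")
    return joined


# Hypothetical value, wrapped across two lines as a terminal might show it.
example = "{SAS002}0123456789AB\nCDEF0123\n"
print(normalize_encrypted_password(example))
```

The prefix check is a cheap guard against pasting the wrong thing, such as your plain-text password, into a file.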

Set Up Your Research Software

You’re now ready to set up your research software: Python, R, or MATLAB. Each one is a bit different, so I will describe them separately. Please feel free to skip to the section for the software of your choice!

I will also include links to the WRDS documentation for each product, with the caveat that their systems and setups (particularly system names and file paths) are often different from those on the Wharton Research Computing systems.

Setting Up and Using WRDS with Python

This is a one-time setup, unless your Wharton password changes.

Set Up Your Login Credentials

Using your Wharton username and the SAS encrypted password you created above, make a new file in the root of your home directory (~), called .wrdsauthrc, substituting in (copy/paste) your encrypted password:
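Because .wrdsauthrc holds a credential, it is worth making sure only your own account can read it. The sketch below is an extra precaution, not part of the setup steps themselves: it assumes a POSIX system such as the cluster's Linux nodes, and the helper name restrict_to_owner is hypothetical. It is demonstrated on a temporary file so that it can be run anywhere without touching your real credentials.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Make a file readable and writable by its owner only (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    # Report the permission bits actually in effect after the change.
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrated on a temporary file; on the cluster the target would be
# os.path.expanduser("~/.wrdsauthrc") (assumed location, per the step above).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
mode = restrict_to_owner(demo_path)
print(oct(mode))
os.remove(demo_path)
```

Returning the resulting mode lets you confirm the change took effect rather than assuming it did.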

Using the Python Setup

Testing and Using Interactively

In a qsub Python Script

Create a Python script (test_Py.py, in this example), something like:

Run with something like:

Check your output file:

Further Reading: WRDS and Python Documentation from the WRDS Team

Setting Up and Using WRDS with R

This is a one-time setup, unless your Wharton password changes.

Set Up Your Login Credentials

Using your Wharton username and the SAS encrypted password you created above, edit (using the editor of your choice … in this case I used ‘nano’) the .Rprofile file we just created in the root of your home directory (~), substituting your username and (copy/paste) your encrypted password:

Using WRDS With R

Testing and Using Interactively

In a qsub R Script

Create an R script (test_R.R, in this example), something like:

Run with something like:

Check your output file:

Further Reading: WRDS and R Documentation from the WRDS Team

Setting Up and Using WRDS with MATLAB

NOTE: MATLAB version r2015b (currently the latest version) has a bug that prevents database connections from working. Until r2016b comes out, you will need to use version r2014b (also installed on the cluster).

This is a one-time setup, unless your Wharton password changes. Copy the WRDS_Connect.m function to somewhere in your home directory, so you can edit the file and add your credentials:

Set Up Your Login Credentials

Using your Wharton username and the SAS encrypted password you created above, edit (using the editor of your choice … in this case I used ‘nano’) the WRDS_Connect.m file you just copied into your home directory (~), substituting your username and (copy/paste) your encrypted password:

Using WRDS With MATLAB

Testing and Using Interactively

In a qsub MATLAB Script

Create a MATLAB script (test_MATLAB.m, in this example), something like:

Run with something like:

Check your output file:

Further Reading: WRDS and MATLAB Documentation from the WRDS Team

Final Thoughts

I hope you find this new resource useful. While the SAS/SHARE service and all of the code and details are powerful and (hopefully) useful, they are also new! If you discover any problems, have questions, or think documentation could be clearer, please don’t hesitate to e-mail research-computing@wharton.upenn.edu.

 

With two decades of experience supporting research and more than a decade at The Wharton School, Hugh enjoys the challenges and rewards of working with world-class researchers doing Amazing Things with research computing. Robust and scalable computational solutions (both on premise and in The Cloud), custom research programming solutions (clever ideas, simple code), and holistic, results-focused approaches to projects are the places where Hugh lives these days. On weekends you're likely to find him running through the woods with a topo map and compass, orienteering.