DEPRECATED! We’re working on documentation for connecting to WRDS via their newer PostgreSQL service
See https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/ for details for newest methods … some adjustments to the WRDS documentation are needed, generally installing any libraries, packages, or modules required (Python: ‘wrds’, R: RPostgres, etc.)
Original Article (DEPRECATED):
Using Wharton Research Computing’s new SAS/SHARE service, you can query WRDS datasets directly from your favorite research software, including Python, R, and MATLAB.
Getting data from the entire suite of WRDS data products and into the software you normally use for your research can be challenging. Many users who are unfamiliar with the SAS programming language spend a lot of time struggling with this unfamiliar software to simply gather a subset of a WRDS dataset, which they then save to their home directory space (wasting disk space resources), and then import the data into their software platform of choice for analysis.
There is a better way!
With guidance from the WRDS team, Wharton Research Computing now has our own SAS/SHARE server, which allows direct query of WRDS data via standard database queries. So instead of the multi-step and multi-software package process, you can work with WRDS data as objects in your language of choice.
Please note: due to its size, the NYSE TAQ dataset is not currently available through R at WRDS.
Prerequisite:
Encrypted Wharton Password
To work securely with SAS/SHARE in any language, you’ll want to use an encrypted version of your Wharton password instead of your normal text-based password. It is a violation of university policy to store your ‘normal’ Wharton text-based password in a file, so this step is required if you wish to use the SAS/SHARE service in a script file. Fortunately, it’s easy!
Log onto the HPCC login node at hpcc.wharton.upenn.edu and type:
qrsh sas -nodms
This will launch the SAS software on a compute node, in interactive mode. After SAS loads, at the `1? ` prompt, enter:
PROC PWENCODE in='thisIsNotMyRealPassword'; run; [ENTER]
Replace my fake password with your real password. Obviously, make sure you are in a private setting when you type this so that no one will be able to observe your real password.
You will be returned a SAS encrypted password, similar to:
{SAS002}1E21C9321B1E016F1D4507E31B005BFC9A1BCFF5B348C073F02EA6ADF72A4139F2
COPY that entire line, ‘{SAS002}’ and all, and save it to a secure location. NOTE: your password may appear on more than one line. If it does, when you save it somewhere remove the line break. When you are done, press Ctrl-c on your keyboard to ‘break out of’ and close the running SAS software.
Set Up Your Research Software
You’re now ready to set up your research software: Python, R, or MATLAB. Each one is a bit different, so I will describe them separately. Please feel free to skip to the section for the software of your choice!
I will also include links to the WRDS documentation for each product, with the caveat that their systems and setups (particularly system names and file paths) are often different than those on the Wharton Research Computing systems.
Setting Up and Using WRDS with Python
This is a one time setup, unless your Wharton password changes.
qlogin
mkvirtualenv WRDS
pip install -U numpy
pip install -U iPython # <-- Optional ... if you like to use it
pip install -U pandas
pip install -U jaydebeapi==0.2.0
cat >> .virtualenvs/WRDS/bin/activate
export PYTHONPATH=$PYTHONPATH:/usr/local/demo/Python/WRDS [ENTER]
Ctrl-d
Set Up Your Login Credentials
Using your Wharton username and the SAS encrypted password you created above, make a new file in the root of your home directory (~), called .wrdsauthrc, substituting in (copy/paste) your encrypted password:
cat > ~/.wrdsauthrc
[credentials]
username=my_username
password={SAS002}DBCC5712369DE1C65B19864C1564FB850F398DCF [ENTER]
# type the following to exit the file write:
Ctrl-d
Using the Python Setup
Testing and Using Interactively
qlogin
workon WRDS
python
>>> import wrds
>>> result = wrds.sql('select * from CRSP.DSF (obs=10)')
>>> print(result.to_string())
CUSIP PERMNO PERMCO ISSUNO HEXCD HSICCD DATE BIDLO ASKHI PRC VOL RET BID ASK SHROUT CFACPR CFACSHR OPENPRC NUMTRD RETX
0 68391610 10000 7952 10396 3 3990 1986-01-07 2.375 2.750 -2.5625 1000 NaN None None 3680 1 1 None None NaN
1 68391610 10000 7952 10396 3 3990 1986-01-08 2.375 2.625 -2.5000 12800 -0.024390 None None 3680 1 1 None None -0.024390
2 68391610 10000 7952 10396 3 3990 1986-01-09 2.375 2.625 -2.5000 1400 0.000000 None None 3680 1 1 None None 0.000000
3 68391610 10000 7952 10396 3 3990 1986-01-10 2.375 2.625 -2.5000 8500 0.000000 None None 3680 1 1 None None 0.000000
4 68391610 10000 7952 10396 3 3990 1986-01-13 2.500 2.750 -2.6250 5450 0.050000 None None 3680 1 1 None None 0.050000
5 68391610 10000 7952 10396 3 3990 1986-01-14 2.625 2.875 -2.7500 2075 0.047619 None None 3680 1 1 None None 0.047619
6 68391610 10000 7952 10396 3 3990 1986-01-15 2.750 3.000 -2.8750 22490 0.045455 None None 3680 1 1 None None 0.045455
7 68391610 10000 7952 10396 3 3990 1986-01-16 2.875 3.125 -3.0000 10900 0.043478 None None 3680 1 1 None None 0.043478
8 68391610 10000 7952 10396 3 3990 1986-01-17 2.875 3.125 -3.0000 8470 0.000000 None None 3680 1 1 None None 0.000000
9 68391610 10000 7952 10396 3 3990 1986-01-20 2.875 3.125 -3.0000 1000 0.000000 None None 3680 1 1 None None 0.000000
In a qsub
Python Script
Create a Python script (test_Py.py, in this example), something like:
import wrds
result = wrds.sql('select * from CRSP.DSF (obs=10)')
print(result.to_string())
Run with something like:
qsub -N wrds_Py -j y -b y 'workon WRDS; python test_Py.py'
Check your output file:
# cat wrds_Py.o479969 CUSIP PERMNO PERMCO ISSUNO HEXCD HSICCD DATE BIDLO ASKHI PRC VOL RET BID ASK SHROUT CFACPR CFACSHR OPENPRC NUMTRD RETX 0 68391610 10000 7952 10396 3 3990 1986-01-07 2.375 2.750 -2.5625 1000 NaN None None 3680 1 1 None None NaN 1 68391610 10000 7952 10396 3 3990 1986-01-08 2.375 2.625 -2.5000 12800 -0.024390 None None 3680 1 1 None None -0.024390 2 68391610 10000 7952 10396 3 3990 1986-01-09 2.375 2.625 -2.5000 1400 0.000000 None None 3680 1 1 None None 0.000000 3 68391610 10000 7952 10396 3 3990 1986-01-10 2.375 2.625 -2.5000 8500 0.000000 None None 3680 1 1 None None 0.000000 4 68391610 10000 7952 10396 3 3990 1986-01-13 2.500 2.750 -2.6250 5450 0.050000 None None 3680 1 1 None None 0.050000 5 68391610 10000 7952 10396 3 3990 1986-01-14 2.625 2.875 -2.7500 2075 0.047619 None None 3680 1 1 None None 0.047619 6 68391610 10000 7952 10396 3 3990 1986-01-15 2.750 3.000 -2.8750 22490 0.045455 None None 3680 1 1 None None 0.045455 7 68391610 10000 7952 10396 3 3990 1986-01-16 2.875 3.125 -3.0000 10900 0.043478 None None 3680 1 1 None None 0.043478 8 68391610 10000 7952 10396 3 3990 1986-01-17 2.875 3.125 -3.0000 8470 0.000000 None None 3680 1 1 None None 0.000000 9 68391610 10000 7952 10396 3 3990 1986-01-20 2.875 3.125 -3.0000 1000 0.000000 None None 3680 1 1 None None 0.000000
Further Reading: WRDS and Python Documentation from the WRDS Team
Setting Up and Using WRDS with R
This is a one time setup, unless your Wharton password changes.
cat /usr/local/demo/R/WRDS/Rprofile >> ~/.Rprofile
cat /usr/local/demo/R/WRDS/Renviron >> ~/.Renviron
Set Up Your Login Credentials
Using your Wharton username and the SAS encrypted password you created above, edit (using the editor of your choice … in this case I used ‘nano’) the .Rprofile file we just created in the root of your home directory (~), substituting your username and (copy/paste) your encrypted password:
nano ~/.Rprofile
# substitute your username for 'my_username' and your password from above in the 'pass <- ' variable
Using WRDS With R
Testing and Using Interactively
qlogin
R --no-save
> res <- dbSendQuery(wrds,"select date,dji from DJONES.DJDAILY (obs=10)")
> data <- fetch(res, n = -1)
> data
date dji
1 1896-05-26 40.94
2 1896-05-27 40.58
3 1896-05-28 40.20
4 1896-05-29 40.63
5 1896-06-01 40.60
6 1896-06-02 40.04
7 1896-06-03 39.77
8 1896-06-04 39.94
9 1896-06-05 40.32
10 1896-06-08 39.81
> q()
In a qsub
R Script
Create an R script (test_R.R, in this example), something like:
res <- dbSendQuery(wrds,"select date,dji from DJONES.DJDAILY (obs=10)") data <- fetch(res, n = -1) data
Run with something like:
qsub -N wrds_R -j y -b y 'R --no-save < test_R.R'
Check your output file:
# cat wrds_R.o480044 ... Loading required package: DBI > res <- dbSendQuery(wrds,"select date,dji from DJONES.DJDAILY (obs=10)") > data <- fetch(res, n = -1) > data date dji 1 1896-05-26 40.94 2 1896-05-27 40.58 3 1896-05-28 40.20 4 1896-05-29 40.63 5 1896-06-01 40.60 6 1896-06-02 40.04 7 1896-06-03 39.77 8 1896-06-04 39.94 9 1896-06-05 40.32 10 1896-06-08 39.81 >
Further Reading: WRDS and R Documentation from the WRDS Team
Setting Up and Using WRDS with MATLAB
NOTE: MATLAB version r2015b (the current latest version) has a bug which prevents DB connections from working. Until r2016b comes out you will need to use r2014b version (also installed in the cluster).
This is a one time setup, unless your Wharton password changes. Copy the WRDS_Connect.m function to someplace in your home directory, so you can edit the file and add credentials:
cp /usr/local/demo/Matlab/WRDS/WRDS_Connect.m ~/
Set Up Your Login Credentials
Using your Wharton username and the SAS encrypted password you created above, edit (using the editor of your choice … in this case I used ‘nano’) the WRDS_Connect.m file we just created in the root of your home directory (~), substituting your username and (copy/paste) your encrypted password:
nano ~/WRDS_Connect.m
# substitute your username for 'my_username' and your password from above for the '{SAS002}...' placeholder password
Using WRDS With MATLAB
Testing and Using Interactively
qlogin
matlab -nodisplay
>> WRDS_Connect
# view all WRDS DataSets
>> schemas(WRDS)
# do a query
>> query = exec(WRDS, ['SELECT * FROM CRSP.DSF (obs=10)']);
>> result = fetch(query);
>> result.Data
ans =
Columns 1 through 8
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-07' [2.3750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-08' [2.3750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-09' [2.3750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-10' [2.3750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-13' [2.5000]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-14' [2.6250]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-15' [2.7500]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-16' [2.8750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-17' [2.8750]
'68391610' [10000] [7952] [10396] [3] [3990] '1986-01-20' [2.8750]
Columns 9 through 17
[2.7500] [-2.5625] [ 1000] [ NaN] [NaN] [NaN] [3680] [1] [1]
[2.6250] [-2.5000] [12800] [-0.0244] [NaN] [NaN] [3680] [1] [1]
[2.6250] [-2.5000] [ 1400] [ 0] [NaN] [NaN] [3680] [1] [1]
[2.6250] [-2.5000] [ 8500] [ 0] [NaN] [NaN] [3680] [1] [1]
[2.7500] [-2.6250] [ 5450] [ 0.0500] [NaN] [NaN] [3680] [1] [1]
[2.8750] [-2.7500] [ 2075] [ 0.0476] [NaN] [NaN] [3680] [1] [1]
[ 3] [-2.8750] [22490] [ 0.0455] [NaN] [NaN] [3680] [1] [1]
[3.1250] [ -3] [10900] [ 0.0435] [NaN] [NaN] [3680] [1] [1]
[3.1250] [ -3] [ 8470] [ 0] [NaN] [NaN] [3680] [1] [1]
[3.1250] [ -3] [ 1000] [ 0] [NaN] [NaN] [3680] [1] [1]
Columns 18 through 20
[NaN] [NaN] [ NaN]
[NaN] [NaN] [-0.0244]
[NaN] [NaN] [ 0]
[NaN] [NaN] [ 0]
[NaN] [NaN] [ 0.0500]
[NaN] [NaN] [ 0.0476]
[NaN] [NaN] [ 0.0455]
[NaN] [NaN] [ 0.0435]
[NaN] [NaN] [ 0]
[NaN] [NaN] [ 0]
In a qsub
MATLAB Script
Create a MATLAB script (test_MATLAB.m, in this example), something like:
WRDS_Connect
query = exec(WRDS, ['SELECT * FROM CRSP.DSF (obs=10)']);
result = fetch(query);
result.Data
Run with something like:
qsub -N wrds_MATLAB -j y -b y 'matlab -nodisplay < test_MATLAB.m'
Check your output file:
# cat wrds-MATLAB.o480051 ... >> >> >> ans = Columns 1 through 7 '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-07' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-08' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-09' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-10' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-13' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-14' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-15' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-16' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-17' '68391610' [10000] [7952] [10396] [3] [3990] '1986-01-20' Columns 8 through 14 [2.3750] [2.7500] [-2.5625] [ 1000] [ NaN] [NaN] [NaN] [2.3750] [2.6250] [-2.5000] [12800] [-0.0244] [NaN] [NaN] [2.3750] [2.6250] [-2.5000] [ 1400] [ 0] [NaN] [NaN] [2.3750] [2.6250] [-2.5000] [ 8500] [ 0] [NaN] [NaN] [2.5000] [2.7500] [-2.6250] [ 5450] [ 0.0500] [NaN] [NaN] [2.6250] [2.8750] [-2.7500] [ 2075] [ 0.0476] [NaN] [NaN] [2.7500] [ 3] [-2.8750] [22490] [ 0.0455] [NaN] [NaN] [2.8750] [3.1250] [ -3] [10900] [ 0.0435] [NaN] [NaN] [2.8750] [3.1250] [ -3] [ 8470] [ 0] [NaN] [NaN] [2.8750] [3.1250] [ -3] [ 1000] [ 0] [NaN] [NaN] Columns 15 through 20 [3680] [1] [1] [NaN] [NaN] [ NaN] [3680] [1] [1] [NaN] [NaN] [-0.0244] [3680] [1] [1] [NaN] [NaN] [ 0] [3680] [1] [1] [NaN] [NaN] [ 0] [3680] [1] [1] [NaN] [NaN] [ 0.0500] [3680] [1] [1] [NaN] [NaN] [ 0.0476] [3680] [1] [1] [NaN] [NaN] [ 0.0455] [3680] [1] [1] [NaN] [NaN] [ 0.0435] [3680] [1] [1] [NaN] [NaN] [ 0] [3680] [1] [1] [NaN] [NaN] [ 0]
Further Reading: WRDS and MATLAB Documentation from the WRDS Team
Final Thoughts
I hope you find this new resource useful. While the SAS/SHARE service and all of the code and details are powerful and (hopefully) useful, they are also new! If you discover any problems, have questions, or think documentation could be clearer, please don’t hesitate to e-mail research-computing@wharton.upenn.edu.