Corelogic Databases
In 2020, Wharton purchased a few databases from Corelogic that are available to Faculty and Students across Wharton. The datasets include LLMA (Loan-Level Market Analytics), MLS (Multiple Listing Enterprise Solutions), Deeds, and Tax History.
The Corelogic Multiple Listing Enterprise Solutions (MLS) Database is made up of data that real estate agents use
to match buyers with sellers. It includes 10 million observations of property listing data, and more than
600 variables that describe listing details (the listing date and price, the listing office and agent,
the commission rate offered to the buyer’s agent, etc.), property characteristics (number of bedrooms,
bathrooms, remarks from sellers), and transaction details when a sale occurs (the sale price and date,
the purchasing office and agent).
Funding is supplied by the Analytics at Wharton.
Access
Please fill out this form for access. An entry will be added to your .odbc.ini file to make it easier for you to login on the HPCC.
Getting Started on the HPCC
In order to test your connection, simply log into the HPCC as normal and do the following.
$ qlogin $ isql corelogic_aws SQL> SHOW TABLES;
You should get a long list of tables. That means ODBC is working for you.
We also created R, Python, and Stata script examples. The scripts should be helpful creating
the connection string.
Python
In python you will need a virtual environment with pyodbc installed.
See documentation here for creating a virtual environment.
Once you create your environment, simply do:
source /opt/rh/rh-python36/enable source venv/bin/activate pip install pyodbc
And run the following code
#! env python import pyodbc with pyodbc.connect(dsn="corelogic_aws") as dbh: cursor = dbh.cursor() res = cursor.execute("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") headers = [col[0] for col in res.description] for row in cursor.fetchall(): print(dict(zip(headers, row)))
R
You’ll need to install the ODBC libraries:
install.packages("RODBC")
And run the following code
library("RODBC") ch <- odbcConnect("corelogic_aws") sqlTables(ch) res <- sqlQuery(ch, "SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;", max = 10) write.table(res, file="example.RData")
Stata
set odbcmgr unixodbc odbc load, exec("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") noquote dsn(corelogic_aws) describe