Corelogic Databases

Corelogic Databases

In 2020, Wharton purchased a few databases from Corelogic that are available to Faculty and Students across Wharton.  The datasets include LLMA (Loan-Level Market Analytics), MLS (Multiple Listing Enterprise Solutions), Deeds, and Tax History.

The Corelogic Multiple Listing Enterprise Solutions (MLS) Database is made up of data that real estate agents use
to match buyers with sellers. It includes 10 million observations of property listing data, and more than
600 variables that describe listing details (the listing date and price, the listing office and agent,
the commission rate offered to the buyer’s agent, etc.), property characteristics (number of bedrooms,
bathrooms, remarks from sellers), and transaction details when a sale occurs (the sale price and date,
the purchasing office and agent).

Funding is supplied by the Analytics at Wharton.

Access

Please fill out this form for access. An entry will be added to your .odbc.ini file to make it easier for you to login on the HPCC.

Getting Started on the HPCC

In order to test your connection, simply log into the HPCC as normal and do the following.

$ qlogin
$ isql corelogic_aws
SQL> SHOW TABLES;

You should get a long list of tables. That means ODBC is working for you.

We also created R, Python, and Stata script examples. The scripts should be helpful creating
the connection string.

Python

In python you will need a virtual environment with pyodbc installed.
See documentation here for creating a virtual environment.
Once you create your environment, simply do:

source /opt/rh/rh-python36/enable
source venv/bin/activate
pip install pyodbc

And run the following code

    
#! env python
import pyodbc
with pyodbc.connect(dsn="corelogic_aws") as dbh:
    cursor = dbh.cursor()
    res = cursor.execute("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;")
    headers = [col[0] for col in res.description]

    for row in cursor.fetchall():
        print(dict(zip(headers, row)))

R

You’ll need to install the ODBC libraries:

install.packages("RODBC")

And run the following code

    
library("RODBC")
ch <- odbcConnect("corelogic_aws")
sqlTables(ch)
res <- sqlQuery(ch, "SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;", max = 10)
write.table(res, file="example.RData")

Stata

set odbcmgr unixodbc

odbc load, exec("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") noquote dsn(corelogic_aws)

describe
    

FAQs