Corelogic Databases

Corelogic Databases

In 2020, Wharton purchased a few databases from Corelogic that are available to Faculty and Students across Wharton.  The datasets include LLMA (Loan-Level Market Analytics), MLS (Multiple Listing Enterprise Solutions), Deeds, and Tax History.

The Corelogic Multiple Listing Enterprise Solutions (MLS) Database is made up of data that real estate agents use
to match buyers with sellers. It includes 10 million observations of property listing data, and more than
600 variables that describe listing details (the listing date and price, the listing office and agent,
the commission rate offered to the buyer’s agent, etc.), property characteristics (number of bedrooms,
bathrooms, remarks from sellers), and transaction details when a sale occurs (the sale price and date,
the purchasing office and agent).

Funding is supplied by the Analytics at Wharton.

Access

Please fill out this form for access. An entry will be added to your .odbc.ini file to make it easier for you to login on the HPCC.

Getting Started on the HPCC

In order to test your connection, simply log into the HPCC as normal and do the following.

$ qlogin
$ isql corelogic_aws
SQL> SHOW TABLES;

You should get a long list of tables. That means ODBC is working for you.

We also created R, Python, and Stata script examples. The scripts should be helpful creating
the connection string.

Python

You will need a virtual environment with pyodbc installed. Here are the details. More Python details can be found on our Python Tools page.

qlogin
cd projects/some_project   # <- optional but recommended (1 venv per project)
module load python/3.11.5  # <- note that versions update semi-annually (module avail to see)
python -m venv venv3115    # <- create virtual environment
source venv3115/bin/activate   # <- activate virtual environment
# update pip and installation tools:
python -m pip install -U pip
python -m pip install -U setuptools wheel
# install (finally!) the ODBC connector
python -m pip install -U pyodbc

And run the following code

import pyodbc
with pyodbc.connect(dsn="corelogic_aws") as dbh:
    cursor = dbh.cursor()
    res = cursor.execute("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;")
    headers = [col[0] for col in res.description]

    for row in cursor.fetchall():
        print(dict(zip(headers, row)))

R

You’ll need to install the ODBC libraries:

install.packages("RODBC")

And run the following code

    
library("RODBC")
ch <- odbcConnect("corelogic_aws")
sqlTables(ch)
res <- sqlQuery(ch, "SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;", max = 10)
write.table(res, file="example.RData")

Stata

set odbcmgr unixodbc

odbc load, exec("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") noquote dsn(corelogic_aws)

describe
    

FAQs