Corelogic Databases
NOTE 2024 :: the underlying Corelogic architecture has changed, along with access methods. Please contact Research Computing for details until this page is updated.
In 2020, Wharton purchased a few databases from Corelogic that are available to Faculty and Students across Wharton. The datasets include LLMA (Loan-Level Market Analytics), MLS (Multiple Listing Enterprise Solutions), Deeds, and Tax History.
The Corelogic Multiple Listing Enterprise Solutions (MLS) Database is made up of data that real estate agents use
to match buyers with sellers. It includes 10 million observations of property listing data, and more than
600 variables that describe listing details (the listing date and price, the listing office and agent,
the commission rate offered to the buyer’s agent, etc.), property characteristics (number of bedrooms,
bathrooms, remarks from sellers), and transaction details when a sale occurs (the sale price and date,
the purchasing office and agent).
Funding is supplied by the Analytics at Wharton.
Access
Please fill out this form for access. An entry will be added to your .odbc.ini file to make it easier for you to login on the HPC3.
Getting Started on the HPC3
In order to test your connection, simply log into the HPC3 as normal and do the following.
$ qlogin $ isql corelogic_aws SQL> SHOW TABLES;
You should get a long list of tables. That means ODBC is working for you.
We also created R, Python, and Stata script examples. The scripts should be helpful creating
the connection string.
Python
You will need a virtual environment with pyodbc installed. Here are the details. More Python details can be found on our Python Tools page.
qlogin cd projects/some_project # <- optional but recommended (1 venv per project) module load python/3.11.5 # <- note that versions update semi-annually (module avail to see) python -m venv venv3115 # <- create virtual environment source venv3115/bin/activate # <- activate virtual environment # update pip and installation tools: python -m pip install -U pip python -m pip install -U setuptools wheel # install (finally!) the ODBC connector python -m pip install -U pyodbc
And run the following code
import pyodbc with pyodbc.connect(dsn="corelogic_aws") as dbh: cursor = dbh.cursor() res = cursor.execute("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") headers = [col[0] for col in res.description] for row in cursor.fetchall(): print(dict(zip(headers, row)))
R
You’ll need to install the ODBC libraries:
install.packages("RODBC")
And run the following code
library("RODBC") ch <- odbcConnect("corelogic_aws") sqlTables(ch) res <- sqlQuery(ch, "SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;", max = 10) write.table(res, file="example.RData")
Stata
set odbcmgr unixodbc odbc load, exec("SELECT `FA_APN`, `FA_ListDate`, `ClosePrice` FROM MLS_NCRMLS LIMIT 10;") noquote dsn(corelogic_aws) describe
FAQs
[/vc_column_text][/vc_column][/vc_row]