Amazon User Review Database

Currently limited to Wharton-affiliated research.

In collaboration with UCSD, we have assembled a large collection of Amazon product user reviews (~233 million) spanning nearly 22 years (1996-2018). Please email us to request access. The database is hosted on Amazon Web Services’ Athena service and can be queried with SQL, either through the AWS console or over an ODBC connection on the HPCC.

Reviews are partitioned by category, year and month.
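
Because category, year and month are partition columns, filtering on them should let Athena skip non-matching partitions, which typically reduces the amount of data scanned. The query below is a minimal sketch, not a tested recipe: it assumes category values are written with underscores (as in 'Home_and_Kitchen' in the examples further down) and that year is stored as a four-digit string such as '2015'. The alias n_reviews is only illustrative.

```sql
-- Count Home and Kitchen reviews per month for a single year.
-- Assumption: year values look like '2015'; verify the stored format first,
-- e.g. by selecting a few distinct year values.
SELECT month,
       COUNT(*) AS n_reviews
FROM "user_product_reviews"."edu_upenn_wharton_randa_amazon_user_review_db"
WHERE category = 'Home_and_Kitchen'  -- partition column
  AND year = '2015'                  -- partition column, typed as string
GROUP BY month
ORDER BY month;                      -- month is a string, so this sorts as text
```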

Categories

  • AMAZON FASHION (883,636 reviews)
  • All Beauty (371,345 reviews)
  • Appliances (602,777 reviews)
  • Arts Crafts and Sewing (2,875,917 reviews)
  • Automotive (7,990,166 reviews)
  • Books (51,311,621 reviews)
  • CDs and Vinyl (4,543,369 reviews)
  • Cell Phones and Accessories (10,063,255 reviews)
  • Clothing Shoes and Jewelry (32,292,099 reviews)
  • Digital Music (1,584,082 reviews)
  • Electronics (20,994,353 reviews)
  • Gift Cards (147,194 reviews)
  • Grocery and Gourmet Food (5,074,160 reviews)
  • Home and Kitchen (21,928,568 reviews)
  • Industrial and Scientific (1,758,333 reviews)
  • Kindle Store (5,722,988 reviews)
  • Luxury Beauty (574,628 reviews)
  • Magazine Subscriptions (89,689 reviews)
  • Movies and TV (8,765,568 reviews)
  • Musical Instruments (1,512,530 reviews)
  • Office Products (5,581,313 reviews)
  • Patio Lawn and Garden (5,236,058 reviews)
  • Pet Supplies (6,542,483 reviews)
  • Prime Pantry (471,614 reviews)
  • Software (459,436 reviews)
  • Sports and Outdoors (12,980,837 reviews)
  • Tools and Home Improvement (9,015,203 reviews)
  • Toys and Games (8,201,231 reviews)
  • Video Games (2,565,349 reviews)

Examples

SELECT * FROM "user_product_reviews"."edu_upenn_wharton_randa_amazon_user_review_db"
LIMIT 10;

returns 409cca25-38ac-4a97-9062-a4e0a77644d9

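-- Find 5-star vacuum reviews that mention "sucks",
-- returning up to 50 characters of context on each side of the match.
SELECT
  regexp_extract(reviewtext, '.{0,50}sucks.{0,50}')
FROM "user_product_reviews"."edu_upenn_wharton_randa_amazon_user_review_db"
WHERE category = 'Home_and_Kitchen'  -- partition column
  AND description LIKE '%vacuum%'
  AND reviewtext LIKE '%sucks%'
  AND overall = 5
LIMIT 10;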

Returns

  1. it fine in the model I have and the vacuum really sucks now (pun intended).
  2. I could make a lame joke the fact that it really sucks, but a vacuum this amazing deserves a joke of hi
  3. Never thought I’d give a 5 stars on an item that sucks. And boy, does it ever. I’ve vacuumed everyday fo
  4. xtension, one perfect for those who have cats. It sucks in cat hair like crazy. One head is powered with
  5. irty which proves this cleaner actually works and sucks all the dirts. Highly recommended.
  6. maneuvering around furniture super easy! And it sucks better than those expensive vacuums. It has a ho
  7. e. Features to love about this vacuum: * really sucks up the dirt – it sucks in a good way. We had a p
  8. f the previous reviewers. This B&D vacuum really sucks! And I mean that in a great way. Very lightweig
  9. t. But, it does big work! Its easy to handle, and sucks up all that grime that builds in your carpet. I a
  10. asy to assemble and man when I tell you that this sucks the dirt from the foundation of your house lol. N

Fields

  • overall (bigint) – User rating (1-5)
  • verified (boolean) – Whether the review is verified (true, false)
  • reviewtime (string) – Timestamp formatted like “10 30, 2009”
  • reviewerid (string) – Amazon generated user id
  • asin (string) – Unique Amazon generated product id
  • reviewername (string) – User supplied name
  • reviewtext (string) – Review text
  • summary (string) – Summary of Review text
  • unixreviewtime (bigint) – Timestamp formatted in epoch time.
  • vote (double) – Useful votes from other users
  • title (string) – Product title
  • description (string) – Product description
  • category (string) (Partitioned)
  • year (string) (Partitioned)
  • month (string) (Partitioned)
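
Since reviewtime is a formatted string, date handling is usually easier with unixreviewtime. The query below is a minimal sketch, assuming unixreviewtime holds seconds since the Unix epoch. from_unixtime is a standard Athena (Presto/Trino) function; the alias reviewed_at is only illustrative.

```sql
-- Show verified Home and Kitchen reviews with a proper timestamp column.
-- Assumption: unixreviewtime is in seconds, not milliseconds.
SELECT asin,
       overall,
       from_unixtime(unixreviewtime) AS reviewed_at
FROM "user_product_reviews"."edu_upenn_wharton_randa_amazon_user_review_db"
WHERE category = 'Home_and_Kitchen'  -- partition column
  AND verified = true
LIMIT 10;
```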

Citation

Please cite the following paper if you use the data in any way:

Justifying recommendations using distantly-labeled reviews and fine-grained aspects
Jianmo Ni, Jiacheng Li, Julian McAuley
Empirical Methods in Natural Language Processing (EMNLP), 2019