Build a Text Dataset from AMAZON

Raymond ZHAO Wenlong (03/07/2018)

Collect data: In the data age, large volumes of data are key to statistical ML, DL, and NLP. We can collect such data from the World Wide Web.

HTML: HTML stands for HyperText Markup Language. It describes the structure of web pages.

Web scraping: Download the web page and parse it.

The process: (1) Download: Requests is an HTTP library, used here to fetch each page. (2) Parse: BeautifulSoup parses the downloaded web page. See the developed script amazon_scraper.py.
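
The two steps can be sketched as follows. This is a minimal illustration, not the contents of amazon_scraper.py: the function names, the User-Agent string, and the `review-text` CSS class are assumptions made for the example, and the real page markup may differ.

```python
import requests
from bs4 import BeautifulSoup


def download(url: str, timeout: float = 10.0) -> str:
    """Step 1 (Download): fetch a page with Requests and return its HTML."""
    # The User-Agent value is an arbitrary placeholder for this sketch.
    headers = {"User-Agent": "Mozilla/5.0 (dataset-building sketch)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_reviews(html: str) -> list[str]:
    """Step 2 (Parse): pull review texts out of the HTML with BeautifulSoup.

    The "review-text" class is hypothetical; inspect the real page to find
    the element that actually wraps each review.
    """
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_="review-text")
    return [span.get_text(" ", strip=True) for span in spans]
```

A typical call would be `parse_reviews(download(url))` for each product review page. Keeping the two steps in separate functions lets the parser be checked against a saved HTML file without touching the network.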

The dataset: There are about 12k reviews for 180 laptops, with about 712k review words in total. Reviews range from 2 to about 600 words, with a mean of about 60 words. See the AMAZON dataset amazon_reviews.json.
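
Statistics of this kind (review count, total words, shortest, longest, and mean review length) could be computed along the following lines. This is a sketch only: the layout of amazon_reviews.json is not described in the slides, so the loader assumes a JSON list of objects with a hypothetical `"text"` field, and words are counted by simple whitespace splitting.

```python
import json
from statistics import mean


def word_stats(reviews: list[str]) -> dict:
    """Summarize review lengths, counting whitespace-separated words."""
    lengths = [len(text.split()) for text in reviews]
    return {
        "reviews": len(lengths),
        "total_words": sum(lengths),
        "min_words": min(lengths),
        "max_words": max(lengths),
        "mean_words": mean(lengths),
    }


def load_reviews(path: str) -> list[str]:
    """Load review texts, assuming a list of {"text": ...} records.

    The "text" key is an assumption; adjust it to the real file layout.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [record["text"] for record in records]


# Tiny in-memory sample so the sketch runs without the dataset file.
sample = ["Great laptop", "The battery lasts all day", "Too slow for me"]
stats = word_stats(sample)
```

With the real file, `word_stats(load_reviews("amazon_reviews.json"))` would produce the same summary for the full dataset.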

Thanks: Thanks to Dr. Wong, David and Linkai.
