Using Text Mining to Infer Semantic Attributes for Retail Data Mining

Authors: Rayid Ghani & Andrew E. Fano • Presenter: Vishal Mahajan • INFS795

Agenda • Drawbacks in Current Data Mining Techniques. • Purpose. • Assumptions and Constraints. • Methodology or Approach. • Extraction of Feature Set. • Labeling. • Classification Techniques: Naïve Bayes and EM. • Experimental Results. • Recommender System.

Drawbacks in Current Data Mining Techniques • Semantic Features not automatically considered. • Transactional Data analyzed without analyzing the customer. • Trending is partial. • Retail Items treated as objects with no associated semantics. • Data Mining Techniques (association rules, decision trees, neural networks) ignore the meaning of items and semantics associated with them.

Purpose of the Presentation • Describe a system that extracts semantic features. • Populate the knowledge base with the semantic features. • Show the use of text mining in retailing to extract semantic features from retailers' websites. • Show how profiles of customers or groups of customers can be built using text mining.

Assumptions & Constraints • Focus on Apparel Retail segment only. • Results focus on extracting those semantic features that are deemed important by CRM or Retail experts. • Data extracted from retailers' websites. • Models generated can be extended beyond the Apparel Retail segment.

Approach • Collect information about products. • Define the set of features to be extracted. • Label the data with values of the features. • Train a classifier/extractor on the labeled training data so it can extract features from unseen data. • Extract semantic features from new products by using the trained classifier. • Populate a knowledge base with the products and their corresponding features.

Data Collection Methodology • Use of a web crawler to extract the following from large retailers' websites: • Names • URLs • Descriptions • Prices • Categories of all products available • Use of wrappers. • Extracted information stored in a database and a subset chosen.

Extraction of Feature Set • Feature selection based on Expert Systems. • Use of extensive domain knowledge. • Feature selection done with the Retail Apparel segment in mind. • Features selected for the project: • Age Group • Functionality • Price • Formality • Degree of Conservativeness • Degree of Sportiness • Degree of Trendiness • Degree of Brand Appeal

Labeling Training Data • Database created with data collected from the retailer's website. • Subset of 600 products chosen and labeled. • Labeling guidelines provided.

Details of Features extracted from each Product Description

Verifying Training Data • Labeled subsets are disjoint, as labeling was done by different individuals. • Association rules (between features) used to check consistency in the labeled data. • Apriori algorithm implemented with single- and two-feature antecedents and consequents. • Desired consistency in labeling achieved by applying association rules.

Apriori Algorithm • Find the frequent itemsets: the sets of items that have minimum support. • Every subset of a frequent itemset must also be a frequent itemset, i.e., if {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets. • Use the frequent itemsets to generate association rules.
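The two Apriori steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names `apriori` and `rules`, the feature=value strings, and the five labeled products are all hypothetical, and support is counted as a raw number of products.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items and keep those meeting minimum support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one scan of the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # confidence = support(itemset) / support(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found

# Hypothetical labeled products, one set of feature=value labels each.
labeled_products = [
    {"formality=informal", "sportiness=high", "age=teens"},
    {"formality=informal", "sportiness=high", "age=genx"},
    {"formality=formal", "sportiness=low", "age=mature"},
    {"formality=informal", "sportiness=high", "age=teens"},
    {"formality=formal", "sportiness=low", "age=genx"},
]
frequent = apriori(labeled_products, min_support=2)
found_rules = rules(frequent, min_confidence=0.9)
```

A rule such as sportiness=high implying formality=informal with high confidence is the kind of pattern that could flag a product whose labels disagree with it.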

The Apriori Algorithm: Example [Figure: database D is scanned repeatedly to build candidate sets C1, C2, C3 and the frequent itemsets L1, L2, L3 derived from them.]

Training from Labeled Data • Learning problem treated as a text classification problem. • One text classifier is trained for each semantic feature. • E.g., the price of a product is classified as discount, average, or luxury. • Age group is classified as Juniors, Teens, GenX, Mature, or All Ages. • Classification was performed using Naïve Bayes.

Sample Association Rules

Naïve Bayes • Simple but effective text classification method. • Generative model: a class is first selected according to the class prior probabilities. • The model then assumes each word in a document is generated independently of the others, given the class. • In the parameter estimate (Equation 1), N(wt,di) = count of times word wt occurs in document di, and Pr(cj|di) ∈ {0,1} for labeled documents.
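As an illustration of the classifier described above, here is a minimal multinomial Naïve Bayes sketch in Python. It assumes the standard Laplace-smoothed estimate Pr(wt|cj) = (1 + Σi N(wt,di) Pr(cj|di)) / (|V| + Σs Σi N(ws,di) Pr(cj|di)), which matches the symbols defined on the slide but is not confirmed by it. The function names and the four product descriptions are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: one class name per document."""
    vocab = {w for tokens in docs for w in tokens}
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)  # N(wt, di) summed over documents of class c
    # Laplace-smoothed class priors, stored as logs.
    log_prior = {
        c: math.log((1 + class_counts[c]) / (len(classes) + len(docs)))
        for c in classes
    }
    # Laplace-smoothed word probabilities, stored as logs.
    log_word = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_word[c] = {
            w: math.log((1 + word_counts[c][w]) / (len(vocab) + total))
            for w in vocab
        }
    return log_prior, log_word

def classify(model, tokens):
    """Return the class with the highest posterior score."""
    log_prior, log_word = model
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_word[c][w] for w in tokens if w in log_word[c]
        )
    return max(scores, key=scores.get)

# Hypothetical training descriptions for the Price feature.
docs = [
    "cheap cotton tee value pack".split(),
    "clearance basic tee".split(),
    "silk evening gown designer".split(),
    "cashmere designer coat".split(),
]
labels = ["discount", "discount", "luxury", "luxury"]
price_model = train_naive_bayes(docs, labels)
prediction = classify(price_model, "designer silk coat".split())
```

One such model would be trained per semantic feature (price, age group, formality, and so on), each with its own label set.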

Incorporating Unlabeled Data • Initial labeled sample covered only 600 products. • Unlabeled products need to be used as well to make meaningful predictions. • Use of semi-supervised learning algorithms, which combine labeled and unlabeled data. • These algorithms have been shown to reduce classification error considerably. • Use of the Expectation-Maximization (EM) algorithm as the semi-supervised technique.

Expectation-Maximization (EM) Method • EM is an iterative statistical technique for maximum likelihood estimation with incomplete data. • In the retail classification problem, unlabeled data is treated as incomplete data. • EM • Finds parameters that locally maximize the likelihood. • Gives estimates for the missing values.

Expectation-Maximization (EM) Method (cont.) • EM method is a 2-step process. • Initial parameters are set using Naïve Bayes from just the labeled documents. • Subsequent iterations of E- and M-steps. • E-step • Calculates probabilistically weighted class labels Pr(cj|di) for every unlabeled document. • M-step • Estimates new classifier parameters using all documents (Equation 1). • E- and M-steps are iterated until the classifier converges.
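The E- and M-steps above can be sketched as follows. This is a simplified illustration under stated assumptions: Laplace smoothing in the M-step, convergence judged by the largest change in the soft labels (the slides do not say how convergence was measured), and a hypothetical toy data set. The names `m_step`, `e_step`, and `em_naive_bayes` are illustrative only.

```python
import math
from collections import Counter

def m_step(docs, posteriors, classes, vocab):
    """Re-estimate smoothed parameters from hard or fractional labels."""
    prior, word = {}, {}
    for c in classes:
        mass = sum(p[c] for p in posteriors)
        prior[c] = (1 + mass) / (len(classes) + len(docs))
        counts = Counter()
        for tokens, p in zip(docs, posteriors):
            for w in tokens:
                counts[w] += p[c]  # N(wt, di) weighted by Pr(cj | di)
        total = sum(counts.values())
        word[c] = {w: (1 + counts[w]) / (len(vocab) + total) for w in vocab}
    return prior, word

def e_step(tokens, prior, word):
    """Return Pr(cj | di) for one document, normalised in log space."""
    logs = {
        c: math.log(prior[c])
        + sum(math.log(word[c][w]) for w in tokens if w in word[c])
        for c in prior
    }
    top = max(logs.values())
    scaled = {c: math.exp(v - top) for c, v in logs.items()}
    z = sum(scaled.values())
    return {c: v / z for c, v in scaled.items()}

def em_naive_bayes(labeled, labels, unlabeled, max_iterations=50, tol=1e-6):
    classes = sorted(set(labels))
    vocab = {w for tokens in labeled + unlabeled for w in tokens}
    # Labeled documents keep fixed 0/1 class memberships.
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]
    # Initialise from the labeled documents only.
    prior, word = m_step(labeled, hard, classes, vocab)
    previous = None
    for _ in range(max_iterations):
        soft = [e_step(d, prior, word) for d in unlabeled]  # E-step
        if previous is not None and all(
            abs(s[c] - p[c]) < tol for s, p in zip(soft, previous) for c in classes
        ):
            break  # soft labels stopped changing
        prior, word = m_step(labeled + unlabeled, hard + soft, classes, vocab)  # M-step
        previous = soft
    return prior, word

# Hypothetical data: two labeled and four unlabeled descriptions.
labeled = [["designer", "silk"], ["clearance", "tee"]]
labels = ["luxury", "discount"]
unlabeled = [
    ["designer", "cashmere"], ["silk", "cashmere"],
    ["clearance", "basic"], ["tee", "basic"],
]
prior, word = em_naive_bayes(labeled, labels, unlabeled)
posterior = e_step(["cashmere"], prior, word)
```

In this toy set "cashmere" never appears in a labeled document, so a classifier trained on the labeled pair alone has no evidence about it; after EM it leans toward luxury because it co-occurs with "designer" and "silk" in the unlabeled descriptions.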

Experimental Results

Results on new data set • The subset of data that was used earlier was from a single retailer. • Another sample of data was collected from a variety of retailers. The results are as follows. • Results are consistently better.

Recommender System • Creation of customer profiles (real time) is feasible by analyzing the text associated with products and by mapping it to pre-defined semantic features. • Identity of customer is not known and prior transaction history is unknown. • Semantic features are inferred by the “browsing” pattern of the customer. • Helps in suggesting new products to the customers.

Recommender System Mathematically • P(Aij|Product) is computed for each product • where Aij is the jth value of the ith attribute • i indexes the semantic attributes, j their possible values • User profile is constructed as follows • Pr(Uij|Past N Items) = (1/N) Σk=1..N P(Aij|Item k), calculated for every i,j
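A short Python sketch of the profile construction, assuming the profile is the plain average of the per-item attribute posteriors over the last N browsed items. The function name `build_profile`, the attribute names, and the probabilities are hypothetical; in the system described they would come from the trained per-feature classifiers.

```python
def build_profile(item_posteriors):
    """Average P(Aij | item) over the last N browsed items.

    item_posteriors: list of {attribute: {value: probability}} dicts,
    one per browsed item.
    """
    n = len(item_posteriors)
    profile = {}
    for item in item_posteriors:
        for attribute, distribution in item.items():
            slot = profile.setdefault(attribute, {})
            for value, p in distribution.items():
                # Each item contributes 1/N of its posterior mass.
                slot[value] = slot.get(value, 0.0) + p / n
    return profile

# Hypothetical classifier outputs for two browsed products.
browsed = [
    {"price": {"discount": 0.75, "luxury": 0.25},
     "sportiness": {"high": 0.5, "low": 0.5}},
    {"price": {"discount": 0.25, "luxury": 0.75},
     "sportiness": {"high": 1.0, "low": 0.0}},
]
profile = build_profile(browsed)
```

Because the profile is built only from the items viewed in the current session, it needs neither the customer's identity nor any transaction history.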

Types of Recommender Systems • Two Types of Recommender Systems. • Collaborative Filtering. • Collect user feedback in terms of ratings. • Exploit similarities and differences of customers to recommend items. • Issues • Sparsity Problem. • New Items. • Content Filtering • Compares the contents • Issues • Narrow in scope • Recommends similar products only

Conclusions • The system learns through supervised and semi-supervised techniques. • Major assumption: product descriptions accurately convey the semantic attributes, which is open to question. • Small sample of data used to infer results; practical applications not verified. • System bootstrapped from a small number of labeled training examples. • Interesting application which could be evolved to generate trends for retail marketers.
