Using Text Mining to Infer Semantic Attributes for Retail Data Mining

Authors: Rayid Ghani & Andrew E. Fano. Presenter: Vishal Mahajan, INFS795.

  2. Agenda • Drawbacks in Current Data Mining Techniques. • Purpose. • Assumptions and Constraints. • Methodology or Approach. • Extraction of Feature Set. • Labeling. • Classification Techniques. • Naïve Bayes • EM • Experimental Results. • Recommender System.

  3. Drawbacks in Current Data Mining Techniques • Semantic Features not automatically considered. • Transactional Data analyzed without analyzing the customer. • Trending is partial. • Retail Items treated as objects with no associated semantics. • Data Mining Techniques (association rules, decision trees, neural networks) ignore the meaning of items and semantics associated with them.

  4. Purpose of the Presentation • Describe a system that extracts semantic features. • Populate the knowledge base with the semantic features. • Use text mining in retailing to extract semantic features from retailers' websites. • Show how profiles of customers or groups of customers can be built using text mining.

  5. Assumptions & Constraints • Focus on the Apparel Retail segment only. • Results focus on extracting those semantic features that are deemed important by CRM or Retail experts. • Data extracted from retailers' websites. • Models generated can be extended beyond the Apparel Retail segment.

  6. Approach • Collect information about products. • Define the set of features to be extracted. • Label the data with values of the features. • Train a classifier/extractor on the labeled training data to extract features from unseen data. • Extract semantic features from new products using the trained classifier. • Populate a knowledge base with the products and their corresponding features.

  7. Data Collection Methodology • Use of a web crawler to extract the following from large retailers' websites: • Names • URLs • Descriptions • Prices • Categories of all products available • Use of wrappers. • Extracted information stored in a database and a subset chosen.

  8. Extraction of Feature Set • Feature selection based on Expert Systems. • Use of extensive domain knowledge. • Features selected with the Retail Apparel segment in mind. • Features selected for the project • Age Group • Functionality • Price • Formality • Degree of Conservativeness • Degree of Sportiness • Degree of Trendiness • Degree of Brand Appeal

  9. Labeling Training Data • Database created with data collected from the retailer's website. • Subset of 600 products chosen and labeled. • Labeling guidelines provided.

  10. Details of Features extracted from each Product Description

  11. Verifying Training Data • Labels came from disjoint subsets of the data, since different individuals labeled different products. • Association rules (between features) used to check consistency in the labeled data. • Rules mined with the Apriori algorithm, using single- and two-feature antecedents and consequents. • Desired consistency in labeling achieved by applying association rules.

  12. Apriori Algorithm • Find the frequent itemsets: the sets of items that have minimum support. • Any subset of a frequent itemset must also be a frequent itemset, i.e., if {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets. • Use the frequent itemsets to generate association rules.
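
The level-wise search described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names `apriori` and `rule_confidence`, the toy transactions, and the support threshold are all assumptions made for the example. Each letter stands for a hypothetical feature value attached to a labeled product.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # First pass: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune: every (k-1)-subset of a candidate must itself be frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Scan the database once to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent, computed from stored support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical toy database: each set is one labeled product.
database = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = apriori(database, min_support=2)
confidence_b_to_e = rule_confidence(frequent, {"B"}, {"E"})
```

A rule with high confidence that a labeler's data violates often is one way to spot inconsistent labels; the threshold for "high" would be a judgment call.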

  13. The Apriori Algorithm: Example [Figure: database D is scanned repeatedly, producing candidate itemsets C1, C2, C3 and frequent itemsets L1, L2, L3.]

  14. Training from Labeled Data • Learning problem treated as a text classification problem. • One text classifier for each semantic feature. • E.g., the price of a product is classified as discount, average, or luxury. • Age group is classified as Juniors, Teens, GenX, Mature, or All Ages. • Classification was performed using naïve Bayes.

  15. Sample Association Rules

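  16. Naïve Bayes • Simple but effective text classification method. • Generative model: a class is first selected according to the class prior probabilities, and the document's words are then generated from that class. • The model assumes each word in a document is generated independently of the others, given the class. • In the parameter estimates, N(wt,di) = number of times word wt occurs in document di, and Pr(cj|di) ∈ {0,1} is given by the document's label.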
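
A minimal sketch of a multinomial naïve Bayes text classifier for one semantic feature is shown below. It assumes Laplace (add-one) smoothing of both the class priors and the word probabilities, which is a common choice but is not confirmed by the slide. The function names and the tiny product descriptions are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """docs: list of token lists; labels: one class name per document."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)  # accumulates N(w, d) per class
    # Laplace-smoothed class priors (assumed smoothing).
    log_prior = {c: math.log((1 + n) / (len(class_docs) + len(docs)))
                 for c, n in class_docs.items()}
    # Laplace-smoothed word probabilities Pr(w | c).
    log_word = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        log_word[c] = {w: math.log((1 + word_counts[c][w]) / (len(vocab) + total))
                       for w in vocab}
    return log_prior, log_word

def classify(model, tokens):
    """Pick the class with the highest log posterior; unseen words are ignored."""
    log_prior, log_word = model
    scores = {c: log_prior[c] + sum(log_word[c][w] for w in tokens if w in log_word[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical training data for the Price feature.
docs = [["cheap", "basic", "cotton", "tee"],
        ["value", "pack", "basic", "socks"],
        ["cashmere", "designer", "coat"],
        ["silk", "designer", "gown"]]
labels = ["discount", "discount", "luxury", "luxury"]
model = train_naive_bayes(docs, labels)
prediction = classify(model, ["designer", "silk", "scarf"])
```

Because words are treated as independent given the class, the score is just a sum of per-word log probabilities, which keeps training and prediction cheap.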

  17. Incorporating Unlabeled Data • The initial labeled sample covered only 600 products. • Unlabeled products must also be used to make meaningful predictions. • Use of semi-supervised learning algorithms, which combine labeled and unlabeled data. • These algorithms have been shown to reduce classification error considerably. • Use of the Expectation-Maximization (EM) algorithm as the semi-supervised technique.

  18. Expectation-Maximization (EM) Method • EM is an iterative statistical technique for maximum likelihood estimation with incomplete data. • In the retail classification problem, the unlabeled data is treated as incomplete data (the class labels are missing). • EM • Finds parameters that locally maximize the likelihood. • Gives estimates for the missing values.

  19. Expectation-Maximization (EM) Method (cont.) • EM alternates between two steps. • Initial parameters are set using naïve Bayes trained on just the labeled documents. • The E- and M-steps are then iterated. • E-Step • Calculates probabilistically weighted class labels Pr(cj|di) for every unlabeled document. • M-Step • Estimates new classifier parameters using all documents (Equation 1). • The E- and M-steps are iterated until the classifier converges.
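
The E- and M-steps above can be sketched as follows for a naïve Bayes classifier. This is an illustration under stated assumptions rather than the presenters' code: documents are represented as `{word_index: count}` dictionaries, Laplace smoothing is assumed, and convergence is judged by how much the soft labels change between iterations. All names and the toy data are hypothetical.

```python
import math

def m_step(docs, resp, n_classes, vocab_size):
    """Re-estimate smoothed priors and word probabilities from class weights."""
    prior = [(1 + sum(r[c] for r in resp)) / (n_classes + len(docs))
             for c in range(n_classes)]
    word = []
    for c in range(n_classes):
        weighted = [1.0] * vocab_size  # add-one smoothing (assumed)
        for doc, r in zip(docs, resp):
            for t, n in doc.items():
                weighted[t] += n * r[c]  # word count weighted by Pr(c | d)
        total = sum(weighted)
        word.append([w / total for w in weighted])
    return prior, word

def e_step(doc, prior, word):
    """Return Pr(c | doc) for one document under the current parameters."""
    logs = [math.log(p) + sum(n * math.log(word[c][t]) for t, n in doc.items())
            for c, p in enumerate(prior)]
    top = max(logs)  # subtract the max before exponentiating for stability
    exps = [math.exp(v - top) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, labels, unlabeled, n_classes, vocab_size,
                   max_iter=50, tol=1e-6):
    # Labeled documents keep hard 0/1 class weights throughout.
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Initial parameters come from the labeled documents only.
    prior, word = m_step(labeled, hard, n_classes, vocab_size)
    soft = None
    for _ in range(max_iter):
        new_soft = [e_step(d, prior, word) for d in unlabeled]  # E-step
        if soft is not None:
            change = max((abs(a - b) for r, s in zip(new_soft, soft)
                          for a, b in zip(r, s)), default=0.0)
            if change < tol:
                soft = new_soft
                break
        soft = new_soft
        # M-step over all documents, labeled and unlabeled.
        prior, word = m_step(labeled + unlabeled, hard + soft, n_classes, vocab_size)
    return prior, word, soft

# Hypothetical vocabulary: 0=designer, 1=silk, 2=basic, 3=value.
labeled = [{0: 2}, {2: 2}]
labels = [0, 1]                      # 0 = luxury, 1 = discount
unlabeled = [{0: 1, 1: 2}, {2: 1, 3: 2}]
prior, word, soft = em_naive_bayes(labeled, labels, unlabeled,
                                   n_classes=2, vocab_size=4)
```

In this toy run the word "silk" never appears in a labeled document, yet it ends up associated with the luxury class because it co-occurs with "designer" in an unlabeled one. That is the kind of benefit unlabeled data is meant to provide.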

  20. Experimental Results

  21. Experimental Results

  22. Results on new data set • The subset of data used earlier came from a single retailer. • Another sample of data was collected from a variety of retailers. The results are as follows. • Results are consistently better.

  23. Recommender System • Real-time creation of customer profiles is feasible by analyzing the text associated with products and mapping it to pre-defined semantic features. • The identity of the customer is not known and prior transaction history is unavailable. • Semantic features are inferred from the “browsing” pattern of the customer. • Helps in suggesting new products to customers.

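  24. Recommender System Mathematically • Each product is described by P(Aij|Product), where Aij is the jth value of the ith attribute (i indexes the semantic attributes, j the possible values of each). • The user profile is constructed as the average over the past N items: $\Pr(U_{ij} \mid \text{Past } N \text{ Items}) = \frac{1}{N} \sum_{k=1}^{N} P(A_{ij} \mid \text{Item}_k)$, calculated for every i, j.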
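
One way to read the 1/N factor on this slide is that the user profile averages the per-item attribute probabilities over the last N items browsed. The sketch below assumes that reading. The function name `build_profile`, the nested-dictionary layout, and the probabilities are hypothetical.

```python
def build_profile(item_posteriors):
    """Average P(A_ij | item) over the past N items.

    item_posteriors: list of {attribute: {value: probability}}, one per item.
    """
    n = len(item_posteriors)
    profile = {}
    for posterior in item_posteriors:
        for attribute, dist in posterior.items():
            slot = profile.setdefault(attribute, {})
            for value, p in dist.items():
                # Each item contributes 1/N of its probability mass.
                slot[value] = slot.get(value, 0.0) + p / n
    return profile

# Hypothetical classifier outputs for the last two products browsed.
browsed = [
    {"Formality": {"Informal": 0.8, "Formal": 0.2}},
    {"Formality": {"Informal": 0.4, "Formal": 0.6}},
]
profile = build_profile(browsed)
```

Because each item's distribution sums to one, the averaged profile for each attribute is again a probability distribution, so it can be compared directly against the attribute distribution of a candidate product.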

  25. Types of Recommender Systems • Two Types of Recommender Systems. • Collaborative Filtering. • Collect user feedback in terms of ratings. • Exploit similarities and differences of customers to recommend items. • Issues • Sparsity Problem. • New Items. • Content Filtering • Compares the contents • Issues • Narrow in scope • Recommends similar products only

  26. Conclusions • The system learns using supervised and semi-supervised techniques. • Major assumption: product descriptions accurately convey the semantic attributes, which is open to question. • Only a small sample of data was used to infer results. Practical applications not verified. • System bootstrapped from a small number of labeled training examples. • Interesting application which could be evolved to generate trends for retail marketers.

