Product Review Summarization from a Deeper Perspective
Duy Khang Ly, Kazunari Sugiyama, Ziheng Lin, Min-Yen Kan
National University of Singapore
WING, NUS
Introduction
“754 customer reviews”
“fantastic results”
“Best photos that I have ever taken and a joy to use”
• Other customers can consult the reviews when deciding whether to buy the product
• Manufacturers can get feedback from customers
Introduction
Output of summary in existing systems [Hu and Liu, KDD’04], [Hu and Liu, AAAI’04], [Popescu and Etzioni, HLT/EMNLP’05]
a. Lens
(+): 57 sentences
1. The lens feels very solid!
2. I have taken a whole bunch of excellent pictures with this lens.
…
(-): 15 sentences
1. I do not satisfy with the included lens kit.
2. The lens cap is very loose and come off very easily!
…
b. Battery Life
(+): 32 sentences
1. The battery lasts for ever on one single charge.
2. The battery duration is amazing!
…
(-): 8 sentences
1. I experienced very short battery life from this camera.
2. It uses a heavy battery.
…
• Does not organize the sentences within each sentiment
• Users need to read through the sentences to find the reasons that justify the sentiment
Introduction
Desired summary output that our system aims for
a. Lens
(+): The lens feels very solid! (+10 similar)
(-): I think the lens does not worth it, it’s a bit too fragile. (+2 similar)
(+): I have taken a lot of excellent pictures with this lens. (+7 similar)
(-): Don’t buy this lens, I always get my pictures blurred. (+0 similar)
…
b. Battery Life
(+): The battery lasts for ever on one single charge. (+18 similar)
(-): I experienced very short battery life from this camera. (+4 similar)
(+): 0 sentence
(-): It uses a heavy battery.
…
• Provides a representative reason for each sentiment
• Users can read a concise summary
Proposed Method
Input: Product Reviews. Output: Output Summary.
(1) PRODUCT FACET IDENTIFICATION
• Pre-processing
• Syntactic role
• Association Rule Mining
• Post-processing
• Infreq. Facet Extraction
(2) SUMMARIZATION
• Opinionated Sentence Extraction
• Subtopic Clustering
- Sentence Representation
- Sentence Clustering
• Compact Presentation
Example sentence groups passed to clustering:
1. The lens is too plastic! 2. The price of this lens is affordable! …
1. The output pictures are crystal clear. 2. I like the sharpness of the picture. …
…
Product Facet Identification
• Pre-processing: POS tagging; extract nouns and noun phrases
• Syntactic Roles: filter away noisy results
• Association Rule Mining: identify all the frequent explicit product facets
• Post-processing: remove irrelevant facets
• Infreq. Facet Extraction: help discover infrequent facets
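The Association Rule Mining step can be illustrated with a minimal frequent-itemset sketch. It assumes the noun and noun-phrase candidates have already been extracted per sentence (POS tagging is not shown). The function name `frequent_facets`, the `min_support` value, the restriction to itemsets of size 1 and 2, and the toy data are all assumptions for illustration, not details from the slides.

```python
from collections import Counter
from itertools import combinations


def frequent_facets(transactions, min_support=0.4, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support.

    transactions: list of sets of candidate nouns, one set per sentence.
    Support is the fraction of sentences containing every item of the itemset.
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            # sorted() gives each itemset one canonical tuple form
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Toy candidates (hypothetical), one set per review sentence.
sentences = [
    {"lens"},
    {"lens", "picture"},
    {"battery", "life"},
    {"battery", "life", "charge"},
    {"picture"},
]
facets = frequent_facets(sentences, min_support=0.4)
for itemset, support in sorted(facets.items()):
    print(" ".join(itemset), support)
```

Brute-force counting is only practical for short sentences; a full Apriori-style miner would prune candidates level by level instead.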
Summarization
• Opinionated Sentence Extraction [Ding et al., WSDM’08]
- Assign a polarity score to each sentence
- The score is the sum of the polarity scores of its constituent words
• Subtopic Clustering
- Sentence Representation: compute content-based pairwise similarities between all resulting opinion sentences
- Sentence Clustering: hierarchical clustering with group-average distance, or non-hierarchical clustering
• Compact Presentation: select the most representative sentence in each cluster
Example sentence groups:
1. The lens is too plastic! 2. The price of this lens is affordable! …
1. The output pictures are crystal clear. 2. I like the sharpness of the picture. …
…
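The three summarization steps can be sketched roughly as follows. Several choices here are assumptions, since the slides do not specify them: the tiny polarity lexicon, Jaccard word overlap as the "content-based" similarity, a similarity threshold as the stopping rule, and picking the representative by highest total similarity to its cluster. Group-average linkage is expressed with similarities rather than distances (distance = 1 - similarity).

```python
import re

# Toy polarity lexicon (hypothetical); a real lexicon would be far larger.
LEXICON = {"solid": 1, "excellent": 1, "amazing": 1,
           "loose": -1, "blurred": -1, "fragile": -1}


def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def polarity(sentence):
    """Sentence polarity = sum of the polarity scores of its words."""
    return sum(LEXICON.get(w, 0) for w in tokens(sentence))


def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def average_link(c1, c2, sents):
    """Group-average similarity: mean pairwise similarity across two clusters."""
    total = sum(jaccard(sents[i], sents[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(sents, threshold=0.3):
    """Agglomerative clustering: merge the closest pair until below threshold."""
    clusters = [[i] for i in range(len(sents))]
    while len(clusters) > 1:
        pairs = [(average_link(clusters[a], clusters[b], sents), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        best, a, b = max(pairs)
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def representative(c, sents):
    """Member with the highest total similarity to the rest of its cluster."""
    return max(c, key=lambda i: sum(jaccard(sents[i], sents[j]) for j in c if j != i))


reviews = [
    "The lens feels very solid",
    "The lens feels solid and excellent",
    "The battery life is amazing",
    "Amazing battery life",
    "I bought it last week",  # no opinion word, so it is filtered out
]
positive = [s for s in reviews if polarity(s) > 0]
clusters = cluster(positive)
for c in clusters:
    rep = representative(c, positive)
    print(f"(+): {positive[rep]} (+{len(c) - 1} similar)")
```

The final loop mirrors the compact "(+N similar)" presentation: one representative sentence per subtopic cluster, with the rest of the cluster reduced to a count.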
Experiments
Evaluation Measures
(1) Product Facet Identification
- Recall, Precision
(2) Summarization
- Purity, Inverse purity
- F (harmonic mean of purity and inverse purity) [Hotho et al., GLDV-Journal for Computational Linguistics and Language Technology ’05]
Experimental Data
• 3 products from [Hu and Liu, KDD’04]
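For the facet identification measures, a minimal sketch of set-based precision and recall is given here, assuming the system output and the gold annotation are both plain sets of facet strings. The function name and the example facet lists are hypothetical.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted facets against gold-standard facets."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical system output and gold facets for one product.
p, r = precision_recall(
    ["battery", "picture", "lens", "box"],
    ["battery", "picture", "lens", "flash", "zoom"],
)
print(p, r)
```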
Purity
(i) In each generated cluster, precision is first computed for each label, and the maximum value is then selected.
(ii) The overall value for purity is computed by taking the weighted average of (i).
[Figure: worked example with 20 target documents (groups of 8, 5, 4, and 3), showing (i) the maximum precision of each cluster and (ii) the “purity” for this clustering result]
Inverse purity
(i) For each label, recall is first computed against each generated cluster, and the maximum value is then selected.
(ii) The overall value for inverse purity is computed by taking the weighted average of (i).
[Figure: worked example with 20 target documents (groups of 8, 5, 4, and 3), showing (i) the maximum recall of each label and (ii) the “inverse purity” for this clustering result]
F1-measure (α = 0.5)
Harmonic mean of “purity” and “inverse purity”
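The three clustering measures can be sketched as below. The sketch assumes each item belongs to exactly one cluster and has exactly one gold label, and it uses the common weighted harmonic mean F = 1 / (α/P + (1 − α)/IP), which reduces to 2·P·IP / (P + IP) at α = 0.5. Function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def purity(clusters, labels):
    """Weighted average over clusters of the best precision for any label."""
    n = sum(len(c) for c in clusters)
    # (|C|/n) * max_L |C ∩ L| / |C| simplifies to max_L |C ∩ L| / n
    return sum(max(Counter(labels[i] for i in c).values()) for c in clusters) / n


def inverse_purity(clusters, labels):
    """Weighted average over labels of the best recall in any cluster."""
    n = sum(len(c) for c in clusters)
    per_cluster = [Counter(labels[i] for i in c) for c in clusters]
    # (|L|/n) * max_C |C ∩ L| / |L| simplifies to max_C |C ∩ L| / n
    return sum(max(cnt[label] for cnt in per_cluster) for label in set(labels)) / n


def f_measure(p, ip, alpha=0.5):
    """Weighted harmonic mean of purity and inverse purity."""
    if p == 0 or ip == 0:
        return 0.0
    return 1.0 / (alpha / p + (1 - alpha) / ip)


# Toy example: 6 items with gold labels, grouped into 2 clusters of item indices.
labels = ["a", "a", "a", "b", "b", "c"]
clusters = [[0, 1, 2, 3, 4], [5]]

p = purity(clusters, labels)
ip = inverse_purity(clusters, labels)
f = f_measure(p, ip)
print(p, ip, f)
```

In this toy case the large mixed cluster lowers purity (4/6) while inverse purity stays at 1, so F lands at 0.8; splitting every item into its own cluster would do the opposite, which is why the two are combined.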
(1) Product Facet Identification
Examples of extracted facets:
• Camera: “battery,” “picture,” “lens”
• Phone: “signal,” “headset”
• DVD player: “remote control,” “format”
(1) Product Facet Identification
[Table: performance of the product facet identification component, [Hu and Liu, KDD’04]]
[Table: performance of the product facet identification component, [Hu and Liu, KDD’04] + syntactic role]
(2) Summarization
[Table: number of facets in each product]
“Camera” has richer properties.
(2) Summarization
[Table: performance of summarization (F1-measure)]
Table annotations:
• Effective when the number of subtopics is small.
• Effective when the number of subtopics is large.
Conclusion
• Designed a system that summarizes product reviews and organizes them into a structured, extractive summary
• Product facet identification
- Syntactic role information within a sentence is effective.
• Summarization
- Both hierarchical and non-hierarchical clustering work better than random clustering.
Future Work
• Recognize brand names to improve facet identification
- “My Canon camera has longer battery life than Nikon.”
Thank you very much!