Exploiting Structured Ontology to Organize Scattered Online Opinions. Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai. University of Illinois at Urbana-Champaign. August 24, COLING’2010, Beijing, China.
Online Opinions: Valuable Resource • Need to organize them in a meaningful way!
Aspect Summarization • What are “good aspects”? • 1. Concise • 2. Relevant to topic • 3. Captures major opinions • 4. Reasonable order
Existing Work • Clustering + Phrase Selection [Chen&Dumais 2000] • What are “good aspects”? 1. Concise 2. Relevant to topic 3. Captures major opinions 4. Reasonable order (NA) • Our idea: use structured ontology
Why Use an Ontology? • Clustering-based vs. ontology-based • What are “good aspects”? 1. Concise 2. Relevant to topic 3. Captures major opinions 4. Reasonable order (NA) • In addition: • Great coverage • 12 million entities, e.g. person, place, or thing • Consistently growing • Anyone can contribute data
Problem Definition • Two Main Tasks: Aspect Selection and Aspect Ordering • Input: Topic = “Abraham Lincoln”; Ontology (>50 aspects), e.g. Professions, Parents, Quotations, Children, Date of Birth, Books written, Place of Death, Place of Birth, Spouse, …; Online Opinion Sentences • Output: Selected Subset of Aspects, with Selected Matching Opinions, Ordered to optimize readability
Aspect Selection: Task Definition • What are “good aspects”? 3. Captures major opinions • Aligned relevant opinions: KL-divergence retrieval model, with an aspect (e.g. Professions, Parents, …) as the query and the opinion sentences as the collection • Task: Select a subset of K aspects
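The alignment step can be illustrated with a small KL-divergence scorer. This is only a sketch under assumptions that are not in the slides: the aspect is represented by a bag of query tokens, each opinion sentence is a token list, the sentence model uses Dirichlet smoothing with a hypothetical parameter `mu`, and the collection model uses add-one smoothing. The function names are illustrative.

```python
import math
from collections import Counter

def kl_score(query_tokens, doc_tokens, coll_counts, coll_len, mu=100.0):
    """Return -KL(query model || smoothed sentence model); higher is better."""
    q_counts = Counter(query_tokens)
    q_len = sum(q_counts.values())
    d_counts = Counter(doc_tokens)
    d_len = len(doc_tokens)
    vocab = len(coll_counts)
    score = 0.0
    for word, count in q_counts.items():
        p_q = count / q_len
        # Add-one smoothed collection probability (assumption of this sketch).
        p_c = (coll_counts.get(word, 0) + 1) / (coll_len + vocab + 1)
        # Dirichlet-smoothed sentence probability.
        p_d = (d_counts.get(word, 0) + mu * p_c) / (d_len + mu)
        score -= p_q * math.log(p_q / p_d)
    return score

def rank_sentences(query_tokens, sentences, mu=100.0):
    """Return sentence indices sorted from best to worst match."""
    coll_counts = Counter(w for s in sentences for w in s)
    coll_len = sum(coll_counts.values())
    scores = [kl_score(query_tokens, s, coll_counts, coll_len, mu) for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

# Toy data, invented for illustration.
sentences = [
    ["lincoln", "was", "a", "lawyer"],
    ["he", "was", "born", "in", "kentucky"],
]
ranking = rank_sentences(["lawyer"], sentences)
```

In this toy run the sentence that mentions the query term is ranked first; the top-ranked sentences for each aspect would then serve as its aligned opinions.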
Aspect Selection: Methods (1) (2) • (1) Size-based • Size = Number of aligned relevant opinions • Select K aspects of largest size • (2) Opinion Coverage-based • Reduce redundancy, maximum coverage • Select K aspects sequentially (max cover problem) • Figure: example sizes 800, 600, 500; example aligned opinion sets Professions = {1, 2, 3}, Position = {4, 5, 3}, Parents = {4, 5, 6}
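The two selection strategies above can be sketched as follows. The opinion sets mirror the small example on the slide; the function names and the tie-breaking behavior (first aspect in insertion order wins) are assumptions of this sketch, and the coverage-based version is the standard greedy heuristic for max cover rather than a confirmed reproduction of the authors' code.

```python
def select_by_size(aligned, k):
    """Pick the k aspects with the most aligned opinions."""
    return sorted(aligned, key=lambda a: len(aligned[a]), reverse=True)[:k]

def select_by_coverage(aligned, k):
    """Greedy max cover: repeatedly pick the aspect adding the most new opinions."""
    covered = set()
    selected = []
    remaining = dict(aligned)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        selected.append(best)
        covered |= remaining.pop(best)
    return selected

aligned = {
    "Professions": {1, 2, 3},
    "Position": {4, 5, 3},
    "Parents": {4, 5, 6},
}
by_size = select_by_size(aligned, 2)
by_coverage = select_by_coverage(aligned, 2)
```

With K = 2, the size-based method cannot distinguish the three equally sized aspects, while the coverage-based method prefers Parents over Position as the second pick because Position shares opinion 3 with Professions.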
Aspect Selection: Method (3) Conditional Entropy-based • Cluster the collection, e.g. with K-means, into clusters C = {C1, C2, C3, …} • Aspect subset A = {A1, A2, A3, …}, e.g. A1 Professions, A2 Position, A3 Parents • $A = \arg\min H(C \mid A) = \arg\min \left[ -\sum_i p(A_i, C_i) \log \frac{p(A_i, C_i)}{p(A_i)} \right]$ • Use a greedy algorithm to approximate the solution
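A minimal sketch of the conditional entropy criterion with greedy selection is shown below. Several details are assumptions made for illustration and are not stated on the slide: each sentence carries exactly one cluster label and one aligned aspect, sentences whose aspect is not selected are pooled under a hypothetical `OTHER` label, and the entropy is summed over all observed (aspect, cluster) pairs.

```python
import math
from collections import Counter

OTHER = "__other__"  # hypothetical bucket for sentences of unselected aspects

def conditional_entropy(clusters, aspects, selected):
    """H(C|A) over sentences, with unselected aspects pooled into OTHER."""
    n = len(clusters)
    labels = [a if a in selected else OTHER for a in aspects]
    joint = Counter(zip(labels, clusters))
    marginal = Counter(labels)
    h = 0.0
    for (a, _c), n_ac in joint.items():
        # p(a, c) * log(p(a, c) / p(a)), where p(a, c) / p(a) = n_ac / n_a
        h -= (n_ac / n) * math.log(n_ac / marginal[a])
    return h

def greedy_select(clusters, aspects, k):
    """Add, one at a time, the aspect that yields the lowest H(C|A)."""
    selected = []
    candidates = sorted(set(aspects))
    for _ in range(min(k, len(candidates))):
        best = min(
            candidates,
            key=lambda a: conditional_entropy(clusters, aspects, selected + [a]),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: six sentences, three clusters, three aspects.
clusters = [0, 0, 1, 1, 2, 2]
aspects = ["Professions", "Professions", "Position", "Position", "Parents", "Parents"]
chosen = greedy_select(clusters, aspects, 2)
```

In this toy case two aspects already separate all three clusters, so the conditional entropy of the chosen subset drops to zero.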
Aspect Ordering: Task Definition • What are “good aspects”? 4. Reasonable order • Figure: an un-ordered aspect subset (Date of Birth, Place of Death, Professions, Quotations) is turned into an ordered list of the same aspects
Aspect Ordering: Methods • Ontology Order • Use the order in which aspects appear in the ontology • Coherence Order • Follow the order of aligned opinions in their original articles (e.g. blog articles, customer reviews)
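Aspect Ordering: Coherence Order • Original articles contain opinions aligned to aspects, e.g. A1 = Place of Death, A2 = Date of Birth • Coherence(A1, A2): #(an opinion aligned to A1 is before an opinion aligned to A2) • Coherence(A2, A1): #(an opinion aligned to A2 is before an opinion aligned to A1) • So, Coherence(A2, A1) > Coherence(A1, A2) • $\Pi(A) = \arg\max \sum_{A_i \text{ before } A_j} \mathrm{Coherence}(A_i, A_j)$ • Use a greedy algorithm to approximate the solution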
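The coherence counts and one possible greedy ordering can be sketched as follows. The slide does not say which greedy procedure is used, so the rule here (repeatedly place the aspect that most often precedes the remaining ones) is an assumption, as are the function names and the toy articles. Each article is represented as the sequence of aspects its aligned sentences touch, in original order.

```python
from collections import defaultdict
from itertools import combinations

def coherence_counts(articles):
    """Count how often an opinion on aspect a precedes one on aspect b."""
    counts = defaultdict(int)
    for seq in articles:
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                counts[(seq[i], seq[j])] += 1
    return counts

def greedy_order(aspects, counts):
    """Repeatedly place the aspect with the largest net 'comes before' count."""
    remaining = list(aspects)
    order = []
    while remaining:
        def lead(a):
            return sum(
                counts.get((a, b), 0) - counts.get((b, a), 0)
                for b in remaining
                if b != a
            )
        best = max(remaining, key=lead)
        order.append(best)
        remaining.remove(best)
    return order

# Toy articles, invented for illustration.
articles = [
    ["Date of Birth", "Place of Death"],
    ["Date of Birth", "Professions", "Place of Death"],
    ["Place of Death", "Date of Birth"],
]
counts = coherence_counts(articles)
order = greedy_order(["Place of Death", "Professions", "Date of Birth"], counts)
```

Here Date of Birth precedes Place of Death in two articles and follows it in one, matching the slide's example where Coherence(A2, A1) > Coherence(A1, A2).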
Experiments: Data Sets • Ontology • Freebase • Opinions • Blog entries and CNET customer reviews
Aspect Selection: Evaluation Measures • Aspect Coverage (AC) = 2/3 • Aspect Precision (AP), based on Jaccard similarity, = 0.625 • Average Aspect Precision (AAP) = 0.42 • Figure: aspects A1 Professions, A2 Position, A3 Parents are matched to clusters C1, C2, C3 with J(A3,C1) = 2/4, J(A1,C2) = 1, J(A2,C2) = 2/4; per-cluster AP is 0.5 for C1, 0.75 for C2, 0 for C3
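The slide's numbers can be reproduced with the sketch below, under one reading of the measures that is an assumption here: each aspect is mapped to the cluster with the highest Jaccard similarity, a cluster's AP is the mean similarity of the aspects mapped to it, AC is the fraction of clusters with at least one mapped aspect, AP averages over covered clusters, and AAP averages over all clusters with uncovered ones counted as 0. The sentence-id sets are invented so that the Jaccard values match the figure.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def aspect_measures(aspects, clusters):
    """Return (AC, AP, AAP) under the reading described above."""
    per_cluster = {c: [] for c in clusters}
    for a_set in aspects.values():
        best = max(clusters, key=lambda c: jaccard(a_set, clusters[c]))
        score = jaccard(a_set, clusters[best])
        if score > 0:
            per_cluster[best].append(score)
    cluster_ap = {c: (sum(v) / len(v) if v else 0.0) for c, v in per_cluster.items()}
    covered = [c for c, v in per_cluster.items() if v]
    ac = len(covered) / len(clusters)
    ap = sum(cluster_ap[c] for c in covered) / len(covered) if covered else 0.0
    aap = sum(cluster_ap.values()) / len(clusters)
    return ac, ap, aap

# Hypothetical sentence ids chosen to match J(A3,C1)=2/4, J(A1,C2)=1, J(A2,C2)=2/4.
aspects = {
    "A1 Professions": {6, 7},
    "A2 Position": {6, 7, 8, 9},
    "A3 Parents": {1, 2, 5},
}
clusters = {"C1": {1, 2, 3}, "C2": {6, 7}, "C3": {10, 11}}
ac, ap, aap = aspect_measures(aspects, clusters)
```

Under this reading AAP equals AC times AP, which is consistent with the slide's 2/3, 0.625 and 0.42.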
Conditional Entropy-based method provides the best trade-off for Aspect Selection • Figure panels: US Presidents, Digital Cameras
Aspect Ordering: Human Labeling • Aspect subset size = K (× 3) • Aspects shown: Parents, Spouse, Professions, Children, Quotations, Date of Birth, Party, Positions, … • Cluster Constraints (× 3) • Order Constraints (× 3): Date of Birth, Date of Death; Education, Positions; Spouse, Children; … • Human Agreement
Aspect Ordering: Measures • Cluster Constraints: {Parents, Spouse, Children}, {Party, Positions} • Pairs checked: Parents/Spouse, Parents/Children, Children/Spouse, Party/Positions • Is this pair presented together in the output? Values on the slide: 1, 0, 1, 0 • # aspects placed between this pair in the output? Values on the slide: 0, 2, 0, 3 • Cluster Precision = 2/4 = 0.5 • Cluster Penalty = 5/4 = 1.25
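The two cluster measures can be sketched as below. Two points are assumptions of this sketch: "presented together" is read as adjacent in the output (zero aspects in between), and the output order is a hypothetical one chosen so that the results match the slide's 0.5 and 1.25.

```python
from itertools import combinations

def cluster_measures(output, cluster_constraints):
    """Return (cluster precision, cluster penalty) for an ordered aspect list."""
    pos = {a: i for i, a in enumerate(output)}
    pairs = [p for cluster in cluster_constraints for p in combinations(cluster, 2)]
    # Number of aspects placed between the two members of each pair.
    gaps = [abs(pos[a] - pos[b]) - 1 for a, b in pairs]
    precision = sum(g == 0 for g in gaps) / len(pairs)
    penalty = sum(gaps) / len(pairs)
    return precision, penalty

# Hypothetical output order, not taken from the slides.
output = ["Children", "Parents", "Party", "Positions", "Spouse"]
constraints = [["Parents", "Spouse", "Children"], ["Party", "Positions"]]
precision, penalty = cluster_measures(output, constraints)
```

Two of the four constrained pairs are adjacent in this order, and the pairs are separated by 5 aspects in total, giving a precision of 0.5 and a penalty of 1.25.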
Aspect Ordering: Evaluation Results • Measures: Cluster Precision (higher is better), Cluster Penalty (lower is better)
Aspect Ordering: Evaluation Results • Order Constraints: is this ordered pair preserved in the output? • Date of Birth, Date of Death: 1 • Education, Positions: 0 • Spouse, Children: 1 • Order Precision = 2/3 = 0.67 (higher is better)
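Order precision, as described on this slide, is the fraction of human order constraints preserved by the output. A minimal sketch follows; the function name and the output order are hypothetical, with the order chosen so that the first and third constraints hold and the second is violated, as in the slide's example.

```python
def order_precision(output, order_constraints):
    """Fraction of (first, second) constraints where first precedes second."""
    pos = {a: i for i, a in enumerate(output)}
    preserved = sum(pos[first] < pos[second] for first, second in order_constraints)
    return preserved / len(order_constraints)

# Hypothetical output order, not taken from the slides.
output = ["Date of Birth", "Positions", "Education", "Spouse", "Children", "Date of Death"]
order_constraints = [
    ("Date of Birth", "Date of Death"),
    ("Education", "Positions"),
    ("Spouse", "Children"),
]
score = order_precision(output, order_constraints)
```

Two of the three constraints are preserved, which rounds to the 0.67 shown on the slide.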
Conclusions • Novel Problem: exploit ontology for structured organization of online opinions • Aspect selection • Aspect ordering • Evaluation: US presidents and digital cameras • Conditional Entropy-based aspect selection • Coherence ordering • Future Directions: • New aspect suggestion for ontology • Better alignment of opinion sentences and aspects • Ontology + well-written articles