Classifying Sentences using Induced Structure Menno Van Zaanen Luiz Augusto Pizzato Diego Mollá-Aliod pizzato@ics.mq.edu.au Centre for Language Technology Macquarie University Sydney, Australia
Overview
• Sentence Classification Problem
• Induced Structure Approach
  • Alignment Based Learning
  • Trie Based Classifier
• Results
• Concluding Remarks
• Future Work
Van Zaanen, Pizzato, Molla; SPIRE-2005. Buenos Aires, 2-4 November 2005.(2/23)
Sentence Classification
• Assists several NLP tasks: document summarisation, information extraction, and question answering, among others.
• Question classification:
  • Definition: What is a golden parachute?
  • List: Name two brands of shaving cream.
  • Factoid questions:
    • HUM:IND: Who discovered penicillin?
    • LOC:CITY: What is the capital of Australia?
    • FOOD, PLANT, ANIMAL: What do bats eat?
Current approaches
• Handcrafted regular expressions:
  • Pros: Rules are understandable. A few rules cover a large share of the questions (Zipf's law).
  • Cons: Difficult to construct. Limited performance.
• Machine learning:
  • Pros: The computer finds the "rules" automatically.
  • Cons: The rules and knowledge generated are not readable.
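To make the handcrafted regular-expression approach concrete, here is a minimal sketch. The patterns, the `RULES` list and the `UNKNOWN` fallback are invented for illustration; they are not the rules of any system discussed in the talk.

```python
import re

# Hypothetical handcrafted rules as (pattern, class) pairs.
# Order matters: the first matching rule wins.
RULES = [
    (re.compile(r"^what is the capital of\b", re.I), "LOC:CITY"),
    (re.compile(r"^who\b", re.I), "HUM:IND"),
    (re.compile(r"^what is an? \w+( \w+)*\?$", re.I), "DEFINITION"),
    (re.compile(r"^name\b", re.I), "LIST"),
]


def classify(question, default="UNKNOWN"):
    """Return the class of the first rule that matches the question."""
    for pattern, label in RULES:
        if pattern.search(question):
            return label
    return default


if __name__ == "__main__":
    for q in ("What is a golden parachute?", "What do bats eat?"):
        print(q, "->", classify(q))
```

The rules are easy to read and adjust, but every uncovered question (such as "What do bats eat?") needs another hand-written pattern, which is the construction cost listed under "Cons".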
Classifying by Induced Structure
[Diagram: Training Data → Extract Structure → Structure; Sentence and Structure → Sentence Classifier → Class]
• The process fits between machine learning (ML) and regular expressions (RE):
  • Learn patterns from sentences;
  • Use these patterns in the classification phase.
Classifying by Induced Structure
• We propose two distinct approaches:
  • Alignment-Based Learning Classifier (ABL)
    • ABL is a generic grammatical inference framework that learns structure from plain text.
  • Trie-Based Classifier
    • Classifies sentences based on partial matches in a trie structure.
Alignment-Based Learning Classifier (ABL)
• Developed under the idea that constituents in sentences can be interchanged.
  • The book is on the table.
  • The car is on the driveway.
Alignment-Based Learning Classifier (ABL)
• Developed under the idea that constituents in sentences can be interchanged.
  • The (book) is on the (table).
  • The (car) is on the (driveway).
[Diagram: the shared words "the … is on the …", with the alternatives book / car and table / driveway in the two gaps.]
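The alignment idea can be sketched in a few lines. This is an illustration only: it uses Python's `difflib.SequenceMatcher` as a stand-in for ABL's own alignment step, and the function name `align` and the wildcard pattern it returns are assumptions, not part of the ABL framework.

```python
import re
from difflib import SequenceMatcher


def align(sent_a, sent_b):
    """Align two whitespace-tokenised sentences.

    Returns (hypotheses, pattern): the pairs of unequal word spans
    (candidate interchangeable constituents) and a regular expression
    built from the shared words, with a wildcard in each gap.
    """
    a, b = sent_a.split(), sent_b.split()
    # difflib is a swapped-in aligner, not ABL's own algorithm.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    hypotheses, pattern = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pattern.extend(re.escape(word) for word in a[i1:i2])
        else:
            hypotheses.append((a[i1:i2], b[j1:j2]))
            pattern.append(r".+")
    return hypotheses, " ".join(pattern)


if __name__ == "__main__":
    hyps, pat = align("the book is on the table",
                      "the car is on the driveway")
    print(hyps)  # unequal spans found by the alignment
    print(pat)   # shared context with wildcards
```

The unequal spans correspond to the bracketed words on the slide, and the shared context gives a readable pattern that can be edited by hand.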
Trie-Based Classifier
[Figure: a trie whose nodes each branch over the alphabet a|b|c|…|z, with paths spelling "car" and "zebra".]
• $T(S) = \{T(S/a_1), T(S/a_2), \ldots, T(S/a_r)\}$
• where $S$ is the set of sentences and $S/a_n$ is the set of sentences starting with $a_n$, stripped of that initial element.
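The recursive definition of T(S) maps almost directly onto code. A minimal sketch, assuming a trie represented as nested dictionaries (the name `build_trie` and the dictionary representation are illustrative choices):

```python
def build_trie(sentences):
    """T(S): map each first element a_n to the trie T(S/a_n).

    Each sentence is any sliceable sequence: a string (characters)
    or a list of words.
    """
    groups = {}
    for sentence in sentences:
        if len(sentence) > 0:  # an exhausted sequence ends its branch
            # S/a_n: sequences starting with a_n, minus that element
            groups.setdefault(sentence[0], []).append(sentence[1:])
    return {head: build_trie(tails) for head, tails in groups.items()}


if __name__ == "__main__":
    print(build_trie(["car", "cat", "zebra"]))
    print(build_trie([["who", "is"], ["who", "was"]]))
```

This bare version does not record where a sequence ends, so an explicit end marker is needed if a stored prefix must be told apart from a complete sentence.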
Trie-Based Classifier
[Figure: a trie of example questions running from ^ (boq) to $ (eoq), with numbered nodes and branches over words such as "who", "where", "how", "is", "the", "dean", "of", "ICS", "Chile", "Athens", "far", "tall", "Sting", "J." and "Smith"; nodes carry tables with the columns EAT and Freq, holding entries such as HUM:DESC and HUM:IND.]
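A trie whose nodes carry class frequencies, as in the figure, could be sketched as follows. The `Node` class, the training questions and their labels are hypothetical, and `classify_exact` is a plain exact-path lookup written for illustration; it is not claimed to be the Strict or Flex variant from the talk.

```python
from collections import Counter

BOQ, EOQ = "^", "$"  # begin / end of question markers, as on the slide


class Node:
    def __init__(self):
        self.children = {}        # word -> Node
        self.classes = Counter()  # class label -> frequency


def add_question(root, words, label):
    """Insert one labelled question, counting the label at every node."""
    node = root
    for word in [BOQ, *words, EOQ]:
        node = node.children.setdefault(word, Node())
        node.classes[label] += 1


def classify_exact(root, words):
    """Return the most frequent class on an exactly matching path, else None."""
    node = root
    for word in [BOQ, *words, EOQ]:
        node = node.children.get(word)
        if node is None:
            return None
    return node.classes.most_common(1)[0][0]


if __name__ == "__main__":
    root = Node()
    # Hypothetical training data, loosely modelled on the figure.
    add_question(root, "who is Sting".split(), "HUM:DESC")
    add_question(root, "who is the dean of ICS".split(), "HUM:IND")
    print(classify_exact(root, "who is Sting".split()))
    print(root.children[BOQ].children["who"].classes)
```

Keeping a frequency table at every node, not only at the leaves, means a partial match still has class information to fall back on.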
Trie-Based Classifier
• Look-ahead process:
[Figure: the question "^ who is the prime minister of Australia $" is matched against the trie path "^ (boq) who is the dean of ICS $ (eoq)" (nodes 1, 6 to 12); "? ?" marks the positions with no direct match.]
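One possible reading of the look-ahead process is sketched below. The slide shows only the picture, so the skipping policy is an assumption: on a mismatch, the sketch skips question words until one of them reappears at most `max_skip` edges below the current node, and resumes from there. The nested-dictionary trie, the `"#class"` leaf key and all function names are hypothetical.

```python
LABEL = "#class"  # hypothetical key that stores the class at a leaf


def add_path(trie, words, label):
    """Store one labelled question as a path ^ w1 ... wn $."""
    node = trie
    for word in ["^", *words, "$"]:
        node = node.setdefault(word, {})
    node[LABEL] = label


def descendants(node, depth):
    """Yield the node and every sub-trie at most `depth` edges below it."""
    yield node
    if depth > 0:
        for key, child in node.items():
            if key != LABEL:
                yield from descendants(child, depth - 1)


def classify_lookahead(trie, words, max_skip=2):
    """Follow the question through the trie, looking ahead on mismatches."""
    tokens = ["^", *words, "$"]
    node, i = trie, 0
    while i < len(tokens):
        if tokens[i] in node:          # direct match: follow the edge
            node, i = node[tokens[i]], i + 1
            continue
        resumed = False                # mismatch: look ahead
        for j in range(i + 1, len(tokens)):
            for sub in descendants(node, max_skip):
                if tokens[j] in sub:   # first place the path can resume
                    node, i, resumed = sub[tokens[j]], j + 1, True
                    break
            if resumed:
                break
        if not resumed:
            return None
    return node.get(LABEL)


if __name__ == "__main__":
    trie = {}
    add_path(trie, "who is the dean of ICS".split(), "HUM:IND")
    print(classify_lookahead(
        trie, "who is the prime minister of Australia".split()))
```

In the example, "prime minister" is skipped against "dean" and "Australia" against "ICS", so the unseen question reuses the class stored for the known one. Taking the first resumption point is a simplification; a fuller version would weigh the alternatives.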
Implementations
• ABL:
  • Hypo / Unhypo
  • Words / POS
  • default / prior
• Trie-based:
  • Strict / Flex
  • Words / POS
Implementations
• ABL, default / prior. Example: What is a mobile phone?
  • default: 4: DESC, 1: LOC
  • prior: 2: DESC, 1: LOC
Implementations
• Trie-based, Words:
[Figure: "^ who is the prime minister of Australia $" matched against the trie path "^ (boq) who is the dean of ICS $ (eoq)" (nodes 1, 6 to 12), with "? ?" at the unmatched positions.]
Implementations
• Trie-based, POS:
[Figure: "^ (boq) who/WP is/VBZ the/DT prime/JJ minister/NN of/IN Australia/NNP $ (eoq)" matched against the trie path "^ (boq) who/WP is/VBZ the/DT dean/NN of/IN ICS/NNP $ (eoq)" (nodes 1, 6 to 12), with "? ?" at the unmatched positions.]
Results
Concluding Remarks
• Numeric results are not better than those of ML approaches.
• We showed that induced structure can obtain good results without using complex linguistic features.
• These approaches can produce rules in the form of regular expressions that can be manually adjusted to better fit the problem.
Future Work
• Regular expressions can be improved:
  • Hand-tuning unique REs found by ABL
  • Increasing the complexity of REs by incorporating extra information
• Wildcard matches:
  • The matched words tend to be semantically related;
  • They seem to be the focus words of the questions.
Review
• Sentence Classification Problem
• Induced Structure Approach
  • Alignment Based Learning
  • Trie Based Classifier
• Results
• Concluding Remarks
• Future Work