
Classifying Sentences using Induced Structure



Presentation Transcript


  1. Classifying Sentences using Induced Structure
     Menno Van Zaanen, Luiz Augusto Pizzato, Diego Mollá-Aliod (pizzato@ics.mq.edu.au)
     Centre for Language Technology, Macquarie University, Sydney, Australia

  2. Overview
     • Sentence Classification Problem
     • Induced Structure Approach
     • Alignment Based Learning
     • Trie Based Classifier
     • Results
     • Concluding Remarks
     • Future Work
     Van Zaanen, Pizzato, Molla; SPIRE-2005. Buenos Aires, 2-4 November 2005.

  3. Sentence Classification
     • Assists several NLP tasks: document summarisation, information extraction, question answering, among others.
     • Question Classification:
         • Definition: What is a golden parachute?
         • List: Name two brands of shaving cream.
     • Factoid questions:
         • HUM:IND: Who discovered penicillin?
         • LOC:CITY: What is the capital of Australia?
         • FOOD, PLANT, ANIMAL: What do bats eat?

  4. Current approaches
     • Handcrafted regular expressions:
         • Pros: Rules are understandable. A few rules cover a large share of the questions (Zipf's law).
         • Cons: Difficult to construct. Limited performance.
     • Machine Learning:
         • Pros: The computer finds the "rules" automatically.
         • Cons: The rules and knowledge generated are not readable.

  5. Classifying by Induced Structure
     [Figure: pipeline diagram with the labels Training Data, Extract Structure, Structure, Sentence, Sentence Classifier and Class]
     • The process fits between machine learning (ML) and regular expressions (RE):
         • Learn patterns from sentences;
         • Use these patterns in the classification phase.

  6. Classifying by Induced Structure
     • Two distinct approaches are proposed:
         • Alignment-Based Learning Classifier (ABL)
             • ABL is a generic grammatical inference framework that learns structure from plain text.
         • Trie-Based Classifier
             • Classifies sentences based on partial matches in a trie structure.

  7. Alignment-Based Learning Classifier (ABL)
     • Based on the idea that constituents in sentences can be interchanged.
         • The book is on the table.
         • The car is on the driveway.

  8. Alignment-Based Learning Classifier (ABL)
     • Based on the idea that constituents in sentences can be interchanged.
         • The (book) is on the (table).
         • The (car) is on the (driveway).
     [Figure: the shared words "the ... is on the ..." aligned, with "book"/"car" and "table"/"driveway" as the interchangeable parts]
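
The alignment idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the ABL implementation used by the authors: it assumes whitespace tokenisation and borrows the standard library's `difflib.SequenceMatcher` as the aligner, and the function name `align` is made up for this example. Spans that differ between the two sentences are bracketed as hypothesised constituents.

```python
from difflib import SequenceMatcher


def align(s1, s2):
    """Bracket the spans in which two sentences differ.

    Shared word sequences are kept as they are; each unequal span is
    wrapped in parentheses as a hypothesised (interchangeable) constituent.
    """
    a, b = s1.split(), s2.split()
    out_a, out_b = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out_a.extend(a[i1:i2])
            out_b.extend(b[j1:j2])
        else:
            # 'replace', 'delete' or 'insert': bracket whatever is present
            if i2 > i1:
                out_a.append("(" + " ".join(a[i1:i2]) + ")")
            if j2 > j1:
                out_b.append("(" + " ".join(b[j1:j2]) + ")")
    return " ".join(out_a), " ".join(out_b)


first, second = align("The book is on the table", "The car is on the driveway")
print(first)   # The (book) is on the (table)
print(second)  # The (car) is on the (driveway)
```

Replacing each bracketed span with a wildcard would turn the shared part into a pattern of the kind the later slides describe as a regular expression.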

  9. Alignment-Based Learning Classifier (ABL)

  10. Trie-Based Classifier
     [Figure: trie whose nodes branch over the alphabet a|b|c|...|z, with the words "car" and "zebra" as example paths]
     • T(S) = {T(S/a_1), T(S/a_2), …, T(S/a_r)}
     • where S is the set of sentences and S/a_n is the set of sentences starting with a_n, stripped of that initial element.
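
The recursive definition of T(S) above maps directly onto a nested dictionary. The sketch below is one possible reading, not the authors' code: `build_trie` is a name chosen for this example, and it accepts any sequences (strings of characters, as in the "car"/"zebra" figure, or lists of words).

```python
def build_trie(sentences):
    """Build T(S) as a nested dict: each first element a_i maps to T(S/a_i)."""
    groups = {}
    for sent in sentences:
        if sent:  # an exhausted sequence contributes no further branch
            # S/a_i: sequences starting with a_i, stripped of that element
            groups.setdefault(sent[0], []).append(sent[1:])
    return {head: build_trie(tails) for head, tails in groups.items()}


# Characters as elements, as in the "car" / "zebra" figure
print(build_trie(["car", "cat", "zebra"]))

# Words as elements, as needed for sentence classification
questions = ["who is the dean of ICS", "who is Sting"]
print(build_trie([q.split() for q in questions]))
```

In this bare form a sequence that is a prefix of another leaves no trace of where it ended; the explicit end-of-question marker `$ (eoq)` shown on the following slides addresses that.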

  11. Trie-Based Classifier
     [Figure: trie built from example questions, starting at ^ (boq) and ending at $ (eoq), with numbered nodes for words such as "who", "where", "how", "is", "the", "dean", "of", "ICS", "Sting", "J.", "Smith", "Athens", "Chile", "far" and "tall"; nodes carry tables of expected answer type (EAT) frequencies, with entries for HUM:DESC and HUM:IND]

  12. Trie-Based Classifier
     • Look-ahead process:
     [Figure: the question "^ who is the prime minister of Australia $" matched against the trie path "^ (boq) who is the dean of ICS $ (eoq)" (nodes 1, 6-12); two positions are marked "?"]
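
The slides do not spell out how the look-ahead works, so the following Python sketch is an assumption-laden illustration rather than the authors' algorithm. It assumes that each trie node stores EAT frequencies, and that on a mismatch the classifier may skip one trie node and one or more question words until the path can be rejoined. The names `Node`, `train`, `look_ahead` and `classify`, and the three training questions, are all invented for the example.

```python
from collections import Counter


class Node:
    def __init__(self):
        self.children = {}    # word -> Node
        self.eat = Counter()  # expected-answer-type frequencies at this node


def tokenize(question):
    # ^ and $ stand for the (boq) and (eoq) markers of the slides
    return ["^"] + question.lower().rstrip("?").split() + ["$"]


def train(root, question, label):
    node = root
    for tok in tokenize(question):
        node = node.children.setdefault(tok, Node())
        node.eat[label] += 1


def look_ahead(node, tokens, i):
    """Assumed strategy: skip one trie node and one or more question words."""
    for j in range(i + 1, len(tokens)):
        for child in node.children.values():
            nxt = child.children.get(tokens[j])
            if nxt is not None:
                return nxt, j + 1
    return None


def classify(root, question):
    tokens = tokenize(question)
    node, i = root, 0
    while i < len(tokens):
        nxt = node.children.get(tokens[i])
        if nxt is not None:
            node, i = nxt, i + 1
            continue
        jump = look_ahead(node, tokens, i)
        if jump is None:
            break  # no way to rejoin the trie: decide at the current node
        node, i = jump
    return node.eat.most_common(1)[0][0] if node.eat else None


root = Node()
train(root, "Who is the dean of ICS?", "HUM:IND")
train(root, "Who is Sting?", "HUM:DESC")
train(root, "Where is Athens?", "LOC")
print(classify(root, "Who is the prime minister of Australia?"))  # HUM:IND
```

In the usage example, "prime minister" is skipped against the trie word "dean" and "Australia" against "ICS", which mirrors the positions marked "?" in the figure.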

  13. Implementations
     • ABL: Hypo / Unhypo; Words / POS; default / prior
     • Trie-based: Strict / Flex; Words / POS

  16. Implementations
     • ABL: Hypo / Unhypo; Words / POS; default / prior
     • Trie-based: Strict / Flex; Words / POS
     • Example: What is a mobile phone?
         • default: 4: DESC, 1: LOC
         • prior: 2: DESC, 1: LOC

  17. Implementations
     • ABL: Hypo / Unhypo; Words / POS; default / prior
     • Trie-based: Strict / Flex; Words / POS
     [Figure: the question "^ who is the prime minister of Australia $" matched against the trie path "^ (boq) who is the dean of ICS $ (eoq)" (nodes 1, 6-12); two positions are marked "?"]

  18. Implementations
     • ABL: Hypo / Unhypo; Words / POS; default / prior
     • Trie-based: Strict / Flex; Words / POS
     [Figure: the same example with part-of-speech tags: "^ who/WP is/VBZ the/DT prime/JJ minister/NN of/IN Australia/NNP $" matched against the trie path "^ (boq) who/WP is/VBZ the/DT dean/NN of/IN ICS/NNP $ (eoq)" (nodes 1, 6-12); two positions are marked "?"]

  19. Results

  20. Concluding Remarks
     • The numeric results are not better than those of ML.
     • Induced structure can obtain good results without using complex linguistic features.
     • These approaches can produce rules in the form of regular expressions that can be manually adjusted to better fit the problem.

  21. Future Work
     • Regular expressions can be improved:
         • Hand-tuning unique REs found by ABL
         • Increasing the complexity of REs by incorporating extra information
     • Wildcard match:
         • The matched words tend to be semantically related;
         • They seem to be the focus words of the questions.

  22. Review
     • Sentence Classification Problem
     • Induced Structure Approach
     • Alignment Based Learning
     • Trie Based Classifier
     • Results
     • Concluding Remarks
     • Future Work
