
Relational Learning of Pattern-Match Rules for Information Extraction




Presentation Transcript


  1. Relational Learning of Pattern-Match Rules for Information Extraction
     Presentation by Tim Chartrand of a paper by Mary Elaine Califf and Raymond J. Mooney

  2. Introduction
     • Information Extraction (IE) is the task of locating specific pieces of information in natural-language (NL) text
     • IE is an important subpart of text understanding
     • IE systems are difficult and time-consuming to build, and they don't port well to different domains
     • Researchers are combining machine-learning methods with NLP methods to automate the construction of IE systems

  3. Overview of RAPIER
     • RAPIER: Robust Automated Production of Information Extraction Rules
     • Learns IE rules automatically
     • Learns from a corpus of documents paired with filled templates
     • The resulting rules require neither prior parsing nor subsequent processing
     • Uses limited syntactic information from a part-of-speech (POS) tagger
     • Induced patterns incorporate semantic classes
     • Rules characterize both slot fillers and their surrounding context

  4. RAPIER Rules
     • Consist of three parts:
        • Pre-filler pattern: matches the text immediately preceding the extracted information
        • Filler pattern: matches the exact text to be extracted
        • Post-filler pattern: matches the text immediately following the extracted information
     • Each pattern is a sequence of pattern items or pattern lists
        • A pattern item specifies constraints on exactly one word or symbol
        • A pattern list specifies constraints on 0 to n words or symbols
     • Constraints include:
        • A list of words, one of which must match the item
        • A POS tag
        • A semantic class
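The three-part rule structure above can be sketched as a small Python matcher. This is a minimal illustration under stated assumptions, not RAPIER's implementation: it assumes tokens arrive as (word, POS tag) pairs using Penn Treebank tags, it omits semantic-class constraints, and the names `Element`, `item`, `plist`, `match_ends`, and `Rule` are hypothetical. An empty constraint set stands for "no constraint".

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set, Tuple

# Assumption: each token is a (word, POS tag) pair, e.g. ("Atlanta", "NNP").
Token = Tuple[str, str]


@dataclass(frozen=True)
class Element:
    """A pattern item (exactly one token) or a pattern list (0..max_len tokens).

    An empty constraint set means "unconstrained"; semantic classes are omitted.
    """
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()
    max_len: int = 1
    is_list: bool = False

    def accepts(self, token: Token) -> bool:
        word, tag = token
        return (not self.words or word in self.words) and (not self.tags or tag in self.tags)


def item(words=(), tags=()) -> Element:
    """Hypothetical helper: a pattern item constraining exactly one token."""
    return Element(frozenset(words), frozenset(tags))


def plist(max_len: int, words=(), tags=()) -> Element:
    """Hypothetical helper: a pattern list matching 0..max_len tokens."""
    return Element(frozenset(words), frozenset(tags), max_len, True)


def match_ends(pattern: Sequence[Element], tokens: Sequence[Token], start: int) -> Set[int]:
    """Every index e such that `pattern` matches tokens[start:e] exactly."""
    ends = {start}
    for el in pattern:
        next_ends: Set[int] = set()
        for pos in ends:
            if el.is_list:
                next_ends.add(pos)  # a pattern list may match zero tokens
                k = pos
                while k < len(tokens) and k - pos < el.max_len and el.accepts(tokens[k]):
                    k += 1
                    next_ends.add(k)
            elif pos < len(tokens) and el.accepts(tokens[pos]):
                next_ends.add(pos + 1)
        ends = next_ends
    return ends


@dataclass
class Rule:
    pre: List[Element]     # must end immediately before the filler
    filler: List[Element]  # the text to extract
    post: List[Element]    # must start immediately after the filler

    def extract(self, tokens: Sequence[Token]) -> List[str]:
        found = []
        for fs in range(len(tokens) + 1):
            # Some start s must let the pre-filler end exactly at filler start fs.
            if not any(fs in match_ends(self.pre, tokens, s) for s in range(fs + 1)):
                continue
            for fe in sorted(match_ends(self.filler, tokens, fs)):
                # Non-empty filler, followed directly by a post-filler match.
                if fe > fs and match_ends(self.post, tokens, fe):
                    found.append(" ".join(w for w, _ in tokens[fs:fe]))
        return found


# Hypothetical rule: "in" + up to two proper nouns + "," + a proper noun.
rule = Rule(
    pre=[item(words={"in"})],
    filler=[plist(2, tags={"NNP"})],
    post=[item(words={","}), item(tags={"NNP"})],
)
toks = [("offices", "NNS"), ("in", "IN"), ("Kansas", "NNP"), ("City", "NNP"),
        (",", ","), ("Missouri", "NNP"), (".", ".")]
print(rule.extract(toks))  # ['Kansas City']
```

The pattern list in the filler is what lets one rule cover both one-word and two-word city names; a fixed sequence of pattern items could not.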

  5. RAPIER Rules (cont.)

  6. Learning Algorithm
     Example contexts: "located in Atlanta, Georgia." and "offices in Kansas City, Missouri."

     For each slot S in the template being learned
        SlotRules = most specific rules for S from the example documents
        while compression has failed fewer than lim times
           randomly select r pairs of rules from SlotRules
           find the set L of generalizations of the fillers of the rule pairs
           create rules from L, evaluate them, and initialize RuleList
           let n = 0
           while the best rule in RuleList produces spurious fillers
                 and the weighted information value of the best rule is improving
              increment n
              specialize each rule in RuleList with generalizations of the last n items
                 of the pre-filler patterns of the rule pair, and add the specializations to RuleList
              specialize each rule in RuleList with generalizations of the first n items
                 of the post-filler patterns of the rule pair, and add the specializations to RuleList
           if the best rule in RuleList produces only valid fillers
              add it to SlotRules
              remove empirically subsumed rules
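The "find the set L of generalizations of the fillers" step can be illustrated with a minimal sketch of pairwise item generalization. It assumes a simple strategy in which two differing constraints generalize either to their disjunction or to no constraint at all; it handles only word and POS-tag constraints on equal-length patterns (no pattern lists, no semantic classes), so generalizing "Atlanta" with "Kansas City" is outside its scope. The names `generalize_constraint`, `generalize_items`, and `generalize_patterns` are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

# Hypothetical simplified pattern item: (word constraint, POS-tag constraint).
# An empty frozenset means "unconstrained".
Item = Tuple[FrozenSet[str], FrozenSet[str]]
ANY: FrozenSet[str] = frozenset()


def generalize_constraint(a: FrozenSet[str], b: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Candidate generalizations of one constraint taken from two items."""
    if a == b:
        return [a]
    if a == ANY or b == ANY:
        return [ANY]  # one side is already unconstrained
    return [a | b, ANY]  # either the disjunction, or drop the constraint


def generalize_items(x: Item, y: Item) -> List[Item]:
    """Every combination of generalized word and tag constraints."""
    return list(product(generalize_constraint(x[0], y[0]),
                        generalize_constraint(x[1], y[1])))


def generalize_patterns(p: List[Item], q: List[Item]) -> List[List[Item]]:
    """Position-by-position generalization of two equal-length patterns."""
    if len(p) != len(q):
        raise ValueError("this sketch only handles equal-length patterns")
    options = [generalize_items(x, y) for x, y in zip(p, q)]
    return [list(combo) for combo in product(*options)]


georgia = (frozenset({"Georgia"}), frozenset({"NNP"}))
missouri = (frozenset({"Missouri"}), frozenset({"NNP"}))
for gen in generalize_items(georgia, missouri):
    print(sorted(gen[0]), sorted(gen[1]))
# ['Georgia', 'Missouri'] ['NNP']
# [] ['NNP']
```

In this sketch each differing constraint doubles the number of candidates, so the candidate set grows exponentially with pattern length.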

  7. Experimental Results
     • The task: extract information from computer-related job postings
     • 17 slots, including employer, salary, etc.
     • Reported results do not use semantic categories
     • Dataset: 100 documents with filled templates, evaluated with 10-fold cross-validation
     • Measured precision, recall, and F-measure
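The evaluation setup above can be sketched as follows. This is a hedged illustration, not the authors' scoring code: it assumes fillers are compared as exact strings keyed by (document, slot, filler), pools counts over all documents (micro-averaging), takes the F-measure to be the balanced F1, scores an empty denominator as 0.0, and splits the documents into 10 random folds. `prf`, `k_fold_splits`, and the sample fillers are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, Iterable, Iterator, List, Sequence, Tuple


def prf(extracted: Iterable[Hashable], correct: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure over pooled slot fillers."""
    ext, gold = Counter(extracted), Counter(correct)
    tp = sum((ext & gold).values())  # fillers both extracted and correct
    n_ext, n_gold = sum(ext.values()), sum(gold.values())
    precision = tp / n_ext if n_ext else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def k_fold_splits(doc_ids: Sequence[str], k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) document lists; each document is tested exactly once."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


# Hypothetical fillers keyed by (document, slot, filler text).
extracted = [("doc1", "employer", "IBM"), ("doc1", "salary", "40K")]
correct = [("doc1", "employer", "IBM"), ("doc1", "salary", "50K"), ("doc2", "employer", "Acme")]
p, r, f = prf(extracted, correct)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.333 0.4
```

With 100 documents and k = 10, every test fold holds 10 documents and the corresponding training set holds the other 90.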

  8. Experimental Results (cont.)
     • Performance:
        • Is comparable to Crystal's on a medical domain
        • Is better than AutoSlog's and AutoSlog-TS's on the MUC-4 terrorism task
        • Is hard to compare directly, because the systems were tested on different domains
        • Is good, given that precision is the most important measure

  9. Related Work
     • Resolve
        • Uses decision trees
        • Uses annotated coreference examples
     • Crystal
        • Uses a clustering algorithm to build a dictionary of extraction patterns
        • Requires patterns identified by an expert
        • Requires prior syntactic analysis to identify syntactic elements and their relationships
     • AutoSlog
        • Specializes a set of general syntactic patterns
        • Requires an expert to examine the patterns it produces
        • Requires prior syntactic analysis
     • Liep
        • Requires prior syntactic analysis
        • Makes no real use of semantic information
        • Has not been applied to complex domains

  10. Related Work – BYU DEG
     • RAPIER rules correspond closely to DEG data frames
     • Data frames are finer-grained, based on character patterns, whereas RAPIER rules are based on word patterns
     • Pre-filler and post-filler patterns correspond closely to data-frame contexts and keywords
     • Semantic categories correspond closely to lexicons
     • The paper does not say how RAPIER handles multiple-record documents
     • RAPIER's data structure is given by the template (slots) defined in the input data
     • RAPIER is very similar in purpose to what Joe is trying to do: learn extraction rules from filled-in forms

  11. Conclusions
     • Extracting desired pieces of information from NL text is important
     • Manually constructing IE systems is too hard
     • RAPIER uses relational learning to build a set of pattern-match rules from a database of texts and filled templates
     • Learned patterns use syntactic and semantic information to match slot fillers and their context
     • Fairly accurate results can be obtained on a real-world problem with relatively small datasets
     • RAPIER compares favorably with other IE learning systems
