
Machine Learning for Information Extraction

Machine Learning for Information Extraction. Li Xu. Objective: learn how to apply machine learning concepts to an application, and learn how to improve the performance of an existing application by applying machine learning algorithms.


Presentation Transcript


  1. Machine Learning for Information Extraction Li Xu

  2. Objective • Learn how to apply machine learning concepts to an application • Learn how to improve the performance of an existing application by applying machine learning algorithms

  3. Introduction • Information Extraction (IE) is concerned with extracting the relevant data from a collection of documents. • Key component: extraction patterns. • Machine learning algorithms.
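
To make the notion of an extraction pattern concrete, here is a minimal sketch of a hand-written pattern for a rental ad. The pattern, the field names (`bedrooms`, `price`), and the sample ad are hypothetical illustrations, not rules taken from any of the systems in this presentation; a learning-based IE system would try to induce patterns of this kind from annotated examples instead of having a person write them.

```python
import re

# Hypothetical extraction pattern: "<n> br ... $<price>".
# Written by hand for illustration; not a rule from any cited system.
RENTAL_PATTERN = re.compile(
    r"(?P<bedrooms>\d+)\s*(?:br|bdrm|bedroom)s?\b.*?\$\s*(?P<price>\d[\d,]*)",
    re.IGNORECASE | re.DOTALL,
)


def extract_rental(text):
    """Return a dict with the bedroom count and price, or None if no match."""
    match = RENTAL_PATTERN.search(text)
    if match is None:
        return None
    return {
        "bedrooms": int(match.group("bedrooms")),
        # Drop thousands separators before converting to an integer.
        "price": int(match.group("price").replace(",", "")),
    }


if __name__ == "__main__":
    ad = "Capitol Hill - 2 br twnhme. D/W W/D. Pkg incl $1,250."
    print(extract_rental(ad))
```

A pattern like this fills two slots at once, but only when the ad follows the expected word order, which is why the quality of the patterns is the key component of an IE system.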

  4. IE for Free Text • Syntactic and semantic constraints • AutoSlog • LIEP • PALKA • CRYSTAL • CRYSTAL + Webfoot • HASTEN

  5. IE from Online Documents • WHISK (Soderland 1998) • Domain: rental ads • Precision: ~95%; Recall: 73%-90% • RAPIER (Califf & Mooney 1997) • Domain: software jobs • Precision: 84%; Recall: 53% • SRV (Freitag 1998) • Domain: seminar announcements • Precision: speaker, 75%; location, 75%; start time, 99%; end time, 96%.
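
The figures above use the standard definitions: precision is the fraction of extracted items that are correct, and recall is the fraction of the true items that were extracted. The sketch below assumes extractions and answer keys are represented as sets of (slot, value) pairs; the function name and the sample data are illustrative and are not taken from the cited evaluations.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for two collections of (slot, value) pairs."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example: 4 items extracted, 3 of them correct, 5 items in the key.
    gold = {
        ("speaker", "A. Smith"),
        ("location", "Room 101"),
        ("start_time", "3:00 pm"),
        ("end_time", "4:00 pm"),
        ("title", "Learning to Extract"),
    }
    extracted = {
        ("speaker", "A. Smith"),
        ("location", "Room 101"),
        ("start_time", "3:00 pm"),
        ("end_time", "5:00 pm"),
    }
    print(precision_recall(extracted, gold))
```

A system can trade one measure for the other: extracting only when a rule is very specific tends to raise precision and lower recall.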

  6. WHISK

  7. RAPIER

  8. SRV

  9. Problems • Bottom-up search • RAPIER • WHISK • Single-slot extraction rules • SRV • RAPIER • Heavy dependence on layout patterns

  10. Obituary Ontology

  11. Improvement

  12. Lexical Object • Relational Learning • FOIL • Feature design • Regular expression • Rote Learning
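
Two of the techniques listed for lexical objects can be sketched briefly. A regular expression recognizes values with a predictable surface form, and rote learning memorizes the exact strings seen in training and looks for them verbatim in new text. The class, the labels, and the obituary-style sample data below are hypothetical and only illustrate the general idea; they are not the presentation's implementation.

```python
import re

# Regular expression for a lexical object with a fixed form: a full date.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)


class RoteLearner:
    """Memorize training phrases and find them verbatim in new text."""

    def __init__(self):
        self.memory = {}  # lowercased phrase -> label

    def train(self, examples):
        for phrase, label in examples:
            self.memory[phrase.lower()] = label

    def extract(self, text):
        lowered = text.lower()
        # Report every memorized phrase that occurs in the text.
        return [(label, phrase)
                for phrase, label in self.memory.items()
                if phrase in lowered]


if __name__ == "__main__":
    learner = RoteLearner()
    learner.train([
        ("Walker Mortuary", "FuneralHome"),       # hypothetical training data
        ("Provo City Cemetery", "Cemetery"),
    ])
    text = "He died on March 3, 1998. Services will be held at Walker Mortuary."
    print(DATE_PATTERN.findall(text))
    print(learner.extract(text))
```

Rote learning can only recognize strings it has already seen, so it suits closed vocabularies, while regular expressions generalize to unseen values that share a format.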

  13. Multi-slot Hierarchy

  14. Multi-slot Boundary • Relational Learning • Feature Design • Individual heuristics • Combining heuristics

  15. Conclusion • How can machine learning algorithms be applied to IE? • What are the problems with each system? • How can an existing IE approach be improved through machine learning, and how can the problems that appear in other machine-learning-based IE systems be avoided?
