Machine Learning for Information Extraction

Li Xu

Objective
• Learn how to apply machine learning concepts to information extraction (IE)
• Learn how to improve the performance of existing IE systems by applying machine learning algorithms

Introduction
• Information Extraction (IE) is concerned with extracting relevant data from a collection of documents.
• Key component: extraction patterns.
• Machine learning algorithms: used to learn extraction patterns from training examples.

IE for Free Text
• Syntactic and semantic constraints
• AutoSlog
• LIEP
• PALKA
• CRYSTAL
• CRYSTAL + Webfoot
• HASTEN

IE from Online Documents
• WHISK (Soderland 1998)
  • Domain: rental ads
  • Precision: ~95%; Recall: 73%-90%
• RAPIER (Califf & Mooney 1997)
  • Domain: software jobs
  • Precision: 84%; Recall: 53%
• SRV (Freitag 1998)
  • Domain: seminar announcements
  • Precision: speaker 75%; location 75%; start time 99%; end time 96%
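The precision and recall figures above follow the usual definitions: precision is the fraction of extracted fillers that are correct, and recall is the fraction of the correct fillers in the documents that were actually extracted. A minimal sketch, assuming exact-match scoring of (document, slot, value) triples; the seminar-style data are hypothetical, and the cited papers differ in their scoring details.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted slot fillers against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical fillers as (document id, slot, value) triples.
gold = {
    (1, "speaker", "Dr. Smith"),
    (1, "location", "Room 101"),
    (2, "speaker", "J. Doe"),
    (2, "location", "Main Hall"),
}
extracted = {
    (1, "speaker", "Dr. Smith"),
    (1, "location", "Room 101"),
    (2, "speaker", "Main Hall"),  # wrong filler for the speaker slot
}
p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

With 2 correct fillers out of 3 extracted and 4 in the gold standard, this prints `precision=0.67 recall=0.50`.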

WHISK
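WHISK rules look much like regular expressions: literals, semantic classes such as Digit or Number, and a `*` wildcard that skips ahead to the next term, with the parenthesised elements extracted into slots. As a sketch, a WHISK-style rental-ad rule `* ( Digit ) 'BR' * '$' ( Number )` is hand-translated below into a Python regex. The translation (lazy `.*?` for `*`, `\d` for Digit, `\d+` for Number) and the ad text are assumptions; WHISK itself learns such rules from tagged examples rather than having them written by hand.

```python
import re

# Hypothetical hand translation of the WHISK-style rule
#   * ( Digit ) 'BR' * '$' ( Number )
# "*" (skip) becomes a lazy ".*?", Digit becomes "\d", Number becomes "\d+",
# and the literal 'BR' is matched case-insensitively as a whole word.
RENTAL_RULE = re.compile(r"(\d)\s*BR\b.*?\$(\d+)", re.IGNORECASE | re.DOTALL)

def extract_rentals(ad):
    """Each match of the rule fills one multi-slot Rental record."""
    return [
        {"Bedrooms": int(m.group(1)), "Price": int(m.group(2))}
        for m in RENTAL_RULE.finditer(ad)
    ]

ad = "Downtown - 1 br apt, pkg incl $675. 3 BR house near park $995."
print(extract_rentals(ad))
```

Because one rule fills both Bedrooms and Price, the two values from the same listing stay together in one record, which is the multi-slot behavior that single-slot rules lack.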

RAPIER
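A RAPIER rule consists of three patterns: a pre-filler pattern that must match the text immediately before the slot filler, a filler pattern, and a post-filler pattern that must match immediately after it; pattern elements can constrain the word, part-of-speech tag, or semantic class of a token. The sketch below assumes only word constraints and fixed-length patterns (RAPIER also allows variable-length pattern lists); the class names `Elem` and `Rule` and the job-posting sentence are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Elem:
    """One pattern element: accepts a token whose lowercase form is in `words`.
    An empty tuple means no word constraint (any token)."""
    words: tuple = ()

    def ok(self, tok):
        return not self.words or tok.lower() in self.words

@dataclass
class Rule:
    slot: str
    pre: list     # pre-filler pattern: must end right before the filler
    filler: list  # filler pattern: one element per filler token
    post: list    # post-filler pattern: must start right after the filler

def seq_match(pattern, toks):
    """True if every pattern element accepts the token aligned with it."""
    return len(pattern) == len(toks) and all(e.ok(t) for e, t in zip(pattern, toks))

def apply_rule(rule, tokens):
    """Return every filler string the rule extracts from `tokens`."""
    n, k, m = len(rule.pre), len(rule.filler), len(rule.post)
    fills = []
    for i in range(n, len(tokens) - k - m + 1):
        if (seq_match(rule.pre, tokens[i - n:i])
                and seq_match(rule.filler, tokens[i:i + k])
                and seq_match(rule.post, tokens[i + k:i + k + m])):
            fills.append(" ".join(tokens[i:i + k]))
    return fills

# Hypothetical single-slot rule for a "city" slot: "located in <word> ,"
CITY = Rule(
    slot="city",
    pre=[Elem(("located",)), Elem(("in",))],
    filler=[Elem()],
    post=[Elem((",",))],
)

tokens = "The position is located in Austin , TX .".split()
print(CITY.slot, apply_rule(CITY, tokens))
```

RAPIER learns such rules bottom-up, starting from very specific rules and generalizing pairs of them; the rule here is hand-written. Each rule fills exactly one slot.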

SRV
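SRV treats extraction as classification: every text fragment (a span of tokens) is a candidate slot filler, and rules built from token features (such as capitalization) and relational features (such as the previous or next token) decide which fragments are fillers. A minimal sketch, assuming a hand-written rule for the speaker slot; the feature helpers, the rule, and the announcement text are hypothetical, and SRV would learn its rules top-down rather than take them as given.

```python
def capitalized(tok):
    """Token feature: the first character is upper case."""
    return tok[:1].isupper()

def prev_token(tokens, i):
    """Relational feature: the token before position i (None at the start)."""
    return tokens[i - 1] if i > 0 else None

def next_token(tokens, end):
    """Relational feature: the token right after a fragment ending at `end`."""
    return tokens[end] if end < len(tokens) else None

def candidate_fragments(tokens, max_len=3):
    """Every span of 1..max_len tokens is a candidate slot filler."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end

def speaker_rule(tokens, start, end):
    """Hypothetical rule: capitalized tokens after 'Speaker :' and before ','."""
    frag = tokens[start:end]
    return (all(capitalized(t) for t in frag)
            and prev_token(tokens, start) == ":"
            and prev_token(tokens, start - 1) == "Speaker"
            and next_token(tokens, end) == ",")

def extract_speaker(tokens):
    """Classify every candidate fragment and keep the positive ones."""
    return [" ".join(tokens[s:e]) for s, e in candidate_fragments(tokens)
            if speaker_rule(tokens, s, e)]

tokens = "Speaker : Jane Smith , Computer Science".split()
print(extract_speaker(tokens))
```

The fragment "Jane" alone is rejected because its next token is not a comma, so only the full name "Jane Smith" is returned.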

Problems
• Bottom-up search
  • RAPIER
  • WHISK
• Single-slot extraction rules
  • SRV
  • RAPIER
• Heavy dependence on layout patterns

Obituary Ontology

Improvement

Lexical Object
• Relational learning
  • FOIL
• Feature design
  • Regular expressions
  • Rote learning
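FOIL, the relational learner listed above, grows a rule one literal at a time and picks the literal with the highest information gain: Gain(L) = t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 are the positive and negative bindings covered before adding L, p1 and n1 those covered after, and t the number of positive bindings still covered. A minimal sketch of that computation; the counts in the example are hypothetical, and t defaults to p1, which is exact when L introduces no new variables.

```python
import math

def foil_gain(p0, n0, p1, n1, t=None):
    """FOIL information gain for adding a literal L to a partial clause.

    p0, n0: positive/negative bindings covered before adding L
    p1, n1: positive/negative bindings covered after adding L
    t:      positive bindings still covered after adding L; defaults to p1,
            which is exact when L introduces no new variables.
    """
    if t is None:
        t = p1
    if p1 == 0:
        return 0.0  # the literal covers no positives, so it is useless
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)

# Hypothetical counts: the clause so far covers 10 positive and 10 negative
# bindings; compare two candidate literals.
gain_a = foil_gain(10, 10, 8, 2)   # purer, but covers fewer positives
gain_b = foil_gain(10, 10, 10, 5)  # covers all positives, less pure
print(round(gain_a, 2), round(gain_b, 2))
```

The first candidate covers fewer positives but is much purer, so it wins (about 5.42 against 4.15).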

Multi-slot Hierarchy

Multi-slot Boundary
• Relational learning
• Feature design
  • Individual heuristics
  • Combining heuristics
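The slide names individual heuristics and their combination without saying how they are combined. As one possible scheme, assumed here rather than taken from the slides, each heuristic assigns a certainty in [0, 1] to every candidate record-separator tag, and the certainties are merged with the certainty-factor rule from MYCIN-style certainty theory, cf = cf1 + cf2 * (1 - cf1). The heuristic names, tags, and scores are hypothetical.

```python
from functools import reduce

def combine_cf(a, b):
    """Combine two non-negative certainty factors in [0, 1]: a + b * (1 - a)."""
    return a + b * (1 - a)

def rank_boundaries(scores_by_heuristic):
    """Combine per-heuristic certainties for each candidate separator.

    scores_by_heuristic maps a heuristic name to {candidate: certainty};
    a heuristic that does not mention a candidate contributes 0.
    """
    candidates = set()
    for scores in scores_by_heuristic.values():
        candidates.update(scores)
    combined = {
        c: reduce(combine_cf, (s.get(c, 0.0) for s in scores_by_heuristic.values()), 0.0)
        for c in candidates
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical heuristics scoring candidate record-separator tags.
scores = {
    "repeating_pattern": {"hr": 0.6, "br": 0.3},
    "separator_tag": {"hr": 0.5, "p": 0.4},
    "ontology_match": {"br": 0.7},
}
for tag, cf in rank_boundaries(scores):
    print(tag, round(cf, 2))
```

The combination rule is commutative and associative, so the order of the heuristics does not matter; here agreement between two moderate heuristics (0.6 and 0.5 for "hr") edges out one strong and one weak vote (0.7 and 0.3 for "br").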

Conclusion
• How can machine learning algorithms be applied to IE?
• What are the problems with each system?
• How can an existing IE approach be improved through machine learning, and how can the problems found in other machine-learning-based IE systems be avoided?
