MEeting Schedule Extractor (MES)
November 25, 2010
Kyoungryol Kim, Semantic Web Research Center, KAIST
Table of Contents
• Introduction
• Related Works
• MES System
• Specific Problems
  • Meeting Location
  • Start Time, End Time
  • Attendee
• Schedule
Introduction
• Schedule management is now one of the most important tasks in daily life, so automatic extraction of schedule information is in strong demand.
• Schedule information can be found in two types of email:
  • Meeting announcement email
    • usually uses a standard format
    • includes all information about the meeting
    • Example: We will have a meeting as follows : Time : Nov. 25, 2010 PM 7:30-9:00 Location : KAIST CS B/D #2441 .....
  • Appointment email (conversation-style)
    • uses spoken language
    • may contain incomplete or uncertain information
    • the schedule can change frequently
    • Example: P1 : ...Are you available on Monday?.. P2 : .. Alright, how about around 3 PM? P1 : .. Dear professor, Then I will be right there at 3.. P2 : ...By the way, can you come together with TA?...
• * Referenced from [DH Choi 2010]
Introduction
• A meeting announcement consists of two parts:
  • itemized text (66%)
  • non-itemized text (34%) : natural language text
• 96% of announcements consist of mixed content (itemized + non-itemized).
• To get accurate information, we should look at both parts.
(* The statistics are based on our corpus.)
[Figure: a meeting announcement split into its two parts]
  • itemized: Time : Nov. 25, 2010 PM 7:30 / Attendee : Prof. Choi, Henry / ...
  • non-itemized: We will have a meeting on Thursday at CS B/D #2441. ...
Introduction
• We assume that information can already be extracted from itemized text, and that each sentence can be classified as itemized or non-itemized.
• In this research, to gather more information, we focus only on the information buried in non-itemized text.
[Figure: Meeting Announcement (itemized: "Time : ....", "Location : ..."; non-itemized: "We will have a meeting on Thursday at CS B/D #2441....") → Extracted Schedule]
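The slides assume a sentence type classifier without describing it. As a minimal sketch of what such a split could look like, the heuristic below treats a line of the form "Label : value" as itemized and everything else as non-itemized. The regular expression and the names `sentence_type` and `split_announcement` are hypothetical illustrations, not the MES implementation, and the sketch works on English examples rather than the Korean corpus.

```python
import re

# Assumption: an itemized line looks like "Label : value", optionally
# preceded by a bullet. The colon must be followed by whitespace so that
# clock times such as "7:30" inside a sentence do not count as a label.
ITEMIZED = re.compile(r"^\s*(?:[•\-*]\s*)?[^\s:][^:]{0,29}?\s*:\s+\S")


def sentence_type(line):
    """Return 'itemized' or 'non-itemized' for one line of an announcement."""
    return "itemized" if ITEMIZED.match(line) else "non-itemized"


def split_announcement(text):
    """Group the non-empty lines of an announcement by sentence type."""
    parts = {"itemized": [], "non-itemized": []}
    for line in text.splitlines():
        if line.strip():
            parts[sentence_type(line)].append(line.strip())
    return parts


if __name__ == "__main__":
    announcement = (
        "• Time : Nov. 25, 2010 PM 7:30\n"
        "• Attendee : Prof. Choi, Henry\n"
        "We will have a meeting on Thursday at CS B/D #2441.\n"
    )
    for kind, lines in split_announcement(announcement).items():
        print(kind, lines)
```

A rule like this only separates the two parts; the non-itemized lines would still need the NER and classifier stages to yield schedule fields.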
Introduction
• Domain
  • Korean meeting announcements
• Task
  • Schedule information extraction for the following information types:
    • Start time, End time
    • Meeting Location
    • Attendee
• Input
  • Meeting announcement email
• Output
  • Extracted information
[Figure: Research goal: Meeting Announcement Email → Extracted Schedule. Possible application: Internet-based calendar (Google Calendar, iCalendar)]
Related Works
• CMRadar [Modi et al. 2005]
  • Personal assistant agent for calendar management, covering everything from natural language processing of incoming scheduling-related emails to making autonomous scheduling decisions.
  • They focused on the design of the system:
    • Template data structure for communication between the components.
    • Modular design.
  • Limitations
    • They only followed existing research on applying state-of-the-art NLP techniques.
    • Some parsing rules were defined specifically for English.
    • They did not consider meeting announcement emails at all.
Related Works
• SIES [Min et al. 2005]
  • Sogang Information Extraction System (SIES)
  • The corpus was Korean email documents for scheduling.
    • 245 emails (23.5 sentences on average)
  • Target information types
    • Attendee, Location, Time, Date
  • Features
    • Context feature : lexico-semantic pattern → to avoid the data sparseness problem
    • Sentence and document features : position, number of occurrences, surrounding words
  • Limitations
    • Overall performance was low except for time and date, and the corpus was too small.
    • Because time and date were not normalized, the output cannot be integrated with a calendar system.
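To illustrate why normalization matters for calendar integration, here is a minimal sketch that turns the time expression from the announcement example ("Nov. 25, 2010 PM 7:30-9:00") into start and end `datetime` values. The pattern, the helper names `to_24h` and `normalize`, and the assumption that the end time shares the AM/PM marker of the start time are all illustrative choices, not part of MES or SIES; Korean time expressions would need their own patterns.

```python
import re
from datetime import datetime

MONTHS = {name: num for num, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Assumed surface form: "<Month>. <day>, <year> <AM|PM> <h>:<mm>[-<h>:<mm>]"
TIME_EXPR = re.compile(
    r"(?P<mon>[A-Za-z]{3})[A-Za-z]*\.?\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})\s+"
    r"(?P<ampm>AM|PM)\s+(?P<h1>\d{1,2}):(?P<m1>\d{2})"
    r"(?:\s*-\s*(?P<h2>\d{1,2}):(?P<m2>\d{2}))?",
    re.IGNORECASE,
)


def to_24h(hour, ampm):
    """Convert a 12-hour clock value to 24-hour form."""
    return hour % 12 + (12 if ampm.upper() == "PM" else 0)


def normalize(text):
    """Return (start, end) datetimes, or None if no expression is found.

    `end` is None when the text gives only a start time.
    """
    m = TIME_EXPR.search(text)
    if m is None or m["mon"].lower() not in MONTHS:
        return None
    year, month, day = int(m["year"]), MONTHS[m["mon"].lower()], int(m["day"])
    start = datetime(year, month, day, to_24h(int(m["h1"]), m["ampm"]), int(m["m1"]))
    end = None
    if m["h2"] is not None:
        # Assumption: the end time uses the same AM/PM marker as the start.
        end = datetime(year, month, day, to_24h(int(m["h2"]), m["ampm"]), int(m["m2"]))
    return start, end


if __name__ == "__main__":
    start, end = normalize("Time : Nov. 25, 2010 PM 7:30-9:00")
    print(start.isoformat(), end.isoformat())
```

Once the values are `datetime` objects they can be serialized in whatever form a calendar service expects. The shared AM/PM assumption breaks for spans that cross noon or midnight, which a fuller normalizer would have to handle.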
MES System
• Input
  • Meeting Announcement (Email)
• Sentence Type Classifier
  • itemized text processing module
  • non-itemized text processing module
    • NER : Time, Location, Person
    • Start/End Time Classifier
    • Meeting Location Classifier
    • Attendee Classifier
• Output
  • Extracted Information : Start time / End time, Meeting Location, Attendee
Classifier : Meeting Location
• Input
  • Location-tagged document
• Output
  • Meeting Location
• Features
  • Start Time
  • Surrounding words
• TODO :
  • Apply position feature for sentences, documents
  • Light syntactic pattern feature (e.g. lexico-syntactic pattern)
[Figure: Classification System: Input → Named Entity Recognition → Meeting Location Classification → Output. Training System: Corpus → Tagging → Training Corpus → Classification Training → Model]
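As a rough sketch of the feature list above, the function below builds a feature dictionary for one location candidate from the words around it, plus a flag for whether a time-tagged entity appears in the same sentence. The slides do not define the "Start Time" feature, so reading it as a co-occurrence flag is an assumption; the tag names `TIME` and `LOC`, the window size, and the function name `location_features` are likewise hypothetical.

```python
def location_features(tagged, i, window=2):
    """Build features for the location candidate at position i.

    `tagged` is one sentence as a list of (token, tag) pairs, where the
    tags come from an upstream NER step (assumed tag set: TIME, LOC, O).
    """
    feats = {}
    # Surrounding-word features: tokens within `window` positions of the
    # candidate, padded at sentence boundaries.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word = tagged[j][0].lower() if 0 <= j < len(tagged) else "<pad>"
        feats[f"w[{off:+d}]"] = word
    # Assumed reading of the "Start Time" feature: does a time expression
    # occur in the same sentence as the candidate?
    feats["time_in_sentence"] = any(tag == "TIME" for _, tag in tagged)
    return feats


if __name__ == "__main__":
    sentence = [
        ("We", "O"), ("will", "O"), ("have", "O"), ("a", "O"),
        ("meeting", "O"), ("on", "O"), ("Thursday", "TIME"),
        ("at", "O"), ("CS B/D #2441", "LOC"),
    ]
    print(location_features(sentence, 8))
```

Dictionaries of this shape can be fed to most off-the-shelf classifiers after vectorization; the position and syntactic pattern features listed under TODO would be added as further keys.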
Other Components
• NER
  • Named entity recognizer specialized for meeting announcements
  • 3 target types : Time / Location / Person
• Classifiers
  • Start / End time classifier for time-type NEs
    • design to be decided later
  • Attendee classifier for person-type NEs
    • design to be decided later
Schedule
• ~December 3, 2010
  • Study and summarize more than 5 related research papers
  • Apply position feature for sentences, documents
  • Light syntactic pattern feature (e.g. lexico-syntactic pattern)
  • F1 > 85% (Meeting Location classifier)
• ~December 31, 2010
  • Study and summarize time extraction papers
  • Design a classifier for start/end time information
  • Construct the system, F1 > 85% (goal)
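For reference, the F1 targets above can be checked with a small helper like the following sketch, which compares predicted items against a gold standard. The function name `precision_recall_f1` and the toy location strings are made up for illustration and are not results from the MES corpus.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: (document id, extracted meeting location) pairs.
    gold = {(1, "CS B/D #2441"), (2, "Room 101"), (3, "Library"), (4, "Lab 3")}
    predicted = {(1, "CS B/D #2441"), (2, "Room 101"), (3, "Library"), (4, "Cafeteria")}
    p, r, f1 = precision_recall_f1(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Here an item counts as correct only on an exact match; a looser matching criterion would change the scores, so the criterion should be fixed before comparing against the 85% goal.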