Detecting Contextual Event Information in Emails. Zubair A. Shaikh (Professor), Shaukat Wasi (PhD Fellow), Jawwad Shamsi (Assistant Professor). National University of Computer and Emerging Sciences (FAST-NU).
Motivation • Email is commonly used to broadcast upcoming events to target groups and to plan events among interested people. • A user receives many emails every day, and it is difficult to give each one due attention. • People keep calendar reminders for important events, but reading an event email and manually copying the event and its contextual information into the calendar is tedious and time-consuming. • Automatically detecting the occurrence (title) and its contextual information (location, temporal information, participants) in an email would significantly help the user manage and plan important events.
Problem Definition • Consider a set of emails, each of which may or may not have an event associated with it. Our goal is to extract the actual occurrence and its contextual information from each email, so that emails containing an event are identified and the event itself is detected. • Event definition: an event is something that occurs in a certain place at a certain time and involves some actor(s). • (We have extended the event definition used in TDT research [3].)
The Proposed Approach • We consider an email to contain an event description if we can extract answers to the following ubiquitous information components: What (Occurrence), Where (Place), When (Time) and Who (Actor). An email lacking any of these components is considered a non-event email. • Finite State Automata (FSA) are used to extract phrases revealing places, temporal information and the actual occurrence. Transitions between states are triggered by Part of Speech (POS) tags. • We have extended the POS tag set for our purpose. Sample tag sets and their corresponding FSA are presented in the next slides.
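The event / non-event rule above can be sketched as a simple completeness check over the four components. This is a minimal illustration, not the authors' implementation; the class name `ExtractedEvent` and its field names are hypothetical, and the extraction step that would fill them is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedEvent:
    """Hypothetical container for the four components extracted from one email."""
    what: Optional[str] = None   # Occurrence
    where: Optional[str] = None  # Place
    when: Optional[str] = None   # Time
    who: Optional[str] = None    # Actor

    def is_event(self) -> bool:
        # An email counts as an event email only if all four
        # components were extracted (none is missing or empty).
        return all([self.what, self.where, self.when, self.who])


complete = ExtractedEvent("Seminar", "CRUC Room", "3:00 PM", "Faculty")
partial = ExtractedEvent(what="Seminar", when="3:00 PM", who="Faculty")
```

Here `complete.is_event()` is true, while `partial.is_event()` is false because the place is missing.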
Extended POS tag set for identification of Place-revealing Phrases
FSA for identification of Places P1 recognizes phrases like “At CRUC Room”. P2 handles phrases like “At CRUC Room in FAST Karachi”. P3 accepts the exceptional style “Venue: FAST Juice Bar”. P4 and P6 recognize phrases with more than one place, like “At FAST Karachi and FAST Islamabad” and “At Karachi, Lahore and Islamabad” respectively. Phrases like “At Degree College, Nazimabad, Karachi” are accepted by P5.
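To make the idea of a tag-driven FSA concrete, here is a minimal sketch that accepts phrases of the P1 and P2 kind. The authors' extended tag set is not reproduced in this text, so the tags `LOCPREP`, `NNP` and `O`, the toy `tag` function (which guesses tags from capitalization) and the state names are all assumptions made for illustration.

```python
# Transition table: (current state, tag of next token) -> next state.
# Tags and states are hypothetical, not the authors' extended tag set.
TRANSITIONS = {
    ("start", "LOCPREP"): "prep",   # "at", "in"
    ("prep", "NNP"): "place",       # first word of a place name
    ("place", "NNP"): "place",      # further words of the same name
    ("place", "LOCPREP"): "prep",   # "... in FAST Karachi" (P2-style)
}
ACCEPTING = {"place"}


def tag(token: str) -> str:
    """Toy tagger: a stand-in for a real POS tagger."""
    if token.lower() in {"at", "in"}:
        return "LOCPREP"
    if token[:1].isupper():
        return "NNP"
    return "O"


def find_place_phrases(tokens):
    """Return the longest accepted phrase starting at each position."""
    phrases, i = [], 0
    while i < len(tokens):
        state, j, last_accept = "start", i, None
        while j < len(tokens):
            nxt = TRANSITIONS.get((state, tag(tokens[j])))
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in ACCEPTING:
                last_accept = j  # remember the longest accepted prefix
        if last_accept is not None:
            phrases.append(" ".join(tokens[i:last_accept]))
            i = last_accept
        else:
            i += 1
    return phrases


tokens = "The seminar is at CRUC Room in FAST Karachi tomorrow".split()
result = find_place_phrases(tokens)
```

With these toy tags, `result` holds the single phrase "at CRUC Room in FAST Karachi". The other patterns (P3 to P6) would need extra tags for the "Venue:" label, commas and conjunctions.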
Extended POS tag set for identification of Date/Time-revealing Phrases
FSA for detection of Date/Day (Duration) • DD1 accepts phrases expressing a duration, like “in first quarter”. • DD2 is for phrases like “in summer”. • Durations spanning a week, month, quarter, etc. are recognized by DD3. • Phrases revealing the start and end points of a duration are recognized by DD4. • DD5 accepts phrases like “in the beginning of the month”.
FSA for detection of Time (Specific Point/Duration) • Exact time phrases like “3:00 AM” are recognized by T1. • T2 accepts the specific time representation style “Time: 3:00 PM-4:00 PM”. • T3 is for phrases like “in two hours”. • TD1 recognizes phrases depicting time durations, like “in evening”. • TD2 is for phrases like “from 3:00 to 4:00”.
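Two of the time patterns above can be approximated with regular expressions, which describe the same class of languages as finite automata. This is only a stand-in under that assumption: the authors' FSA run over POS tags, whereas the sketch below matches raw text, and the names `CLOCK`, `find_times` and `find_range` are hypothetical.

```python
import re

# A clock time such as "3:00" or "3:00 AM" (T1-style).
CLOCK = r"\d{1,2}:\d{2}(?:\s?[AP]M)?"
T1 = re.compile(CLOCK, re.IGNORECASE)

# A range such as "from 3:00 to 4:00" (TD2-style).
TD2 = re.compile(rf"from\s+({CLOCK})\s+to\s+({CLOCK})", re.IGNORECASE)


def find_times(text: str):
    """Return every clock-time phrase found in the text."""
    return T1.findall(text)


def find_range(text: str):
    """Return (start, end) of the first 'from X to Y' range, or None."""
    match = TD2.search(text)
    return match.groups() if match else None


times = find_times("The talk starts at 3:00 AM sharp")
span = find_range("The lab is open from 3:00 to 4:00 today")
```

Here `times` contains "3:00 AM" and `span` is the pair ("3:00", "4:00"). Phrases such as “in two hours” or “in evening” need word-level cues and are not covered by this sketch.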
Canonicalization • Canonicalization represents the extracted temporal information as actual time expressions. • Our representation has 4 digits for the year, 2 digits for the month, 2 digits for the date, 1 digit for the week, 1 digit for the day of the week and 1 digit for the quarter of the year. • A phrase like “in the first quarter of the next year” is converted to the interval <20110101, 20110430> if the email was sent in 2010. Similarly, the phrase “on the first day of the second week of August” is converted to 20100808:2:1:2 if the email was sent in 2010. Both examples are consistent with a “quarter” that spans four months (January to April is the first, and August falls in the second). • The reference point for calculating exact clock times is the creation time of the email.
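The canonical form can be sketched as follows. Several conventions are inferred from the two examples rather than stated in the slides, so treat them as assumptions: the field order after the date is week, day of week, then quarter; the week number counts 7-day blocks from the first of the month; Sunday is day 1; and a "quarter" covers four months. The function names are hypothetical.

```python
from datetime import date


def canonical(d: date) -> str:
    """Format a date as YYYYMMDD:week:day:quarter (assumed field order)."""
    week = (d.day - 1) // 7 + 1       # assumed: 7-day blocks from the 1st
    dow = d.isoweekday() % 7 + 1      # assumed: Sunday=1 ... Saturday=7
    period = (d.month - 1) // 4 + 1   # assumed: four-month "quarters"
    return f"{d:%Y%m%d}:{week}:{dow}:{period}"


def first_quarter_of_next_year(sent: date):
    """Resolve 'in the first quarter of the next year' relative to the
    date the email was sent, using the four-month assumption above."""
    year = sent.year + 1
    return (f"{year}0101", f"{year}0430")


code = canonical(date(2010, 8, 8))
interval = first_quarter_of_next_year(date(2010, 6, 15))
```

Under these assumptions `code` is "20100808:2:1:2" and `interval` is ("20110101", "20110430"), matching the two examples on the slide.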
FSA for detecting Actual Occurrence • E1 accepts phrases like “have a meeting”. • E2 recognizes phrases like “have scheduled”. • E3 is for phrases like “invites to attend”. • Passive-voice phrases like “Seminar is scheduled” are recognized by E4. • Occurrences described through titles are recognized by E5. • E6 accepts phrases like “have invited to attend”. • Note that our area is information extraction rather than NLP, so we do not claim to handle every type of occurrence-revealing phrase.
Results • 1000 emails from our inbox were selected as the data corpus. • 230 of the emails were non-event emails, and the remaining 770 were related to different events such as conferences, seminars, lectures, personal meetings, weddings and sports events. • We used precision, recall and F-score to evaluate our work.
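For reference, the three measures can be computed from true-positive, false-positive and false-negative counts as below. The slides do not say which F-score variant was used, so the balanced F1 is an assumption, and the counts in the usage line are illustrative only, not results from this work.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from raw counts.

    tp: items correctly extracted
    fp: items extracted but wrong
    fn: items that should have been extracted but were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Illustrative counts only.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
```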
Results • [Result tables not recovered: scores for the individual information components and for the different event types]
Future Directions • We aim to further improve the system with additional features. • First, identifying the actual participants is desirable. • Similarly, the extracted values for venue and occurrence need to be more accurate. • Techniques like Hidden Markov Models or n-grams may help achieve these targets. • The system also needs to handle emails containing more than one event, and emails containing more than one value for any of the ubiquitous information components.
References • [1] J. Allan, R. Papka, and V. Lavrenko, "On-Line New Event Detection and Tracking," presented at SIGIR'98, Melbourne, Australia, 1998. • [2] Y. Yang, T. Pierce, and J. Carbonell, "A Study on Retrospective and Online Event Detection," presented at SIGIR'98, Melbourne, Australia, 1998. • [3] J. Allan, et al., "Topic Detection and Tracking Pilot Study Final Report," in DARPA Broadcast News Transcription and Understanding Workshop, 1998. • [4] Xiaoming Zhang and Z. Li, "Online New Event Detection Based on IPLSA," presented at ADMA, Beijing, China, 2009. • [5] Zhen Lei, et al., "Event Detection and Tracking Based on Improved Incremental K-Means and Transductive SVM," presented at ICIC 2008, Shanghai, China, 2008. • [6] Juha Makkonen, Helena Ahonen-Myka, and M. Salmenkivi, "Simple Semantics in Topic Detection and Tracking," Information Retrieval Journal, vol. 7, pp. 347-368, 2004. • [7] Fumiyo Fukumoto and Y. Suzuki, "Event Tracking Based on Domain Dependency," presented at SIGIR'00, Athens, Greece, 2000. • [8] G. Kumaran and J. Allan, "Text Classification and Named Entities for New Event Detection," presented at SIGIR'04, Sheffield, South Yorkshire, UK, 2004. • [9] Z. Kuo, L. J. Zi, and W. Gang, "New Event Detection Based on Indexing-Tree and Named Entity," presented at SIGIR'07, Amsterdam, The Netherlands, 2007. • [10] Y. Yang, et al., "Learning Approaches for Detecting and Tracking News Events," IEEE Intelligent Systems Special Issue on Applications of Intelligent Information Retrieval, vol. 4, pp. 32-43, 1999.
References (continued) • [11] C. C. Chen, M. C. Chen, and M. S. Chen, "An Adaptive Threshold Framework for Event Detection Using HMM-Based Life Profiles," ACM Transactions on Information Systems, vol. 27, 2009. • [12] Junliang Bai, et al., "An Efficient Algorithm of Hot Events Detection in Text Streams," presented at Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Huangshan, China, 2010. • [13] Tingting He, et al., "Semi-Automatic Hot Event Detection," presented at ADMA, Xian, China, 2006. • [14] Manolis Platakis, Dimitrios Kotsakos, and D. Gunopulos, "Discovering Hot Topics in the Blogosphere," presented at 2nd Panhellenic Scientific Student Conference, Samos, Greece, 2008. • [15] K. Chen, L. Luesukprasert, and S. T. Chou, "Hot Topic Extraction Based on Timeline Analysis and Multidimensional Sentence Modeling," IEEE Transactions on Knowledge and Data Engineering, vol. 19, 2007. • [16] X. Wan, E. Milios, and N. Kalyaniwalla, "Link-Based Event Detection in Email Communication Networks," presented at SAC'09, Honolulu, Hawaii, USA, 2009. • [17] Q. Zhao and P. Mitra, "Event Detection and Visualization for Social Text Streams," presented at ICWSM, Colorado, USA, 2007. • [18] Q. Zhao, P. Mitra, and B. Chen, "Temporal and Information Flow Based Event Detection from Social Text Streams," presented at American Association for Artificial Intelligence (AAAI 2007), Vancouver, British Columbia, Canada, 2007. • [19] V. Pekar, "Information Extraction from Email Announcements," in LNCS, Natural Language Processing and Information Systems. Berlin Heidelberg: Springer Verlag, 2005, pp. 372-375. • [20] C. X. Lin, et al., "PET: A Statistical Model for Popular Events Tracking in Social Communities," presented at SIGKDD, New York, USA, 2010.
References (continued) • [21] V. Ha-Thuc, et al., "Event Intensity Tracking in Weblog Collections," presented at ICWSM-DCW'09, California, USA, 2009. • [22] H. Sayyadi, M. Hurst, and A. Maykov, "Event Detection and Tracking in Social Streams," presented at Association for Advancement of Artificial Intelligence (AAAI'09), 2009. • [23] H. Becker, M. Naaman, and L. Gravano, "Event Identification in Social Media," presented at Twelfth International Workshop on the Web and Databases (WebDB 2009), Providence, USA, 2009. • [24] H. Becker, M. Naaman, and L. Gravano, "Learning Similarity Metrics for Event Identification in Social Media," presented at WSDM, New York, USA, 2010. • [25] P. Kim and S. H. Myaeng, "Usefulness of Temporal Information Automatically Extracted from News Articles for Topic Tracking," ACM Transactions on Asian Language Information Processing, vol. 3, pp. 227-242, 2004. • [26] C.-N. Seon, H. Kim, and H. Kim, "Information Extraction Using Finite State Automata and Syllable N-Grams in a Mobile Environment," presented at ACL-08: HLT Workshop on Mobile Language Processing, Ohio, USA, 2008. • [27] J. Hobbs, et al., "FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Text," presented at MUC, Cambridge, MA, 1997. • [28] K. Y. S. P. Kim, S. H. Myaeng, and J. C. Ryou, "Extracting Temporal Information from Korean News Articles for Event Detection and Tracking," presented at 20th International Conference on Computer Processing of Oriental Languages, 2003. • [29] L. Ferro, et al., "TIDES Temporal Annotation Guidelines," 2001.