NYU Natural Language Processing at NYU: the Proteus Project Ralph Grishman September 2009
Proteus Project Faculty • Ralph Grishman • Satoshi Sekine • Adam Meyers http://nlp.cs.nyu.edu/
‘Just the Facts’ • Vast amount of information is now available on-line in text form • but getting ‘the facts’ can be very hard and slow • Where has Secretary Clinton been over the last month? • Which places on the East Coast have had swine flu outbreaks this month? • To move from search to question answering we need more than a bag of words • we need to figure out who-did-what-to-whom
Understanding natural language isn’t easy • The rebels strafed the car … with automatic weapons fire. … with the Minister and his deputy. • They … died instantly. … were promptly arrested. Understanding language requires a lot of knowledge.
How to get all this knowledge? • By hand … too expensive • Use weakly supervised learning • Give a few examples (‘seeds’) • Use very large text corpus to learn similar examples
Knowledge Discovery: An Example • Goal: want to keep track of all the hirings and departures of executives; need to find all the ways such events are described • Method: • identify a few seed patterns • retrieve documents containing the patterns • find a subject-verb-object pattern with (a) high frequency in the retrieved documents and (b) relatively high frequency in the retrieved docs vs. other docs • add the pattern to the seed set and repeat
#1: pick seed pattern Seed: < person retires >
#2: retrieve relevant documents Seed: < person retires > Document 1: Fred retired. ... Harry was named president. Document 2: Maki retired. ... Yuki was named president. [Figure labels: Relevant documents, Other documents]
#3: pick new pattern Seed: < person retires > New pattern: < person was named president > appears in several relevant documents Document 1: Fred retired. ... Harry was named president. Document 2: Maki retired. ... Yuki was named president.
#4: add new pattern to pattern set Pattern set: < person retires > < person was named president >
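Steps #1 to #4 can be sketched as a small loop. This is a minimal illustration, not the project's implementation: the toy corpus, the names `DOCS`, `score`, and `bootstrap`, the scoring rule (share of a pattern's documents that are relevant, times the log of its relevant-document count), and the `min_score` cutoff are all assumptions. A real system would obtain the patterns from a parser rather than from hand-written sets.

```python
import math

# Hypothetical toy corpus: each document is the set of patterns found in it.
DOCS = [
    {"person retires", "person was named president", "person says"},
    {"person retires", "person was named president"},
    {"person was named president", "company reports earnings"},
    {"company reports earnings", "person says"},
    {"person says", "team wins game"},
    {"person says", "company reports earnings"},
]


def score(pattern, relevant, docs):
    """Assumed scoring rule: favors patterns that are frequent in the
    relevant documents and rare in the other documents."""
    containing = [i for i, doc in enumerate(docs) if pattern in doc]
    hits = sum(1 for i in containing if i in relevant)
    if hits == 0:
        return 0.0
    return (hits / len(containing)) * math.log(hits + 1)


def bootstrap(seeds, docs, iterations=3, min_score=0.5):
    """Grow the pattern set one pattern per iteration."""
    patterns = list(seeds)
    for _ in range(iterations):
        # Step 2: documents containing any current pattern are "relevant".
        relevant = {i for i, doc in enumerate(docs)
                    if any(p in doc for p in patterns)}
        # Step 3: candidates are the other patterns seen in relevant documents.
        candidates = {p for i in relevant for p in docs[i]} - set(patterns)
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda p: score(p, relevant, docs))
        if score(best, relevant, docs) < min_score:
            break  # nothing left that is clearly tied to the relevant documents
        # Step 4: add the winning pattern and repeat.
        patterns.append(best)
    return patterns


learned = bootstrap(["person retires"], DOCS)
```

On this toy corpus the loop adds < person was named president > and then stops, because the remaining candidates occur mostly outside the relevant documents. Lowering `min_score` would let weaker patterns in.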
Results • For some event types, unsupervised learning can do as well as manual pattern development • Recall and precision as a function of the number of iterations of the learner [figure not reproduced]
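For reference, recall and precision per iteration could be computed as below. This is only a sketch: the event identifiers and per-iteration outputs are invented for illustration and are not the results reported on the slide.

```python
def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: events a human annotator marked as correct ...
gold = {"e1", "e2", "e3", "e4"}
# ... and the events the learner's pattern set extracts after each iteration
# ("x1", "x2" stand for spurious extractions).
by_iteration = [
    {"e1"},
    {"e1", "e2", "x1"},
    {"e1", "e2", "e3", "x1", "x2"},
]

curve = [precision_recall(found, gold) for found in by_iteration]
```

In this made-up run recall rises with each iteration while precision falls, which is the trade-off such a curve is meant to expose.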
Robust Learning • Quality of learned patterns is uneven • ambiguity of language leads us to learn incorrect patterns • Need to identify cases of uncertainty • Potential linguistic ambiguities • With multiple classifiers using distinct features, cases where they disagree • Query user for selected uncertain examples • Weakly supervised learning + active learning → robust, rapid knowledge discovery
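The disagreement idea can be illustrated with two deliberately simple classifiers that look at different features. Everything here is hypothetical (the word lists, the function names, the example sentences); it only shows how disagreement singles out the examples worth sending to a user.

```python
# Hypothetical feature sets for two independent views of a sentence.
TRIGGERS = {"retired", "resigned", "named", "appointed"}
TITLES = {"president", "ceo", "chairman", "director"}


def tokens(sentence):
    """Lowercased words with trailing punctuation stripped."""
    return {word.strip(".,").lower() for word in sentence.split()}


def trigger_classifier(sentence):
    """View 1: does the sentence contain a hiring/departure verb?"""
    return bool(tokens(sentence) & TRIGGERS)


def title_classifier(sentence):
    """View 2: does the sentence mention an executive title?"""
    return bool(tokens(sentence) & TITLES)


def select_uncertain(sentences, classifiers):
    """Return the sentences on which the classifiers do not all agree."""
    return [s for s in sentences
            if len({clf(s) for clf in classifiers}) > 1]


sentences = [
    "Fred retired as president.",
    "The pitcher retired the batter.",
    "Shares rose sharply.",
    "The president spoke to reporters.",
]

to_query = select_uncertain(sentences, [trigger_classifier, title_classifier])
```

Only the two sentences where the views conflict are selected for the user. The sentences on which both classifiers agree are accepted or rejected without a query.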
For More Information • Project web site nlp.cs.nyu.edu • Course G22.2590 - Natural Language Processing (Spring 2010)