Automatic Summarization Student: David Kent Professor: Dr. Rakesh Verma
Motivation • Late for a meeting • Paying by the letter • War
Types of Summarization • Single Document • Multi-Document • Corpus-Based • Update Summarization
Human Summary Strategies • Low-Level: Deletion, Copying • Mid-Level: List substitution, Paragraph summary • Higher-Level: Multi-paragraph abstraction, Summary structure
Problems • Understanding • Machine Learning • Mimicking understanding • Bag of Words Model • Vector Analysis and Clustering • TextRank • Partial Linguistic Understanding • WordNet • FrameNet
Simple Metrics • Titles and headings • Sentence location • Cue Words/Key Words • Term Frequency × Inverse Document Frequency (TF-IDF)
Less Simple Metrics • Vector Analysis • Clustering of words and/or sentences • Use of Lexical Databases (WordNet, FrameNet, etc.)
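The term frequency and inverse document frequency metric listed under Simple Metrics can be sketched roughly as follows. This is a minimal illustration, not the technique used in this work: it assumes each sentence plays the role of a "document", uses the common log(N / df) form of IDF, and the names `tokenize` and `tfidf_sentence_scores` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the summed TF-IDF weight of its words.

    Assumptions (not from the slides): every sentence is treated as one
    "document", and IDF is log(N / df), where df is the number of
    sentences that contain the word.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Document frequency: in how many sentences does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within the sentence
        score = sum(count * math.log(n / df[word]) for word, count in tf.items())
        scores.append(score)
    return scores


sentences = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "The cat chased the dog.",
]
scores = tfidf_sentence_scores(sentences)
for sentence, score in zip(sentences, scores):
    print(f"{score:.3f}  {sentence}")
```

A word that occurs in every sentence (here "the") gets an IDF of zero and contributes nothing, so sentences with rarer words score higher. An extractive summarizer could keep the top-scoring sentences, usually in combination with the other metrics listed.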
Our Technique • Extraction-based • Corpus-free with a mélange of low to mid-level lexical techniques • Relationship to Headings • Sample Summary:
WordNet • Developed at Princeton, hand-built by lexicographers • Every word defined by text, part of speech, and a sense number. • Basic unit of organization: Synonym Set (Synset) • Four forests: Nouns, Verbs, Adjectives, and Adverbs
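The organization described on this slide (a word sense identified by text, part of speech, and sense number, grouped into synonym sets) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `WordSense` and `Synset` are hypothetical classes, not WordNet's actual API, the one-letter part-of-speech codes follow a common WordNet convention, and the sample entries are toy data.

```python
from dataclasses import dataclass, field

# One-letter codes for the four categories named on the slide
# (assumed convention: n, v, a, r).
POS_TAGS = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


@dataclass(frozen=True)
class WordSense:
    """One sense of a word: its text, part of speech, and sense number."""
    text: str
    pos: str
    sense: int

    def __post_init__(self):
        if self.pos not in POS_TAGS:
            raise ValueError(f"unknown part of speech: {self.pos!r}")


@dataclass
class Synset:
    """A synonym set: word senses that share one meaning."""
    gloss: str
    members: set = field(default_factory=set)

    def add(self, word_sense):
        self.members.add(word_sense)

    def synonyms_of(self, word_sense):
        """Return the texts of the other members, sorted for stable output."""
        if word_sense not in self.members:
            return []
        return sorted(m.text for m in self.members if m != word_sense)


# Toy data, made up for illustration only.
car = WordSense("car", "n", 1)
auto = WordSense("automobile", "n", 1)
vehicle = Synset(gloss="a motor vehicle with four wheels")
vehicle.add(car)
vehicle.add(auto)
print(vehicle.synonyms_of(car))
```

Keying a sense on all three fields is what lets the same spelling appear in several synsets, once per meaning.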
Parsing Text • Stanford Part of Speech tagger. • Used to determine if a word is a noun, verb, adjective, or adverb. • Stanford Named Entity Recognizer • Tags names, locations, and organizations (proper nouns). • SenseLearner • Determines which WordNet sense is most appropriate.
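The part-of-speech step above decides which of the four WordNet categories a word belongs to. A minimal sketch of that decision follows, assuming the tagger emits `word_TAG` tokens with Penn Treebank tags. That output format is an assumption here, the sample string is hand-written rather than produced by a tagger, and `wordnet_category` and `content_words` are hypothetical helper names.

```python
# Penn Treebank tag prefixes mapped to the four WordNet categories.
PREFIX_TO_CATEGORY = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def wordnet_category(tag):
    """Return the WordNet category for a Penn Treebank tag, or None."""
    for prefix, category in PREFIX_TO_CATEGORY.items():
        if tag.startswith(prefix):
            return category
    return None


def content_words(tagged_text, separator="_"):
    """Extract (word, category) pairs from whitespace-separated word_TAG tokens.

    Tokens whose tag has no WordNet category (determiners, prepositions,
    punctuation, ...) are skipped.
    """
    pairs = []
    for token in tagged_text.split():
        # Split on the last separator so words containing "_" stay intact.
        word, sep, tag = token.rpartition(separator)
        if not sep:
            continue  # token carries no tag
        category = wordnet_category(tag)
        if category is not None:
            pairs.append((word, category))
    return pairs


# Hand-tagged sample; a real pipeline would take this from the tagger.
sample = "The_DT quick_JJ fox_NN quickly_RB jumps_VBZ over_IN fences_NNS ._."
print(content_words(sample))
```

Only the words that survive this filter would need a sense lookup, which is where a disambiguation step such as SenseLearner comes in.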