Introduction to Natural Language Processing and Speech Computer Science Research Practicum Fall 2012 Andrew Rosenberg
Artificial Intelligence • AI is no longer a single subdiscipline in computer science • Natural Language Processing • Speech/Spoken Language Processing • Robotics • Logic/Planning • “Cognitive Radio” • Machine Learning
Artificial Intelligence • What is intelligence? • How does computer science make “intelligent” tools, systems, algorithms? • Does computer science theory contribute to the definition of “intelligence”?
Language and Speech • What is the relationship between language and intelligence/thought/cognition?
Language and Speech • Most people consider language to be the most direct access to cognition and thought. • Language is core to Artificial Intelligence
Natural Language Processing • Information Retrieval (search) • Information Extraction • Knowledge Base Population • Summarization • Question Answering • Named Entity Recognition • Named Entity Linking, Co-reference resolution • Parsing • Sentiment Analysis
Information Retrieval • Input: Query • Output: Relevant Documents • Simplest approach: • Identify every document that contains the word or words in the query • What about related words? • “run” is related to “running”, “runs”, and “marathon” • How do you rank for relevance?
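The simplest approach above, together with one possible answer to the ranking question, can be sketched as follows. This is an illustration only: the three toy documents, the function names `build_index` and `search`, and the choice to rank by raw word counts are assumptions and do not come from the slides. Related words such as “running” are deliberately not handled.

```python
from collections import Counter, defaultdict

# Toy collection; the document ids and texts are made up for illustration.
DOCS = {
    "d1": "the marathon runner runs every morning",
    "d2": "running shoes for a marathon",
    "d3": "the orange line runs by north station",
}

def build_index(docs):
    """Map each word to a Counter of {doc_id: occurrences} (an inverted index)."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index

def search(index, query):
    """Return ids of documents containing every query word, ranked by total count."""
    words = query.lower().split()
    if not words:
        return []
    # Boolean AND: keep only documents that contain all query words.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))

    # Rank by the summed occurrences of the query words; break ties by id.
    def score(doc_id):
        return sum(index[w][doc_id] for w in words)

    return sorted(matches, key=lambda d: (-score(d), d))

index = build_index(DOCS)
print(search(index, "runs"))
print(search(index, "marathon runs"))
print(search(index, "running"))
```

The query “running” misses d1 and d3 even though both contain “runs”, which is the related-words problem raised on the slide.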
Information Extraction • Identify specific information from a single document or set of documents. • Who works for what organization? • Who was born when? Died when? • Who did what to whom? • This is *very* complex. • Domain-specific systems are developed • How many different ways are there to say the same thing?
Named Entity Recognition and Linking • Bo Obama is Fat. POTUS says so. • The President called his dog fat. Mr. Obama, speaking to an interviewer, said that The White House dog needs to go on a diet. • How do you recognize that “Bo Obama”, “POTUS”, “The President”, “Mr. Obama”, and “The White House” are all ENTITIES? • How do you recognize that “POTUS”, “The President”, “Mr. Obama”, and “his” all refer to the same person?
Parsing • Understanding grammatical structure from text. • Important step in some relation extraction, question answering, etc.
Sentiment Analysis • Can you tell the difference between a positive review and a negative one? • Some reviews come with labels • Some labels have no reviews • Some reviews have no “stars”
Spoken Language Processing • Automatic Speech Recognition • “Rich” Transcription • Speaker Recognition • Speech Synthesis • Text Normalization • Discourse and Dialog • Turn taking • Emotion Recognition
Speech Recognition • Converting speech to text. • Acoustic Modeling • Speech to Phoneme • Pronunciation Modeling • How are words pronounced? • Language Modeling • What sequences of words are most common?
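The language modeling question (“What sequences of words are most common?”) can be illustrated with bigram counts. This is a minimal sketch, assuming a maximum-likelihood bigram model with no smoothing; the three training sentences and the names `train_bigrams` and `probability` are invented for the example and are not from the slides.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real language model uses far more data.
SENTENCES = [
    "you can catch the orange line",
    "you can transfer at park street",
    "you can catch the red line",
]

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][word] / total if total else 0.0

counts = train_bigrams(SENTENCES)
print(probability(counts, "can", "catch"))   # "catch" follows "can" in 2 of 3 cases
print(probability(counts, "the", "orange"))  # "orange" follows "the" in 1 of 2 cases
print(probability(counts, "can", "line"))    # never observed, so 0.0 without smoothing
```

A recognizer can use such scores to prefer word sequences that are common over ones that merely sound similar.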
Rich Transcription ALSO FROM NORTH STATION I THINK THE ORANGE LINE RUNS BY THERE TOO SO YOU CAN ALSO CATCH THE ORANGE LINE AND THEN INSTEAD OF TRANSFERRING UM I YOU KNOW THE MAP IS REALLY OBVIOUS ABOUT THIS BUT INSTEAD OF TRANSFERRING AT PARK STREET YOU CAN TRANSFER AT UH WHAT’S THE STATION NAME DOWNTOWN CROSSING UM AND THAT’LL GET YOU BACK TO THE RED LINE JUST AS EASILY
Rich Transcription Also, from the North Station... (I think the Orange Line runs by there too so you can also catch the Orange Line... ) And then instead of transferring (um I- you know, the map is really obvious about this but) Instead of transferring at Park Street, you can transfer at (uh what’s the station name) Downtown Crossing and (um) that’ll get you back to the Red Line just as easily.
Speaker/Author Recognition • What makes one speaker or author distinguishable from another? • Email hacks, Chat transcripts, Anonymous authors. • Which acoustic properties distinguish one speaker from another? • Spectral Qualities • Prosodic Qualities • Lexical, syntactic and content usage
Speech Synthesis • Generating Speech from Text • There are tools like Festival, HTS and Mary TTS that make this relatively easy • Unit Selection • Use a corpus of a single speaker and paste together small slices of speech to make new words • Watson http://www.youtube.com/watch?v=WFR3lOm_xhE • Parametric Synthesis • Learn the spectral shape of different speech sounds, and synthesize them from oscillators and additive noise. • Mary TTS Web client • http://mary.dfki.de:59125/
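The parametric synthesis idea (oscillators plus additive noise) can be sketched in a few lines. This is only a rough illustration: the sample rate, the function name `synthesize`, and the partial frequencies and amplitudes are assumptions chosen for the example, not measured speech parameters and not how Festival, HTS, or Mary TTS work internally.

```python
import math
import random

SAMPLE_RATE = 16000  # samples per second (assumed)

def synthesize(partials, noise_level, duration, seed=0):
    """Sum sine oscillators given as (frequency_hz, amplitude) and add white noise."""
    rng = random.Random(seed)  # fixed seed so the output is repeatable
    n_samples = round(SAMPLE_RATE * duration)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)
        value += noise_level * rng.uniform(-1.0, 1.0)
        samples.append(value)
    # Scale so the loudest sample has magnitude 1.0 (skip if all samples are zero).
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    return [s / peak for s in samples]

# Illustrative values only: a low partial plus two stronger higher partials.
vowel_like = synthesize(
    [(120, 0.5), (720, 1.0), (1200, 0.6)], noise_level=0.05, duration=0.1
)
print(len(vowel_like))
```

In a real system the amplitudes of the partials would be learned from recorded speech rather than typed in by hand.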
Discourse and Dialog • How do you accomplish some task through discourse? • Understanding the semantics of a user turn • Generating an appropriate prompt • Dialog/Task planning. • Semantic Frame filling.
Emotion Recognition Three Hundred Twelve. Three Thousand Twelve. • What are the acoustic properties of emotion expression? • Loudness, speaking rate, pitch, hesitation etc. • This type of analysis can extend to other speaker states • Intoxication • Sleepiness • Age • Gender • Personality Factors • Deception
Corpus Analysis • A corpus is a body of linguistic material • Corpora (plural of corpus) are generally shared across research groups • Allow for reproducible findings • Division of Labor • Describing phenomena is an important first step in most research. • What is the distribution of ratings? • What are the correlations between features and labels? • Are there errors in the annotation?
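The three descriptive questions above can each be answered with a few lines of code. The sketch below is illustrative: the six-review corpus, the single feature (how often “great” occurs), and the assumed valid rating range of 1 to 5 are all made up for the example.

```python
from collections import Counter
import math

# Made-up corpus of (review text, star rating) pairs.
CORPUS = [
    ("great great phone", 5),
    ("great battery", 4),
    ("fine", 3),
    ("bad screen", 2),
    ("bad bad bad", 1),
    ("great", 5),
]

def rating_distribution(corpus):
    """Count how many items carry each rating."""
    return Counter(rating for _, rating in corpus)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def annotation_errors(corpus, low=1, high=5):
    """Return items whose rating falls outside the assumed valid range."""
    return [item for item in corpus if not low <= item[1] <= high]

ratings = [rating for _, rating in CORPUS]
great_counts = [text.split().count("great") for text, _ in CORPUS]  # one feature

print(rating_distribution(CORPUS))
print(pearson(great_counts, ratings))
print(annotation_errors(CORPUS))
```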
Some famous corpora • Penn Treebank • Parse trees and part of speech • ACE and KBP • Information Extraction • Switchboard • Conversational telephone speech • TIMIT • Phonetic Transcription • Boston Radio News Corpus • Prosodic Annotation
The “standard” approach • Identify labeled training data • Decide what to label • What is a data point? • Extract features based on the entity • Train a supervised classifier • Machine Learning • Evaluate • Cross-validation or a held-out test set.
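The steps above can be walked through end to end on a toy sentiment task. This is a minimal sketch under stated assumptions: the six labeled reviews are invented, the features are a plain bag of words, the classifier is a multinomial Naive Bayes with add-one smoothing (one common choice, not one the slides prescribe), and evaluation uses leave-one-out cross-validation.

```python
from collections import Counter, defaultdict
import math

# Made-up labeled data: each data point is one review with a sentiment label.
DATA = [
    ("great movie loved it", "pos"),
    ("loved the acting great fun", "pos"),
    ("wonderful and fun", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring and terrible acting", "neg"),
    ("hated the boring plot", "neg"),
]

def features(text):
    """Feature extraction: a bag of lowercase words."""
    return text.lower().split()

def train(examples):
    """Train Naive Bayes; returns (label counts, per-label word counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in features(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in features(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_accuracy(data):
    """Cross-validation where each fold holds out exactly one example."""
    correct = 0
    for i, (text, label) in enumerate(data):
        model = train(data[:i] + data[i + 1:])
        correct += predict(model, text) == label
    return correct / len(data)

model = train(DATA)
print(predict(model, "great fun"))
print(leave_one_out_accuracy(DATA))
```

With only six examples the accuracy figure is not meaningful; the point is the shape of the pipeline: data, features, training, evaluation on examples the model did not see.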
How does machine learning fit in? • Automatically identifying patterns in data • Automatically making decisions based on data • Hypothesis: (Data → Learning Algorithm → Behavior) ≥ (Data → Programmer or Expert → Behavior)
Challenges • Conversational text • Social Media: Facebook, Twitter, reddit • Email • Chat/IM • Spoken Dialog Systems • Text Dialog Systems • Sentiment Analysis • Reviews • Collaborative Filtering • Natural Language Generation
Publicly available web data • Social Media • Twitter, Google Plus, forums, etc. • Reviews • Amazon, TripAdvisor, etc. • Wikipedia. • Find missing links in Wikipedia • Find potentially incorrect information in Wikipedia • YouTube videos, SoundCloud songs. • Can you classify topics? • Music genres?
Use of web technologies • The feedback loop. • The use of the tool provides information that can be used to improve the tool. • The use of the product provides training data. • Which search results are best. • Which ads are useful • Which recommendations are correct
Feedback in Google • Rank the top hits in response to a query • When someone clicks on a link, boost its ranking/relevance • Same for ads • UI/UX experiments
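The click feedback loop described above can be sketched as a ranker that adds a small boost per click. This is a toy illustration and not a description of how any real search engine works: the class name `ClickRanker`, the boost of 0.1 per click, and the example URLs and base scores are all hypothetical.

```python
from collections import defaultdict

class ClickRanker:
    """Rank results by a base relevance score plus a boost learned from clicks."""

    def __init__(self, boost=0.1):
        self.boost = boost              # assumed size of the boost per click
        self.clicks = defaultdict(int)  # (query, url) -> number of clicks

    def record_click(self, query, url):
        """Feedback: remember that a user clicked this url for this query."""
        self.clicks[(query, url)] += 1

    def rank(self, query, base_scores):
        """Sort urls by base score plus click boost, highest first."""
        def score(url):
            return base_scores[url] + self.boost * self.clicks.get((query, url), 0)
        return sorted(base_scores, key=lambda u: (-score(u), u))

ranker = ClickRanker()
base = {"a.example": 0.9, "b.example": 0.8}  # hypothetical relevance scores
print(ranker.rank("nlp course", base))       # a.example is first
for _ in range(2):
    ranker.record_click("nlp course", "b.example")
print(ranker.rank("nlp course", base))       # b.example moves to first
```

Every use of the tool changes the data the tool ranks with, which is the feedback loop in miniature.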
Feedback in Amazon • Try to give users an offer. • If they take it, increase its value.
Feedback in Netflix • Suggestions for people “like you” • How do you group people • How do you group movies
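One simple way to group people “like you” is to compare rating vectors. The sketch below assumes cosine similarity over shared movies and a single nearest neighbor; the users, movies, ratings, and function names are invented for illustration, and nothing here reflects how Netflix actually makes suggestions.

```python
import math

# Made-up ratings: user -> {movie: stars}.
RATINGS = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Dune": 5},
    "cat": {"Alien": 1, "Casablanca": 5, "Emma": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(u[m] * v[m] for m in u if m in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(ratings, user):
    """Return the other user whose ratings are closest to `user`'s."""
    others = [name for name in ratings if name != user]
    return max(others, key=lambda name: (cosine(ratings[user], ratings[name]), name))

def suggest(ratings, user):
    """Suggest movies the most similar user rated that `user` has not rated."""
    neighbor = most_similar(ratings, user)
    unseen = {m: s for m, s in ratings[neighbor].items() if m not in ratings[user]}
    return sorted(unseen, key=lambda m: (-unseen[m], m))

print(most_similar(RATINGS, "ann"))
print(suggest(RATINGS, "ann"))
```

Grouping movies works the same way with the table transposed: compare movies by the ratings users gave them.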
Project ideas • Look at the most recent conferences in NLP and Speech • ICASSP, Interspeech, ASRU • ACL, EMNLP, NAACL-HLT, COLING • Also, Journals • Computational Linguistics • Computer Speech and Language • IEEE Transactions on Audio, Speech, and Language Processing • Consider real-world problems and applications