Human Language Technology Overview HLT
Acknowledgement • Material for some of these slides taken from • J Nivre, University of Gothenburg, Sweden • D. Jurafsky & J. Martin
Human Language Technology HLT sometimes referred to as • Natural Language Processing • focus on linguistic processing • Computational Linguistics • focus on understanding language • Language Engineering • focus on practical tasks and results
HLT – Engineering v. Science • Engineering • NLP is concerned with the design and implementation of effective NL input and output components for computational systems (Robert Dale 2000) • Science • The use of computers for linguistic research and applications
HLT is Interdisciplinary • Linguistics • Theoretical • Applied • Computer Science • Algorithms • Compiling Techniques • Artificial Intelligence • Understanding, reasoning • Intelligent Action
HLT is Commercial • Lots of exciting stuff going on… Powerset
Google Translate
Web Q/A
Web Analytics • Data-mining of social media • weblogs, discussion forums, message boards, user groups, and other forms of user generated media • Sentiment analysis, social network analysis • Product marketing information • Opinion tracking over space and time • Social network analysis • Buzz analysis (what’s hot, what topics are people talking about right now).
HLT can help with • Understanding how language works • by implementing complex theories directly • More Natural Communication • development of multimodal M/M communication: language, speech, gesture • Development of multilingual applications • Knowledge Management • Language is the fabric of the web
Language Enabled Applications • What makes an application a language processing application (as opposed to any other piece of software)? • An application that requires the use of knowledge about human languages • Example: Is Unix wc (word count) an example of a language processing application?
Language Enabled Applications • Word count? • When it counts words: Yes • To count words you need to know what a word is. That’s knowledge of language. • When it counts lines and bytes: No • Lines and bytes are computer artifacts, not linguistic entities
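The point about word counting can be made concrete with a small sketch. It compares the whitespace-based count that wc uses with a count based on one possible definition of a word. The pattern `WORD_PATTERN` and both function names are assumptions made for this illustration only; other definitions of "word" are equally defensible.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Count maximal runs of non-whitespace characters, as Unix wc does."""
    return len(text.split())


# Assumption: a "word" is a run of letters or digits that may contain
# internal apostrophes or hyphens. Punctuation on its own is not a word.
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def count_linguistic_words(text: str) -> int:
    """Count words according to the illustrative WORD_PATTERN above."""
    return len(WORD_PATTERN.findall(text))


sample = "Open the pod bay doors , Hal ."
print(count_whitespace_tokens(sample))  # counts "," and "." as tokens
print(count_linguistic_words(sample))   # ignores free-standing punctuation
```

The two counts disagree as soon as punctuation is separated from the words, which is exactly where a decision about what counts as a word has to be made.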
Topics: Applications • Small • Spelling correction • Hyphenation • Medium • Word-sense disambiguation • Named entity recognition • Information retrieval • Big • Question answering • Conversational agents • Automatic Summarisation • Machine translation • Stand-alone • Enabling applications • Funding/Business plans
Big Applications • These kinds of applications require a tremendous amount of knowledge of language. • Consider the following interaction with HAL the computer from 2001: A Space Odyssey
HAL from 2001 • Dave: Open the pod bay doors, Hal. • HAL: I’m sorry Dave, I’m afraid I can’t do that. • http://www.youtube.com/watch?v=kkyUMmNl4hk
What’s needed? • Speech recognition and synthesis • Knowledge of the English words involved • What they mean • How groups of words fit together into groups • What the groups mean • How the groups relate to each other.
What’s needed? • Dialog • It is polite to respond, even if you’re planning to kill someone. • It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
Summary of Application Areas • Document Processing • Classification • Summarisation • Information Extraction • Question Answering • Information Retrieval • Dialogue • Multilinguality • Machine Translation • Translation tools • Multimodality • speech • intonation • image
Basic Problems • Analysis • Conversion of NL input to internal representations • Generation • Conversion of internal representations to NL output • Issues • What kind of input/output/representations? • Role of learning • Supervised v unsupervised • What training data is available? • System Evaluation
Levels of Linguistic Knowledge • Phonetics/Phonology: sound structure • Morphology: word structure • Syntax: sentence structure • Semantics: meanings • Pragmatics: use of language in context • Discourse: paragraphs, texts, dialogues
Processing Pipelines • Each level of knowledge is associated with an encapsulated set of processes. • Interfaces are defined that allow the various levels to communicate. • This often leads to a pipeline architecture.
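A pipeline architecture of this kind might be sketched as follows. Each stage is an encapsulated function, and the shared interface is a dictionary that every stage reads from and adds to. The stage names, the plural-stripping rule and the tiny lexicon `TOY_LEXICON` are all hypothetical and far simpler than real morphological or syntactic processing.

```python
from typing import Callable, Dict, List

Analysis = Dict[str, object]
Stage = Callable[[Analysis], Analysis]

# Hypothetical toy lexicon mapping stems to part-of-speech tags.
TOY_LEXICON = {"open": "V", "the": "DET", "pod": "N", "bay": "N", "door": "N"}


def tokenise(analysis: Analysis) -> Analysis:
    """Word level: split the raw text on whitespace."""
    out = dict(analysis)
    out["tokens"] = str(analysis["text"]).split()
    return out


def analyse_morphology(analysis: Analysis) -> Analysis:
    """Morphology level: lowercase and strip a final plural "s" (toy rule)."""
    out = dict(analysis)
    stems = []
    for token in analysis["tokens"]:
        stem = token.lower()
        if len(stem) > 3 and stem.endswith("s"):
            stem = stem[:-1]
        stems.append(stem)
    out["stems"] = stems
    return out


def tag_categories(analysis: Analysis) -> Analysis:
    """Syntax level (first step only): look up a category for each stem."""
    out = dict(analysis)
    out["tags"] = [TOY_LEXICON.get(stem, "UNK") for stem in analysis["stems"]]
    return out


PIPELINE: List[Stage] = [tokenise, analyse_morphology, tag_categories]


def run_pipeline(text: str, stages: List[Stage]) -> Analysis:
    """Pass the analysis through each stage in order."""
    analysis: Analysis = {"text": text}
    for stage in stages:
        analysis = stage(analysis)
    return analysis


result = run_pipeline("Open the pod bay doors", PIPELINE)
print(result["stems"], result["tags"])
```

Because each stage only depends on the keys written by earlier stages, a stage can be replaced without touching the others, which is the main attraction of the pipeline design.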
Ambiguity • Computational linguists are obsessed with ambiguity • Ambiguity is a fundamental problem of computational linguistics • Resolving ambiguity is a crucial goal • Ambiguity arises at different levels of analysis
Ambiguity – different flavours • Lexical: I made her duck • Syntactic: Young men and women • Referential: She did it • Pragmatic: Can you pass the salt?
Ambiguity • Find at least 5 meanings of this sentence: • I made her duck • I cooked waterfowl for her benefit (to eat) • I cooked waterfowl belonging to her • I created the (plaster?) duck she owns • I caused her to quickly lower her head or body • I waved my magic wand and turned her into undifferentiated waterfowl
Ambiguity is Pervasive I made her duck • I caused her to quickly lower her head or body • Lexical category: “duck” can be a N or V • I cooked waterfowl belonging to her. • Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun • I made the (plaster) duck statue she owns • Lexical semantics: “make” can mean “create” or “cook”
Ambiguity is Pervasive • Grammar: Make can be: • Transitive: (verb has a noun direct object) • I cooked [waterfowl belonging to her] • Ditransitive: (verb has 2 noun objects) • I made [her] (into) [undifferentiated waterfowl] • Action-transitive (verb has a direct object and another verb) • I caused [her] [to move her body]
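The lexical category ambiguity can be made explicit by enumerating every combination of categories the words allow. The sketch below assumes a hypothetical four-word lexicon, `LEXICON`, with invented tag names; it only lists candidate tag sequences and does not decide which of them are grammatical.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical toy lexicon: each word maps to its possible categories.
LEXICON = {
    "I": ["PRON"],
    "made": ["V"],
    "her": ["POSS", "PRON"],  # possessive vs. dative/object pronoun
    "duck": ["N", "V"],       # the bird vs. the action
}


def tag_sequences(sentence: str) -> List[List[Tuple[str, str]]]:
    """Return every way of assigning one category to each word."""
    words = sentence.split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


for reading in tag_sequences("I made her duck"):
    print(reading)
```

Two ambiguous words with two categories each already give four candidate sequences. A grammar would then rule some of them out (a possessive followed by a verb, for instance), and lexical semantics would split the remaining ones further.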
Ambiguity is Pervasive • Phonetics! • I mate or duck • I’m eight or duck • Eye maid; her duck • Aye mate, her duck • I maid her duck • I’m aid her duck • I mate her duck • I’m ate her duck • I’m ate or duck • I mate or duck
Dealing with Ambiguity • Four possible approaches: • Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels. • Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
Dealing with Ambiguity • Probabilistic approaches based on making the most likely choices • Don’t do anything, maybe it won’t matter • We’ll leave when the duck is ready to eat. • The duck is ready to eat now. • Does the “duck” ambiguity matter with respect to whether we can leave?
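As a rough illustration of the probabilistic approach, the sketch below picks the most likely category for "duck" given the category of the preceding word. The counts in `TAG_COUNTS` are invented for this example and do not come from any corpus; a real system would estimate them from annotated training data.

```python
from typing import Dict, Tuple

# Invented counts, for illustration only: how often "duck" received each
# category after a word of the given category.
TAG_COUNTS: Dict[Tuple[str, str], Dict[str, int]] = {
    ("POSS", "duck"): {"N": 9, "V": 1},
    ("PRON", "duck"): {"N": 2, "V": 8},
}


def most_likely_tag(previous_tag: str, word: str) -> Tuple[str, float]:
    """Return the most frequent category and its relative frequency."""
    counts = TAG_COUNTS[(previous_tag, word)]
    total = sum(counts.values())
    best = max(counts, key=lambda tag: counts[tag])
    return best, counts[best] / total


print(most_likely_tag("POSS", "duck"))
print(most_likely_tag("PRON", "duck"))
```

The choice is only the most likely one, not a guaranteed one, so a system built this way accepts a certain error rate in exchange for always producing an answer.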
Ways of Studying NLP • By Application: MT, IE, IR etc. • By Approach: rational vs. empirical • By Linguistic Level: morphology, syntax etc. • By Algorithm
Algorithms • State Machines • automata and transducers • Rule Systems • regular and context free grammars • Search • top-down/bottom-up parsing • Probabilistic algorithms
Organisation of Course • Module 1: Words • Linguistics: Morphological Structure • Morphological Processing • LAB + Assignment I • Module 2: Sentences • Linguistics: Syntactic Structure • NL Parsing Algorithms • LAB + Assignment II • Module 3: Texts • Statistics • Text Classification • LAB + Assignment III
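To give a flavour of the first family, here is a minimal sketch of a deterministic finite-state automaton. The language it recognises, "b" followed by two or more "a"s and a final "!", is a common textbook illustration and is chosen here only as an assumption for the example; the names `TRANSITIONS`, `ACCEPTING` and `accepts` are likewise illustrative.

```python
from typing import Dict, Set, Tuple

# Transition table: (current state, input symbol) -> next state.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further "a"s
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING: Set[int] = {4}


def accepts(string: str) -> bool:
    """Return True if the automaton ends in an accepting state."""
    state = START_STATE
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition defined: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, accepts(candidate))
```

The recogniser itself never changes. Only the transition table does, which is why the same few lines can serve for quite different regular languages.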
Course Information • Course Website: http://staff.um.edu.mt/mros1/hlt • Reference Texts • D. Jurafsky and J. Martin, Speech and Language Processing, 2nd Edition, Prentice-Hall • S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, http://www.nltk.org • Thank you