Natural Language Processing (1) Zhao Hai 赵海 Department of Computer Science and Engineering, Shanghai Jiao Tong University, 2010-2011. zhaohai@cs.sjtu.edu.cn
Outline • Course Goals • Course Schedule • Course Requirements • Overview
Course Goals • Introducing the know-how of NLP, especially NLU, including research highlights, crucial technologies and application achievements; • Training students to read and evaluate new academic papers from important international conferences in related areas, such as the ACL conference; • Encouraging students to present and discuss their comments on the papers; • Building a practical NLP system through a course project.
Course Schedule (1) • Overview (2 lhs = 2 lecture hours) 1.1 Natural Language Understanding (NLU) 1.2 Different Levels of Language Analysis 1.3 Applied Approaches in NLU Systems 1.4 Applications of NLU
Course Schedule (2) • Lexicons and Lexical Analysis (11 lhs) 2.1 Lexicon: A Language Resource 2.2 A Lexicon for English Words: WordNet 2.3 Generative Lexicon 2.4 Finite State Models and Morphological Analysis 2.5 Collocations 2.6 Statistical Inference: n-gram Models over Sparse Data
Course Schedule (3) • Syntactic Processing (14 lhs) 3.1 Basic English Syntax 3.2 Grammars and Parsing 3.3 Features and Augmented Grammars 3.4 Grammars for Natural Language 3.5 Toward Efficient Parsing 3.6 Ambiguity Resolution: Statistical Methods
Course Schedule (4) • Semantic Interpretation (6 lhs) 4.1 Semantics and Logical Form 4.2 Linking Syntax and Semantics 4.3 Ambiguity Resolution 4.4 Other Strategies for Semantic Interpretation
Course Schedule (5) • Machine Learning Approaches for Natural Language Processing (6 lhs) 5.1 Main machine learning approaches: maximum entropy, k-nearest neighbor, support vector machine, structure learning 5.2 A Case Study: training a part-of-speech tagger from a labeled corpus
Course Schedule (6) • Course Discussion (1 lh) 6.1 Discussion of the Given Course Content 6.2 How to Prepare for the Paper Reading 6.3 Other Related Issues
Course Schedule (7) • Student Workshop (2 lhs) 7.1 ACL/EMNLP Paper Reading Groups 7.2 Summary and Comment Writing 7.3 Presentation and Discussion
Course Schedule (8) Curriculum Schedule Time: The 3rd and 4th classes, Monday morning, The 1st and 2nd classes, Wednesday morning, The 1st-8th week; Location:
Course Requirements (1) • Final Grade • Attendance and Assignments 30% • ACL/EMNLP Paper Summary, Comment and Presentation 30% • Course project 40%
Course Requirements (2) • Texts and References • James Allen. Natural Language Understanding (2nd ed.). The Benjamin/Cummings Publishing Company, Inc., 1995. • Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
Course Requirements (3) • ACL Anthology • http://www.aclweb.org/anthology-new/ • Other Related References.
Course Requirements (4) • FTP Site and Contact Email • Old version: ftp://ftp.cs.sjtu.edu.cn/yao-tf/nlu/ • The latest: http://bcmi.sjtu.edu.cn/~zhaohai/lessons/nlp2011/index.html zhaohai@cs.sjtu.edu.cn
Overview (1) Natural Language Understanding (1) What is Natural Language? • It means human language. The most common way that people communicate is by speaking or writing in one of the natural languages, such as English, Chinese, German, or French. • There are two forms of natural language: written and spoken.
Overview (2) Natural Language Understanding (2) NLP & NLU (1) • NLP (Natural Language Processing) covers all methods for the pure processing of language by algorithmic, statistical, heuristic and similar means. • NLU (Natural Language Understanding) denotes the real understanding of a text formulated in some natural language. • Semantic or pragmatic issue?
Overview (3) Natural Language Understanding (3) NLP & NLU (2) • Information Retrieval (IR): NLP • Information Extraction (IE): NLP with NLU (Shallow Parsing) • Summarization: NLP with NLU (Shallow and Deep Parsing) • Question Answering (QA): NLP with NLU (Shallow or Deep Parsing) • Machine Translation (MT): NLP with NLU (Deep Parsing) • Natural Language Generation (NLG): NLP • …
Overview (4) Natural Language Understanding (4) Why is NLU a Difficult Task? (1) • Complexity of the target representation into which the matching is being done: In fact, understanding natural language means transforming it from one representation into another. Extracting meaningful information from the source representation often requires additional knowledge.
Overview (5) Natural Language Understanding (5) Why is NLU a Difficult Task? (2) • Type of mapping: There are one-to-one, many-to-one, one-to-many, and many-to-many mappings. One-to-many mappings require a great deal of domain knowledge beyond the input to make the correct choice among target representations. For example (one-to-many): a) a tall giraffe vs. b) a tall poodle (a poodle is a small dog with thick curly hair and a proud bearing). The same word "tall" maps to very different heights in the two phrases.
Overview (6) Natural Language Understanding (6) Why is NLU a Difficult Task? (3) • Level of interaction of the components of the source representation In many natural language sentences, changing a single word can alter the interpretation of the entire structure. As the number of interactions increases, so does the complexity of the mapping.
Overview (7) Natural Language Understanding (7) Why is NLU a Difficult Task? (4) • Modifier attachment problem: The sentence “Give me all the employees in a division making more than $50,000” doesn’t make it clear whether “making more than $50,000” modifies the employees or the division, i.e. whether the speaker wants the employees who make more than $50,000 or the employees of divisions that make more than $50,000.
Overview (8) Natural Language Understanding (8) Why is NLU a Difficult Task? (5) • Quantifier scoping problem: Words such as “the”, “each”, or “what” act like logical quantifiers, expressing “universal” (∀) or “existential” (∃) quantification. They can have several readings. • Elliptical utterances: The interpretation of a query may depend on previous queries and their interpretations. E.g., asking “Who is the manager of the automobile division?” and then saying “Of aircraft?”
Overview (9) Natural Language Understanding (9) Computational Linguistics (1) Research in Computational Linguistics, the use of computers in the study of languages, started soon after computers became available in the 1940’s. This discipline, along with AI and related disciplines, promoted the progress of NLU.
Overview (10) Natural Language Understanding (10) Computational Linguistics (2) [Figure: Computational Linguistics and its related fields: Engineering Science, Bioscience, Psychology, Computer Science, Cognitive Science, AI, Philosophy, Linguistics]
Overview (11) Natural Language Understanding (11) Symbolic Processing In NLU, we mainly use machines to manipulate symbols. E.g., symbolic processing was readily applied to written text to compile: • word indexes (lists of word occurrences) and • concordances (indexes including a line of context for each occurrence).
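The two symbolic tasks named above can be sketched in a few lines of Python. This is only an illustration: the sample text, the function names, and the choice of two tokens of context per side are assumptions, not part of the course material.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens in order."""
    return re.findall(r"[a-z]+", text.lower())

def word_index(tokens):
    """Map each word to the list of token positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return dict(index)

def concordance(tokens, keyword, width=2):
    """Return every occurrence of keyword with up to `width` tokens of context per side."""
    lines = []
    for position, word in enumerate(tokens):
        if word == keyword:
            start = max(0, position - width)
            lines.append(" ".join(tokens[start:position + width + 1]))
    return lines

# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog saw the cat."
tokens = tokenize(sample)
index = word_index(tokens)
contexts = concordance(tokens, "cat")
```

Here `index["cat"]` lists the positions of "cat" in the token stream, and `contexts` holds one short context line per occurrence.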
Overview (12) Natural Language Understanding (12) Machine Translation (1) • In 1949, Warren Weaver proposed that computers might be useful for “the solution of world-wide translation problems”. • However, even after more than 50 years of effort, current systems still produce output of limited quality, which is suitable for assimilation of foreign-language documents, but not for the production of publishable material.
Overview (13) Natural Language Understanding (13) Machine Translation (2) • Through practice, researchers have realized that human language translation is a complex cognitive ability involving knowledge of different kinds: • the structure of sentences; • the meaning of words; • a model of the listener (user model); • the rules of conversation (dialogue translation); • an extensive shared body of general information about the world.
Overview (14) Natural Language Understanding (14) Machine Translation (3) • Some forms of translation for information access are already available on the web at no cost, e.g. • http://babelfish.altavista.com/tr • http://translate.google.com/?hl=zh-CN&tab=wT#auto|en| • The increasing demand for these services will give a push to improve their quality; • The translation providers will find ways to increase vocabularies and translation quality semi-automatically from terminological resources, bilingual corpora and similar sources.
Overview (15) Natural Language Understanding (15) Machine Translation (4) • Clearly, any systematic collection of lexical and terminological information in the form of domain-specific ontologies will help to build better MT systems for these domains. • Conversely, the construction of ontologies can be facilitated by automatic alignment of existing translations, as this will naturally lead to a clustering of the vocabulary along the relevant semantic distinctions.
Overview (16) Natural Language Understanding (16) Investigation Goals AI researchers in natural language processing expected their work to lead both to: • the development of practical, useful language understanding systems and • a better understanding of language and the nature of intelligence.
Overview (17) Different Levels of Language Analysis (1) Six Analysis Levels for Written Texts • Morphological Analysis (Lexical Analysis) • Syntactic Analysis (Deep & Shallow Parsing) • Semantic Analysis • Pragmatic Analysis • Discourse Analysis (Text Analysis) • World Knowledge Analysis (is it possible?)
Overview (18) Different Levels of Language Analysis (2) Morphological Analysis (1) • It is the identification of a word-stem from a full word-form (and sometimes also the identification of the syntactic category of the stem). • For example, the word friendly is formed from the noun (stem) friend and the suffix -ly, which transforms a noun into an adjective.
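The friendly example can be turned into a toy analyzer. The sketch below assumes a tiny hand-written stem lexicon and suffix table (both hypothetical); a real morphological analyzer would use a full lexicon and finite-state rules.

```python
# Hypothetical suffix table: suffix -> (required stem category, resulting word category).
SUFFIX_RULES = {
    "ly": ("noun", "adjective"),    # friend -> friendly
    "ness": ("adjective", "noun"),  # kind -> kindness
}

# Hypothetical stem lexicon: stem -> syntactic category.
STEMS = {"friend": "noun", "kind": "adjective"}

def analyze(word):
    """Return (stem, suffix, category) for a word, or None if no analysis is found."""
    if word in STEMS:
        return (word, "", STEMS[word])
    for suffix, (stem_category, word_category) in SUFFIX_RULES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Accept the split only if the stem is known and has the required category.
            if STEMS.get(stem) == stem_category:
                return (stem, suffix, word_category)
    return None

analysis = analyze("friendly")
```

Checking the stem against the lexicon keeps the analyzer from splitting words such as fly, which merely happen to end in -ly.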
Overview (19) Different Levels of Language Analysis (3) Morphological Analysis (2) • Most systems that analyze natural language text start by segmenting the text into meaningful tokens. • In general, this procedure includes tokenization (segmentation), normalization (stemming), POS (part-of-speech) tagging, and named entity / phrase identification.
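A minimal sketch of the first steps of such a pipeline is given below. It assumes a regular-expression tokenizer, lowercasing as the only normalization, and a hypothetical five-entry tag lexicon in place of a trained POS tagger; stemming and named entity identification are left out.

```python
import re

# Hypothetical toy tag lexicon; a real tagger would be trained on a labeled corpus.
TAG_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB", "loudly": "ADV", ".": "PUNCT"}

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def normalize(token):
    """Normalize a token by lowercasing it (stemming is omitted in this sketch)."""
    return token.lower()

def tag(tokens):
    """Look up each normalized token in the lexicon; unknown tokens get the tag 'UNK'."""
    return [(token, TAG_LEXICON.get(normalize(token), "UNK")) for token in tokens]

tokens = tokenize("The dog barks loudly.")
tagged = tag(tokens)
```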
Overview (20) Different Levels of Language Analysis (4) Syntactic Analysis (1) • Its goal is to break down given textual units, e.g. sentences, into smaller constituents, to assign categorical labels to them, and to identify the grammatical relations that hold between the various parts. • In most parsers, the grammar is separated from the processing components. The grammar consists of a lexicon, and rules that syntactically and semantically combine words and phrases into larger phrases and sentences.
Overview (21) Different Levels of Language Analysis (5) Syntactic Analysis (2) • The output of a shallow parser is less complete than that of a deep parser; that is, it is not a full phrase-structure tree. • A shallow parser may identify some phrasal constituents, such as noun phrases, without indicating their internal structure or their function in the sentence. • It has the advantages of efficiency and robustness.
Overview (22) Different Levels of Language Analysis (6) Syntactic Analysis (3) • The challenge is to find syntactic parsers that are at once fast and robust, that deliver a detailed analysis that is correct with high probability, and that are easy to adapt to special domains. • One of the current research emphases is to integrate shallow syntactic parsers with deeper syntactic approaches.
Overview (23) Different Levels of Language Analysis (7) Semantic Analysis (1) • The goal of semantic analysis is to assign each utterance a complete, context-independent meaning, built from the meanings of its words and the way those meanings combine.
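To make the idea of a shallow parser concrete, the sketch below chunks flat noun phrases out of a POS-tagged sentence. The tag names, the DET/ADJ/NOUN pattern, and the hand-tagged input are assumptions chosen for illustration; real chunkers use richer patterns or trained models.

```python
def chunk_noun_phrases(tagged):
    """Collect maximal runs of DET/ADJ/NOUN tokens, keeping only runs that end in a NOUN."""
    chunks, current = [], []
    for word, tag in tagged + [("", "END")]:  # the sentinel flushes the last run
        if tag in ("DET", "ADJ", "NOUN"):
            current.append((word, tag))
            continue
        # A run counts as a noun phrase only if its last token is a noun.
        if current and current[-1][1] == "NOUN":
            chunks.append(" ".join(w for w, _ in current))
        current = []
    return chunks

# Hand-tagged input (hypothetical); a POS tagger would normally produce this.
tagged = [("the", "DET"), ("tall", "ADJ"), ("giraffe", "NOUN"), ("saw", "VERB"),
          ("a", "DET"), ("poodle", "NOUN")]
noun_phrases = chunk_noun_phrases(tagged)
```

The chunker returns the phrases as flat strings: it says nothing about their internal structure or their role in the sentence, which is exactly the information a deep parser would add.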
Overview (24) Different Levels of Language Analysis (8) Semantic Analysis (2) • The task of semantic analysis can be divided into several subtasks, depending on the linguistic level where it takes place. • The most important subtasks are the semantic tagging of ambiguous words and phrases, and the resolution of referring expressions.
Overview (25) Different Levels of Language Analysis (9) Pragmatic Analysis • It depicts the relationships between the symbols of texts (talks) and their producers / users. • Here the producers / users are the writers / readers and the speakers / hearers. • In other words, the context of situation has a significant impact on the interpretation of a discourse.
Overview (26) Different Levels of Language Analysis (10) Discourse Analysis • Extracting the knowledge contained in texts requires more than the resolution of local semantic ambiguities. • Discourse analysis needs to consider the global argumentative structure of texts. In addition, it also analyzes the relationships between sentences in a text. • This analysis is especially important for interpreting pronouns and temporal constituents.
Overview (27) Different Levels of Language Analysis (11) World Knowledge Analysis • It analyzes and infers the general world knowledge that each language user must have, e.g. the other users’ beliefs and goals in a conversation.
Overview (28) Different Levels of Language Analysis (12) Examples Consider each example below as a candidate for the initial sentence of a book concerning natural language processing: • Language is one of the fundamental aspects of human behavior and is a crucial component of our lives. • Green frogs have large noses. • Green ideas have large noses. • Large have green ideas nose.
Overview (29) Applied Approaches in NLU Systems (1) Historical Categories A classification borrowed from Winograd (1972) groups NLU approaches according to how they represent and use knowledge of their subject matter. On this basis, they can be divided into four historical categories.
Overview (30) Applied Approaches in NLU Systems (2) Historical Categories • The earliest approach with limited results in specific, constrained domains (BASEBALL, SAD-SAM, STUDENT and ELIZA); • Text-based approach (PROTOSYNTHEX-I and Semantic Memory); • Limited logic-based approach (SIR, TLC, DEACON and CONVERSE); • Knowledge-based approach (LUNAR, SHRDLU, MARGIE, SAM and LIFER).
Overview (31) Applied Approaches in NLU Systems (3) BASEBALL [Bert Green, 1963] An information retrieval program with a large database of facts about all American League games over a given year. It accepted input questions from the user, limited to one clause with no logical connectives.
Overview (32) Applied Approaches in NLU Systems (4) SAD-SAM [Lindsay, 1963] • Syntactic Appraiser and Diagrammer -- Semantic Analyzing Machine. Programmed by Robert Lindsay in 1963 at CMU. • It uses a basic English vocabulary (1,700 words) and follows a context-free grammar. • It parses input from left to right, builds derivation trees, and passes them to SAM, which extracts the semantically relevant information to build family trees and find answers to questions.
Overview (33) Applied Approaches in NLU Systems (5) ELIZA [Weizenbaum, 1966] • It was built at MIT in 1966 and was the most famous pattern-matching natural language system. The system assumes the role of a Rogerian, or “nondirective”, therapist in its dialog with the user. • It operated by matching the left sides of its rules against the user’s last sentence, and using the appropriate right side to generate a response. Rules were indexed by keywords so only a few had to be matched against a particular sentence. Some rules had no left side, so they could apply anywhere.
Overview (34) Applied Approaches in NLU Systems (6) ELIZA: Sample Data
Word | Rank | Pattern | Outputs
alike | 10 | ?X | In what way? / What resemblance do you see?
are | 3 | ?X are you ?Y | Would you prefer it if I weren’t ?Y?
are | 3 | ?X are ?Y | What if they were not ?Y?
always | 5 | ?X | Can you think of a specific example? / When? / Really, always?
what | 2 | ?X | Why do you ask? / Does that interest you?
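The keyword-indexed matching that ELIZA performs can be sketched with the sample data above. This is a simplified illustration, not Weizenbaum's program: encoding ?X / ?Y as regular-expression groups, always returning the first listed output, and the fallback reply are all assumptions of the sketch.

```python
import re

# Rules transcribed from the sample data: keyword -> (rank, [(pattern, outputs), ...]).
# Writing the ?X / ?Y variables as regex groups is an assumption of this sketch.
RULES = {
    "alike": (10, [(r".*", ["In what way?", "What resemblance do you see?"])]),
    "are": (3, [
        (r".*\bare you (?P<Y>.+)", ["Would you prefer it if I weren't {Y}?"]),
        (r".*\bare (?P<Y>.+)", ["What if they were not {Y}?"]),
    ]),
    "always": (5, [(r".*", ["Can you think of a specific example?", "When?", "Really, always?"])]),
    "what": (2, [(r".*", ["Why do you ask?", "Does that interest you?"])]),
}

def respond(sentence):
    """Reply with the first output of the highest-ranked rule whose pattern matches."""
    text = sentence.lower().rstrip(".!?")
    words = re.findall(r"[a-z']+", text)
    # Only rules indexed by a keyword in the sentence are tried, highest rank first.
    keywords = sorted((w for w in set(words) if w in RULES), key=lambda w: -RULES[w][0])
    for keyword in keywords:
        for pattern, outputs in RULES[keyword][1]:
            match = re.fullmatch(pattern, text)
            if match:
                # Fill ?Y (if the pattern has one) into the output template.
                return outputs[0].format(**match.groupdict())
    return "Please go on."  # hypothetical fallback, not part of the sample data

reply = respond("Men are all alike.")
```

For "Men are all alike." both alike and are are keywords, and the higher rank of alike decides which rule fires. A fuller version would cycle through the alternative outputs instead of always taking the first.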
Overview (35) Applied Approaches in NLU Systems (7) ELIZA: A Dialogue
User: Men are all alike.
ELIZA: In what way?
User: They’re always bugging us about something or other.
ELIZA: Can you think of a specific example?
User: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here.
User: He says I’m depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
User: It’s true, I am unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
User: I need some help; that much seems certain.
ELIZA: What would it mean to you if you got some help?
User: Perhaps I could learn to get along with my mother.
ELIZA: Tell me more about your family.
User: My mother takes care of me.
ELIZA: Who else in your family takes care of you?
User: My father.
ELIZA: Your father.
User: You are like my father in some ways.
ELIZA: What resemblance do you see?