Explore linguistic knowledge bases, approaches to NLU, and proposals for effective text or speech processing with different representations.
Knowledge Representation for Natural Language Understanding Chengqing ZONG Institute of Automation, Chinese Academy of Sciences cqzong@nlpr.ia.ac.cn
Outline • CASIA and NLPR • Introduction • Some Linguistic Knowledge Bases • Approaches to NLU • Proposal
CASIA Institute of Automation (IA), Chinese Academy of Sciences (CAS) Founded in 1956
Personnel • Faculty members: 320, including 38 full-time professors • Post-doc research fellows: 30 • Students (Ph.D. and MSc): 600 • Visiting researchers: 40+
NLPR National Laboratory of Pattern Recognition • Staff: 29 • Ph.D. candidates: 140 • MSc: 120 • Post-Doc.: 7
NLPR organization • Directors • Academic Committee • Management Committee • General Office • Pattern Recognition and its Cognitive Mechanisms Group • Biometric Information Processing Group • Visual Information Processing Group • Speech and Language Technology Group
1. Introduction • Natural language understanding is a typical knowledge-processing task. [Diagram labels: Text or speech; Processor; K.B.; Text or speech]
1. Introduction • Different tasks and different approaches require different representations. E.g., document summarization or information extraction requires knowledge for discourse analysis and topic understanding. [Figure labels: Title; Time]
1. Introduction For machine translation (MT), knowledge for sentence analysis and translation is necessary, e.g., I saw a man with a telescope. • Rule-based MT (grammar rules such as NP → Det NN, NP → NP PP, ……): I saw [a man with a telescope]. → 我看见一个带望远镜的男孩。 ("I saw a boy who was carrying a telescope.") I [saw a man] with a telescope. → 我用望远镜看见一个男孩。 ("I saw a boy through a telescope.") • Statistical MT:
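The two bracketings of the telescope sentence can be reproduced with a small chart parser. The sketch below is only an illustration: the toy grammar, the lexicon, and the function name `count_parses` are assumptions made here, not the rule set from the slides. It counts the distinct parse trees with the CKY algorithm.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption for illustration only).
# Each rule is (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase: [saw a man] with a telescope
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase: [a man with a telescope]
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon covering only the example sentence.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Return the number of distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][label] = number of trees spanning words[i:j] with that root label
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[(i, i + 1)][tag] += 1
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(start, 0)


if __name__ == "__main__":
    print(count_parses("I saw a man with a telescope".split()))
```

Under this toy grammar the sentence receives two analyses, one per attachment site of the prepositional phrase. A rule-based system needs additional knowledge to choose between them.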
1. Introduction Questions: • What is the state of the current linguistic K. B.s? • Should an algorithm be designed according to the K. B., or the representation be designed for an algorithm?
2. Some Linguistic K. B. 2.1 WordNet (http://wordnet.princeton.edu) • Three basic preconditions: • Separability hypothesis • Patterning hypothesis • Comprehensiveness hypothesis • Takes the synset as the building block • Relationships: synonymy / antonymy / hypernymy / hyponymy / meronymy / entailment
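A minimal sketch of how synsets and a few of these relations might be represented in code. The `Synset` class, the identifiers such as `dog.n`, and the four entries are hypothetical and are not taken from the real WordNet database; the point is only that synonymy falls out of shared synset membership, while hypernymy and hyponymy are links between synsets.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous lemmas plus links to more general synsets."""
    name: str
    lemmas: list
    hypernyms: list = field(default_factory=list)  # names of more general synsets


# Hypothetical miniature knowledge base (not real WordNet data).
SYNSETS = {
    "entity.n": Synset("entity.n", ["entity"]),
    "animal.n": Synset("animal.n", ["animal", "beast"], ["entity.n"]),
    "dog.n": Synset("dog.n", ["dog", "domestic dog"], ["animal.n"]),
    "puppy.n": Synset("puppy.n", ["puppy"], ["dog.n"]),
}


def synonyms(word):
    """Lemmas that share at least one synset with `word` (synonymy)."""
    found = set()
    for synset in SYNSETS.values():
        if word in synset.lemmas:
            found.update(lemma for lemma in synset.lemmas if lemma != word)
    return sorted(found)


def hypernym_chain(name):
    """Follow the first hypernym link upward until the root (hypernymy)."""
    chain = []
    current = SYNSETS[name]
    while current.hypernyms:
        current = SYNSETS[current.hypernyms[0]]
        chain.append(current.name)
    return chain


def hyponyms(name):
    """Synsets that list `name` as a hypernym (hyponymy is the inverse link)."""
    return sorted(s.name for s in SYNSETS.values() if name in s.hypernyms)


if __name__ == "__main__":
    print(synonyms("animal"))
    print(hypernym_chain("puppy.n"))
    print(hyponyms("animal.n"))
```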
2. Some Linguistic K. B. 2.2 HowNet (http://www.keenage.com) • Knowledge, specifically, the form of knowledge that is computer-operable, is a system encompassing the varied relations amongst concepts as well as those amongst the attributes of concepts. As one acquires more concepts, or rather, captures more relations amongst concepts alongside the links between the attributes attached to the concepts, one simply becomes more knowledgeable; • On the creation of a knowledge base, a common-sense knowledge base constituting a knowledge system should first be constructed. This database shall describe general concepts and map out the relations among them.
2. Some Linguistic K. B. • Some concepts and relationships are defined.
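2. Some Linguistic K. B. 2.3 UPenn TreeBank http://www.cis.upenn.edu/~treebank/home.html [Parse-tree figure for the tagged sentence 他/PN 还/AD 提出/VV 一/CD 系列/M 具体/JJ 措施/NN 和/CC 策略/NN 要点/NN 。/PU ("He also put forward a series of concrete measures and key points of strategy."), with phrase nodes IP, NP-SBJ, VP, ADVP, NP-OBJ, QP, CLP, NP]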
2. Some Linguistic K. B. 2.4 FrameNet and Others • FrameNet (frame semantics) http://framenet.icsi.berkeley.edu • PropBank, NomBank http://nlp.cs.nyu.edu/meyers/NomBank.html
2. Some Linguistic K. B. Summary: • All the representations mentioned above are human-made and human-defined; • The different K. B.s are built at different levels and at different granularities, e.g., at the lexical level as tagged lexicons, or at the sentence level as annotated syntactic structures, and so on;
2. Some Linguistic K. B. • Generally, the K. B.s are developed for general purposes, and each K. B. expresses a single kind of linguistic knowledge; • However, are these representations sufficient, or even complete, for a natural language processing system?
3. Approaches to NLU Three methods: • Rationalistic • Empirical • Rationalistic + Empirical
3. Approaches to NLU Take MT as an example • Word-to-Word • Phrase-to-Phrase • Chunk-to-Chunk • Chunk-to-String • Tree-to-Tree (Learned, Syntactic or Semantic) • Tree-to-String • Logical-Form-to-Logical-Form • p(t|s) vs. p(s|t)×p(t) [Figure: transfer levels between source language (SL) and target language (TL), listed top to bottom: Inter-lingual, Logical-Form, Semantic-Tree, Syntactic-Tree, Chunk, Phrase, Word]
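The contrast p(t|s) vs. p(s|t)×p(t) follows from Bayes' rule: p(t|s) = p(s|t)×p(t) / p(s), and p(s) is constant for a given source sentence s, so ranking candidate translations t by p(s|t)×p(t) gives the same winner as ranking by p(t|s). The sketch below illustrates this with made-up numbers. The candidate strings, the probability tables, and the function names are all assumptions for illustration, not values from any trained model.

```python
import math

# Hypothetical candidate translations t for one fixed source sentence s.
CANDIDATES = ["I saw a man", "I a man saw"]

# Hypothetical translation-model scores p(s|t): how well t explains the source.
P_S_GIVEN_T = {"I saw a man": 0.4, "I a man saw": 0.5}

# Hypothetical language-model scores p(t): how fluent t is in the target language.
P_T = {"I saw a man": 0.05, "I a man saw": 0.001}


def noisy_channel_decode(candidates, p_s_given_t, p_t):
    """Pick argmax_t p(s|t) * p(t), computed in log space for numerical stability."""
    return max(candidates, key=lambda t: math.log(p_s_given_t[t]) + math.log(p_t[t]))


def posterior(candidates, p_s_given_t, p_t):
    """Recover p(t|s) over the candidate set by normalizing p(s|t) * p(t)."""
    joint = {t: p_s_given_t[t] * p_t[t] for t in candidates}
    total = sum(joint.values())  # plays the role of p(s), restricted to the candidates
    return {t: value / total for t, value in joint.items()}


if __name__ == "__main__":
    best = noisy_channel_decode(CANDIDATES, P_S_GIVEN_T, P_T)
    post = posterior(CANDIDATES, P_S_GIVEN_T, P_T)
    print(best, post)
```

In this toy setting the less fluent candidate has the higher p(s|t), but the language model p(t) outweighs it. That division of labor is the usual motivation for the factored form.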
3. Approaches to NLU [Figure: performance vs. years for three approaches: rule base; dictionary + machine learning; corpus base] More data is better data.
3. Approaches to NLU Many hard nuts remain to be cracked: • Word sense disambiguation • Syntactic disambiguation • Semantic analysis and translation • Automatic evaluation of translation ……
3. Approaches to NLU • The number of webpages is increasing exponentially • The highest accuracy of Chinese information retrieval (webpage search) in 2006 was only about 36.7% (from the 863 report) [Figure: Increasing Number of Chinese Webpages; the data are from the Information Center of China Internet]
3. Approaches to NLU What is the problem?
3. Approaches to NLU “One should build the rocket, instead of climbing the tree, if he wants to reach the moon”, Martin Kay • Are we building the rocket or climbing the tree? • Are we currently on the right path to building the rocket?
3. Approaches to NLU • How does a human brain work when it translates a sentence? [Diagram labels: Input: Speech, Text; Affective Computing + Semantic Computing; Perception; Vision; K. B.; Output; Dynamic; Static]
3. Approaches to NLU • A human can infer an unknown word sense, sentence structure, etc. from common sense (limited knowledge), but a system cannot; • A human can dynamically and jointly use multiple knowledge sources (lexical / syntactic / semantic / pragmatic) to process a specific language phenomenon, and can easily determine which knowledge is necessary and which is not, but a system usually cannot;
3. Approaches to NLU • A human can easily acquire new knowledge and update his or her memory, which is usually difficult for a system. However, a computer can memorize a large number of words and phrases and compute very fast, which a human cannot. Currently, the models for NLU mainly exploit computing power, but rarely simulate the human cognitive process.
4. Proposal • For a specific NLU task, such as word sense disambiguation, syntactic parsing, or translation, we need to model the cognitive process of the human brain; • Then, according to these models, build the task-oriented knowledge base.
4. Proposal E.g., for speech-to-speech (S2S) translation in a specific domain, the following aspects are addressed: • Investigate the effect of rhythm, tone, and accent; • Model translation in combination with a language model, a speech model, a common-sense model, etc.; • Build a knowledge base describing language, semantics, speech, emotion, and domain-related common sense, all oriented to S2S translation and based on the needs of the translation model.
Thanks!