Natural Language Processing. A language is defined as a set of strings without reference to any world being described or task to be performed. By studying a language, knowledge about the world is acquired. Acquisition can be in the form of written text, speech/voice, images/patterns, etc. Natural language means a native language such as Hindi, English, French, Urdu, etc. For an NLP machine, the requirement is to "Generate, Understand and Translate". By: AnujKhanna(Asst. Prof.) www.uptunotes.com
State of the Art • NLP includes both understanding & generation. • It is a subfield of AI and linguistics that deals with "problems of automated generation and understanding of language". • Generation: conversion of information in a computer database into normal-sounding human language. • Understanding: samples of human language are converted into more formal representations that are easier for computer programs to manipulate. • "NLU is an AI-complete problem." "Defining understanding is a major problem in NLP systems." To understand something is to transform one representation into another.
The entire NLP problem can be subdivided as: (i) Processing of written text, using lexical, syntactic & semantic knowledge of the language as well as real-world information. (ii) Processing of spoken language, using all the information needed for written text plus additional knowledge of "phonology & ambiguity resolution". The idea is to control a machine by talking to it in our native language in an interactive manner. This first requires finding the underlying task and goal. "Natural language is ambiguous, so it leads to difficulty in processing at various levels of the knowledge domain." To date, human linguistic communication happens mostly in speech rather than in written text.
NLP methodology and the concerned problem domain have attracted researchers & educationalists from different areas and disciplines of knowledge, such as: • Classical & Computational Linguistics • Computer Sc. & Engg. • Psycholinguistics • Statistics "Open-domain question answering is required. Multi-document summarization and information interaction are required in a wide variety of languages." Current problems are: • Ambiguity at the written as well as the speech level. • Discourse analysis. • Generation of various degrees of complexity in an intelligent system. • Knowledge acquisition methods to incorporate data in WordNet, lexicon methodology, and KB systems for multilingual text classification and hyperlinking.
Why is the NLU task difficult? • Natural language allows an infinite number of sentences. • There is much ambiguity in natural language constructs. Levels of Ambiguity 1. Syntactic ambiguity: Syntax relates to the structure of language, i.e. how the words are put together; "there can be more than one correct interpretation of the same sentence". • E.g.: "I hit the man with the hammer." Was the hammer the weapon used, or was it in the hand of the victim? • E.g.: back can be an adverb (go back), an adjective (back door), a noun (the back of the room) or a verb (back up your files).
2. Lexical ambiguity: Ambiguity in lexemes, i.e. words having more than one meaning. E.g.: "I went to the bank." Is the bank a financial organization or a river bank? 3. Referential ambiguity: Concerned with what the sentence refers to; it may refer to more than one thing. E.g.: "Ram killed Ravana because he liked Sita." Who liked Sita, Ram or Ravana? 4. Semantic-level ambiguity: Ambiguity in the meaning associated with a single sentence. • E.g.: "He saw her duck." Did he see her dip down, or did he see a web-footed bird belonging to her? • Semantic ambiguity can also occur when there is no lexical/syntactic ambiguity. E.g.: the phrase "cat person" can be someone who likes felines, or it may be the lead of the movie "Attack of the Cat People".
5. Pragmatic ambiguity: The level of interpretation within context, i.e. the same word/phrase may be interpreted differently in two distinct contexts/situations. E.g.: "I went to the doctor yesterday." Here the meaning of yesterday depends on the context, i.e. on when the sentence was spoken. Example: (i) I waited for a long time at the bank. (ii) There is a drought because it hasn't rained for a long time. (iii) Dinosaurs have been extinct for a long time. "In the above three sentences, the phrase a long time refers to different time intervals depending on the context."
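For the lexical ambiguity example ("I went to the bank"), one classic heuristic is the simplified Lesk algorithm. It picks the sense whose dictionary gloss shares the most words with the sentence. The sketch below is minimal and assumes a hypothetical two-sense inventory (SENSES) and a small stop-word list (STOPWORDS). Both are made up for illustration and are not taken from a real dictionary.

```python
# Hypothetical two-sense inventory for "bank"; the glosses are illustrative only
SENSES = {
    "bank/finance": "an institution where people deposit money and take loans",
    "bank/river": "the sloping land beside a river or stream",
}
# Hypothetical stop-word list so that function words do not count as overlap
STOPWORDS = {"a", "an", "the", "i", "to", "of", "and", "or", "where", "take"}


def content_words(text):
    """Lower-cased words with punctuation and stop words removed."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS


def lesk_scores(sentence, senses=SENSES):
    """Simplified Lesk: overlap between the sentence context and each sense gloss."""
    context = content_words(sentence)
    return {sense: len(context & content_words(gloss)) for sense, gloss in senses.items()}


def disambiguate(sentence, senses=SENSES):
    """Return the best-scoring sense, or None when the context cannot separate the senses."""
    scores = lesk_scores(sentence, senses)
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    return winners[0] if len(winners) == 1 else None


print(disambiguate("I went to the bank to deposit money"))      # bank/finance
print(disambiguate("I went to the bank of the river to fish"))  # bank/river
print(disambiguate("I went to the bank"))                       # None
```

The bare sentence "I went to the bank" scores zero for both senses. The context alone cannot decide, which is exactly the lexical ambiguity the slide describes.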
Levels of Knowledge used in NLU • Phonological Knowledge: "A phoneme is the smallest unit of sound that distinguishes one word from another", so this knowledge relates to the sounds of words. Accents differ among people from different parts/regions, which may lead to phonetic ambiguity in speech recognition systems. • Syntactic Knowledge: How words are arranged together to form a coherent, grammatically correct sentence. • Semantic Knowledge: Relates to the meaning of words/phrases & how they combine to form a meaningful sentence. • Morphological Knowledge: Word construction from morphemes. • Pragmatic Knowledge: Relates to the use of sentences in different contexts & how context affects the meaning of a sentence. • Word Knowledge: The language the user employs to carry out a conversation.
Computational Model of Language Processing ** Noam Chomsky developed a theory of language (generative grammar). ** He designed the Chomsky classification of grammars (Chomsky hierarchy). Processing phases: • Syntactic Analysis • Semantic Analysis • Pragmatic Analysis • Morphological Analysis • Discourse Integration ** Discourse is any string of language, usually one that is more than one sentence long. E.g.: textbooks, novels, web pages, weather reports, etc. • The meaning of a sentence may depend on preceding as well as upcoming words & phrases.
E.g.: "Ram wanted it." ** In this sentence, the meaning of it depends on the prior discourse, e.g. a CAR which Ram wants to purchase. ** In the next sentence, "He purchased the car", he refers to Ram from the previous sentence. Note: • This type of interpretation applies to a PRONOUN/DEFINITE NOUN PHRASE, which refers to a world object/entity/agent. • Choosing the best referent is a process of disambiguation that combines syntactic, semantic & pragmatic information. • Pronouns must agree in gender and number with their antecedents: he can refer to Bobby, not Arisha; they can refer to a group, not a single person.
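The gender/number agreement constraint above can act as a first filter on candidate antecedents. The sketch below is minimal and assumes a hypothetical feature table (PRONOUN_FEATURES) and hand-labelled mentions. A real system would get these features from a lexicon or a tagger.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    gender: str   # "m", "f", "n" (neuter) or "mixed" for groups
    number: str   # "sg" or "pl"


# Hypothetical feature table: pronoun -> (allowed genders, or None for any; number)
PRONOUN_FEATURES = {
    "he": ({"m"}, "sg"),
    "she": ({"f"}, "sg"),
    "it": ({"n"}, "sg"),
    "they": (None, "pl"),
}


def agreeing_antecedents(pronoun, candidates):
    """Keep only the candidates that agree with the pronoun in gender and number."""
    genders, number = PRONOUN_FEATURES[pronoun.lower()]
    return [c.text for c in candidates
            if c.number == number and (genders is None or c.gender in genders)]


candidates = [Mention("Bobby", "m", "sg"), Mention("Arisha", "f", "sg"),
              Mention("the car", "n", "sg"), Mention("the students", "mixed", "pl")]
print(agreeing_antecedents("he", candidates))    # ['Bobby']
print(agreeing_antecedents("it", candidates))    # ['the car']
print(agreeing_antecedents("they", candidates))  # ['the students']
```

Agreement only narrows the choice. When several candidates survive, the syntactic, semantic and pragmatic information mentioned above has to decide.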
An Example Sentence • "Arisha dropped the cup on the plate. It broke." • This poses a problem: it is not clear whether the cup or the plate is the referent of it (ambiguity at the referential level). Now consider a larger context: Arisha was fond of the blue cup. The cup was presented to her by her mother. Unfortunately, one day while washing utensils, Arisha dropped the cup on the plate. It broke. Here the cup is the focus of attention and hence is the referent (ambiguity resolved).
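Focus of attention can be approximated very crudely by counting how often each candidate is mentioned in the discourse. The sketch below is minimal and assumes that "cup" and "plate" have already passed the gender/number filter (both are neuter singular). Mention frequency is a hypothetical stand-in for a real salience model.

```python
import re


def most_salient(discourse, candidates):
    """Crude focus-of-attention proxy: the candidate mentioned most often wins.

    Returns (winner, counts); winner is None when the counts are tied.
    """
    text = discourse.lower()
    counts = {c: len(re.findall(rf"\b{re.escape(c)}\b", text)) for c in candidates}
    best = max(counts.values())
    winners = [c for c in candidates if counts[c] == best]
    return (winners[0] if len(winners) == 1 else None), counts


short = "Arisha dropped the cup on the plate. It broke."
longer = ("Arisha was fond of the blue cup. The cup was presented to her by her mother. "
          "Unfortunately, one day while washing utensils, Arisha dropped the cup on the plate. "
          "It broke.")
print(most_salient(short, ["cup", "plate"]))   # (None, {'cup': 1, 'plate': 1})
print(most_salient(longer, ["cup", "plate"]))  # ('cup', {'cup': 3, 'plate': 1})
```

In the single sentence both candidates are tied, so it stays ambiguous. In the larger context the cup dominates, matching the slide's resolution.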
Syntactic Processing & Formal Grammars • Parsing/syntax analysis has two components: (i) a declarative representation, called a grammar, of syntactic facts about the language; (ii) a procedure, called a parser, that compares the grammar against input sentences to produce a parsed structure. Formal Language: "a (usually infinite) set of strings". Each string is a concatenation of terminal symbols, also called words. E.g.: Java, first-order predicate logic, C, C++, etc. These languages have strict mathematical definitions, unlike natural languages such as Hindi and English. Formal Grammar: G = {V, T, S, P} • V is the set of variables or non-terminals, usually written in upper case. • T is the finite set of terminals, i.e. lexemes or tokens (lower case). • S is the start symbol of the grammar rules. • P is the set of productions of the form α → β.
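The grammar/parser split can be made concrete with a chart parser. The sketch below is a minimal CYK parser and assumes a hypothetical toy grammar in Chomsky Normal Form (every rule is A → B C or A → word, which CYK requires). It counts how many parse trees the grammar assigns to a sentence. For the ambiguous "I hit the man with the hammer" it finds two, one for each attachment of the prepositional phrase.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (not a full English grammar)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase: the hammer is the weapon
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase: the man holds the hammer
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {"I": ["NP"], "hit": ["V"], "the": ["Det"], "man": ["N"],
           "hammer": ["N"], "with": ["P"]}


def count_parses(words, start="S"):
    """CYK chart: chart[i][j][A] = number of ways A derives words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # spans of length 1 come from the lexicon
        for tag in LEXICON[w]:
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):                  # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # every split point
                for lhs, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][lhs] += ways
    return chart[0][n][start]


print(count_parses("I hit the man with the hammer".split()))  # 2
print(count_parses("I hit the man".split()))                  # 1
```

Storing counts instead of single trees keeps the sketch short. Storing back-pointers in each cell would let the parser return the two trees themselves.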
Key Points for Natural Language Grammar (e.g.: English) • Most grammar rule formalisms are based on the idea of phrase structure, i.e. strings are composed of substrings called phrases. Example: Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP), Adverb Phrase (ADVP), etc. Here NP, VP, PP and ADVP are all non-terminals/variables of a formal grammar for an English sentence. • Other non-terminals can be Noun (N), Verb (V), Preposition (P), Article (ART) and Determiner (DET, like a, an, the). ART and DET can be used interchangeably. • Terminals/lexemes/tokens can be words like: a, an, the, Ram, Joseph, run, upon, into, put, good, long, very, fast, etc.
Example: "Joseph ate the chicken" Grammar rules of G: • S → NP VP • NP → ART N | N • PP → PREP NP • VP → V | V NP | V NP PP | V PP • N → Ram | Joseph | tree | tea | road | chicken • V → ate | walk | drink | sit • AUXV → is | am | are | was | were • PREP → with | under | into | on • ART → a | an | the. V = {S, NP, VP, PP, PREP, ART, N, V, AUXV} is the set of non-terminals. T = {Ram, Joseph, tree, tea, road, chicken, ate, walk, drink, sit, is, am, are, was, were, with, under, into, on, a, an, the} is the set of terminals (the example sentence uses Joseph, ate, the, chicken). S is the start symbol of grammar G.
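The grammar above can drive a top-down (recursive-descent) parser directly. The sketch below is minimal and assumes the rules NP → N and VP → V NP, which the derivation of "Joseph ate the chicken" relies on. It always expands the leftmost symbol and backtracks when a word fails to match. Replaying the rules it used reproduces the leftmost derivation.

```python
# Toy grammar from the slide; NP -> N and VP -> V NP are assumed (the derivation uses them)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"], ["N"]],
    "PP": [["PREP", "NP"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"], ["V", "PP"], ["V"]],
}
# Pre-terminals and the words they rewrite to
LEXICON = {
    "N": {"Ram", "Joseph", "tree", "tea", "road", "chicken"},
    "V": {"ate", "walk", "drink", "sit"},
    "AUXV": {"is", "am", "are", "was", "were"},
    "PREP": {"with", "under", "into", "on"},
    "ART": {"a", "an", "the"},
}


def parse(symbols, words):
    """Top-down search: always expand the leftmost symbol, backtracking on failure.

    Yields one list of (lhs, rhs) rule applications per complete parse.
    """
    if not symbols:
        if not words:
            yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                      # pre-terminal: must match the next word
        if words and words[0] in LEXICON[first]:
            for tail in parse(rest, words[1:]):
                yield [(first, [words[0]])] + tail
        return
    for rhs in GRAMMAR[first]:               # non-terminal: try each alternative in turn
        for tail in parse(rhs + rest, words):
            yield [(first, rhs)] + tail


def leftmost_derivation(sentence):
    """Sentential forms of the first parse found, or None if the sentence is rejected."""
    steps = next(parse(["S"], sentence.split()), None)
    if steps is None:
        return None
    form, forms = ["S"], ["S"]
    for lhs, rhs in steps:
        i = form.index(lhs)                   # lhs is always the leftmost non-terminal
        form = form[:i] + rhs + form[i + 1:]
        forms.append(" ".join(form))
    return forms


print(" -> ".join(leftmost_derivation("Joseph ate the chicken")))
```

The printed derivation has one more step than the slide's ("Joseph ate NP"), because V and NP are expanded separately. A grammar with a left-recursive rule such as NP → NP PP would make this naive parser loop forever. That is one reason chart parsers are preferred in practice.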
Parsing Techniques: Top-down & Bottom-up Parsing. Top-down parsing: S → NP VP → N VP → Joseph VP → Joseph V NP → Joseph ate NP → Joseph ate ART N → Joseph ate the N → Joseph ate the chicken. Bottom-up parsing: Joseph ate the chicken → N ate the chicken → N V the chicken → N V ART chicken → N V ART N → NP V NP → NP VP → S.
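Bottom-up parsing can be sketched as a shift-reduce search. Words are shifted onto a stack as their categories, and the top of the stack is reduced whenever it matches the right-hand side of a rule. The sketch below is minimal and assumes the same toy grammar (including NP → N and VP → V NP) plus a hypothetical CATEGORY table for a few words. It backtracks depth-first when a greedy reduction leads to a dead end.

```python
# Same toy grammar as (lhs, rhs) pairs; NP -> N and VP -> V NP are assumed
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("ART", "N")), ("NP", ("N",)),
    ("PP", ("PREP", "NP")),
    ("VP", ("V", "NP", "PP")), ("VP", ("V", "NP")), ("VP", ("V", "PP")), ("VP", ("V",)),
]
# Hypothetical word -> category table (a subset of the slide's lexical rules)
CATEGORY = {"Ram": "N", "Joseph": "N", "chicken": "N", "tea": "N", "ate": "V", "drink": "V",
            "a": "ART", "an": "ART", "the": "ART", "with": "PREP", "on": "PREP"}


def shift_reduce(stack, words, trace):
    """Depth-first shift-reduce search; returns the successful sequence of forms, or None."""
    trace = trace + [" ".join(stack + words)]   # current form = stack followed by unread input
    if stack == ["S"] and not words:
        return trace
    for lhs, rhs in RULES:                      # try every possible reduction first
        k = len(rhs)
        if tuple(stack[-k:]) == rhs:
            result = shift_reduce(stack[:-k] + [lhs], words, trace)
            if result:
                return result
    if words:                                   # otherwise shift the next word as its category
        return shift_reduce(stack + [CATEGORY[words[0]]], words[1:], trace)
    return None


def bottom_up(sentence):
    return shift_reduce([], sentence.split(), [])


print("\n".join(bottom_up("Joseph ate the chicken")))
```

The search finds the same final reductions as the slide (NP V NP → NP VP → S). It reduces N to NP as soon as Joseph is shifted, so the intermediate forms differ slightly. Real shift-reduce parsers use a parse table or a trained classifier to decide when to shift and when to reduce, instead of backtracking.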
Parser and Lexicon (figure: input string → parser, which consults the LEXICON → output representation structure) • To find the meaning of a word, the parser accesses the lexicon. • While selecting a word from the input stream, the parser locates the word in the lexicon. • It then extracts the possible meanings, attributes, syntax and semantics of that word.
"A lexicon is a dictionary of words (like morphemes, tokens, lexemes, phonemes) containing syntactic, semantic and pragmatic knowledge." The organization & entries of lexicons vary from one implementation to another. They are usually made up of variable-length data structures, such as lists or dynamic arrays, arranged in alphabetical order. Lists can be initialized with the most frequently used words (e.g.: a, an, the, to, by, of, from, etc.) to minimize the search time for locating lexemes. Access to words can be facilitated by: • Indexing • Binary search • Hashing
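The three access methods listed above can be compared on the same entries. The sketch below is minimal and assumes a hypothetical five-word lexicon (ENTRIES) mapping each word to a category and gloss. It uses Python's dict for hashing, the standard `bisect` module for binary search over the alphabetically sorted list, and a first-letter index for indexing.

```python
import bisect

# A tiny hypothetical lexicon: word -> (category, gloss)
ENTRIES = {
    "a": ("ART", "indefinite article"),
    "ate": ("V", "past tense of eat"),
    "bank": ("N", "financial institution; side of a river"),
    "chicken": ("N", "domestic fowl"),
    "the": ("ART", "definite article"),
}


# 1. Hashing: a Python dict is a hash table, so lookup is O(1) on average
def lookup_hash(word):
    return ENTRIES.get(word)


# 2. Binary search over the alphabetically sorted entries: O(log n) comparisons
SORTED_ENTRIES = sorted(ENTRIES.items())
SORTED_WORDS = [w for w, _ in SORTED_ENTRIES]


def lookup_binary(word):
    i = bisect.bisect_left(SORTED_WORDS, word)
    if i < len(SORTED_WORDS) and SORTED_WORDS[i] == word:
        return SORTED_ENTRIES[i][1]
    return None


# 3. Indexing: a first-letter index narrows the search to one small bucket
INDEX = {}
for w, info in SORTED_ENTRIES:
    INDEX.setdefault(w[0], []).append((w, info))


def lookup_index(word):
    for w, info in INDEX.get(word[:1], []):   # linear scan inside the bucket
        if w == word:
            return info
    return None


print(lookup_hash("ate"), lookup_binary("ate"), lookup_index("ate"))
```

The frequency idea from the slide fits the indexed version well. Placing high-frequency words such as "a" and "the" first in their buckets shortens the linear scan for the words looked up most often.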
Knowledge Based System Approaches in NLP 1. SHRDLU • A system developed by Winograd at MIT around 1970. • Controls a robot in a restricted "blocks" domain. • The domain contains a number of blocks of various shapes, sizes, colors and textures. • The robot manipulates the blocks world as per instructions given in natural language. Example instructions: (i) Find a block which is taller than the one you are holding & place it in the box. (Referential ambiguity: what does it refer to?) (ii) How many blocks are on top of the green block? (Semantic ambiguity) (iii) Put the red pyramid on the block in the box. (Syntactic ambiguity: either the block is in the box, or the red pyramid is to be put in the box)
2. Information Matching & Extraction • Knowledge-based extraction and machine learning methods are deployed for rapid prototyping and for incorporating data acquisition. • A set of events, objects & their attributes builds a World Model. • The world model supports inheritance and is transformed into a discourse model specific to a particular text. 3. Machine Translation • Began in the 1950s; the 1954 Georgetown-IBM experiment translated Russian sentences into English. • IBM later introduced the statistical approach to language & parameter estimation in machine translation through mathematical models, e.g.: Hidden Markov Model (HMM), Boolean keyword model, probabilistic model based on Bayesian classification.
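A Hidden Markov Model scores a hidden label sequence by multiplying start, transition and emission probabilities. The Viterbi algorithm finds the most probable sequence efficiently. For brevity the sketch below applies an HMM to part-of-speech tagging rather than to translation. All probabilities (STATES, START, TRANS, EMIT) are hypothetical toy values, not estimated from any corpus.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most probable hidden tag sequence for the observed words under an HMM."""
    # best[t][s] = (probability of the best path ending in state s at position t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for t in range(1, len(words)):
        best.append({
            s: max((best[t - 1][r][0] * trans_p[r][s] * emit_p[s].get(words[t], 0.0), r)
                   for r in states)
            for s in states
        })
    # Trace back from the most probable final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy parameters (made up for illustration)
STATES = ["N", "V", "ART"]
START = {"N": 0.5, "V": 0.2, "ART": 0.3}
TRANS = {"N": {"N": 0.1, "V": 0.8, "ART": 0.1},
         "V": {"N": 0.3, "V": 0.1, "ART": 0.6},
         "ART": {"N": 0.9, "V": 0.05, "ART": 0.05}}
EMIT = {"N": {"Joseph": 0.4, "chicken": 0.3, "duck": 0.3},
        "V": {"ate": 0.6, "duck": 0.4},
        "ART": {"the": 1.0}}

print(viterbi("Joseph ate the chicken".split(), STATES, START, TRANS, EMIT))  # ['N', 'V', 'ART', 'N']
print(viterbi("the duck".split(), STATES, START, TRANS, EMIT))                # ['ART', 'N']
print(viterbi("Joseph duck".split(), STATES, START, TRANS, EMIT))             # ['N', 'V']
```

The same word "duck" receives different tags depending on its neighbours, echoing the "He saw her duck" ambiguity. Practical taggers work with log probabilities to avoid numerical underflow on long sentences.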
Machine Translation Approaches (figure) • Rule-based translation • Corpus-based translation • Direct machine translation • Knowledge-based translation • Interlingua-based machine translation • Transfer-based machine translation
Direct Machine Translation • Carries out word-by-word translation with the help of a bilingual dictionary, usually followed by some syntactic rearrangement. • A monolithic approach is followed, i.e. "consider all the details of one language pair". • Little analysis of the source text is required; there is no parsing. Pipeline (figure): source text → morphological analysis → lexical transfer using a bilingual dictionary → local reordering → target language text
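The direct pipeline can be sketched with a dictionary lookup followed by one local reordering rule. The sketch below is minimal and assumes a hypothetical English to romanised Hindi dictionary (BILINGUAL) that stores verb forms ready-made. It skips the morphological analysis stage entirely and applies only one reordering rule, SVO to SOV.

```python
# Toy English -> romanised Hindi dictionary (hypothetical); verb forms are stored ready-made
BILINGUAL = {
    "Ram": ("Ram", "N"), "Sita": ("Sita", "N"),
    "mango": ("aam", "N"), "book": ("kitaab", "N"),
    "eats": ("khaata hai", "V"),    # masculine form, correct only for a male subject
    "reads": ("padhti hai", "V"),   # feminine form, correct only for a female subject
}


def direct_translate(sentence):
    """Direct MT sketch: word-by-word lexical transfer, then one local reordering rule."""
    # Lexical transfer: replace every source word by its dictionary equivalent
    words = [BILINGUAL[w] for w in sentence.split()]
    # Local reordering: English SVO -> Hindi SOV, i.e. move a verb after the noun that follows it
    for i in range(len(words) - 1):
        if words[i][1] == "V" and words[i + 1][1] == "N":
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(w for w, _ in words)


print(direct_translate("Ram eats mango"))   # Ram aam khaata hai
print(direct_translate("Sita reads book"))  # Sita kitaab padhti hai
```

Because there is no morphological analysis or parsing, gender agreement is simply hard-coded into each dictionary entry. "Sita eats mango" would come out with the wrong verb form, which illustrates why direct translation is brittle.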
Corpus-based Machine Translation (CBMT) • Also called data-driven translation. • Overcomes the knowledge-acquisition problem of rule-based machine translation (RBMT). • Uses a bilingual parallel corpus to obtain knowledge for new incoming translations. • Fully automated, with less human intervention than RBMT. Statistical Machine Translation (SMT) • Uses a bilingual corpus to learn translation models. • Uses a monolingual corpus to learn the grammar (language model) of the target language. • SMT models are trained on a sentence-aligned translation corpus, based on: 1) n-gram modeling and 2) the probability distribution of the source-target language pair in a very large corpus.
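The monolingual side of SMT is typically an n-gram language model. The sketch below is a minimal, unsmoothed bigram model estimated by maximum likelihood, P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}). It uses sentence-boundary markers `<s>` and `</s>`. The three-sentence training corpus is hypothetical.

```python
from collections import Counter


def train_bigram_lm(corpus):
    """Maximum-likelihood bigram model: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    histories, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            if histories[prev] == 0:                  # unseen history: probability 0
                return 0.0
            p *= bigrams[(prev, word)] / histories[prev]
        return p

    return prob


lm = train_bigram_lm(["the cat sat", "the cat ate", "a dog sat"])  # hypothetical corpus
print(lm("the cat sat"))   # = 2/3 * 1 * 1/2 * 1 = 1/3
print(lm("cat the sat"))   # 0.0: "cat" never starts a sentence
```

Any unseen bigram drives the whole product to zero. Real systems therefore apply smoothing (e.g. add-one or Kneser-Ney) and train on far larger corpora.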
Maximize Probabilities from Models (figure: a translation model P(S|T), learned from a bilingual corpus, and a language model P(T), learned from a monolingual corpus, are combined to give the translation result). T is the target-language sentence and S is the source-language sentence. P(S|T) is the translation probability and P(T) is the target-language probability. The system outputs the T that maximizes P(S|T) × P(T).
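The combination in the figure is the noisy-channel decision rule: pick the T that maximizes P(S|T) × P(T). The sketch below is minimal and assumes the candidate translations are already listed. It uses hypothetical, made-up probabilities (TRANSLATION_MODEL, LANGUAGE_MODEL) to show how the language model can overrule a translation model that prefers word-for-word order.

```python
# Hypothetical toy probabilities (made up for illustration)
SOURCE = "Ram aam khaata hai"
CANDIDATES = ["Ram eats mango", "Ram mango eats"]
TRANSLATION_MODEL = {(SOURCE, "Ram eats mango"): 0.25,   # P(S|T)
                     (SOURCE, "Ram mango eats"): 0.30}
LANGUAGE_MODEL = {"Ram eats mango": 0.020,               # P(T)
                  "Ram mango eats": 0.001}


def score(source, target):
    """Noisy-channel score P(S|T) * P(T); unseen pairs or sentences get probability 0."""
    return TRANSLATION_MODEL.get((source, target), 0.0) * LANGUAGE_MODEL.get(target, 0.0)


def decode(source, candidates):
    """Choose the target sentence T that maximizes P(S|T) * P(T)."""
    return max(candidates, key=lambda t: score(source, t))


print(decode(SOURCE, CANDIDATES))  # Ram eats mango
```

Here the translation model alone would pick the word-for-word "Ram mango eats" (0.30 > 0.25). Once P(T) is included, the scores are 0.005 against 0.0003 and the fluent English order wins. A real decoder searches an enormous space of candidates rather than a fixed list, and it sums log probabilities instead of multiplying raw ones.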
Advantages of SMT 1) No knowledge of linguistics is required, which saves the cost and time of acquiring knowledge from domain experts. 2) The need to transfer expertise is minimized. 3) Faster and less costly than DMT.
Intelligent Computing Model for English to Sanskrit Machine Translation (figure, part 1; modules shown: input English sentence, tokenizer, POS tagger module, adverb conversion table module, GNP detection module)
(Figure, part 2; modules shown: tense & sentence detection module, input from the GNP module, Sanskrit rule detection, ANN-based system, roop & dhaatu detection, noun & object detection, dhaatu form generation, word form generation)
(Figure, part 3: adverb conversion, word forms and dhaatu forms feed the output.) Output Sanskrit sentence: concatenation of kartaa, adjective, karma, adverb and verb. • The GNP module detects the gender, number & person of nouns in the English sentence. • The noun & object detection module gives the Sanskrit nouns equivalent to the English nouns. • The roop/dhaatu module gives the Sanskrit verbs equivalent to the English verbs. • The ANN is a feed-forward network that performs encoding of the user data vector (UDV), I/O generation of the UDV & finally decoding of the UDV.
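The GNP module and dhaatu form generation can be sketched in a few lines. This is not the ANN-based system described above; it is a rule-based illustration under stated assumptions. It assumes a hypothetical pronoun table (PRONOUN_GNP) and a crude plural heuristic. It then selects the present-tense (laṭ) parasmaipada form of the root gam, "to go", from the detected person and number.

```python
# Hypothetical GNP table for English subject pronouns: word -> (gender, number, person)
PRONOUN_GNP = {
    "i": (None, "singular", 1), "we": (None, "plural", 1),
    "you": (None, "singular", 2),        # English "you" is ambiguous; singular assumed here
    "he": ("masculine", "singular", 3), "she": ("feminine", "singular", 3),
    "it": ("neuter", "singular", 3), "they": (None, "plural", 3),
}

# Present-tense (laṭ) parasmaipada forms of the root gam, "to go", by (person, number)
GAM_PRESENT = {
    (3, "singular"): "gacchati", (3, "dual"): "gacchataḥ", (3, "plural"): "gacchanti",
    (2, "singular"): "gacchasi", (2, "dual"): "gacchathaḥ", (2, "plural"): "gacchatha",
    (1, "singular"): "gacchāmi", (1, "dual"): "gacchāvaḥ", (1, "plural"): "gacchāmaḥ",
}


def detect_gnp(subject):
    """Crude GNP detection: pronoun table first, otherwise treat the subject as a 3rd-person noun."""
    word = subject.lower()
    if word in PRONOUN_GNP:
        return PRONOUN_GNP[word]
    # Hypothetical heuristic: a trailing "s" marks a plural common noun (fails on e.g. "bus")
    number = "plural" if word.endswith("s") else "singular"
    return (None, number, 3)


def sanskrit_go(subject):
    """Pick the form of gam that agrees with the subject in person and number."""
    _gender, number, person = detect_gnp(subject)   # gender does not affect this finite verb
    return GAM_PRESENT[(person, number)]

# sanskrit_go("Ram") -> "gacchati", sanskrit_go("I") -> "gacchāmi", sanskrit_go("boys") -> "gacchanti"
```

English does not mark the dual number, so the dual forms are never selected by this sketch. A fuller GNP module would need cues such as "both" or "two" to reach them. Gender becomes relevant for nouns, adjectives and participles rather than for this present-tense verb.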
Computer Vision What is Computer Vision? • "Computing properties of the 3D world from one or more digital images" • Stockman and Shapiro: "To make useful decisions about real physical objects and scenes based on sensed images" • Ballard and Brown: "The construction of explicit, meaningful descriptions of physical objects from images" • Forsyth and Ponce: "Extracting descriptions of the world from pictures or sequences of pictures"
What is in this image? 1. A hand holding a man? 2. A hand holding a mirrored sphere? 3. An Escher drawing? Interpretations are ambiguous, whereas the forward problem (graphics) is well-posed.
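Why the forward problem is well-posed while interpretation is not can be seen with a pinhole camera. The sketch below assumes the standard pinhole model, x = f·X/Z and y = f·Y/Z, with camera coordinates and focal length f. Projection gives exactly one image point for each 3D point. But every point along the same viewing ray lands on that image point, so the image alone cannot say which one was there.

```python
def project(point, focal_length=1.0):
    """Pinhole camera: 3D point (X, Y, Z) in camera coordinates -> image point (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)   # twice as far away along the same viewing ray
print(project(near), project(far))  # (0.25, 0.5) (0.25, 0.5)
```

Graphics runs `project` forward, and the answer is unique. Vision has to invert it, and the inversion loses depth. That is why vision relies on extra cues such as multiple views, shading and prior knowledge.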
What do you see? (Figure panels: changing viewpoint, moving light source, deforming shape.)
What was happening? (Figure panels: changing viewpoint, moving light source, deforming shape.)
Why study Computer Vision? • Images and movies are everywhere. • A fast-growing collection of useful applications – building representations of the 3D world from pictures – automated surveillance (who's doing what) – movie post-processing – face recognition • Various deep and attractive scientific mysteries – How does object recognition work? • A beautiful marriage of math, biology, physics and engineering • Greater understanding of human vision
Some Objectives • Segmentation: breaking images and video into meaningful pieces • Reconstructing the 3D world – from multiple views – from shading – from structural models • Recognition – What are the objects in a scene? – What is happening in a video? • Control – obstacle avoidance – robots, machines, etc.
Applications: Touching your life • Football • Movies • Surveillance • HCI: hand gestures, American Sign Language • Face recognition & biometrics • Road monitoring • Industrial inspection • Robotic control • Autonomous driving • Space: planetary exploration, docking • Medicine: pathology, surgery, diagnosis • Microscopy • Military • Remote sensing
Image Interpretation - Cues • Variation in appearance in multiple views – stereo – motion • Shading & highlights • Shadows • Contours • Texture • Blur • Geometric constraints • Prior knowledge
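The stereo cue turns the difference between two views into depth. The sketch below assumes a rectified stereo pair, meaning the two cameras have parallel optical axes and the same focal length. Under that standard assumption, depth is Z = f·B/d, where f is the focal length in pixels, B is the baseline between the cameras and d is the disparity in pixels. The numbers in the usage lines are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified stereo pair: depth Z = f * B / d (f, d in pixels; B and Z in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at finite depth)")
    return focal_px * baseline_m / disparity_px


print(depth_from_disparity(40, focal_px=800, baseline_m=0.25))  # 5.0
print(depth_from_disparity(20, focal_px=800, baseline_m=0.25))  # 10.0
```

Halving the disparity doubles the depth. Distant points therefore produce small disparities, and their depth estimates are very sensitive to one-pixel matching errors.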
Illumination Variability "The variations between the images of the same face due to illumination and viewing direction are almost always larger than the image variations due to a change in face identity."
Early Vision in One Image • Representing small patches of an image, for three reasons: 1. We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points. 2. Sharp changes are important in practice; these are known as "edges". 3. Texture can be represented by giving some statistics of the different kinds of small patches present in it. E.g.: "Tigers have lots of bars and few spots, while leopards are the other way round."
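Sharp changes ("edges") are usually found by estimating the image gradient. The sketch below is a minimal pure-Python Sobel operator that computes gradient magnitude for the interior pixels of a grayscale image stored as a list of rows. Border pixels are left at 0 for simplicity. The 5×6 test image is hypothetical.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude with 3x3 Sobel kernels (interior pixels only)."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # responds to horizontal change
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # responds to vertical change
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# A 5x6 image: dark left half, bright right half -> one vertical edge
img = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
mag = sobel_magnitude(img)
print([round(v) for v in mag[2]])  # [0, 0, 40, 40, 0, 0]
```

The response is large only in the two columns that straddle the intensity jump and zero in the flat regions. Thresholding such a magnitude map is the simplest form of edge detection; detectors such as Canny add smoothing, non-maximum suppression and hysteresis on top.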
Segmentation • Which image components "belong together"? • Belong together = lie on the same object • Cues – similar color – similar texture – not separated by contour – form a suggestive shape when assembled
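The "similar color" cue alone already gives a simple segmentation by region growing. The sketch below groups 4-connected pixels whose intensities differ by at most a threshold. It is a minimal sketch that assumes a grayscale image as a list of rows. The image and the threshold value are hypothetical.

```python
from collections import deque


def segment(img, threshold):
    """Group 4-connected pixels whose intensities differ by at most `threshold`.

    Returns (labels, number_of_regions).
    """
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue                          # already part of a region
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:                          # breadth-first growth of the region
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(img[ny][nx] - img[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


img = [[10, 12, 80, 82],
       [11, 13, 81, 79],
       [200, 201, 80, 83]]
print(segment(img, 5))  # ([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 1, 1]], 3)
```

Each pixel is compared only with its neighbour, so a slow gradient can "leak" a region across an object boundary. That is one reason practical segmenters combine several of the cues listed above, such as texture and contours.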
Boundary Detection: Local cues
Boundary Detection: Finding the Corpus Callosum