Interlingua Methodology
Directly obtain the meaning of the source sentence.
Generate the target sentence from the meaning representation.
Example: John gave the book to Mary.
Meaning representation:
give-action:
  agent: john
  object: the book
  receiver: mary

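The meaning representation above can be pictured as a small language-neutral frame from which each target sentence is generated. The following is a minimal Python sketch, not the authors' system: `GiveAction` and both generator functions are hypothetical names, and the verb-final output is only an English word-order gloss, not real target-language text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GiveAction:
    """Language-neutral frame for the give-action from the slide."""
    agent: str
    object: str
    receiver: str


def generate_english(frame: GiveAction) -> str:
    # Hypothetical generator: agent, verb, object, then receiver.
    agent = frame.agent.capitalize()
    receiver = frame.receiver.capitalize()
    return f"{agent} gave {frame.object} to {receiver}."


def generate_verb_final_gloss(frame: GiveAction) -> str:
    # Hypothetical generator for a verb-final language (gloss only).
    agent = frame.agent.capitalize()
    receiver = frame.receiver.capitalize()
    return f"{agent} {receiver}-to {frame.object} gave"


meaning = GiveAction(agent="john", object="the book", receiver="mary")
print(generate_english(meaning))
print(generate_verb_final_gloss(meaning))
```

Both generators read the same frame, which is the point of the interlingua design: analysis is done once, and only generation differs per target language.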
Competing approaches
• Direct
• Transfer based

Direct approach
• Word replacement
  I like mangoes
  मैं अच्छा लग आम
  I like (root) mangoes
• Morphology
  मैं अच्छा लगता आम
  I like mangoes
• Syntactic re-arrangement
  मैं आम अच्छा लगता है
  I mangoes like
• Semantic embellishment
  मुझे आम अच्छा लगता है
  I (dative) mangoes like

Transfer Based
Source sentence processed for parsing, chunking, etc.
[S [NP I] [VP [V like] [NP mangoes]]]

Transfer Based
Transfer structures obtained for the target sentence.
[S [NP I] [VP [NP mangoes] [V like]]]

Transfer Based
Morphology and language-specific modifications.
[S [NP मुझे] [VP [NP आम] [V अच्छा लगता है]]]

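The structural step of the transfer-based approach can be sketched as a single tree-rewriting rule. This is an illustrative Python sketch only: the nested-tuple tree encoding and the function names `transfer` and `leaves` are assumptions, and lexical substitution and morphology are left out.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]  # a leaf word, or (label, child, child, ...)


def label_of(node: Tree):
    """Return the phrase label of a subtree, or None for a leaf word."""
    return node[0] if isinstance(node, tuple) else None


def transfer(tree: Tree) -> Tree:
    """Apply one transfer rule: VP -> V NP becomes VP -> NP V."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if label == "VP" and [label_of(c) for c in children] == ["V", "NP"]:
        children.reverse()
    return (label, *children)


def leaves(tree: Tree) -> List[str]:
    """Collect the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


source = ("S", ("NP", "I"), ("VP", ("V", "like"), ("NP", "mangoes")))
target = transfer(source)
print(" ".join(leaves(target)))  # I mangoes like
```

A real transfer module would hold one such rule set per language pair, which is the main cost the interlingua approach tries to avoid.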
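Relation Between the Transfer and the Interlingua Models
[Figure: source language words are parsed into a source language parse tree; interpretation leads from that tree up to the interlingua; generation leads from the interlingua down to the target language parse tree and on to the target language words. Transfer links the two parse trees directly.]
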
State of Affairs
• Systran reports 19 different language pairs.
• Only 8 are adequate for their intended use.
• Even fewer are capable of quality written or spoken text translation.

ENGLISH-SPANISH-ENGLISH
• English (source): ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province
• Spanish: ... en ese imperio, el arte de la cartografía logró tal perfección que el mapa de una sola provincia ocupó la totalidad de una ciudad, y el mapa del imperio, la totalidad de una provincia
• English (back-translation): ... in that empire, the art of the cartography obtained such perfection that the map of a single province occupied the totality of a city, and the map of the empire, the totality of a province
Provided by Systran on 19/11/02

ENGLISH-KOREAN-ENGLISH
• English (source): ...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province
• Korean: 저 제국안에, 단순한 지방의 지도가 도시의 완전을 점유했다 고 Cartography의 예술은 같은 얀벽,및 제국, 지방의 완전의 지도 를 달성했다
• English (back-translation): Inside that empire, the map of the region where it is simple occupied the perfection of the city the art of the Cartography is same, yan it attained the map of of perfection of the wall and empire and region
Provided by Systran on 19/11/02

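UNL Based MT: the scenario
[Figure: UNL as the hub between English, Russian, Hindi and French. Enconversion maps a natural language into UNL; deconversion maps UNL back into a natural language.]
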
Universal Networking Language
• Common language for computers to express information written in natural language (Uchida et al., 2000)
• Applications:
  • Electronic language to overcome the language barrier
  • Information distribution system

UNL Example
[Figure: UNL graph with the node arrange linked to three nodes: agt to John, obj to meeting, plc to residence.]

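A UNL graph of this kind is just a set of labelled, directed binary relations. The sketch below stores it as a list of triples; the triple layout and the helper `arguments_of` are hypothetical, and the pairing of labels with nodes (agt with John, obj with meeting, plc with residence) is an assumed reading of the figure.

```python
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (relation label, from-node, to-node)

# Assumed reading of the example graph.
unl_graph: List[Relation] = [
    ("agt", "arrange", "John"),
    ("obj", "arrange", "meeting"),
    ("plc", "arrange", "residence"),
]


def arguments_of(graph: List[Relation], head: str) -> Dict[str, str]:
    """Map each relation label leaving `head` to the node it points at."""
    return {label: target for label, source, target in graph if source == head}


print(arguments_of(unl_graph, "arrange"))
```

Because the predicate is the only node with outgoing edges here, looking up its arguments recovers the whole meaning of the sentence.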
Components of the UNL System
• Universal Word
• Relation Labels
• Attributes

Universal Word
[साया] "shadow(icl>darkness)"; the place was now in shadow
[लेशमात्र] "shadow(icl>iota)"; not a shadow of doubt about his guilt
[संकेत] "shadow(icl>hint)"; the shadow of the things to come
[छाया] "shadow(icl>deterrant)"; a shadow over his happiness

Universal Word (foreign concepts)
[aput] "snow(icl>thing)";
[pukak] "snow(aoj<salt like)";
[mauja] "snow(aoj<soft, aoj<deep)";
[massak] "snow(aoj<soft)";
[mangokpok] "snow(aoj<watery)";

Relation: agt (agent)
agt defines a thing which initiates an action.
agt (do, thing)
Syntax
agt[":"<Compound UW-ID>] "(" {<UW1>|":"<Compound UW-ID>} "," {<UW2>|":"<Compound UW-ID>} ")"
Detailed Definition
Agent is defined as the relation between:
UW1 - do, and
UW2 - a thing
where: UW2 initiates UW1, or UW2 is thought of as having a direct role in making UW1 happen.
Examples and readings
agt(break(icl>do), John(icl>person))   John breaks
agt(translate(icl>do), computer(icl>machine))   computer translates

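The relation syntax can be illustrated with a small parser that splits a relation expression into its label and its two Universal Words. This is a sketch under assumptions: `parse_relation` is a hypothetical helper, it handles only the plain two-argument form used in the examples, and it does not interpret compound UW-IDs.

```python
from typing import Tuple


def parse_relation(expression: str) -> Tuple[str, str, str]:
    """Split 'label(UW1, UW2)' into (label, UW1, UW2)."""
    expression = expression.strip()
    if "(" not in expression or not expression.endswith(")"):
        raise ValueError(f"not a relation expression: {expression!r}")
    open_paren = expression.index("(")
    label = expression[:open_paren]
    body = expression[open_paren + 1:-1]
    depth = 0
    for position, char in enumerate(body):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            # The first comma outside any restriction separates UW1 from UW2.
            return label, body[:position].strip(), body[position + 1:].strip()
    raise ValueError(f"expected two arguments in: {expression!r}")


print(parse_relation("agt(break(icl>do), John(icl>person))"))
print(parse_relation("agt(translate(icl>do), computer(icl>machine))"))
```

Tracking parenthesis depth matters because restrictions such as `(icl>do)` are themselves parenthesised, and some restrictions contain commas.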
Attributes
• Used to describe what is said from the speaker's point of view.
• In particular, they capture number, tense, aspect and modality information.

Example Attributes
• I see a flower
  UNL: obj(see(icl>do), flower(icl>thing))
• I saw flowers
  UNL: obj(see(icl>do).@past, flower(icl>thing).@pl)
• Did I see flowers?
  UNL: obj(see(icl>do).@past.@interrogative, flower(icl>thing).@pl)
• Please see the flowers
  UNL: obj(see(icl>do).@request, flower(icl>thing).@pl.@definite)

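Attributes are appended to a Universal Word with the `.@` separator, so they can be peeled off with a simple split. The helper name `split_attributes` below is hypothetical, and the sketch assumes that `.@` never occurs inside the headword or its restriction.

```python
from typing import List, Tuple


def split_attributes(uw: str) -> Tuple[str, List[str]]:
    """Separate a Universal Word from its trailing .@attribute list."""
    head, *attributes = uw.split(".@")
    return head, ["@" + name for name in attributes]


print(split_attributes("see(icl>do).@past.@interrogative"))
print(split_attributes("flower(icl>thing).@pl"))
print(split_attributes("flower(icl>thing)"))
```

Keeping the attributes apart from the word lets the same concept `see(icl>do)` be shared by statements, questions and requests.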
The Analyser Machine
[Figure: the Enconverter applies analysis rules and a dictionary to a node list (n_{i-1}, n_i, n_{i+1}, n_{i+2}, n_{i+3}) through windows labelled A and C, and builds a node-net.]

Strategy for Analysis
• Morphological Analysis
• Syntactico-Semantic Analysis

Analysis of a simple sentence
<< A Report of John's genius reached King's ears >>
The article and noun are combined, and the attribute @indef is added to the noun.
<< [Report] [of] John's genius reached King's ears >>
Shift right to put the preposition with the succeeding noun.
<< /Report/ [of] [John's] genius reached King's ears >>
John's being a possessive noun, shift right.
<< /Report/ /of/ [John's] [genius] reached King's ears >>
These two nouns are resolved into the relation pos, and the first noun is deleted:

Simple sentence (continued)
<< /Report/ [of] [genius] reached King's ears >>
The preposition "of" is then combined with the noun, and a dynamic attribute OFRES is added to the entry of genius.
<< [Report] [of genius] reached King's ears >>
Using the attribute OFRES, these two nouns are resolved into the relation mod, and the second noun is deleted.
<< [Report] [reached] King's ears >>
Shift right again and resolve King's ears; the relation pof is generated.
<< /Report/ [reached] [ears] >>
The relation obj is generated here, and then the relation agt is generated between Report and reached.
<< /reached/ >>

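The walkthrough above can be replayed as a scripted sequence of merge and resolve steps on a node list. This is only an illustrative sketch: the real Enconverter selects rules from dictionary attributes, whereas `replay` is a hypothetical function that applies a hand-written script. The argument order inside each relation, and the reading that the final agt step links reached and Report, are assumptions.

```python
from typing import List, Tuple


def replay(nodes: List[str], steps: List[Tuple[str, ...]]):
    """Apply scripted steps; return (remaining nodes, generated relations)."""
    nodes = list(nodes)
    relations = []
    for step in steps:
        if step[0] == "merge":
            _, left, right = step
            i = nodes.index(left)
            if i + 1 >= len(nodes) or nodes[i + 1] != right:
                raise ValueError("only adjacent nodes can be combined")
            nodes[i:i + 2] = [left + " " + right]
        else:
            label, uw1, uw2, deleted = step
            relations.append((label, uw1, uw2))
            nodes.remove(deleted)  # the resolved node leaves the node list
    return nodes, relations


# The article is assumed to be already folded into "Report" as @indef.
node_list = ["Report", "of", "John's", "genius", "reached", "King's", "ears"]
steps = [
    ("pos", "genius", "John", "John's"),       # first noun deleted
    ("merge", "of", "genius"),                 # preposition joins the noun
    ("mod", "Report", "genius", "of genius"),  # second noun deleted
    ("pof", "ears", "King", "King's"),
    ("obj", "reached", "ears", "ears"),
    ("agt", "reached", "Report", "Report"),
]

remaining, relations = replay(node_list, steps)
print(remaining)
for relation in relations:
    print(relation)
```

The run ends with the predicate as the only node left, which matches the final state of the walkthrough.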
UNL as Interlingua and Language Divergence (Dave, Parikh, Bhattacharyya, JMT, 2003)
• Language divergence stands for the discrepancy in representation due to the inherent characteristics of the languages.
• Syntactic divergence
• Lexical semantic divergence

Issue of free word order
जीम ने चोरी करनेवाले लड़के को लाठी से मारा।
जीम ने लाठी से चोरी करनेवाले लड़के को मारा।
चोरी करनेवाले लड़के को जीम ने लाठी से मारा।
चोरी करनेवाले लड़के को लाठी से जीम ने मारा।
लाठी से जीम ने चोरी करनेवाले लड़के को मारा।
(All five orders: Jim hit the boy who steals with a stick.)
• Use is made of the fact that in Hindi postpositions stay adjacent to nouns (as opposed to the preposition stranding divergence).
• Flexibility in parsing: hit and preserve the predicate till the end.

Conjunct and Compound verbs
Typical Indian language phenomenon. Compound for verb+verb, conjunct for other POS+verb.
H: वह गाने लगी। E: She started singing.
H: चले जाओ। E: Go away.
H: रुक जाओ। E: Stop there.
H: झुक जाओ। E: Bend down.
Possibility of combinatorial explosion in the lexicon.
Possible solution: wordnet?

Use of Lexical Resources
• Automatic generation of the UW-to-language dictionary (Verma and Bhattacharyya, Global Wordnet Conference, Czech Republic, 2004)
• Universal Word generation
• Semantic attribute generation
• Heavy use of wordnets and ontologies

Conclusions
• Predicate preservation strategy used for English, Hindi, Marathi and Bengali (Spanish being added).
• Focus on morphology for Marathi.
• Focus on the kaarak (case) system for Bengali.
• The approach is extremely hungry for lexical knowledge.

Conclusions
• Work is going on in the creation of Indian language wordnets (Hindi and Marathi at IIT Bombay; Dravidian languages at Anna University).
• Interlingua has the attractive possibility of being used as a knowledge representation and applied to interesting applications such as summarization, text clustering and meaning-based multilingual search engines.

Generation of the Hindi Case System in an Interlingua based MT Framework
Debasri Chakrabarti, Sunil Kumar Dubey, Pushpak Bhattacharyya
Computer Science and Engineering Department, Indian Institute of Technology, Bombay, Mumbai, 400076, India
debasri,dubey,pb@cse.iitb.ac.in

Introduction
• Role of the case marker in a language
  • plays an important role in the structure of a sentence
  • helps to impart meaning and naturalness
• Example
  *मोटे तौर पर कृषि भूमि की जुताई, फसलों की रुपाई, कटाई, पालतू पशु प्रजनन, पालन, दुग्ध-व्यवसाय और वनीकरण सम्मिलित होता है।
  In a broad sense, agriculture includes cultivation of the soil and growing and harvesting crops and breeding and raising livestock and dairying and forestry.

The Case System in Hindi
• Hindi is characterized by a rich subsystem of case.
• Example:
  राम ने रवि को किताब दी।
  Ram Erg Ravi Dat book Nom give+past
  Ram gave a book to Ravi.
• Hindi has the following cases: nominative, ergative, accusative, instrumental, dative, genitive, locative.

Nominative ~ Ergative alternation in the agent position
• The agent of an action may bear either nominative or ergative case.
• Ergative case appears in Hindi with:
  • the simple past form
  • the perfective aspect

Examples
• राम ने रवि को पीटा।
  Ram erg Ravi acc beat+past
  Ram beat Ravi.
• राम ने रवि को पीटा था।
  Ram erg Ravi acc beat+past perfect
  Ram had beaten Ravi.
• राम ने रवि को पीटा है।
  Ram erg Ravi acc beat+present perfect
  Ram has beaten Ravi.

Observations
• There is a correlation between the ergative case and the aspectual property of the main verb.
• This is morphologically overt on the verb:
  • Simple past tense: पीटा
  • Perfective aspect: पीटा था
• Morphological rule
  • Simple past tense: V + आ → ने
  • Perfective aspect: V + आ + (tense morphology) → ने

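The correlation above can be written as a small decision function. This is a sketch only: `agent_marker` is a hypothetical name, the string encoding of tense and aspect is an assumption, and verb-specific behaviour is ignored.

```python
def agent_marker(tense: str, aspect: str) -> str:
    """Return the case marker on the agent: 'ने' (ergative) or '' (nominative)."""
    # Assumed encoding: tense is 'past' or 'present',
    # aspect is 'simple', 'perfective' or 'habitual'.
    if aspect == "perfective" or (tense == "past" and aspect == "simple"):
        return "ने"
    return ""


print(repr(agent_marker("past", "simple")))         # पीटा
print(repr(agent_marker("past", "perfective")))     # पीटा था
print(repr(agent_marker("present", "perfective")))  # पीटा है
print(repr(agent_marker("present", "habitual")))
```

In a generator, a check of this kind would run on the tense and aspect attributes of the main verb before the agent noun phrase is realised.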
Nominative ~ Ergative Alternation
• Some complex phenomena
  • nominative case on the agent even with the aspectual features mentioned above
• Is nominative ~ ergative subject to transitivity?
  • language universally, transitivity determines nom ~ erg
  • in Hindi, three types of patterns occur independent of transitivity

Nominative ~ Ergative Alternation
• The three patterns are:
  • only nom agents
  • only erg agents
  • either nom or erg agents
• Examples of intransitive verbs
  • Only nom agents
    i) राम गिरा।
       Ram+nom fall+past
       Ram fell down.
    ii) *राम ने गिरा।
       Ram erg fall+past

Intransitive Verbs
• Only erg agents
  i) राम ने प्रतीक्षा की।
     Ram erg wait+past
     Ram waited.
  ii) *राम प्रतीक्षा किया।
     Ram+nom wait+past
• Either nom or erg agents
  i) राम खेला।
     Ram+nom play+past
     Ram played.
  ii) राम ने खेला।
     Ram erg play+past

Transitive Verbs
• Only nom agents
  i) राम शीशा लाया।
     Ram+nom glass bring+past
     Ram brought the glass.
  ii) *राम ने शीशा लाया।
     Ram erg glass bring+past
• Only erg agents
  i) राम ने शीशा तोड़ा।
     Ram erg glass break+past
     Ram broke the glass.
  ii) *राम शीशा तोड़ा।
     Ram+nom glass break+past

Transitive Verbs
• Either nom or erg agents
  i) राम ने समझा कि घर मेरा है।
     Ram erg think+past that house mine is
     Ram thought that the house is mine.
  ii) राम समझा कि घर मेरा है।
     Ram think+past that house mine is

Inferences
• Ergative case in Hindi is semantically driven:
  • action performed deliberately: ergative case
  • action performed non-deliberately: nominative case
• Examples of deliberate and non-deliberate action
  राम गिरा।
  Ram+nom fall+past
  Ram fell down.
  राम ने मोहन को गिराया।
  Ram erg Mohan acc cause to fall down
  Ram made Mohan fall down.

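The three verb patterns and the deliberateness inference can be combined in a small lookup. The following is a sketch under stated assumptions: the table `AGENT_CASE_PATTERN` is hypothetical and contains only the six verbs from the examples, the clause is assumed to be in the simple past or perfective, and letting deliberateness decide only for "either" verbs is one possible design, not the authors' stated algorithm.

```python
# Hypothetical lexicon built from the example verbs only.
AGENT_CASE_PATTERN = {
    "fall": "nom",      # राम गिरा
    "wait": "erg",      # राम ने प्रतीक्षा की
    "play": "either",   # राम खेला / राम ने खेला
    "bring": "nom",     # राम शीशा लाया
    "break": "erg",     # राम ने शीशा तोड़ा
    "think": "either",  # राम समझा / राम ने समझा
}


def agent_case(verb: str, deliberate: bool) -> str:
    """Pick 'nom' or 'erg' for the agent of a past or perfective clause."""
    pattern = AGENT_CASE_PATTERN[verb]
    if pattern == "either":
        # Assumed rule: deliberate action takes ergative, otherwise nominative.
        return "erg" if deliberate else "nom"
    return pattern


print(agent_case("fall", deliberate=False))
print(agent_case("break", deliberate=True))
print(agent_case("play", deliberate=True))
print(agent_case("play", deliberate=False))
```

Storing the pattern per verb reflects the observation that the alternation is independent of transitivity, so it has to come from the lexicon.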
Accusative ~ Nominative Alternation in the Object
• Primary objects in Hindi are
  • either accusative: को
  • or nominative (uninflected): Ө
• Examples
  राम ने चावल खाया।
  Ram erg rice+nom eat+past
  Ram ate rice.
  राम ने रावण को मारा।
  Ram erg Ravan acc kill+past
  Ram killed Ravan.