Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components
Ulrich Schäfer, Language Technology Lab, DFKI, Germany
Arif Bramantoro and Toru Ishida, Department of Social Informatics, Kyoto University, Japan
Presentation Outline
• Introduction
• Language Grid
• Workflow in Language Grid
• Heart of Gold
• Processing Flow in Heart of Gold
• Combination
• Pipelining Support Service
• Conclusion
Introduction
• Many natural language processing (NLP) architectures exist
• Each NLP architecture has its own characteristics
  • Language Grid (NICT, Japan): service-oriented architecture
  • Heart of Gold (DFKI, Germany): function-oriented architecture
• Goal: increase the number of language services in Language Grid
• Why not integrate NLP architectures instead of integrating individual NLP tools?
Introduction (2) – A Motivation for Combination
• A challenging issue: each architecture has its own way of combining multiple processing steps
  • Language Grid: workflow for composite services
  • Heart of Gold: processing flow for multiple linguistic processing components
• Access management functionality is available only in Language Grid
Language Grid (2)
• A new service-oriented multilingual infrastructure on the Internet to support intercultural activities
• Language resources with complicated intellectual property rights can be wrapped and shared
• Linked Service Grids:
  • Language Grid in Japan
  • Language Grid in Thailand
  • Agricultural Service Grid
  • Education Service Grid
  • etc.
Language Grid
[Architecture diagram. Labels: Application System; Service Grid Server Software; Service Manager; Grid Composer; Service Invoker; BPEL Composite Service Engine; Java Composite Service Engine; Script Composite Service Engine; Java Atomic Service Engine; Service Resource (three instances); Other Service Grid; Network; Program Resource; Database; Native Program]
Workflow in Language Grid
• Sample methods of defining workflows for composite services
  • Business Process Execution Language (BPEL)
  • Script
  • Java, etc.
• Additional technique for composite services: constraint satisfaction
  • X = {X1, …, Xn} is a set of abstract web services
  • D = {D1, …, Dn} is the set of domains, one for each Xi
  • Di = {si1, ..., sik}, where sij is a concrete web service of the corresponding Xi
  • C = {C1, …, Cp} is a set of constraints
Workflow in Language Grid (2)
• X = {X1, X2, X3, X4, X5}
  • X1: Morphological analyzer service
  • X2: ja-en translation service
  • X3: en-id translation service
  • X4: Community dictionary service
  • X5: Term replacement service
• D = {D1, D2, D3, D4, D5}
  • D1: {MeCab at NTT, ICTCLAS, KLT at Kookmin University, TreeTagger at IMS Stuttgart}
  • D2: {JServer at NICT, WEB-Transer at Kyoto-U, Google Translation, Translution}
  • D3: {ToggleText}
  • D4: {Life Science Dictionary, Natural Disasters Dictionary, Kyoto Tourism Dictionary}
  • D5: {TermRepl service}
• C = {C1, C2, C3}
  • C1: For multi-hop translation, X2.OUT = X3.IN
  • C2: For specialized translation service with dictionary, serverLocation(X2) = serverLocation(X4)
  • C3: For morphological analysis, partialAnalyzedResult(X1.OUT) ∈ X2.IN
[Workflow diagram. Labels: Japanese Morphological Analysis Service; ja->en Translation Service; Community Dictionary Service; en->id Translation Service; Term Replacement Service]
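The selection of concrete services under C1 to C3 can be illustrated with a small brute-force constraint search. This is only a minimal sketch in Python, not the Language Grid implementation: the metadata attached to each service (the `lang`, `src`, `tgt`, and `location` fields, and the values "site-A" and "site-B") is invented for illustration, and C3 is simplified to "the analyzer's language matches the translator's source language".

```python
from itertools import product

# Hypothetical service metadata; locations and language tags are
# assumptions made for this sketch, not facts about the real services.
DOMAINS = {
    "X1": [  # morphological analyzer services
        {"name": "MeCab", "lang": "ja"},
        {"name": "ICTCLAS", "lang": "zh"},
        {"name": "KLT", "lang": "ko"},
        {"name": "TreeTagger", "lang": "en"},
    ],
    "X2": [  # ja-en translation services
        {"name": "JServer", "src": "ja", "tgt": "en", "location": "site-A"},
        {"name": "WEB-Transer", "src": "ja", "tgt": "en", "location": "site-B"},
    ],
    "X3": [{"name": "ToggleText", "src": "en", "tgt": "id"}],
    "X4": [  # community dictionary services
        {"name": "Life Science Dictionary", "location": "site-B"},
        {"name": "Kyoto Tourism Dictionary", "location": "site-A"},
    ],
    "X5": [{"name": "TermRepl"}],
}

def c1(a):
    # Multi-hop translation: output language of X2 is the input language of X3.
    return a["X2"]["tgt"] == a["X3"]["src"]

def c2(a):
    # Specialized translation: translator and dictionary share a server location.
    return a["X2"]["location"] == a["X4"]["location"]

def c3(a):
    # Simplified stand-in for partialAnalyzedResult(X1.OUT) ∈ X2.IN.
    return a["X1"]["lang"] == a["X2"]["src"]

def solve(domains, constraints):
    """Enumerate every assignment of concrete services that satisfies all constraints."""
    names = list(domains)
    for combo in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, combo))
        if all(check(assignment) for check in constraints):
            yield {n: service["name"] for n, service in assignment.items()}

solutions = list(solve(DOMAINS, [c1, c2, c3]))
for s in solutions:
    print(s)
```

Exhaustive enumeration is enough for domains this small; a larger service registry would call for backtracking with constraint propagation.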
Heart of Gold
• Function-oriented middleware architecture for integrating deep and shallow natural language processing (NLP) components
[Architecture diagram. Labels: Application; XML-RPC / Java API (queries, results); Heart of Gold Middleware; MoCoMan; Modules; External persistent annotation database; Computed annotation; External NLP Components]
Heart of Gold – Deep NLP
• Key feature of Heart of Gold, unavailable in Language Grid
• Tries to apply as much linguistic knowledge as possible
• Linguistic knowledge is declaratively encoded
• Example: ‘Tom gave his son a toy’ → past(give(Tom, his son, toy))
• Syntactic variants: ‘A toy was given by Tom to his son’ or ‘Tom gave his son the toy’
Processing Flow in Heart of Gold
• Three methods of defining a processing flow for multiple NLP components
  • Varying the depth of modules
  • Varying additional input and output annotation
  • Using SDL (System Description Language; Krieger, 2003)
• SDL operators
  • + (sequence): one component starts after the previous component has finished, taking its output as its own input
  • | (parallelism): multiple components are executed in parallel in separate Java threads
  • * (unrestricted iteration): a component is executed in a loop until its output remains unchanged
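The three SDL operators can be mimicked with ordinary higher-order functions. The following is a minimal sketch in Python, not SDL itself or its Java implementation: the names `seq`, `par`, and `star` are hypothetical, components are modelled as plain functions, and the parallel operator simply returns the list of results rather than merging them.

```python
from concurrent.futures import ThreadPoolExecutor

def seq(*components):
    """'+': run components one after another, feeding each output to the next."""
    def run(data):
        for component in components:
            data = component(data)
        return data
    return run

def par(*components):
    """'|': run components on the same input in separate threads.

    Simplification: results are returned as a list in component order.
    """
    def run(data):
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            futures = [pool.submit(component, data) for component in components]
            return [future.result() for future in futures]
    return run

def star(component):
    """'*': apply a component repeatedly until its output stops changing."""
    def run(data):
        while True:
            result = component(data)
            if result == data:
                return result
            data = result
    return run

# Toy components standing in for NLP modules.
def collapse_spaces(text):
    return text.replace("  ", " ")

pipeline = seq(str.strip, star(collapse_spaces), par(str.upper, len))
result = pipeline("  deep    and shallow  ")
print(result)  # ['DEEP AND SHALLOW', 16]
```

The fixed-point test in `star` relies on outputs being comparable with `==`, which holds for strings but would need a custom comparison for richer annotation structures.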
Processing Flow in Heart of Gold (2)
[Diagram: input sentence; SProUT-XSLT cascaded language components (SProUT rmrs_morph, XSLT pos_filter, SProUT rmrs_lex, XSLT nodeid_cat, SProUT rmrs_phrase, XSLT fs2rmrsxml); RMRS result]
chunkiermrs = ( sprout_rmrs_morph + xslt_pos_filter + sprout_rmrs_lex + (* xslt_nodeid_cat + sprout_rmrs_phrase ) + xslt_fs2rmrsxml )
sprout_rmrs_morph = SproutModulesTextDom("rmrs-morph.cfg")
xslt_pos_filter = XsltModulesDomDom("posfilter.xsl", "aid", "Chunkie")
sprout_rmrs_lex = SproutModulesDomDom("rmrs-lex.cfg")
xslt_nodeid_cat = XsltModulesDomDom("nodeinfo.xsl", "aid", "Chunkie")
sprout_rmrs_phrase = SproutModulesDomDom("rmrs-phrase.cfg")
xslt_fs2rmrsxml = XsltModulesDomDom("fs2rmrsxml.xsl")
Combining Two Architectures
• Wrap Heart of Gold as an atomic service in the language resource layer of Language Grid
  • Service input: language identifier, text to be analyzed, depth of analysis
  • Service output: XML string
[Layer diagram. Labels: Intercultural Collaboration Tools; Language Services (specialized translation, multi-hop translation, …); Language Resources (machine translations, morphological analyzers, dictionaries, …); Heart of Gold as a Wrapped Web Service; XML-RPC (queries, results); Heart of Gold Middleware; External NLP Component 1 ... External NLP Component n; P2P Grid Infrastructure]
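The interface of such a wrapped atomic service can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the class name `AnalysisService`, the method name `analyze`, the integer encoding of the analysis depth, and the XML layout returned by `stub_backend` are all hypothetical. In a real deployment the backend would forward the call to the middleware over XML-RPC (for example through the standard library's `xmlrpc.client.ServerProxy`), but the remote method names are not given in the slides, so a stub stands in here.

```python
from typing import Callable
from xml.sax.saxutils import escape

# A backend takes (language, text, depth) and returns an XML string.
Backend = Callable[[str, str, int], str]

def stub_backend(language: str, text: str, depth: int) -> str:
    """Stand-in for the middleware call; returns a toy XML annotation."""
    return f'<analysis lang="{language}" depth="{depth}">{escape(text)}</analysis>'

class AnalysisService:
    """Atomic service: language identifier, text, and depth in; XML string out."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def analyze(self, language: str, text: str, depth: int) -> str:
        # Basic input validation before delegating to the backend.
        if not language:
            raise ValueError("language identifier must not be empty")
        if depth < 0:
            raise ValueError("depth must be non-negative")
        return self.backend(language, text, depth)

service = AnalysisService(stub_backend)
xml_result = service.analyze("en", "Tom gave his son a toy", 1)
print(xml_result)
```

Injecting the backend keeps the service signature independent of the transport, so the stub can be swapped for a remote call without changing callers.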
Combining Two Architectures (2)
• What about composite services?
  • A composite service cannot be run from the language resource layer
  • Workflow and processing flow are different
  • The integration should move to the upper layer: the language service layer
• Solution
  • Use processing flow in Language Grid
  • Use workflow in Heart of Gold
  • Create a pipelining service
Combination of Two Flows (1)
[Diagram comparing two translation flows for the input sentence "I visited the temple of the golden pavilion at Kyoto"]
a) Before Combination (Language Grid)
• Components shown: TreeTagger, J-Server en -> ja Translation Service, Science Dictionary Service, ChaSen, Term Replacement
• Dictionary lookup: The Temple of the Golden Pavilion = − (no entry found)
• Output: Watashi ha kyoto de goorudentenjikan no jiin wo houmonshita
b) After Combination (Language Grid + Heart of Gold)
• Components shown: Heart of Gold (SProUT) processing flow, J-Server en -> ja Translation Service, Science Dictionary Service, Tourism Dictionary Service, ChaSen, Term Replacement
• Heart of Gold annotation: <FS type="ne-location"> the temple of the golden pavilion at Kyoto </FS>
• Dictionary lookup: The Temple of the Golden Pavilion = Kinkakuji
• Output: Watashi ha kyoto de Kinkakuji wo houmonshita
Combination of Two Flows (2)
• Utilizing Service as a Software
  • Wrap a language service containing a workflow as a Heart of Gold component
  • Useful for NLP components with limited language support (e.g., ChunkieRMRS is available only for German and English)
[Diagram. Labels: input sentence in Japanese; workflow (Specialized ja-en translation service); XML Converter; Heart of Gold components (ChunkieRMRS); XML Converter; workflow (Specialized en-ja translation service); output RMRSmerge in Japanese]
Supporting Service for Pipelining NLP
• A service to orchestrate a new workflow containing a processing flow (SDL)
  • by analyzing the current workflow and processing flow
  • useful for pipelining NLP
• Can run offline, or online in response to a user request
[Diagram. Labels: Processing Flow & Workflow Integrator Service (Processing Flow Analyzer, Workflow Analyzer, SDL Writer); New Workflows + SDL; Component Information; Service Profile; Set of Workflows; Language Component Information Repository (Class, Depth, Input-Output); Language Service Information Repository (WSDL, QoS Profile); Extended Workflow Repository in Constraint Optimization]
Conclusion
Contribution
• Composite language services and language components can be integrated
  • by utilizing their processing flow and workflow
• Additional pipelining support service to modify the existing workflow
Lesson Learned
• Language service is a good way to combine human and machine language processing
• Flexibility for high-speed pipelines: BPEL, Script, etc.
• Possible intra-server workflow from the integration
Q & A Thank you for listening