
Arif Bramantoro and Toru Ishida Department of Social Informatics Kyoto University Japan

Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components. Ulrich Schäfer, Language Technology Lab, DFKI, Germany. Arif Bramantoro and Toru Ishida, Department of Social Informatics, Kyoto University, Japan.


Presentation Transcript


  1. Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components • Ulrich Schäfer, Language Technology Lab, DFKI, Germany • Arif Bramantoro and Toru Ishida, Department of Social Informatics, Kyoto University, Japan

  2. Presentation Outline • Introduction • Language Grid • Workflow in Language Grid • Heart of Gold • Processing Flow in Heart of Gold • Combination • Pipelining Support Service • Conclusion

  3. Introduction

  4. Introduction • Many natural language processing (NLP) architectures exist • Each NLP architecture has its own characteristics • Language Grid (NICT, Japan): service-oriented architecture • Heart of Gold (DFKI, Germany): functional-oriented architecture • Goal: increase the number of language services in Language Grid • Why not integrate NLP architectures instead of integrating individual NLP tools?

  5. Introduction (2) – A Motivation for Combination • A challenging issue: each architecture has its own way of chaining multiple processing steps • Language Grid: workflow for composite services • Heart of Gold: processing flow for multiple linguistic processing components • Access management functionality is available only in Language Grid

  6. Language Grid

  7. Language Grid (2) • A new service-oriented multilingual infrastructure on the Internet to support intercultural activities • Language resources with complicated intellectual property rights can be wrapped and shared • Linked Service Grids: • Language Grid in Japan • Language Grid in Thailand • Agricultural Service Grid • Education Service Grid • etc.

  8. Language Grid • [Architecture diagram. Labels: Application System; Service Grid Server Software; Service Manager; Service Invoker; Grid Composer; BPEL Composite Service Engine; Java Composite Service Engine; Script Composite Service Engine; Java Atomic Service Engine; Service Resource (three instances); Network Program; Resource Database; Native Program; Other Service Grid]

  9. Workflow in Language Grid • Sample methods of workflow for composite services • Business Process Execution Language (BPEL) • Script • Java, etc. • Additional technique for composite services: constraint satisfaction • X = {X1,…,Xn} is a set of abstract web services • D = {D1,…,Dn} is the set of domains • Di = {si1,…,sik}, where sij is a concrete web service for the corresponding Xi • C = {C1,…,Cp} is a set of constraints

  10. Workflow in Language Grid (2) • X = {X1, X2, X3, X4, X5} • X1: Morphological analyzer service • X2: ja-en translation service • X3: en-id translation service • X4: Community dictionary service • X5: Term replacement service • D = {D1, D2, D3, D4, D5} • D1: {mecab at NTT, ICTCLAS, KLT at Kookmin University, TreeTagger at IMS Stuttgart} • D2: {JServer at NICT, WEB-Transer at Kyoto-U, Google Translation, Translution} • D3: {ToggleText} • D4: {Life Science Dictionary, Natural Disasters Dictionary, Kyoto Tourism Dictionary} • D5: {TermRepl service} • C = {C1, C2, C3} • C1: For multi-hop translation, X2.OUT = X3.IN • C2: For specialized translation service with dictionary, serverLocation(X2) = serverLocation(X4) • C3: For morphological analysis, partialAnalyzedResult(X1.OUT) ∈ X2.IN • [Workflow diagram. Labels: Japanese Morphological Analysis Service; ja->en Translation Service; Community Dictionary Service; en->id Translation Service; Term Replacement Service]
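
The constraint satisfaction formulation on slides 9 and 10 can be illustrated with a small brute-force search. This is only a sketch, not the Language Grid implementation: the server locations and language pairs in `SERVICES` are invented for illustration, only X2, X3, and X4 are modeled, and C1 is interpreted here as "the target language of X2 equals the source language of X3".

```python
from itertools import product

# Hypothetical metadata for a few concrete services. The locations and
# language pairs are illustrative assumptions, not facts from the slides.
SERVICES = {
    "JServer": {"loc": "NICT", "src": "ja", "tgt": "en"},
    "WEB-Transer": {"loc": "Kyoto-U", "src": "ja", "tgt": "en"},
    "ToggleText": {"loc": "external", "src": "en", "tgt": "id"},
    "Life Science Dictionary": {"loc": "Kyoto-U"},
    "Kyoto Tourism Dictionary": {"loc": "NICT"},
}

# Domains D2, D3, D4: candidate concrete services per abstract service.
DOMAINS = {
    "X2": ["JServer", "WEB-Transer"],
    "X3": ["ToggleText"],
    "X4": ["Life Science Dictionary", "Kyoto Tourism Dictionary"],
}


def c1(assignment):
    """C1 (multi-hop translation): output of X2 must fit the input of X3."""
    return SERVICES[assignment["X2"]]["tgt"] == SERVICES[assignment["X3"]]["src"]


def c2(assignment):
    """C2: translator X2 and dictionary X4 must share a server location."""
    return SERVICES[assignment["X2"]]["loc"] == SERVICES[assignment["X4"]]["loc"]


def solve(domains, constraints):
    """Enumerate every assignment and keep those satisfying all constraints."""
    names = list(domains)
    solutions = []
    for combo in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, combo))
        if all(constraint(assignment) for constraint in constraints):
            solutions.append(assignment)
    return solutions


solutions = solve(DOMAINS, [c1, c2])
for solution in solutions:
    print(solution)
```

Exhaustive enumeration is adequate for domains this small; a real composer would more likely use backtracking with constraint propagation.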

  11. Heart of Gold

  12. Heart of Gold • Functional-oriented middleware architecture for integrating deep and shallow Natural Language Processing (NLP) components • [Architecture diagram. Labels: Application; XML-RPC / Java API (queries, results); Heart of Gold Middleware; MoCoMan; Modules; External persistent annotation database; Computed annotation; External NLP Components]

  13. Heart of Gold – Deep NLP • Key feature of Heart of Gold, unavailable in Language Grid • Tries to apply as much linguistic knowledge as possible • Linguistic knowledge is declaratively encoded • "Tom gave his son a toy" → past(give(Tom, his son, toy)) • Syntactic variants: "A toy was given by Tom to his son" or "Tom gave his son the toy"

  14. Processing Flow in Heart of Gold • Three methods of processing flow for multiple NLP components • Varying depth of modules • Varying additional input and output annotation • Using SDL (System Description Language; Krieger, 2003) • + (sequence): one component starts after the previous component has finished, taking its output as its own input • | (parallelism): multiple components are executed in parallel in separate Java threads • ∗ (unrestricted iteration): a component is executed in a loop until its output remains unchanged
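
The three SDL operators described above can be mimicked with ordinary function combinators. The following is a sketch in Python, not SDL itself or the Heart of Gold implementation: the names `seq`, `par`, and `fix` are invented here, components are modeled as plain functions, and returning the parallel results as a list is an assumption about how outputs are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def seq(*components):
    """'+': run components in order, feeding each output to the next."""
    def run(data):
        for component in components:
            data = component(data)
        return data
    return run


def par(*components):
    """'|': run all components on the same input in separate threads."""
    def run(data):
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            futures = [pool.submit(component, data) for component in components]
            # Results are collected in component order, not completion order.
            return [future.result() for future in futures]
    return run


def fix(component, max_rounds=1000):
    """'*': repeat a component until its output stops changing."""
    def run(data):
        for _ in range(max_rounds):
            result = component(data)
            if result == data:
                return result
            data = result
        raise RuntimeError("no fixpoint reached within max_rounds")
    return run


def collapse_spaces(text):
    """Toy component: halve each run of double spaces."""
    return text.replace("  ", " ")


# Toy pipeline: trim, collapse whitespace to a fixpoint, then lowercase.
pipeline = seq(str.strip, fix(collapse_spaces), str.lower)
print(pipeline("  Tom   gave    HIS son a toy "))
print(par(str.upper, len)("abc"))
```

The `max_rounds` guard is an addition of this sketch; the slide describes the iteration as unrestricted.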

  15. Processing Flow in Heart of Gold (2) • SProUT-XSLT cascaded language components: input sentence → SProUT rmrs_morph → XSLT pos_filter → SProUT rmrs_lex → XSLT nodeid_cat → SProUT rmrs_phrase → XSLT fs2rmrsxml → RMRS result • SDL definition:
      chunkiermrs = ( sprout_rmrs_morph + xslt_pos_filter + sprout_rmrs_lex + (* xslt_nodeid_cat + sprout_rmrs_phrase ) + xslt_fs2rmrsxml )
      sprout_rmrs_morph = SproutModulesTextDom("rmrs-morph.cfg")
      xslt_pos_filter = XsltModulesDomDom("posfilter.xsl", "aid", "Chunkie")
      sprout_rmrs_lex = SproutModulesDomDom("rmrs-lex.cfg")
      xslt_nodeid_cat = XsltModulesDomDom("nodeinfo.xsl", "aid", "Chunkie")
      sprout_rmrs_phrase = SproutModulesDomDom("rmrs-phrase.cfg")
      xslt_fs2rmrsxml = XsltModulesDomDom("fs2rmrsxml.xsl")

  16. Combination

  17. Combining Two Architectures • Wrapping Heart of Gold as an atomic service in the language resource layer of Language Grid • Service input: language identifier, text to be analyzed, depth of analysis • Service output: XML string • [Layer diagram. Labels: Intercultural Collaboration Tools; Language Services (specialized translation, multi-hop translation, …); Language Resources (machine translations, morphological analyzers, dictionaries, …), including Heart of Gold; P2P Grid Infrastructure; Wrapped Web Service; XML-RPC (queries, results); Heart of Gold Middleware; External NLP Component 1 ... External NLP Component n]
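
A wrapper with the service signature listed above (language identifier, text, depth of analysis in; XML string out) might look like the following sketch. Everything here is hypothetical: `AnalysisService`, `stub_backend`, and the `<analysis>` element are invented for illustration, and the backend is a local stub rather than a real XML-RPC call to Heart of Gold, whose actual method names are not given in the slides. The default language set borrows the German and English restriction that slide 20 mentions for ChunkieRMRS.

```python
from dataclasses import dataclass
from typing import Callable
from xml.etree import ElementTree

# A backend takes (language, text, depth) and returns an XML string.
Backend = Callable[[str, str, int], str]


@dataclass
class AnalysisService:
    """Hypothetical atomic-service wrapper around an analysis backend."""

    backend: Backend
    supported_languages: frozenset = frozenset({"en", "de"})  # assumption

    def analyze(self, language: str, text: str, depth: int) -> str:
        # Validate the three service inputs before calling the backend.
        if language not in self.supported_languages:
            raise ValueError(f"unsupported language: {language}")
        if not text.strip():
            raise ValueError("text must not be empty")
        if depth < 0:
            raise ValueError("depth must be non-negative")
        xml = self.backend(language, text, depth)
        # Raises ElementTree.ParseError if the output is not well-formed XML.
        ElementTree.fromstring(xml)
        return xml


def stub_backend(language: str, text: str, depth: int) -> str:
    """Stand-in for the real middleware call; echoes its input as XML."""
    root = ElementTree.Element("analysis", {"lang": language, "depth": str(depth)})
    root.text = text
    return ElementTree.tostring(root, encoding="unicode")


service = AnalysisService(stub_backend)
print(service.analyze("en", "Tom gave his son a toy", 1))
```

Passing the backend in as a callable keeps the wrapper independent of the transport, so an XML-RPC client could be substituted for the stub without changing the validation logic.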

  18. Combining Two Architectures (2) • What about composite services? • Composite services cannot be run from the language resource layer • Workflow and processing flow are different • Should move to the upper layer: the language service layer • Solution • Use processing flow in Language Grid • Use workflow in Heart of Gold • Create a pipelining service

  19. Combination of Two Flows (1) • Input: "I visited the temple of the golden pavilion at Kyoto" • a) Before combination (Language Grid): TreeTagger; J-Server en -> ja Translation Service; Science Dictionary Service (The Temple of the Golden Pavilion = −); ChaSen; Term Replacement; output: "Watashi ha kyoto de goorudentenjikan no jiin wo houmonshita" • b) After combination (Language Grid + Heart of Gold): Heart of Gold (SProUT) processing flow, giving "I visited The Temple of the Golden Pavilion at Kyoto" with <FS type="ne-location"> the temple of the golden pavilion at Kyoto </FS>; J-Server en -> ja Translation Service; Science Dictionary Service; Tourism Dictionary Service (The Temple of the Golden Pavilion = Kinkakuji); ChaSen; Term Replacement; output: "Watashi ha kyoto de Kinkakuji wo houmonshita"

  20. Combination of Two Flows (2) • Utilizing Service as a Software • Wrap a language service containing a workflow as a Heart of Gold component • Useful for NLP components that support few languages (e.g., ChunkieRMRS is available only for German and English) • [Flow diagram. Labels: input sentence in Japanese; workflow: Specialized ja-en translation service; Heart of Gold components: XML Converter, ChunkieRMRS, XML Converter; workflow: Specialized en-ja translation service; output RMRSmerge in Japanese]

  21. Pipelining Support Service

  22. Supporting Service for Pipelining NLP • A service to orchestrate a new workflow containing processing flow (SDL) • by analyzing the current workflow and processing flow • useful for pipelining NLP • Can run offline or online on user request • [Diagram. Labels: Processing Flow & Workflow Integrator Service (Processing Flow Analyzer, Workflow Analyzer, SDL Writer); inputs: Component Information, Service Profile, Set of Workflows; output: New Workflows + SDL; Language Component Information Repository (Class, Depth, Input-Output); Language Service Information Repository (WSDL, QoS Profile); Extended Workflow Repository in Constraint Optimization]

  23. Conclusion

  24. Conclusion (Contribution and Lesson Learned) • Composite language services and language components can be integrated • by utilizing their processing flow and workflow • Additional pipelining support service to modify the existing workflow • Language service is a good way to combine human and machine language processing • Flexibility for high-speed pipelines: BPEL, Script, etc. • Possible intra-server workflow from the integration

  25. Q & A Thank you for listening
