PANACEA - Y2

PANACEA - Y2. After the 2nd Annual Review, 28th February 2012, Barcelona.

vaughn

Presentation Transcript


  1. PANACEA - Y2 After the 2nd Annual Review, 28th February 2012, Barcelona

  2. Objectives • Join together a number of advanced interoperable tools to build a platform/factory/production line that automates the stages involved in acquiring, processing and producing the Language Resources required by MT and other Language Technologies

  3. Partners • WP1 – Management (UPF) • WP3 – The Platform (UPF) • WP4 – Corpus Acquisition & Annotation (ILSP) • WP5 – Parallel corpus & derivatives (DCU) • WP6 – Lexical Acquisition (UCAM) • WP7 – Integration & resource evaluation (ILC) • WP8 – Evaluation in industrial environment (LT) • WP2 – Dissemination and Exploitation (ELDA)

  4. Platform • The PANACEA platform is an interoperability space based on tools, guidelines, a Common Interface definition, and a “Travelling Object” specification • Tools: Taverna, BioCatalogue, myExperiment, Soaplab • Common Interface: WS interoperability • Travelling Object: XCES and GrAF • Documentation (video tutorials, how-tos, deliverables, etc. at http://www.panacea-lr.eu)

  5. Tools • Soaplab 2 (SOAP Web Services): web application for deploying command-line tools as WS; no coding needed, metadata only; services deployed by ILSP at http://nlp.ilsp.gr/ws/ • Taverna (workflow editor): open-source desktop application; imports Soaplab and other types of WS; allows WSs to be combined in workflows (http://www.taverna.org.uk/) • BioCatalogue (registry): web application for registering and documenting WSs at http://registry.elda.org; search function; automatic checks of web service status; annotations: tags, categories, etc. • myExperiment (social network): share workflows, files, data, etc.; share opinions and comments, create work groups, etc. (http://myexperiment.elda.org)

  6. Interoperability • Three levels of interoperability: • COMMUNICATION PROTOCOLS: SOAP, REST • DATA • PARAMETERS • [Diagram: data passed from Tool A to Tool B, with the captions “Tool B does not ‘understand’ format N!” and “All tools understand the previous format”]
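
The data level of interoperability can be pictured as a shared pivot format. The following is a minimal Python sketch of that idea only; the tool names, their native formats, and the converter functions are all hypothetical and do not come from the PANACEA platform.

```python
from typing import Callable, Dict, List

# Shared representation used in this sketch only: a list of token strings.
Common = List[str]

# Hypothetical native formats: "tool_a" emits space-separated tokens,
# "tool_b" expects one token per line.
to_common: Dict[str, Callable[[str], Common]] = {
    "tool_a": lambda text: text.split(),
    "tool_b": lambda text: text.splitlines(),
}
from_common: Dict[str, Callable[[Common], str]] = {
    "tool_a": lambda tokens: " ".join(tokens),
    "tool_b": lambda tokens: "\n".join(tokens),
}


def convert(data: str, source: str, target: str) -> str:
    """Convert between two tool formats by going through the common format."""
    return from_common[target](to_common[source](data))


output_of_a = "Language resources for MT"
input_for_b = convert(output_of_a, "tool_a", "tool_b")
print(input_for_b)
```

With n tool formats, this arrangement needs 2n converters (one in each direction per tool) rather than n(n-1) pairwise ones, which is the practical argument for agreeing on a single exchange format.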

  7. Travelling Object • The Travelling Object (TO) is the common data and metadata format used in PANACEA to make components understand each other (syntactic interoperability) • First TO for annotations up to tagging and lemmatization • Based on XCES (XML files with p, s, and t elements) • Tools: formatConverters and stylesheets • Second TO for everything else (NER, DepParsing, etc.) • Based on GrAF (standoff annotation) • One file for primary data • One file for each annotation layer
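
To make the two Travelling Object styles concrete, here is a rough Python sketch. It reads a small inline document built from p, s, and t elements and then derives a standoff-style token layer that points into the primary text by character offsets. The sample XML, the root element name, and the `pos` and `lemma` attributes are assumptions made for illustration; they are not the actual PANACEA XCES or GrAF schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample: paragraphs (p), sentences (s), tokens (t).
# The attribute names "pos" and "lemma" are assumptions made for this sketch.
SAMPLE = """
<text>
  <p>
    <s>
      <t pos="DT" lemma="the">The</t>
      <t pos="NN" lemma="crawler">crawler</t>
      <t pos="VBZ" lemma="run">runs</t>
    </s>
  </p>
</text>
""".strip()


def read_inline(xml_text):
    """Return one list of (word, pos, lemma) tuples per sentence."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        sentences.append(
            [(t.text, t.get("pos"), t.get("lemma")) for t in s.iter("t")]
        )
    return sentences


def to_standoff(sentence):
    """Split one sentence into primary text plus a separate token layer.

    The layer refers to the primary text by character offsets only,
    which is the general idea behind standoff annotation.
    """
    primary = " ".join(word for word, _, _ in sentence)
    layer, start = [], 0
    for word, pos, lemma in sentence:
        end = start + len(word)
        layer.append({"start": start, "end": end, "pos": pos, "lemma": lemma})
        start = end + 1  # skip the single space separator
    return primary, layer


sentences = read_inline(SAMPLE)
primary, layer = to_standoff(sentences[0])
print(primary)
for ann in layer:
    print(primary[ann["start"]:ann["end"]], ann["pos"], ann["lemma"])
```

Keeping the primary text untouched and storing each annotation layer separately is what allows further layers (for example named entities or dependencies) to be added without rewriting the earlier ones.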

  8. Common Interface • A Common Interface (CI) defines the mandatory parameters for every type of WS: http://panacea-lr.eu/en/info-for-professionals/documents/ http://registry.elda.org
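
As an illustration of what "mandatory parameters per type of WS" could mean in practice, the sketch below checks a request against a table of required parameter names. The service types and parameter names are hypothetical placeholders; the real definitions are in the Common Interface documents linked above.

```python
# Hypothetical service types and parameter names, for illustration only;
# the real mandatory parameters are defined in the Common Interface documents.
MANDATORY = {
    "tokenizer": {"input", "language"},
    "aligner": {
        "source_input",
        "target_input",
        "source_language",
        "target_language",
    },
}


def missing_parameters(service_type, params):
    """Return the sorted mandatory parameters absent from a request."""
    return sorted(MANDATORY[service_type] - set(params))


request = {"input": "http://example.org/doc.xml"}
missing = missing_parameters("tokenizer", request)
print(missing)
```

Because every service of a given type accepts the same required parameters, a workflow can swap one tokenizer or aligner for another without changing how it is called.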

  9. Soaplab Web Services • 28 Corpus Acquisition and Annotation Web Services • NLP WSs focusing on sentence splitting, tokenization, tagging, lemmatization and parsing, e.g.: • EN, FR: Berkeley tagger and parser (DCU) • ES: UPF tools, Freeling • IT: ILC’s DESR, Freeling • DE and EL: LT’s and ILSP’s in-house tools • WSs for conversion from and to PANACEA’s Travelling Object (UPF and ILC) • WSs for alignment of parallel data (DCU)

  10. Corpus Acquisition WS • Focused Bilingual Crawler (FBC) • Documentation: http://registry.elda.org/services/127 • Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_bilingual_crawl_row • Sample topic definition for crawling EN-FR pages in the Environment (ENV) domain: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_topics/ENV_EN_FR_topic.txt • Seed URL for crawling EN-FR ENV data: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_EN_FR_greenfacts.txt • Focused Monolingual Crawler (FMC) • Documentation: http://registry.elda.org/services/160 • Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_fmc_row • Topic definition for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_topics/ENV_EN_topic.txt • List of seed URLs for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_seeds/ENV_EN_seeds.txt

  11. Taverna Workflow Demo • How can I align crawled data? • Search for a DCU-hosted alignment service at http://myexperiment.elda.org/workflows?query=align
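
The search link on this slide follows a simple `query` pattern. The sketch below only builds such a URL and sends no request; the helper name is hypothetical, and it is an assumption that search terms other than "align" are handled the same way by the site.

```python
from urllib.parse import urlencode

# Base address taken from the slide; the helper below is illustrative.
BASE = "http://myexperiment.elda.org/workflows"


def search_url(query):
    """Build a workflow search URL with a properly encoded query string."""
    return f"{BASE}?{urlencode({'query': query})}"


url = search_url("align")
print(url)
```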
