PANACEA - Y2 After the 2nd Annual Review, 28th February 2012, Barcelona
Objectives • Join together a number of advanced interoperable tools to build a platform/factory/production line that automates the stages involved in acquiring, processing and producing the Language Resources required by MT and other Language Technologies
Partners • WP1 – Management (UPF) • WP3 – The Platform (UPF) • WP4 – Corpus Acquisition & Annotation (ILSP) • WP5 – Parallel corpus & derivatives (DCU) • WP6 – Lexical Acquisition (UCAM) • WP7 – Integration & resource evaluation (ILC) • WP8 – Evaluation in industrial environment (LT) • WP2 – Dissemination and Exploitation (ELDA)
Platform • The PANACEA platform is an interoperability space based on tools, guidelines, a Common Interface definition, and a “Travelling Object” specification • Tools: Taverna, BioCatalogue, myExperiment, Soaplab • Common Interface: WS interoperability • Travelling Object: XCES and GrAF • Documentation (video tutorials, how-tos, deliverables, etc.) at http://www.panacea-lr.eu
Tools • SOAPLAB 2: (SOAP) Web Services • Web application for deploying command-line tools as WS • No coding needed! Metadata only • Services deployed by ILSP at http://nlp.ilsp.gr/ws/ • TAVERNA: Workflow editor • Open-source desktop application • Imports Soaplab and other types of WS • Allows WSs to be combined in workflows (http://www.taverna.org.uk/) • BioCatalogue: Registry • Web application for registering and documenting WSs (http://registry.elda.org) • Search function • Auto-checks web service status • Annotations: tags, categories, etc. • myExperiment: Social network (http://myexperiment.elda.org) • Share workflows, files, data, etc. • Share opinions and comments, create work groups, etc.
Interoperability • Three levels of interoperability: • COMMUNICATION PROTOCOLS: SOAP, REST • DATA • PARAMETERS • [Figure: two pipelines from Tool A to Tool B. In the first, Tool B does not “understand” format N. In the second, all tools understand the previous format.]
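The data level can be illustrated with a minimal sketch. It assumes each tool declares one input and one output format; the `Tool` class, the `first_mismatch` helper, and the format labels are hypothetical and are not part of the PANACEA platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    reads: str   # data format the tool accepts (hypothetical label)
    writes: str  # data format the tool produces (hypothetical label)


def first_mismatch(chain):
    """Return the first (producer, consumer) pair whose formats differ, or None."""
    for producer, consumer in zip(chain, chain[1:]):
        if producer.writes != consumer.reads:
            return producer, consumer
    return None


# Without a common format: Tool A writes "N", which Tool B cannot read.
broken = [Tool("Tool A", reads="text", writes="N"),
          Tool("Tool B", reads="M", writes="M")]

# With a common format: every tool reads and writes the same format ("TO").
common = [Tool("Tool A", reads="TO", writes="TO"),
          Tool("Tool B", reads="TO", writes="TO")]

broken_pair = first_mismatch(broken)
common_pair = first_mismatch(common)
```

Here `broken_pair` identifies the hand-over from Tool A to Tool B as the point of failure, while `common_pair` is `None` because every tool can read what the previous one wrote.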
Travelling Object • The Travelling Object (TO) is the common data and metadata format used in PANACEA to make components understand each other (syntactic interoperability) • First TO for annotations up to tagging and lemmatization • Based on XCES (XML files with p, s, and t elements) • Tools: formatConverters and stylesheets • Second TO for everything else (NER, DepParsing, etc.) • Based on GrAF (standoff annotation) • One file for primary data • One file for each annotation layer
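The difference between the two Travelling Objects (inline annotation versus standoff annotation) can be sketched as follows. This is an illustration of the idea only: the `p`, `s`, and `t` element names come from the slide, but the `lemma` and `pos` attributes, the `text` root element, and the dictionary-based layer are assumptions and do not reproduce the actual XCES or GrAF serializations.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline-annotated document: p = paragraph, s = sentence, t = token.
# The attribute names (lemma, pos) are assumptions, not the PANACEA schema.
INLINE = """<text>
  <p>
    <s>
      <t lemma="dog" pos="NNS">Dogs</t>
      <t lemma="bark" pos="VBP">bark</t>
    </s>
  </p>
</text>"""


def to_standoff(xml_text):
    """Split an inline document into primary data plus one standoff layer."""
    root = ET.fromstring(xml_text)
    pieces, layer, offset = [], [], 0
    for token in root.iter("t"):
        word = token.text or ""
        if pieces:  # tokens are joined by a single space in the primary data
            offset += 1
        start, end = offset, offset + len(word)
        layer.append({"start": start, "end": end,
                      "lemma": token.get("lemma"), "pos": token.get("pos")})
        pieces.append(word)
        offset = end
    return " ".join(pieces), layer


primary, tokens = to_standoff(INLINE)
```

The primary data (`primary`) stays untouched plain text, and the annotation layer (`tokens`) points into it by character offsets, so further layers could be added as separate files without rewriting the text.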
Common Interface • A Common Interface (CI) defines the mandatory parameters for every type of WS (see http://panacea-lr.eu/en/info-for-professionals/documents/ and http://registry.elda.org)
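As a rough sketch of what "mandatory parameters per type of WS" implies for a client, the check below compares a request against a per-type parameter list. The service types and parameter names are invented for illustration; the real lists are defined in the Common Interface documents, not here.

```python
# Hypothetical mandatory parameters per service type; the real lists are
# defined in the PANACEA Common Interface documents, not here.
MANDATORY = {
    "tokenizer": {"input", "language"},
    "aligner": {"source_input", "target_input",
                "source_language", "target_language"},
}


def missing_parameters(service_type, supplied):
    """Return the mandatory parameters absent from a request, sorted by name."""
    return sorted(MANDATORY[service_type] - set(supplied))


request = {"input": "http://example.org/doc.txt"}
missing = missing_parameters("tokenizer", request)
```

With this request, `missing` lists the `language` parameter, so the call would be rejected before it reaches the service.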
Soaplab Web Services • 28 Corpus Acquisition and Annotation Web Services • NLP WSs focusing on sentence splitting, tokenization, tagging, lemmatization and parsing, e.g.: • EN, FR: Berkeley tagger and parser (DCU) • ES: UPF tools, Freeling; IT: ILC’s DESR, Freeling • DE and EL: LT’s and ILSP’s in-house tools • WSs for conversion from and to PANACEA’s Travelling Object (@UPF and ILC) • WSs for alignment of parallel data (@DCU)
Corpus Acquisition WS • Focused Bilingual Crawler (FBC) • Documentation: http://registry.elda.org/services/127 • Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_bilingual_crawl_row • Sample topic definition for crawling EN-FR pages in the Environment domain: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_topics/ENV_EN_FR_topic.txt • Seed URL for crawling EN-FR ENV data: http://nlp.ilsp.gr/panacea/testinput/bilingual/ENV_EN_FR_greenfacts.txt • Focused Monolingual Crawler (FMC) • Documentation: http://registry.elda.org/services/160 • Test at http://nlp.ilsp.gr/soaplab2-axis/#ilsp.ilsp_fmc_row • Topic definition for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_topics/ENV_EN_topic.txt • List of seed URLs for crawling EN ENV data: http://nlp.ilsp.gr/panacea/testinput/monolingual/ENV_seeds/ENV_EN_seeds.txt
Taverna Workflow Demo • How can I align crawled data? • Search for a DCU-hosted alignment service at http://myexperiment.elda.org/workflows?query=align
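The search URL above follows a plain `query=` pattern, so other searches can be built the same way. The sketch below only constructs the URL and makes no network request; the `search_url` helper is hypothetical, and it assumes the site accepts any search term in the `query` parameter.

```python
from urllib.parse import urlencode

BASE = "http://myexperiment.elda.org/workflows"  # taken from the slide


def search_url(term):
    """Build the workflow search URL for a query term (no request is sent)."""
    return f"{BASE}?{urlencode({'query': term})}"


url = search_url("align")
```

Passing `"align"` reproduces the URL shown on the slide; terms containing spaces or other special characters are percent-encoded by `urlencode`.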