Interoperability and Platforms


  1. Interoperability and Platforms • Nancy Ide, Department of Computer Science, Vassar College

  2. Interoperability • Concerns for STS project: • Communication between NLP tools • Interoperability of results • Two kinds of interoperability • Syntactic: physical format • Semantic: linguistic categories/labels

  3. Syntactic interoperability • Relies on specified data formats, communication protocols, and the like to ensure communication and data exchange • Systems involved can process the exchanged information, but no guarantee that the interpretation is the same

  4. Semantic interoperability • Two systems have the ability to automatically interpret exchanged information meaningfully and accurately in order to produce useful results, via deference to a common information exchange reference model • The content of the information exchange requests is unambiguously defined: what is sent is the same as what is understood
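
The contrast between slides 3 and 4 can be made concrete with a small Python sketch. It is illustrative only: the message layout, the `tagset` field, and the reference category names `noun` and `verb` are assumptions made for this example and are not taken from any real registry or from the project. Two tools emit messages in the same JSON shape (syntactic interoperability) but with different part-of-speech labels, and only a mapping to shared reference categories lets a consumer see that they agree.

```python
import json

# Hypothetical mapping from tool-specific tags to shared reference categories.
# "noun" and "verb" are placeholder names, not identifiers from a real registry.
REFERENCE_CATEGORY = {
    ("penn", "NN"): "noun",
    ("penn", "VBD"): "verb",
    ("claws5", "NN1"): "noun",
    ("claws5", "VVD"): "verb",
}


def raw_tags(message):
    """Parse a message and return its tool-specific tags, unmapped."""
    return [tok["tag"] for tok in json.loads(message)["tokens"]]


def to_reference(message):
    """Parse a message and return (form, reference category) pairs."""
    doc = json.loads(message)
    tagset = doc["tagset"]
    return [
        (tok["form"], REFERENCE_CATEGORY[(tagset, tok["tag"])])
        for tok in doc["tokens"]
    ]


# Both messages share one format, so either tool's output can be parsed.
tool_a = json.dumps({
    "tagset": "penn",
    "tokens": [{"form": "dog", "tag": "NN"}, {"form": "barked", "tag": "VBD"}],
})
tool_b = json.dumps({
    "tagset": "claws5",
    "tokens": [{"form": "dog", "tag": "NN1"}, {"form": "barked", "tag": "VVD"}],
})

# Parsing succeeds for both, yet the raw labels do not match.
labels_agree_raw = raw_tags(tool_a) == raw_tags(tool_b)

# After mapping to the shared reference categories, the analyses coincide.
labels_agree_mapped = to_reference(tool_a) == to_reference(tool_b)
```

Here `labels_agree_raw` is false and `labels_agree_mapped` is true: the shared format guarantees only that the data can be read, while the shared reference model is what makes the two interpretations comparable.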

  5. Interoperability concerns for STS project • Syntactic interoperability is not as much of an issue • Several compatible and standard formats are emerging (GrAF, NIF-RDF/OWL, etc.) • Semantic interoperability is more problematic • Issue of common labels, features • Issue of what counts as an object or a feature for communication among tools

  6. Interoperability concerns • If an architecture such as UIMA or GATE is used, there are no interoperability concerns between modules • Interoperability and usability of the final result could be an issue • Web services or other distributed model (plug and play) • Allows use of any modules, etc. • Must establish exchange protocols • This is being done anyway…

  7. Web service architecture • Pending final NSF approval, $2.1 million grant to develop a distributed web service infrastructure for NLP • Leads: Brandeis (Pustejovsky), Vassar (Ide) • Sub-contracts: UPenn (LDC-Cieri), Carnegie Mellon (Nyberg) • Modules to be developed include evaluation (CMU) based on “open advancement” • Plan for annual “challenges” to engage community • Could STS be one of those?

  8. Suggestions • STS would be an ideal pilot project for the larger web service platform project • Funding for this? • Contribute to development of standard exchange protocols • Syntactic interoperability: Use formats compatible with converging efforts • Linked data, ISO LAF/GrAF, etc. • Semantic interoperability: Use standard ontologies, data category registries, etc. for reference categories • ISOcat, OLiA, etc.

  9. Openness!!! • Use open data (really open, not GPL or share-alike, non-commercial, etc.) • Use broad-genre data • Provide open and complete results • Would be nice if results of intermediate and final stages were openly available • Would be nice to have multiple annotations over same data • Link with other available data/annotations where possible
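
One way to keep multiple annotations over the same data is a standoff arrangement, in which each layer points into unchanged primary text by character offsets. The Python sketch below assumes a deliberately simplified model: the `Annotation` class, the layer names, and the labels are hypothetical, and it is not an implementation of LAF/GrAF or any other format named in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One standoff annotation: a label on a character span of the primary text."""
    layer: str   # hypothetical layer name, e.g. "pos" or "chunk"
    start: int   # inclusive character offset into the primary text
    end: int     # exclusive character offset into the primary text
    label: str


# The primary text is never modified; every layer refers to it by offsets.
text = "Prices collapsed."

# Two independent layers (as if from two different tools) over the same text.
annotations = [
    Annotation("pos", 0, 6, "NNS"),
    Annotation("pos", 7, 16, "VBD"),
    Annotation("chunk", 0, 6, "NP"),
    Annotation("chunk", 7, 16, "VP"),
]


def labels_for_span(items, start, end):
    """Collect the label each layer assigns to exactly this span."""
    return {a.layer: a.label for a in items if a.start == start and a.end == end}


# All layers that annotate the word "collapsed" can be read side by side.
word = text[7:16]
layers_on_word = labels_for_span(annotations, 7, 16)
```

Because the layers share only the offsets, a new annotation layer can be added, or linked to other available annotations, without touching the text or the existing layers.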

  10. Other Suggestions • STS is clearly a very big and complex area • Can we break it down in some way to make the problem more manageable? • What would some first steps involve? • Look at individual elements/aspects, or combinations of same? • What modules and combinations would be best to explore first? • Develop a good inventory of similarities and relations and explore each systematically? • Can we devise a map of the components of the task and their inter-relations? How well do we do on these components? Are we ready for STS when we are not yet good at, say, lexical similarity?

  11. Other Dimensions of Similarity? • Style/phrasing • More/less specific • Formality, register: lexical choice (collapsed vs. fell down), phrase complexity, etc. • Creative language • Metaphor vs. literal • Shakespeare sonnet vs. Hobbs’ “meaning” • To what extent do such variables contribute to meaning?

  12. Data • Multi-MASC?? • Desiderata • Corpora in multiple languages comparable to MASC • About 500K words over 19 genres (roughly 25K per genre) • Contemporary language (spoken and written) • Open data • Comparable, multi-layer annotations • Link annotations across languages
