Interoperability and Platforms Nancy Ide Department of Computer Science Vassar College
Interoperability • Concerns for STS project: • Communication between NLP tools • Interoperability of results • Two kinds of interoperability • Syntactic: physical format • Semantic: linguistic categories/labels
Syntactic interoperability • Relies on specified data formats, communication protocols, and the like to ensure communication and data exchange • The systems involved can process the exchanged information, but there is no guarantee that they interpret it the same way
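A minimal sketch of what syntactic interoperability does and does not buy, assuming a hypothetical JSON standoff format (the field names `text`, `annotations`, `start`, `end`, `label` are illustrative only, not GrAF, NIF, or any other standard). Tool B can parse and validate Tool A's output exactly, yet nothing in the format tells it what the label means.

```python
import json

# Hypothetical exchange format: standoff annotations over character offsets.
# Field names are assumptions for illustration, not any real standard.
REQUIRED_FIELDS = {"start", "end", "label"}

def serialize(text, annotations):
    """Tool A: write the text plus standoff annotations as JSON."""
    return json.dumps({"text": text, "annotations": annotations})

def parse(payload):
    """Tool B: read the payload and check that it is well-formed."""
    doc = json.loads(payload)
    for ann in doc["annotations"]:
        missing = REQUIRED_FIELDS - ann.keys()
        if missing:
            raise ValueError(f"annotation missing fields: {sorted(missing)}")
        if not 0 <= ann["start"] <= ann["end"] <= len(doc["text"]):
            raise ValueError(f"offsets out of range: {ann}")
    return doc

# Tool A labels "dog" with a Penn Treebank tag.
payload = serialize("The dog barked", [{"start": 4, "end": 7, "label": "NN"}])
doc = parse(payload)

# Syntactic interoperability succeeded: Tool B recovers the span exactly...
ann = doc["annotations"][0]
span = doc["text"][ann["start"]:ann["end"]]
print(span, ann["label"])  # dog NN
# ...but the format says nothing about what "NN" means to Tool B.
```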
Semantic interoperability • Two systems can automatically interpret exchanged information meaningfully and accurately, producing useful results by deferring to a common information exchange reference model • The content of the information exchange requests is unambiguously defined: what is sent is the same as what is understood
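One way to picture "deference to a common reference model" is a sketch in which each tool declares how its own tags map onto shared reference categories, and agreement is checked at the level of those categories. The tagsets (Penn Treebank and Universal Dependencies UPOS tags) are real, but the reference category names (`CommonNoun`, `Verb`, ...) and the mapping table are hypothetical placeholders for entries in a real data category registry.

```python
# Hypothetical reference model: each tool maps its own tags to shared
# reference categories. Category names are placeholders, not registry IDs.
MAPPINGS = {
    "ptb": {"NN": "CommonNoun", "NNS": "CommonNoun", "NNP": "ProperNoun",
            "VB": "Verb", "VBD": "Verb", "DT": "Determiner"},
    "upos": {"NOUN": "CommonNoun", "PROPN": "ProperNoun",
             "VERB": "Verb", "DET": "Determiner"},
}

def to_reference(tagset, tag):
    """Translate a tool-specific tag into the shared reference category."""
    try:
        return MAPPINGS[tagset][tag]
    except KeyError:
        raise KeyError(f"no reference mapping for {tag!r} in tagset {tagset!r}") from None

def same_meaning(a, b):
    """Two (tagset, tag) labels agree if they map to the same reference category."""
    return to_reference(*a) == to_reference(*b)

# Tool A (Penn Treebank tags) and Tool B (UD tags) annotate the same token.
print(same_meaning(("ptb", "NN"), ("upos", "NOUN")))   # True
print(same_meaning(("ptb", "VBD"), ("upos", "NOUN")))  # False
```

The mapping is deliberately lossy (`NN` and `NNS` both become `CommonNoun`), which illustrates why agreeing on which distinctions the reference model keeps is itself part of the semantic interoperability problem.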
Interoperability concerns for STS project • Syntactic interoperability is not as much of an issue • Several compatible and standard formats are emerging (GrAF, NIF-RDF/OWL, etc.) • Semantic interoperability is more problematic • Issue of common labels, features • Issue of what the objects and features for communication among tools should be
Interoperability concerns • If an architecture such as UIMA or GATE is used, there are no interoperability concerns between modules • Interoperability and usability of the final result could be an issue • Web services or other distributed models (plug and play) • Allows use of any modules, etc. • Must establish exchange protocols • This is being done anyway…
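The plug-and-play idea depends on an agreed exchange protocol: every module reads and writes the same document container and declares which annotation layers it needs and which it adds. The sketch below is a hypothetical, in-process stand-in for such a protocol; `Document`, `Module`, `requires`, `produces`, and the two toy modules are invented for illustration and are not the UIMA, GATE, or planned web service APIs.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Document:
    """Hypothetical exchange container passed between modules."""
    text: str
    # layer name -> list of (start, end, label) standoff annotations
    layers: dict = field(default_factory=dict)

class Module(Protocol):
    """The agreed protocol: which layers a module needs and which it adds."""
    requires: tuple
    produces: str
    def process(self, doc: Document) -> Document: ...

class Tokenizer:
    requires = ()
    produces = "tokens"
    def process(self, doc):
        spans, pos = [], 0
        for word in doc.text.split():
            start = doc.text.index(word, pos)  # recover character offsets
            spans.append((start, start + len(word), "token"))
            pos = start + len(word)
        doc.layers[self.produces] = spans
        return doc

class Capitalized:
    """Toy annotator marking capitalized tokens (stand-in for a real service)."""
    requires = ("tokens",)
    produces = "capitalized"
    def process(self, doc):
        doc.layers[self.produces] = [
            (s, e, "Cap") for s, e, _ in doc.layers["tokens"] if doc.text[s].isupper()
        ]
        return doc

def run_pipeline(doc: Document, modules: list[Module]) -> Document:
    """Chain any modules, checking that each module's inputs already exist."""
    for m in modules:
        missing = [r for r in m.requires if r not in doc.layers]
        if missing:
            raise ValueError(f"{type(m).__name__} needs layers {missing}")
        doc = m.process(doc)
    return doc

doc = run_pipeline(Document("Nancy Ide works at Vassar"), [Tokenizer(), Capitalized()])
print(doc.layers["capitalized"])  # [(0, 5, 'Cap'), (6, 9, 'Cap'), (19, 25, 'Cap')]
```

Because modules only agree on the container and the layer names, any tokenizer that produces a `tokens` layer could be swapped in; in a distributed setting the same contract would be carried over the wire rather than through a function call.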
Web service architecture • Pending final NSF approval, $2.1 million grant to develop a distributed web service infrastructure for NLP • Leads: Brandeis (Pustejovsky), Vassar (Ide) • Sub-contracts: UPenn (LDC-Cieri), Carnegie Mellon (Nyberg) • Modules to be developed include evaluation (CMU) based on “open advancement” • Plan for annual “challenges” to engage the community • Could STS be one of those?
Suggestions • STS would be an ideal pilot project for the larger web service platform project • Funding for this? • Contribute to development of standard exchange protocols • Syntactic interoperability: Use formats compatible with converging efforts • Linked data, ISO LAF/GrAF, etc. • Semantic interoperability: Use standard ontologies, data category registries, etc. for reference categories • ISOcat, OLiA, etc.
Openness!!! • Use open data (really open: not GPL, share-alike, non-commercial, etc.) • Use broad-genre data • Provide open and complete results • Would be nice if results of intermediate and final stages were openly available • Would be nice to have multiple annotations over the same data • Link with other available data/annotations where possible
Other Suggestions • STS is clearly a very big and complex area • Can we break it down in some way to make the problem more manageable? • What would some first steps involve? • Look at individual elements/aspects, or combinations of them? • What modules and combinations would be best to explore first? • Develop a good inventory of similarities and relations and explore each systematically? • Can we devise a map of the components of the task and their inter-relations? How well do we do on these components? Are we ready for STS when we are not yet good at, say, lexical similarity?
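As a concrete reference point for the lexical similarity component, here is a sketch of a common word-overlap baseline (Jaccard similarity over lowercased word sets). This is a standard textbook measure chosen for illustration, not a method proposed in these slides, and the regex tokenizer is a deliberately crude assumption.

```python
import re

def tokens(sentence):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def jaccard(s1, s2):
    """Lexical-overlap similarity |A ∩ B| / |A ∪ B| over word sets, in [0, 1]."""
    a, b = tokens(s1), tokens(s2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Near-paraphrases with different lexical choices score low, which is where
# lexical overlap alone falls short of semantic textual similarity.
print(jaccard("The man collapsed", "The man fell down"))  # 0.4
print(jaccard("The man fell down", "The man fell down"))  # 1.0
```

A component map of the kind asked about above could start by measuring where such a baseline already agrees with human STS judgments and where it fails.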
Other Dimensions of Similarity? • Style/phrasing • More/less specific • Formality, register: lexical choice (collapsed vs fell down), phrase complexity, etc. • Creative language • Metaphor vs. literal • Shakespeare sonnet vs. Hobbs’ “meaning” To what extent do such variables contribute to meaning?
Data • Multi-MASC?? • Desiderata • Corpora in multiple languages comparable to MASC • 500K words over 19 genres (roughly 25K words per genre) • Contemporary language (spoken and written) • Open data • Comparable, multi-layer annotations • Link annotations across languages