
Interoperability - where are we -



  1. Interoperability - where are we -
  Peter Wittenburg

  2. Why care about interoperability?
  • e-Science & e-Humanities need integrated access to many data sets
  • data sets are:
    • scattered across many repositories => (virtual) integration
    • created by different research teams using different conventions (formats, semantics)
    • often in a bad state and of poor quality => curation
  • "interoperability" is one of the most used words at various conferences

  3. What is interoperability?
  • technical interoperability (technical encoding, format, structure, API, protocol)
  • semantic interoperability (bridging between conceptual spaces encoded in data)
  • also about bridging understanding between humans: <köter> <dog> <hund>

  4. many layers of interoperability: access-enabling technologies
  [diagram: datasets with IDs accessed via repositories by scientists, data curators, end users and applications]
  • Discovery: metadata search resulting in Handles (PIDs) and some properties; needs a high degree of automation
  • Access (reference resolution, protocols, AAI): Handle (PID) resolution, and you get the data
  • Interpretation (get schemas and semantics): here linguistics is playing a role; what can be automated?
  • Reuse (get context information): here linguistics is playing an even bigger role
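The "Access" layer above can be sketched in code. The function below picks the location of the data out of a Handle record; the JSON imitates the record format returned by the Handle System's HTTP proxy API, but the handle and repository URL are made-up placeholders, not real identifiers.

```python
import json
from typing import Optional

# A made-up Handle record in the style of the Handle System proxy's
# JSON response: a list of typed values, one of which is the data URL.
SAMPLE_RESPONSE = json.dumps({
    "responseCode": 1,  # 1 means "handle found"
    "handle": "11022/0000-0000-EXAMPLE",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://repo.example.org/obj/42"}},
        {"index": 100, "type": "HS_ADMIN",
         "data": {"format": "admin", "value": "..."}},
    ],
})

def resolve_handle(record_json: str) -> Optional[str]:
    """Pick the URL value out of a Handle record, if resolution succeeded."""
    record = json.loads(record_json)
    if record.get("responseCode") != 1:
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None

print(resolve_handle(SAMPLE_RESPONSE))
```

A real client would fetch the record over HTTP from a resolver before parsing it; the point here is only that PID resolution is a mechanical lookup, which is why this layer can be fully automated.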

  5. What is interoperability?
  • the focus is of course on technical systems for interoperability
  • a lot of time and money has been invested to create formal ontologies of different forms
  • BUT ontologies are extremely underused in almost all disciplines
  • AND ontologies are still a domain of experts
  • EUON (European Ontology Network) was just created in response
  • why?
    • people want to work across collections and ignore theories
    • they want to overcome hurdles and problems in a pragmatic way
  • BUT ontologies are expensive and thus static
  • AND scientists easily claim that mapping tag sets is impossible, since the tags are part of complex theories

  6. What is interoperability?
  • Wikipedia: Interoperability is a property of a system, whose interfaces are completely understood, to work with other systems, present or future, without any restricted access or implementation.
  • IEEE: Interoperability is the ability of two or more systems or components to exchange information and to use the information that has been exchanged.
  • O'Brien/Marakas: Being able to accomplish end-user applications using different types of computer systems, operating systems, and application software, interconnected by different types of local and wide-area networks.
  • OSLC: To be interoperable, one should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organization are managed in such a way as to maximise opportunities for exchange and re-use of information.

  7. examples from linguistics
  • 4 examples:
    • metadata
    • DOBES
    • CLARIN

  8. metadata is kind of easy
  • DC/OLAC -> CMDI mapping examples:
    • DC:language -> CMDI:languageIn
    • DC:language -> CMDI:dominantLanguage
    • DC:language -> CMDI:sourceLanguage
    • DC:language -> CMDI:targetLanguage
    • DC:date -> CMDI:creationDate
    • DC:date -> CMDI:publicationDate
    • DC:date -> CMDI:startYear
    • DC:date -> CMDI:derivationDate
    • DC:format -> CMDI:mediaType
    • DC:format -> CMDI:mimeType
    • DC:format -> CMDI:annotationFormat
    • DC:format -> CMDI:characterEncoding
  • everyone now accepts: metadata is for pragmatic purposes and does not replace the one and only true categorization
  • mapping errors may influence recall and precision - but who really cares?
  • semantic mapping is doable due to limited element sets and now well-described semantics (except for recursive machines such as TEI)
  • if mapping is used for discovery - no problem; if mapping is used for statistics - well ...; crucial for machine processing
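The mapping above is one-to-many: one DC element fans out to several CMDI elements. A minimal sketch of the consequence, assuming a simple dictionary representation: a DC-level search term has to be expanded into a union of CMDI-level terms, which is exactly why such mappings are fine for discovery but blur statistics.

```python
# One-to-many DC -> CMDI element mapping, taken from the slide.
DC_TO_CMDI = {
    "DC:language": ["CMDI:languageIn", "CMDI:dominantLanguage",
                    "CMDI:sourceLanguage", "CMDI:targetLanguage"],
    "DC:date":     ["CMDI:creationDate", "CMDI:publicationDate",
                    "CMDI:startYear", "CMDI:derivationDate"],
    "DC:format":   ["CMDI:mediaType", "CMDI:mimeType",
                    "CMDI:annotationFormat", "CMDI:characterEncoding"],
}

def expand_query(dc_element: str, value: str) -> list:
    """Expand a DC-level search term into CMDI-level (element, value) pairs,
    to be OR-ed together when searching CMDI records."""
    return [(cmdi, value) for cmdi in DC_TO_CMDI.get(dc_element, [])]

# A DC:language query becomes a disjunction over four CMDI elements:
# a record matches if the language occurs in *any* of them, which is good
# enough for discovery but conflates source and target language in counts.
print(expand_query("DC:language", "Kilivila"))
```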

  9. a text example - what's this?
  • example from Kilivila (Trobriand Islands, New Guinea)
    p1tr   Ambeya
    p1en   Where do you go?
    p2tr   Bala bakakaya
    p2w-en I will go I will take a bath
    p2en   I will go to have a bath
    p3tr   Bila bikakaya bike'ita bisisu bipaisewa
    p3gl   3.Fut-go 3.Fut-bath 3.Fut-come back 3.Fut-be 3.Fut-work
    p3w-en He will go - he will have a bath - he will come back - he will stay - he will work.
    p3en   He will take a bath and afterwards work with us.
  • big question: how to apply semi-automatic procedures across different corpora given such encodings?
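To see why such encodings resist semi-automatic processing, one can model an utterance as a bundle of named tiers, as below. The tier names and the alignment check are illustrative assumptions, not an actual corpus format; the point is that any cross-corpus procedure already stumbles if tier names or tokenizations differ.

```python
# One utterance from the Kilivila example as a bundle of named tiers
# (tier names "tr", "gl", "en" are this sketch's own convention; real
# corpora use names like "p3tr", "p3gl", "p3w-en" - or something else).
utterance = {
    "tr": "Bila bikakaya bike'ita bisisu bipaisewa",
    "gl": "3.Fut-go 3.Fut-bath 3.Fut-come_back 3.Fut-be 3.Fut-work",
    "en": "He will take a bath and afterwards work with us.",
}

def aligned_tokens(utt: dict) -> list:
    """Pair transcription tokens with gloss tokens. A semi-automatic
    procedure only works if both tiers tokenize to the same length."""
    words, glosses = utt["tr"].split(), utt["gl"].split()
    if len(words) != len(glosses):
        raise ValueError("tiers are not token-aligned")
    return list(zip(words, glosses))

print(aligned_tokens(utterance)[0])
```

Note that the gloss tier had to write "come_back" as one token to stay aligned with the transcription; exactly this kind of corpus-specific convention is what a cross-corpus tool has to discover or be told.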

  10. a humanities example: multi-verb expressions
  [figure: annotation screenshots showing mixed glossing and POS tagging - again: what's this?]

  11. a multimodal example
  • interaction study: 12 participants + experimenter; 7 tiers per participant
  • tier names taken from Toolbox

  12. tier names – an area of creativity

  13. 5 cross-corpora projects in DOBES
  • demonstratives with exophoric reference (morpho-syntactic and discourse-pragmatic analysis incl. gestures)
  • discourse and prosody - convergence in information structure
  • relative frequencies of nouns, pronouns and verbs
  • cross-linguistic patterns in 3-participant events
  • one rather large program with 13 teams covering different languages; primary topic is "referentiality"
  • bigger question: how to do this kind of cross-corpus work?
    • strategy: define a new tag set and add a manually created tier
    • no agreed tags yet - a committee has been formed
    • now in the process of determining the selection of corpora
    • question: will existing tags help to find spots of relevance?
  • in general: additional tagging is based on specific agreements; are existing annotations of any help? in the end everyone works in his/her own data

  14. cross-corpus search in CLARIN
  • metadata is the obvious case -> Virtual Language Observatory
  • harvesting and mapping are not the problem
  • bad quality is the problem (as for Europeana etc.)
  • planned, for example: distributed content search via SRU/CQL
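The distributed content search mentioned above can be sketched as fanning one CQL query out to many repositories. The request parameters follow the standard SRU 1.2 searchRetrieve operation; the endpoint URLs are placeholders, and real CLARIN federated content search adds its own profile on top of plain SRU.

```python
from urllib.parse import urlencode

# Placeholder SRU endpoints; in a real federation these would come from
# an endpoint registry harvested alongside the metadata.
ENDPOINTS = [
    "https://repo-a.example.org/sru",
    "https://repo-b.example.org/sru",
]

def sru_request_url(endpoint: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)

# The same query is sent verbatim to every centre; merging and ranking
# the heterogeneous results is where the real difficulty starts.
for ep in ENDPOINTS:
    print(sru_request_url(ep, 'text = "bakakaya"'))
```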

  15. what did we try?
  • open RDF/OWL assertion space
    • some dream of it - never ready
  • complex ontologies (concepts and their relations)
    • huge investments; proper formalization requires experts
    • thus too static, hardly adaptable, etc.
  • thesauri
    • reduced to a hierarchical structure, still a huge enterprise
    • similar comments apply
  • flat registries (ISOcat, vocabularies, etc.)
    • simpler, reduced, some formalisms to allow machine operation
    • already too complex for many - too reduced for some
    • AND: where to put the relations and restrictions?

  16. what are we trying?
  • can we break the dependence on experts?
  • wikis
    • just prose, hardly processable by machines
  • semantic wikis (RDA DFT etc.)
    • start simple - get more complex where needed and possible
    • let some start with prose (definitions)
    • let others add formalisms (who are the others - do we pay them?)
    • still: where to put relations?
      • keep them separate (lesson learned)
      • personal spaces, sharable, exchangeable, etc.
  • it's the quality of the framework that will count

  17. Jan's ISOcat questions
  • e0: annotations are structured: "np\s/np"
  • e1: "JJR" -> "POS=adjective & degree=comparative"
  • e2: "Transitive" -> "thetavp=vp120 & synvps=[synNP] & caseAssigner=True"
  • e3: "VVIMP" -> "POS=verb & main verb & mood=imperative"
  • where to put annotation complexity if the "ontology" is simple?
    • complexity needs to be put into schemas
    • who can do it - is it feasible?
  • mapping must be between combinations of categories, or between graphs
    • who can do it - is it feasible?
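Examples e1 and e3 can be sketched as code: decompose opaque tags into feature bundles, then map between tagsets by comparing bundles rather than tag strings. The feature names and the second tagset below are this sketch's own illustrations, not ISOcat entries.

```python
# Tagset A: tags from the slide, decomposed into feature bundles
# (feature names "pos", "degree", "verbtype", "mood" are assumptions).
TAGSET_A = {
    "JJR":   {"pos": "adjective", "degree": "comparative"},
    "VVIMP": {"pos": "verb", "verbtype": "main", "mood": "imperative"},
}

# Tagset B: a hypothetical second corpus with its own, coarser tags.
TAGSET_B = {
    "ADJ-CMP": {"pos": "adjective", "degree": "comparative"},
    "V-IMP":   {"pos": "verb", "mood": "imperative"},
}

def map_tag(tag: str) -> list:
    """Find tags in tagset B whose feature bundle is compatible with
    (i.e. subsumed by) the bundle of the given tagset-A tag."""
    bundle = TAGSET_A[tag]
    return [t for t, feats in TAGSET_B.items()
            if all(bundle.get(k) == v for k, v in feats.items())]

print(map_tag("JJR"))    # exact feature match
print(map_tag("VVIMP"))  # B's tag is less specific but compatible
```

This is the easy direction; Jan's point is that real annotations (e0, e2) are structured expressions, so the mapping must operate on combinations of categories or whole graphs, which is far harder to author and to validate.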

  18. what about processes?
  • with ISOcat we failed completely
  • we are good guys - so why?
    • very costly work - but not well funded, no career, ...
    • communities created their own semantics - why do it twice? why bother about semantics invented by others?
    • do people want to vote? NOT really
    • AND: people need to produce publications en masse
    • why invest in something whose benefits cannot be seen immediately?

  19. and now?
  • it seems that whatever we do, it is too early in some domains
    • people don't see the need to invest
    • it is still more profitable to find ad hoc solutions
    • the mechanisms we offer are not easy enough
    • etc.
  • the decision to separate concept definitions and relations is still correct, since many relations are task-dependent and not "given"
  • the SemanticWiki approach is obviously viable, since it combines a simple entrance that allows everyone to participate with the ability for others to curate content towards more machine-readable form

  20. Thanks for your attention.
