Final Report of Working Group 5: Interoperation

G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg. ELIIP Workshop, Salt Lake City, 12-14 Nov 2009.

Presentation Transcript


  1. Final Report of Working Group 5: Interoperation
     G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg
     ELIIP Workshop, Salt Lake City, 12-14 Nov 2009

  2. Interoperation
     • What is it?
        • Interoperability is the ability of two or more systems to exchange information or services and to make satisfactory use of what is exchanged.
     • What does it take for this to happen?
        • The systems agree on standardized definitions of the concepts about which they want to share information.
        • The systems use a standardized format and protocol for information interchange.

  3. Why interoperate?
     • It prevents a centralized service from duplicating the efforts of others.
     • It maximizes data freshness, since updates are propagated as soon as the owner makes them.
     • It makes a centralized service more sustainable, since others bear the cost of providing the data.
     • It allows multiple centralized services to add value to the same basic information.

  4. Ways to build a web information service
     • Centralized database curation
        • The service is self-contained: it defines the database, users edit the data directly, and the service curates the information.
     • Centralized database aggregation
        • The service has no data of its own: it uses an interoperation protocol to populate its database from other sources that curate the desired information.

  5. The hybrid approach
     • The service uses an interoperation protocol to aggregate all the information it can get from elsewhere.
     • The service develops a database for the new information it will curate (whether missing data or alternative values). As a “good citizen”, it shares its unique data with others via the same protocol.
     • End users see a combination of the aggregated and the curated data.
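
A minimal Python sketch of how a hybrid service might combine the two sources for one language, assuming every value is tagged with the source it came from. The slides do not specify a data model, so `Value`, `combine`, and `unique_contributions` are hypothetical names. Curated values either fill a missing field or sit beside the aggregated value as alternatives; the curated values that differ from what was aggregated are the ones a “good citizen” service would share back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """One value for one field, tagged with where it came from (hypothetical model)."""
    value: str
    source: str  # e.g. the harvested provider, or the service itself if curated locally


def combine(aggregated, curated):
    """Build the end-user view for one language.

    Both arguments map a field name to a Value. The result maps each field to
    a list of Values: the aggregated value first, then any curated value whose
    text differs from it. Curated data never overwrites aggregated data; it
    either fills a missing field or is shown as an alternative.
    """
    view = {name: [v] for name, v in aggregated.items()}
    for name, v in curated.items():
        values = view.setdefault(name, [])
        if all(v.value != existing.value for existing in values):
            values.append(v)
    return view


def unique_contributions(aggregated, curated):
    """Curated values not already present in the aggregated data.

    These are what the service would publish back via the same protocol.
    """
    return {
        name: v
        for name, v in curated.items()
        if name not in aggregated or aggregated[name].value != v.value
    }
```

Keeping alternatives side by side, rather than letting one source win, matches the slide's point that curated data may be an “alternative value” rather than a correction.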

  6. What does this mean for ELIIP?
     • For each kind of information that the centralized ELIIP service wants to offer, it must decide whether to:
        • Aggregate it,
        • Curate it, or
        • Do both.
     • The answer can be different for different kinds of information.

  7. What kinds of information?
     • Web pages about a language
     • Existing language documentation
     • Summary index of documentation level
     • Projects and people
     • Training and revitalization programs
     • The language situation
     • The genetic classification
     OUT OF SCOPE: Interoperation over language data (such as dictionaries and interlinear texts)

  8. 1. Web pages on languages
     • Two low-bar approaches to interoperation:
        • Microformats: harvestable metadata is embedded in the HTML coding of a page.
        • Predictable URL: a web site that offers information about many languages has a main page for each language, whose URL is a base URL parameterized by the ISO 639-3 code.
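
The microformat approach can be illustrated with a small harvester built on Python's standard `html.parser`. ELIIP has not defined a microformat, so the convention here is an assumption: any element whose `class` attribute contains the hypothetical token `iso639-3` wraps an ISO 639-3 code. A crawler would run this over each page it fetches.

```python
from html.parser import HTMLParser

# Hypothetical microformat convention (not an ELIIP standard): an element whose
# class list contains this token wraps an ISO 639-3 code, e.g.
#   <span class="iso639-3">abc</span>
CODE_CLASS = "iso639-3"


class CodeHarvester(HTMLParser):
    """Collect the text content of every element marked with CODE_CLASS."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._tag = None   # tag name of the marked element we are inside, if any
        self._depth = 0    # nesting depth of that tag name, to find its real end
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if CODE_CLASS in classes:
            self._tag, self._depth, self._buf = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                code = "".join(self._buf).strip()
                if code:
                    self.codes.append(code)
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def harvest_codes(html):
    """Return the language codes marked up in one HTML page, in document order."""
    parser = CodeHarvester()
    parser.feed(html)
    parser.close()
    return parser.codes
```

For example, `harvest_codes('<p>About <span class="lang iso639-3">abc</span>.</p>')` returns `['abc']`. A real microformat would likely carry more than the code (a language name, a page type), but the harvesting pattern stays the same.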

  9. ELIIP could …
     • Define microformats and provide a service for crawling pages on sites that use them
     • Identify web sites that should implement predictable URLs and provide funding to incentivize the needed changes on those sites
     • Provide a service for registering base URLs and boilerplate metadata, so that OLAC records are generated for all language codes that yield a page
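
The registration service in the third proposal can be sketched as follows, assuming a registered base URL is stored as a template with a `{code}` placeholder and the boilerplate metadata is a title pattern plus a publisher name. Whether a URL “yields a page” is decided by a function the caller passes in (in practice an HTTP request), so the sketch runs offline. The output is a simplified OLAC-style record meant to follow OLAC conventions; a real record must validate against the OLAC metadata schema. `make_record` and `generate_records` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # readable prefixes when serialized


def q(prefix, name):
    """Qualified name in ElementTree's {uri}name form."""
    return f"{{{NS[prefix]}}}{name}"


def make_record(template, code, title_pattern, publisher):
    """Simplified OLAC-style record for the page about one language."""
    root = ET.Element(q("olac", "olac"))
    ET.SubElement(root, q("dc", "title")).text = title_pattern.format(code=code)
    ET.SubElement(root, q("dc", "publisher")).text = publisher
    ET.SubElement(root, q("dc", "subject"),
                  {q("xsi", "type"): "olac:language", q("olac", "code"): code})
    ET.SubElement(root, q("dc", "identifier")).text = template.format(code=code)
    return root


def generate_records(template, codes, page_exists, title_pattern, publisher):
    """Yield a record for every ISO 639-3 code whose predictable URL yields a page.

    page_exists(url) -> bool is supplied by the caller (e.g. an HTTP HEAD check).
    """
    for code in codes:
        if page_exists(template.format(code=code)):
            yield make_record(template, code, title_pattern, publisher)
```

A site owner would register something like `"https://example.org/lang/{code}"` once; the service then iterates over the full ISO 639-3 code list and emits records only for the codes that resolve.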

  10. 2. Existing documentation
      • A working interoperation infrastructure already exists in OLAC.
      • ELIIP should aggregate from OLAC to avoid duplicating work.
      • But there are huge gaps in OLAC's coverage.
      • Thus ELIIP needs a hybrid approach: acting as an OLAC data provider to fill the gaps and as an OLAC service provider to aggregate.
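
Aggregating from OLAC means acting as an OAI-PMH harvester. The sketch below issues `ListRecords` requests with `metadataPrefix=olac` and follows resumption tokens until the repository returns an empty one. The base URL and the injected `fetch` function (in practice an HTTP GET) are assumptions, and error responses and retry logic are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, token=None, prefix="olac"):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up request
    carries only the verb and the token, not the metadataPrefix.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url, fetch, prefix="olac"):
    """Yield (identifier, metadata element or None) for every record.

    fetch(url) returns the response body (str or bytes); it is injected so the
    sketch can run without a network. Deleted records carry no metadata.
    """
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token, prefix)))
        for record in root.iterfind(".//oai:record", OAI):
            identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI)
            yield identifier, record.find("oai:metadata", OAI)
        element = root.find(".//oai:resumptionToken", OAI)
        token = (element.text or "").strip() if element is not None else ""
        if not token:  # absent or empty token: the list is complete
            break
```

The same harvester works against any OAI-PMH data provider, which is what lets ELIIP act as an OLAC service provider without curating the underlying records.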

  11. Filling the gaps
      Since … ELIIP could …

  12. 3. Documentation index
      • A numerical index that summarizes the level of documentation of a language (as in AUSTLANG) is desirable.
      • The OLAC aggregator (especially after ELIIP fills the gaps) provides a list of all the resources by linguistic data type.
      • What’s needed is a way to convert those lists into a measure of extent.

  13. ELIIP could …
      • Participate in the OLAC process to refine the linguistic data type vocabulary as needed
         • E.g. add “language instruction”
      • Participate in the OLAC process to add a new recommendation for <dc:extent>
         • E.g. lexicon/0, lexicon/1, lexicon/2, lexicon/3
      • Promote its adoption by all OLAC participants, and add curated judgments where that fails
      • Develop an overall numerical index that combines the results over all the data types
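
One way the overall index might combine per-type results, assuming `<dc:extent>` values of the form `type/level` with levels 0 to 3 as in the `lexicon/0` … `lexicon/3` example. Taking the best level per type, scoring absent types as 0, and weighting all types equally are assumptions for illustration, not a proposal from the working group. The data type codes are those of the OLAC linguistic data type vocabulary.

```python
# Assumed scheme: <dc:extent> values like "lexicon/2", levels 0..3.
MAX_LEVEL = 3
# Codes from the OLAC linguistic data type vocabulary.
DATA_TYPES = ("primary_text", "lexicon", "language_description")


def parse_extent(value):
    """Split "lexicon/2" into ("lexicon", 2)."""
    data_type, _, level = value.partition("/")
    return data_type, int(level)


def documentation_index(extents):
    """Combine one language's extent values into a score between 0 and 1.

    Takes the best level reported for each data type, counts data types with
    no resources as 0, and weights all types equally (all assumptions).
    """
    best = dict.fromkeys(DATA_TYPES, 0)
    for value in extents:
        data_type, level = parse_extent(value)
        if data_type in best:
            best[data_type] = max(best[data_type], min(level, MAX_LEVEL))
    return sum(best.values()) / (MAX_LEVEL * len(best))
```

For example, `documentation_index(["lexicon/2", "lexicon/3", "primary_text/1"])` scores lexicon 3, primary text 1, and language description 0, giving 4 / 9 ≈ 0.44. Weights per data type would be the obvious knob for planners to debate.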

  14. 4. People and projects
      • The OLAC infrastructure can support this.
      • DCMI Type vocabulary:
         • Event: a time-bounded occurrence
      • A project can be described in an OLAC record using elements like Contributor, Language, Linguistic data type, and Description.
      • An advantage of this approach is that projects appear with all other resources in any OLAC-based service.
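
A project record along these lines can be built with Python's standard `xml.etree.ElementTree`. The attribute conventions (`xsi:type="dcterms:DCMIType"` for the DCMI Type, `xsi:type="olac:language"` and `xsi:type="olac:linguistic-type"` with an `olac:code`) are meant to follow OLAC practice; the function `project_record` and its arguments are hypothetical, and real records should be validated against the OLAC schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Qualified name in ElementTree's {uri}name form."""
    return f"{{{NS[prefix]}}}{name}"


XSI_TYPE = q("xsi", "type")
OLAC_CODE = q("olac", "code")


def project_record(title, contributors, language_codes, data_types, description):
    """Describe a project as a DCMI 'Event' in an OLAC-style record."""
    root = ET.Element(q("olac", "olac"))
    ET.SubElement(root, q("dc", "title")).text = title
    # DCMI Type "Event" marks the record as describing an occurrence, not an artifact.
    ET.SubElement(root, q("dc", "type"), {XSI_TYPE: "dcterms:DCMIType"}).text = "Event"
    for name in contributors:
        ET.SubElement(root, q("dc", "contributor")).text = name
    for code in language_codes:  # subject languages, as ISO 639-3 codes
        ET.SubElement(root, q("dc", "subject"), {XSI_TYPE: "olac:language", OLAC_CODE: code})
    for data_type in data_types:  # e.g. "lexicon", "primary_text"
        ET.SubElement(root, q("dc", "type"),
                      {XSI_TYPE: "olac:linguistic-type", OLAC_CODE: data_type})
    ET.SubElement(root, q("dc", "description")).text = description
    return root
```

Because the result is an ordinary OLAC record, any OLAC service that indexes by language would list the project next to the archive holdings for the same language, which is the advantage the slide points out.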

  15. ELIIP could …
      • Propose a metadata refinement to distinguish a project from other kinds of “events”
      • Curate records that allow linguists to describe their own projects
      • Help players with databases of relevant projects, such as funding agencies, to become OLAC data providers

  16. 5. Training and revitalization
      • The OLAC infrastructure can support this.
      • A training course or revitalization program can be described in an OLAC record with DCMI Type = “Event” + OLAC resource type = “language instruction” + Language, Description, and an Identifier holding its URL.
      • This approach allows these programs to appear with all other resources for the language in any OLAC-based service.
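
On the consuming side, an OLAC-based service could recognize such records and list them alongside everything else for a language. This sketch groups parsed records by subject language code and separates training and revitalization programs from other resources. Since “language instruction” is only a proposed vocabulary addition, the code `language_instruction` used here is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
XSI_TYPE = "{%s}type" % NS["xsi"]
OLAC_CODE = "{%s}code" % NS["olac"]

# Hypothetical vocabulary code: "language instruction" is only a proposed addition.
INSTRUCTION = "language_instruction"


def is_training_program(record):
    """True if the record is a DCMI Event typed as language instruction."""
    types = record.findall("dc:type", NS)
    is_event = any(
        t.get(XSI_TYPE) == "dcterms:DCMIType" and (t.text or "").strip() == "Event"
        for t in types
    )
    is_instruction = any(
        t.get(XSI_TYPE) == "olac:linguistic-type" and t.get(OLAC_CODE) == INSTRUCTION
        for t in types
    )
    return is_event and is_instruction


def by_language(records):
    """Map each subject language code to (programs, other resources)."""
    groups = defaultdict(lambda: ([], []))
    for record in records:
        bucket = 0 if is_training_program(record) else 1
        for subject in record.findall("dc:subject", NS):
            code = subject.get(OLAC_CODE)
            if code:
                groups[code][bucket].append(record)
    return dict(groups)
```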

  17. ELIIP could …
      • Curate records that allow these programs to describe themselves
      • Help players who are curating databases of training events to become OLAC data providers

  18. 6. Language situation
      • No suitable interoperation standard yet exists for population data and similar information.
      • Are there other projects already curating this kind of information, such that interoperation is desirable?
         • E.g. UNESCO Atlas, Ethnologue, AUSTLANG
      • But interoperation will only work if all the players agree to do it.

  19. ELIIP could …
      • During the proposal phase, identify the projects that should interoperate and secure their agreement in principle to participate
      • During the project phase, foster the process by which those players agree on standard definitions, format, and protocol
      • Could use the OAI protocol:
         • “olac” payload for the metadata
         • “eliip” payload for the language information
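
A minimal sketch of the data-provider side of that arrangement: one store of language information, served in either of two metadata formats selected by the OAI-PMH `metadataPrefix`. Only the prefixes “olac” and “eliip” come from the slide; the “eliip” element names, its namespace URI, and the fields in the store are placeholders, since the players have yet to agree on that format. The two error codes are the standard OAI-PMH ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Placeholder namespace for the hypothetical "eliip" payload; no such schema exists yet.
ELIIP_NS = "https://example.org/ns/eliip"


def olac_payload(info):
    """Resource metadata: a minimal OLAC-style record about the language."""
    root = ET.Element(f"{{{OLAC_NS}}}olac")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = info["name"]
    ET.SubElement(root, f"{{{DC_NS}}}subject",
                  {f"{{{XSI_NS}}}type": "olac:language", f"{{{OLAC_NS}}}code": info["code"]})
    return root


def eliip_payload(info):
    """Language information: hypothetical fields for situation data."""
    root = ET.Element(f"{{{ELIIP_NS}}}language", {"code": info["code"]})
    population = ET.SubElement(root, f"{{{ELIIP_NS}}}population", {"source": info["source"]})
    population.text = info["population"]
    return root


# metadataPrefix -> serializer; a ListMetadataFormats response would list these keys.
FORMATS = {"olac": olac_payload, "eliip": eliip_payload}


def get_record(store, identifier, metadata_prefix):
    """Answer an OAI-PMH GetRecord request.

    Returns (payload, None) on success, or (None, error_code) using the
    OAI-PMH error codes for an unsupported format or an unknown identifier.
    """
    if metadata_prefix not in FORMATS:
        return None, "cannotDisseminateFormat"
    if identifier not in store:
        return None, "idDoesNotExist"
    return FORMATS[metadata_prefix](store[identifier]), None
```

Keeping a `source` on each situation value is one way to preserve provenance, which the feedback and accountability ideas on the next slide depend on.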

  20. ELIIP could also …
      • Provide a feedback mechanism that allows a user to report an error back to the provider of the aggregated data
      • Provide a publicly viewable tracking mechanism to ensure the accountability of data providers, e.g.
         • Is a population figure in Ethnologue or UNESCO wrong because they won’t fix it when someone reports the right data, or because the person who knows won’t tell them?
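
The tracking idea could start as small as the sketch below: every report is kept with its provider and status, and the public view counts reports per provider by status, so a reader can see whether a provider has received corrections and how it responded. All class names, fields, and statuses here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"          # reported, no response from the provider yet
    FIXED = "fixed"        # provider corrected its data
    DECLINED = "declined"  # provider reviewed the report and kept its value


@dataclass
class Report:
    provider: str        # who supplied the aggregated value
    record_id: str       # which aggregated record is disputed
    field_name: str
    proposed_value: str
    evidence: str
    status: Status = Status.OPEN


@dataclass
class Tracker:
    reports: list = field(default_factory=list)

    def submit(self, report):
        """Record a user's error report; returns its public report number."""
        self.reports.append(report)
        return len(self.reports) - 1

    def respond(self, number, status):
        """Record the provider's response to a report."""
        self.reports[number].status = status

    def public_summary(self):
        """Per-provider counts of reports by status, for public display."""
        summary = {}
        for report in self.reports:
            summary.setdefault(report.provider, Counter())[report.status.value] += 1
        return summary
```

A public count of open and declined reports answers the slide's question directly: if no report was ever filed, the error is not the provider's doing.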

  21. Nota Bene
      • None of the “ELIIP could” proposals up to this point would require the overhead of a governing body or regional captains to vet individual data points (though these would still have a role in recommending and vetting aggregation sources).
      • That threshold is crossed if ELIIP chooses to:
         • Curate its own version of language situation data that it judges to be the most correct

  22. 7. Genetic classification
      • Same story as for “language situation” information
      • If the set of data providers is the same as for the situation information, then classification could be included in the interoperation standard as a kind of situation information.
      • If there is a different set of players, ELIIP could foster the same process to develop an interoperation standard for classification.

  23. Thought for the day
      • The aggregator lies at the sweet spot in the value chain of today’s web economy.
         • E.g. Google, Amazon, iTunes, Netflix
      • Cf. Chris Anderson, The Long Tail (2006)

  24. Conclusion
      • There are many things that ELIIP could do:
         • To exploit the power of interoperation
         • To mobilize our community to share information about endangered languages
         • While minimizing what it must centrally curate
      • The task for the ELIIP planners is to decide which of these things they want to do.
