Final Report of Working Group 5: Interoperation G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg ELIIP Workshop, Salt Lake City, 12-14 Nov 2009
Interoperation • What is it? • Interoperability is the ability of two or more systems to exchange information or services and to make satisfactory use of what is exchanged. • What does it take for this to happen? • The systems agree on standardized definitions of the concepts about which they want to share information • The systems use a standardized format and protocol for information interchange
Why interoperate? • It prevents a centralized service from duplicating the efforts of others • It maximizes data freshness since updates are propagated when made by the owner • It makes a centralized service more sustainable since others bear the cost of providing data • It allows multiple centralized services to add value to the same basic information
Ways to build a web information service • Centralized database curation • The service is self-contained: the service defines the database, users edit the data directly, the service curates the information • Centralized database aggregation • The service has no data of its own: it uses an interoperation protocol to populate the database from other sources that curate the desired information
The hybrid approach • The service uses an interoperation protocol to aggregate all information it can get from elsewhere. • The service develops a database to handle new information it will curate (whether missing data or alternative values). As a “good citizen” the service shares its unique data with others via the same protocol. • End users see a combination of the aggregated and the curated data.
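The combination step of the hybrid approach can be sketched in a few lines of Python. This is a minimal illustration and not part of the report: the record layout (dictionaries keyed by language code), the field names, the sample values, and the rule that a curated value overrides an aggregated one are all assumptions. The codes `qaa` and `qab` come from the ISO 639-3 range reserved for local use.

```python
def combine(aggregated, curated):
    """Return the view an end user sees: aggregated plus curated data.

    Both arguments map a language code to a dict of field -> value.
    Assumption: where both sources give a value, the curated one wins.
    """
    view = {}
    for code in sorted(set(aggregated) | set(curated)):
        merged = dict(aggregated.get(code, {}))   # start from harvested data
        merged.update(curated.get(code, {}))      # fill gaps / alternative values
        view[code] = merged
    return view


# Hypothetical sample data, for illustration only.
aggregated = {"qaa": {"name": "Language A", "speakers": 1000}}
curated = {"qaa": {"speakers": 1200}, "qab": {"name": "Language B"}}
combined = combine(aggregated, curated)
```

A service that prefers to show alternative values side by side, rather than letting one override the other, would keep both values per field instead of calling `update`.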
What does this mean for ELIIP? • For each kind of information that the centralized ELIIP service wants to offer, it must decide whether to: • Aggregate it, • Curate it, or • Do both • The answer can be different for different kinds of information
What kinds of information? • Web pages about a language • Existing language documentation • Summary index of documentation level • Projects and people • Training and revitalization programs • The language situation • The genetic classification OUT OF SCOPE: Interoperation over language data (like dictionaries and interlinear texts)
1. Web pages on languages • Two low-bar approaches to interoperation: • Microformats: Harvestable metadata is embedded in the HTML coding of a page. • Predictable URL: A web site that offers information about many languages has a main page for each language with a base URL parameterized by the ISO 639-3 code
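The predictable-URL idea can be illustrated with a short Python sketch. The base URL pattern, the host `example.org`, and the function name are hypothetical; the only thing taken from the slide is that the page address is the base URL parameterized by the three-letter ISO 639-3 code.

```python
# Hypothetical base URL; a real site would register its own pattern.
BASE_URL = "https://example.org/language/{code}"


def language_page_url(code, base_url=BASE_URL):
    """Build the main-page URL for a language from its ISO 639-3 code."""
    code = code.strip().lower()
    # ISO 639-3 identifiers are exactly three ASCII letters.
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError("not a three-letter ISO 639-3 code: %r" % code)
    return base_url.format(code=code)


url = language_page_url("QAA")  # "qaa" is in the range reserved for local use
```

An aggregator that knows only the base URL can then test every code in the standard and keep those that yield a page.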
ELIIP could … • Define microformats and provide a service for crawling pages on sites that use them • Identify web sites that should implement predictable URLs and provide funding to incentivize needed changes on those sites • Provide a service for registering base URLs and boilerplate metadata so that OLAC records are generated for all language codes that yield a page
2. Existing documentation • A working interoperation infrastructure already exists in OLAC • ELIIP should aggregate from OLAC to avoid duplicating work • But there are huge gaps in the OLAC coverage • Thus ELIIP needs a hybrid approach: as an OLAC data provider to fill the gaps and as an OLAC service provider to aggregate
3. Documentation index • A numerical index that summarizes level of language documentation (as at AUSTLANG) is desirable • The OLAC aggregator (especially after ELIIP fills the gaps) provides a list of all the resources by linguistic data types • What’s needed is a way to convert those to a measure of extent
ELIIP could … • Participate in the OLAC process to refine the linguistic data type vocabulary as needed • E.g. add “language instruction” • Participate in the OLAC process to add a new recommendation for <dc:extent> • E.g. lexicon/0, lexicon/1, lexicon/2, lexicon/3 • Promote its adoption by all OLAC participants and add curated judgments where that fails • Develop an overall numerical index that combines results over all the data types
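One way such an overall index could be computed is sketched below in Python. The slides do not specify a formula, so everything here is an assumption: extent values are read as `type/level` strings in the style of the `lexicon/0` to `lexicon/3` example, the best level per data type is taken, and the index is the sum of those levels divided by the maximum possible. The three data type names are the current OLAC linguistic data types.

```python
MAX_LEVEL = 3  # assumed top of the scale, from the lexicon/0..lexicon/3 example
DATA_TYPES = ("lexicon", "primary_text", "language_description")


def documentation_index(extents, data_types=DATA_TYPES):
    """Combine extent values such as 'lexicon/2' into a score from 0 to 1."""
    best = {t: 0 for t in data_types}
    for value in extents:
        data_type, _, level = value.partition("/")
        if data_type in best and level.isdigit():
            # Keep the highest level seen for each data type.
            best[data_type] = max(best[data_type], min(int(level), MAX_LEVEL))
    return sum(best.values()) / (MAX_LEVEL * len(data_types))


# Hypothetical extent values harvested for one language.
score = documentation_index(["lexicon/2", "lexicon/3", "primary_text/1"])
```

Other combinations (weighted sums, or a minimum across types) are equally plausible; the point is only that a shared extent vocabulary makes the calculation mechanical.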
4. People and projects • The OLAC infrastructure can support this • DCMI Type vocabulary: • Event: A time-bounded occurrence • A project can be described in an OLAC record using elements like Contributor, Language, Linguistic data type, Description • An advantage of this approach is that projects appear with all other resources in any OLAC-based service
ELIIP could … • Propose a metadata refinement to distinguish a project from other kinds of “events” • Curate records that allow linguists to describe their own projects • Help players like funding agencies with databases of relevant projects to become OLAC data providers
5. Training and revitalization • The OLAC infrastructure can support this • A training course or revitalization program can be described in an OLAC record with DCMI Type = “Event” + OLAC resource type = “language instruction” + Language, Description, Identifier for a URL • This approach allows these programs to appear with all other resources for the language in any OLAC-based service
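A record of this kind might be generated as in the following Python sketch. It follows the general shape of OLAC metadata (Dublin Core elements with `xsi:type` refinements) and should be checked against the OLAC schema before use. The function name, the sample values, and in particular the `language_instruction` code are hypothetical, since that type is a proposal in these slides rather than an existing vocabulary term.

```python
import xml.etree.ElementTree as ET

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the namespace-qualified name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)


def program_record(title, language_code, description, url):
    """Describe a training or revitalization program as an OLAC-style record."""
    record = ET.Element(q("olac", "olac"))
    # dcterms appears only inside attribute values, so declare it by hand.
    record.set("xmlns:dcterms", "http://purl.org/dc/terms/")
    ET.SubElement(record, q("dc", "title")).text = title
    ET.SubElement(record, q("dc", "type"),
                  {q("xsi", "type"): "dcterms:DCMIType"}).text = "Event"
    # Hypothetical code for the proposed "language instruction" type.
    ET.SubElement(record, q("dc", "type"),
                  {q("xsi", "type"): "olac:linguistic-type",
                   q("olac", "code"): "language_instruction"})
    ET.SubElement(record, q("dc", "language"),
                  {q("xsi", "type"): "olac:language",
                   q("olac", "code"): language_code})
    ET.SubElement(record, q("dc", "description")).text = description
    ET.SubElement(record, q("dc", "identifier"),
                  {q("xsi", "type"): "dcterms:URI"}).text = url
    return ET.tostring(record, encoding="unicode")


# Hypothetical program, for illustration only.
xml_text = program_record("Summer language course", "qaa",
                          "Two-week community course.",
                          "https://example.org/course")
```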
ELIIP could … • Curate records that allow these programs to describe themselves • Help players who are curating databases of training events to become OLAC data providers
6. Language situation • No suitable interoperation standard yet exists for population data, etc. • Are there other projects already curating this kind of information such that interoperation is desirable? • E.g. UNESCO Atlas, Ethnologue, AUSTLANG • But interoperation will only work if all the players agree to do it
ELIIP could … • During proposal phase, identify the projects that should interoperate and secure agreement in principle to participate • During the project phase, foster the process among those players to agree on standard definitions, format, and protocol • Could use the OAI protocol • “olac” payload for the metadata • “eliip” payload for the language information
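If the OAI protocol were used, harvesting the two payloads would amount to two requests that differ only in the metadata prefix, as this Python sketch suggests. `ListRecords`, `metadataPrefix`, and `from` are standard OAI-PMH request arguments, and `olac` is the prefix OLAC repositories already use; the base URL is hypothetical and the `eliip` prefix is only the name proposed on the slide.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, since=None):
    """Build an OAI-PMH ListRecords request for one metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        # Ask only for records changed on or after this date (YYYY-MM-DD).
        params["from"] = since
    return base_url + "?" + urlencode(params)


BASE = "https://example.org/oai"  # hypothetical data provider
metadata_url = list_records_url(BASE, "olac")
situation_url = list_records_url(BASE, "eliip", since="2009-11-12")
```

The `from` argument is what lets an aggregator pick up only the updates made by the owner since the last harvest.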
ELIIP could also … • Provide a feedback mechanism that allows a user to report an error back to the provider of the aggregated data • Provide a publicly viewable tracking mechanism to ensure accountability of the data providers, e.g. • Is a population in Ethnologue or UNESCO wrong because they won’t fix it when someone reports the right data, or because the person who knows won’t tell them?
Nota Bene • None of the “ELIIP could” proposals up to this point would require the overhead of a governing body or regional captains to vet individual data points (though they would still have a role in recommending and vetting aggregation sources). • That threshold is crossed if ELIIP chooses to: • Curate its own version of language situation data that it judges to be the most correct
7. Genetic classification • Same story as for “language situation” information • If the set of data providers is the same as for the situation information, then this could be included in the interoperation standard as a kind of situation information • If there is a different set of players, ELIIP could foster the same process to develop an interoperation standard for classification
Thought for the day • The aggregator lies at the sweet spot in the value chain of today’s web economy. • E.g. Google, Amazon, iTunes, Netflix • Cf. Chris Anderson, The Long Tail (2006)
Conclusion • There are many things that ELIIP could do: • To exploit the power of interoperation • For mobilizing our community to share information about endangered languages • While minimizing what it must centrally curate • The task for the ELIIP planners is to decide which of these things they want to do