Wikis, Standards and Everything
Foreword • Wikification and standards: is this the wrong talk? • Wiki: open and free interaction online • ISO: dusty documents imposing ways of thinking and working • Still, reusability and preservation of data • Require some minimal principles about data representation • Interoperability • And there are quite a few practical standards (e.g. ISO 10646) • Background (outline) • The demonstrators: OmegaWiki • The police: ISO (International Organization for Standardization) • The topic at hand: language descriptions • Highly complementary to work done here at MPI-EVA (eWALS)
Wikis for Languages • Some possible motivations: • 50% of languages are endangered (UNESCO); • a large proportion of languages have no “resources” and no web presence; • discontinuity and fragmentation of research; • sustainability and curation issues • And yet… • Capability for capturing data like never before; • Expansion of the capacity of the Internet and growing pressure for an inclusive multilingual Internet; • OLPC programme; • Language experts and non-experts are prepared to contribute time and resources • So, how about a Wiki-based infrastructure that allows us to form communities around languages and harmonize results?
Wikis for Languages • OmegaWiki: a collaborative project to produce a free, multilingual resource in every language, with lexicological, terminological and thesaurus information • World Language Documentation Centre (WLDC): currently comprising 22 experts in language technologies, linguistics, terminology standardisation, and localisation • ISO: provision of the ISO 639 series of standards; focus here on 639-4 and 639-6
Wikis for Languages • [Diagram, “standards as databases”. Labels: ISO 12620 (data categories); ISO 11179 (metadata registries); ISO 639-4; ISO 639-X standard and ISO 639-X data; ISO 639-6 standard and ISO 639-6 data; expert review; community review & infrastructure; co-ordination: SIL, LoC, Infoterm; “auditors”]
Wikis for Languages • Language documentation via ISO 639-4: association of metadata descriptors with a model interoperable with DCIF (ISO 12620) (639-4, section 9) • Attribution information missing here
Wikis for Languages • Eventual inclusion of all “available” metadata
ISO standards • Language Codes Standards are growing in number and complexity • From 2 to 6 – eventually back to 1? • From 400 identifiers to upwards of 30000 – plus supporting metadata • From lists to databases – multiple metadata registers • From tables to metadata registries – registers + policies + “auditors” • From published text documents to “published” databases – “SAD” • From IETF RFC to RFCs to RFCs – consume, consume, consume • From a closed membership committee to an open Community initiative (OmegaWiki) – supporting infrastructure, expert review of community contributions (e-Voting?) • …. with accompanying (web) services and products – Open Source and bespoke, and secured funding as necessary
Next steps • Data and models for wiki • Structured data is necessary in scientific domains • Registering descriptors and schemas is an essential component of long-term management of such data • New types of standards • Stabilisation of knowledge • Dynamic platforms for describing knowledge • Complementary to rocket science • Back to WALS • MPI-EVA and MPDL => eWALS • Generic environment for managing and linking 639-4 compliant data • Connecting the whole thing…
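To illustrate the point about registering descriptors and schemas, here is a minimal sketch of a descriptor registry that checks a language description record against registered descriptors. Everything in it is an assumption made for illustration: the class names (`Descriptor`, `DescriptorRegistry`), the descriptor identifiers (`languageName`, `vitality`, `speakerCount`) and the allowed values are hypothetical and are not taken from ISO 639-4, ISO 12620 or OmegaWiki.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Descriptor:
    """A registered metadata descriptor (hypothetical structure)."""
    identifier: str
    definition: str
    # An empty set means any value is accepted (free text).
    allowed_values: frozenset = frozenset()


@dataclass
class DescriptorRegistry:
    """Holds descriptors and validates records against them."""
    descriptors: dict = field(default_factory=dict)

    def register(self, descriptor):
        # Refuse duplicates so that an identifier keeps a single definition.
        if descriptor.identifier in self.descriptors:
            raise ValueError(f"descriptor already registered: {descriptor.identifier}")
        self.descriptors[descriptor.identifier] = descriptor

    def validate(self, record):
        """Return a list of problems found in a language description record."""
        problems = []
        for key, value in record.items():
            descriptor = self.descriptors.get(key)
            if descriptor is None:
                problems.append(f"unregistered descriptor: {key}")
            elif descriptor.allowed_values and value not in descriptor.allowed_values:
                problems.append(f"value {value!r} not allowed for {key}")
        return problems


# Illustrative descriptors only; not drawn from any standard.
registry = DescriptorRegistry()
registry.register(Descriptor("languageName", "Name of the language"))
registry.register(
    Descriptor("vitality", "Vitality status", frozenset({"safe", "endangered", "extinct"}))
)

record = {"languageName": "Example language", "vitality": "endangered", "speakerCount": 1200}
print(registry.validate(record))
```

In this sketch a community-contributed record that uses a descriptor nobody has registered (`speakerCount`) is flagged rather than silently accepted, which is one simple way an expert-review step could be supported.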
Further Sources • Gillam, L. (2007). "A metadata infrastructure using ISO standards". We Have to Talk about Metadata Workshop at UK e-Science Programme All Hands Meeting 2007 (AHM 2007), Nottingham, 10-13 September. Accepted. • Gillam, L., Garside, D., Cox, C. (2007). "Developments in Language Codes standards". In Rehm, Witt and Lemnitzer (eds.): Datenstrukturen für linguistische Ressourcen und ihre Anwendungen / Data Structures for Linguistic Resources and Applications. Proc. of GLDV 2007, 11-13 April 2007, Tübingen, Germany: Gunter Narr Verlag. • Gillam, L., Garside, D., Cox, C. (2006). "Information volumes and linguistic diversity: meeting the challenges for content management". 3rd International Conference on Terminology, Standardization and Technology Transfer, 25-26 August, Beijing, PRC.