From DOBES to CLARIN and beyond • Axel Horstmann (VolkswagenFoundation) • Peter Wittenburg (MPI for Psycholinguistics) • Erhard Hinrichs (University of Tübingen)
FACTS AND FIGURES • Non-profit-making foundation established under private law, based in Hanover • Not affiliated with the car manufacturer of the same name • Founded by the Governments of the Federal Republic of Germany and the State of Lower Saxony in 1961 • Objective: to support science and technology as well as the humanities and the social sciences in research and university teaching • Assets: about 2.45 billion euros • Funding p.a.: about 110 million euros • One of the best-endowed private research funding foundations in Europe
FOCUS ON HUMANITIES AND SOCIAL SCIENCES • Current funding initiatives (see KURZINFORMATION / BASIC INFORMATION): about 45 to 50 % of the funds given to H&SC • Initiatives focussing on infrastructural support of H&SC: • Kulturwissenschaftliche Dokumentation (Documentation in Cultural Studies; closed) • Archive als Fundus der Forschung (Archives as a Resource for Research; closed) • DOBES: Dokumentation bedrohter Sprachen (Documentation of Endangered Languages) • Projects including infrastructural support of H&SC: • Strategy building on digitization of endangered books • Digitization of the so-called “Aschebücher” (“ash books”) of the HAAB Weimar (in preparation)
"E-HUMANITIES": POSSIBILITIES AND PERSPECTIVES • Strong interest in innovative approaches • Funds available for projects involving activities towards "E-Humanities" (e.g. digitization of data, collections, archival material) within current funding initiatives • Funding possibilities for meetings, workshops, conferences etc. focussing on "E-Humanities" (within the funding initiative Symposia and Summer Schools) • New perspectives on "E-Humanities" (possibly) opened up within a new funding initiative aiming at Research in Museums (currently in planning), including to a certain extent digitization activities • ... and, not to forget, the flagship "DOBES" ...
Concrete steps or Tower of Babel? • we don’t know exactly what eHumanities means • we feel that mechanisms in research processes are changing rapidly, with technological innovation as the motor • but we can’t say: “we are now going to design eHumanities” • we probably can say: “let’s plan further concrete projects and actions and see” • many excellent projects around; let me just refer to the good sides of DOBES as one of these steps (Documentation of Endangered Languages, funded by VolkswagenFoundation)
What is DOBES? • 44 DOBES teams working fully distributed and self-organized, incl. linguists, anthropologists, musicologists, ethno-biologists, etc. • In addition, VWF installed a central archive • Start in 2000
What changed in DOBES? • handing over all data to an archive after a limited time was completely new and is an explicit step, although the results will not be ready • there is a push to make data accessible to others from the beginning, also new for many and not without conflicts • asking researchers to categorize and organize material according to agreed metadata was also new and still requires evangelization • including multimedia in the documentation and dealing with audio/video as a basis was fairly new and requires technical know-how
Which infrastructure did DOBES build? • a stable, reliable and open repository/archiving system handling 30 TB • data storage not encapsulated and in open formats • introduction of persistent identifiers to protect investments in relating fragments • a network of 12 centers worldwide included in data distribution • of these, 6 copies are held in centers with a hardware migration strategy • a number of web-based applications offering various ways to access the data
CLARIN/D-SPIN Challenges • eResearch is about global collaboration in key areas of science and the next generation of infrastructure that will enable it (J. Taylor) • goal is an open research infrastructure to overcome the huge fragmentation of language resources and tools and to offer them to research communities, in particular to the humanities • help tackle the LARGE challenges (multilingual societies) • but also help the individual researcher • example: align a transcription and an audio signal; how many researchers know how to do this? • see CLARIN/D-SPIN as a huge virtual marketplace of resources and tools that can be combined thanks to integration and interoperability solutions • let's not forget Henry Thompson (one of the fathers of XML) • we don't have an agreed descriptive system in our domain
CLARIN/D-SPIN Research Infrastructure • the basis of a big supermarket is classification and convincing organization principles • based on 10 years of experience we know that only a flexible component model will be accepted • we seem to be going towards a Federation of LRT producers that can make contracts with Identity Federations • just one signature necessary to get all researchers integrated with their home identity • we have already set up a first small test federation (EC-DAM-LR) • researchers' dream: building virtual collections and creating workflows flexibly, which is not trivial due to import/export aspects • LREC showed that we already know a lot about the problem
CLARIN/D-SPIN Network of Service Centers • need a network of strong and persistent centers of a "new" type • researchers will only adapt if they can rely on new mechanisms • need to simplify the IPR/license situation
towards eHumanities • CLARIN has > 100 members from 32 countries • in Germany 9 well-known centers, and some more will join • it is an enormous challenge to make a real step ahead in CLARIN • can we all together extend this to an eHumanities infrastructure, or are we already close to collapse?
a few questions I • will there be a separate infrastructure for each H discipline? • NO • there will be several shared services such as a PID registration and resolution service • however: • building a joint infrastructure has to do with community building, trust, common language etc. • communities that are too big would not work • so let's move on in TextGrid, DARIAH, CLARIN etc. • but let's keep in close and fair contact to find synergies • competition will become heavy and our competitors are the Googles of the world!
a few questions II • will there be a single market place for the humanities? • NO • acceptance of a market place depends on classification and organization principles, as already said • these are different in all disciplines • so we have to start from the disciplines in our solutions • already difficult enough • leave it to the Semantic Web guys to enable crosswalks
a few questions III • who will be the main players? • of course the big libraries, archives and museums • but what about the universities and big organizations such as MPG? • important: • we see new requirement profiles emerging • a kind of job sharing can be predicted • of course: close collaboration with innovative libraries such as SUB etc. is required • highly specialized groups: highly specialized MPI departments • content centers: a number of domain MPIs • curation centers: MPDL + a few domain MPIs • computer centers: RZG, GWDG
a few questions IV • key bricks for interoperability? • we need open registries of all sorts and smart registry frameworks • schema registries • concept registries (ISOcat - a creation of ISO TC37/SC4) • relation registries • etc. • however: • a very complex landscape seems to emerge • how to make it usable by laymen? • how to convince researchers to work with them? • no one knows yet - we need to try things out - what else?
Summary • we need initiatives again and again to advance the borders step by step • it is now also time to transform existing knowledge into persistent infrastructures • this will need a lot of sensitivity and patience - RI building takes time • emerging landscapes will have an underlying complexity • need to offer discipline vocabulary • need to hide complexity to a certain extent • need to offer persistency • Project solutions are not per se useful as infrastructure solutions!
End • in Germany we already have a good mixture with TextGrid, DOBES, eAqua, DARIAH and CLARIN/D-SPIN • we have to get together frequently • Thanks for your attention.