NARCIS, Integrating CRIS, OAI and Web Crawling Elly Dijk, Arjan Hogenaar and Marga van Meel Department of Research Information CRIS 2006 Bergen (Norway), 11-13 May 2006. Outline. KNAW Research Information NARCIS Background of the NARCIS project Content of NARCIS Advantages for the users
NARCIS, Integrating CRIS, OAI and Web Crawling • Elly Dijk, Arjan Hogenaar and Marga van Meel • Department of Research Information • CRIS 2006 • Bergen (Norway), 11-13 May 2006
Outline • KNAW Research Information • NARCIS • Background of the NARCIS project • Content of NARCIS • Advantages for the users • NARCIS techniques • End-user tests • Future plans
KNAW Research Information • National focal point research information • Dutch Research Database (NOD) – the national CRIS • Scientific communication (thematic databases, overview articles; e.g. about nanotechnology) • Research information system and repository of the Academy • Development of NARCIS…
NARCIS is a portal that combines: • Structured research information about current research, researchers, and research institutes • Information from academic repositories: (full text) publications and other research results • Information from websites of research institutes: datasets, digital publications, and news items • All these types of information are searchable at the same time.
Partners in the NARCIS project • Royal Netherlands Academy of Arts and Sciences (KNAW), Department of Research Information • Netherlands Organisation for Scientific Research (NWO) • Association of Universities in the Netherlands (VSNU) • Information Centre of the Radboud University of Nijmegen (RU-UCI) • Funded by the DARE programme
DARE: Digital Academic REpositories • Joint initiative by the Dutch universities, the KNAW, NWO and the National Library • DAREnet gives free access to academic research output in the Netherlands • DAREnet now contains about 70,000 digital files from 16 institutes
Goals of the NARCIS project • Giving an overview of research in the Netherlands • Central place for searching all the different types of data • Data collection via the already existing administrative systems of the participating institutes • Registering of data once only • Minimization of administrative report burden for researchers and institutes
Content of NARCIS • Information on 400,000 items: • Research institutes - profiles, addresses, programmes, projects • Researchers - expertise, addresses, projects, publications • Research activities - research programmes and projects • (Web) publications - metadata, full text • Datasets - metadata • News items - webpages of research institutes
Advantages for the users • The different types of research information are searchable at the same time (one-stop shopping) • Free access to Dutch academic full text publications and other research results • Up-to-date information: data gathered at an early stage of the registration process • High quality information: editors select the sources of NARCIS • Overview of Dutch scientific output • Scientific information also from repositories and websites
NARCIS: part two • Technical background • User Surveys • Future Developments
Applied Techniques • METIS-NARCIS exchange schema • NWOdelfi-METIS interface • OAI-PMH • Web-crawling • Collexis categorising • RSS
METIS • Dutch Research Information System • Used by all Dutch universities • For both research management and information supply • Information on research groups, individual researchers, research output
Exchanging METIS info • XML schema (CERIF-based) developed by KNAW, NWO and RU; implemented in METIS • Accessible to the service provider via a URL • XML export to the service provider is generated automatically • The data provider no longer needs to create reports of newly entered research projects in its METIS system
Interface NWOdelfi-METIS • enables XML data exchange between METIS and NWOdelfi (with a copy to NARCIS) • uses web services with XML/SOAP
Advantages of the interface • minimization of administrative report burden: • NWO-granted projects are sent automatically to the university METIS systems (with a CC to NARCIS) • Research output data entered in a METIS system (bibliographic descriptions of publications) are sent automatically to NWO (with a CC to NARCIS)
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) • protocol for harvesting metadata descriptions (XML documents) of full-text records in a repository • DAREnet is a good example of its use
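To make the harvesting step concrete, here is a minimal sketch of an OAI-PMH `ListRecords` request and of parsing the response, using only the Python standard library. The verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 specification; the repository URL in the usage note and the output dictionary layout are assumptions for illustration, not the NARCIS implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # When continuing a harvest, resumptionToken is the only argument
        # allowed besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response.

    Each record is a dict with the OAI identifier and its dc:title values
    (an illustrative layout, not a prescribed one).
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)
```

A harvester would fetch `list_records_url("https://repo.example.org/oai")` (a placeholder address), parse the response, and repeat with the returned resumption token until none is returned.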
Web-crawling • used for harvesting of data not available via METIS or NWO or repositories • focused on non-university research institutes • J-spider based • spidering of news-items and of ‘web-publications’ (i.e. digital publications available on webpages of research institutes)
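The spidering step can be illustrated with a small link extractor. This is a sketch in Python's standard library, not the J-spider setup used in NARCIS: it collects the links on one page of an institute's website, keeps those on the same host, and flags likely "web-publications" by file extension. The extension list and the function names are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html, base_url):
    """Return de-duplicated links that stay on the host of base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    kept = []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in kept:
            kept.append(link)
    return kept


def looks_like_publication(url, extensions=(".pdf", ".doc", ".ps")):
    """Heuristic (assumed): treat document-like file types as publications."""
    return urlparse(url).path.lower().endswith(extensions)
```

Restricting the crawl to the starting host is one simple way to keep a spider focused on a selected institute; a production crawler would also honour robots.txt and limit its request rate.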
Categorising • indexing all the data retrievable via NARCIS with a single terminology system • the Dutch Research Database classification system is used • first attempt: via the Lucene search engine • now: via Collexis software
[Diagram; labels only: METIS, NWO, Web-forms, Categorizer, NARCIS, Webpages, NOD, Institutional Repositories]
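The idea of indexing heterogeneous records against one classification can be sketched with a simple term-matching categorizer. This is a deliberately simplified stand-in, not how Collexis or Lucene work internally; the category names and term lists below are hypothetical and are not taken from the Dutch Research Database classification.

```python
import re
from collections import Counter

# Hypothetical classification: category -> indicative terms.
CATEGORY_TERMS = {
    "nanotechnology": {"nanotechnology", "nanoparticle", "nanoscale"},
    "linguistics": {"language", "syntax", "grammar"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text, category_terms=CATEGORY_TERMS, min_hits=1):
    """Return matching categories, those with the most term hits first."""
    counts = Counter(tokenize(text))
    scores = {}
    for category, terms in category_terms.items():
        hits = sum(counts[term] for term in terms)
        if hits >= min_hits:
            scores[category] = hits
    # Sort by descending score, then alphabetically for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

Because every record, whatever its source, passes through the same function and term lists, a project description, a repository record and a crawled web page all end up with labels from the same vocabulary.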
Notification tool: RSS feed • the content of an RSS feed is normally determined by the news provider (see the BBC RSS feed) • In NARCIS: the RSS feed is based on a search action • The user decides which information he or she will be notified about regularly
Preliminary results • two user surveys conducted: • amongst repository specialists (library personnel) • amongst researchers, policy advisors and science education officers • goals of the survey: • impression and functionality of the NARCIS homepage • search and limit functionality test • input for further improvement of the site
Outcomes user surveys: 1 • NARCIS webview appreciated • Retrieval of web publications is a value-adding service • Quality of research information is high • Broad range of search options and good search performance
Outcomes user surveys: 2 • the complexity of the NARCIS portal calls for extra explanation of its contents and search tools • limiting options too complicated • users see NARCIS as a new independent information system, instead of as a shell around existing ones • search results should be presented not only by relevance but also chronologically
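A search-based feed can be sketched as a function that turns the results of a saved query into an RSS 2.0 document. The element names (`rss`, `channel`, `item`, `title`, `link`, `description`) are those of RSS 2.0; the search URL, the `q` parameter and the shape of the result dictionaries are assumptions, not the NARCIS interface.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def search_feed(query, results, search_url="http://portal.example.org/search"):
    """Serialize search results as an RSS 2.0 feed for one saved query.

    `results` is a list of dicts with "title" and "url" keys (assumed shape).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"Search results for: {query}"
    ET.SubElement(channel, "link").text = search_url + "?" + urlencode({"q": query})
    ET.SubElement(channel, "description").text = "New items matching a saved search"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["url"]
    return ET.tostring(rss, encoding="unicode")
```

Each user subscribes to the feed URL of a personal query, so the feed reader shows only new items that match that search rather than everything the provider publishes.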
Outcomes user surveys : 3 • presentation of information that has been spidered leads to some confusion
Future developments: technical • automatic categorisation of the contents via Collexis • working with Digital Identifiers (for authors, institutes and objects) in order to connect the Dutch CRIS systems to the repositories • including CV information of Dutch researchers in NARCIS (PROMAS)
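Why shared identifiers matter can be shown with a small join. The sketch below assumes, hypothetically, that CRIS researcher records carry an `id` and that repository records list the same identifiers under `author_ids`; neither field name comes from METIS or DAREnet.

```python
from collections import defaultdict


def link_by_identifier(researchers, publications):
    """Map each researcher id to the titles of publications that cite it.

    researchers:  list of dicts with an "id" key (CRIS side, assumed shape)
    publications: list of dicts with "title" and "author_ids" keys
                  (repository side, assumed shape)
    """
    titles_by_author = defaultdict(list)
    for pub in publications:
        for author_id in pub.get("author_ids", []):
            titles_by_author[author_id].append(pub["title"])
    # Researchers without any matching publication get an empty list.
    return {r["id"]: titles_by_author.get(r["id"], []) for r in researchers}
```

With a common identifier the link is an exact lookup; without one, records would have to be matched on author names, which are ambiguous and spelled inconsistently across systems.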
Future developments: organisational • installation of a NARCIS Advisory Board, formed by • KNAW • NWO • VSNU • plus: SURF • the Advisory Board will decide in which way NARCIS will be developed
Conclusions • national research information portal • 400,000 items • many different information types • one-stop shopping • RSS feed • new XML exchange schema
Poster presentations: during the breaks WWW.NARCIS.INFO