The RePEc model for the academic digital library

Thomas Krichel http://openlib.org/home/krichel
Work partly sponsored by the Joint Information Systems Committee through its Electronic Libraries Programme

Early history of my interest
1991 • Contents: Warwick Working Paper acquisitions lists in CoREJ • Technology: email lists • Idea: distribute the acquisitions lists through email lists, which led to the foundation of BibEc
1992 • Contents: public domain software for TeX, emacs, etc. • Technology: anonymous ftp • Idea: make papers available on public archives that are accessible on the Internet, which led to the foundation of WoPEc

The foundation of NetEc
NetEc is a group of Internet-based services that support scholarly communication in economics. It was founded in February 1993 on a gopher server at Manchester Computing. It has been on the WWW since 1994 and mirrored in Japan and the United States since 1995. The initial services were BibEc and WoPEc.

The BibEc project 1993 to 1997 • Based mainly on acquisitions data for printed economics working papers from the Documentation Center of the Economics Department at the University of Montreal • Run on a volunteer basis by Thomas Krichel and Fethy Mili • Holdings go back to the late 1980s, around 40,000 items • Data is converted to HTML and placed on a web server

The WoPEc project 1993 to 1997 • Central collection of bibliographic data on electronic working papers • Initially unpaid volunteer work by José Manuel Barrueco Cruz and Thomas Krichel • From 1996 to 1998, JISC funding allowed José Manuel to work full time on the project • 5,000 papers in 1997

BibEc and WoPEc 1993 to 1997 • Data converted to a whois++/IAFA-like format • Static gopher/web pages updated periodically • whois++ server (powered by Digger from bunyip.com) with web-based fielded queries using an in-house query script • WAIS index of the full-text pages • WoPEc-announce and BibEc-announce mailing lists

Closely related efforts 1993 to 1997 • EconWPA: “manually” integrated into WoPEc since 1994 • Fed in Print: “manually” integrated into BibEc and WoPEc since 1994 • Departmental archives, e.g. Humboldt-Universität, University of California, San Diego • DEGREE • S-WoPEc

Related efforts: other NetEc projects • CodEc (1994 onwards): collection of computer code by Dirk Eddelbüttel • WebEc (1994 onwards): collection of WWW links to resources for economists by Lauri Saarinen; joined NetEc in 1995 • JokEc (1995 onwards): collection of jokes about economists by Pasi Kuoppomäki; joined NetEc in 1997

Projects associated with NetEc They are mirrored on the NetEc sites, but are not part of NetEc: • “Resources for Economists on the Internet” by William L. Goffe • “Economics Departments, Institutions and Research Centers” (EDIRC) by Christian Zimmermann

Projects sponsored by NetEc since 1997 • RePEc (1997 onwards) • NEP (1998 onwards) • HoPEc (founded 1997, reformed in 1999, ongoing). I will come back to these activities later.

Summary 1997 • A plethora of services • Many rely on centralized collection and are therefore not sustainable as the volume of data grows • Most have their own specific user interfaces to their data • Many are mirrored

Focus on the digital academic papers • BibEc and WoPEc were centralized collections of metadata about documents held at various archives and supplied by various providers; they needed to decentralize. • In the early days of the projects, a distributed database approach was thought to be the way forward, for example using the whois++ protocol or Dienst. • An alternative approach would be to collect all papers in one archive, an approach that works successfully for arXiv.org but has not worked for EconWPA. • Debate on centralized versus decentralized distribution

Bill Goffe’s vision 1995 “What I would suggest is this: a distributed system with any number of sites, each mirroring each other. […] archives could "join" the system (say it was written in perl so could run on NT as well as Unix). Then you'd have the best of both worlds […] Such a system could easily grow with the profession's use of the net. Such a system would GREATLY benefit the profession.” Bill suggested a system modelled on Usenet news.

The foundation of RePEc • Founding fathers: the BibEc and WoPEc projects, DEGREE, S-WoPEc • Two initial drafts by Thomas Krichel were revised at a meeting in Guildford in May 1997 • ReDIF, a metadata format • The Guildford protocol, a convention on how to store ReDIF on ftp or http servers

RePEc principle • Many archives: archives offer metadata about digital objects (mainly working papers) • One database: the data from all archives forms one single logical database, even though it is held on different servers • Many services: users can access the data through many interfaces, and archive providers offer their data to all interfaces at the same time. This provides for optimal distribution.

Many archives decentralize the collection of data… At the end of 1999 there are more than 100 archives. Some are based at leading institutions (e.g. NBER, CEPR, US Federal Reserve Banks, OECD), many others at small institutions (e.g. University of Salerno). Some data comes from commercial publishers (e.g. Springer Verlag). Example: the RePEc:tky archive at ftp://ftp.e.u-tokyo.ac.jp/pub/RePEc/tky, managed by cirje-dp@e.u-tokyo.ac.jp
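The Guildford protocol is what lets a service locate an archive's data from nothing more than its code and location. The sketch below is hypothetical: it assumes the archive directory holds two files named after the archive code, `<code>arch.rdf` for the archive template and `<code>seri.rdf` for the series templates (a layout taken from later RePEc documentation, not from these slides), and it splits a handle such as RePEc:sur:surrec:9801 into its archive, series and item parts.

```python
"""Hypothetical sketch of Guildford-protocol lookups; file names are assumptions."""


def archive_files(base_url: str, archive_code: str) -> dict[str, str]:
    """Return where an archive is assumed to keep its archive and series templates."""
    base = base_url.rstrip("/")
    # ASSUMPTION: the files are called <code>arch.rdf and <code>seri.rdf.
    return {
        "archive": f"{base}/{archive_code}arch.rdf",
        "series": f"{base}/{archive_code}seri.rdf",
    }


def split_handle(handle: str) -> tuple[str, str, str]:
    """Split 'RePEc:<archive>:<series>:<item>' into its three local parts."""
    parts = handle.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "repec":
        raise ValueError(f"not a four-part RePEc handle: {handle!r}")
    return parts[1], parts[2], parts[3]


if __name__ == "__main__":
    files = archive_files("ftp://ftp.e.u-tokyo.ac.jp/pub/RePEc/tky", "tky")
    print(files["archive"])  # ftp://ftp.e.u-tokyo.ac.jp/pub/RePEc/tky/tkyarch.rdf
    print(split_handle("RePEc:sur:surrec:9801"))  # ('sur', 'surrec', '9801')
```

Because the layout is fixed by convention rather than by each site, a mirror or an indexing service can walk any archive with the same code.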

…to form one dataset… • Over 80,000 items in over 1,000 series, covering working papers, published papers, software, and personal and institutional data • The largest free distributed source of information about online scientific publications, with over 18,000 electronic papers • Data is encoded in the purpose-built ReDIF format • All archives follow a convention called the Guildford protocol on how to store ReDIF files and other data on their servers; therefore the archives can be mirrored.
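Forming "one dataset" from many archives amounts to taking the union of every archive's templates, keyed by handle. A minimal sketch, assuming each template has already been parsed into a dictionary with a `Handle` field; the Tokyo handle below is hypothetical, and rejecting duplicate handles or handles outside an archive's own namespace is a design choice of this sketch, not a documented RePEc rule.

```python
def merge_archives(archives: dict[str, list[dict[str, str]]]) -> dict[str, dict[str, str]]:
    """Combine the templates of many archives into one logical dataset keyed by handle."""
    dataset: dict[str, dict[str, str]] = {}
    for code, templates in archives.items():
        for template in templates:
            handle = template["Handle"]
            parts = handle.split(":")
            # The archive code is the second handle component, so in this
            # sketch an archive may only contribute handles in its own namespace.
            if len(parts) < 3 or parts[1].lower() != code.lower():
                raise ValueError(f"{handle} is outside archive {code!r}")
            if handle in dataset:
                raise ValueError(f"duplicate handle {handle}")
            dataset[handle] = template
    return dataset


if __name__ == "__main__":
    archives = {
        "sur": [{"Handle": "RePEc:sur:surrec:9801"}],
        # Hypothetical item in the Tokyo archive, for illustration only.
        "tky": [{"Handle": "RePEc:tky:example:0001"}],
    }
    print(sorted(merge_archives(archives)))
    # ['RePEc:sur:surrec:9801', 'RePEc:tky:example:0001']
```

Since every service can rebuild the merged dataset from the archives independently, the same archives can feed many interfaces at once.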

…used in many services. • BibEc and WoPEc • EDIRC • IDEAS • Decomate Z39.50 service • NEP: New Economics Papers • Inomics • RuPEc • HoPEc

The ReDIF metadata format • Relational metadata links separately described elements. A paper template points to its author's person template:
    Author-Name: Thomas Krichel
    Author-Handle: RePEc:per:1965-06-05:thomas_krichel
    Handle: RePEc:sur:surrec:9801
and the person template points back to the paper:
    Name: Thomas Krichel
    Author-Paper: RePEc:sur:surrec:9801
    Handle: RePEc:per:1965-06-05:thomas_krichel
• Shipped with syntax and relational control software
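The relational idea can be illustrated with a small parser and link check. This is a hypothetical sketch, not the ReDIF control software mentioned on the slide: it assumes templates are plain `Field: value` lines separated by blank lines, ignores continuation lines and template types, and only checks that every Author-Handle in a template points to a person template whose Author-Paper field points back.

```python
from collections import defaultdict

# The two templates from the slide, separated by a blank line (assumed layout).
PAPER_AND_PERSON = """\
Author-Name: Thomas Krichel
Author-Handle: RePEc:per:1965-06-05:thomas_krichel
Handle: RePEc:sur:surrec:9801

Name: Thomas Krichel
Author-Paper: RePEc:sur:surrec:9801
Handle: RePEc:per:1965-06-05:thomas_krichel
"""


def parse_templates(text: str) -> list[dict[str, list[str]]]:
    """Parse blank-line separated templates; fields may repeat, so values are lists."""
    templates: list[dict[str, list[str]]] = []
    current: defaultdict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            if current:
                templates.append(dict(current))
                current = defaultdict(list)
            continue
        # Split on the first colon only: handles themselves contain colons.
        field, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Field: value' line: {line!r}")
        current[field.strip().lower()].append(value.strip())
    if current:
        templates.append(dict(current))
    return templates


def check_author_links(templates: list[dict[str, list[str]]]) -> list[str]:
    """Report author links that the referenced person template does not confirm."""
    by_handle = {t["handle"][0]: t for t in templates if "handle" in t}
    problems = []
    for handle, template in by_handle.items():
        for person in template.get("author-handle", []):
            target = by_handle.get(person)
            if target is None:
                problems.append(f"{handle}: no template for {person}")
            elif handle not in target.get("author-paper", []):
                problems.append(f"{person} does not list {handle}")
    return problems


if __name__ == "__main__":
    parsed = parse_templates(PAPER_AND_PERSON)
    print(len(parsed), check_author_links(parsed))  # 2 []
```

Storing each person once and referring to them by handle is what lets the cataloguing burden be shared: the paper's archive and the person's record can be maintained by different people.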

Personal and institutional data • Since October 1999 the HoPEc service has allowed persons to claim relationships between themselves and the resource data in RePEc. For example, a person can say that she is the editor of a series. The HoPEc project associates handles with individuals. These handles could be useful in many other circumstances, for example for conferences and scholarly society membership lists. • Many registering authors are able to give the EDIRC handle of their institutional affiliation.
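A claim can be modelled as a (person, role, resource) triple that services invert to list who is related to a resource. A minimal sketch under stated assumptions: the handle pattern is inferred from the single example RePEc:per:1965-06-05:thomas_krichel and may not be HoPEc's actual rule, the series handle RePEc:sur:surrec is derived from the example paper handle, and the editor "Jane Doe" is hypothetical.

```python
from collections import defaultdict
from datetime import date


def person_handle(birth: date, first: str, last: str) -> str:
    """Build a handle following the pattern of the slide's single example (an assumption)."""
    return f"RePEc:per:{birth.isoformat()}:{first.lower()}_{last.lower()}"


def index_claims(claims: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Invert (person, role, resource) claims so a service can ask who is linked to a resource."""
    by_resource: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
    for person, role, resource in claims:
        by_resource[resource].append((person, role))
    return dict(by_resource)


if __name__ == "__main__":
    author = person_handle(date(1965, 6, 5), "Thomas", "Krichel")
    editor = person_handle(date(1970, 1, 1), "Jane", "Doe")  # hypothetical person
    claims = [
        (author, "author", "RePEc:sur:surrec:9801"),
        (editor, "editor", "RePEc:sur:surrec"),  # hypothetical claim
    ]
    print(author)  # RePEc:per:1965-06-05:thomas_krichel
    print(index_claims(claims)["RePEc:sur:surrec"])
```

An institutional affiliation could be recorded the same way, with an EDIRC handle as the resource.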

Areas not covered by RePEc • No statistical dataset information • No overall preservation strategy • No overall usage logs across all services; these would be difficult to collect • No explicit peer-review services based on RePEc data, but that will change.

The RePEc vision • It is a collaborative effort of community-wide knowledge sharing by discipline champions and librarians. • Once a critical mass of data and user services is reached, outsiders face strong incentives to contribute. • The relational features allow the burden of cataloguing to be shared and reduce the cost of keeping the collection up to date. • RePEc promotes the free exchange of data between academics. • It fights the division of the world into information-rich and information-poor.

My ongoing work • Introduce autonomous citation analysis for RePEc papers (funding decision pending) • Build new datasets that use the same collection principles • ReLIS for Library and Information Science • ReCMaP for Computing, Mathematics and Physics • ReSoS for the broad social sciences • Devise a syntax-independent and object-oriented version of ReDIF

But I cannot do all this while being a lecturer in Economics... Conclusion: Hire me!
