Transforming scholarly communities with open libraries Thomas Krichel 2005-03-03
about this talk • The talk is primarily • from the “needle box” • from the soap box • a bit disorganized • Do not hesitate to interrupt me! • Three parts • normative theory • RePEc history • rclis future ideas
scholarly communication • is mainly about scholars communicating • between themselves • to students, occasionally • thus it is essentially a community activity • traditionally, there have been two intermediaries acting as external agents. • libraries • publishers
when tradition ends • Two external shocks • The Internet arrives and reduces distribution costs to zero • Computer technology arrives and reduces storage costs somewhat • The “opportunity sets” of community members and external agents increase • Proposition: the future depends much on what the community members decide. External agents have little impact.
discipline communities • Scholars of various disciplines have varying habits of research, publication, and evaluation • It is likely that the Internet will emphasize those differences rather than reducing them.
examples: disciplines with established informal publishing • Preprint communities • Physics arxiv.org • Mathematics arxiv.org, partially • Working paper communities • Computer Science CiteSeer (working papers disappearing) • Economics RePEc
change is tough • Change has to come from inside the discipline. • There has to be a pioneering individual who • is technically well versed • is managerially smart • has extraordinary forward thinking • is willing to take considerable risk with her career • People like Ginsparg, Krichel, Giles & Lawrence are rare
and what about libraries? • Libraries get it systematically wrong • they concentrate on access • they concentrate on readers • they concentrate on documents • They need to • move from access to impact • move from the reader to the writer • move from documents to people
example: the institutional repository • The name is as attractive as a “prison toilet” • They have been set up in many universities but remain empty • They imply a top-down, Stalin-style centralization • They are resisted, like any interference by the administration in departmental affairs • They are set up for general purposes and end up pleasing nobody.
despite that: minimal commonality • Every discipline has some form of more informal communication. Often this takes the form of conferences. • Every discipline needs some formal evaluation • peer-review • overall personal review • This cannot be done by computer and needs human input.
RePEc • RePEc is a freely available digital library related to Economics. • It provides a partial evaluative database. • It is entirely run by a virtual organization of volunteers. • I am the person who got it started.
history • It started in 1990, when I was a research assistant in the Economics Department of Loughborough University of Technology. • A predecessor of the Internet allowed me to download free software without effort. • But academic papers had to be gathered in a painful way.
CoREJ • It was (is?) published by HMSO • Photocopied tables of contents of recently published economics journals received at the Department of Trade and Industry • Typed list of working papers recently received by the University of Warwick library • The latter was the more interesting.
working papers • Early accounts of research findings • Published by economics departments • in universities • in research centers • in some government offices • in multinational administrations • Disseminated through exchange agreements • Important because of 4 year publishing delay
1991-1992 • I planned to circulate the Warwick working paper list over listserv lists • I argued it would be good for them • increase incentives to contribute • increase revenue for ILL • After many trials, Warwick refused. • Toward the end of that time, I was offered a lectureship, and decided to start working on my own collection.
1993: BibEc and WoPEc • Fethy Mili of Université de Montréal had a good collection of papers and gave me his data. • I put his bibliographic data on a gopher and called the service "BibEc" • I also gathered the first ever online electronic working papers on a gopher and called the service "WoPEc".
NetEc consortium • BibEc printed papers • WoPEc electronic papers • CodEc software • WebEc web resource listings • JokEc jokes • HoPEc a lot of Ec!
WoPEc to RePEc • WoPEc was a catalog record collection • WoPEc remained the largest web access point • But getting contributions was tough. • Early large contributors were the libraries of the Federal Reserve system, and the Center for Economic Policy Research (but no full text).
creation of RePEc • It came about when I finally got one other partner, the Dutch DEGREE project, a library-led consortium for working paper publication. • I also had a contact in Sweden, Sune Karlsson, for whom I was instrumental in securing funding for a Swedish version of WoPEc called S-WoPEc. • I put together a protocol that would allow us to work together.
1997: RePEc principle • Many archives • archives offer metadata about digital objects (mainly working papers) • One database • The data from all archives forms one single logical database even though it is held on different servers. • Many services • users can access the data through many interfaces. • providers of archives offer their data to all interfaces at the same time. This provides for optimal distribution.
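The "many archives, one database, many services" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields, the function names and the second archive are hypothetical, and the real RePEc data lives in template files on separate servers rather than in Python dictionaries.

```python
from typing import Dict, List

Record = Dict[str, str]

def build_database(archives: Dict[str, List[Record]]) -> Dict[str, Record]:
    """Merge the records of every archive into one logical database keyed by handle."""
    database: Dict[str, Record] = {}
    for archive_id, records in archives.items():
        for record in records:
            # Remember which archive supplied the record.
            database[record["handle"]] = {**record, "archive": archive_id}
    return database

# Two hypothetical user services reading the same database.
def title_listing(database: Dict[str, Record]) -> List[str]:
    """Return all titles in alphabetical order."""
    return sorted(record["title"] for record in database.values())

def search(database: Dict[str, Record], word: str) -> List[str]:
    """Return the handles of records whose title contains the word."""
    word = word.lower()
    return sorted(h for h, r in database.items() if word in r["title"].lower())

# Sample data: the first record is taken from the slides, the second is invented.
archives = {
    "sur": [{"handle": "RePEc:sur:surrec:9601",
             "title": "Dynamic Aspect of Growth and Fiscal Policy"}],
    "xxx": [{"handle": "RePEc:xxx:wpaper:0001",
             "title": "A Hypothetical Working Paper"}],
}
database = build_database(archives)
print(title_listing(database))
print(search(database, "growth"))
```

Each archive only maintains its own records, and every service sees all of them at once.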
RePEc is based on 440+ archives, including • WoPEc • EconWPA • DEGREE • S-WoPEc • NBER • CEPR • US Fed in Print • IMF • OECD • MIT • University of Surrey • CO PAH
to form a 300+k item dataset • 146,000 working papers • 154,000 journal articles • 1,600 software components • 900 book and chapter listings • 6,400 author contact and publication listings • 8,400 institutional contact listings
RePEc is used in many services • EconPapers • NEP: New Economics Papers • Inomics • RePEc author service • Z39.50 service by the DEGREE partners • IDEAS • RuPEc • EDIRC • LogEc • CitEc • My concern is NEP, a human-mediated current awareness service for RePEc. This could be the subject of a more academic talk…
… describes documents
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Person: RePEc:per:1965-06-05:thomas_krichel
Author-Email: T.Krichel@surrey.ac.uk
Author-Name: Paul Levine
Author-Email: P.Levine@surrey.ac.uk
Author-WorkPlace-Name: University of Surrey
Classification-JEL: C61; E21; E23; E62; O41
File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
File-Format: application/pdf
Creation-Date: 199603
Revision-Date: 199711
Handle: RePEc:sur:surrec:9601
… describes persons (RAS)
template-type: ReDIF-Person 1.0
name-full: MANKIW, N. GREGORY
name-last: MANKIW
name-first: N. GREGORY
handle: RePEc:per:1984-06-16:N__GREGORY_MANKIW
email: ngmankiw@harvard.edu
homepage: http://post.economics.harvard.edu/faculty/mankiw/mankiw.html
workplace-institution: RePEc:edi:deharus
workplace-institution: RePEc:edi:nberrus
Author-Article: RePEc:aea:aecrev:v:76:y:1986:i:4:p:676-91
Author-Article: RePEc:aea:aecrev:v:77:y:1987:i:3:p:358-74
Author-Article: RePEc:aea:aecrev:v:78:y:1988:i:2:p:173-77
….
… describes institutions
Template-Type: ReDIF-Institution 1.0
Primary-Name: University of Surrey
Primary-Location: Guildford
Secondary-Name: Department of Economics
Secondary-Phone: (01483) 259380
Secondary-Email: economics@surrey.ac.uk
Secondary-Fax: (01483) 259548
Secondary-Postal: Guildford, Surrey GU2 5XH
Secondary-Homepage: http://www.econ.surrey.ac.uk/
Handle: RePEc:edi:desuruk
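Templates like the three above are plain "attribute: value" text, so reading them takes very little code. The following is a minimal sketch, not the RePEc software: the function name is made up, and it assumes one attribute per line, case-insensitive attribute names and repeatable attributes. It ignores any grouping of attributes (for example, which e-mail belongs to which author) and any continuation lines.

```python
from collections import defaultdict
from typing import Dict, List

def parse_template(text: str) -> Dict[str, List[str]]:
    """Parse 'attribute: value' lines into a dict of lists.

    Attribute names are lower-cased; repeated attributes keep every value
    in order of appearance. Lines without a colon are skipped.
    """
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, since values such as handles contain colons.
        key, _, value = line.partition(":")
        fields[key.strip().lower()].append(value.strip())
    return dict(fields)

# A shortened copy of the paper template shown on the slide.
SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Title: Dynamic Aspect of Growth and Fiscal Policy
Author-Name: Thomas Krichel
Author-Name: Paul Levine
Handle: RePEc:sur:surrec:9601
"""

record = parse_template(SAMPLE)
print(record["author-name"])
print(record["handle"][0])
```

Keeping every value in a list is a deliberate choice here, because attributes such as Author-Name and workplace-institution occur more than once in the examples.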
institutional registration • This works through a system called EDIRC. • Christian Zimmermann started it as a list of departments that have a web site. • I persuaded him that his data would be more widely used if integrated into the RePEc database. • Now he is a crucial RePEc leader.
author registration • It started when funding allowed us to hire a student programmer to write an author registration system. • The system went online as "HoPEc" in late 2000. • It has since been renamed "RePEc author service" (RAS). • In 2002, a grant from OSI allowed for a rewrite and expansion.
RePEc author service • RePEc document data has author names as strings. • The authors register with RAS to list contact details and identify the papers they wrote. • This is classic authority control, but done by the authors. • Currently one in three items in RePEc has at least one identified author.
LogEc • It is a service by Sune Karlsson that tracks usage of items in the RePEc database • abstract views • downloads • Christian Zimmermann sends a monthly usage summary by mail to • archive maintainers • RAS registrants
authors' incentives • Authors perceive the registration as a way to achieve common advertising for their papers. • Author records are used to aggregate usage logs across RePEc user services for all papers of an author. • This stimulates an "I am bigger than you are" mentality. Size matters!
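Aggregating usage across services for all papers of an author could look roughly like this. It is a sketch under assumptions: the log layout (service, paper handle, downloads), the download counts, the function name and the second author handle are all invented for illustration, and nothing here describes how LogEc actually stores its data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LogEntry = Tuple[str, str, int]  # (service, paper handle, downloads): hypothetical layout

def usage_per_author(log_entries: Iterable[LogEntry],
                     author_papers: Dict[str, List[str]]) -> Counter:
    """Sum downloads over all services for every paper an author has claimed."""
    # Invert the author records: paper handle -> authors who claimed it.
    paper_to_authors: Dict[str, List[str]] = {}
    for author, papers in author_papers.items():
        for paper in papers:
            paper_to_authors.setdefault(paper, []).append(author)

    totals: Counter = Counter()
    for _service, paper, downloads in log_entries:
        for author in paper_to_authors.get(paper, []):
            totals[author] += downloads
    return totals

# Invented sample logs from two services.
logs = [
    ("IDEAS", "RePEc:sur:surrec:9601", 12),
    ("EconPapers", "RePEc:sur:surrec:9601", 5),
    ("IDEAS", "RePEc:xxx:wpaper:0001", 3),
]
# Author records as produced by registration; the second handle is hypothetical.
claims = {
    "RePEc:per:1965-06-05:thomas_krichel": ["RePEc:sur:surrec:9601"],
    "RePEc:per:hypothetical_author": ["RePEc:sur:surrec:9601", "RePEc:xxx:wpaper:0001"],
}
totals = usage_per_author(logs, claims)
for author, downloads in totals.most_common():
    print(author, downloads)
```

Papers nobody has claimed contribute to no author total, which is one reason registration pays off for authors.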
recently • In 2004, Peter Jacso compared RePEc services with the proprietary professional database EconLit. • IDEAS and LogEc were Peter’s pick. • EconLit was Peter’s pan. • He slammed the working paper coverage of EconLit. • He could have slammed other things.
RePEc / EconLit partnership • RePEc now delivers all its working paper data to EconLit, without getting the journal data of EconLit in return. • This may seem absolutely perverse! A bunch of volunteers laboring for a multi-million $$$ concern! • In fact it serves RePEc well because it adds officialdom.
summary: keys to success • Have a small group of volunteers • Disseminate as widely as possible • Demonstrate to authors and institutions that it works for them. • institutional registration • author registration
rclis • rclis stands for Research in Computing and Library and Information Science. • It is pronounced “reckless”. • It is a RePEc clone. • It is my attempt to show that the same ideas that propel RePEc can also work in that area.
technical innovation • RePEc is built on attribute: value templates. • rclis is built on a purpose-built format called the Academic Metadata Format (AMF). • I set up this format. It is tailor-made to suit the needs of rclis and RePEc. • There is some usage of AMF in RePEc • RePEc OAI interface • ernad, the software feeding NEP
E-LIS • It is the largest LIS eprint archive on this planet. • It lives at http://eprints.rclis.org. • It contains over 2000 papers. • It runs in Italy but uses a system of national editors to feed in material. • I am one of the US editors. Shame on me.
DoIS • DoIS is a service based on a Spanish LIS bibliography. • It used to run at Manchester Computing but moved to http://wotan.liu.edu/dois when JISC regulations forced us to leave. • It contains 13k records, 9k with free full text, but the data has many errors.
using already existing resources • There is already a very large computer science bibliography called DBLP, see http://dblp.uni-trier.de • The data has no abstracts. It has some full-text links, mainly to toll-gated sites. • I have done work to convert parts of it to AMF. • I am now searching to see whether free full-text versions of the papers exist anywhere on the Web. This is the Konz project.
the Konz project • Current state • I use the Google API to search for titles. • I examine the responses and download pages. • I scan the pages for PDF and Word files. • I examine the text in each file to find the title. • Limitations • only PDF and Word full text • conference paper data still being processed • significant hardware and disk problems.
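The last two steps above (scanning a downloaded page for PDF and Word files, then looking for the title in the extracted text) can be sketched with the Python standard library alone. This is not the Konz code: the class and function names, the example URL and the matching rule (title compared after lower-casing and stripping punctuation) are assumptions. The search-engine query and the extraction of text from PDF or Word files are left out.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

class FileLinkParser(HTMLParser):
    """Collect absolute URLs of links that end in .pdf or .doc."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith((".pdf", ".doc")):
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))

def find_file_links(html: str, base_url: str) -> List[str]:
    """Return the PDF and Word links found in one downloaded page."""
    parser = FileLinkParser(base_url)
    parser.feed(html)
    return parser.links

def normalize(text: str) -> str:
    """Lower-case and collapse everything except letters and digits to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def title_in_text(title: str, text: str) -> bool:
    """Check whether the bibliographic title occurs in the text of a candidate file."""
    return normalize(title) in normalize(text)

# Invented page and file text for illustration.
page = '<a href="papers/growth.pdf">paper</a> <a href="index.html">home</a>'
links = find_file_links(page, "http://example.org/home/")
found = title_in_text("Dynamic Aspect of Growth and Fiscal Policy",
                      "DYNAMIC ASPECT OF GROWTH\nAND FISCAL POLICY\nby ...")
print(links, found)
```

Normalizing both sides before comparing makes the match tolerant of line breaks and capitalization in the extracted text, at the cost of occasional false positives for short titles.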
Khabarovsk proposal • There is a generic possibility of building full-text links out of bibliographic records using search engines. • The authoritative bibliographic record can be used as a container to hold other objects that have a relationship to the “paper” • full-text instance • display page • comment • cv of author etc… • See http://openlib.org/home/krichel/proposals/khabarovsk.pdf
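One way to picture the record-as-container idea is a small data structure. The class names, field names and URLs below are hypothetical and are not taken from the proposal; only the relation types come from the list above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RelatedObject:
    relation: str  # e.g. "full-text instance", "display page", "comment"
    url: str

@dataclass
class BibliographicRecord:
    """An authoritative record that holds objects related to the paper."""
    handle: str
    title: str
    related: List[RelatedObject] = field(default_factory=list)

    def attach(self, relation: str, url: str) -> None:
        """Add an object that has the given relationship to the paper."""
        self.related.append(RelatedObject(relation, url))

    def urls_for(self, relation: str) -> List[str]:
        """Return the URLs of all attached objects with that relationship."""
        return [obj.url for obj in self.related if obj.relation == relation]

# Invented example: one paper with two full-text copies and a display page.
paper = BibliographicRecord("RePEc:sur:surrec:9601",
                            "Dynamic Aspect of Growth and Fiscal Policy")
paper.attach("full-text instance", "http://example.org/copy1.pdf")
paper.attach("full-text instance", "http://example.org/copy2.pdf")
paper.attach("display page", "http://example.org/abstract.html")
print(paper.urls_for("full-text instance"))
```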
DoCIS • Konz currently finds free versions for 25k papers out of 98k searched. Not particularly exciting. • This data is integrated with the DBLP AMF data and the result forms a new service called DoCIS. • DoCIS lives at http://wotan.liu.edu/docis
DoCIS service • DoCIS is implemented in mod_perl with swish++ and is therefore very fast. • The web pages are generated by XSLT scripts directly from the AMF data. • The service is available to copy from the web, and I am more than happy to see it run on other sites. • But the most interesting aspect is the service principles.
construction transparency • DoCIS is an open digital library service because it allows users to inspect exactly how the service runs. • DoCIS is built using open source software. • There is a special interface, http://wotan.liu.edu/strip/docis/, that allows users to see almost all internal files. Non-visible files are specially documented. • The hope is that it may be used for teaching purposes.
transportability • Everything in DoCIS is built in such a way that it should be easy to move the service somewhere else and establish copies. • The ideas may not make a lot of technical sense, but they should increase the non-proprietary nature of the system. • Note that this has not been tested ;--)
usage transparency • All usage is logged and the logs are made public. • It is hoped that the logs can be used for digital library research. • Ways will be found to aggregate usage across different physical installations.
to do list • finish a version of konz that recognizes HTML full text • integrate DoCIS and DoIS • finish conversion of DBLP to AMF • open institutional registration for rclis • open author registration for rclis • open a NEP-like service for rclis
collaboration is welcome! http://openlib.org/home/krichel Thank you for your attention!