Travis Brooks, SPIRES Scientific Databases Manager, Stanford Linear Accelerator Center
Pat Kreitz, Director, Technical Information Services, Stanford Linear Accelerator Center
Thanks to Ann Redfield, Michael Peskin, Louise Addis, Heath O’Connell, and Georgia Row for useful input.
Breaking and remaking peer review with the SPIRES databases: Our Experience
Travis Brooks-Trieste
Topics
• Part I
  • History and current situation of SPIRES, arXiv, and Journals
• Part II
  • Citation counting: our experiences and views
• Part III
  • Speculation for the future
Part I
Some history, some current data, and some guesses
What is SPIRES?
• Bibliographic records for over half a million papers
  • Entire literature of High-Energy Physics (HEP)
  • Many papers from related fields
• Citations for e-prints and journal articles
• Over 25,000 searches a day
• Main site and personnel at SLAC
  • DESY, FNAL, Durham U., Kyoto U., IHEP (Moscow)
arXiv
• Since 1991:
  • Makes full-text available for download
  • Links to SPIRES citation lists
  • Allows revisions
  • Divides content into hep-th, hep-ph, hep-ex and many other categories
hep-th vs. hep-ex
• Sharp distinction between theory and experiment
  • Different from other disciplines
• Difference between the publishing cultures of the HEP theorist and the HEP experimentalist
th vs. ex Publishing
Experiment:
• Large collaborations (>500 authors)
• Difficult to referee
• Reporting results
Theory (my focus):
• Small collaborations (<10 authors)
• Self-contained papers
• Conversational
• hep-th and hep-ph similar
hep-th (Pr)eprints: A Timeline
• Mid-1960s: preprints sent by authors to select groups
• 1969: SLAC library began ppf (preprints in particles and fields) list
  • Created demand for distribution
  • Legitimized preprints/preprint libraries
  • Led to anti-ppf list
hep-th (Pr)eprints: A Timeline
• 1974: SPIRES-HEP database indexed preprints
  • Allowed more general, worldwide distribution and retrieval of preprint titles
  • Still needed papers by mail
  • Preprints used conversationally
  • On WWW in 1991
hep-th (Pr)eprints: A Timeline
• 1991: arXiv.org allowed immediate and universal electronic access to full-text of preprints
  • Preprints became eprints
  • Demise of all HEP journals predicted
Preprints not new…
• arXiv is a logical extension of the movement towards preprints, not a “bolt from the blue”
• Preprints have a long history of use
• Preprints are more easily distributed today
History of hep-th arXiv
• arXiv is busy
  • Over 90% of papers published in Phys. Rev. D after 1995 were submitted to arXiv
• But authors still publish!
  • 75% of hep-th papers (prior to 2002) have been published
When are eprints published?
• Difference between Phys. Rev. D publication time and eprint appearance time
• 6,000 articles from June 1997-2003
• Mode at 5 months
• 17 negative times not shown
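The lag statistic on this slide can be sketched in a few lines. This is a minimal illustration, not the SPIRES analysis itself: it assumes each record is a hypothetical pair of (eprint date, journal publication date), measures the lag in whole calendar months, and reports the mode together with the number of negative lags (journal date earlier than eprint date). The sample dates are invented.

```python
from collections import Counter
from datetime import date


def lag_in_months(eprint_date, journal_date):
    """Whole calendar months from eprint to journal; negative if the journal came first."""
    return (journal_date.year - eprint_date.year) * 12 + (
        journal_date.month - eprint_date.month
    )


def summarize_lags(records):
    """records: iterable of (eprint_date, journal_date) pairs (an assumed layout)."""
    lags = [lag_in_months(eprint, journal) for eprint, journal in records]
    non_negative = [lag for lag in lags if lag >= 0]
    histogram = Counter(non_negative)
    mode = histogram.most_common(1)[0][0] if histogram else None
    return {
        "mode_months": mode,
        "negative_count": len(lags) - len(non_negative),
        "histogram": dict(sorted(histogram.items())),
    }


# Invented sample data, for illustration only.
sample = [
    (date(1998, 1, 15), date(1998, 6, 1)),   # 5 months
    (date(1999, 3, 2), date(1999, 8, 20)),   # 5 months
    (date(2000, 7, 10), date(2001, 2, 1)),   # 7 months
    (date(2001, 5, 5), date(2001, 4, 1)),    # -1 month: journal date precedes eprint
]
summary = summarize_lags(sample)
print(summary)
```

Negative lags are counted separately rather than dropped silently, which mirrors the slide's note that such cases exist but are not shown in the plot.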
When are they published?
• What caused the negative times?
• Are the large delays from “testing the waters”?
• Do researchers wait for peer review to determine if an article is worth reading?
When are papers read?
• Q: When does most citing occur?
• A: Plot the citations a published hep-th article receives after its arXiv submission
  • 8,000 published papers in sample
  • Includes citations from journal papers and arXiv papers (essentially the same set)
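The plot described here amounts to binning citations by their age relative to the arXiv submission. A rough sketch follows, assuming a hypothetical input of (submission date, list of citing-paper dates) per article; the data layout, function names, and sample dates are illustrative and not taken from SPIRES.

```python
from collections import Counter
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def citation_age_profile(papers):
    """Count citations by months elapsed since the cited paper's arXiv submission.

    papers: iterable of (submission_date, [citing_paper_dates]) -- an assumed layout.
    """
    profile = Counter()
    for submitted, citing_dates in papers:
        for cited_on in citing_dates:
            profile[months_between(submitted, cited_on)] += 1
    return profile


def peak_month(profile):
    """Month bin with the most citations; the earliest bin wins ties."""
    return max(profile, key=lambda month: (profile[month], -month))


# Invented sample data, for illustration only.
papers = [
    (date(2000, 1, 10), [date(2000, 2, 1), date(2000, 3, 15),
                         date(2000, 3, 20), date(2000, 9, 1)]),
    (date(2001, 6, 1), [date(2001, 8, 5), date(2001, 12, 1)]),
]
profile = citation_age_profile(papers)
print(sorted(profile.items()), "peak at month", peak_month(profile))
```

Comparing the peak bin with the typical eprint-to-journal lag is what distinguishes citing driven by the eprint from citing driven by the journal version.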
Eprints, not journals
• Journal lag time is 5 months
• Citation peak occurs after eprint release, not journal release
• Inference: HEP theorists don’t wait for the journal.
Current hep-th situation
• Researchers read the arXiv to find out the latest scientific information
• They base their work on what is in the arXiv
• Scientific priority is given by arXiv time stamp, not journal submission date
• They barely notice whether a paper is published
HEP theorist’s viewpoint
• arXiv is for immediate communication
• A running scientific conversation
• Overheard about a paper not sent to hep-ph: “He didn’t publish it, he just sent it to Phys. Rev. D”
Journals Irrelevant?
• 75% of hep-th papers (prior to 2002) have been published
• Correlation between large cite counts and publication
• Journals are still very much alive
Why do authors publish? (4 guesses)
• 1-Inertia
  • There is no other system as developed or as trusted
  • Journals are ingrained in researchers’ psyches
  • But journals don’t appear to be going away (quickly)
Why do authors publish?
• 2-Feedback
  • Refereeing is useful for this paper and the next
  • The paper is already on arXiv while it is being refereed
  • But arXiv submissions generate comments and revisions as well
Why do authors publish?
• 3-Professional Advancement
  • Do tenured/secure faculty publish fewer of their eprints?
  • Anecdotally: Witten has seven papers with 50+ cites that exist as eprints only
  • In general: interesting question to think about…
• If professional advancement is the sole purpose of peer review, could we not do better?
• Are we using the peer review process as a substitute for performance evaluation?
Why do authors publish?
• 4-Archival value
  • Do authors believe that arXiv is a good archive?
  • Will arXiv-only eprints still be around (readable, accessible) in 100 years?
  • Perception, not reality, matters here
• E-only journals appear no different
• Centralization, not media, should be the concern
Part II
Cite counts and the future
Cite Counting
• Cite counts present a data-driven picture of the hep-th eprint culture
• Much work already (by many here today)
  • Cites to HEP eprints from journal articles are high and rising (Brown 2001, Youngen 1998, others)
  • arXiv impact factor is similar to journals (Fabbrichesi and Montolli, 2001)
  • Many other studies (often using SPIRES-HEP data)
Cite Counting
• Cite counting for bibliometric purposes seems reasonable (perhaps)
• Cite counting for peer review purposes?
• Services like SPIRES (free) and ISI (fee) make cite counts available to other researchers, hiring committees, and tenure review boards
Cite Counts = Peer Review?
• Are citations the electronic answer to refereed journals?
• Currently the only answer
  • Only one widely available
• But not a very good answer
  • arXiv + SPIRES cite counts are not Phys. Rev. Lett.
Cites: Pros and Cons
• SPIRES has been making citations available for over 25 years
• We have noticed a few things about the process
  • Some good
  • Some bad
  • Some merely interesting
Advantage-Dynamic
• Cite counts change with the field
  • Classics
  • New papers
  • Newly discovered classics
• Ex: Weinberg’s Standard Model paper
  • Few cites initially
  • Over 5,000 now
• Ex: M. Peskin’s topcite reviews
Advantage-Fast
• Cite counts begin immediately after appearance
• Electronic publishing means peer review is the lag time
• Lag time makes journals archivists rather than communicators
• Led to the replacement of this function by arXiv/SPIRES/etc.
Advantage-Easy
• SPIRES tracks citations with 4 staff members
  • Total staff is about 8
• We are not that technically sophisticated
• We are not even especially clever!
• Still it is non-trivial
Disadvantage-Accuracy
• Speed and ease rely on electronic processing
• Accuracy or speed?
• Reference lists in a paper change over an article’s life
• What counts as a cite?
• Which version of the paper?
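The “which version?” problem can be made concrete by comparing the reference lists of two versions of the same paper. The sketch below assumes references have already been reduced to identifier strings (itself a non-trivial step); the identifiers and the function name are hypothetical.

```python
def diff_reference_lists(old_refs, new_refs):
    """Compare the reference lists of two versions of one paper."""
    old, new = set(old_refs), set(new_refs)
    return {
        "added": sorted(new - old),     # cited only in the later version
        "dropped": sorted(old - new),   # cited only in the earlier version
        "kept": sorted(old & new),      # cited in both versions
    }


# Hypothetical identifiers, for illustration only.
version_1 = ["paper-A", "paper-B"]
version_2 = ["paper-B", "paper-C"]
changes = diff_reference_lists(version_1, version_2)
print(changes)
```

Whether "paper-A" keeps its cite after being dropped in the later version is exactly the policy question the slide raises; the code only exposes the difference, it does not decide it.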
Disadvantage-Relevance
• Theory: Citations are a measure of what scientists read
• But does citing = reading?
  • Simkin & Roychowdhury (cond-mat/0212043 and cond-mat/0305150)
  • Students, general public
Disadvantage-Relevance
• Theory: Cites are a mark of quality
• What about brilliant papers out of the mainstream?
• Are papers really even referenced for scientific reasons?
  • Or are they referenced for sociological reasons?
  • Or are references simply copied?
Disadvantage-Relevance
• Tongue-in-cheek reasons for not citing prior work (humorous, but not far off…)
  • “If it’s old, foreign, or old and foreign”
  • “They don’t cite us either”
  • “Rain forest preservation through paper-saving”
  • “I figured if you’re smart enough to read this paper, you already knew that!”
  (from The Scientist)
Interesting-Importance
• People take it seriously
• Funding, careers, reputations, etc. are perceived to depend in some way on SPIRES citation data
Interesting-Importance
• We receive ~50 emails a day, most of them revolving around incorrect, incomplete, or missing references
  • Usually from an author whose paper was cited but missed
  • Often marked “URGENT”
  • Occasionally with panicked explanations including the date that the review committee is meeting
  • Sometimes accusing SPIRES of sabotage, or otherwise expressing outrage at a missed citation
Importance is helpful…
• Importance shows that cite counting is useful (or at least used!)
• Users of the information are motivated to help maintain it
• SPIRES is almost open source
• We help eliminate authors’ typos, they help eliminate our errors
…helpful…
• SPIRES can replace bad cites with the correct ones
  • Corrects our errors
  • Corrects author errors
  • Even helps limit propagation of errors
• Ex: a Witten article with 1,300 cites had 100 incorrect cites, all the same typo
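A repeated typo like the one in this example is cheap to fix once it has been identified, because a single correction entry recovers every affected cite. The following is a minimal sketch of that idea, not a description of how SPIRES actually processes references: the normalization rule, the correction table, and the reference strings are all invented for illustration.

```python
from collections import Counter


def normalize(ref):
    """Crude normalization: lowercase, drop commas, collapse whitespace."""
    return " ".join(ref.lower().replace(",", " ").split())


def count_citations(raw_refs, known_records, corrections):
    """Count cites per known record, mapping known typos to the correct record.

    known_records: set of normalized reference keys (hypothetical).
    corrections:   dict of normalized bad key -> normalized correct key (hypothetical).
    """
    counts = Counter()
    unresolved = []
    for ref in raw_refs:
        key = normalize(ref)
        key = corrections.get(key, key)   # one entry fixes every copy of the typo
        if key in known_records:
            counts[key] += 1
        else:
            unresolved.append(ref)        # left for a human to inspect
    return counts, unresolved


# Invented reference strings, for illustration only.
known_records = {"j.example 12 345"}
corrections = {"j.example 12 354": "j.example 12 345"}
raw_refs = [
    "J.Example 12, 345",   # correct
    "J.Example 12, 354",   # transposed digits
    "j.example 12 354",    # same typo, copied elsewhere
    "J.Other 1, 1",        # unknown reference
]
counts, unresolved = count_citations(raw_refs, known_records, corrections)
print(counts, unresolved)
```

Unresolved references are returned rather than discarded, since corrections of this kind typically start with someone noticing and reporting a missed cite.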
…but also worrisome
• Responsibility lies with the maintainers of the citation counts
  • Previously in the hands of referees and editors
• Self-citation
  • Boost counts artificially
• Deception
  • We have had it happen
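One simple way to see how much self-citation contributes to a count is to flag any citing paper that shares an author with the cited paper. This is only a sketch under a strong assumption: author names are matched as normalized strings, which ignores the real difficulty of author disambiguation. The names and the function are hypothetical.

```python
def normalize_name(name):
    """Lowercase and collapse whitespace; real author matching is much harder."""
    return " ".join(name.lower().split())


def split_self_citations(cited_authors, citing_author_lists):
    """Separate cites that share at least one author with the cited paper."""
    cited = {normalize_name(author) for author in cited_authors}
    self_cites = sum(
        1
        for authors in citing_author_lists
        if cited & {normalize_name(author) for author in authors}
    )
    total = len(citing_author_lists)
    return {"total": total, "self": self_cites, "independent": total - self_cites}


# Invented author names, for illustration only.
result = split_self_citations(
    ["A. Author", "B. Writer"],
    [
        ["A. Author", "C. Other"],    # shares an author: self-citation
        ["D. Someone"],               # independent
        ["b. writer"],                # shares an author after normalization
        ["E. Else", "F. Further"],    # independent
    ],
)
print(result)
```

Reporting both numbers, rather than silently subtracting self-cites, leaves the judgment about what counts with the person reading the figures.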
Citation Counts: Summary
• We do it, and it works
  • Fast, Easy, and Fluid
  • Valued by the Community
• It is more than imperfect
  • Relevance and Accuracy
• Does not yet replace traditional peer review
Part III
What would it take to truly change peer review?
To change peer review
• Stakeholders in the peer review system
  • Editors
  • Referees
  • Authors
  • Readers
• Fundamental differences between disciplines
  • hep-th and hep-ex are different in their adoption of eprints
To change peer review
• Functions of peer review when divorced from communication
  • One must replace (or discard) all of these
• Metrics for papers
• Metrics for scientists
• Metrics for truth?
Peer review = “good science”?
• Peer review gives a seal of approval
  • Laypeople
  • Medicine, Environmental Science, etc.
• Refereeing process is filled with examples of weakness
• Yet it feels fundamentally sound
• Publishers have taken this role of “vetting” science
Truth is more complex
• Community acceptance determines scientific truth
  • “Yesterday’s sensation, today’s calibration”
• The “test of time” is longer than the 6-month lag time for journal articles
• Immediacy is needed for communication and conversation
• But deliberation is needed for context and community judgment
An Opportunity
• Place an article in the context of the surrounding work
  • Reference linking only a baby step
  • Degree to which a finding has been verified or contradicted by earlier or later work
• Ex: M. Peskin’s Topcites reviews at SLAC
  • The numbers are amusing
  • Context is the real value
Context
• Another Example: Particle Data Group
  • Reports data from all HEP experiments
  • Sorts and combines data
  • References to comments on validity
  • References to interpretations of the data
PDG Example
Opportunities
• Intense scrutiny not possible for journals
• Context is important
• Amazon and Google
  • Personalized and dynamic
• Citebase
• Torii