1. CrossCheck: CrossRef’s “originality verification” pilot
2. CrossRef
Mission: a non-profit membership association to enable easy identification and use of trustworthy electronic content, by promoting the cooperative development of a sustainable digital infrastructure
Services to date:
Cross-publisher reference linking
Cross-publisher cited-by linking
Cross-publisher full-text search
Cross-publisher metadata feeds to third parties
3. Cross-publisher means…
…an organizational structure that obviates the need for thousands of bilateral negotiations between publishers, or between a third party and individual publishers
4. Why plagiarism detection?
For the research community
Plagiarism appears to be on the rise
Increasingly difficult to identify trustworthy/original content
For publishers
Show that a robust cross-publisher plagiarism detection (PD) system does not require open access (nor do full-text search, mining, or linking)
A way for publishers to augment the editorial/selection process, add value, and protect copyrighted material
For CrossRef
By building on the existing basis of publisher co-operation, CrossRef can achieve widespread participation
Shift in focus toward providing editorial tools
5. The PD process:
1) You have a manuscript that you want to check
2) You submit it to the PD system, which breaks the text up into small chunks and creates a “fingerprint” for the document.
3) It then searches a database of content to see if the fingerprints are shared by any of the documents already added to the database
4) The PD system then produces an “originality report” which shows the “percentage overlap” between documents. This report has to be *interpreted* by the person checking the document, and that is where the real work is.
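The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular PD vendor works: the chunk size of five words, the SHA-1 hashing, and the function names `fingerprint`, `percentage_overlap`, and `originality_report` are all assumptions made for the example.

```python
import hashlib
import re


def fingerprint(text: str, chunk_words: int = 5) -> set[str]:
    """Hash every overlapping run of `chunk_words` words (assumed chunk size)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    chunks = (
        " ".join(words[i:i + chunk_words])
        for i in range(len(words) - chunk_words + 1)
    )
    return {hashlib.sha1(c.encode("utf-8")).hexdigest() for c in chunks}


def percentage_overlap(manuscript: str, database_doc: str, chunk_words: int = 5) -> float:
    """Percentage of the manuscript's chunks that also occur in the database document."""
    submitted = fingerprint(manuscript, chunk_words)
    if not submitted:
        return 0.0
    stored = fingerprint(database_doc, chunk_words)
    return 100.0 * len(submitted & stored) / len(submitted)


def originality_report(manuscript: str, database: dict[str, str]) -> list[tuple[str, float]]:
    """List database documents that share any chunk, highest overlap first."""
    scores = [
        (doc_id, percentage_overlap(manuscript, text))
        for doc_id, text in database.items()
    ]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)
```

The report is only a ranked list of overlaps. Deciding what an overlap means is still left to the person reading it.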
6. The PD system will allow you to compare the document you are checking with database documents that share the same fingerprint. So the submitted manuscript is on the left and the database “hit” is on the right. The similar passages are highlighted. The similarity might be due to:
0) An attempt at plagiarism
1) Incorrect citation
2) Acceptable or unacceptable “self-plagiarism”
3) Reverse plagiarism (e.g. somebody copied an early manuscript and got it published first)
4) Coincidence
5) Acceptable copying. For instance...
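As a rough illustration of the side-by-side comparison, the sketch below finds the passages two texts have in common, using Python's standard `difflib.SequenceMatcher`. The function name `shared_passages` and the four-word minimum are assumptions for the example; a real PD system may match passages quite differently.

```python
from difflib import SequenceMatcher


def shared_passages(manuscript: str, hit: str, min_words: int = 4) -> list[str]:
    """Return runs of at least `min_words` consecutive words found in both texts."""
    left = manuscript.split()
    right = hit.split()
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    return [
        " ".join(left[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words  # also drops the zero-length sentinel block
    ]
```

The output says nothing about which of the causes listed above applies. It only shows where to look.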
7. Bibliographies often include large percentages of “duplicated” content, and this is perfectly legitimate. Similarly, a mathematics paper might contain up to two-thirds quoted material and only a few lines of “original material”, but this might be perfectly acceptable in the context of completing or extending a proof, etc. And speaking of “mathematics”: the user of the system also has to understand some of its limits...
8. Limitations of PD technology (clockwise from top left). These are all “black holes” to the PD system.
Photographs
Graphs & tables
Formulae
But perhaps the most important limitation is the database...
9. Remember the database? This is the critical part of the equation. If it doesn’t hold relevant data, it isn’t going to be helpful to you.
10. And this is where CrossRef can immediately help
14. How publishers will participate
Include their full text in a PD service
Two core use cases:
Check submissions against cross-publisher PD database of published content
Check for duplicate submissions against a cross-publisher manuscript database
15. CrossRef’s role/timeline
Phase 1
First help create robust database of pre-indexed content for use by multiple PD vendors
Establish acceptable terms and conditions for publisher participation, including recommended business model
Phase 2: Pilot checking submissions against published content database
Later: Explore checking for duplicate submissions & other uses, such as tracking copies post publication
16. Work-flow integration issues
When to check documents?
prior to submission (author driven)
on submission
at some triage point (editor driven)
immediately prior to acceptance
after publication (check of back-files)
The human factor
anecdotal evidence that interpretation of originality report can take up to 45 minutes
Cost/benefit analysis
check earlier: less editorial investment, more documents; more deterrence?
check later: more editorial investment, fewer documents; lower direct cost
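To make the trade-off concrete, here is a small worked calculation. Only the 45-minute figure comes from the slide above, and it is an anecdotal upper bound. The manuscript counts per checking point are hypothetical numbers chosen purely for illustration.

```python
MINUTES_PER_REPORT = 45  # anecdotal "up to 45 minutes" per originality report


def worst_case_hours(documents_checked: int, minutes_per_report: int = MINUTES_PER_REPORT) -> float:
    """Upper bound on interpretation time if every report takes the full time."""
    return documents_checked * minutes_per_report / 60


# Hypothetical manuscript counts at each checking point; not from the slides.
hypothetical_counts = {
    "on submission": 1000,
    "at triage": 400,
    "prior to acceptance": 150,
}

for stage, count in hypothetical_counts.items():
    print(f"{stage}: {count} documents, up to {worst_case_hours(count):.1f} hours")
```

Under these assumed counts, checking later means fewer reports to interpret, but each manuscript checked has already absorbed more editorial effort by that point.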
17. Explain three major stages.
Point out that “triage” step hides much complexity and variation between authors.
Cost of processing manuscript goes up as you go through the process
Number of manuscripts goes down as you go through the process
18. Other issues
Some business questions
What business models will balance uptake and sustainability?
What is risk of disruption in the PD market?
Some policy questions
How will inter-publisher access work?
How will system work for detecting duplicate submissions?
What are implications for author/publisher relationship?
How do publishers avoid accusations of discrimination in use of PD tools?
19. abrand@crossref.org