Harvesting Data
Mark Doyle, APS
AAHEP 7 – April 2, 2014
OAI-PMH
• + Widely used
• + Relatively easy to implement (server and client)
• + Allows consumer to easily keep up-to-date or re-harvest as necessary
• + Self-identifying (available formats, etc.)
• - Responses require XML
    • Have to embed metadata XML into response
• - Non-XML data difficult
    • DIDL (complex!) or URLs in responses
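The "keep up-to-date or re-harvest" point rests on the protocol's date-range and resumption-token arguments. A minimal sketch of that request cycle follows, using only argument names from the OAI-PMH 2.0 specification (`verb`, `metadataPrefix`, `from`, `until`, `resumptionToken`). The endpoint `http://example.org/oai` and both function names are hypothetical and are not taken from the slides.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     until_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()
```

A harvester would request the first URL, then keep requesting with the returned token until `next_resumption_token` yields `None`. Storing the date of the last successful run as the next `from_date` is one way to keep the harvest incremental.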
RESTful API
• + Simple – just HTTP requests
• + Data can be in any format (JSON)
• + Pagination based on Link: HTTP header (borrowed from GitHub API)
• + Return data in a zip file (BagIt format)
    • Includes manifest with checksums
• - No real data model for file types/names
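Link-header pagination can be followed with a few lines of client code. The sketch below parses a `Link:` header of the GitHub style (`<url>; rel="next"`) and walks pages until no `next` relation remains. The function names and the `fetch` callable are hypothetical, and the exact header the APS service returns is not shown in the slides, so treat the parsing as an assumption about the general format.

```python
import re
from typing import Callable, Dict, Iterator, Optional, Tuple


def parse_link_header(header: str) -> Dict[str, str]:
    """Map each rel value (e.g. "next", "last") to its URL."""
    links: Dict[str, str] = {}
    # Each entry looks like: <url>; rel="name"
    for match in re.finditer(r'<([^>]+)>([^<]*)', header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel:
            for name in rel.group(1).split():
                links[name] = url
    return links


def iter_pages(first_url: str,
               fetch: Callable[[str], Tuple[str, Optional[str]]]) -> Iterator[str]:
    """Yield each page body, following rel="next" links.

    `fetch` is a caller-supplied function returning (body, link_header);
    it is a placeholder for whatever HTTP client is in use.
    """
    url: Optional[str] = first_url
    while url:
        body, link_header = fetch(url)
        yield body
        url = parse_link_header(link_header or "").get("next")
```

Passing the HTTP call in as `fetch` keeps the paging logic independent of any particular client library.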
curl 'http://harvest.aps.org/content/journals/articles?from=2014-02-20&until=2014-02-28'
[
  …
  {"doi":"10.1103/PhysRevB.88.235414",
   "metadata_last_modified_at":"2014-02-24T19:00:00-0500",
   "last_modified_at":"2013-12-11T10:24:59-0500",
   "bagit_urls":{
     "complete":"http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/complete",
     "apsxml":"http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/apsxml",
     "adsfulltext":"http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/adsfulltext",
     "pdfxml":"http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/pdfxml"}
  …
]
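A consumer of this listing typically wants one bag type per article. The sketch below rebuilds the query URL from the `from` and `until` parameters shown above and picks out the bag URLs of a given kind from the decoded JSON. The function names are hypothetical. The field names (`doi`, `bagit_urls`) come from the sample response, and anything elided there is left untouched.

```python
from typing import Dict, List
from urllib.parse import urlencode

LISTING_BASE = "http://harvest.aps.org/content/journals/articles"


def listing_url(from_date: str, until_date: str) -> str:
    """Build the date-range listing URL used in the curl example."""
    return LISTING_BASE + "?" + urlencode({"from": from_date, "until": until_date})


def bag_urls(records: List[dict], kind: str = "apsxml") -> Dict[str, str]:
    """Map DOI -> BagIt URL of the requested kind, skipping records without it."""
    return {
        record["doi"]: record["bagit_urls"][kind]
        for record in records
        if kind in record.get("bagit_urls", {})
    }
```

The `records` argument is the list obtained by decoding the response body, for example with `json.loads`.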
curl 'http://harvest.aps.org/bagit/articles/10.1103/PhysRevLett.106.014301/apsxml' >! PhysRevLett.106.014301.zip
unzip -l PhysRevLett.106.014301.zip
Archive:  PhysRevLett.106.014301.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
       74  03-13-12 08:11   manifest-md5.txt
       82  03-13-12 08:11   manifest-sha1.txt
       64  03-13-12 08:11   bag-info.txt
       55  03-13-12 08:11   bagit.txt
        0  03-13-12 08:11   data/
        0  03-13-12 08:11   data/PhysRevLett.106.014301/
    60948  03-13-12 08:11   data/PhysRevLett.106.014301/fulltext.xml
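The manifest files make it possible to check a downloaded bag before using it. The sketch below assumes the layout in the listing above, with the manifest at the top level of the zip, and the usual BagIt manifest line format of a checksum, whitespace, then a path relative to the bag root. The function name is hypothetical.

```python
import hashlib
import io
import zipfile
from typing import List


def failed_checksums(source, algorithm: str = "md5") -> List[str]:
    """Return payload paths whose checksum is wrong or whose file is missing.

    `source` is a path or a file-like object holding the zipped bag.
    An empty list means every manifest entry verified.
    """
    failures: List[str] = []
    with zipfile.ZipFile(source) as bag:
        manifest = bag.read(f"manifest-{algorithm}.txt").decode("utf-8")
        for line in manifest.splitlines():
            if not line.strip():
                continue
            expected, path = line.split(None, 1)
            # Some checksum tools prefix the filename with '*'.
            path = path.strip().lstrip("*")
            try:
                actual = hashlib.new(algorithm, bag.read(path)).hexdigest()
            except KeyError:  # file listed in the manifest but absent from the zip
                failures.append(path)
                continue
            if actual != expected.lower():
                failures.append(path)
    return failures
```

For the bag above, `failed_checksums("PhysRevLett.106.014301.zip")` checks against `manifest-md5.txt`, and passing `algorithm="sha1"` checks against `manifest-sha1.txt` instead.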
{
  "identifier":[
    { "type":"doi", "id":"10.1103/PhysRevD.89.042001" }
  ],
  "link":[
    { "url":"http://link.aps.org/doi/10.1103/PhysRevD.89.042001" }
  ],
  "type":"article",
  "title":"Dark matter constraints from observations of 25 Milky Way satellite galaxies with the Fermi Large Area Telescope",
  "journal":{ "id":"PRD", "name":"Physical Review D", "shortcode":"Phys. Rev. D" },
  "volume":"89",
  "issue":"4",
  "pages":"042001",

  "author":[
    { "collaboration":"Fermi-LAT Collaboration" },
    { "name":"M. Ackermann", "firstname":"M.", "lastname":"Ackermann", "affiliations":[ "a1" ] },
    { "name":"A. Albert", "firstname":"A.", "lastname":"Albert", "affiliations":[ "a2" ] },

  "affiliation":[
    { "id":"a1", "name":"Deutsches Elektronen Synchrotron DESY, D-15738 Zeuthen, Germany" },
    { "id":"a2", "name":"W. W. Hansen Experimental Physics Laboratory, Kavli Institute for Particle Astrophysics and Cosmology, Department of Physics and SLAC National Accelerator Laboratory, Stanford University, Stanford, California 94305, USA" },
CrossRef TDM (née Prospect)
• Support for authenticated text data mining
• Via tokens
• Click-through licenses
• Rate limiting API
• Example: Researcher with ORCID at subscribing institution