OAI Implementation Notes for LTRS, NACA and Open Video
Michael L. Nelson
NASA Langley Research Center & University of North Carolina
mln@ils.unc.edu
http://www.ils.unc.edu/~mln/
OAI Open Meeting, Washington DC, January 23, 2001
Collections Represented
• NASA
  • LTRS (Langley Technical Report Server)
    • ~2300 reports, begun in 1992
    • http://techreports.larc.nasa.gov/ltrs/
    • OAI: http://techreports.larc.nasa.gov/ltrs/oai/
  • NACA (National Advisory Committee for Aeronautics)
    • NACA was the predecessor organization to NASA, operating from 1917 to 1958
    • ~6300 reports, begun in 1996
    • http://naca.larc.nasa.gov/
    • OAI: http://naca.larc.nasa.gov/oai/
Collections Represented
• University of North Carolina
  • The Open Video Project
    • ~200 public domain video segments, project begun in 1998
    • http://www.open-video.org/
    • OAI: http://buckets.dsi.internet2.edu/openvideo/oai/
    • Open Video contents and OAI services are still strictly experimental
NASA: Why is OAI Important?
• NASA builds DLs out of necessity, but ultimately NASA is a publisher
  • Interested in maximum exposure of and accessibility to its “unrestricted, unlimited” contents
• In the NASA DLs, we left our “dark matter” partially exposed
  • individual reports were spidered by robots anyway…
• OAI provides a more formal interface & protocol for exposing contents
UNC: Why is OAI Important?
• Goal is to grow Open Video into a TREC-like corpus of video segments to share with the research community
  • a standard collection of short (10 seconds to 1 hour) video segments on which to perform content-based video retrieval
  • variability in video types: color/b&w, sound/no sound, high/low motion, etc.
  • currently in MPEG-1
    • other formats in the future
OAI Implementation
• Protocol only specifies a CGI stub
  • many implementations possible
• I used a “bucket” for each: LTRS, NACA & Open Video
  • buckets are aggregative, computational entities normally used for data storage
  • generally, 1 bucket per “report”
  • buckets = metadata + data + methods
OAI Bucket Structure
[Diagram: contents of an OAI bucket]
• index.cgi
• default bucket packages:
  • _method.pkg: source files for methods
  • _http.pkg: http dependency files
  • _log.pkg: logs
  • _tc.pkg: terms and conditions
  • _md.pkg: metadata
  • _state.pkg: bucket state
• oai: the bucket payload, which is DL specific
  • the oai.pl element is a support library that defines access for the specific DL
• In addition to the ~30 bucket methods, each OAI verb is implemented as a separate method
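The slide's point that each OAI verb is its own method behind a single index.cgi entry point can be illustrated with a small dispatch table. This is a minimal Python sketch, assuming the verb arrives as a CGI query parameter; the handler functions and their placeholder XML bodies are hypothetical and are not the actual bucket methods.

```python
from urllib.parse import parse_qs

# Hypothetical verb handlers: each OAI verb is a separate function, mirroring
# the "one method per verb" idea. Bodies are placeholders, not real responses.
def identify(args):
    return "<Identify>...</Identify>"

def list_identifiers(args):
    return "<ListIdentifiers>...</ListIdentifiers>"

def get_record(args):
    return '<GetRecord identifier="%s"/>' % args.get("identifier", "")

# Remaining verbs (ListMetadataFormats, ListSets, ListRecords) would be
# registered the same way.
VERB_METHODS = {
    "Identify": identify,
    "ListIdentifiers": list_identifiers,
    "GetRecord": get_record,
}

def dispatch(query_string):
    """Route a CGI query string such as 'verb=Identify' to the matching method."""
    # parse_qs returns lists of values; keep the first value of each argument.
    args = {key: values[0] for key, values in parse_qs(query_string).items()}
    verb = args.pop("verb", None)
    method = VERB_METHODS.get(verb)
    if method is None:
        # "badVerb" follows OAI-PMH 2.0 error naming (an assumption here).
        return '<error code="badVerb">Illegal or missing OAI verb</error>'
    return method(args)
```

With this layout, adding a verb only needs a new function and one dictionary entry, so the OAI verbs can sit alongside the ordinary bucket methods without touching the entry point.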
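NACA OAI Implementation
[Diagram: normal WWW use goes straight to the NACA file system, while OAI requests go to an OAI server whose OAI responses are built from examining the structure of the NACA filesystem. The filesystem has one directory per year (1917, 1918, … 1958), each holding report directories such as naca-tn-1 with refer metadata, thumbnail GIFs, full size GIFs and an index.cgi.]
• LTRS, NACA and Open Video have different file structures, metadata formats, etc.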
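Building OAI responses from the directory layout can be sketched as a walk over the year directories. This Python sketch assumes the layout suggested by the diagram (root/year/report/, e.g. 1917/naca-tn-1/); the oai: identifier format in oai_identifier() is a hypothetical choice, not taken from the slides.

```python
from pathlib import Path

def list_report_ids(root):
    """Return (year, report_id) pairs found under <root>/<year>/<report>/.

    Assumes year directories are named with four digits (1917 ... 1958) and
    that every subdirectory of a year directory is one report.
    """
    reports = []
    for year_dir in sorted(Path(root).iterdir()):
        # Skip stray files and anything that is not a four-digit year directory.
        if not (year_dir.is_dir() and year_dir.name.isdigit() and len(year_dir.name) == 4):
            continue
        for report_dir in sorted(year_dir.iterdir()):
            if report_dir.is_dir():
                reports.append((year_dir.name, report_dir.name))
    return reports

def oai_identifier(report_id, repo="naca"):
    # Hypothetical identifier scheme, used only for illustration.
    return f"oai:{repo}:{report_id}"
```

Because LTRS, NACA and Open Video are laid out differently, a sketch like this would be the DL-specific part, while the verb handling above it could stay shared.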
Implementation
• Did not implement sets
  • possible set candidates:
    • NACA: years, report type
    • LTRS: NASA STI subject classification
• Only supporting Dublin Core
  • DC not sufficient for targeted applications
• Did not implement resumptionToken
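Because the server offers no sets, only Dublin Core and no resumptionToken, it has to reject requests that ask for those features. A minimal Python sketch, assuming OAI-PMH 2.0 error code names (noSetHierarchy, badResumptionToken, cannotDisseminateFormat) and "oai_dc" as the Dublin Core metadataPrefix; the slides do not say how the original server reported these cases.

```python
SUPPORTED_PREFIXES = {"oai_dc"}  # only Dublin Core is offered

def check_args(verb, args):
    """Return an OAI error code for a request this repository cannot honor, else None.

    Error code names follow OAI-PMH 2.0, which is an assumption for this sketch.
    """
    if verb == "ListSets" or "set" in args:
        return "noSetHierarchy"            # sets were not implemented
    if "resumptionToken" in args:
        return "badResumptionToken"        # no token is ever issued, so none is valid
    prefix = args.get("metadataPrefix")
    if prefix is not None and prefix not in SUPPORTED_PREFIXES:
        return "cannotDisseminateFormat"   # anything other than Dublin Core
    return None
```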
Load Balancing
[Diagram: a harvester sends http://blah/oai/?verb=ListIdentifiers to the OAI server at naca.larc.nasa.gov/oai/. If load > 0.05, that server redirects the request with HTTP status code 302 to the OAI server at buckets.dsi.internet2.edu/naca/oai/, which answers with <?xml version="1.0" encoding="UTF-8"?> … <ListIdentifiers> … </ListIdentifiers>.]
• Interactive users on the main DL machine should not be impacted by metadata harvesting
  • don’t take deliveries through the front door
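The redirect rule in the diagram can be sketched as a small routing function. This Python sketch assumes the load figure is the Unix 1-minute load average from os.getloadavg() and that the mirror accepts the same query string; both are assumptions, since the slide only gives the 0.05 threshold, the 302 status code and the two hosts.

```python
import os

LOAD_THRESHOLD = 0.05  # threshold shown on the slide
MIRROR_BASE = "http://buckets.dsi.internet2.edu/naca/oai/"

def route(query_string, load=None):
    """Return (status, headers) for an incoming OAI request.

    If the main DL machine is busier than the threshold, tell the harvester
    to repeat the same request against the mirror with an HTTP 302.
    """
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only); an assumption
    if load > LOAD_THRESHOLD:
        return 302, {"Location": MIRROR_BASE + "?" + query_string}
    return 200, {"Content-Type": "text/xml; charset=UTF-8"}
```

A harvester that follows redirects never needs to know about the mirror, which is what keeps harvesting traffic away from interactive users.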
Metadata Quality
• XML is very brittle: 1 bad character in the metadata and an entire ListIdentifiers message can be damaged
  • yes, my DLs should be more diligent about scrubbing their metadata, but…
  • author-contributed metadata is particularly a problem (e.g. control characters from copy-and-paste)
• One advantage of resumptionToken is that it compartmentalizes bad data: a bad character damages only the partial response that contains it, not the entire list
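One way to do the scrubbing mentioned above is to delete characters that XML 1.0 does not allow before escaping the text. A Python sketch, assuming metadata values arrive as Python strings; the scrub() helper is hypothetical, while the allowed ranges come from the XML 1.0 Char production.

```python
import re
from xml.sax.saxutils import escape

# Characters outside the XML 1.0 "Char" production (e.g. most C0 control
# characters picked up by copy-and-paste) make the whole document ill-formed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def scrub(text):
    """Drop characters not allowed in XML 1.0, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", text))
```

Deleting rather than replacing the bad characters is a judgment call; a repository might prefer to log them so the source metadata can be fixed.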
OAI Impact
• Can use OAI to build our own generalized services
  • updates, alerts
• Finally have a clean method to export metadata, both to:
  • the general community for unrestricted data
  • closed communities with restricted data
    • Los Alamos, Air Force Research Laboratory, NASA
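Update and alert services of this kind can be built on incremental harvesting with the protocol's from argument. A Python sketch, assuming a harvester that remembers the date of its last run and the identifiers it saw; the function names are hypothetical, and metadataPrefix is included because OAI-PMH 2.0 requires it on ListIdentifiers.

```python
from datetime import date
from urllib.parse import urlencode

def incremental_request(base_url, last_harvest, prefix="oai_dc"):
    """Build a ListIdentifiers request for records added or changed since last_harvest."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": prefix,
              "from": last_harvest.isoformat()}
    return base_url + "?" + urlencode(params)

def new_since(previous_ids, current_ids):
    """Identifiers to announce in an alert: seen now but not in the previous harvest."""
    seen = set(previous_ids)
    return [i for i in current_ids if i not in seen]
```

For example, incremental_request("http://naca.larc.nasa.gov/oai/", date(2001, 1, 1)) asks only for records stamped on or after January 1, 2001.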