
OAIster: What’s with the Weird Name?




  1. OAIster: What’s with the Weird Name? • Kat Hagedorn • UM Library Information Technology • November 28, 2005

  2. What is OAIster? • Is/was a means for UM to test the OAI protocol… (hence the name) • A method for sharing metadata among institutions and groups of people • A means of developing a search service for end-users worldwide

  3. Basics of OAI
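
The slide body is not in the transcript, but the core of the OAI protocol (OAI-PMH) is a plain HTTP request with a `verb` parameter. The following is a minimal sketch of how a harvester might build a `ListRecords` request for simple Dublin Core; the helper name `list_records_url` and the endpoint `repository.example.org` are illustrative assumptions, not part of OAIster.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the
        # verb only, to fetch the next chunk of a long result list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
url = list_records_url("http://repository.example.org/oai")
```

A harvester would typically repeat the request with each returned resumption token until the repository stops sending one.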

  4. What does OAIster collect? • Harvests all metadata from all OAI data providers (within reason) • Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form • All available via search service…
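
As a rough illustration of "only keeps metadata that points to digital objects", the sketch below parses a `ListRecords` response and keeps records that carry at least one `dc:identifier` that looks like a URL. The URL test is an assumption made for this example; the slides do not say how OAIster actually decides, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def records_with_links(xml_text):
    """Keep records with at least one http(s) dc:identifier (assumed rule)."""
    root = ET.fromstring(xml_text)
    kept = []
    for record in root.iterfind(".//oai:record", NS):
        ids = [(el.text or "").strip()
               for el in record.iterfind(".//dc:identifier", NS)]
        urls = [i for i in ids if i.lower().startswith(("http://", "https://"))]
        if urls:
            title = record.findtext(".//dc:title", default="", namespaces=NS)
            kept.append({"title": title, "urls": urls})
    return kept


# Invented two-record response: one record links to an object, one does not.
DC_OPEN = ('<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
           'xmlns:dc="http://purl.org/dc/elements/1.1/">')
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record><metadata>" + DC_OPEN +
    "<dc:title>Digitized photograph</dc:title>"
    "<dc:identifier>http://repository.example.org/item/1</dc:identifier>"
    "</oai_dc:dc></metadata></record>"
    "<record><metadata>" + DC_OPEN +
    "<dc:title>Catalog entry only</dc:title>"
    "<dc:identifier>call-no-123</dc:identifier>"
    "</oai_dc:dc></metadata></record>"
    "</ListRecords></OAI-PMH>"
)

kept = records_with_links(SAMPLE)
```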

  5. Searching OAIster • Time to show off OAIster… • http://www.oaister.org/

  6. A little history • Service is now 3.5 years old • Started with 66 data providers and a little over 200K records • Now have 572 data providers and “a little” over 6 million records • 37% US, 63% international

  7. Visibility of OAI • Surprising who hasn’t made their metadata shareable through OAI • Harvard, Yale, Stanford…the big ones • Initially perplexing, but now clearer: • OAI support is always added at the end of a project • only recently considered at the initiation of projects • truthfully, many institutions are not collaborative…

  8. Examples of data providers • Many data providers are huge, e.g., • arXiv: physics preprint and postprint articles • pubmed: medical articles, although restricted • pictureaustralia: images from govt and academic institutions in Australia • lcoa: Library of Congress digital archives • usc: U South California census data

  9. Examples of data providers • Most are small, though • Many around 100 records • Value of making their records available • increased visibility • inclusion in bigger search service than theirs • incorporation in Yahoo! Search

  10. Yahoo! Search • Two years ago, collaborated with the team at Yahoo! Search to send our metadata to them for indexing • e.g., “gardens at albury” in Yahoo! Search • we know the hit doesn’t come from roboting static HTML • <dc:relation>IspartOf Victorian Railways collection.</dc:relation> • Many, many more hits • Also send metadata to Google

  11. System design • [Diagram; component labels:] • XSL stylesheets (per source type) • UM harvester • XSLT transformation tool • OAI-enabled DC records • Non-OAI-enabled DC records • Record storage • BibClass indexes • Search interface (XPAT)

  12. Transformation of metadata • Most metadata needs to be brushed off • adding an http:// to the front of URLs • Or raked • removing instances of <![CDATA[ • Or wrung out • instead of “Where’s Waldo,” it’s “Where’s the incorrect UTF-8 character?” • And should be normalized…
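
The three cleanup steps named on this slide can be sketched as small functions. This is only an illustration under stated assumptions: the function names are hypothetical, the closing `]]>` is removed along with `<![CDATA[` (the slide mentions only the opening marker), and the UTF-8 check simply reports byte offsets rather than repairing anything.

```python
import re


def add_scheme(url):
    """'Brush off': prefix http:// when a URL has no scheme."""
    url = url.strip()
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", url):
        return url
    return "http://" + url


def strip_cdata(text):
    """'Rake': drop CDATA wrappers but keep the wrapped text."""
    return text.replace("<![CDATA[", "").replace("]]>", "")


def find_bad_utf8(raw):
    """'Wring out': return byte offsets that fail to decode as UTF-8."""
    offsets = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the input decodes cleanly
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add the slice offset.
            offsets.append(pos + err.start)
            pos += err.start + 1
    return offsets


fixed_url = add_scheme("www.oaister.org/")
raked = strip_cdata("<![CDATA[Where's Waldo]]>")
bad = find_bad_utf8(b"Caf\xe9 Waldo")  # Latin-1 byte inside supposed UTF-8
```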

  13. Why normalize? • Sample date values • <date>2-12-01</date> • <date>2002-01-01</date> • <date>0000-00-00</date> • <date>1822</date> • <date>between 1827 and 1833</date> • <date>18--?</date> • <date>November 13, 1947</date> • <date>SEP 1958</date> • <date>235 bce</date> • <date>Summer, 1948</date>
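
To make the problem concrete, here is one possible normalization of the sample values above into a (start year, end year) range. It is a sketch, not OAIster's actual rules: the function name `year_range` is hypothetical, BCE years are simply returned as negative numbers, and values with no four-digit year (such as `2-12-01`) are treated as too ambiguous to normalize.

```python
import re


def year_range(value):
    """Map a free-text date to (start_year, end_year), or None if unclear."""
    text = value.strip().lower()

    # "235 bce" -> (-235, -235); negative means BCE (assumed convention).
    m = re.fullmatch(r"(\d{1,4})\s*(bce|bc)", text)
    if m:
        year = -int(m.group(1))
        return (year, year)

    # "18--?" -> the whole century, (1800, 1899).
    m = re.fullmatch(r"(\d{2})--\??", text)
    if m:
        start = int(m.group(1)) * 100
        return (start, start + 99)

    # Otherwise collect standalone four-digit years; 0000 is a placeholder.
    years = [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", text)]
    years = [y for y in years if y != 0]
    if not years:
        return None
    return (min(years), max(years))


samples = ["2-12-01", "2002-01-01", "0000-00-00", "1822",
           "between 1827 and 1833", "18--?", "November 13, 1947",
           "SEP 1958", "235 bce", "Summer, 1948"]
normalized = {s: year_range(s) for s in samples}
```

Returning a range rather than a single year keeps values like "between 1827 and 1833" searchable without pretending to more precision than the source record has.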

  14. Why use a CV? • Sample subject values • <subject>30,51,52</subject> • <subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject> • <subject>Slavery--United States--Controversial literature</subject> • <subject>view of interior with John Henry sculpture</subject> • <subject>Particles (Nuclear physics) -- Research.</subject>

  15. Best practices • Fixing the metadata of more than half of the data providers is cumbersome • Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do • http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

  16. 2nd phase OAI • “Best Practices” group sponsored by the Digital Library Federation, which also… • Sponsors our latest grant • Better and more easily calculated statistics • Search interface improvements • Clustering / classification techniques • Using richer metadata

  17. Clustering / classification • Using automated means to take a selection of metadata and determine “what it’s about” • Working with Emory University (one of our grant partners) to test their tool • Results will be integrated into search so users can search within a smaller group of OAIster records

  18. Using richer metadata • Data providers must use simple Dublin Core • Very sparse schema for describing objects • dc:title must contain main title, sorted title and alternative titles • dc:subject doesn’t distinguish between geographical, hierarchical, temporal…

  19. Using richer metadata • Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC • Developed testbed for grant deliverables • currently only shows MODS work… • http://www.hti.umich.edu/m/mods/

  20. Other stuff • Well, make it smaller somehow… • Clean up Boolean interface • squinch fields together • include more normalization • Make it available through federated search • Proselytize sharing metadata • Test, test, test

  21. Contact me • Kat Hagedorn • UM Library Information Technology • khage@umich.edu • www.oaister.org
