Building Digital Libraries Made Easy: Toward Open Digital Libraries ICADL 2002 – Singapore – Dec. 2002 Edward A. Fox (with Hussein Suleman, Ming Luo) fox@vt.edu http://fox.cs.vt.edu CS DLRL Internet TIC NDLTD CITIDEL NSDL … Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected) • Sponsors: ACM, Adobe, DLF, IBM, Mellon Foundation, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, … • Faculty/Staff (now): Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Joe Futrelle, Lee Giles, Martin Halbert, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Muhammad Zubair, … • Students: Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Paul Mather, Ryan Richardson, Priya Shivakumar, Wensi Xi, Liang Xu, Baoping Zhang, …
Outline • Overview, Problem • Experience: Case Study Projects • Open Archives Initiative • Hussein Suleman Dissertation • DL in a Box, OCKHAM • Summary and Conclusion
Overview We • address the problem of how to develop DLs; • build on experience in building many DLs; • strive for simplicity as per OCKHAM initiative; • build upon the Open Archives Initiative; • demonstrate our approach in diverse situations; • and invite all to • use DL-in-a-box and • help build Open Digital Libraries.
Problem Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are: • The library budget won’t allow purchase of a commercial DL system. • Unless the development effort is local, there won’t be any control. • DLs are extensions of DBMSs, so they are simple applications to develop. • Since DLs operate on the Web, one must adopt the newest W3C proposal.
Problem – cont’d • Since technology moves so quickly, it is essential to follow the latest fad. • CS students always develop from scratch. • This team knows it can do it better. • This system must have more capabilities than any other system. • This DL has to be more flexible and extensible. • This is the right system architecture – at last!
Outline • Overview, Problem • Experience: Case Study Projects • Open Archives Initiative • Hussein Suleman Dissertation • DL in a Box, OCKHAM • Summary and Conclusion
Experience: Case Study Projects • AmericanSouth.org • NDLTD • CSTC • JERIC • CITIDEL • NSDL • Digital Library in a Box
AmericanSouth.org • Domain: culture and history of the southern region of America (USA) • Genre: diverse distributed collections at a dozen universities • Submission & Collection: local sites Emory University (for SOLINET)
Networked Digital Library of Theses and Dissertations (NDLTD) • Domain: graduate education and research • Genre: electronic theses and dissertations (ETDs) • Submission & Collection: local sites www.ndltd.org, www.theses.org
Computer Science Teaching Center (CSTC) • Domain: teaching computer science • Genre: courseware • Submission & Collection: www.cstc.org
CS Teaching Center (CSTC): Lessons Learned • Instead of building large, expensive multimedia packages that become obsolete and are difficult to reuse, concentrate on small knowledge units. • Learners benefit from having well-crafted modules that have been reviewed and tested. • Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
ACM Journal of Educational Resources in Computing (JERIC) • Domain: teaching computer science • Genre: courseware, scholarly articles • Submission & Collection: CSTC, ACM Digital Library
JERIC • Journal of Educational Resources in Computing • Accessible from www.cstc.org, www.acm.org, and www.citidel.org • ACM and SIGCSE support • Refereed and interactive • Part of ACM Digital Library
Computing and Information Technology Interactive Digital Educational Library (CITIDEL) • Domain: computing / information technology • Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, technical reports, … • Submission & Collection: sub/partner collections www.citidel.org
CITIDEL Team • An NSDL Collection Track project • Led by Virginia Tech, with co-PIs: • Fox (director, DL systems) • Lee (history) • Perez (user interface, Spanish support) • Partners • College of New Jersey (Knox) • Hofstra (Impagliazzo) • Villanova (Cassel) • Penn State (Giles)
CITIDEL Collection [Diagram: sources include ACM, CSTC, Research Index, IEEE-CS, …, experts’ finding aids, and NCSTRL; content includes metadata and fulltext; NEC’s data; data processed w. R.I.; Borner’s info viz software; the repository includes ACM DL, SIGCSE proceedings, and JERIC]
CITIDEL Collection Building [Diagram: collection building proceeds through activities labeled Nominating, Submitting, Creating, Searching, Browsing, Crawling, Composing, and Classifying, linked by "thru", "include", "after", "aided by", and "using" relations to GetSmart, Crawlifier, and VIADUCT]
Digital library architecture for local and interoperable CITIDEL services
National Science Digital Library (NSDL) • Domain: undergraduate and K-12 education, etc. • Genre: educational resources • Submission & Collection: sites of 90 projects www.nsdl.org
NSDL Information Architecture • Developed by the Technical Infrastructure Workgroup [Diagram: four areas labeled User Interfaces, Core NSDL “Bus”, Usage Enhancement, and Collection Building. Elements shown: Portals & Clients; NSDL Services and Other NSDL Services; NSDL Collections; Special Databases; referenced items & collections. Core Services: information retrieval, metadata gathering. CI Services: browsing, authentication, personalization, discussion, annotation. Core Collection-Building Services: protocols, harvesting.]
Digital Library in a Box • Domain: helping DL projects • Genre: any domain, but especially those involved in NSDL (since it is funded in part through NSDL – with U. FL, NCSA) • Software and Documentation: http://dlbox.nudl.org
Outline • Overview, Problem • Experience: Case Study Projects • Open Archives Initiative • Hussein Suleman Dissertation • DL in a Box, OCKHAM • Summary and Conclusion
Open Archives Initiative (OAI) • www.openarchives.org • openarchives@openarchives.org
The World According to OAI [Diagram: Service Providers (Discovery, Current Awareness, Preservation) connected to Data Providers by metadata harvesting]
Technical Umbrella for Practical Interoperability… that can be exploited by different communities [Diagram: Metadata Harvesting as an umbrella over Reference Libraries, Museums, Publishers, E-Print Archives]
Tiered Model of Interoperability • Mediator services • Metadata harvesting • Document models
OAI – Black Box Perspective [Diagram: seven Open Archives (OA 1 through OA 7); layers labeled Services (Search, Browse, Summarize, Visualize), Metadata, and Docs, with each document shown as a DO]
Active Aggregation through OAI Harvesting [Diagram: CITIDEL aggregating from Archive Lite Sites; NCSTRL; Eprints; Own: History, ResearchIndex, CSTC, …; and IEEE-CS, ACM, …]
Protocol for Metadata Harvesting • Service Requests • Identify • ListMetadataFormats • ListSets • GetRecord • ListIdentifiers • ListRecords • Metadata Multiplicity • Date/Time Ranges • Sets (with semantics depending on local data providers) • Resumption Tokens
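The service requests listed above can be illustrated with a minimal Python sketch. It only builds request URLs for the six OAI-PMH verbs and reads a resumption token out of a response; it does not contact any server. The base URL `http://example.org/oai`, the set name `cs`, and the helper names `build_request` and `next_token` are assumptions made for illustration, not part of the talk. The XML namespace is the one used by version 2.0 of the protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# The six service requests (verbs) named on the slide.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "GetRecord", "ListIdentifiers", "ListRecords",
}


def build_request(base_url, verb, **args):
    """Return the GET URL for one OAI-PMH request (hypothetical helper).

    'from' is a Python keyword, so callers pass it as 'from_';
    the trailing underscore is stripped before encoding.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            params[key.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token in a response, or None if the list is complete."""
    root = ET.fromstring(response_xml)
    node = root.find(f".//{OAI_NS}resumptionToken")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


if __name__ == "__main__":
    # Date range, set, and metadata format selection in one request.
    print(build_request("http://example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc", from_="2002-01-01", set="cs"))
```

A harvester would typically repeat `ListRecords` with only the `resumptionToken` argument until `next_token` returns `None`, since the token is meant to be sent on its own with the verb.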
Outline • Overview, Problem • Experience: Case Study Projects • Open Archives Initiative • Hussein Suleman Dissertation • DL in a Box, OCKHAM • Summary and Conclusion
Open Digital Library (ODL) Hypothesis (Hussein Suleman) • Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems? Maybe … if Digital Libraries can be modeled as • networks of extended Open Archives, where • each extended Open Archive is a • source of data and/or a provider of services.
Example Architecture (NDLTD) [Diagram: OAI/ODL archives at Virginia Tech, PhysNet, Humboldt, Duisburg, CalTech, Dresden, and MIT; components labeled Union Catalog, Filter, Search, Browse, and Recent; User Interface; legend: user interface, OAI/ODL archive, OAI/ODL protocol]
Hussein Suleman’s Thesis Summary • Open Digital Libraries (ODLs) • Open Archives Initiative (OAI) • Protocol for Metadata Harvesting (PMH) • Extending OAI-PMH provides the glue for building componentized DLs. • Lightweight protocols connect the components to support modular systems with good efficiency.
Research in a Nutshell • We build extensible modular systems with customizable services. • This supports interoperability and allows distributed development. • This is in use in www.cstc.org, AmericanSouth.org, www.citidel.org, … • Components include search, browse, annotate, editorial support, union, filter, whats-new, submit, rate, recommend, …
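The idea of composing a DL from components such as union, filter, and search can be sketched in Python as follows. This is an in-process illustration only, assuming every component exposes the same `list_records` interface so that components can be chained; it is not the ODL protocol suite itself, which connects components over the network. All class names, identifiers, and sample records are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator, Protocol


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    datestamp: str  # ISO date string, e.g. "2002-12-11"


class Archive(Protocol):
    """Anything that can list records: a data source or another component."""

    def list_records(self, from_date: str = "") -> Iterator[Record]: ...


@dataclass
class StaticArchive:
    """A data source backed by an in-memory list (stands in for a real archive)."""
    records: list[Record]

    def list_records(self, from_date: str = "") -> Iterator[Record]:
        # ISO date strings compare correctly as plain strings.
        return (r for r in self.records if r.datestamp >= from_date)


@dataclass
class UnionCatalog:
    """Merges several archives and drops records with duplicate identifiers."""
    sources: list[Archive]

    def list_records(self, from_date: str = "") -> Iterator[Record]:
        seen: set[str] = set()
        for source in self.sources:
            for record in source.list_records(from_date):
                if record.identifier not in seen:
                    seen.add(record.identifier)
                    yield record


@dataclass
class RecordFilter:
    """Passes through only the records accepted by a predicate."""
    source: Archive
    keep: Callable[[Record], bool]

    def list_records(self, from_date: str = "") -> Iterator[Record]:
        return (r for r in self.source.list_records(from_date) if self.keep(r))


@dataclass
class SearchService:
    """A service built on top of any archive-like component."""
    source: Archive

    def query(self, term: str) -> list[Record]:
        term = term.lower()
        return [r for r in self.source.list_records() if term in r.title.lower()]
```

Because the union and filter components present the same interface as the archives they consume, they can be stacked in any order, and a service such as search does not need to know whether it sits on one archive or on a chain of components.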
[Diagram: users on one side, digital objects (images, videos, programs, documents, each shown as binary data) on the other, and a question mark between them]
[Diagram: componentized digital library: the same digital objects (programs, videos, images, documents), with question marks marking the components still to be filled in]