
Enabling Interaction and Quality in a Distributed Data DRIS

Enabling Interaction and Quality in a Distributed Data DRIS. D. Scott Brandt Associate Dean for Research Michael Witt Senior Research Systems Administrator Purdue University Libraries. CRIS 2006 Bergen, Norway May 11, 2006. Background: Purdue University.


Presentation Transcript


  1. Enabling Interaction and Quality in a Distributed Data DRIS • D. Scott Brandt, Associate Dean for Research • Michael Witt, Senior Research Systems Administrator • Purdue University Libraries • CRIS 2006, Bergen, Norway, May 11, 2006

  2. Background: Purdue University • Nine Colleges: Agriculture, Consumer & Family Sciences, Education, Engineering, Liberal Arts, Management, Pharmacy/Nursing/Health Sciences, Technology, Vet Medicine • 73 Departments, several cross-disciplinary, e.g. Agricultural & Biological Engineering

  3. Purdue University Libraries • 2004 initiative for Librarians (faculty) to collaborate with other faculty across campus • Apply library science knowledge and expertise to various research data problems: collect, organize, describe, curate, archive, and disseminate data/information

  4. Strategic directions • University: “interdisciplinary and collaborative endeavors grounded in the strengths of academic disciplines” • Libraries: Libraries faculty are integrated into the campus research agenda

  5. Areas of research collaboration • Agronomy • Biology • Cancer Center • Center for the Environment • Chemical Engineering • Chemistry • Cyber Center • Discovery Learning Center • Earth & Atmospheric Science • English • IT at Purdue • Mechanical Engineering Technology • Regenstrief Center

  6. Current areas of participation • E. coli K-12 Model Organism Resource, NIH proposal (B. Wanner, Biology, PI; D. Scott Brandt, Libraries, Co-PI): create archival process for curated database, assist in applying ontologies for data representation and annotation • An Expert System Multimedia Tutorial for Locating Technical Information, Purdue University TLT Digital Content grant (Megan Sapp, PI; Amy Van Epps and Michael Fosmire, Co-PIs; with Bruce Harding, Mechanical Engineering Technology): develop tutorial for MET102 course in using and applying standards • URL-based Search Interface to the Distributed Institutional Repository, Purdue University Graduate School (Michael Witt, Libraries, PI; Darcy Bullock, Civil Engineering, Co-PI): develop toolkit to deploy customized searching of dissertations by school, advisor, etc. • AquaEcon Web Library: An Electronic Resource on Economics-Related Literature on Aquaculture, NOAA (K. Quagrainie, Agricultural Economics, PI; Hal Kirkwood, Libraries, Co-PI): build and populate database

  7. Progression towards CRIS • Institutional repository (IR) • Distributed institutional repository (DIR) • Interactions related to DIR leading to CRIS-like applications • Leverage DIR for DRIS/CRIS

  8. Distributed Institutional Repository • [Diagram labels: e-prints; archival collections; grid resources; data archive; native databases; OAI Data Providers; Metadata Repository; OAI Service Provider; Applications]

  9. A systems-based approach to Libraries supporting research: linear • Stages: inputs, experimentation, outputs • CRIS: a current research information system links people engaged in research with funding and other resources such as interdisciplinary collaborators • Data repositories: a repository of well-described data resulting from research processes is preserved and shared for repurposing • Document repositories: journal article pre-prints, post-prints, conference and working papers, dissertations and other e-prints represent research outputs in a document repository

  10. A systems-based approach to Libraries supporting research: cyclical • CRIS • Data repository • E-print repository

  11. An example application: SRU • Linking to electronic theses and dissertations (ETD) • URL-based search interface to DIR running as a web service • $16,000 Strategic Development Initiative award for fellowship and server
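
A URL-based search of this kind can be sketched as a function that builds an SRU `searchRetrieve` request. This is a minimal sketch, not the project's toolkit: the endpoint `http://example.edu/sru` and the CQL index name `etd.advisor` are hypothetical, since the slides do not give the real service URL or index names. The parameter names (`operation`, `version`, `query`, `maximumRecords`) are those of SRU 1.1.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the slides do not give the real service URL.
BASE_URL = "http://example.edu/sru"


def build_sru_url(cql_query, max_records=10, base_url=BASE_URL):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-escapes the quotes and spaces in the CQL query.
    return base_url + "?" + urlencode(params)


# "etd.advisor" is an assumed index name, used for illustration only.
url = build_sru_url('etd.advisor = "Smith"', max_records=5)
print(url)
```

Because the whole search is carried in the URL, a department page could link to a saved search without any client-side code.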

  12. Getting to the datasets: SRB • The Storage Resource Broker • Developed by the San Diego Supercomputer Center • Uniform access to heterogeneous, distributed storage • Metadata catalog (MCAT) and preservation functionality • TeraGrid, collaboration with Information Technology at Purdue and Rosen Center for Advanced Computing

  13. An example systems interaction • OAISRB: provides an OAI-PMH interface to the SRB to expose metadata from resources on a data grid to OAI service providers • [Diagram labels: Harvester; HTTP; XML; Apache Tomcat Server; OAISRB; OAI-PMH Interface (OAICat); SRB Client (Jargon); MCAT (SRB); Data grid]
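
From the harvester's side, the interaction is an HTTP request that returns XML. The sketch below builds a standard OAI-PMH `ListRecords` request and parses a response. It assumes the records are exposed as `oai_dc`; the sample response is invented for illustration and is not output from OAISRB, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Invented response, shaped like a minimal OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:1</identifier>
        <datestamp>2006-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles(xml_text):
    """Return (identifier, title) pairs for each record in a response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = record.findtext(".//{%s}title" % DC_NS)
        pairs.append((identifier, title))
    return pairs


print(list_records_url("http://example.edu/OAISRB/OAIHandler"))
print(extract_titles(SAMPLE_RESPONSE))
```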

  14. Sample OAISRB config

      #### OAI Handler Base URL Format
      OAIHandler.baseURL=http://128.210.126.231:8080/OAISRB/OAIHandler

      #### SRB Connection Parameters
      SRB.HOST=orion.sdsc.edu
      SRB.PORT=7620
      SRB.USERNAME=mwitt
      SRB.PASSWORD=nyah
      SRB.HOMEDIRECTORY=/dspace/home/mwitt.purdue
      SRB.MDASDOMAINNAME=purdue
      SRB.DEFAULTSTORAGERESOURCE=dspace-fs1
      SRB.MCATZONE=dspace

      #### SRB Collection Count and SRB Collection Names
      SRB.root=/TGzone/home/lars.itap
      SRB.maxcollections=1
      SRB.collection1=LARSDATA

      #### Custom Parameters for SRB GRID
      SRBRecordFactory.repositoryIdentifier=mwitt.purdue
      Display.MaxListSize=50

      #### Custom Identify response values
      Identify.repositoryName=SRB Data Grid
      Identify.adminEmail=mailto:mwitt@purdue.edu
      Identify.earliestDatestamp=2000-01-01T00:00:00Z
      Identify.deletedRecord=no

      #### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
      DC.Identifier=title
      DC.Description=purpose
      DC.Title=title
      DC.Format=File Format
      DC.Creator=address
      DC.Subject=metprof
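
The crosswalk entries in the sample config can be read as a lookup table from Dublin Core element to source field. The sketch below shows one way such a table might be parsed and applied. It is an illustration only, not OAISRB's actual code; the function names and the source record values are invented.

```python
# Crosswalk lines copied from the sample config.
CONFIG_TEXT = """\
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
"""


def parse_crosswalk(text):
    """Map each DC element name to the source field named in the config."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, "####" comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("DC."):
            mapping[key[len("DC."):]] = value.strip()
    return mapping


def apply_crosswalk(mapping, source_record):
    """Copy mapped source fields into a flat Dublin Core dictionary."""
    dc = {}
    for dc_element, source_field in mapping.items():
        if source_field in source_record:
            dc[dc_element] = source_record[source_field]
    return dc


# Invented source metadata, for illustration only.
source = {
    "title": "Example water quality samples",
    "purpose": "Illustration only",
    "File Format": "CSV",
}
dc_record = apply_crosswalk(parse_crosswalk(CONFIG_TEXT), source)
print(dc_record)
```

In this sketch, source fields with no entry in the table are dropped, and DC elements whose source field is missing are left out of the result.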

  15. Metadata research • Metadata librarian worked for four months analyzing metadata needs and processes for several data sets • Results included DC descriptions, enhanced with thesaurus headings, and a basic crosswalk • Also: metadata descriptions from scratch are too manually intensive…

  16. Metadata: Water Quality • A flat file with only “system” metadata • Began with Dublin Core • Enhanced subjects with thesaurus from NAL (US National Agricultural Library) • Looked at DIF (Directory Interchange Format) • Looked at crosswalk with FGDC (Federal Geographic Data Committee) format
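
A Dublin Core description with added subject headings might be serialized as follows. This is a minimal sketch using the standard `oai_dc` and `dc` namespaces; the title and subject values are invented placeholders, not the project's actual metadata or verified thesaurus terms.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an oai_dc element from a dict of element name -> list of values."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
            child.text = value
    return root


# Placeholder values; real subjects would come from the thesaurus.
record = build_dc_record({
    "title": ["Example water quality data set"],
    "subject": ["Water quality", "Example thesaurus term"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Unqualified Dublin Core allows an element to repeat, which is why each field holds a list of values here.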

  17. Next steps: Metadata • Articulate metadata workflow to embed metadata into the process • Review automating all data • Determine how/where to validate and automate descriptive metadata
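
One possible form of automated validation is a check that required descriptive fields are present and non-empty before a record is accepted. The sketch below assumes a hypothetical policy requiring `title`, `creator`, and `description`; the slides do not specify which fields the project would require.

```python
# Hypothetical policy: these fields must be present and non-empty.
REQUIRED_FIELDS = ("title", "creator", "description")


def validate_record(record, required=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in required:
        value = record.get(field)
        # Treat a missing key, None, and whitespace-only text the same way.
        if value is None or not str(value).strip():
            problems.append("missing or empty: " + field)
    return problems


good = {"title": "Example", "creator": "Example Lab", "description": "Demo"}
bad = {"title": " ", "creator": "Example Lab"}
print(validate_record(good))
print(validate_record(bad))
```

A check like this could run at ingest or be repeated periodically as part of quality assessment.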

  18. Conclusions and Questions • Use existing, native metadata whenever possible • Automate and periodically assess processes to ensure quality • Diminishing returns: we settled on discovery and collection-level metadata • Crosswalks are useful but can truncate or distort the original meaning • The importance of interactions, among people and systems • How do we implement CRIS/CWIS/DRIS in our environment? • What is the role of the Libraries in such?

  19. Michael Witt mwitt@purdue.edu D. Scott Brandt techman@purdue.edu Takk (thank you)
