Enabling Interaction and Quality in a Distributed Data DRIS. D. Scott Brandt, Associate Dean for Research; Michael Witt, Senior Research Systems Administrator; Purdue University Libraries. CRIS 2006, Bergen, Norway, May 11, 2006.
Background: Purdue University • Nine Colleges: Agriculture, Consumer & Family Sciences, Education, Engineering, Liberal Arts, Management, Pharmacy/Nursing/Health Sciences, Technology, Vet Medicine • 73 Departments, several cross-disciplinary, e.g. Agricultural & Biological Engineering
Purdue University Libraries • 2004 initiative for librarians (faculty) to collaborate with other faculty across campus: apply library science knowledge and expertise to various research data problems (collect, organize, describe, curate, archive, and disseminate data/information)
Strategic directions • University: “interdisciplinary and collaborative endeavors grounded in the strengths of academic disciplines” • Libraries: Libraries faculty are integrated into the campus research agenda
Areas of research collaboration • Agronomy • Biology • Cancer Center • Center for the Environment • Chemical Engineering • Chemistry • Cyber Center • Discovery Learning Center • Earth & Atmospheric Science • English • IT at Purdue • Mechanical Engineering Technology • Regenstrief Center
Current areas of participation • E. coli K-12 Model Organism Resource, NIH proposal (B. Wanner, Biology, PI; D. Scott Brandt, Libraries, co-PI): create archival process for curated database, assist in applying ontologies for data representation and annotation • An Expert System Multimedia Tutorial for Locating Technical Information, Purdue University TLT Digital Content grant (Megan Sapp, PI; Amy Van Epps and Michael Fosmire, co-PIs; with Bruce Harding, Mechanical Engineering Technology): develop tutorial for MET102 course in using and applying standards • URL-based Search Interface to the Distributed Institutional Repository, Purdue University Graduate School (Michael Witt, Libraries, PI; Darcy Bullock, Civil Engineering, co-PI): develop toolkit to deploy customized searching of dissertations by school, advisor, etc. • AquaEcon Web Library: An Electronic Resource on Economics-Related Literature on Aquaculture, NOAA (K. Quagrainie, Agricultural Economics, PI; Hal Kirkwood, Libraries, co-PI): build and populate database
Progression towards CRIS • Institutional repository (IR) • Distributed institutional repository (DIR) • Interactions related to DIR leading to CRIS-like applications • Leverage DIR for DRIS/CRIS
Distributed Institutional Repository [Diagram labels: OAI Data Providers (e-prints, archival collections, grid resources, data archive, native databases); OAI Service Provider (Metadata Repository); Applications]
A systems-based approach to Libraries supporting research: linear [Diagram stages: inputs, experimentation, outputs] • CRIS: a current research information system links people engaged in research with funding and other resources such as interdisciplinary collaborators • Data repositories: a repository of well-described data resulting from research processes is preserved and shared for repurposing • Document repositories: journal article pre-prints, post-prints, conference and working papers, dissertations and other e-prints represent research outputs in a document repository
A systems-based approach to Libraries supporting research: cyclical [Diagram: a cycle connecting CRIS, data repository, and e-print repository]
An example application: SRU • Linking to electronic theses and dissertations (ETD) • URL-based search interface to DIR running as a web service • $16,000 Strategic Development Initiative award for fellowship and server
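As a rough illustration of what a URL-based search looks like, the sketch below builds an SRU 1.1 searchRetrieve URL from a CQL query. The parameter names (operation, version, query, maximumRecords) come from the SRU specification; the endpoint URL and the CQL index names are hypothetical, since the slides do not give the actual service address or index configuration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the slides do not state the real service URL.
BASE_URL = "http://repository.example.edu/sru"


def sru_search_url(cql_query, max_records=10, base_url=BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.1",
        "query": cql_query,             # CQL query, URL-encoded by urlencode
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical CQL: the index names (dc.contributor, dc.type) depend on
# how a given SRU server is configured.
url = sru_search_url('dc.contributor = "Bullock" and dc.type = "dissertation"')
print(url)
```

Because the whole search is carried in the URL, a link of this kind can be embedded in a departmental or advisor web page to show a pre-filtered list of dissertations.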
Getting to the datasets: SRB • The Storage Resource Broker • Developed by the San Diego Supercomputer Center • Uniform access to heterogeneous, distributed storage • Metadata catalog (MCAT) and preservation functionality • TeraGrid, collaboration with Information Technology at Purdue and Rosen Center for Advanced Computing
An example systems interaction • OAISRB: provides an OAI-PMH interface to the SRB to expose metadata from resources on a data grid to OAI service providers [Diagram: a harvester exchanges HTTP requests and XML responses with an Apache Tomcat server running OAISRB, which combines an OAI-PMH interface (OAICat) with an SRB client (Jargon) that talks to the MCAT (SRB) on the data grid]
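From the harvester's side, the interaction can be sketched as follows: build an OAI-PMH ListRecords request and pull identifiers and titles out of the XML response. The verb, the oai_dc metadata prefix, and the XML namespaces are defined by OAI-PMH; the base URL and the sample response are made up for illustration and are not output from OAISRB.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list all records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(
            "oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


# Hypothetical response body, trimmed to the elements used above.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:record-1</identifier>
        <datestamp>2006-05-11</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("http://repository.example.edu/OAIHandler"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow resumptionToken elements to page through large result sets; that step is omitted here.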
Sample OAISRB config
#### OAI Handler Base URL Format
OAIHandler.baseURL=http://128.210.126.231:8080/OAISRB/OAIHandler
#### SRB Connection Parameters
SRB.HOST=orion.sdsc.edu
SRB.PORT=7620
SRB.USERNAME=mwitt
SRB.PASSWORD=nyah
SRB.HOMEDIRECTORY=/dspace/home/mwitt.purdue
SRB.MDASDOMAINNAME=purdue
SRB.DEFAULTSTORAGERESOURCE=dspace-fs1
SRB.MCATZONE=dspace
#### SRB Collection Count and SRB Collection Names
SRB.root=/TGzone/home/lars.itap
SRB.maxcollections=1
SRB.collection1=LARSDATA
#### Custom Parameters for SRB GRID
SRBRecordFactory.repositoryIdentifier=mwitt.purdue
Display.MaxListSize=50
#### Custom Identify response values
Identify.repositoryName=SRB Data Grid
Identify.adminEmail=mailto:mwitt@purdue.edu
Identify.earliestDatestamp=2000-01-01T00:00:00Z
Identify.deletedRecord=no
#### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
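To make the crosswalk entries concrete, here is a minimal sketch of how DC.* lines like those in the sample config could be read and applied to a source record. It assumes each entry means "Dublin Core element = source field name"; that reading, the helper names, and the sample record are illustrative assumptions, not OAISRB's actual implementation.

```python
# Crosswalk lines copied from the sample config.
CONFIG_TEXT = """\
#### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
"""


def parse_crosswalk(config_text, prefix="DC."):
    """Map lower-cased DC element names to source field names."""
    mapping = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        key, sep, value = line.partition("=")
        if sep and key.startswith(prefix):
            mapping[key[len(prefix):].lower()] = value.strip()
    return mapping


def apply_crosswalk(source_record, mapping):
    """Return (dc_record, dropped_fields) for one source record."""
    dc_record = {}
    for dc_element, source_field in mapping.items():
        if source_field in source_record:
            dc_record[dc_element] = source_record[source_field]
    # Source fields with no crosswalk entry are lost in the DC view.
    dropped = sorted(set(source_record) - set(mapping.values()))
    return dc_record, dropped


# Hypothetical source record; field names follow the config values above.
record = {
    "title": "Water quality samples",
    "purpose": "Monitoring",
    "File Format": "CSV",
    "address": "Purdue University",
    "metprof": "FGDC",
    "bounding": "-87.0 40.3 -86.8 40.5",  # no DC mapping in the config
}

dc_record, dropped = apply_crosswalk(record, parse_crosswalk(CONFIG_TEXT))
print(dc_record)
print(dropped)
```

Two things stand out in this sketch: title feeds both the identifier and the title elements, and any source field without an entry (here the hypothetical bounding field) disappears from the Dublin Core view.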
Metadata research • Metadata librarian worked for four months analyzing metadata needs and processes for several data sets • Results included DC descriptions, enhanced with thesaurus headings, and a basic crosswalk • Also: creating metadata descriptions from scratch is too labor-intensive…
Metadata: Water Quality • A flat file with only “system” metadata • Began with Dublin Core • Enhanced subjects with thesaurus from NAL (US National Agricultural Library) • Looked at DIF (Directory Interchange Format) • Looked at crosswalk with FGDC (Federal Geographic Data Committee) format
Next steps: Metadata • Articulate metadata workflow to embed metadata into the process • Review automating all data • Determine how/where to validate and automate descriptive metadata
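One possible shape for an automated validation step is a completeness check that flags records missing descriptive elements. This is only a sketch: the list of required elements and the record layout are assumptions chosen for illustration, not a policy stated in the slides.

```python
# Assumed minimum set of descriptive elements; adjust to local policy.
REQUIRED = ("title", "creator", "subject", "description")


def missing_elements(record, required=REQUIRED):
    """List required elements that are absent or blank in one record."""
    return [name for name in required
            if not str(record.get(name, "")).strip()]


def assess(records):
    """Map each record identifier to its missing elements, if any."""
    report = {}
    for identifier, record in records.items():
        gaps = missing_elements(record)
        if gaps:
            report[identifier] = gaps
    return report


# Hypothetical records keyed by identifier.
records = {
    "set-1": {"title": "Water quality samples", "creator": "Agronomy",
              "subject": "water quality", "description": "Field samples"},
    "set-2": {"title": "Untitled upload", "creator": " "},
}

print(assess(records))
```

Run periodically over harvested records, a report like this gives a simple measure of where manual description effort is still needed.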
Conclusions and Questions • Use existing, native metadata whenever possible • Automate and periodically assess processes to ensure quality • Diminishing returns: we settled on discovery and collection-level metadata • Crosswalks are useful but can truncate or distort the original meaning • The importance of interactions, among people and systems • How do we implement CRIS/CWIS/DRIS in our environment? • What is the role of the Libraries in such?
Takk (thank you) • Michael Witt, mwitt@purdue.edu • D. Scott Brandt, techman@purdue.edu