Faculty Research Data: Informatics and Archiving. Sarah M. Pritchard, University Librarian, University of California, Santa Barbara. ECURE 2005, Phoenix, AZ
Informatics: A Definition • The study of the structure and behavior of natural and artificial systems designed to process data • Development of tools to ingest and interpret large stores of data in heterogeneous and distributed systems • Integration of data (numeric, textual, image, spatial) with tools for modeling, trend analysis, mapping, image processing, etc. • Business applications not studied in this context
Informatics at UCSB • Emergence of informatics as a specialty in several academic departments, notably environmental sciences • Highly interdisciplinary faculty • Development of unique stand-alone systems for managing collaborative research data • No ongoing mechanisms for communication and technical coordination • Campus and consortial projects emerging for digital publications and for instructional support but not yet for research data
Faculty Research Data • Large numeric data sets from physical sciences and laboratory research • Imaging – geosciences, neurosciences • Fieldwork – environmental, archaeological • Customized interpretive and manipulation tools • Drafts, correspondence, notes
UCSB Computing Environment • One of the original nodes of the Internet • No centralized academic computing organization • Offices for networking and for instructional support • Individual colleges and departments have developed their own servers and support for research data and teaching tools • High-level campus policy board for IT issues brings some coordination
UCSB Library Context • Alexandria Digital Library (www.alexandria.ucsb.edu) • Extension into new disciplinary applications • Heterogeneous metadata ingest • Extensive backup and archiving architecture • Long record of faculty collaboration • NDIIPP • California Digital Library (www.cdlib.org) • Digital preservation initiatives for published documents and for (under development) government information web sites • eScholarship program to support publication of online journals, preprint archives • Online Archive of California – special collections support • Other faculty support • Electronic reserves including streaming audio reserves • Digital document delivery to the desktop
What questions emerge from this? • Why are faculty building informatics systems? • Is valuable research time and funding being spent on tangential work? • Are there commonalities across informatics applications and disciplines? • Is there redundancy in tool development? • Can data be openly accessed or shared? • Are digital library concerns (metadata, IP rights, archiving) incorporated?
Informatics Project Goals • Create stronger linkages among relevant faculty research projects • Identify components and needs in informatics and the management of research data • Assess the degree of commonality in informatics tools and functionality • Determine whether more support is needed for data archiving, metadata, interfaces, IP • Develop a planning agenda for informatics in a distributed environment • Inform the design of facilities and services
Project Components • Background research in current informatics work in academic disciplines • Structured interviews and site visits with selected faculty • Matrix of system characteristics and issues • Informal roundtables for faculty working in these areas • Collaboration with related IT units • White paper for campus discussion of futures
UCSB Informatics: Participants • Faculty chosen on the basis of • Innovative science • Data-intensive work • Interdisciplinary research • Recommended by the Office of Research, colleagues, department heads, IT offices and librarians • Control Group: Non-science faculty • A select group of technologically innovative faculty in other disciplines was used as a control to determine whether trends were specific to the sciences • About 40 people interviewed
Sample Questions for Faculty • How do you store research information? • Do you do any cataloging, indexing, or metadata? • How are your data maintained on an ongoing basis? • Is there something special about the way that you manage your data compared to colleagues within the field? • Do you write or borrow scripts/tools? For what purpose? • Are you having difficulty managing your data collection? Are there services that you wish others would provide? • How is IP and sharing of datasets/information handled in your field? • When you collaborate with others through the web, what kinds of tools, if any, do you use? • What are your plans for this research in the next five years? Are there service requirements that you will need then?
Findings: Growth of Systems • The sophistication of informatics arrangements is determined by the amount of data collected and how labor-intensive it is to collect. • Change happens when the following converge: • Data size increases exponentially • Research questions encompass a broad range of specialties • Funding agencies require change for funding • Guiding principles seem to be: • “What is the smallest group of people that I can have do the work, and still do the [work]” • “What is the least amount of indirect work [e.g., informatics] related to the research that I can do, and still do the [work]”
Findings: Data Preservation • [Chart: Perceived Long-term Preservation Need of Faculty and Staff Researchers]
Findings: Data Preservation • Some science fields have national and international data centers where data deposit is required for grant funding. • Where data centers do not exist, backup depends on: • Length of a grant • Length of time the primary researcher is on campus • Perception that data has maximum value for 12-18 months after publication, and negligible value after 5-10 years. • Departments lack personnel and support for long-term preservation of data. • Faculty store data on the “removable media of the day” and forget about it, until it becomes difficult or impossible to access • More complex systems with the same number of people to manage them leave less time to devote to “meta-issues” • Critical impact: research collaboration and long-term historical data analysis suffer
[Chart: Data Preservation Practices]
Findings: Data Organization • Most common organizing mechanisms – directory structure, spreadsheets, and word processing software • Databases (with or without metadata) are uncommon. Viewed as time/labor-intensive, an unnecessary drain on research time. • Portals built by tech specialists within a field are well utilized. • Storage space is adequate for now. Over half the people contacted were in the process of upgrading. • Most departments did not have strictly enforced limits on email, data storage, and personal storage • Though much on their servers is “garbage,” memory is thrown at the problem; little support in most departments for data management • “Not a solved problem.” While actual memory might be cheap, the tape, labor, and other equipment needed to ensure that data are maintained are NOT.
Findings: Metadata Issues • Metadata is discipline-specific; commonalities exist, but key requirements of a discipline vary. • Metadata structures and subject taxonomies reflect the way faculty in a discipline think • While organizational structure is an important issue in metadata use, other considerations are: • Services available in one’s discipline • Acceptance and standardization in the discipline • Usage in key portals, data centers, and repositories • One worldwide metadata format is not likely at this time • Interdisciplinary metadata issues and crosswalks
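The crosswalk idea in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool from the UCSB project: the source field names (dataset_name, principal_investigator, and so on) are hypothetical, and the targets are ordinary Dublin Core element names chosen as a plausible common schema. The sketch also shows why a single worldwide format is unlikely: discipline-specific fields with no equivalent in the target schema are left unmapped.

```python
# Hypothetical crosswalk table. The keys are invented discipline-specific
# field names; the values are Dublin Core element names.
FIELD_TO_DC = {
    "dataset_name": "title",
    "principal_investigator": "creator",
    "collection_date": "date",
    "study_site": "coverage",
}


def crosswalk(record, mapping=FIELD_TO_DC):
    """Split a record into fields translated to the target schema
    and fields that have no equivalent in the mapping."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value  # discipline-specific, no common element
        else:
            mapped[target] = value
    return mapped, unmapped


# Illustrative record; all values are made up.
record = {
    "dataset_name": "Kelp forest transects",
    "principal_investigator": "J. Doe",
    "collection_date": "2004-07-15",
    "sensor_calibration": "factory default",
}
mapped, unmapped = crosswalk(record)
```

Here sensor_calibration ends up in unmapped, which is the kind of discipline-specific detail a generic schema tends to lose.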
[Chart: Metadata Usage]
Findings: Intellectual Property • Intellectual property protocols that faculty follow after creating software, portals or databases are highly correlated with the discipline. • In disciplines where things move quickly, the ideal method is to open source one’s tool to obtain an audience, then later align oneself with a company, or start one. • In disciplines where there is a lot of money, there is pressure to ensure patents are filed. • Databases, portals and data centers on campus typically all have legal waiver forms, completed as part of the data ingest process, that allow release of the data sets to other researchers. • Disciplines vary in the extent to which they support an ethic of data sharing.
Digital Rights Management Practices [chart] • Have not yet encountered issues, 8% • Prefer to create open source products to avoid intellectual property issues, 22% • Intellectual property issues affect my research significantly, 30% • Practices and procedures in industry are well tested and accepted - no major issues, 16% • Occasional minor issues with an individual collaborator or publisher, 24%
Findings: Data Support Needs • Some needs and services were mentioned across disciplines regardless of current arrangements: • Informatics “point person” or clearinghouse for information on tools, expertise, and research knowledge on campus and nationally • Long-term archiving of research data, especially during the gap in coverage between publication and obsolescence • Tiered support services for database development, cataloging, conversion, emulation, migration, web development, metadata, pre-planning for technology grants
Trends Shaping Future Demand • Growth in complex data objects • Improved data mining • Policies of funding agencies • National repositories • New cyberinfrastructure initiatives • Prevalence of campus repositories for text • Tech-intensive academic programs • Need for rapid and global data exchange • Steady or decreasing staffing
Key System Characteristics • Flexibility to customize control, interfaces and security • Secure access worldwide • Metadata-agnostic design • Interoperability with scholarly communication, archiving and rights management systems • Clearinghouse functions • Advanced services for migration, emulation, long-term digital archiving
Topics for Campus Discussion • Where are the gaps in current offerings? • How do technology services on campus interact, and are new organizational models needed? • What are faculty priorities for various services? • What kinds of research data should be high priority for preservation, and how much is at risk? • What are incentives for faculty participation? • What is the impact of tenure and promotion structures in encouraging “data maintenance work”?
Possible outcomes • Everything stays as is • More peer-to-peer sharing of resources and expertise • Policies are established • Intellectual property rights at several levels • Use of metadata and digital object standards • Ensure data sustainability • Organizational approaches are considered • IT offices, the library, consortial systems support, disciplinary groups, or a combination • New services are offered • Database design • Metadata creation • Consulting • Clearinghouse functions • Full digital archiving and migration
Further Information • UCSB Informatics Project web site: http://www.library.ucsb.edu/informatics/ • ECAR Research Bulletin, vol. 2005, Issue 2: “Informatics and Knowledge Management for Faculty Research Data,” Jan. 18, 2005 Contact: • Sarah M. Pritchard, University Librarian pritchard@library.ucsb.edu • Larry Carver, Director of Library Technologies and Digital Initiatives, carver@library.ucsb.edu • Special thanks to Smiti Anand, Project Analyst