390 likes | 552 Views
e-Science, Information Infrastructure, and Digital Libraries. Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05). Outline of talk.
E N D
e-Science, Information Infrastructure, and Digital Libraries Libraries in the Digital Age Dubrovnik 30 May 2005 Christine L. Borgman Professor & Presidential Chair in Information Studies University of California, Los Angeles & Oxford Internet Institute (2004-05)
Outline of talk • What are we building, for whom, and why? • e-Science / e-Social Science / Cyberinfrastructure • Digital libraries • Case studies • Alexandria Digital Earth Prototype (ADEPT) • Center for Embedded Networked Sensing (CENS) • Summary of issues
Goals of Cyberinfrastructure • Cyberinfrastructure (U.S. initiative)* • “technology … now make(s) possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.” • “NSF must ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad, long-term access by scientists everywhere.” *Atkins, D., Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon panel on Cyberinfrastructure. 2003, January. Executive summary. http://www.communitytechnology.org/nsf_ci_report/
Goals of e-Science • e-Science (U.K. initiative) • “e-Science: large scale science … carried out through distributed global collaborations enabled by the Internet.” * • “collaborative scientific enterprises … require access to very large data collections, very large scale computing resources and high performance visualisation” * • “e-Science data generated from sensors, satellites, high-performance computer simulations, high-throughput devices, scientific images … will soon dwarf all of the scientific data collected in the whole history of scientific exploration.””** *About the UK e-Science Programme. http://www.rcuk.ac.uk/escience/ **Hey & Trefethen, 2003, The data deluge: an E-science perspective.
Digital Libraries • Systems that support searching, use, creation of content • Institutions with people, digital collections, and services • Repositories of digital data and documents, as a component of cyberinfrastructure, e-research, e-science, e-social science, e-learning… • Institutional repositories • Open archives • Data collections
Digital libraries and e-Science: Architecture • JISC/NSF working group on digital libraries and e-science / cyberinfrastructure • Layered model of cyberinfrastructure • Venn diagram model Images courtesy of Norman Wiseman, JISC
Cyberinfrastructure: AnEvolvingView Applications Space Content Digital Libraries User Interfaces & Tools Scientific DBs Information & knowledge layer Middleware services layer ITC Infrastructure Processors, memory, network
The Focus for DL Infrastructure:Version 2 E-Science Or E-Research E-Learning E-resources: common to all, need demonstrators of how it will work Hybrid Libraries
Digital libraries and e-Science: Content • Scholarly communication: Secondary resources • Conference proceedings • Journal articles • Working papers, manuscripts, pre-prints • Books • Research data: Primary sources • Raw data from research • Instrumented data collection (labs, sensor networks) • Field notes • Archival sources • Unique documents • Records of individuals and organizations • Value chain: linking data, documents, and related sources E-bank image courtesy of Jeremy Frey and Liz Lyon; Crystallography image courtesy of Tony Hey
eBank Project Virtual Learning Environment Reprints Peer-Reviewed Journal & Conference Papers Technical Reports LocalWeb Preprints & Metadata Institutional Archive Publisher Holdings Certified Experimental Results & Analyses Data, Metadata & Ontologies Undergraduate Students Digital Library Graduate Students E-Scientists E-Scientists E-Scientists Grid 5 E-Experimentation Entire E-Science CycleEncompassing experimentation, analysis, publication, research, learning
Crystallographic e-Prints • Direct Access to Raw Data from scientific papers Raw data sets can be very large and these are stored at National Datastore using SRB server
Digital libraries and e-Science: Curation • Curation: maintenance of data over its entire life-cycle and over time for current and future generations of users. Involves active management and appraisal… adding value… to a trusted body of information.* • Scholarly communication: Secondary resources • Libraries • Disciplinary repositories (arXiv) • Institutional repositories (ePrints, Dspace) • Publishers • Research data: Primary sources • Archives (arts and humanities) • Disciplinary repositories (e.g., IVOA, BADC, IRIS, BIRN, NEON, GEON) • Research groups *What is digital curation? (2004). Edinburgh, Digital Curation Center. 2004. Image of British Atmospheric Data Centre courtesy of Bryan Lawrence
Simulations Assimilation Complexity + Volume + Remote Access = Grid Challenge British Atmospheric Data Centre British Oceanographic Data Centre
Digital libraries: Uses of secondary resources • Community orientation of researchers • “open science” depends upon sharing, transparency, rapid dissemination • documents may be self-archived, deposited, shared with peers • incentive and reward system based on publication • researchers contribute to digital collections via publication • Individual orientation of students • searchers of digital collections, not contributors • reliant upon search mechanisms and bibliographic control • Scholarly publications: inputs and outputs of research • Digital libraries: “boundary objects” between experts and novices
Digital libraries: Uses of primary sources • Community orientation of researchers • e-Science goal to facilitate creation and use of research data by collaborating groups • Sharing practices vary widely by research topic and discipline • Agreements made for access to data, credit for publications • Data: input and output of research • Providing context to interpret data • Scholarly publications provide context • Digital libraries remove context
Digital libraries: Sharing primary sources • Incentives to share data • Tradition of “open science” to promote access, transparency • Establish trust and reciprocity within a research group • Ability to mine large data sets, compare results • Ability to replicate experiments, studies • Requirement of some funding agencies • Incentives not to share data • Rewards for publication, not for data management • Benefits of contributing data may accrue to other parties • Risks of misinterpretation of your data • Risks of losing control over data • Risks of loss of intellectual property
Digital libraries, e-Science and e-Learning • Case studies in the design of scientific digital libraries that will support research and teaching applications • Alexandria Digital Earth Prototype (ADEPT) • http://is.gseis.ucla.edu/adept/ • Center for Embedded Networked Sensing (CENS) • http://www.cens.ucla.edu
Museum Artifacts Earth Art Alexandria DL of Distributed Spatial Information Objects Other Digital Archives Zoological Habitat Study If it has a latitude and longitude then it can be in ADL Botanical Study Ocean Science Data Archeological Dig What information do you have about her?
Data models for habitat monitoring and sensor networks Contaminant Transport Group • Multimedia, Multiscale problems (time and space) • Multidisciplinary (current and as yet unknown) problems • Management, visualization, exploration of massive, heterogeneous data streams
Center for Embedded Networked Sensing:Education and Data Management Projects • Goals • Make sensor data useful for scientists • Make sensor data useful for teaching high school science • Facilitate inquiry learning with scientific data • User communities • Research scientists (habitat ecology, seismology) • High school science teachers (biology and physics) • High school students • Activities to be supported • Scientific data management by scientists • Enable students to formulate and test hypotheses • Enable students to design experiments “tasking” sensors
Some CENS Results-1 • CENS committed in principle to sharing data • Data management practices vary widely by discipline • Seismic: High experience, use of community repository (IRIS) and metadata format (SEED) • Habitat ecology: Low experience, exploring new community repository (Morpho) and metadata (Environmental Metadata Language) • Avian biology (localization of birdsongs): High experience across multiple disciplines • Education: Standards exist for educational modules but not for data
Some CENS Results-2 • No single metadata model or crosswalk will serve all CENS scientific applications • Metadata for individual disciplines, e.g., • Environmental Metadata Language: biocomplexity data • SEED: seismic data • Metadata for technical devices, e.g., Sensor Markup Language • Geospatial coordinates required for most applications • Geospatial data standards exists for 2D points • Context descriptors also needed (distance from sea level, local distance from ground, above/below leaf, north/south side of tree)
METADATA FOR SENSOR DATA FOR HABITAT MONITORING METADATA FOR EDUCATION MODULES FOR HABITAT MONITORING CENS Schema SensorML EML 2.0 LOM GEM ADN CENS_Node.Node_Name Name of Node Sml:IdentifiedAs (2.2.2) CENS_Node.Node_Desc Description of Node AssetDescription: sml:description (2.2.12) CENS_Location.Location_ID Unique location ID CrsID (2.2.5) Eml-Coverage(2.4.4) CENS_Location.X_Pos (Position on X axis) HasCRS (2.2.5) ObjectState (3.3.6) Eml-Coverage- GeographicCoverage (2.4.4) CENS_Location.Time_Recorded Time location was captured Eml-Coverage- TemporalCoverage (2.4.4) CENS_Location.Time_Type_ID Refers to type of time of Time_Type ID table Eml-Coverage (2.4.4) Educational-Typical Age Range (5.7) Audience-Age Audience Life Cycle-Contribute (2.3) Creator Resource Creator General-Coverage (1.6) Coverage-Spatial, Temporal Coverage (spatial and temporal) Life Cycle-Date (2.3.3) DateTime (8) Date Creation dateAccession date General-Description (1.4) Description Description Educational (5) Pedagogy Educational
CENS Research Directions • Infrastructure goals for CENS • Support scientists in collecting, managing, preserving, sharing data • Develop modular, extensible metadata architecture (XML-based) • Develop filtering tools to extract and visualize scientific data for educational use • Conduct behavioral studies of scientists, teachers, and students • How do they determine their data requirements? • What are their criteria for selecting, preserving data? • How do they use scientific data? • How do their uses evolve over time? • What are their incentives and disincentives to contribute data to repositories?
Summary • Technical infrastructure of e-Science under construction • Social aspects of e-Science infrastructure poorly understood • Value chain of e-Science requires linkage between data and documents • Scholarly communication infrastructure evolving, but exists • Data infrastructure fragmented across disciplines, communities • Digital libraries: intersection of e-science, e-learning, and libraries • Data practices vary more widely than do publication practices • Communities may represent, organize, and use the same data differently • Individuals may use the same content differently for different purposes • Research on digital libraries and e-learning can identify some of the challenges for building an e-science infrastructure
Acknowledgements • ADEPT is funded by National Science Foundation grant no. IIS-9817432, Terence R. Smith, University of California, Santa Barbara, Principal Investigator.http://is.gseis.ucla.edu/adept/ • ADEPT collaborators: Gregory Leazer, Anne Gilliland, Richard Mayer • CENS is funded by National Science Foundation Cooperative Agreement #CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator. http://www.cens.ucla.edu • CENS collaborators: Jonathan Furner, William Sandoval, Noel Enyedy, Michael Hamilton
ADEPT Project: Geospatial digital libraries • Goals • Add services to Alexandria Digital Library for teaching undergraduate courses in geography • Facilitate inquiry learning by providing access to primary sources • User communities • Faculty, as researchers • Faculty, as teachers of undergraduate courses • Undergraduate students • Activities to be supported • Information searching and retrieval • Composing lectures that incorporate text, concepts, and objects • Constructing learning modules in which students can formulate and test hypotheses
Some ADEPT Results (1999-2004)-1 • Technology use by geographers • Research: sophisticated users of IT; mapping software, atmospheric simulations, sensor networks • Teaching: minimal users of IT; rely on overhead projectors, chalk, paper maps, boxes of rocks, lecturing from hand-written notes • Information seeking by geographers • Research: typical library use, online searching • Teaching: irregular, non-directed, often a by-product of research activities • Information resources used by geographers • Research: varies by specialty; all want maps and images • Teaching: varies by course; all want maps and images
Some ADEPT Results (1999-2004)-2 • Search queries of geographers • Research: concept, place (place name, latitude/longitude) • Teaching: concept, place, process (examples of erosion, population movements, etc.) • Selection of digital resources for teaching • Clarity of presentation more important than familiarity of location • Desire to annotate, highlight, resize, for presentation • Use of primary data in instruction • Preference for use of own research data • Tools to manage own research data would make DL teaching services more attractive • Most prefer to hold their data privately; want control over personal digital libraries of their teaching and research resources