SDSC Data and Knowledge Systems

Diagram: Data Technology Layers (storage hardware; filesystems and networked storage; grid storage; database systems; advanced query processing; knowledge-based integration; data mining/analysis; visualization; applications such as bioinformatics, ecoinformatics, geoinformatics, …; real-time data streams: Sens…)

Presentation Transcript


    1. SDSC Data and Knowledge Systems. Chaitan Baru, Program Co-Director

    2. Data Technology Layers

    3. Information Integration Projects. Motivated by application needs: bioinformatics, neuroscience, geosciences, environmental science/ecology, digital government, … The Biomedical Informatics Research Network (BIRN): knowledge-based integration of human and mouse brain data, including structural and functional MRI. Digital Government: integration of geospatial and statistical data (I2T project). Long-Term Ecological Research (LTER): integration of distributed weather and hydrology data (ClimDB). GEON, the Geosciences Network: integration of multidisciplinary Earth science databases. SEEK, the Scientific Environment for Environmental Knowledge: modeling frameworks, semantic integration, and workflow systems for environmental modeling.

    4. Information Integration Approaches. Data warehouses: bring data into a central location; proven technology. Database integration: the ability to query across data sources; the technology exists but is not yet “seamless”, and it can be done with ODBC-level connectivity. Application integration: object-level integration, typically in Java; works well but needs significant effort and requires “deep” access. Semantic integration: integration across disciplinary databases; the technology is still in the research domain but making progress, and is being developed in the Web context using Web technologies. Model-based integration: a new grand challenge, driven by domain (science) models.
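To make the contrast between the first two approaches concrete, here is a minimal Python sketch, assuming two in-memory SQLite databases as stand-ins for independently managed sources. The table layout, the source names, and the functions `warehouse_load` and `federated_query` are hypothetical. The warehouse copies every row into one central store ahead of time; the integration path sends the same query to each source when asked and merges the answers.

```python
import sqlite3

# Hypothetical, independently managed sources. In-memory SQLite databases
# stand in for remote DBMSs reached over ODBC-level connectivity.
def make_source(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (site TEXT, year INTEGER, value REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", rows)
    return conn

sources = {
    "source_a": make_source([("A1", 2001, 3.5), ("A2", 2002, 4.0)]),
    "source_b": make_source([("B1", 2001, 2.0), ("B2", 2003, 5.5)]),
}

def warehouse_load(sources):
    """Data warehouse: copy every source's rows into one central store up front."""
    central = sqlite3.connect(":memory:")
    central.execute("CREATE TABLE obs (src TEXT, site TEXT, year INTEGER, value REAL)")
    for name, conn in sources.items():
        rows = conn.execute("SELECT site, year, value FROM obs").fetchall()
        central.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)",
                            [(name, *row) for row in rows])
    return central

def federated_query(sources, year):
    """Database integration: send the same query to each source at request
    time and merge the answers; nothing is copied in advance."""
    results = []
    for name, conn in sources.items():
        for site, value in conn.execute(
                "SELECT site, value FROM obs WHERE year = ?", (year,)):
            results.append((name, site, value))
    return sorted(results)

central = warehouse_load(sources)
print(central.execute(
    "SELECT src, site, value FROM obs WHERE year = 2001 ORDER BY src, site"
).fetchall())
print(federated_query(sources, 2001))
```

Both queries return the same rows here; the difference is when the data moves. The warehouse answer can go stale until the next load, while the federated answer always reflects the sources, at the cost of contacting each one per query.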

    5. SDSC Information Integration Testbed. Funded by NSF Digital Government and ITR. Data from the Sociology Workbench (SDSU), ICPSR (U. Michigan), the Centers for Disease Control, EPA, and SANDAG.

    6. Long-Term Ecological Research (LTER). ClimDB, an LTER project, is an application of the I2T system. It moves from a “harvesting” model to an on-line query model and was developed as a prototype for other LTER projects, e.g. HydroDB.

    7. Web Services “Fabric”. Working with Blue Titan to provide robust infrastructure for executing Web services and Web service workflows.
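As a rough illustration of what executing a Web service workflow involves, the following Python sketch chains services in sequence and retries transient failures. It assumes each “service” is a local function that takes and returns a dict; `run_workflow`, `fetch`, and `flaky_average` are hypothetical names, and nothing here reflects Blue Titan's actual product or API.

```python
import time
from typing import Callable, Dict, List

# Hypothetical "service": a plain function that takes and returns a dict.
# A real fabric would invoke remote Web service endpoints instead.
Service = Callable[[Dict], Dict]

def run_workflow(steps: List[Service], payload: Dict,
                 retries: int = 2, delay: float = 0.0) -> Dict:
    """Run services in sequence, feeding each output to the next step.
    A failing step is retried up to `retries` extra times, then re-raised."""
    for step in steps:
        for attempt in range(retries + 1):
            try:
                payload = step(payload)
                break
            except Exception:
                if attempt == retries:
                    raise
                time.sleep(delay)  # back off before the next attempt
    return payload

calls = {"n": 0}

def fetch(p: Dict) -> Dict:
    return {**p, "data": [1.0, 2.0, 3.0]}

def flaky_average(p: Dict) -> Dict:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")  # simulated first-call failure
    return {**p, "mean": sum(p["data"]) / len(p["data"])}

result = run_workflow([fetch, flaky_average], {"site": "A1"})
print(result["mean"])
```

The retry loop lives in the runner rather than in each service, so individual services stay simple; that separation is one plausible reading of “robust infrastructure” for workflows, not a description of the SDSC system.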

    8. Web Services Architecture at SDSC with Blue Titan

    9. Computational Grids “versus” Data Grids. Computational grids provide distributed supercomputing (MPI); they focus on account management, security, and job scheduling. Data grids provide access to large distributed data sets; they focus on naming issues and on hiding file access latency via hierarchical storage management (HSM) and remote I/O techniques. Distributed databases provide the ability to query and update data in distributed DBMSs; they focus on schema integration, query optimization, and transaction processing support.

    10. Virtualization. Open Grid Services Architecture (OGSA): everything is a service. System “transparencies” for data on the Grid: Location/name: data could be anywhere on the Grid. Distribution: data could be distributed across multiple locations. Replication: data could be in more than one location. Heterogeneity: data could be in more than one format (file or DBMS, points or polygons, etc.); this is the difficult one. Ownership and costing: users should not have to negotiate with each source individually.
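Location/name and replication transparency are commonly provided by a catalog that maps a logical name to its physical copies. The Python sketch below assumes that design: the `ReplicaCatalog` class, the logical path, the example URLs, and the per-replica latency figures are all hypothetical, and resolution simply picks the lowest-latency copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Replica:
    site: str          # hypothetical storage site label
    url: str           # physical location of this copy
    latency_ms: float  # assumed measured access cost from the client

@dataclass
class ReplicaCatalog:
    """Maps a logical name to its physical copies, so callers never see sites."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> str:
        """Return the URL of the lowest-latency replica for a logical name."""
        replicas = self.entries.get(logical_name)
        if not replicas:
            raise KeyError(f"no replicas registered for {logical_name!r}")
        return min(replicas, key=lambda r: r.latency_ms).url

catalog = ReplicaCatalog()
catalog.register("/geon/gravity/survey-2002",
                 Replica("site-a", "https://site-a.example.org/data/survey-2002", 12.0))
catalog.register("/geon/gravity/survey-2002",
                 Replica("site-b", "https://site-b.example.org/data/survey-2002", 45.0))
print(catalog.resolve("/geon/gravity/survey-2002"))
```

A real data grid would also have to handle replicas going offline and stale latency estimates; this sketch only shows how a caller can ask for data by name without knowing where, or in how many places, it is stored.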

    11. Grid Database Standards. “Data Access and Management Services on the Grid”, an IBM/SDSC position paper, was presented at GGF5 in Edinburgh, Scotland. Working towards a single paper/spec at GGF6 (October, Chicago). SDSC and IBM are beginning an open-source activity with the ISI/Globus group to define Grid database services. Also: a data replication service, and discussion of a services-based architecture for SRB.

    12. GEON: The Geosciences Network. NSF ITR project. IT partners: SDSC, with Penn State and San Diego State University. Geosciences partners: Arizona State University, Bryn Mawr College, Cornell University, Rice University, UNAVCO, University of Arizona, University of Idaho, University of Missouri, University of Texas at El Paso, University of Utah, Virginia Tech. Education and outreach: DLESE, Cornell, UNAVCO. Agency partner: USGS.

    13. GEON IT Issues. Close collaboration between geoscientists and IT researchers to interlink databases and Grid-enable applications. “Deep” data modeling of 4D data. Situating 4D data in context: spatial, temporal, topic, process. XML-based standards for data exchange. Semantic integration of geosciences data, using logic-based formalisms to represent knowledge and map between ontologies. Begin to define a UGLS (Unified Geosciences Language System), a la the UMLS in medicine (Metathesaurus, Semantic Network, Lexicon).
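The UMLS Metathesaurus links terms from different vocabularies to shared concept identifiers. A much simpler, table-driven version of that idea is sketched below in Python; it is a plain lookup, not the logic-based ontology mapping the slide describes, and every source name, term, and concept ID in it is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical mapping from (source database, local term) to a shared concept ID,
# in the spirit of a Metathesaurus. All names and IDs are illustrative only.
TERM_TO_CONCEPT: Dict[Tuple[str, str], str] = {
    ("db_a", "granite"): "C001",
    ("db_b", "GRANITE_ROCK"): "C001",
    ("db_a", "basalt"): "C002",
    ("db_c", "basaltic lava"): "C002",
}

def build_index(mapping: Dict[Tuple[str, str], str]) -> Dict[str, Set[Tuple[str, str]]]:
    """Invert the mapping: concept ID -> every (source, term) that denotes it."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for key, concept in mapping.items():
        index[concept].add(key)
    return index

CONCEPT_INDEX = build_index(TERM_TO_CONCEPT)

def expand(source: str, term: str) -> Set[Tuple[str, str]]:
    """Rewrite one source's term into the equivalent terms used by all sources.
    Unmapped terms are returned unchanged."""
    concept = TERM_TO_CONCEPT.get((source, term))
    return set(CONCEPT_INDEX[concept]) if concept else {(source, term)}

print(sorted(expand("db_a", "granite")))
```

A query phrased in one database's vocabulary can then be rewritten for every other source before it is sent; a logic-based formalism would additionally capture relationships such as subclass or part-of, which a flat table cannot.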

    14. GEON IT Issues. Learning from the BIRN project. The GEON Grid: heterogeneous networks, compute nodes, and storage capabilities. Deploy grid and cluster software across GEON: SDSC SRB, ROCKS, Globus. Leverage BIRN and TeraGrid experience. Share data, tools, and compute resources, following the SETI@home model.

    15. GEON IT Issues. Advanced visualization capability: visualization of semantic structures for knowledge discovery; augmented reality facilities; remote visualization using the Visualization Center at Scripps and the SDSU Viz Lab.

    17. Challenges. Developing open standards at all levels (data, information, knowledge). Data: structured binary formats, XML. Metadata: XML and its dialects (e.g. WSDL). Knowledge: knowledge representation techniques, ontologies/terms. GEON is working with social scientists who want to study the interactions and dynamics. Moving from individual PI-oriented research to collaborative research (or from individual departments/agencies to inter-agency work). How to deal with “re-purposing” of data and information? Incentives for sharing and cooperation.
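As a small example of XML used for metadata, the Python sketch below parses a hypothetical dataset record with the standard library's `xml.etree.ElementTree`. The element and attribute names (`dataset`, `spatial`, `keyword`, and so on) are illustrative assumptions, not a GEON or any other published schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical metadata record; element names are illustrative only.
RECORD = """
<dataset id="ds-42">
  <title>Gravity survey</title>
  <format>binary</format>
  <spatial west="-117.3" east="-116.9" south="32.6" north="33.1"/>
  <keyword>gravity</keyword>
  <keyword>crust</keyword>
</dataset>
"""

def parse_record(xml_text: str) -> dict:
    """Pull the descriptive fields out of one metadata record."""
    root = ET.fromstring(xml_text.strip())
    bbox = root.find("spatial")
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "format": root.findtext("format"),
        # Bounding box as (west, south, east, north) floats.
        "bbox": tuple(float(bbox.get(k)) for k in ("west", "south", "east", "north")),
        "keywords": [k.text for k in root.findall("keyword")],
    }

print(parse_record(RECORD))
```

Keeping the data itself in a structured binary format while describing it in XML, as here, lets catalogs and search tools work with the metadata without reading the data files.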

    18. Challenges. Understanding how best to interact with science communities, including interacting with scientists and “data managers”.
