Data Technology Layers (figure labels): Storage hardware; Grid Storage; Database Systems; Data Mining/Analysis; Applications: Bioinformatics, Ecoinformatics, Geoinformatics; Advanced Query Processing; Knowledge-Based Integration; Filesystems; Networked Storage; Visualization; Real-time data streams: Sens
1. SDSC Data and Knowledge Systems Chaitan Baru
Program Co-Director
2. Data Technology Layers
3. Information Integration Projects Motivated by application needs
Bioinformatics, Neuroscience, Geosciences, Environmental Science/Ecology, Digital Government
The Biomedical Informatics Research Network (BIRN)
Knowledge-based integration of human/mouse brain, structural/functional MRI
Digital Government
Integration of geospatial and statistical data (I2T project)
Long-Term Ecological Research (LTER)
Integration of distributed weather and hydrology data, ClimDB
GEON: The Geosciences Network
Integration of multi-disciplinary Earth Science databases
SEEK: Scientific Environment for Environmental Knowledge
Modeling frameworks, semantic integration, workflow systems for environmental modeling
4. Information Integration Approaches Data Warehouses
Bring data into central location. Proven technology.
Database Integration
Ability to query across data sources. Technology exists. Not yet seamless. Can be done with ODBC-level connectivity.
Application Integration
(Application) Object-level integration, typically Java. Works well, but needs significant effort. Requires deep access.
Semantic Integration
Integrate across disciplinary databases. Technology is in research domain, but making progress. Being developed in Web context and using Web technologies.
Model-based Integration
A new grand challenge. Driven by domain (science) models.
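The "Database Integration" approach above (querying across data sources rather than copying them into a warehouse) can be sketched in a few lines. This is only an illustration under stated assumptions: the two in-memory SQLite databases stand in for independent sources, and the table names, column names, and values (`rainfall`, `streamflow`, sites `S1` to `S3`) are invented, not taken from any project named in these slides.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source A: rainfall readings per site.
weather = make_source(
    "CREATE TABLE rainfall (site TEXT, mm REAL)",
    "INSERT INTO rainfall VALUES (?, ?)",
    [("S1", 12.5), ("S2", 30.1)],
)

# Hypothetical source B: stream flow per site, held in a separate database.
hydrology = make_source(
    "CREATE TABLE streamflow (site TEXT, m3s REAL)",
    "INSERT INTO streamflow VALUES (?, ?)",
    [("S1", 3.2), ("S3", 7.8)],
)


def integrated_query(weather_conn, hydro_conn):
    """Query each source separately, then join the results at the mediator."""
    rain = dict(weather_conn.execute("SELECT site, mm FROM rainfall"))
    flow = dict(hydro_conn.execute("SELECT site, m3s FROM streamflow"))
    shared_sites = rain.keys() & flow.keys()
    return sorted((site, rain[site], flow[site]) for site in shared_sites)


result = integrated_query(weather, hydrology)
print(result)
```

The join happens in the mediator rather than in either database, which is roughly what connection-level (ODBC-style) integration provides; a real system would also have to reconcile schemas and push selections down to the sources.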
5. SDSC Information Integration Testbed Funded by NSF Digital Government & ITR Data from:
Sociology Workbench, SDSU
ICPSR, U.Michigan
And:
Centers for Disease Control
EPA
SANDAG
6. Long-Term Ecological Research (LTER) ClimDB: An LTER Project An application of the I2T system
Move from harvesting to on-line query model
Developed as a prototype for other LTER projects, e.g. HydroDB
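The difference between a harvesting model and an on-line query model can be illustrated with a small sketch. Everything here is hypothetical: the site names (`SITE_A`, `SITE_B`), the `temp_c` field, and both classes are invented for illustration and do not describe the actual ClimDB or I2T implementation.

```python
def make_site(name, records):
    """Return a fetch function for one site plus its mutable state."""
    state = {"records": list(records)}

    def fetch():
        # Each call reads the site's current records and tags them with the site name.
        return [dict(record, site=name) for record in state["records"]]

    return fetch, state


class HarvestedStore:
    """Harvesting model: copy data from all sites, then query the local copy."""

    def __init__(self, sources):
        self.sources = sources
        self.cache = []

    def harvest(self):
        self.cache = [r for fetch in self.sources for r in fetch()]

    def query(self, min_temp):
        return [r for r in self.cache if r["temp_c"] >= min_temp]


class OnlineMediator:
    """On-line query model: contact every site at query time."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, min_temp):
        return [r for fetch in self.sources for r in fetch() if r["temp_c"] >= min_temp]


fetch_a, state_a = make_site("SITE_A", [{"temp_c": 11.0}])
fetch_b, state_b = make_site("SITE_B", [{"temp_c": 4.0}])

store = HarvestedStore([fetch_a, fetch_b])
store.harvest()
mediator = OnlineMediator([fetch_a, fetch_b])

# A site publishes a new reading after the last harvest.
state_b["records"].append({"temp_c": 15.0})

harvested = store.query(10.0)  # answers from the stale copy
online = mediator.query(10.0)  # sees the new reading
print(len(harvested), len(online))
```

The harvested copy answers quickly but is only as fresh as the last harvest; the on-line mediator is always current but depends on every site being reachable at query time.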
7. Web Services fabric Working with Blue Titan to provide robust infrastructure for execution of Web services and Web service workflows
8. Web Services Architecture at SDSC with Blue Titan
9. Computational Grids versus Data Grids Computational grids
Provide distributed supercomputing (MPI)
Focus on: account management, security, job scheduling
Data grids
Provide access to large distributed data sets
Focus on: naming issues, hiding file access latency via HSM and remote I/O techniques
Distributed databases
Provide ability to query and update data in distributed DBMSs
Focus on: schema integration, query optimization, transaction processing support
10. Virtualization Open Grid Services Architecture (OGSA)
Everything is a service
System transparencies for data on the Grid
Location/Name: could be anywhere on the Grid
Distribution: could be distributed across multiple locations
Replication: could be in more than one location
Heterogeneity: could be in more than one format: file or DBMS, points or polygons, etc.
(the difficult one)
Ownership & Costing: should not have to negotiate with each source individually
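A minimal sketch of three of the transparencies listed above (location/name, replication, heterogeneity), assuming a toy replica catalog. The logical name `geo/stations`, the locations `siteA` and `siteB`, and the payloads are invented for illustration; this is not the SRB or OGSA interface.

```python
import csv
import io
import json

# Hypothetical replica catalog: logical name -> list of (location, format, payload).
CATALOG = {
    "geo/stations": [
        ("siteA", "csv", "id,lat,lon\n1,32.9,-117.2\n"),
        ("siteB", "json", '[{"id": "1", "lat": "32.9", "lon": "-117.2"}]'),
    ]
}


def parse_csv(text):
    """Read CSV text into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Read a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


# Heterogeneity: one reader per storage format, all returning the same shape.
READERS = {"csv": parse_csv, "json": parse_json}


def open_logical(name, unavailable=frozenset()):
    """Resolve a logical name to the first reachable replica and return its rows."""
    for location, fmt, payload in CATALOG[name]:
        if location in unavailable:
            continue  # replication: fall through to the next copy
        return READERS[fmt](payload)
    raise LookupError(f"no reachable replica for {name!r}")


rows_first = open_logical("geo/stations")
rows_fallback = open_logical("geo/stations", unavailable={"siteA"})
print(rows_first == rows_fallback)
```

The caller uses only the logical name; which site served the request and which format it was stored in stay hidden behind `open_logical`. Points-versus-polygons style heterogeneity is much harder than the format conversion shown here, which is presumably why the slide calls it the difficult one.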
11. Grid Database Standards Data Access and Management Services on the Grid, IBM/SDSC position paper, presented at GGF5, Edinburgh, Scotland
Working towards a single paper/spec at GGF6, October, Chicago
SDSC and IBM are beginning an open source activity with ISI/Globus group to define Grid database services
Data replication service
Discussion re. services-based architecture for SRB
12. GEON: The Geosciences Network NSF ITR Project IT: SDSC with Penn State, San Diego State University
Geosciences: Arizona State University, Bryn Mawr College, Cornell University, Rice University, UNAVCO, University of Arizona, University of Idaho, University of Missouri, University of Texas El Paso, University of Utah, Virginia Tech
Education and Outreach: DLESE, Cornell, UNAVCO
Agency Partner: USGS
13. GEON IT Issues Close collaboration between geoscientists and IT to interlink databases and Grid-enable applications
Deep data modeling of 4D data
Situating 4D data in context: spatial, temporal, topic, process
XML-based standards for data exchange
Semantic integration of Geosciences data
Logic-based formalisms to represent knowledge and map between ontologies
Begin to define UGLS (Unified Geosciences Language System), a la UMLS in medicine
Metathesaurus, Semantic Network, Lexicon
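One way to picture how a metathesaurus, semantic network, and lexicon could work together when mapping between vocabularies is the toy sketch below. All concept identifiers, vocabulary names (`survey_a`, `survey_b`), and terms are invented for illustration; no UGLS content exists in these slides, and this is not how UMLS is implemented.

```python
# Metathesaurus (hypothetical): concept id -> preferred term in each source vocabulary.
METATHESAURUS = {
    "C001": {"survey_a": "granite", "survey_b": "granitic rock"},
    "C002": {"survey_a": "basalt", "survey_b": "basaltic lava"},
    "C100": {"survey_a": "igneous rock", "survey_b": "igneous"},
}

# Semantic network (hypothetical): concept id -> broader concept id.
BROADER = {"C001": "C100", "C002": "C100"}


def normalize(term):
    """Lexicon step: reduce a term to a canonical lookup form."""
    return " ".join(term.lower().split())


def build_index(metathesaurus):
    """Map (vocabulary, normalized term) pairs to concept ids."""
    return {
        (vocab, normalize(term)): concept
        for concept, terms in metathesaurus.items()
        for vocab, term in terms.items()
    }


INDEX = build_index(METATHESAURUS)


def translate(term, source_vocab, target_vocab):
    """Translate a term between vocabularies via its shared concept, or return None."""
    concept = INDEX.get((source_vocab, normalize(term)))
    if concept is None:
        return None
    return METATHESAURUS[concept].get(target_vocab)


def broader_term(term, vocab):
    """Return the broader term in the same vocabulary, or None if unknown."""
    concept = INDEX.get((vocab, normalize(term)))
    parent = BROADER.get(concept)
    return METATHESAURUS[parent][vocab] if parent else None


mapped = translate("Granite", "survey_a", "survey_b")
parent = broader_term("basalt", "survey_a")
print(mapped, "|", parent)
```

Routing every translation through a shared concept identifier means adding a new vocabulary requires one mapping to the concepts rather than a pairwise mapping to every existing vocabulary.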
14. GEON IT Issues Learning from the BIRN project
The GEON Grid: heterogeneous networks, compute nodes, storage capabilities
Deploy grid and cluster software across GEON
SDSC SRB, ROCKS, Globus
Leverage BIRN, TeraGrid experience
Sharing data, tools, and compute resources, SETI@home model
15. GEON IT Issues Advanced visualization capability
Visualization of semantic structures for knowledge discovery
Augmented reality facilities
Remote visualization using Visualization Center at Scripps and SDSU Viz lab
17. Challenges Developing open standards at all levels: data, information, knowledge
Data: structured binary formats, XML
Metadata: XML and its dialects (e.g. WSDL)
Knowledge: knowledge representation techniques, ontologies/terms
GEON is working with social scientists who want to study the interactions and dynamics
Moving from individual PI-oriented research to collaborative research (or from individual dept./agency to inter-agency)
How to deal with re-purposing of data and information?
Incentives for sharing and cooperation
18. Challenges: Understanding how best to interact with science communities Interacting with scientists and data managers