Introduction to Scientific Data Grid

Kai Nan, Computer Network Information Center, CAS (nankai@sdb.ac.cn)

Presentation Transcript


  1. Introduction to Scientific Data Grid
     Kai Nan
     Computer Network Information Center, CAS
     nankai@sdb.ac.cn

  2. What is Scientific Data Grid
     • one-sentence statement
       • a grid that focuses on sharing multi-discipline scientific data and advancing cooperative research based on the use of that data
     • more words
       • built upon the Scientific Databases of CAS
       • started in 2001
       • plans to provide service by 2004-2005
       • for academic and research use
       • built by CAS, open to the world
     CANS '2002 Shanghai

  3. Scientific Databases (SDB)
     • SDB is a project funded by CAS since 1986
     • SDB is a collection of scientific databases covering multiple disciplines, including chemistry, biology, geography, astronomy, ecology, …
     • By 2005, SDB will comprise
       • 40+ member institutions across China
       • 300+ databases
       • data volume of 10 TB+
     • Distributed and heterogeneous

  4. [Figure slide: Information Power Grid; Grid Computing]

  5. Why SDG – motivation
     • resource level – sharing and development
       • make the scientific data more accessible
       • data integration
       • data – information – knowledge
     • application level – emerging scientific applications
       • do what we could not do before
       • rely on data
       • span multiple databases and disciplines
       • demand more resources (CPU cycles, storage, bandwidth, instruments, sensors, …)

  6. Requirements
     • Identification
     • Provenance
     • Metadata
       • technical/context/content/management
     • Access Control
     • Universal Access Interface
     • Publishing/Discovery/Retrieval
     • Data Lifecycle
     • …
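
As a rough illustration of how the requirements on slide 6 might meet in one record, the sketch below bundles an identifier, a provenance trail, the four metadata categories, and a lifecycle state. Every name in it (`DatasetRecord`, `Lifecycle`, the stage names, the example identifier) is an assumption made for illustration; the slides do not define a schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Lifecycle(Enum):
    # Hypothetical stages; the slide only names "Data Lifecycle".
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


@dataclass
class DatasetRecord:
    identifier: str                                            # Identification
    provenance: List[str] = field(default_factory=list)        # Provenance trail
    technical: Dict[str, str] = field(default_factory=dict)    # Metadata: technical
    context: Dict[str, str] = field(default_factory=dict)      # Metadata: context
    content: Dict[str, str] = field(default_factory=dict)      # Metadata: content
    management: Dict[str, str] = field(default_factory=dict)   # Metadata: management
    state: Lifecycle = Lifecycle.DRAFT                         # Data Lifecycle

    def publish(self, actor: str) -> None:
        """Move a draft record to PUBLISHED and log the actor in the provenance."""
        if self.state is not Lifecycle.DRAFT:
            raise ValueError(f"cannot publish from state {self.state.value}")
        self.state = Lifecycle.PUBLISHED
        self.provenance.append(f"published by {actor}")


record = DatasetRecord(
    identifier="example:chem-0001",          # made-up identifier
    technical={"format": "relational"},
    content={"discipline": "chemistry"},
)
record.publish("curator@example.org")        # made-up actor
```

Keeping the four metadata categories as separate fields, rather than one flat dictionary, is only one possible design; it makes it easier to apply different access rules to, say, management metadata.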

  7. Three simplified steps
     • find the data
       • and get related information (metadata)
     • obtain proper rights to the data
     • access the data
       • one request may involve multiple distributed and heterogeneous databases
       • possibly not just data, but also processing and/or analysis
     • these steps seem easy, but …
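
The three steps on slide 7 can be sketched as three small functions. This is a toy model under stated assumptions: the catalog, the access-control table, the data, and the function names `find`, `authorize`, and `access` are all invented here and do not reflect the SDG implementation.

```python
# Hypothetical in-memory stand-ins for a metadata catalog, an ACL, and the data.
CATALOG = {
    "chem-spectra": {"discipline": "chemistry", "node": "A"},
    "star-catalog": {"discipline": "astronomy", "node": "B"},
}
ACL = {"chem-spectra": {"alice"}, "star-catalog": {"alice", "bob"}}
DATA = {"chem-spectra": [1.0, 2.5], "star-catalog": ["Vega", "Sirius"]}


def find(discipline):
    """Step 1: find datasets and return their metadata."""
    return {name: meta for name, meta in CATALOG.items()
            if meta["discipline"] == discipline}


def authorize(user, dataset_id):
    """Step 2: check whether the user holds rights to the dataset."""
    return user in ACL.get(dataset_id, set())


def access(user, dataset_id):
    """Step 3: return the data, refusing users without rights."""
    if not authorize(user, dataset_id):
        raise PermissionError(f"{user} may not read {dataset_id}")
    return DATA[dataset_id]


hits = find("chemistry")
values = access("alice", "chem-spectra")
```

Even in this toy form, the "but …" on the slide shows up quickly: each table would in practice live on a different node, under a different administrator, in a different format.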

  8. Tasks
     • Testbed
       • One data center
       • Three subject centers
     • Middleware
       • Information Service
       • Security System
       • Data Access Interface
     • Application
       • chemistry/biology/astronomy/geoscience/…

  9. SDG Resources: 20 TB, 4 PC clusters [network diagram]
     • Data Center (CNIC): cluster, 16 nodes, 15 TB
     • Bio Center: cluster, 8 nodes, 1-2 TB, Beijing
     • Geo Center: cluster, 8 nodes, 1-2 TB, Beijing
     • Chemistry Center: cluster, 8 nodes, 1-2 TB, Shanghai
     • Network labels on the diagram: CSTNET; 1000M, 155M, 1000M
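
A quick arithmetic check of the figures on slide 9, assuming the per-center numbers are read as one 16-node, 15 TB data center plus three 8-node subject centers with 1-2 TB each. The tuple layout and variable names are only for illustration.

```python
# (name, nodes, minimum TB, maximum TB), as read from the slide
CENTERS = [
    ("Data Center (CNIC)", 16, 15, 15),
    ("Bio Center", 8, 1, 2),
    ("Geo Center", 8, 1, 2),
    ("Chemistry Center", 8, 1, 2),
]

total_nodes = sum(nodes for _, nodes, _, _ in CENTERS)
min_tb = sum(low for _, _, low, _ in CENTERS)
max_tb = sum(high for _, _, _, high in CENTERS)

print(f"{len(CENTERS)} clusters, {total_nodes} nodes, {min_tb}-{max_tb} TB")
```

Under this reading the storage comes to between 18 and 21 TB across 40 nodes, which is consistent with the "20 TB, 4 PC clusters" headline.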

  10. SDG Data Center [diagram]
     • CA Server
     • Portal Server
     • Mass Storage
     • Database
     • Application Server
     • MDS Server
     • Supercomputers at CNIC, ~2 TFLOPS

  11. Grid Middleware
     • Globus
       • Resource Management (GRAM)
       • Information Service (MDS)
       • Data Management (GridFTP)
       • Security (GSI)
     • Storage Resource Broker (SRB)
       • SDSC’s solution for data grid

  12. SDG Middleware [layer diagram, top to bottom]
     • Application: applications
     • GAPI: app-oriented, unified program interface
     • DRB: coordinated access to multiple data resources
     • UAI: universal access interface to single data resource
     • Local DBMS: local data management system, could be DBMS or file system
     • databases
     • Also shown in the diagram: xMDS, GSI
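
The layering on slide 12 can be sketched as three thin classes: a UAI that hides one resource, a DRB that coordinates several UAIs, and a GAPI that gives applications a single call. The class and method names below (`fetch`, `fetch_all`, `lookup`), the SQLite table, and the CSV text are assumptions chosen for illustration, not the SDG interfaces; the example uses an in-memory SQLite database and a CSV string to stand in for "DBMS or file system".

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod


class UAI(ABC):
    """Uniform access to ONE data resource (hypothetical interface)."""

    @abstractmethod
    def fetch(self, key):
        ...


class SqliteUAI(UAI):
    """UAI over a local DBMS, here an SQLite connection."""

    def __init__(self, conn):
        self.conn = conn

    def fetch(self, key):
        rows = self.conn.execute(
            "SELECT value FROM records WHERE key = ? ORDER BY value", (key,))
        return [row[0] for row in rows]


class CsvUAI(UAI):
    """UAI over a file-based resource, here CSV text held in memory."""

    def __init__(self, text):
        self.text = text

    def fetch(self, key):
        reader = csv.DictReader(io.StringIO(self.text))
        return [row["value"] for row in reader if row["key"] == key]


class DRB:
    """Coordinates access to several resources through their UAIs."""

    def __init__(self, resources):
        self.resources = resources  # dict: resource name -> UAI

    def fetch_all(self, key):
        return {name: uai.fetch(key) for name, uai in self.resources.items()}


class GAPI:
    """App-facing facade: one call, results merged across resources."""

    def __init__(self, drb):
        self.drb = drb

    def lookup(self, key):
        merged = []
        for values in self.drb.fetch_all(key).values():
            merged.extend(values)
        return merged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key TEXT, value TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [("H2O", "water"), ("CO2", "carbon dioxide")])
api = GAPI(DRB({"db": SqliteUAI(conn),
                "file": CsvUAI("key,value\nH2O,oxidane\n")}))
names = api.lookup("H2O")
```

The application only ever sees `GAPI.lookup`; adding another kind of resource means writing one more `UAI` subclass and leaving the layers above it unchanged.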

  13. Use case (1)
     [Diagram: MDS; Node H running App X over the stack GAPI, DRB, UAI, DBMS; Node A running App over the stack GAPI, DRB, UAI, DBMS]

  14. Use case (2)
     • Single sign-on
     • Query MDS
     • App → GAPI → DRB
     • DRB(H) → UAI(A, B, C)
     • UAI → local DBMS → DB
     [Diagram: MDS; Node H running App X; Nodes A, B and C, each with the stack App, GAPI, DRB, UAI, DBMS]
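
The flow in use case (2) can be sketched as follows: sign on once, ask the MDS which nodes hold relevant data, then let the DRB on node H fan the request out to the UAIs on those nodes. The dictionaries standing in for the MDS and the per-node databases, the token format, and the function names are all assumptions made for this sketch; real GSI credentials and MDS queries work differently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical MDS contents: topic -> nodes that hold matching data.
MDS = {"protein": ["A", "B", "C"], "galaxy": ["B"]}

# Hypothetical per-node stores standing in for UAI -> local DBMS -> DB.
NODE_DB = {
    "A": {"protein": ["p1"]},
    "B": {"protein": ["p2", "p3"], "galaxy": ["g1"]},
    "C": {"protein": []},
}


def sign_on(user):
    """Single sign-on: one token, reused for every node contacted later."""
    return f"token-for-{user}"


def uai_query(node, topic, token):
    """UAI on one node: validate the token, then read the local store."""
    if not token.startswith("token-for-"):
        raise PermissionError("invalid token")
    return NODE_DB[node].get(topic, [])


def drb_query(topic, token):
    """DRB on node H: look up nodes in the MDS, then query them in parallel."""
    nodes = MDS.get(topic, [])
    if not nodes:
        return {}
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = pool.map(lambda node: uai_query(node, topic, token), nodes)
        return dict(zip(nodes, results))


token = sign_on("alice")
answer = drb_query("protein", token)
```

The user authenticates once and the same token travels to nodes A, B, and C, which is the point of single sign-on in this flow.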

  15. Use case (3)
     [Diagram: MDS; Node Z and Node H, each running App X over the stack GAPI, DRB, UAI, DBMS; Nodes A, B and C, each with the stack App, GAPI, DRB, UAI, DBMS]

  16. Projects
     • CAS
       • the Tenth Five-year Program (2001-2005): funded (37M RMB)
     • 863 Program (by MOST)
       • a special program for grid: proposed

  17. Milestones
     • mid 2003: testbed built
     • end 2003: middleware developed
     • 2004: deployment and test run
     • 2005: applications developed and production run

  18. Collaboration
     • PRAGMA
     • APGrid
     • SDSC
     • KISTI
     • ASCC
     • Texas A&M Univ.
     • …

  19. Thank you!
