SDSC and CIEG Overview CIEG Workshop April, 2007. Anke Kamrath Division Director, San Diego Supercomputer Center kamratha@sdsc.edu. ~400 Staff Production HPC and Data Staff Numerous Science Research Projects and Computational Scientists Software & Technology R&D.
SDSC and CIEG Overview
CIEG Workshop, April 2007
Anke Kamrath, Division Director, San Diego Supercomputer Center
kamratha@sdsc.edu
SDSC Overview
• ~400 Staff
• Production HPC and Data Staff
• Numerous Science Research Projects and Computational Scientists
• Software & Technology R&D
• Data and Knowledge Systems
• Grids
• Science Research and Development
• Next-Generation Storage
• SDSC HPC
• User Services and Training
GAMESS Geosciences Data Management and Mining Astronomy Physics QCD Modeling and Simulation Science is a Team Sport Life Sciences
Cyberinfrastructure – A Unifying Concept
Cyberinfrastructure = resources (computers, data storage, networks, scientific instruments, experts, etc.) + “glue” (integrating software, systems, and organizations).
NSF’s “Atkins Report” provided a compelling vision for integrated Cyberinfrastructure.
Empowering Science and Engineering Communities
• Empowering Scientific Communities involves deep community collaborations and the development of software tools for transforming data to discovery
Next Generation Biology Workbench • GPFS / SYNTHESIS CENTER
A Deluge of Data
• Today data comes from everywhere
  • “Volunteer” data
  • Scientific instruments
  • Experiments
  • Sensors and sensornets
  • Computer simulations
  • New devices (personal digital devices, computer-enabled clothing, cars, …)
• And is used by everyone
  • Researchers, educators
  • Consumers
  • Practitioners
  • General public
• Turning the deluge of data into usable information for the research and education community requires an unprecedented level of integration, globalization, scale, and access
Data from instruments • Data from sensors • Data from simulations • Data from analysis • Volunteer data
Using Data as a Driver: SDSC Cyberinfrastructure
SDSC Data Cyberinfrastructure
• Community Databases and Data Collections, Data management, mining and preservation
• Data-oriented HPC, Resources, High-end storage, Large-scale data analysis, simulation, modeling
• Data-oriented Tools, SW Applications, and Community Codes
• Data- and Computational Science Education and Training
• Collaboration, Service and Community Leadership for Data-oriented Projects
SRB • Summer Institute • IT • Biology Workbench
SDSC Production Resources

SDSC DATA COLLECTIONS, ARCHIVAL AND STORAGE SYSTEMS
Support for community data collections and databases; data management, mining, analysis, and preservation
• 2.4 PB Storage-area Network (SAN)
• 25 PB StorageTek/IBM tape library
• HPSS and SAM-QFS archival systems
• DB2, Oracle, MySQL
• Storage Resource Broker
• Supporting servers: IBM 32-way p690s, 72-CPU SunFire 15K, etc.
• http://datacentral.sdsc.edu/

SDSC HIGH PERFORMANCE COMPUTING SYSTEMS
• DataStar
  • 15.6 TFLOPS Power 4+ system
  • 7.125 TB total memory
  • Up to 4 GBps I/O to disk
  • 115 TB GPFS filesystem
• Blue Gene Data
  • First academic IBM Blue Gene system
  • 17.1 TF
  • 1.5 TB total memory
  • 3 racks, each with 2,048 PowerPC processors and 128 I/O nodes
• TeraGrid Cluster
  • 524 Itanium2 IA-64 processors
  • 2 TB total memory
  • Also 16 2-way data I/O nodes
• http://www.sdsc.edu/user_services/

SDSC SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES
• User Services
• Application/Community Collaborations
• Education and Training
• SDSC/Cal-IT2 Synthesis Center
• Data-oriented Community SW, toolkits, portals, codes
• http://www.sdsc.edu/
Getting an Allocation – It’s Free!
• Open to researchers affiliated with U.S. academic and non-profit research institutions
• Proposals reviewed quarterly
• Several types of allocations:
  • Development Allocations
    • Quick turnaround
    • Up to 10,000 service units (CPU-hours)
  • Medium Allocations
    • Reviewed quarterly
    • Between 10,000 and 500,000 service units
  • Large Allocations
    • Reviewed twice a year
    • Over 500,000 service units
• Getting Started: http://www.sdsc.edu/user_services/
• SDSC Data Allocations:
  • Getting Started: http://datacentral.sdsc.edu
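The allocation tiers above can be sketched as a small lookup. This is only an illustration of the thresholds on the slide, not an SDSC tool: the function name `allocation_type` and the `REVIEW_CYCLE` table are hypothetical, and the treatment of the exact boundary values (10,000 and 500,000 SUs fall in the lower tier) is an assumption, since the slide does not say.

```python
# Hypothetical helper: maps a requested number of service units (CPU-hours)
# to the allocation tier named on the slide. Boundary handling is an assumption.

REVIEW_CYCLE = {
    "Development": "quick turnaround",
    "Medium": "reviewed quarterly",
    "Large": "reviewed twice a year",
}


def allocation_type(service_units: int) -> str:
    """Return the allocation tier for a request of `service_units` CPU-hours."""
    if service_units <= 0:
        raise ValueError("service_units must be positive")
    if service_units <= 10_000:      # "Up to 10,000 service units"
        return "Development"
    if service_units <= 500_000:     # "Between 10,000 and 500,000 service units"
        return "Medium"
    return "Large"                   # "Over 500,000 service units"


if __name__ == "__main__":
    for request in (5_000, 250_000, 2_000_000):
        tier = allocation_type(request)
        print(f"{request:>9,} SUs: {tier} allocation ({REVIEW_CYCLE[tier]})")
```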
SDSC Strategic Applications Collaborations (SAC) Program
• Goal:
  • Make significant impact in enabling and enhancing HPC, Data, Vis for user community
• Approach:
  • SDSC’s domain science and HPC/data/viz expert staff paired with PIs for projects lasting 3-12+ months
  • Recruit new users from traditional fields and new users from non-traditional fields
  • Generalize solutions applicable to wider user community
• Examples:
  • “Scale up” newly recruited users to become 100sK to 1000sK SU users (hundreds of thousands to millions of service units)
  • Optimize parallel algorithms, I/O performance, parallel scaling (extreme scaling for petascale), single processor performance
SAC Example: DNS Turbulence
• DNS (Direct Numerical Simulation) code used for years to simulate a range of phenomena in turbulence and turbulent mixing
• Over the years the PI had millions of allocated SUs on machines at SDSC and other NSF centers
• Original code could scale to at most N processors for an N^3 grid problem; SAC improvements raised this limit to N^2 processors
  • Allows significantly bigger problems
  • Allows faster time to solution
• Now computing at 2048^3 grid resolution on DataStar. The goal is to reach the grid size possible on the Earth Simulator, i.e. 4096^3 grid resolution, to better understand physics at micro scales
SAC Example: DNS Turbulence (cont)
• Reimplemented the compute-intensive part (3D FFT) with a 2-D parallel decomposition
• Now capable of scaling up to N^2 processors
  • 16M processors for a 4096^3 grid
• New code successfully tested on 32,768 BG processors at IBM Watson lab
  • Achieved 4096^3 grid
  • First ever in US!
• By-product: optimized library for scalable 3D FFT, for use in other codes. Beta version available at SDSC Web site. The library was used in another turbulence code (PI: Krishnana, U. Minn); other PIs and IBM are also interested.
Figure: the execution speed (number of steps per second of execution), normalized by the problem size, is plotted on the Y-axis.
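The processor limits quoted on the two DNS slides follow from simple counting, sketched below. This is not the SDSC library: the function names and the "1d"/"2d" labels are hypothetical, and the 128 x 256 processor grid is just one assumed way to factor 32,768 processors. With a 1-D decomposition each processor needs at least one full plane of the N^3 grid, so at most N processors can be used; with a 2-D decomposition each processor needs at least one full line, so the limit is N^2.

```python
# Illustrative counting only; names and the example processor grid are assumptions.

def max_processors(n: int, decomposition: str) -> int:
    """Upper bound on usable processors for an n^3 grid."""
    if decomposition == "1d":   # each processor holds whole planes: at most n
        return n
    if decomposition == "2d":   # each processor holds whole lines: at most n^2
        return n * n
    raise ValueError("decomposition must be '1d' or '2d'")


def local_block(n: int, p_rows: int, p_cols: int) -> tuple[int, int, int]:
    """Grid points held by one processor in a 2-D decomposition over a
    p_rows x p_cols processor grid (one axis is kept whole for the FFT)."""
    if n % p_rows or n % p_cols:
        raise ValueError("processor grid must divide n evenly in this sketch")
    return (n, n // p_rows, n // p_cols)


if __name__ == "__main__":
    n = 4096
    print("1-D limit:", max_processors(n, "1d"))   # 4096
    print("2-D limit:", max_processors(n, "2d"))   # 16777216, the "16M" on the slide
    # 32,768 processors arranged as an assumed 128 x 256 grid
    print("local block:", local_block(n, 128, 256))
```

Under these assumptions, a 1-D decomposition could not have used 32,768 processors on a 4096^3 grid at all, which is why the 2-D reimplementation was needed for that run.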
Better Neurosurgery Through Cyberinfrastructure
• PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue
• Brain deforms during surgery
• Surgeons must align preoperative brain images with intra-operative images to have the best opportunity for intra-surgical navigation
• Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School, are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment
• Transmission is repeated every hour during a 6-8 hour surgery; transmission and output must take on the order of minutes
• Finite element simulation on a biomechanical model for volumetric deformation is performed at SDSC; output results are sent to BWH, where updated images are shown to surgeons
Community Data Repository: SDSC DataCentral
• Provides “data allocations” on SDSC resources to national science and engineering community
  • Data collection and database hosting
  • Batch oriented access
  • Collection management services
• First broad program of its kind to support research and community data collections and databases
• Comprehensive resources
  • Disk: 300 TB accessible via HPC systems, Web, SRB, GridFTP
  • Databases: DB2, Oracle, MySQL
  • SRB: Collection management
  • Tape: 25 PB, accessible via file system, HPSS, Web, SRB, GridFTP
• 24/7 operations, collection specialists
DataCentral infrastructure includes: Web-based portal, security, networking, UPS systems, web services and software tools
Sampling of Public Data Collections Hosted in DataCentral
• Earth Sciences: Nexrad; ERESE; UCI ESMF; Earthref.org; ERDA; ERR; Tsunami Data
• Biology: AfCS Molecule Pages; Bee Behavior; Biocyc (SRI); CKAAPS; CIPRES; DigEmbryo; Encyclopedia of Life; Gene Ontology; Interpro Mirror; JCSG Data; PDB; TreeBase; Yeast Regulatory Network; Apoptosis Database
• Networking: Backbone Header Traces; Backscatter Data; HPWREN; IMDC; Skitter
• Seismology: 3D Ground Motion Collection; Terashake; CyberShake
• Astronomy: NVO - Digsky; SLOAN; Hayden Planetarium; LUSciD/ENZO; Galactic ALDA HI Survey
• Neuroscience: Salk Neural Basis of Visual Perception; Human Brain Dynamics Resource
• Education: Merced Library; Transana; NSDL
• Physics: AMANDA; Stripe Glasses; Higgs Boson at LHC
Data: Fundamental Component of New Discovery in Science and Engineering
• Identifying Brain Disorders: Remote visualization allows analysis of multi-TB brain datasets without high data transfer costs, expanding productivity more than ten-fold
• New Information about the Heavens: Aggregate information from the world’s largest telescopes is compared to provide new information on the existence and behavior of astronomical objects
• Simulating the Universe from First Principles: Large-scale ENZO runs enable spatial mapping and simulated sky surveys; 26 TB output
How much Data is there?*
• 1 Low Resolution Photo = 100 KiloBytes
• 1 novel = 1 MegaByte
• iPod Shuffle (up to 120 songs) = 512 MegaBytes
• Printed materials in the Library of Congress = 10 TeraBytes
• 1 human brain at the micron level = 1 PetaByte
• SDSC HPSS tape archive = 25 PetaBytes
• All worldwide information in one year = 2 ExaBytes
* Rough/average estimates
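To put these rough estimates on a common scale, the sketch below converts each one to bytes and compares them. The figures are the slide's own; the use of decimal prefixes (1 KB = 10^3 bytes) and the names `UNITS`, `SIZES`, and `ratio` are assumptions made for illustration.

```python
# Rough comparison of the slide's estimates. Decimal (SI) prefixes are assumed.

UNITS = {"KB": 10**3, "MB": 10**6, "TB": 10**12, "PB": 10**15, "EB": 10**18}

SIZES = {
    "low-resolution photo": (100, "KB"),
    "novel": (1, "MB"),
    "iPod Shuffle": (512, "MB"),
    "Library of Congress print": (10, "TB"),
    "human brain at micron level": (1, "PB"),
    "SDSC HPSS tape archive": (25, "PB"),
    "worldwide information per year": (2, "EB"),
}


def to_bytes(value: int, unit: str) -> int:
    """Convert a (value, unit) estimate to bytes."""
    return value * UNITS[unit]


def ratio(larger: str, smaller: str) -> float:
    """How many times `smaller` fits into `larger`."""
    return to_bytes(*SIZES[larger]) / to_bytes(*SIZES[smaller])


if __name__ == "__main__":
    print(ratio("SDSC HPSS tape archive", "Library of Congress print"))
    print(ratio("worldwide information per year", "SDSC HPSS tape archive"))
    print(ratio("iPod Shuffle", "novel"))
```

Under these assumptions, the 25 PB tape archive corresponds to about 2,500 times the printed Library of Congress estimate, and one year of worldwide information to about 80 such archives.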
SDSC Services, Tools, and Technologies for Data Management and Synthesis
• Data Systems
  • SAM/QFS
  • HPSS
  • GPFS
  • SRB
• Data Services
  • Data migration/upload, usage and support (SRB)
  • Database selection and Schema design (Oracle, DB2, MySQL)
  • Database application tuning and optimization
  • Portal creation and collection publication
  • Data analysis (e.g. Matlab) and mining (e.g. WEKA)
  • DataCentral
• Data-oriented Toolkits and Tools
  • Biology Workbench
  • Montage (astronomy mosaicking)
  • Kepler (Workflow management)
  • Vista Volume renderer (visualization), etc.
Cyberinfrastructure Experiences for Graduate Students (CIEG) Program
• Preparing Students for high-end computational and data science and engineering
  • Using Cutting-Edge Resources
  • Anticipating Future Technology Directions and their applicability to your field
• 10-week Summer Program that Partners Students with SDSC experts on SAC Team (Compute, Data, Vis)
From NSF Announcement
• “help foster a generation of researchers for whom such tools are incorporated naturally into advancing the research field.”
• “expand the community of researchers with the necessary skills and experience to conduct sophisticated research involving cyberinfrastructure.”
Thank You kamratha@sdsc.edu www.sdsc.edu