Computing and Storage Resources at the San Diego Supercomputer Center Natasha Balac, Ph.D. UC Associates Program August 3, 2004
What is SDSC? • Founded in 1985 • One of three NSF-funded supercomputer centers • Provides resources to national academic and non-profit community above and beyond what an individual university can provide • Peer-review proposal system, no cost to academics/non-profits • But we are much more
SDSC • Employs nearly 400 researchers, staff and students • Leading edge site for NSF’s National Partnership for Advanced Computational Infrastructure (NPACI) • One of 5 sites of NSF’s TeraGrid/ETF project • Home of many associated activities including • Protein Data Bank • Alliance for Cell Signaling • Cooperative Association for Internet Data Analysis (CAIDA) • High Performance Wireless Research and Education Network (HPWREN) • Geosciences Network (GEON) • Joint Center for Structural Genomics • Protein Kinase Resource, etc.
SDSC’s Mission • To develop and use technology to advance science • We do this through our use of • Hardware • Software • Expertise / Personnel in computation, data management and visualization
A Range of Hardware Resources • 14 TFlops of aggregate compute power • Nearing 100 TFlops across all NSF centers • 5.7 TB aggregate memory • 500 TB SAN file systems • 6 Petabytes of tape archive • Thousands of active users • Part of U.S. TeraGrid initiative • 40 Gb/s backbone connects the center to other supercomputer centers • Wide range of HPC applications
TeraGrid Compute Resources [Diagram: compute, storage, and visualization resources at the TeraGrid sites (SDSC, NCSA, PSC, Caltech, ANL), with network labels 4 Lambdas, LA, and CHI. Resource labels include 1.1 TF Power4 Federation, 2p Madison clusters on Myrinet, 4p Alpha EV68 and 2p Madison nodes on Quadrics, 32p EV7 Marvel, 96 GeForce4 Graphics Pipes, 100 TB DataWulf, 500 TB FCS SAN, and 230 TB FCS SAN.]
TeraGrid Application Targets • Usage exemplars • “traditional” supercomputing made simpler • remote access to data archives and computers • distributed data archive access and correlation • remote rendering and visualization • remote sensor and instrument coupling
Production Software in many research areas • www.npaci.edu/Applications • Applications in a variety of research areas: • Biomolecular Structure • Molecular Mechanics/Dynamics • Quantum Chemistry • Eng. Structural Analysis • Finite Element Methods • Fluid Dynamics • Numerical Libraries • Linear Algebra • Differential Equations • Graphics/Scientific Visualization • Grid Computing • Data Mining and Analysis
Variety of scientific software installed and maintained • BLAST, CLUSTALW, Biology WorkBench, CNS, NAMD, Amber, NWChem, CHARMM, Parallel MOPAC, GAMESS, Gaussian, DataCutter • We’re happy to install your favorite package
Expertise integrating technology and applications • 400 personnel • Experts in wide range of fields • Deploying the largest supercomputers and networks • Building clusters • Designing storage area networks • Portal design • Bioinformatics • Web Services • Cross-disciplinary expertise • Ability to map applications onto hardware efficiently • Understanding of hardware and science
A Range of Personnel Resources Expertise • parallelizing/optimizing code • portal-based access • grid computing • data mining • web services • visualization • Peer-review process • www.paci.org • Multi-year awards possible • Database/Data collection hosting/persistent archiving
Visualization Services http://vis.sdsc.edu/ • Scalable Visualization Toolkit and Applications • Visualization Service Grid • Cancer Center Visualizations • SAC Visualizations • NPACI Visualization Software • Scientific imaging and animation production • Customized visualization solutions • Gaming Grid for Research and Education • Visualization Training • OpenDX workshops • Maya workshops and short courses
Visualization Services Grid [Diagram components: VGrid Portal, Grid Farm, SRB Archive, VGrid Gallery, SVT Grid Services, Workstation Alley]
UCSD Cancer Center Visualizations • James R. Feramisco, Digital Imaging Resource Leader
Volume Visualization of the Orion Nebula • The San Diego Supercomputer Center and The American Museum of Natural History Hayden Planetarium • Hubble Space Telescope images of the Orion Nebula and the HST-10 proplyd.
Astronomical Visualization • Visualization of an Emission Nebula from 3 Terabytes of Simulation Data • Credits: American Museum of Natural History; Dave Nadeau, SDSC; Erik Engquist, SDSC
Center for Visualization Prototypes • Bringing physical prototyping technology into the visualization mainstream • Contact mjb@sdsc.edu for details or go to http://cvp.sdsc.edu
DAKS – Data and Knowledge Systems • DAKS creates data and knowledge cyberinfrastructure for scalable, end-to-end knowledge discovery pipelines in data-intensive computing • Integrated enabling technologies include: • data gathering and data grid tools • web services • massive storage • large-scale databases • data mining • knowledge integration • publishing in digital libraries • long-term preservation in persistent archives
Major Projects • GEON: GEOsciences Network – Integration of multi-disciplinary Earth Science databases • SEEK: Science Environment for Ecological Knowledge – Modeling frameworks, semantic integration, workflow systems for environmental modeling • National Archives and Records Administration (NARA) – Persistent archives and electronic records management • GeoGRID – Research on integrating geospatial information from multiple, heterogeneous sources, including studying the metadata necessary to describe the geospatial content & services, as well as accuracy-aware query processing techniques • Keck Graduate Institute – Twin framework to analyze, model & design robust, complex networks using biological & computational principles
Major Projects • Grid Benchmarks – Defining metrics to measure performance of Grid applications and architectures and to rate their functionality and efficiency • I2T: Information Integration Testbed – Set of tools and technologies that are being developed to provide a testbed for information integration • GriPhyN: Grid Physics Network – Develop & build production-scale data grids • SciDAC/Scientific Data Management – Five-year initiative to develop scientific computing infrastructure for terascale computers to advance research programs in basic energy sciences, biological & environmental research, fusion energy sciences, and high-energy and nuclear physics
Major Projects • Southern California Earthquake Data Center (SCEDC) – Primary archive of earthquake data for southern California • Bridges – Integrated framework for health monitoring of highway bridges & civil infrastructure • INGREIN – Integrated, Geo-Referenced Environmental Information Network • National Virtual Observatory – Federation of over 100 terabytes of astronomical data from more than 50 collections • NPACI Neuroscience – Infrastructure to support the study of brain structure • SRB: Storage Resource Broker – Middleware providing a uniform API to access heterogeneous distributed storage resources
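The SRB entry describes middleware that puts one API in front of heterogeneous storage. A minimal Python sketch of that idea follows. It is not the SRB API: the class names (`StorageBackend`, `MemoryBackend`, `LocalDiskBackend`, `Broker`) and the `scheme:path` naming are hypothetical, chosen only to illustrate the pattern.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical common interface that every storage system implements."""

    @abstractmethod
    def put(self, path, data):
        """Store the bytes `data` under `path`."""

    @abstractmethod
    def get(self, path):
        """Return the bytes stored under `path`."""


class MemoryBackend(StorageBackend):
    """Keeps objects in a dict; stands in for a cache or database."""

    def __init__(self):
        self._objects = {}

    def put(self, path, data):
        self._objects[path] = data

    def get(self, path):
        return self._objects[path]


class LocalDiskBackend(StorageBackend):
    """Stores objects as files under a root directory."""

    def __init__(self, root):
        self._root = root

    def _full(self, path):
        return os.path.join(self._root, path)

    def put(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as handle:
            handle.write(data)

    def get(self, path):
        with open(self._full(path), "rb") as handle:
            return handle.read()


class Broker:
    """Routes 'scheme:path' names to whichever backend owns the scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def _resolve(self, name):
        scheme, _, path = name.partition(":")
        return self._backends[scheme], path

    def put(self, name, data):
        backend, path = self._resolve(name)
        backend.put(path, data)

    def get(self, name):
        backend, path = self._resolve(name)
        return backend.get(path)


def demo():
    """Write to two different backends through the same two calls."""
    broker = Broker()
    broker.register("mem", MemoryBackend())
    broker.register("disk", LocalDiskBackend(tempfile.mkdtemp()))
    broker.put("mem:scan/001", b"image bytes")
    broker.put("disk:scan/002", b"more bytes")
    return broker.get("mem:scan/001"), broker.get("disk:scan/002")
```

The caller only ever uses `Broker.put` and `Broker.get`; supporting another storage system means adding one more `StorageBackend` subclass, with no change to client code.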
Major Projects: DAKS Involvement • TeraGrid – Multi-year effort to build and deploy the world's largest, most comprehensive distributed infrastructure for open scientific research • BIRN: Biomedical Informatics Research Network – Standardizing imaging protocols, developing database schemas around their data, defining processing pipelines for upload & analysis of data, and assembling large imaging caches • OptIPuter – Cyber “infostructure” to support data-intensive scientific research and collaboration • LTER: Long-Term Ecological Research Network – Investigation of long-term ecological phenomena in the U.S. • WIISARD: Wireless Internet Information System for Medical Response in Disasters – Sophisticated wireless technology to coordinate and enhance care of mass casualties in a terrorist attack or natural disaster
Major Projects: DAKS Involvement • BorderSafe – Infrastructure for sharing and evaluation of information between local law enforcement and the Department of Homeland Security • Gene Regulatory Networks – Integrating data from disparate sources to create an interaction graph of gene and protein regulation, then performing graph queries to reveal interesting unknown interactions • Cell-Centered Database – On-line resource for high resolution cell-centered data • ROADNet: Real-time Observatories, Applications, and Data management Network – Integrated information management system & wireless networks to deliver seismic, oceanographic, ecological, hydrological, and physical data to end users in real-time
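The Gene Regulatory Networks entry describes merging data from several sources into one interaction graph and then querying it. A minimal Python sketch of that kind of query is shown here, assuming a simple directed-edge model. The gene names and the two edge lists are made up for illustration and do not come from the project.

```python
from collections import deque


def build_graph(edge_lists):
    """Merge (regulator, target) pairs from several sources into one adjacency map."""
    graph = {}
    for edges in edge_lists:
        for regulator, target in edges:
            graph.setdefault(regulator, set()).add(target)
            graph.setdefault(target, set())
    return graph


def indirect_targets(graph, start):
    """Return nodes reachable from `start` that have no direct edge from it."""
    direct = graph.get(start, set())
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal over regulation edges
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - direct - {start}


# Hypothetical edge lists standing in for two separate data sources.
source_a = [("geneA", "geneB"), ("geneB", "geneC")]
source_b = [("geneC", "geneD"), ("geneA", "geneE")]

graph = build_graph([source_a, source_b])
print(sorted(indirect_targets(graph, "geneA")))  # ['geneC', 'geneD']
```

Neither source alone links `geneA` to `geneD`; the path only appears once the two edge lists are merged, which is the kind of previously unknown interaction the slide refers to.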
BIRN NIH NCRR award to … • Enhance access to 4T – 8T MRI & other imaging modalities for neuroscience studies • Enhance telecommunications & telemedicine efforts for GCRC sites, co-located with NCRR imaging centers • Develop bioinformatics tools & data fusion for PET, CAT, EEG, MRI • Extend to all NCRR Resource sites & expand model to other areas Partners • National Institutes of Health (NIH) • National Science Foundation (NSF) • UCSD • SDSC
BorderSafe Enabling Intelligent, Policy-based Information Sharing Technology • Framework for Automated Sharing and Analysis • Service Oriented Architecture for integrating new and legacy data and analytical resources • Explore and implement strong auditing using evolving Web Service security standards • Crime Analysis Tool Evaluation • Evaluation of current law enforcement analysis and data mining tools • Exploration of intelligent data analysis tool extensions • Policy-based Information Sharing Research • Leverage policy-based resource discovery and sharing research done in Grid Services and apply it to the Homeland Security domain • Inter-organization Law Enforcement Community Building • Discovering and addressing data integration and analysis needs by working directly with domain experts
GEON Enabling Integrated Views of the Earth System by Finding Answers to Vital Questions Estimating flood and landslide potential, groundwater problems, volcanic activity, and soil quality, all with the best data available Accelerating science Through the GEON Portal, researchers will be able to discover relationships of the type that led to plate tectonics in days instead of years Democratizing Grid Technologies Building cyberinfrastructure for a wide range of users, from scientists and educators to policymakers and engineers Building Reusable Cyberinfrastructure A model for the Earth Sciences and beyond
Advanced Database Projects • Research & system development • Infrastructure for data mining, data warehousing, and query processing • Grid data services to make data available to researchers via traditional methods and APIs which allow simple storage and retrieval of data regardless of type, size, and physical • Federation issues • Performance, reliability, authentication • Data location • Network connections between users & data • Need for replication • Google-like tool to search grid data services for data & access content [Diagram: FED_SDSC federation of sources (AfCS, BIRN, EOL, JCSG, PDB, PKR, ToL, Blast, GenBank) through vendor APIs and SQL over DB2, Oracle, MS SQL, MySQL, BLAST, flat files, and XML files; Grid Data Service with Federation Ontology, Federation Master Ontology, and Source Specific Ontology]
Advanced Query Processing • Research & system development • Data Modeling services for scientific applications – volumetric data, multimedia, … • Ontology construction and searching for large-scale problems – query evaluation techniques • Techniques for management of graph-structured information – representation and query language development, browsing tools • Data Warehousing for Interaction Networks – graph views over relational data • Information Integration techniques over multiple data models (spatial, relational, graph…) with Ontologies • Simulation of process networks using Hybrid Functional Petri Nets [Figure captions: Volume algebra for Neuroscience; Modeling and integrating biological pathways; Co-browsing ontologies, atlases, data]
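One way to picture a "graph view over relational data" is a recursive query that treats rows of an ordinary table as edges. The Python sketch below uses the standard `sqlite3` module and a recursive common table expression. The `interaction` table, its columns, and the `p1`..`p4` identifiers are hypothetical and are not taken from the SDSC systems.

```python
import sqlite3


def reachable(conn, start):
    """Nodes reachable from `start` by following rows of the `interaction` table."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT target FROM interaction WHERE source = ?
            UNION  -- UNION drops duplicates, so cycles terminate
            SELECT i.target
            FROM interaction AS i
            JOIN reach AS r ON i.source = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical relational data: each row is one directed interaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interaction (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?)",
    [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p4", "p1")],
)

print(reachable(conn, "p1"))  # ['p1', 'p2', 'p3']
```

The data stays in a plain relational table; the graph traversal exists only in the query, so the same rows can still serve ordinary relational reports.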
Distributed Data Management • Data collecting: sensor systems, object ring buffers and portals • Data organization: collections, manage data context • Data sharing: data grids, manage heterogeneity • Data publication: digital libraries, support discovery • Data preservation: persistent archives, manage technology evolution • Data analysis: processing pipelines, manage knowledge extraction [Table: Storage Resource Broker Collections at SDSC, with columns Project, Size, Files, Users, Agency]
Knowledge and Information Discovery • In Environment (TeraBridge, LTER Network, PRAGMA) • In Homeland Defense (BorderSafe) • Research & system development • Data mining and machine learning • Services-based knowledge discovery infrastructure • Analysis of complex data – real-time streams, sensor networks, remote sensing imagery, microarray data, large text collections • Decision support systems – environmental monitoring, law enforcement, forensics, and homeland security • Support for knowledge discovery projects at SDSC and beyond
Strategic Applications Collaborations • SDSC staff paired with domain scientists for 3-12 month projects • Past successes include • Biomedical imaging (U Michigan) • Brain mapping/computational anatomy (Johns Hopkins) • Computational modeling of the cochlea (U Michigan) • Molecular dynamics in large biomolecular systems (UCSF, TSRI) • SEQUEST (U Washington) • SCWRL (Fox Chase Cancer Center) • Protein Structure prediction (UCSD) • Protein fold recognition and classification (UCSD)
Academic Associate Program for UC Campuses • Resources dedicated to UC campuses to support research endeavors • Program provides University of California (UC) researchers access to the vast array of computational resources at the San Diego Supercomputer Center (SDSC) • Any qualified UC researcher can request supercomputing time, free of charge, through the AAP administrator for his or her respective campus
Academic Associate Program for UC Campuses • High-Performance Computing • Storage • Access to specialized Software, Databases and Archives • Technical Support • Training/Documentation • Early access to new SDSC systems
Academic Associate Program for UC Campuses Computing • DataStar: 10 teraflops IBM Power4-based system with total memory of 4.2 terabytes • TeraGrid: 4.3 teraflops IA-64 system Storage • Petabyte-scale archival storage system • SAN disk array with a total capacity of more than 500 terabytes • http://datacentral.sdsc.edu/
Academic Associate Program for UC Campuses Access to specialized software/Databases/Archives • A variety of powerful software applications covering a range of disciplines including • Medicine, bioscience, physics, astronomy, chemistry, etc. • Large-scale data libraries, such as the • Protein Data Bank (PDB) • National Virtual Observatory (NVO) • 2-Micron All Sky Survey (2MASS) • User-friendly software for accessing large data collections • NPACKage – mature middleware and applications for grid computing, communication and archiving • A variety of data analysis, mining and visualization tools
Academic Associate Program for UC Campuses Technical support • From SDSC Scientific Computing Services, including consulting for parallel programming, optimization, porting, etc. Training • Quarterly workshops on parallel computing • 1-2 day on-site workshops • Priority seating at SDSC workshops • Web-based training • Special week-long summer institute • Focused work on participant’s projects • Student expenses paid • Data-intensive and grid applications focus this year Early access to new SDSC systems (DataStar)
Academic Associate Program for UC Campuses • How to apply? • Any UC researcher can request supercomputing time online http://www.sdsc.edu/aap.html • Campus representatives are available at each UC campus to help researchers with any questions or problems they may have regarding the Academic Associates program
Some collaboration ideas • Development of new algorithms • Optimization and parallelization of code • Development of portal interfaces to applications and other resources • Assistance writing successful proposals for SDSC resources • Assistance with cluster setup and maintenance • Hardware acquisitions, benchmarking and performance analysis • Data services: databases, data collections and data mining
And still more • Visualization assistance • Automation of data collection from wet labs • expertise in sensor/instrument data collection, CAL IT2 • Hosting of visiting scientists and students • Joint proposals • NSF’s ITR programs • TeraGrid participation • NIH initiatives
We’re interested in working together • Q&A • natashab@sdsc.edu • http://www.sdsc.edu • UC Academic Associates program • http://www.sdsc.edu/aap.html