TeraGrid Science Gateways Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways wilkinsn@sdsc.edu TeraGrid ARCH meeting, January 29, 2009
Gateway use increased in 2008 • 0.5M hours used on community accounts in 2007 • 2.5M hours used on community accounts in 2008 • Big users • SCEC tera3d • Over 1M hours for hazard map calculations • GridChem • Computational chemistry • Robetta • Protein structure prediction using David Baker’s award-winning Rosetta code • Up-and-coming users with large awards • SIDGrid, 1M hours
SCEC using gateway to produce hazard map • PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years. • High resolution map, significant CPU use
Social Informatics Data Grid • Heavy use of “multimodal” data. • Subject might be viewing a video, while a researcher collects heart rate and eye movement data. • Events must be synchronized for analysis; large datasets result • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov
Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others. • SIDGrid enables a number of capabilities. • Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact. • Geographically distant researchers can collaborate on the analysis of the same data set. • Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts. • All researchers now have access to the highest quality computational resources • SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding (decoding and encoding between compression formats), algorithms for pitch analysis of audio tracks, and fMRI image analysis • SIDGrid is unique among social science data archive projects • Focused on streaming data which change over time • Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously • Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK
New Gateway Inquiries • Dr. Robert Boissy, Center for Genomic Sciences (CGS), Allegheny-Singer Research Institute, Allegheny General Hospital • CGS is early adopter of DNA sequencing platform from 454 Life Sciences/Roche • Significant upgrade to this sequencing platform • Massively parallel, clone-free DNA pyrosequencing technology well suited for a variety of applications http://www.roche-applied-science.com/publications/multimedia/genome_sequencer/flx_presentation/wbt.htm
FLX platform upgrade requires sites to support a 64-bit off-instrument cluster • (1) Sequencing instrument run is controlled by a dedicated computer on the FLX sequencing instrument • Outputs relatively large set of binary files (several tens of GB uncompressed) • (2) Off-loading of imaging data to home cluster for image-processing and signal-processing • Outputs relatively small set of text files (a few GB uncompressed) • (3) Post-signal-processing steps such as genome assembly on home cluster • Currently, steps (1) and (2) are carried out "on-the-fly" on the FLX sequencing instrument during the course of a sequencing run.
Purchase and maintenance of home cluster prohibitive for smaller sites • TeraGrid DAC a reasonable alternative • One-way data transfer, with time-limited archiving of the raw instrument output on TG • Roche/454 off-instrument software ported to TG compute resources • Advanced Support requested for this • Wenjun Wu at U Chicago working with Dr. Boissy’s team • Roche supportive of collaboration • "We wholeheartedly support your idea of using TeraGrid resources to process data generated by our system. Please let us know what we can do to help you achieve your goal - we have permissive licensing arrangements which will not stand in the way of your project. In addition, if the [image processing and signal processing] software needs some customization (though I am quite certain that will not be required), we are prepared to provide assistance in this respect."
Significant benefit to the biomedical research community • Standardized workflow for the processing and storage of Roche/454 pyrosequencing data • Potentially saving granting agencies such as the NSF and their grant recipients millions of dollars • Enhancing the productivity of biomedical research scientists
Analysis of Analytical Ultracentrifugation Data • Dr. Borries Demeler, The University of Texas Health Science Center at San Antonio Dept. of Biochemistry • The Center for Analytical Ultracentrifugation of Macromolecular Assemblies • Assist researchers with solution-state characterization of biological macromolecules and macromolecular assemblies by means of analytical ultracentrifugation. • Analytical ultracentrifugation services to outside investigators in both academia and industry
UltraScan represents a comprehensive data analysis software package for hydrodynamic data from analytical ultracentrifugation experiments • Integrated data editing and analysis environment • Portable graphical user interface. • Beowulf module for Monte Carlo analysis • MySQL database backend for data management
Laboratory Information Management System (LIMS) portal • Management of analytical ultracentrifugation (AUC) data for single users or entire facilities • Support for both the storage and the analysis of data • Retrieval of experimental data and results, submission of analysis requests to the supercomputer, sharing of data with collaborators • Seamless integration with UltraScan software • HPC facilities for 2-dimensional spectrum analysis and genetic algorithm analysis • 32 active institutions today [12/17/08] • International collaboration • Supercomputers in Europe to assist with the load • Technical University of Munich • Juelich Supercomputing Center
Dr. Saul Kravitz, Director of Bioinformatics Software, J. Craig Venter Institute • DAC awarded • Portal to NIAID Bioinformatics Resource Centers • Computation services such as Annotation, Homology Search, etc. to the 4 bioinformatics resource centers, or to members of their research community • Developed infrastructure that should easily port to TeraGrid for large-scale parallel execution
Urgent Computing Interest • NASA ROSES Research Proposal • “Spatial Decision Support System for Wildfire Emergency Response and Evacuation”, Dr. Douglas Stow, SDSU • The SDSS Web portal will provide two major functions for emergency evacuation services: • Seamless data/mapping resource integration and • High performance fire spread and evacuation modeling capability. • Automate the data collection, data input formatting, GIS model processing, and rendering of model results on 2D maps and 3D globes • FARSITE (Fire Area Simulator), developed by United States Department of Agriculture (USDA) Forest Service • Integrates existing models for surface fire, crown fire, spotting, post-frontal combustion, and fire acceleration into a two-dimensional fire growth model (http://www.fire.org).
TeraGrid Pathways Activities • 2 Gateway components • Adapt gateways for educational use by underrepresented communities • GEON – SDSC, Navajo Tech • Teach participants from underrepresented communities how to build gateways • PolarGrid – IU, ECSU
Navajo Technical College and gateways • Incorporating the use of gateways in their curricula • GEON, GISolve areas of initial interest • Early work includes installation of GEON node at NTC, adaptations for faculty datasets: • GIS instructor • GIS for network layouts • Maps of animal locations and population spread • DNA data for environmental science faculty
PolarGrid • Cyberinfrastructure Center for Polar Science (CICPS) • Experts in polar science, remote sensing and cyberinfrastructure • Indiana, ECSU, CReSIS • Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland • Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v Source: Geoffrey Fox
Components of PolarGrid • Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster • Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State University (ECSU) split between research, education and training. • Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system • Access to expensive data • High-end resources for analysis • TG Pathways efforts • Extend gateway and teach ECSU students how to build a gateway Source: Geoffrey Fox
Oct-Dec 08 activities • Gateway documentation moves into production • www.teragrid.org/gateways • PI and developer pages • Rollover “on this page” • Getting started guide • Success stories (need more) • Rotating images (need more) • Recommended software area, anyone can add to this list • Original primer content from Anurag Shankar at IU includes many “best practices” • Gateway volunteer Ian Stokes-Rees from Harvard • Structural biology to OSG and TG • Potential international collaboration with Chinese Academy of Sciences • Shaowen Wang, GISolve
Oct-Dec 08 activities • SimpleGrid deployed on Quarry’s gateway hosting environment • SC08 megajob BOF • GCE08 workshop at SC08 • JavaScript Commodity Grid (CoG) Kit demo from Gregor von Laszewski, Rochester Institute of Technology • If you are interested in using JavaScript CoG for your gateway, Gregor has graduate students he can assign to help, gregor at rit.edu • http://www.collab-ogce.org/gce08 • Eucalyptus demo from Rich Wolski • OGCE overview from Marlon Pierce
40 institutional members • 9 foreign affiliates • Researchers request synthetic seismograms for any given earthquake • Allows scientists to understand the ground motion associated with any given earthquake • Requested and received advanced support from TeraGrid
CIG Update part 1 • CIG is a software repository (specfem3d and others) • runs forward simulation from description of seismic event - get seismograms automatically delivered from real earthquakes • symmetric tensor (6 numbers) can describe an earthquake • few numbers go into large run, well-suited to wrapping for the web, users specify stations for output • grid up the earth, timesteps, usual types of simulation components • Can choose from a few competing models, can also evaluate accuracy of different models • how accurate are the models for different scenarios? • data from all earthquakes everywhere over last 30-40 years • use results to back into the right earth model
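The "symmetric tensor (6 numbers)" point can be illustrated with a minimal sketch. It assumes the six components are named in the spherical convention (Mrr, Mtt, Mpp, Mrt, Mrp, Mtp); the function names are hypothetical and are not taken from the CIG portal or specfem3d. The scalar moment uses one common definition, the square root of half the sum of squared components.

```python
import math


def moment_tensor(mrr, mtt, mpp, mrt, mrp, mtp):
    """Expand six independent components into the full symmetric 3x3 tensor.

    Component names follow the spherical (r, theta, phi) convention; this
    naming is an assumption made for illustration.
    """
    return [
        [mrr, mrt, mrp],
        [mrt, mtt, mtp],
        [mrp, mtp, mpp],
    ]


def scalar_moment(tensor):
    """One common definition: sqrt(sum of squared components / 2)."""
    total = sum(value * value for row in tensor for value in row)
    return math.sqrt(total / 2.0)


if __name__ == "__main__":
    m = moment_tensor(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    # Symmetry means only six of the nine entries are independent.
    print(all(m[i][j] == m[j][i] for i in range(3) for j in range(3)))
```

Because so few numbers define the source, a web form with six fields plus a station list is enough to describe a run, which is why this kind of simulation wraps well for a gateway.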
CIG Update part 2 • people log on to portal, but everyone sees all runs, not a high sensitivity data set • runs on lonestar, 2.5 hours, depends on how fine a mesh is used - workflow very similar to batch mode, output looks like list of batch jobs • global portal in a month or two - web-based interface to batch computation, but also RPC-based interface • development of new earth models - proof-carrying code, way to allow end users to inject code into a system - can make restricted sets of the Python language that are safe - no forks, no opening files - signed .o files?
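The idea of a restricted Python subset ("no forks, no opening files") can be sketched with the standard-library `ast` module. This is an illustration only: the function name and the list of forbidden calls are assumptions, it is not the mechanism CIG planned, and a static check like this is not a complete sandbox.

```python
import ast

# Hypothetical deny-list chosen for illustration.
FORBIDDEN_CALLS = {"open", "exec", "eval", "compile", "__import__", "input"}


def find_violations(source):
    """Return human-readable reasons why `source` falls outside the subset."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Without imports, user code cannot reach os.fork and similar calls.
            violations.append(f"line {node.lineno}: import not allowed")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FORBIDDEN_CALLS
        ):
            violations.append(f"line {node.lineno}: call to {node.func.id}() not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attributes are a common route around name-based checks.
            violations.append(f"line {node.lineno}: access to {node.attr} not allowed")
    return violations


if __name__ == "__main__":
    print(find_violations("density = 3300.0 * (1.0 + 0.01 * depth)"))
    print(find_violations("import os\nos.fork()"))
```

A gateway would run a check of this kind before accepting a user-supplied earth model, and would still need process-level isolation for anything it executes.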
Dynamic Accounts and the viz gateway – part 1 • Released in production just before SC • Lack of data segregation with community account was a problem • Gateway used for many different domains • File save and upload capabilities meant that users could potentially overwrite one another's data • wanted individual accounts on back end to protect individual's data • Use Globus dynamic account service • individual credentials for each of the users - dynamic accounts mapped to gateway community account on the back end, dynamic accounts never recycled • can create unix logins on the fly or have a pool of accounts (the latter is what viz gateway does) • some places create an account for each job that was run, same user could use all sorts of different accounts • viz gateway doesn't do this, a dynamic account stays with a given user forever though files are purged after 30 days of inactivity
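The pooled-account policy described above (one account per user, never recycled, files purged after 30 days of inactivity) can be sketched as follows. The class and method names are hypothetical and do not reflect the Globus dynamic account service API; the sketch only models the bookkeeping.

```python
from datetime import datetime, timedelta

PURGE_AFTER = timedelta(days=30)  # inactivity window stated on the slide


class AccountPool:
    """Hypothetical model of a pre-created pool of back-end accounts."""

    def __init__(self, account_names):
        self._free = list(account_names)
        self._assigned = {}      # gateway user -> dynamic account, permanent
        self._last_active = {}   # gateway user -> time of last activity

    def account_for(self, user, now):
        """Return the user's account, assigning one from the pool on first use."""
        if user not in self._assigned:
            if not self._free:
                raise RuntimeError("account pool exhausted")
            self._assigned[user] = self._free.pop(0)
        self._last_active[user] = now
        return self._assigned[user]

    def users_to_purge(self, now):
        """Users whose files are due for purging; the account mapping is kept."""
        return sorted(
            user for user, seen in self._last_active.items()
            if now - seen > PURGE_AFTER
        )


if __name__ == "__main__":
    pool = AccountPool(["dyn001", "dyn002"])
    start = datetime(2009, 1, 1)
    print(pool.account_for("alice", start))
    print(pool.account_for("alice", start + timedelta(days=10)))
```

Keeping the mapping permanent is what distinguishes this from the per-job account approach mentioned on the slide, where the same user could end up with many different accounts.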
Dynamic Accounts and the viz gateway – part 2 • Users can only start standard set of jobs • Revocation service? • yes, can disable from within dynamic account service • Filter in AMIE packet processing, recognizes that dynamic accounts are used (PBS thinks this runs as a dynamic account) and attributes usage to community account in AMIE packet that goes to TGCDB • keep track of local and global jobids for each job and user so they can go back and map usage to the right user • Services available • Proxy manager, file management, third party transfer • Paraview portlet, around for years, since beginning of viz gateway - when you log in, you see only services you can use • Volume rendering, used in anatomy course at U Chicago, outside class students can't explore anatomy datasets • Will add dynamic accounts to gateway documentation, high level description and pointers to more extensive documentation
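The accounting step above (charge the community account centrally, keep local and global job IDs so usage can be mapped back to a user) can be sketched as below. All field and function names are hypothetical; this does not reproduce the actual accounting packet format or the TGCDB schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-job record kept by the gateway."""
    local_job_id: str      # ID assigned by the local scheduler
    global_job_id: str     # ID used in central accounting
    dynamic_account: str   # account the scheduler saw the job run as
    gateway_user: str      # person behind the dynamic account
    cpu_hours: float


def attribute_usage(jobs, community_account):
    """Charge every job to the community account and total usage per user."""
    charged = [
        {
            "account": community_account,
            "global_job_id": job.global_job_id,
            "cpu_hours": job.cpu_hours,
        }
        for job in jobs
    ]
    per_user = defaultdict(float)
    for job in jobs:
        per_user[job.gateway_user] += job.cpu_hours
    return charged, dict(per_user)


if __name__ == "__main__":
    jobs = [
        JobRecord("101.pbs", "g-1", "dyn001", "alice", 2.5),
        JobRecord("102.pbs", "g-2", "dyn002", "bob", 4.0),
        JobRecord("103.pbs", "g-3", "dyn001", "alice", 1.5),
    ]
    charged, per_user = attribute_usage(jobs, "vizgateway")
    print(per_user)
```

The central record never mentions the dynamic account, while the gateway-side totals preserve who actually used the time.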