TeraGrid Science Gateways

Presentation Transcript


  1. TeraGrid Science Gateways Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways wilkinsn@sdsc.edu TeraGrid ARCH meeting, January 29, 2009

  2. Gateway use increased in 2008 • 0.5M hours used on community accounts in 2007 • 2.5M hours used on community accounts in 2008 • Big users • SCEC tera3d • Over 1M hours for hazard map calculations • GridChem • Computational chemistry • Robetta • Protein structure prediction using David Baker’s award-winning Rosetta code • Up-and-coming users with large awards • SIDGrid, 1M hours

  3. SCEC using gateway to produce hazard map • PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway • Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years. • High resolution map, significant CPU use

  4. Social Informatics Data Grid • Heavy use of “multimodal” data. • Subject might be viewing a video, while a researcher collects heart rate and eye movement data. • Events must be synchronized for analysis, large datasets result • Extensive analysis capabilities are not something that each researcher should have to create for themselves. http://www.ci.uchicago.edu/research/files/sidgrid.mov
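The synchronization requirement on this slide can be illustrated with a small sketch. It is not SIDGrid code: the function name `nearest_sample` and the sample data are hypothetical, and nearest-timestamp matching is only one of several plausible ways to align streams recorded at different rates.

```python
from bisect import bisect_left

def nearest_sample(times, values, t):
    """Return the value whose timestamp is closest to t.

    `times` must be sorted ascending and non-empty; `values[i]` is the
    measurement taken at `times[i]`. Ties go to the earlier sample.
    """
    i = bisect_left(times, t)
    if i == 0:
        return values[0]          # t is before the first sample
    if i == len(times):
        return values[-1]         # t is after the last sample
    before, after = times[i - 1], times[i]
    return values[i] if (after - t) < (t - before) else values[i - 1]

# Hypothetical data: heart rate sampled once per second,
# video events logged at arbitrary times (seconds).
heart_times = [0.0, 1.0, 2.0]
heart_rates = [60, 62, 65]
video_events = [0.4, 1.6]

# Pair each video event with the closest heart-rate reading.
aligned = [(t, nearest_sample(heart_times, heart_rates, t)) for t in video_events]
```

A real multimodal pipeline would also need to handle clock drift between instruments, which this sketch ignores.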

  5. Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others. • SIDGrid enables a number of capabilities. • Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact. • Geographically distant researchers can collaborate on the analysis of the same data set. • Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts. • All researchers now have access to the highest quality computational resources • SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding (decoding and encoding between compression formats), algorithms for pitch analysis of audio tracks, and fMRI image analysis • SIDGrid is unique among social science data archive projects • Focused on streaming data which change over time • Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously • Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK

  6. New Gateway Inquiries • Dr. Robert Boissy, Center for Genomic Sciences (CGS), Allegheny-Singer Research Institute, Allegheny General Hospital • CGS is an early adopter of the DNA sequencing platform from 454 Life Sciences/Roche • Significant upgrade to this sequencing platform • Massively parallel, clone-free DNA pyrosequencing technology well suited for a variety of applications http://www.roche-applied-science.com/publications/multimedia/genome_sequencer/flx_presentation/wbt.htm

  7. FLX platform upgrade requires sites to support a 64-bit off-instrument cluster • Sequencing instrument run is controlled by a dedicated computer on the FLX sequencing instrument • Outputs relatively large set of binary files (several tens of GB uncompressed) • Off-loading of imaging data to home cluster for image-processing and signal-processing • Outputs relatively small set of text files (a few GB uncompressed) • Post-signal-processing steps such as genome assembly on home cluster • Currently, steps (1) and (2) are carried out "on-the-fly" on the FLX sequencing instrument during the course of a sequencing run.

  8. Purchase and maintenance of home cluster prohibitive for smaller sites • TeraGrid DAC a reasonable alternative • One-way data transfer, with time-limited archiving of the raw instrument output on TG • Roche/454 off-instrument software ported to TG compute resources • Advanced Support requested for this • Wenjun Wu at U Chicago working with Dr. Boissy’s team • Roche supportive of collaboration • "We wholeheartedly support your idea of using TeraGrid resources to process data generated by our system. Please let us know what we can do to help you achieve your goal - we have permissive licensing arrangements which will not stand in the way of your project. In addition, if the [image processing and signal processing] software needs some customization (though I am quite certain that will not be required), we are prepared to provide assistance in this respect."

  9. Significant benefit to the biomedical research community • Standardized workflow for the processing and storage of Roche/454 pyrosequencing data • Potentially saving granting agencies such as the NSF and their grant recipients millions of dollars • Enhancing the productivity of biomedical research scientists

  10. Analysis of Analytical Ultracentrifugation Data • Dr. Borries Demeler, The University of Texas Health Science Center at San Antonio Dept. of Biochemistry • The Center for Analytical Ultracentrifugation of Macromolecular Assemblies • Assist researchers with solution-state characterization of biological macromolecules and macromolecular assemblies by means of analytical ultracentrifugation. • Analytical ultracentrifugation services to outside investigators in both academia and industry

  11. UltraScan is a comprehensive data analysis software package for hydrodynamic data from analytical ultracentrifugation experiments • Integrated data editing and analysis environment • Portable graphical user interface • Beowulf module for Monte Carlo analysis • MySQL database backend for data management

  12. Laboratory Information Management System (LIMS) portal • Management of analytical ultracentrifugation (AUC) data for single users or entire facilities • Support for both the storage and the analysis of data • Retrieval of experimental data and results, submission of analysis requests to the supercomputer, and sharing of data with collaborators • Seamless integration with UltraScan software • HPC facilities for 2-dimensional spectrum analysis and genetic algorithm analysis • 32 active institutions today [12/17/08] • International collaboration • Supercomputers in Europe to assist with the load • Technical University of Munich • Juelich Supercomputing Centre

  13. Dr. Saul Kravitz, Director of Bioinformatics Software, J. Craig Venter Institute • DAC awarded • Portal to NIAID Bioinformatics Resource Centers • Computation services such as Annotation, Homology Search, etc. to the 4 bioinformatics resource centers, or to members of their research community • Developed infrastructure that should easily port to TeraGrid for large-scale parallel execution

  14. Urgent Computing Interest • NASA ROSES Research Proposal • “Spatial Decision Support System for Wildfire Emergency Response and Evacuation”, Dr. Douglas Stow, SDSU • The SDSS Web portal will provide two major functions for emergency evacuation services: • Seamless data/mapping resource integration and • High performance fire spread and evacuation modeling capability. • Automate the data collection, data input formatting, GIS model processing, and rendering of model results on 2D maps and 3D globes • FARSITE (Fire Area Simulator), developed by United States Department of Agriculture (USDA) Forest Service • Integrates existing models for surface fire, crown fire, spotting, post-frontal combustion, and fire acceleration into a two-dimensional fire growth model (http://www.fire.org).

  15. TeraGrid Pathways Activities • 2 Gateway components • Adapt gateways for educational use by underrepresented communities • GEON – SDSC, Navajo Tech • Teach participants from underrepresented communities how to build gateways • PolarGrid – IU, ECSU

  16. Navajo Technical College and gateways • Incorporating the use of gateways in their curricula • GEON, GISolve areas of initial interest • Early work includes installation of GEON node at NTC, adaptations for faculty datasets: • GIS instructor • GIS for network layouts • Maps of animal locations and population spread • DNA data for environmental science faculty

  17. PolarGrid • Cyberinfrastructure Center for Polar Science (CICPS) • Experts in polar science, remote sensing and cyberinfrastructure • Indiana, ECSU, CReSIS • Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland • Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v Source: Geoffrey Fox

  18. Components of PolarGrid • Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster • Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State University (ECSU), split between research, education and training. • Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system • Access to expensive data • High-end resources for analysis • TG Pathways efforts • Extend gateway and teach ECSU students how to build a gateway Source: Geoffrey Fox

  19. Oct-Dec 08 activities • Gateway documentation moves into production • www.teragrid.org/gateways • PI and developer pages • Rollover “on this page” • Getting started guide • Success stories (need more) • Rotating images (need more) • Recommended software area, anyone can add to this list • Original primer content from Anurag Shankar at IU includes many “best practices” • Gateway volunteer Ian Stokes-Rees from Harvard • Structural biology to OSG and TG • Potential international collaboration with Chinese Academy of Sciences • Shaowen Wang, GISolve

  20. Oct-Dec 08 activities • SimpleGrid deployed on Quarry’s gateway hosting environment • SC08 megajob BOF • GCE08 workshop at SC08 (http://www.collab-ogce.org/gce08) • JavaScript Commodity Grid (CoG) Kit demo from Gregor von Laszewski, Rochester Institute of Technology • If you are interested in using JavaScript CoG for your gateway, Gregor has graduate students he can assign to help, gregor at rit.edu • Eucalyptus demo from Rich Wolski • OGCE overview from Marlon Pierce

  21. 40 institutional members • 9 foreign affiliates • Researchers request synthetic seismograms for any given earthquake • Allows scientists to understand the ground motion associated with any given earthquake • Requested and received advanced support from TeraGrid

  22. CIG Update part 1 • CIG is a software repository (specfem3d and others) • runs forward simulation from description of seismic event - get seismograms automatically delivered from real earthquakes • symmetric tensor (6 numbers) can describe an earthquake • few numbers go into large run, well-suited to wrapping for the web, users specify stations for output • grid up the earth, timesteps, usual types of simulation components • Can choose from a few competing models, can also evaluate accuracy of different models • how accurate are the models for different scenarios? • data from all earthquakes everywhere over last 30-40 years • use results to back into the right earth model
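The remark that a symmetric tensor of six numbers can describe an earthquake can be made concrete with a short sketch. The component names and their ordering below are hypothetical and are not taken from specfem3d or the CIG portal; the scalar moment formula is one common convention, shown only to illustrate how few inputs drive a large run.

```python
import math

def moment_tensor(mxx, myy, mzz, mxy, mxz, myz):
    """Expand six independent components into the full symmetric 3x3 tensor."""
    return [
        [mxx, mxy, mxz],
        [mxy, myy, myz],
        [mxz, myz, mzz],
    ]

def is_symmetric(m):
    """Check M[i][j] == M[j][i] for every pair of indices."""
    return all(m[i][j] == m[j][i] for i in range(3) for j in range(3))

def scalar_moment(m):
    """Scalar moment under the convention M0 = sqrt(sum(M_ij^2) / 2)."""
    total = sum(m[i][j] ** 2 for i in range(3) for j in range(3))
    return math.sqrt(total / 2.0)

# Hypothetical source with a single nonzero off-diagonal pair.
source = moment_tensor(0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
```

Because the whole source description fits in six floats plus a location and time, it maps naturally onto a small web form, which matches the slide's point about wrapping the code for the web.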

  23. CIG Update part 2 • people log on to portal, but everyone sees all runs, not a high sensitivity data set • runs on Lonestar, 2.5 hours, depends on how fine a mesh is used - workflow very similar to batch mode, output looks like list of batch jobs • global portal in a month or two - web-based interface to batch computation, but also RPC-based interface • development of new earth models - proof-carrying code, way to allow end users to inject code into a system - can make restricted sets of the Python language that are safe - no forks, no opening files - signed .o files?
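One way to picture a "restricted set of the Python language" is a static check that rejects user code before it runs. The sketch below is an assumption about how such a filter might look, not the mechanism CIG used: the rule set and the name `find_violations` are hypothetical. It blocks imports (so `os.fork` is unreachable), calls to `open` and a few dynamic-execution builtins, and double-underscore attribute access.

```python
import ast

# Hypothetical rule set: builtins the submitted code may not call by name.
FORBIDDEN_CALLS = {"open", "exec", "eval", "compile", "__import__"}

def find_violations(source):
    """Return a list of reasons the submitted source would be rejected."""
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # No imports means no access to os.fork, subprocess, etc.
            violations.append(f"line {node.lineno}: import is not allowed")
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in FORBIDDEN_CALLS):
            violations.append(
                f"line {node.lineno}: call to {node.func.id}() is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Dunder attributes are a common route around name-based filters.
            violations.append(
                f"line {node.lineno}: access to {node.attr} is not allowed")
    return violations

safe = find_violations("density = 2.7 * depth + 0.1")
unsafe = find_violations("import os\nos.fork()")
```

A syntactic filter like this is not a complete sandbox on its own; it would normally be combined with process-level isolation on the back end.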

  24. Dynamic Accounts and the viz gateway – part 1 • Released in production just before SC • Lack of data segregation with community account was a problem • Gateway used for many different domains • File save and upload capabilities meant that users could potentially overwrite one another's data • wanted individual accounts on back end to protect individual's data • Use Globus dynamic account service • individual credentials for each of the users - dynamic accounts mapped to gateway community account on the back end, dynamic accounts never recycled • can create Unix logins on the fly or have a pool of accounts (the latter is what viz gateway does) • some places create an account for each job that was run, same user could use all sorts of different accounts • viz gateway doesn't do this, a dynamic account stays with a given user forever though files are purged after 30 days of inactivity
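The pooled-account policy described on this slide can be sketched as follows. This is an illustration only and does not use the Globus dynamic account service API: the class `AccountPool`, its method names, and the account names are all hypothetical. It models three stated rules: accounts come from a fixed pool, an account stays with its user permanently, and files become eligible for purging after 30 days of inactivity.

```python
from datetime import datetime, timedelta

PURGE_AFTER = timedelta(days=30)  # inactivity window stated on the slide

class AccountPool:
    """Hypothetical model of a pool of pre-created back-end accounts."""

    def __init__(self, accounts):
        self._free = list(accounts)   # accounts not yet handed out
        self._assigned = {}           # gateway user -> dynamic account
        self._last_active = {}        # dynamic account -> last use time

    def account_for(self, user, now):
        """Return the user's account, assigning one on first use."""
        if user not in self._assigned:
            if not self._free:
                raise RuntimeError("dynamic account pool exhausted")
            # Assignment is permanent: accounts are never recycled.
            self._assigned[user] = self._free.pop(0)
        account = self._assigned[user]
        self._last_active[account] = now
        return account

    def accounts_to_purge(self, now):
        """Accounts whose files are due for purging (the mapping is kept)."""
        return sorted(a for a, t in self._last_active.items()
                      if now - t >= PURGE_AFTER)

pool = AccountPool(["dyn001", "dyn002"])
t0 = datetime(2009, 1, 1)
alice = pool.account_for("alice", t0)
bob = pool.account_for("bob", t0)
# Alice returns 20 days later and gets the same account back.
alice_again = pool.account_for("alice", t0 + timedelta(days=20))
stale = pool.accounts_to_purge(t0 + timedelta(days=31))
```

Keeping the user-to-account mapping after a purge is what distinguishes this policy from the per-job accounts some other sites create.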

  25. Dynamic Accounts and the viz gateway – part 2 • Users can only start standard set of jobs • Revocation service? • yes, can disable from within dynamic account service • Filter in AMIE packet processing, recognizes that dynamic accounts are used (PBS thinks this runs as a dynamic account) and attributes usage to community account in AMIE packet that goes to TGCDB • keep track of local and global jobids for each job and user so they can go back and map usage to the right user • Services available • Proxy manager, file management, third party transfer • Paraview portlet, around for years, since beginning of viz gateway - when you log in, you see only services you can use • Volume rendering, used in anatomy course at U Chicago, outside class students can't explore anatomy datasets • Will add dynamic accounts to gateway documentation, high level description and pointers to more extensive documentation
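The accounting flow on this slide has two sides: usage is reported to the central database under the community account, while the gateway keeps enough bookkeeping to trace each job back to an individual. The sketch below illustrates that split under stated assumptions. The record fields, the dictionary layout, and the account name `vizgateway` are hypothetical and do not reflect the real accounting packet format.

```python
from dataclasses import dataclass

COMMUNITY_ACCOUNT = "vizgateway"  # hypothetical community account name

@dataclass
class JobRecord:
    local_job_id: str      # ID assigned by the local batch system
    global_job_id: str     # ID assigned by the gateway
    dynamic_account: str   # account the batch system saw the job run as
    gateway_user: str      # the person who actually launched the job
    cpu_hours: float

def to_usage_report(job):
    """Rewrite a job's usage so it is charged to the community account."""
    return {
        "account": COMMUNITY_ACCOUNT,
        "local_job_id": job.local_job_id,
        "global_job_id": job.global_job_id,
        "cpu_hours": job.cpu_hours,
    }

def usage_by_user(jobs):
    """Gateway-side view: total hours per individual gateway user."""
    totals = {}
    for job in jobs:
        totals[job.gateway_user] = totals.get(job.gateway_user, 0.0) + job.cpu_hours
    return totals

jobs = [
    JobRecord("1001", "g-1", "dyn001", "alice", 1.5),
    JobRecord("1002", "g-2", "dyn002", "bob", 2.0),
    JobRecord("1003", "g-3", "dyn001", "alice", 2.5),
]
reports = [to_usage_report(j) for j in jobs]
per_user = usage_by_user(jobs)
```

The central report never mentions the dynamic account or the individual, so the local and global job IDs are the only keys that let the gateway map charged usage back to a person.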
