
Science Gateways and their tremendous potential for science

Science Gateways and their tremendous potential for science. Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways San Diego Supercomputer Center wilkinsn@sdsc.edu. Overview. What are Science Gateways? What is TeraGrid? Why TeraGrid and Gateways? Examples of Success

Presentation Transcript


  1. Science Gateways and their tremendous potential for science Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways San Diego Supercomputer Center wilkinsn@sdsc.edu

  2. Overview • What are Science Gateways? • What is TeraGrid? • Why TeraGrid and Gateways? • Examples of Success • How Does This Help Me?

  3. Phenomenal Impact of the Internet on Scientific Research: Only 15 years since the release of Mosaic! • Very rapid changes in how science is conducted • 1988: National Center for Biotechnology Information (NCBI) BLAST server; search results sent by email; still a working portal today • 1992: Mosaic web browser developed • 1995: “International Protein Data Bank Enhanced by Computer Browser” • 2004: TeraGrid project director Rick Stevens recognized the growth in scientific portal development and proposed the Science Gateway Program • Ensuing explosion of digital information • Need for analysis in a growing number of scientific areas

  4. Very Rapid Changes in Web Usability • First generation • Static Web pages • Second generation • Dynamic pages, database interfaces, CGI • Lacked the ease of use of desktop applications • Third generation • True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web • These new applications will enable remarkable new uses of the Web in the organizational workplace and on the Internet • Fourth generation • Web 2.0 • Source: Screen Porch White Paper, The University of Western Ontario (1998)

  5. Gateways are a Natural Extension of Internet Developments • Three common types of gateways • Web portal with users in front and services in back • Client-server model in which application programs run on users' machines (e.g., workstations and desktops) and access services • Bridges across multiple grids, allowing communities to use both community-developed grids and shared grids • Continued rapid change ahead: gateways must be adaptable, and they can provide some nimbleness

  6. Arden Bement, Senate Testimony, April 19, 2007: “Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.” “In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.” Gateways are a terrific example of interfaces that can support transformative science

  7. Gateway Idea Resonates with Scientists • Capabilities provided by the Web are easy to envision because we use them in everyday life • Researchers can imagine scientific capabilities provided through a familiar interface • Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities • Gateways also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities • Scientists know they can undertake more complex analyses, and that’s all they want to focus on • But this seamless access doesn’t come for free. It all hinges on very capable developers

  8. Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Detailed justification of resource request • Citations, metrics of success • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools

  9. What is the TeraGrid? • 300+ teraflops of computation • Visualization • 20+ petabytes of storage • Dedicated cross-country network • NSF-funded facility to offer high-end compute, data and visualization resources to the nation’s academic researchers

  10. TeraGrid Resources Available to Academic Researchers at No Cost • TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems • Proposal-based access, researchers can use resources at no cost • Targeted support available as well

  11. Implementing Common Gateway Requirements • Web Services • GT4 deployment, identification of remaining capabilities • Information services, WebMDS • Auditing • Need to retrieve job usage info on production resources • GRAM audit deployed in test mode in September, inclusion in CTSSv4 • Community Accounts • Policy finalized, security approaches being tested by resource providers (RPs) • Attribute-based authentication testing • Allocations • Changes in allocation procedures, the mechanisms used to evaluate science impact, and models for identity management, authentication and authorization that are better tuned to virtual organizations • Scheduling • Metascheduling RAT • On-demand via SPRUCE framework • Outreach • Talks, schools/workshops (NVO, GISolve), major project demonstrations (LEAD) • SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop • Primer • Living document in the wiki that provides an up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)

  12. Gateways are growing in numbers: Success in a variety of domains • 10 initial projects as part of TG proposal • >20 Gateway projects today • No limit on how many gateways can use TG resources • Prepare services and documentation so developers can work independently • Open Science Grid (OSG) • Special PRiority and Urgent Computing Environment (SPRUCE) • National Virtual Observatory (NVO) • Linked Environments for Atmospheric Discovery (LEAD) • Computational Chemistry Grid (GridChem) • Computational Science and Engineering Online (CSE-Online) • GEON (GEOsciences Network) • Network for Earthquake Engineering Simulation (NEES) • SCEC Earthworks Project • Network for Computational Nanotechnology and nanoHUB • GIScience Gateway (GISolve) • Biology and Biomedicine Science Gateway • Open Life Sciences Gateway • The Telescience Project • Grid Analysis Environment (GAE) • Neutron Science Instrument Gateway • TeraGrid Visualization Gateway • BIRN • Gridblast Bioinformatics Gateway • Earth Systems Grid • Astrophysical Data Repository (Cornell) • Many others interested • SID Grid • HASTAC

  13. Mapping Tool Used on Large Data Sets to Spot Brain Disorders "Using TeraGrid resources at multiple sites, this research has been able to successfully distinguish diagnostic categories such as Alzheimer's and Semantic Dementia from control subjects," said Anthony Kolasny, JHU. "This can potentially lead to a powerful new cyberinfrastructure tool clinicians can use to make earlier, more accurate diagnoses." Source: SDSC Headlines, Paul Tooby Large Deformation Diffeomorphic Metric Mapping (LDDMM), developed at the Center for Imaging Science at Johns Hopkins Computes a mathematical description of which shapes are similar and different by computing metric distances in the space of anatomical images
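To make "computing metric distances in the space of anatomical images" a little more concrete, the block below sketches LDDMM as it is usually written in the literature. This is the generic textbook formulation, not necessarily the exact configuration used in the Johns Hopkins TeraGrid runs. Here $v_t$ is a time-varying velocity field in a space $V$ of smooth vector fields, $\phi_t$ is the diffeomorphism it generates, $I_0$ is the template image, $I_1$ is the target image, and $\sigma$ weights image mismatch against deformation cost:

```latex
\begin{aligned}
&\frac{d\phi_t}{dt} = v_t(\phi_t), \qquad \phi_0 = \mathrm{id}, \qquad t \in [0,1] \\[4pt]
&E(v) = \int_0^1 \lVert v_t \rVert_V^2 \, dt
      + \frac{1}{\sigma^2} \left\lVert I_0 \circ \phi_1^{-1} - I_1 \right\rVert_{L^2}^2 \\[4pt]
&d(I_0, I_1) = \inf_{v \,:\, I_0 \circ \phi_1^{-1} = I_1} \int_0^1 \lVert v_t \rVert_V \, dt
\end{aligned}
```

The distance $d$ is the length of the shortest smooth path of deformations that carries one anatomy onto the other, so similar shapes are close and dissimilar shapes are far apart. Minimizing $E(v)$ over the whole velocity field for every image pair is what makes the method computationally demanding.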

  14. BIRN uses SSHFS to mount TeraGrid filesystems locally • CIS has 87 TB of local storage • /cis/net lists network drives • 220 TB available through the CIS portal using autofs, Samba, and smbwebclient. Source: Anthony Kolasny, Johns Hopkins University

  15. What is SSHFS and how can it help? • SSHFS allows you to mount a remote directory over an SSH connection. • http://fuse.sourceforge.net/sshfs.html • http://wikipedia.org/wiki/SSH_Filesystem • Simple command line • sshfs remoteuser@remotehost:/path/to/remote_dir local_dir • Performance is only as fast as your SSH connection; performance tuning is possible. • Allows you to use local applications on remote data • e.g., using ParaView to look at data processed on the TeraGrid and stored on the GPFS-WAN • You access the remote files directly, so your changes are seen by everyone. Source: Anthony Kolasny, Johns Hopkins University
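The one-line `sshfs` command above can be wrapped in a small mount/unmount helper. The sketch below assumes a Linux machine with FUSE and SSHFS installed; the function names and the example host and paths are hypothetical placeholders, not actual TeraGrid or CIS settings. `-o reconnect` is a standard SSHFS option that re-establishes a dropped connection, and setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: mount and unmount a remote directory over SSH with SSHFS.
# Hypothetical helper names; hosts and paths are placeholders.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mount_remote user@host:/remote/dir /local/mountpoint
mount_remote() {
    run mkdir -p "$2"                  # make sure the local mount point exists
    run sshfs "$1" "$2" -o reconnect   # reconnect if the SSH link drops
}

# unmount_remote /local/mountpoint
# (fusermount -u on Linux FUSE; fusermount3 on FUSE 3, umount on macOS)
unmount_remote() {
    run fusermount -u "$1"
}

# Example usage (placeholders):
#   mount_remote remoteuser@remotehost:/path/to/remote_dir "$HOME/remote_data"
#   ls "$HOME/remote_data"      # local tools now see the remote files
#   unmount_remote "$HOME/remote_data"
```

Because the mount is live, anything a local application writes under the mount point lands directly on the remote filesystem.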

  16. TeraGrid Life Science Gateway • Application services for bioinformaticians • Ability for end users to apply the large-scale resources of the TeraGrid to their problems while leveraging local resources • Featured apps • InterProScan, version 4.2 • InterProScan Data, version 12.0 • HMMER, version 2.3.2 • Blastall (from InterProScan), version 2.2.6 • Plans to engage Bioinformatics Resource Centers (BRCs) • Eight BRCs sponsored by the National Institute of Allergy and Infectious Diseases (NIAID) • Funded to display sequencing and annotation data, comparative analysis, genome polymorphisms, gene expression, proteomics, host/pathogen interactions and pathways for the NIAID list of Category A-C priority pathogens and other pathogens causing emerging and re-emerging diseases.

  17. TeraGrid Bioportal • Access to over 140 computational tools and many biological data sets • Collaborative workspace, simplified access to diverse set of tools • Database searching, alignment and phylogeny, pattern searching, DNA/RNA analysis, and protein analysis • EMBOSS (European Molecular Biology Open Software Suite), GLIMMER (Gene Locator and Interpolated Markov Modeler), HMMER (Hidden Markov Modeler), the NCBI (National Center for Biotechnology Information) toolkit and PHYLIP (PHYLogeny Inference Package). • Standard databases include NCBI Aggregate, PDB, Prints, RepBase, UniProt, PFam, ProSite, and TransFac

  18. GEON: Developing cyberinfrastructure in support of an environment for integrative geoscience research • IT advances can significantly impact how geoscientists conduct their daily research activities • Web/grid services, TeraGrid • Semantic data integration • Information management and ontologies • Tremendous opportunities to conduct novel and efficient research in many areas of the geosciences • SYNSEIS – SYNthetic SEISmogram generation tool • Helps seismologists calculate synthetic 3D regional seismic waveforms • Accesses distributed data centers and large computational clusters • Users only need to have access to the Internet and a browser. The entire system is web-based and is accessible from the GEONgrid portal web page.

  19. GEON: LiDAR (Light Detection And Ranging) data • Capable of generating digital elevation models (DEMs) more than an order of magnitude more accurate than those currently available • Opportunity for geologists to study the processes that shape the earth’s surface at resolutions not previously possible • Distribution, interpolation and analysis of large LiDAR datasets, which frequently exceed a billion data points, present significant computational challenges • GEON tools begin with a user-defined subset of data and end with download and visualization of interpolated surfaces and derived products

  20. Linked Environments for Atmospheric Discovery (LEAD) • Providing the tools needed to make accurate predictions of tornadoes and hurricanes • Meteorological data • Forecast models • Analysis and visualization tools • Data exploration and Grid workflow

  21. LEAD Inspires Students • “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!” Eric (email, March 2007)

  22. nanoHUB: Explosive User Growth • Undergraduate and graduate students use nanoHUB to complete coursework in dozens of courses at 10 universities • nanoHUB attracts thousands of users • Over 2M hits in the last month • In the past 12 months • Over 21,000 users • Almost 175,000 simulation runs • Very full-featured • Simulation tools • Research proceedings • Curricula content • Collaboration spaces

  23. GridChem: a desktop application gateway • The Computational Chemistry Grid (CCG) science gateway GridChem has been using TeraGrid in production since April 2006 • Currently serves over 100 users and has delivered hundreds of thousands of CPU hours • Many publications have resulted from GridChem use

  24. CReSIS (Center for Remote Sensing of Ice Sheets) • Awarded CI-TEAM funding to build a Polar Gateway • International Polar Year 2007-2008 • CReSISGrid • Build a TeraGrid Science Gateway • Provide broad-based educational and training activity in Cyberinfrastructure for remote sensing and ice sheet dynamics • MSI impact through leadership of Linda Hayden, Elizabeth City State University

  25. Tremendous Potential for Gateways • In only 15 years, the Web has fundamentally changed human communication • Science Gateways can leverage this amazingly powerful tool to: • transform the way scientists collaborate • tackle the toughest problems independent of location • impact the amount of science that can result from each project • influence the public’s perception of science • High end resources can have a profound impact • The future is very exciting! • Web 2.0 • Application Hosting • Gateway-in-a-box

  26. Would development of a gateway help your research? • Researchers using defined sets of tools in different ways • Same executables, different input • Datasets • Workflow creation • Common data formats • Large shared datasets • gateways@teragrid.org mailing list • Email majordomo@teragrid.org • “subscribe gateways” in the message body • Biweekly telecons to get advice from others • www.teragrid.org • Details about current gateways • Materials from the June full-day tutorial at TG07

  27. Thank you for your attention. Any questions? Nancy Wilkins-Diehr wilkinsn@sdsc.edu
