Science Gateways and their tremendous potential for science. Nancy Wilkins-Diehr, TeraGrid Area Director for Science Gateways, wilkinsn@sdsc.edu
Phenomenal Impact of the Internet on Scientific Research: Only 15 years since the release of Mosaic! • Very rapid changes in how science is conducted • 1980s: Early gateways, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today • 1992: Mosaic web browser developed • 1995: “International Protein Data Bank Enhanced by Computer Browser” • 2004: TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program • Ensuing explosion of digital information • Need for analysis in a variety of scientific areas Nancy Wilkins-Diehr (TeraGrid GIG, wilkinsn@sdsc.edu)
Very Rapid Changes in Web Usability • First generation • Static Web pages • Second generation • Dynamic, database interfaces, CGI • Lacked the ease of use of desktop applications • Third generation • True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web. • These new applications will enable remarkable new uses of the Web in the organizational workplace and on the Internet Source: Screen Porch White Paper, The University of Western Ontario (1998)
What’s Next? “Prediction is hard. Especially about the future.” (Yogi Berra) • Scientists of tomorrow are familiar with media we don’t even know about • Not using full power of the internet by any means today • Data and knowledge are handled differently • Linking publications and data referenced in those publications • Annotation, data provenance • Inability to create discourse around a piece of data • Ability to keep up with knowledge generation • 16,000 papers a week into PubMed • 50,000 papers a week in biology • Right now have choice between reading abstract or paper, might add 10-minute author clip • How can science motivate in the way YouTube can? • Streaming video to view simulations, using visual and sound media • iPods everywhere, but not exploited for science • Web 2.0 • Science was an earlier internet adopter, now overtaken by business • Now a big difference between commercial and scientific sites • Noticeable efforts to keep users on commercial sites Source: 5/14/07 interview with Phil Bourne, SDSC
Gateways are a Natural Extension of Internet Developments • 3 common types of gateway • Web portal with users in front and services in back • Client-server model where application programs running on users' machines (i.e., workstations and desktops) access services • Bridges across multiple grids, allowing communities to utilize both community-developed grids and shared grids • Continued rapid changes ahead, must be adaptable, gateways can provide some nimbleness
Highlights: LEAD Inspires Students • A student gets excited about what he was able to do with LEAD • “Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy! • Eric” (email, March 2007)
1998 Workshop looks at impact of technology on science • Impact of Advances in Computing and Communications Technologies on Chemical Science and Technology: Report of a Workshop (1999), Commission on Physical Sciences, Mathematics, and Applications • Chaired by Thom Dunning • Collaboratory Life: Challenges of Internet-mediated Science for Chemists • Thomas A. Finholt, University of Michigan • Collaboratories: Building Electronic Scientific Communities • Raymond A. Bair, Pacific Northwest National Laboratory
Finholt: Internet challenges status quo in chemistry research • Since the birth of modern chemistry in the early 19th century there has been tremendous growth in the knowledge and the practical application of chemical principles. • However, in many important ways, the practice of chemistry research and teaching has remained unchanged. • The advent of the Internet as a worldwide mechanism for conducting scientific communication challenges this status quo. • Specifically, innovations like collaboratories, or network-based virtual laboratories, remove constraints of distance and time on scientific collaboration. • Collaboratories increase access to scarce instruments, accelerate the flow of information, and place new demands on senior scientists to mentor students.
Bair: Internet revolutionizes not only scope, but process of scientific investigation • High-speed computation now provides the means to examine and simulate systems at unprecedented levels of detail and accuracy • Large-scale databases enable analysis of the prodigious volumes of data • Coupling technologies with communications revolutionizes not only the scope but also the process of scientific investigation • Distributed computing and communications technologies enable researchers to access data, instruments, and expertise independent of their location. • The development and adoption of electronic collaboration capabilities will provide geographically distributed research teams with greater abilities for the organization, close-knit interaction, and rapid response, needed to address increasingly challenging research problems. • Reduction in travel and equipment costs, increased access to large facilities
Bair: Familiarity Breeds Success • As the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work • Requirements for future success include: • Development of interdisciplinary partnerships of chemists and computer scientists • Flexible and extensible frameworks for collaboratories • Means to deploy, support, and evaluate collaboratories in the field
Highlights: nanoHUB Explosive User Growth • nanoHUB attracts thousands of users • Typically 100-150 guests, 10-15 members online at any given point in time • 1,572,008 visits in February 2007 • In past 12 months • Over 21,000 users • Almost 175,000 simulation runs
The Internet as a Resource for News and Information about Science • The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science • November 20, 2006 • John B. Horrigan, Associate Director • http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf
NSF has long recognized the importance of science and technology interactions • Interdisciplinary programs did much to facilitate application-technology integration and develop standard tools • 1997 PACI Program • Marriage of technologists and application scientists • A few groups served as pathfinders and benefited tremendously • NPACI neuroscience thrust in 1997 leads to Telescience portal and BIRN in 2001 • Information Technology Research (ITR) • NSF Middleware Initiative (NMI) • Need plug-and-play tools so more groups can benefit • Software call (SDCI) will address some of this • 2008 Cyber-enabled Discovery and Innovation (CDI) • New generation of computationally-based discovery concepts and tools at the intersection of the computational world and the physical and biological worlds. • Data handling, sensors, Pbyte databases • Improved simulation and modeling techniques • Virtual environments and advanced cyberinfrastructure
Arden Bement, Senate Testimony, April 19, 2007 • Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore. • In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet. • Gateways are a terrific example of interfaces that can support transformative science
Highlights: GridChem - a desktop application gateway • Computational Chemistry Grid (CCG) science gateway GridChem has been using TeraGrid in production since April 2006 • Currently serves over 100 users and has delivered hundreds of thousands of CPU hours • Team expects a significant increase in usage in the coming year as new applications are deployed throughout TeraGrid
Gateway Idea Resonates with Scientists • Capabilities provided by the Web are easy to envision because we use them in everyday life • Researchers can imagine scientific capabilities provided through a familiar interface • Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities • But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities • Scientists know they can undertake more complex analyses and that’s all they want to focus on • But this seamless access doesn’t come for free. It all hinges on very capable developers
But Trust and Reliability are Fundamental to Success • Fundamental in business applications • Fundamental for science too • The public gains confidence in internet sites that provide accurate information reliably • PubMed • National Cancer Institute • Google • For scientists it takes far longer to build this confidence • Scientists will not rely on gateway tools to conduct their analysis and store their research results unless they have ultimate confidence in the interfaces • Proven track record • Run by reputable organization • Been in existence a long time • Provides accurate results • Works repeatedly • Confidence in PDB developed over 30 years, started with community mandate that proteins must be deposited before publications would be accepted
How can we build interfaces that scientists will trust? • Expertise • Simple web pages are easy to design • Complex capabilities, particularly those involving grid access, take knowledgeable developers to create a production product • LEAD, nanoHUB show what investment can do • Sustained funding • Most science groups have money for research, not portal building or ongoing support for portals • Knowledge transfer • Investments must result in building blocks that other applications can use • Track industry trends • Many gateways have similar issues • Data access • Analysis capabilities • User work environments • Workflow capabilities
Tremendous Opportunities Using the Largest Shared Resources - Challenges too! • What’s different when the resource doesn’t belong just to me? • Resource discovery • Accounting • Security • Proposal-based requests for resources (peer-reviewed access) • Code scaling and performance numbers • Justification of resources • Gateway citations • Tremendous benefits at the high end, but even more work for the developers • Potential impact on science is huge • Small number of developers can impact thousands of scientists • But need a way to train and fund those developers and provide them with appropriate tools
BIRN uses SSHFS to mount TeraGrid filesystems locally • CIS has 87TB of local storage. • /cis/net lists network drives. • 220TB through CIS portal using autofs, samba, smbwebclient. Source: Anthony Kolasny, Johns Hopkins University
What is SSHFS and how can it help? • SSHFS allows you to mount data through an ssh connection. • http://fuse.sourceforge.net/sshfs.html • http://wikipedia.org/wiki/SSH_Filesystem • Simple command line • sshfs remoteuser@remotehost:/path/to/remote_dir local_dir • Performance is as fast as your ssh connection. Performance tuning possible. • Allows you to use local applications on remote data. • For example, using ParaView to look at data processed on the TeraGrid and stored on the GPFS-WAN. • You are directly accessing the remote file, so your changes are seen by everyone. Source: Anthony Kolasny, Johns Hopkins University
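As an illustration of the mount command on this slide, here is a minimal Python sketch that builds the same `sshfs` invocation and the matching unmount command. The helper names (`build_sshfs_command`, `build_unmount_command`, `mount`) are hypothetical and not part of SSHFS. The sketch assumes `sshfs` is installed and that `fusermount -u` is the unmount command, as on typical Linux FUSE setups. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def build_sshfs_command(remote_user, remote_host, remote_dir, local_dir):
    """Build the argument list for the sshfs command shown on the slide."""
    return ["sshfs", f"{remote_user}@{remote_host}:{remote_dir}", local_dir]


def build_unmount_command(local_dir):
    """Build the FUSE unmount command (assumption: a Linux system with fusermount)."""
    return ["fusermount", "-u", local_dir]


def mount(remote_user, remote_host, remote_dir, local_dir, dry_run=True):
    """Print the mount command, and run it only when dry_run is False."""
    cmd = build_sshfs_command(remote_user, remote_host, remote_dir, local_dir)
    print(shlex.join(cmd))
    if not dry_run:
        # Raises CalledProcessError if sshfs exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder values taken from the slide; replace them with a real account and path.
    mount("remoteuser", "remotehost", "/path/to/remote_dir", "local_dir")
```

Once the mount succeeds, local applications can open files under `local_dir` as if they were on a local disk, which is the point the slide makes about running local tools on remote data.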
What is the TeraGrid? • NSF-funded facility to offer high end compute, data and visualization resources to the nation’s academic researchers • 300+ teraflops of computation • Visualization • 20+ petabytes of storage • Dedicated cross-country network
TeraGrid Resources Available to Academic Researchers at No Cost • TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems • Proposal-based access, researchers can use resources at no cost • Targeted support available as well
Advanced Support for all OCI Resources Available Through Open Request Process • Request through same peer review process used to request resources • Reviews based on appropriate use of resources, science is not reviewed if already funded • Provides access to individual center programs • SAC at SDSC • SAP at NCSA • ASTA at TeraGrid • PSC, TACC, other sites • The above programs support PI research groups and focus on code optimization or use of multiple resources within TeraGrid • Can focus on community codes or research groups with large allocations • Gateway support now available too!
Support is Very Targeted • Start with well-defined objectives • Focus on efficient or novel use of OCI resources • Minimum 0.25 FTE, usually for up to 1 year • Enough investment to really understand complex problems • Must have commitment from PIs • Want to make sure work is incorporated into production codes and gateways • Good candidates for targeted support include: • Large, high impact projects • Ability to influence new communities • Happy for feedback from directorates on important projects • Can likely provide some support to ~5 gateways simultaneously • Help one project, move on to others • Provide enough instructions so targeted help is not necessary for success
National Virtual Observatory • Large peer-reviewed allocation • CPU used to produce derived image products from multi-terabyte datasets and make these available to scientists through the NVO portal • Computational capabilities through community account recently made available
Easy TeraGrid Gateway True and False Test (Answers Provided) • TeraGrid selects all gateways (F) • TeraGrid designs all gateways (F) • TeraGrid limits the number of gateways (F) • All gateways need TeraGrid funding to exist (F) • Any PI can request an allocation and use it to develop a gateway (T) • Gateway design is community-developed and that is the core strength of the program (T) • TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T) • Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)
TeraGrid Science Gateways Overview • October, 2004 “Science Gateway” term originates • “Our TeraGrid WIDE strategy aims at this broader community and is embodied in the concept of a ‘science gateway,’ providing community-tailored access to TeraGrid services and capabilities.” • Gateway teams identified to help TeraGrid define what it will need to support these entirely new usage models • Address needs of large ITR projects • Spring, 2005 Science Gateway Requirements Analysis Team (RAT) • Identification of common needs across the gateways • Goal is production use of TG resources in the gateway as well as development of process and policy within TG for scalable gateway program and services • Tremendous sharing of experiences amongst talented developers
2006 – Implementing Common Gateway Requirements • Web Services • GT4 deployment, identification of remaining capabilities • Information services, WebMDS • Auditing • Need to retrieve job usage info on production resources • GRAM audit deployed in test mode in September, inclusion in CTSSv4 • Community Accounts • Policy finalized, security approaches being tested by RPs • Attribute-based authentication testing • Allocations • Changes in allocation procedures, the mechanisms used to evaluate science impact, and models for identity management, authentication and authorization that are more tuned to virtual organizations. • Scheduling • Metascheduling RAT • On-demand via SPRUCE framework • Outreach • Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD) • SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop • Primer • Living document in wiki, provides up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)
2007 – Ready for Production • Activities will include • Targeted support for new gateways • Generalized help desk support • Gateway developers are a growing community with unique needs • Getting started support and documentation • TG07 tutorial materials • Pointers to • Portal frameworks • Data management approaches • Workflow tools • Collaboration tools • Development of tools production gateways can use • Web Service interfaces • Tracking number of users, use of TeraGrid resources • Accounting/authorization tools • Citation capabilities • Proposal tips • “Where can I run now” capability
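The "tracking number of users, use of TeraGrid resources" item can be pictured with a small sketch. When many portal users run jobs through one community account, the gateway itself has to remember which portal user submitted each job. The code below is only an illustration of that bookkeeping, not a TeraGrid tool. The record type `JobRecord`, its fields, and the function `summarize_usage` are all hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One job submitted through the gateway (hypothetical record format)."""
    gateway_user: str  # portal login of the person who submitted the job
    resource: str      # name of the compute resource the job ran on
    cpu_hours: float   # CPU hours charged for the job


def summarize_usage(records):
    """Return (number of distinct gateway users, CPU hours per gateway user)."""
    hours_by_user = defaultdict(float)
    for record in records:
        hours_by_user[record.gateway_user] += record.cpu_hours
    return len(hours_by_user), dict(hours_by_user)


if __name__ == "__main__":
    # Made-up sample records, purely for illustration.
    sample = [
        JobRecord("alice", "resource-a", 10.0),
        JobRecord("bob", "resource-b", 4.0),
        JobRecord("alice", "resource-b", 2.5),
    ]
    user_count, hours = summarize_usage(sample)
    print(user_count, hours)
```

A summary like this is the kind of figure a gateway team might cite when justifying an allocation request, since the resource provider sees only the single community account.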
CReSIS (Center for Remote Sensing of Ice Sheets) • Awarded CI-TEAM funding to build a Polar Gateway • International Polar Year 2007-2008 • CReSISGrid • Build a TeraGrid Science Gateway • Provide broad-based educational and training activity in Cyberinfrastructure for remote sensing and ice sheet dynamics • MSI impact through leadership of Linda Hayden, Elizabeth City State University
When is a gateway appropriate? • Researchers using defined sets of tools in different ways • Same executables, different input • Datasets • Common data formats • National Virtual Observatory • Earth System Grid
Tremendous Potential for Gateways • In only 15 years, the Web has fundamentally changed human communication • Science Gateways can leverage this amazingly powerful tool to: • transform the way scientists collaborate • impact the amount of science that can result from each project • influence the public’s perception of science • Like e-commerce, Science Gateways need to build trust in the infrastructure, tools, and methods that they use • Unlike the public or commercial arena, scientists will be vested in these gateways • Science Gateways will need to build trust in the organization behind them. Gateways need to have continuity • High end resources can have a profound impact • The future is very exciting!
Special guests today • Gerhard Klimeck, Purdue, nanoHUB (participating remotely) • Dennis Gannon, Indiana University, LEAD • Sudhakar Pamidighantam, UIUC, GridChem • John McGee, RENCI, TeraGrid Bioportal • Shaowen Wang, U Iowa (soon to be UIUC), GISolve • Thank you for your attention • Q&A during and after the presentations • Please contact me at wilkinsn@sdsc.edu