
Facilitating Scientific Discovery with XSEDE


Presentation Transcript


  1. Facilitating Scientific Discovery with XSEDE Philip Blood (blood@psc.edu) Pittsburgh Supercomputing Center

  2. How do you create cyberinfrastructure to facilitate scientific discovery?

  3. A lot of times, people don't know what they want until you show it to them. -- Steve Jobs http://www.quotationspage.com/quotes/Steve_Jobs

  4. Facilitating Scientific Discovery through Computation Challenge: Support diversity in • Computational requirements • compute-driven • data-driven • diverse workflows • Modes of access/collaboration • beyond command-line • connecting to campuses • Levels of computational expertise/interest • make it accessible to everyone who could benefit

  5. You can't just ask customers what they want and then try to give that to them. By the time you get it built, they'll want something new. -- Steve Jobs

  6. Facilitating Scientific Discovery through Computation Challenge: Support diverse and changing requirements and solutions • Advances in software and hardware • e.g. web technologies, GPUs • Advances in experimental technologies • e.g. new telescopes, next-generation sequencing machines

  7. TeraGrid → XSEDE: eXtreme Science and Engineering Discovery Environment

  8. XD Solicitation/XD Program • eXtreme Digital Resources for Science and Engineering (NSF 08-571) -- Extremely Complicated • High-Performance Computing and Storage Services • aka Track 2 awardees and others • High-Performance Remote Visualization and Data Analysis Services • 2 awards; 5 years; $3M/year • proposals due November 4, 2008 • Integrating Services (5 years, $26M/year) • Coordination and Management Service (CMS) • 5 years; $12M/year • Technology Audit and Insertion Service (TAIS) • 5 years; $3M/year • Advanced User Support Service (AUSS) • 5 years; $8M/year • Training, Education and Outreach Service (TEOS) • 5 years, $3M/year • two phase proposal process for IS • pre-proposals November 4, 2008 • final proposals due June 15, 2009

  9. XSEDE Vision The eXtreme Science and Engineering Discovery Environment (XSEDE) will: enhance the productivity of scientists and engineers by providing them with new and innovative capabilities and thus facilitate scientific discovery while enabling transformational science/engineering and innovative educational programs

  10. Science requires diverse digital capabilities • XSEDE will be a comprehensive, expertly managed set of advanced heterogeneous high-end digital services, integrated into a general-purpose infrastructure. • XSEDE is about increased user productivity • increased productivity leads to more science • increased productivity is sometimes the difference between a feasible project and an impractical one

  11. XSEDE will support a breadth of research From direct contact with the user community as part of requirements collection • Earthquake Science and Civil Engineering • Molecular Dynamics • Nanotechnology • Plant Science • Storm modeling • Epidemiology • Particle Physics • Economic analysis of phone network patterns • Brain science • Analysis of large cosmological simulations • DNA sequencing • Computational Molecular Sciences • Neutron Science • International Collaboration in Cosmology and Plasma Physics Sampling of a much larger set. Many examples are new to TeraGrid/HPC. Range from petascale to disjoint HTC; many are data driven. XSEDE will support thousands of projects.

  12. XSEDE’s Distinguishing Characteristics • Foundation for a national CI ecosystem • comprehensive suite of advanced digital services will federate with other high-end facilities and campus-based resources • Unprecedented integration of diverse digital resources • innovative, open architecture making possible the continuous addition of new technology capabilities and services

  13. How we propose to engage stakeholders • Collection of stakeholder needs: • surveys, ticket mining, … • focus groups, usability panels, … • interviews, shoulder surfing, … • Prioritization of identified need and derived requirements • User Requirements Evaluation and Prioritization (UREP) Working Group • broad participation across architecture, deployment, operations, users, and service providers • Assessing plans and deployments • through a variety of stakeholder-focused, facilitated workshops • e.g., interactive ATAM sessions focused on identifying, quantifying, discussing tradeoffs • Representation in the management of XSEDE • XSEDE Advisory Board • User Advisory Committee • Service Providers Forum

  14. What we mean by architecture • Architecture defines the XSEDE system’s components and how they interact • each component is motivated by one or more requirements • each component is defined in terms of required capabilities: interfaces and qualities of service • Equally important is the process by which we revise the architecture over time • key point: driven by new or revised requirements

  15. Initial XSEDE architecture: High-order bits • Don't disrupt the user community! Maintain existing TeraGrid services • Focus on user-facing access layer • for power users: "first, do no harm" • for other users: expand use via new hosted XSEDE User Access Services (XUAS) and Global Federated File System (GFFS) • Promote standards and best practices to enhance interoperability, portability, and implementation choice

  16. What is XSEDE? • eXtreme Science and Engineering Discovery Environment (XSEDE) • Freely accessible virtual system for open scientific research, funded by NSF • Provides compute, data, and visualization resources as well as training and support services. • Integrates resources via centralized services, common software, and fast networks • Currently composed of 17 Service Providers (SPs) from around the world

  17. XSEDE Org Chart

  18. XSEDE campus bridging vision • Help XSEDE create the software, tools, and training that will allow excellent interoperation between XSEDE infrastructure and researchers' local (campus) cyberinfrastructure • Enable excellent usability from the researcher's standpoint for a variety of modalities and types of computing: HPC, HTC, and data-intensive computing • Promote better use of local, regional, and national CI resources by • Promoting the use of InCommon for all authentication systems • Making it easier to contribute campus systems (possibly in whole, but generally in part) to the aggregate capacity and capability of XSEDE • Making it easier to use local systems - not contributed to the aggregate of XSEDE overall - more effectively in the context of workflows and cyberinfrastructure that include resources within and beyond XSEDE in a well-coordinated fashion • We will work with the various groups in XSEDE to align and assist activities and communications so that XSEDE collectively achieves these goals without interfering with established organizational structures and decision-making processes [we plan to provide more lift than drag]

  19. XSEDE campus bridging tactics year 1 • There is an important value proposition that does not involve cash • Support installers created by the Architecture group • Call for participation is out (proposals due Dec 9) – pursue a small number of campuses as part of a pilot program to effect uptake by working with them with diligence, and reap economies of scale (if things go right) or clear learning experiences (otherwise) • Planning to visit campuses to raise awareness and work with campus personnel • Documentation & training • Science Gateways (document by Suresh Marru & Marlon Pierce) • Promote use of the systems template created by TACC • Serve as 'connectors' in discussions that should or could play a role in campus bridging activities (sit in on Architecture and User Services calls) • Work closely with OSG (OSG is plenty good at what they do!)

  20. http://pti.iu.edu/campusbridging/

  21. What resources are available? https://portal.xsede.org/web/guest/resources • Compute • Condor Pool, OSG (HTC) • Shared Memory (16 TB) • Massively Parallel (100,000 cores) • Click on name for more info • Special purpose: heterogeneous, data-intensive web hosting • Visualization • Storage • $HOME, $SCRATCH, and archival storage comes with allocation • Can also request purely data allocations • Science Gateways • Human: Extended Collaborative Support

  22. Available SMP Systems • PSC – Blacklight: • SGI UltraViolet system • UV 1000 system with 4094 cores (Intel Nehalem 8-core processors) and 32 TB of memory • Primarily for large shared-memory applications • Data-intensive applications may also benefit

  23. Available MPP Systems • NICS – Kraken: • Cray XT5 – 112,896 processors • Intended for highly scalable applications • Minimum Startup allocation request is 100,000 SUs(?) • Find out if users are really going to use this system before adding them or requesting a startup allocation • TACC – Ranger: • Sun Constellation – 62,976 processors • Intended for codes scalable to thousands of cores

  24. Available Cluster Systems • Purdue – Steele: • Dell PowerEdge 1950 – 7,144 processors (1,560 are available in the longest-running production queue) • Suited for a wide range of serial and small/medium parallel jobs • Longest wall time (720 hours) • Primarily 16 GB memory nodes and 1 GigE Ethernet interconnect • Access for up to 4 hours to some nodes with 32 GB memory and SDR InfiniBand interconnect • TACC – Lonestar4: • Dell PowerEdge (Westmere) – 22,656 cores (2 6-core processors per node) • 24 GB memory per node (2 GB per core) • Intended primarily for applications scalable up to thousands of cores • Can run serial jobs in a special queue

  25. Special Resources • NCSA – Forge • Dell PowerEdge C6145 quad-socket nodes (dual 8-core AMD Magny-Cours processors) • Each node supports 8 NVIDIA Fermi M2070 GPUs • Intended for applications that make use of GPGPUs • Purdue – Condor (high throughput) • Pool of over 27,000 processors • Various architectures and operating systems • Designed for high-throughput computing; excellent for parameter sweeps, Monte Carlo simulations, and other serial applications • See Condor Appendix A for more details. • IU – Quarry: • Used for web services hosting and Science Gateways

  26. Special Resources (cont.) • SDSC – Dash: • Intel Nehalem – 512 processors • Primarily used for data-intensive computing • vSMP (virtual shared memory) software from ScaleMP aggregates memory across 16 nodes, allowing applications to address 768 GB of memory • 4 TB of flash memory configurable as a fast file I/O subsystem or as extended fast virtual-memory swap space • SDSC – Trestles: • 10,368 cores (4 8-core AMD Magny-Cours processors per node) • 20 TB DRAM (64 GB per node / 2 GB per core) • 38 TB SSD flash memory (120 GB per node) • QDR InfiniBand interconnect • Intended for moderately scalable parallel applications and applications with fast local I/O requirements

  27. Visualization Resources • TACC – Longhorn: • Dell/NVIDIA Visualization and Data Analysis Cluster • A hybrid CPU/GPU system designed for remote, interactive visualization and data analysis, but it also supports production, compute-intensive calculations on both the CPUs and GPUs via off-hour queues • TACC – Spur: • Sun Visualization Cluster • 128 compute cores / 32 NVIDIA FX5600 GPUs • Spur is intended for serial and parallel visualization applications that take advantage of large per-node memory, multiple computing cores, and multiple graphics processors.

  28. Visualization Resources (cont.) • NICS – Nautilus (SMP): • SGI UltraViolet SMP • 1024 cores (Intel Nehalem) • 4 TB global shared memory • Primarily used for three tasks • Visualizing data results from computer simulations with many complex variables • Analyzing large amounts of data coming from experimental facilities • Aggregating and interpreting input from a large number of sensors distributed over a wide geographic region

  29. New Systems (Coming Soon) • SDSC – Gordon • Consists of a "compute cluster" and separate I/O nodes • 1,024 dual-socket compute nodes, 64 I/O nodes • Dual-rail QDR InfiniBand, 3D torus interconnect • Designed for data-intensive computing that spans domains such as genomics, graph problems, geophysics, and data mining • Large-memory super-nodes are ideal for serial or threaded applications that require a large amount of memory • Flash-based I/O may provide significant performance increases for certain applications

  30. OSG and XSEDE • Realizing a national CI ecosystem requires confederation of CI providers • OSG and XSEDE working together represents a substantial step in this direction • still much to address though! • OSG is a significant CI in the US • ties to CI (eScience infrastructure) providers internationally • OSG represents a direction of expanding the scope of XSEDE • OSG has a focus on HTC opportunistic resources • TeraGrid was mostly about large parallel applications on HPC capability and capacity resources • these complement each other well and present real opportunities for leverage and integration that will benefit the research community

  31. Extended Collaborative Support Services (ECSS) Provides collaboration between XSEDE staff and users. The objective of the program is to enhance effectiveness and productivity. • Extended Support for Research Teams (ESRT) • In depth collaboration with a research group • Extended Support for Community Capabilities (ESCC) • Support for community codes, initiatives • Extended Support for Science Gateways (ESSGW) • Support for building, deploying science gateways

  32. Novel and Innovative Projects • As part of ECSS, major focus to engage communities that have not traditionally used TeraGrid resources or that face new challenges • e.g. Bioinformatics, Social Sciences, Humanities, Machine Learning

  33. Accessing extended collaborative support  Expertise is available in a wide range of areas: performance analysis, parallelization, optimization, gateway and web portal development, and specialized scientific software. You can solicit Extended Support at any time through the Allocations tab of the XSEDE User Portal. A written request is required. Inquire at help@xsede.org.

  34. What is a Science Gateway? • A Science Gateway • Enables scientific communities of users with a common scientific goal • Uses high performance computing • Has a common interface • Leverages community investment • Three common forms: • Web-based portals • Application programs running on users' machines but accessing services in XSEDE • Coordinated access points enabling users to move seamlessly between XSEDE and other resources
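  To make the "web-based portal" form concrete, here is a minimal sketch of how a gateway might turn a researcher's form input into a batch job on an HPC resource. Everything in it is illustrative: the run_simulation function, the community allocation name, and the SLURM-style submission are assumptions for the sake of the example, not XSEDE's actual gateway middleware.

```python
# Sketch of a gateway back end: build a batch script from portal form fields
# and submit it on behalf of the community. All names are hypothetical.
import subprocess
import tempfile

BATCH_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=gateway-{user}
#SBATCH --nodes={nodes}
#SBATCH --time={walltime}
#SBATCH --account=community_alloc   # hypothetical community allocation
srun ./simulation_code --input {input_file}
"""

def run_simulation(user, input_file, nodes=4, walltime="02:00:00"):
    """Turn the portal form fields into a batch script and submit it."""
    script = BATCH_TEMPLATE.format(user=user, nodes=nodes,
                                   walltime=walltime, input_file=input_file)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    # Assumes a SLURM-style scheduler; many XSEDE-era systems used PBS/Torque
    # instead, in which case the submission command would be qsub.
    result = subprocess.run(["sbatch", path], capture_output=True, text=True)
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"
```

  The point of the sketch is the division of labor: the researcher only sees the common web interface, while the gateway handles job construction and submission against shared community resources.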

  35. How can a Gateway help? • Make science more productive • Researchers use same tools • Complex workflows • Common data formats • Data sharing • Bring XSEDE capabilities to the broad science community • Lots of disk space • Lots of compute resources • Powerful analysis capabilities • A nice interface to information

  36. My Experience: TeraGrid Support Enabling New Science • My code: 256 atoms • DL_POLY: ~13,000 atoms (Jan. 2004) • NAMD: 740,000 atoms – 60X larger!

  37. TeraGrid to the Rescue Fall 2004: Granted allocations at PSC, NCSA, SDSC

  38. Where to Run? • EVERYWHERE • Prepare systems on ANL IA-64 (high availability/long queue time) • Smaller simulations (350,000 atoms) on NCSA IA-64s (and later Lonestar) • Large simulations (740,000 atoms) on highly scalable systems: PSC XT3 and SDSC DataStar • TeraGrid infrastructure critical • Archiving data • Moving data between sites • Analyzing data on the TeraGrid

  39. Result: Open up new phenomena to investigation through simulation Membrane Remodeling Blood, P.D. and Voth, G.A. Proc. Natl. Acad. Sci. 103, 15068 (2006).

  40. Personalized User Support: Critical to Success • Contacted by TG Support to determine needs • Worked closely with TG Support on (then) new XT3 architecture to keep runs going during initial stages of machine operation • TG worked closely with application developers to find problems with the code and improve performance • Same pattern established throughout XSEDE for User Support • Special extended support can be obtained for deeper needs (improving codes/workflows/etc.)

  41. Extended Collaborative Support: Improving Parallel Performance of Protein Folding Simulations http://www.chem.cornell.edu/has/ The UNRES molecular dynamics (MD) code utilizes a carefully-derived mesoscopic protein force field to study and predict protein folding pathways by means of molecular dynamics simulations.

  42. Load Imbalance Detection in UNRES Looking only at time spent in the important MD phase, we observe multiple causes of load imbalance, as well as a serial bottleneck. • In this case: developers were unaware that the chosen algorithm would create load imbalance • Reexamined available algorithms and found one with much better load balance – also faster in serial! • Also parallelized the serial function causing the bottleneck
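  For readers who want to make this kind of measurement themselves, the sketch below shows one simple way to expose load imbalance: time the phase of interest on each MPI rank and compare the maximum to the mean. This is an illustrative mpi4py sketch, not the instrumentation used for UNRES (real profiling would more likely use a tool such as TAU or CrayPat); compute_md_forces is a hypothetical placeholder for the per-rank work.

```python
# Minimal sketch: per-rank timing of the MD force phase to expose load imbalance.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def compute_md_forces(step):
    """Placeholder for one rank's share of an MD force evaluation."""
    time.sleep(0.001 * (1 + rank % 4))  # artificial imbalance for illustration

t_local = 0.0
for step in range(100):
    t0 = MPI.Wtime()
    compute_md_forces(step)
    t_local += MPI.Wtime() - t0
    comm.Barrier()  # time spent waiting here is the visible cost of imbalance

# Imbalance metric: max over mean of per-rank busy time in the phase of interest.
t_max = comm.reduce(t_local, op=MPI.MAX, root=0)
t_sum = comm.reduce(t_local, op=MPI.SUM, root=0)
if rank == 0:
    t_mean = t_sum / comm.Get_size()
    print(f"force phase: max {t_max:.3f}s, mean {t_mean:.3f}s, "
          f"imbalance ratio {t_max / t_mean:.2f}")
```

  A ratio close to 1.0 means the work is evenly spread; a large ratio means most ranks idle at the barrier while the slowest rank finishes, which is exactly the pattern seen in the original UNRES algorithm.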

  43. Major Serial Bottleneck and Load Imbalance in UNRES Eliminated • Due to the 4x faster serial algorithm, the balance between computation and communication has shifted – communication must become more efficient to scale well • Code is undergoing another round of profiling and optimization
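  The shift in balance is easy to see with a back-of-the-envelope model. The numbers below are purely illustrative, not measured UNRES timings: if the per-step communication cost stays fixed while the compute phase gets 4x faster, the fraction of each step spent communicating grows, so communication efficiency becomes the limit on scaling.

```python
# Illustrative only: hypothetical per-step timings, not measured UNRES values.
def compute_fraction(t_compute, t_comm):
    """Fraction of a timestep spent doing useful computation."""
    return t_compute / (t_compute + t_comm)

t_comm = 1.0                        # communication time per step (unchanged)
t_compute_old = 12.0                # compute time per step, old algorithm
t_compute_new = t_compute_old / 4   # 4x faster serial algorithm

print(f"old: {compute_fraction(t_compute_old, t_comm):.0%} of each step is compute")
print(f"new: {compute_fraction(t_compute_new, t_comm):.0%} of each step is compute")
# old: 92% of each step is compute
# new: 75% of each step is compute -> communication now dominates the overhead
```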

  44. How do I get started using XSEDE? To get started using XSEDE a researcher needs to: • apply for an XSEDE allocation or • request to be added to an existing one. You do either of these through the XSEDE User Portal.

  45. XSEDE User Portal • Web-based single point of contact that provides: • Continually updated information about your accounts. • Access to your XSEDE accounts and allocated resources: The Portal provides a single location from which to access XSEDE resources. One can access all accounts on various machines from the Portal. • Interfaces for data management, data collections, and other user tasks and resources • Access to the Help Desk.

  46. Create a portal account from the main xsede.org site

  47. Creating an XSEDE portal account (1) • Fill in personal information • Choose a passkey • System will send you email with a confirmation number • You will use the confirmation number together with passkey to verify your account

  48. Creating an XSEDE portal account (2) • Verify your account using your email address, passkey, and verification code • Choose a username • Choose a password (following the rules) • Password must: • Be at least 8 characters • Contain at least three of the prescribed character classes • Not be in the dictionary!
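  As an illustration of the rules above, the sketch below checks a candidate password against them. This is not XSEDE's actual validation code; the four character classes and the tiny stand-in dictionary are assumptions made for the example.

```python
# Illustrative check of the stated password rules; not XSEDE's validation logic.
import string

def password_ok(password, dictionary_words):
    if len(password) < 8:                       # rule 1: at least 8 characters
        return False
    classes = [                                  # rule 2: assumed character classes
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    if sum(classes) < 3:                         # need at least three classes
        return False
    if password.lower() in dictionary_words:     # rule 3: not a dictionary word
        return False
    return True

words = {"password", "letmein", "dragon"}        # stand-in dictionary
print(password_ok("Tr0ub4dor&3", words))         # True
print(password_ok("password", words))            # False: dictionary word, one class
```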

  49. Log in to the Portal • Can access login from either xsede.org or portal.xsede.org • xsede.org has additional info about XSEDE • portal.xsede.org is the main portal for accessing XSEDE resources
