Application Driven Design for a Large-Scale, Multi-Purpose Grid Infrastructure

  1. Application Driven Design for a Large-Scale, Multi-Purpose Grid Infrastructure Mary Fran Yafchak, maryfran@sura.org SURA IT Program Coordinator, SURAgrid project manager

  2. About SURA • 501(c)3 consortium of research universities • Major programs in: • Nuclear Physics (“JLab”, www.jlab.org) • Coastal Science (SCOOP, scoop.sura.org) • Information Technology • Network infrastructure • SURAgrid • Education & Outreach SURA Mission: • foster excellence in scientific research • strengthen the scientific and technical capabilities of the nation and of the Southeast • provide outstanding training opportunities for the next generation of scientists and engineers http://www.sura.org

  4. Scope of the SURA region • 62 diverse member institutions • Geographically - 16 states plus DC • Perspective extends beyond the membership • Broader education community • Non-SURA higher ed, Minority Serving Institutions, K-12 • Economic Development • Regional network development • Technology transfer • Collaboration with Southern Governors’ Association

  5. About SURAgrid • An open initiative in support of regional strategy and infrastructure development • Applications of regional impact are key drivers • Designed to support a wide variety of applications • “Big science” but “smaller science” O.K. too! • Applications beyond those typically expected on grids • Instructional use, student exposure • Open to what new user communities will bring • On-ramp to national HPC & CI facilities (e.g., Teragrid) • Not as easy as building a community- or project-specific grid, but needs to be done…

  6. About SURAgrid: Broad view of grid infrastructure • Facilitate seamless sharing of resources within a campus, across related campuses and between different institutions • Integrate with other enterprise-wide middleware • Integrate heterogeneous platforms and resources • Explore grid-to-grid integration • Support a range of user groups with varying application needs and levels of grid expertise • Participants include IT developers & support staff, computer scientists, domain scientists

  7. SURAgrid Goals • To develop scalable infrastructure that leverages local institutional identity and authorization while managing access to shared resources • To promote the use of this infrastructure for the broad research and education community • To provide a forum for participants to share experience with grid technology and participate in collaborative project development

  8. SURAgrid Vision • SURAgrid resources (a heterogeneous environment to meet diverse user needs): • Industry partner coop resources (e.g., IBM partnership) • Institutional resources (e.g., current participants) • Gateways to national cyberinfrastructure (e.g., Teragrid) • VO or project resources (e.g., SCOOP) • Other externally funded resources (e.g., group proposals) • SURAgrid resources and applications seen through sample user portals: a “MySURAgrid” view and a project-specific view with project-specific tools • SURA regional development: • Develop & manage partnership relations • Facilitate collaborative project development • Orchestrate centralized services & support • Foster and catalyze application development • Develop training & education (user, admin) • Other… (community-driven, over time…)

  9. SURAgrid Participants (as of November 2006) • Bowie State, GMU, UMD, UMich, UKY, UVA, GPN, UArk, Vanderbilt, ODU, UAH, USC, NCState, MCSR, SC, UNCC, TTU, Clemson, TACC, TAMU, UAB, UFL, LSU, Kennesaw State, GSU, LATech, ULL, Tulane • Map legend: resources on the grid; SURA member

  10. Major Areas of Activity • Grid-Building (gridportal.sura.org) • Themes: heterogeneity, flexibility, interoperability • Access Management • Themes: local autonomy, scalability, leveraging enterprise infrastructure • Application Discovery & Deployment • Themes: broadly useful, inclusive beyond typical users and uses, promoting collaborative work • Outreach & Community • Themes: sharing experience, incubator for new ideas, fostering scientific & corporate partnerships

  12. SURAgrid Application Strategy • Provide immediate benefit to applications while applications drive infrastructure development • Leverage initial application set to illustrate benefits and refine deployment • Increase quantity and diversity of both applications and users • Develop processes for scalable, efficient deployment; assist in “grid-enabling” applications • Efforts significantly bolstered through NSF award: “Creating a Catalyst Application Set for the Development of Large-Scale Multi-purpose Grid Infrastructure” (NSF-OCI-054555)

  13. Creating a Catalyst Application Set • Discovery: • Ongoing methods: meetings, conferences, word of mouth • Formal survey of SURA members to supplement these methods • Evaluation: • Develop criteria to help prioritize and direct deployment efforts • Determine readiness to deploy and tools/assistance required • Implementation: • Exercise and evolve existing deployment & support processes in response to lessons learned • Document and disseminate lessons learned • Explore means to assist in grid-enabling applications

  14. Some Application Close-ups in the SURAgrid demo area today: • GSU: Multiple Genome Alignment on the Grid • Demo’d by Victor Bolet, Art Vandenberg • UAB: Dynamic BLAST • Demo’d by Enis Afgan, John-Paul Robinson • ODU: Bioelectric Simulator for Whole Body Tissues • Demo’d by Mahantesh Halappanavar • NCState: Simulation-Optimization for Threat Management in Urban Water Systems • Demo’d by Sarat Sreepathi • UNC: Storm Surge Modeling with ADCIRC • Demo’d by Howard Lander

  15. GSU Multiple Genome Alignment • Sequence Alignment Problem • Used to determine biologically meaningful relationships among organisms • Evolutionary information • Diseases, causes and cures • Information about a new protein • Especially compute-intensive for long sequences • Needleman and Wunsch (1970): optimal global alignment • Smith and Waterman (1981): optimal local alignment • Taylor (1987): multiple sequence alignment by pairwise alignment • BLAST trades off optimal results for faster computation

  16. Examples of Genome Alignment
    Alignment 1
      Sequence X:  A  T  A  –  A  G  T
      Sequence Y:  A  T  G  C  A  G  T
      Score:       1  1 -1 -2  1  1  1    Total score = 2
    Alignment 2
      Sequence X:  A  T  A  A  G  T
      Sequence Y:  A  T  G  C  A  G  T
      Score:       1  1 -1 -1 -1 -1 -1    Total score = -3
  • Based on pairwise algorithm • Similarity matrix (SM) built to compare all sequence positions • Observation that many “alignment scores” are zero-valued • SM reduced by storing only non-zero elements • Row-column information stored along with value • Block of memory dynamically allocated as each non-zero element is found • Data structure used to access allocated blocks • Parallelism introduced to reduce computation • Ahmed, N, Pan, Y, Vandenberg, A and Sun, Y, "Parallel Algorithm for Multiple Genome Alignment on the Grid Environment," 6th Intl Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-05) in conjunction with (IPDPS-2005) April 4-8, 2005.
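The arithmetic of Alignment 1 can be checked with a short sketch. It assumes the scoring scheme implied by that example (match = +1, mismatch = -1, gap = -2); these constants are inferred rather than stated on the slide, and `alignment_score` is a hypothetical helper. An ASCII hyphen stands in for the gap symbol.

```python
# Hypothetical scoring scheme inferred from "Alignment 1":
# match = +1, mismatch = -1, gap = -2.
MATCH, MISMATCH, GAP = 1, -1, -2
GAP_CHAR = "-"


def column_score(a: str, b: str) -> int:
    """Score one aligned column of two sequences."""
    if a == GAP_CHAR or b == GAP_CHAR:
        return GAP
    return MATCH if a == b else MISMATCH


def alignment_score(x: str, y: str) -> tuple[list[int], int]:
    """Return per-column scores and their total for two aligned strings."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    scores = [column_score(a, b) for a, b in zip(x, y)]
    return scores, sum(scores)


scores, total = alignment_score("ATA-AGT", "ATGCAGT")
print(scores, total)  # [1, 1, -1, -2, 1, 1, 1] 2
```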

  17. Similarity Matrix Generation • Align Sequence X: TGATGGAGGT with Sequence Y: GATAGG • 1 = matching; 0 = non-matching • ss = substitution score; gp = gap score • Generate SM: each cell holds the maximum score with respect to its neighbors
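A minimal sketch of a sparse similarity matrix, assuming a Smith-Waterman-style local recurrence in which each cell takes the maximum of zero and its three neighbor-derived scores, with hypothetical scores match = 1, mismatch = -1, gap = -2. Because the recurrence clamps at zero, many cells are zero, so only non-zero cells are stored together with their (row, column) position, in the spirit of the storage scheme on slide 16. The function name and scores are illustrative, not the GSU implementation.

```python
def sparse_similarity_matrix(x: str, y: str, match: int = 1,
                             mismatch: int = -1,
                             gap: int = -2) -> dict[tuple[int, int], int]:
    """Build a local-alignment similarity matrix, keeping only non-zero cells.

    Keys are 1-based (row, column) positions: row i corresponds to x[i-1]
    and column j to y[j-1]. Missing keys (including row 0 and column 0)
    are implicitly zero.
    """
    sm: dict[tuple[int, int], int] = {}
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            ss = match if x[i - 1] == y[j - 1] else mismatch  # substitution score
            best = max(
                0,                               # local alignment: never below zero
                sm.get((i - 1, j - 1), 0) + ss,  # diagonal: align x[i-1] with y[j-1]
                sm.get((i - 1, j), 0) + gap,     # consume x[i-1], gap in y
                sm.get((i, j - 1), 0) + gap,     # consume y[j-1], gap in x
            )
            if best:  # store only non-zero scores with their row/column
                sm[(i, j)] = best
    return sm


sm = sparse_similarity_matrix("TGATGGAGGT", "GATAGG")
print(f"{len(sm)} of {10 * 6} cells stored; best score {max(sm.values())}")
```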

  18. Trace sequences • Back trace matrix to find sequence matches
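The back trace can be sketched as follows, under the same assumed local-alignment recurrence and hypothetical scores as a plain dense matrix: start at the highest-scoring cell and step back to whichever neighbor produced each score until a zero cell is reached. `local_align` is an illustrative name, not the GSU code.

```python
def local_align(x: str, y: str, match: int = 1, mismatch: int = -1,
                gap: int = -2) -> tuple[str, str, int]:
    """Fill a local-alignment matrix, then back trace from its maximum."""
    rows, cols = len(x) + 1, len(y) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_ij = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:  # keep the first maximum in row-major order
                best, best_ij = h[i][j], (i, j)

    # Back trace: follow the neighbor that produced each score until zero.
    i, j = best_ij
    ax: list[str] = []
    ay: list[str] = []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:      # came from the diagonal
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:      # came from above: gap in y
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:                                   # came from the left: gap in x
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay)), best


print(local_align("TGATGGAGGT", "GATAGG"))
```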

  19. Parallel distribution of multiple sequences • Pairs: Seq 1-2, Seq 3-4, Seq 5-6 • Groups: Sequences 1-6, Sequences 7-12
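One plausible reading of this diagram is a tree: pairs of sequences (1-2, 3-4, 5-6) are aligned in parallel, and their results are combined into larger groups (1-6, 7-12). The sketch below generates such a schedule; the fan-in of 3, the round-robin processor assignment, and both function names are assumptions for illustration, not details taken from the slide.

```python
def merge_levels(n_sequences: int, group_size: int = 2,
                 fan_in: int = 3) -> list[list[tuple[int, int]]]:
    """Hypothetical tree schedule of 1-based, inclusive sequence ranges.

    Level 0 aligns small groups (e.g., pairs); each later level merges
    `fan_in` neighboring groups until a single group remains. Tasks
    within one level are independent and can run in parallel.
    """
    if n_sequences < 1 or group_size < 1 or fan_in < 2:
        raise ValueError("need n_sequences >= 1, group_size >= 1, fan_in >= 2")
    levels = []
    size = group_size
    while True:
        level = [(start, min(start + size - 1, n_sequences))
                 for start in range(1, n_sequences + 1, size)]
        levels.append(level)
        if len(level) == 1:
            return levels
        size *= fan_in  # groups grow geometrically, so the loop terminates


def assign_round_robin(tasks, n_procs: int) -> dict[int, list]:
    """Spread the independent tasks of one level over n_procs processors."""
    plan: dict[int, list] = {p: [] for p in range(n_procs)}
    for k, task in enumerate(tasks):
        plan[k % n_procs].append(task)
    return plan


for depth, level in enumerate(merge_levels(12)):
    print(depth, level, assign_round_robin(level, 3))
```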

  20. Convergence: collaboration • Algorithm implementation • Nova Ahmed, Masters CS (now PhD student, GT) • Dr. Yi Pan, Chair, Computer Science • NMI Integration Testbed program • Georgia State: Art Vandenberg, Victor Bolet, Chao “Bill” Xie, Dharam Damani, et al. • University of Alabama at Birmingham: John-Paul Robinson, Pravin Joshi, Jill Gemmill • SURAgrid • Looking for applications to demonstrate value

  21. Algorithm Validation: Shared Memory • SGI Origin 2000: 24 R10000 processors at 250 MHz; 4 GB memory • Limitations: • Memory (max sequence is 2000 x 2000) • Processors (policy limits students to 12 processors) • Not scalable • Performance validates algorithm: computation time decreases with increased number of processors

  22. Shared Memory vs. Cluster, Grid Cluster* • UAB cluster: 8 node Beowulf (550MHz Pentium III; 512 MB RAM) • Clusters retain algorithm improvement * NB: Comparing clusters with shared memory is, of course, relative; systems are distinctly different.

  23. Grid (Globus, MPICH-G2) overhead negligible • Advantages of grid-enabled cluster: • Scalable: can add new cluster nodes to the grid • Easier job submission: no need for an account on every node • Easier scheduling: can submit multiple jobs at one time

  24. Computation Time and Speedup (1 CPU / N CPUs) • 9 processors available in the multi-cluster grid; 32 processors for other configurations • Interesting: when multiple clusters were used (the application spanned three separate clusters), performance improved further?!
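Speedup here is the run time on one CPU divided by the run time on N CPUs. A small sketch of that definition, plus parallel efficiency (speedup / N), is below; the timings in the usage line are hypothetical, not measurements from these slides. An efficiency above 1.0 is the superlinear behavior that the multi-cluster result hints at.

```python
def speedup(t_one_cpu: float, t_n_cpus: float) -> float:
    """Speedup as defined on the slide: time on 1 CPU / time on N CPUs."""
    if t_one_cpu <= 0 or t_n_cpus <= 0:
        raise ValueError("run times must be positive")
    return t_one_cpu / t_n_cpus


def efficiency(t_one_cpu: float, t_n_cpus: float, n_cpus: int) -> float:
    """Parallel efficiency: speedup divided by the number of CPUs.

    Values above 1.0 indicate superlinear speedup.
    """
    return speedup(t_one_cpu, t_n_cpus) / n_cpus


# Hypothetical timings in seconds, for illustration only.
print(speedup(120.0, 30.0), efficiency(120.0, 30.0, 8))  # 4.0 0.5
```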

  25. Grid tools used • Globus Toolkit: built on the Open Grid Services Architecture (OGSA) • Nexus: communication library that allows multi-method communication with a single API for a wide range of protocols; using Nexus, the MPICH-G2 implementation of the Message Passing Interface was used in the experiments • Resource Specification Language (RSL): job submission and execution (globus-job-submit, globus-job-run) and status (globus-job-status)

  26. The Grid-enabling Story • Iterative, evolutionary, collaborative • 1st ssh to resource and get code working • 2nd submit from local account to remote globus machine • 3rd run from SURAgrid portal • SURAgrid infrastructure components providing improved work-flow • Integration with campus components enables more seamless access • Overall structure can be used as model for campus research infrastructure: • Integrated authentication/authorization • Portals for applications • Grid administration/configuration support

  27. SURAgrid Portal

  28. SURAgrid MyProxy service: Get Proxy

  29. MyProxy: secure grid credential • GSI proxy credentials are loaded into your account…

  30. SURAgrid Portal file transfer

  31. Job submission via Portal

  32. Output retrieved

  33. SURAgrid Account Management: list my users

  34. SURAgrid Account Management: add user

  35. Multiple Genome Alignment & SURAgrid • Collaborative cooperation • Convergence of opportunity • Application / infrastructure drivers interact • Emergent applications: • Cosmic ray simulation (Dr. Xiaochun He) • Classification/clustering (Dr. Vijay Vaishnavi, Art Vandenberg) • Muon detector grid (Dr. Xiaochun He) • Neuron (Dr. Paul Katz, Dr. Robert Calin-Jageman, Chao “Bill” Xie) • AnimatLab (Dr. Don Edwards, Dr. Ying Zhu, David Cofer, James Reid) • IBM System p5 575 with Power5+ Processors

  36. BioSim: Bio-electric Simulator for Whole Body Tissues • Numerical simulations for electrostimulation of tissues and whole-body biomodels • Predicts spatial and time dependent currents and voltages in part or whole-body biomodels • Numerous diagnostic and therapeutic applications, e.g., neurogenesis, cancer treatment, etc. • Fast parallelized computational approach

  37. Simulation Models • From an electrical standpoint, tissues are characterized by their conductivities and permittivities • Whole body discretized within a cubic simulation volume • Cartesian grid of points along the three axes, so each point has at most six nearest neighbors
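The six-neighbor structure follows from indexing the simulation volume as a 3-D Cartesian grid. A minimal sketch is below, assuming an x-fastest linear numbering of grid points (an illustrative choice, not necessarily BioSim's); interior points get six neighbors, while faces, edges and corners get fewer.

```python
def node_index(i: int, j: int, k: int, nx: int, ny: int) -> int:
    """Linear index of grid point (i, j, k), with i varying fastest (assumed)."""
    return i + nx * (j + ny * k)


def neighbors(i: int, j: int, k: int, nx: int, ny: int, nz: int) -> list[int]:
    """Linear indices of the up-to-six nearest neighbors inside the volume."""
    out = []
    for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        a, b, c = i + di, j + dj, k + dk
        if 0 <= a < nx and 0 <= b < ny and 0 <= c < nz:  # stay inside the volume
            out.append(node_index(a, b, c, nx, ny))
    return out


print(len(neighbors(1, 1, 1, 3, 3, 3)), len(neighbors(0, 0, 0, 3, 3, 3)))
```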

  38. Numerical Models • Kirchhoff’s node analysis • Recast to compute matrix only once • For large models, matrix inversion is intractable • LU decomposition of the matrix
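The factor-once idea can be illustrated on a toy resistive network. Kirchhoff's current law at each node gives G v = i, where G is the nodal conductance matrix; because G does not change between time steps, it can be LU-factored once, and each new source vector is then solved by cheap forward and back substitution. The sketch assumes a hypothetical 1-D chain with conductance g between neighbors and g0 from each node to ground, and uses dense, unpivoted Doolittle LU, which is safe here only because this G is strictly diagonally dominant. A realistic model would use a sparse solver such as SuperLU-DIST.

```python
def chain_conductance(n: int, g: float, g0: float) -> list[list[float]]:
    """Nodal conductance matrix of a hypothetical n-node chain:
    conductance g between neighbors, g0 from every node to ground."""
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        G[i][i] += g0
        if i + 1 < n:
            G[i][i] += g
            G[i + 1][i + 1] += g
            G[i][i + 1] -= g
            G[i + 1][i] -= g
    return G


def lu_factor(a: list[list[float]]):
    """Doolittle LU without pivoting (valid for diagonally dominant a)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b: list[float]) -> list[float]:
    """Solve L U x = b by forward, then back, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


G = chain_conductance(3, g=1.0, g0=1.0)
L, U = lu_factor(G)                 # factor once
for amplitude in (1.0, 2.0):        # e.g., samples of a time-dependent source
    v = lu_solve(L, U, [amplitude, 0.0, 0.0])
    print(amplitude, [round(x, 4) for x in v])
```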

  39. Numerical Models • Matrix [M] • Voltage: user-specified time-dependent waveform • Impose boundary conditions locally • Actual data for conductivity and permittivity • Results in an extremely sparse (asymmetric) matrix • Figure: total elements in the matrix vs. nonzero values

  40. The Landscape of Sparse Ax=b Solvers • Direct methods (A = LU) vs. iterative methods (y’ = Ay) • Nonsymmetric vs. symmetric positive definite matrices • Figure axes: more general, more robust, less storage • Source: John Gilbert, Sparse Matrix Days in MIT 18.337

  41. LU Decomposition Source: Florin Dobrian

  43. Computational Complexity • 100 × 100 × 10 nodes: ~75 GB of memory for the dense matrix (8-byte floating-point precision) • Sparse data structure: ~6 MB (in our case) • Sparse direct solver: SuperLU-DIST • Xiaoye S. Li and James W. Demmel, “SuperLU-DIST: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems”, ACM Trans. Mathematical Software, June 2003, Volume 29, Number 2, Pages 110-140. • Fill-reducing orderings with Metis • G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs”, SIAM Journal on Scientific Computing, 1999, Volume 20, Number 1.
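Both memory figures can be reproduced approximately with a back-of-the-envelope calculation. Assuming a 7-point stencil (each node couples to itself and its up-to-six grid neighbors) and 8-byte values, the sketch below counts dense storage and the non-zeros of the sparse matrix. The index arrays of a real sparse format would add to the sparse total, so this is a lower bound rather than BioSim's exact layout, and `matrix_footprint` is a hypothetical helper.

```python
def matrix_footprint(nx: int, ny: int, nz: int, bytes_per_value: int = 8) -> dict:
    """Dense vs. sparse (values only) storage for a 7-point-stencil grid matrix."""
    n = nx * ny * nz
    # Each neighbor pair (grid edge) contributes two off-diagonal entries.
    edges = (nx - 1) * ny * nz + nx * (ny - 1) * nz + nx * ny * (nz - 1)
    nnz = n + 2 * edges  # diagonal plus off-diagonals
    return {
        "nodes": n,
        "dense_bytes": n * n * bytes_per_value,
        "nnz": nnz,
        "sparse_value_bytes": nnz * bytes_per_value,
    }


fp = matrix_footprint(100, 100, 10)
print(fp["dense_bytes"] / 2**30, fp["sparse_value_bytes"] / 1e6)
```

For the 100 × 100 × 10 model this gives 8 × 10^10 bytes (about 74.5 GiB) dense, against 676,000 non-zeros, or about 5.4 MB of values, consistent with the ~75 GB and ~6 MB quoted above.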

  44. Performance on compute clusters • 144,000-node rat model; time in seconds • Chart: average iteration time and factorization time

  45. Output: Visualization with MATLAB • Potential profile at a depth of 12 mm

  46. Output: Visualization with MATLAB • Simulated Potential Evolution • Along the Entire 51-mm Width of the Rat Model

  47. Deployment on Mileva: 4-node cluster dedicated to SURAgrid purposes • Authentication: • ODU Root CA • Cross-certification with SURA Bridge • Compatibility of accounts for ODU users • Authorization & Accounting • Initial Goals: • Develop larger whole-body models with greater resolution • Scalability tests

  48. Grid Workflow • Establish user accounts for ODU users • SURAgrid Central User Authentication and Authorization System • Off-line/Customized (e.g., USC) • Manually launch jobs based on remote resource • SSH/GSISSH/SURAgrid Portal • PBS/LSF/SGE • Transfer files • SCP/GSISCP/SURAgrid Portal

  49. Conclusions • Science: • Electrostimulation has a variety of diagnostic and therapeutic applications • While numerical simulations provide many advantages over real experiments, they can be very arduous • Grid enabling: • New possibilities with grid computing • Grid-enabling an application is complex and time-consuming • Security is nontrivial

  50. Future Steps • Grid-enabling BioSim • Explore alternatives for grid enabling BioSim • Establish new collaborations • Scalability experiments with large compute clusters accessible via SURAgrid • Future applications: • Molecular and Cellular Dynamics • Computational Nano-Electronics • Tools: Gromacs, DL-POLY, LAMMPS
