NMI Testbed GRID Utility for Virtual Organization
Art Vandenberg, Avandenberg@gsu.edu
Director, Advanced Campus Services, Georgia State University
NSF Supported • This material is based in part upon work supported by the National Science Foundation under Grant No. ANI-0123937 and Grant No. ITR-0312636. • Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
Overview • NMI Testbed GRID – “virtual organization” • Participating sites • Resources for VO • Catalog of grid applications • Example: genome alignment for VO • Plans for May-August 2004
Vision – NMI Testbed GRID VO • NMI Integration Testbed Program • NSF #ANI-0123937 • Explore grid capability – interoperability • Researchers & faculty • Across heterogeneous sites • Integrated with enterprise middleware • A utility grid using NMI components • Non-specialized, open, transparent
Collaborative environment VO • Beyond application specific grids • Leverage enterprise middleware • Identity management, authN, authZ... • Strive for transparent access • Portals • Ease of use: submit, monitor, retrieve data • Security policy & technology • Federation of cooperating sites
Participating sites - the VO • Testbed sites – push interoperation limits • Georgia State University • Texas Advanced Computing Center • University of Alabama at Birmingham • University of Alabama at Huntsville • University of Michigan • University of Southern California • University of Virginia
Site resources – VO • Testbed sites – interoperation challenges • GSU: Shibboleth, GridPort portal, REU & Grads, disk • TACC: REU student, portal, Enterprise CA, cluster • UAB: Beowulf cluster, CA, Pubcookie, OGCE portal • UAH: application expertise, NASA IPG Certs • UMich: KX.509 & Kerberos, MGrid, ATLAS integration • USC: CA, Pubcookie, Shibboleth, Linux cluster, KX.509 • UVa: Bridge CA model • Sites non-homogeneous – a VO challenge
Catalog of grid applications • Knowledge base is important • REU students – Nicole Geiger, Anish Shindore • Graduate Research Assistant – Manish Garg • NMI Testbed sites initially • Researchers, schools, projects • Grid-specific as well as grid-potential applications • Started as a spreadsheet, now an online database
Catalog of grid applications • Catalog of Grid Applications (current version) • http://art12.gsu.edu:8080/grid_cat/index5.jsp • Expanding scope beyond testbed sites • 18 schools/labs, 300 researchers & counting • Differentiated from Globus www.gpds.org • Oriented to researcher, institutional level • Planning clustering, visualization modality • Clustering work related to: NSF #ITR-0312636
Example: genome alignment for VO (GSU – UAB) • An opportunity for the utility Grid VO • Nova Ahmed, CS grad with Dr. Yi Pan, GSU • Dynamic programming algorithm for genome sequence alignment • Initial runs on GSU shared memory machine (Hydra) • Limited access (grad student, shared cycles) • Algorithm improvement using a multi-processor cluster across a grid?
The Genome Alignment Problem • Alignment of DNA sequences • Sequence X: TGATGGAGGT • Sequence Y: GATAGG • Count the matching score as 1 => match, 0 => mismatch • Populate the similarity matrix using the recurrence shown below • Observations on the similarity matrix: many zero values; memory can be reduced by not storing zero-value elements
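A plausible reconstruction of the recurrence, assuming the standard Smith-Waterman local-alignment form implied by the 1/0 match score above, with a gap penalty g that the slide does not specify:

\[
H_{i,j} = \max\left\{\, 0,\; H_{i-1,j-1} + s(x_i, y_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g \,\right\},
\qquad
s(x_i, y_j) =
\begin{cases}
1 & \text{if } x_i = y_j \\
0 & \text{otherwise}
\end{cases}
\]

Flooring every cell at zero is what produces the many zero values noted above, and is what makes it worthwhile to store only the non-zero elements.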
Improved Parallel Algorithm for Genome Alignment • The parallel method: • Similarity matrix is divided among processors • Processors calculate in parallel to match the partial sequences • Communication among the processors matches the whole sequence • The new data structure: • New algorithm calculates only non-zero values of the similarity matrix • Memory is dynamically allocated as needed
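A minimal sketch in C of the data-structure idea, not the authors' code: compute the similarity matrix row by row, keep only non-zero cells, and grow storage on demand. The 1/0 match score comes from the earlier slide; the gap penalty g = 1 and all helper names are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int j, h; } Cell;               /* one non-zero cell */
typedef struct { Cell *cells; int n, cap; } Row; /* dynamically grown row */

static int row_get(const Row *r, int j) {        /* 0 if cell not stored */
    for (int k = 0; k < r->n; k++)
        if (r->cells[k].j == j) return r->cells[k].h;
    return 0;
}

static void row_put(Row *r, int j, int h) {      /* store non-zero cells only */
    if (h == 0) return;
    if (r->n == r->cap) {                        /* grow storage on demand */
        r->cap = r->cap ? 2 * r->cap : 8;
        r->cells = realloc(r->cells, r->cap * sizeof(Cell));
    }
    r->cells[r->n].j = j; r->cells[r->n].h = h; r->n++;
}

int main(void) {
    const char *x = "TGATGGAGGT", *y = "GATAGG";       /* sequences from the slides */
    int m = (int)strlen(x), n = (int)strlen(y);
    int g = 1, best = 0;                               /* g: assumed gap penalty */
    Row prev = {0}, cur = {0};
    for (int i = 1; i <= m; i++) {
        cur.n = 0;                                     /* reuse storage per row */
        for (int j = 1; j <= n; j++) {
            int s = (x[i-1] == y[j-1]) ? 1 : 0;        /* 1 => match, 0 => mismatch */
            int h = row_get(&prev, j-1) + s;           /* diagonal */
            int up = row_get(&prev, j) - g;            /* gap in X */
            int left = row_get(&cur, j-1) - g;         /* gap in Y */
            if (up > h) h = up;
            if (left > h) h = left;
            if (h < 0) h = 0;                          /* local-alignment floor */
            row_put(&cur, j, h);
            if (h > best) best = h;
        }
        Row t = prev; prev = cur; cur = t;             /* swap row buffers */
    }
    printf("best local score: %d\n", best);
    free(prev.cells); free(cur.cells);
    return 0;
}
```

Only two rows are live at a time and only non-zero cells are stored, which is one way to realize the memory reduction described above; the parallel version would additionally divide the matrix among processors and exchange boundary cells, per the bullets above.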
Results on the Shared Memory Machine (Hydra) • Limitations: • Cannot allocate memory for long sequences – Ex: largest alignment is 2000 x 2000 • Number of processors is limited – Ex: 12 processors on Hydra • Not scalable • Performance: computation time decreases as the number of processors increases
Results on the Beowulf Cluster of UAB • Longer genome sequences can be aligned – highest sequence length on the cluster is 10,000 • Limited scalability – the number of processors can be increased only up to a certain limit
Results via the GRID at UAB • Submitting the genome alignment program using Globus and MPICH • Advantages: • Scalable – can add new clusters to the grid • Easier job submission – don't need an account on every node • Easier scheduling – can submit multiple jobs at one time
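As context rather than a record of the testbed's exact setup: with pre-web-services Globus, an MPICH job like this would typically be submitted through GRAM with an RSL string, e.g. globusrun -r <gatekeeper> "&(executable=align)(jobtype=mpi)(count=8)", where the gatekeeper, executable name, and processor count here are hypothetical.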
Future Work – Genome Alignment • Use MPICH-G2 (instead of MPICH) – use the power of the Grid • Expand the computational resources – combine more clusters across the Grid • Develop a program to align multiple genome sequences (rather than two at a time) – requiring more computation resources • Use Georgia State certificate via Bridge CA • Via Shibboleth-protected sector CA…?
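Since MPICH-G2 presents the standard MPI interface, existing MPI code largely carries over. The sketch below, with a hypothetical pair list and a stubbed score() kernel, shows how the planned multiple-sequence work could farm independent pairwise alignments out across grid ranks:

```c
/* Sketch only: distribute independent pairwise alignments across MPI ranks.
   Under MPICH-G2 the ranks could live on different clusters in the grid. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Stub for the pairwise alignment kernel (e.g., the row-by-row
   similarity-matrix sketch shown earlier); returns a dummy score here. */
static int score(const char *x, const char *y) {
    int m = (int)strlen(x), n = (int)strlen(y);
    return m < n ? m : n;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical work list of sequence pairs to align. */
    const char *pairs[][2] = {
        {"TGATGGAGGT", "GATAGG"},
        {"ACGTACGTAC", "ACGTT"},
        {"GGGTTTAAA",  "GGTTAA"},
    };
    int npairs = (int)(sizeof pairs / sizeof pairs[0]);

    /* Round-robin assignment: rank r handles pairs r, r+size, ... */
    for (int p = rank; p < npairs; p += size)
        printf("rank %d: pair %d -> score %d\n",
               rank, p, score(pairs[p][0], pairs[p][1]));

    MPI_Finalize();
    return 0;
}
```

Because the alignments are independent, no inter-rank communication is needed in this scheme, which suits a grid of loosely coupled clusters better than a tightly synchronized wavefront would.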
Plans for May-August 2004 • More resources • Contributed from current sites (others?) • Portal for NMI Testbed GRID • Cf. NPACI Hotpage https://hotpage.npaci.edu/ • Integration of campus authN • UVa Bridge CA • More applications • Utility grid for grad research & education
Plans for May-August 2004… • Documentation • Web site • Application docs and demos • Catalog of Grid Applications • Provide for self service contribution • Develop clustering (SOM), visualization options (“find researchers or projects like X”) • Auto-discovery of Grid researchers & apps based on reference sets (core sites)?
Contact Information • Art Vandenberg • Avandenberg@gsu.edu • NMI Testbed GRID • http://www.gsu.edu/~wwwacs/GRID_Group/NMI.html