NMI Testbed GRID Utility for Virtual Organization Art Vandenberg Avandenberg@gsu.edu Director, Advanced Campus Services Georgia State University
NSF Supported • This material is based in part upon work supported by the National Science Foundation under • Grant No. ANI-0123937 and • Grant No. ITR-0312636. • Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
Overview • NMI Testbed GRID – “virtual organization” • Participating sites • Resources for VO • Catalog of grid applications • Example: genome alignment for VO • Plans for May-August 2004
Vision – NMI Testbed GRID VO • NMI Integration Testbed Program • NSF #ANI-0123937 • Explore grid capability – interoperability • Researchers & faculty • Across heterogeneous sites • Integrated with enterprise middleware • A utility grid using NMI components • Non-specialized, open, transparent
Collaborative environment VO • Beyond application specific grids • Leverage enterprise middleware • Identity management, authN, authZ... • Strive for transparent access • Portals • Ease of use: submit, monitor, retrieve data • Security policy & technology • Federation of cooperating sites
Participating sites - the VO • Testbed sites – push interoperation limits • Georgia State University • Texas Advanced Computing Center • University of Alabama at Birmingham • University of Alabama at Huntsville • University of Michigan • University of Southern California • University of Virginia
Site resources – VO • Testbed sites – interoperation challenges • GSU: Shibboleth, GridPort portal, REU & Grads, disk • TACC: REU student, portal, Enterprise CA, cluster • UAB: beowulf cluster, CA, Pubcookie, OGCE portal • UAH: application expertise, NASA IPG Certs • UMich: KX.509 & Kerberos, MGrid, ATLAS integration • USC: CA, Pubcookie, Shibboleth, Linux cluster, KX.509 • UVa: Bridge CA model • Sites non-homogeneous – a VO challenge
Catalog of grid applications • Knowledge base is important • REU students – Nicole Geiger, Anish Shindore • Graduate Research Asst – Manish Garg • NMI Testbed Sites initially • Researchers, schools, projects • Grid specific as well as grid potential • Started as spreadsheet, now online db
Catalog of grid applications • Catalog of Grid Applications (current version) • http://art12.gsu.edu:8080/grid_cat/index5.jsp • Expanding scope beyond testbed sites • 18 schools/labs, 300 researchers & counting • Differentiated from Globus www.gpds.org • Oriented to researcher, institutional level • Planning clustering, visualization modality • Clustering work related to: NSF #ITR-0312636
Example: genome alignment for VO (GSU – UAB) • An opportunity for utility Grid VO • Nova Ahmed, CS grad with Dr. Yi Pan, GSU • Dynamic programming algorithm for genome sequence alignment • Initial runs on GSU shared memory machine, Hydra • Limited access (grad student, shared cycles) • Algorithm improvement using multi-processor cluster across a grid?
The Genome Alignment Problem • Alignment of DNA sequences • Sequence X: TGATGGAGGT • Sequence Y: GATAGG • Count the matching score as • 1 => matching • 0 => non-matching • Populate the similarity matrix using: [equation not preserved] • Observation on the similarity matrix: • Many zero values • Memory can be reduced by not storing zero-valued elements
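A minimal sketch of the zero-skipping observation, in Python. The recurrence from the original slide is not reproduced in this text, so the rule used here is an assumption: a matching cell scores one more than its upper-left diagonal neighbour, and a non-matching cell scores 0. It was chosen only because it yields many zero cells, as the slide observes. The function name `sparse_similarity` is hypothetical.

```python
def sparse_similarity(x, y):
    """Return {(i, j): score} holding only the non-zero cells.

    ASSUMED rule (not taken from the slides): a match at (i, j) scores
    1 + the score at (i-1, j-1); a non-match scores 0 and is not stored.
    Indices are 1-based so that row 0 and column 0 are implicit zeros.
    """
    cells = {}
    for i, a in enumerate(x, start=1):
        for j, b in enumerate(y, start=1):
            if a == b:
                # Missing keys stand for zero-valued cells.
                cells[(i, j)] = cells.get((i - 1, j - 1), 0) + 1
    return cells


X = "TGATGGAGGT"  # Sequence X from the slide
Y = "GATAGG"      # Sequence Y from the slide

cells = sparse_similarity(X, Y)
total = len(X) * len(Y)
print(f"{len(cells)} of {total} cells stored")
```

Under this assumed rule, only the matching positions are stored, so memory grows with the number of matches rather than with the full product of the two sequence lengths.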
Improved Parallel Algorithm for Genome Alignment • The parallel method: • Similarity matrix is divided among processors • Each processor matches its partial sequence in parallel • Processors communicate to match the whole sequence • The new data structure: • New algorithm calculates only non-zero values of the similarity matrix • Memory is dynamically allocated as needed
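The division and communication steps can be sketched as follows. The slides do not say how the matrix is divided, so this assumes contiguous column strips, one per processor, and it reuses an assumed scoring rule (match = diagonal neighbour + 1, non-match = 0) that is not taken from the slides. The loop below is a sequential simulation standing in for real processes; all function names are hypothetical.

```python
def column_strips(n_cols, n_procs):
    """Split columns 1..n_cols into n_procs contiguous, near-equal strips."""
    base, extra = divmod(n_cols, n_procs)
    strips, start = [], 1
    for p in range(n_procs):
        width = base + (1 if p < extra else 0)
        strips.append(range(start, start + width))
        start += width
    return strips


def strip_cells(x, y, cols, left_boundary):
    """Compute the non-zero cells of one strip.

    left_boundary maps row i -> score at (i, cols[0] - 1), i.e. the values
    a processor would receive from its left neighbour.
    """
    cells = {}
    for i, a in enumerate(x, start=1):
        for j in cols:
            if a == y[j - 1]:
                if j == cols[0]:
                    diag = left_boundary.get(i - 1, 0)   # owned by the neighbour
                else:
                    diag = cells.get((i - 1, j - 1), 0)  # owned locally
                cells[(i, j)] = diag + 1
    return cells


def simulate(x, y, n_procs):
    """Run every strip in turn, passing the boundary column along."""
    merged, boundary = {}, {}
    for cols in column_strips(len(y), n_procs):
        if not cols:  # more processors than columns
            continue
        cells = strip_cells(x, y, cols, boundary)
        merged.update(cells)
        last = cols[-1]
        boundary = {i: s for (i, j), s in cells.items() if j == last}
    return merged


X, Y = "TGATGGAGGT", "GATAGG"
merged = simulate(X, Y, 3)
```

In this sketch the only data exchanged between neighbours is one boundary column of non-zero scores. A real message-passing version could overlap the strips as a pipeline, since a strip can process row i as soon as its left neighbour has finished row i-1.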
Results on the Shared Memory Machine (Hydra) • Limitations • Cannot allocate memory for long sequences (e.g., the largest alignment is 2000 x 2000) • Number of processors is limited (e.g., 12 processors on Hydra) • Not scalable • Performance • Computation time decreases with increased number of processors
Results on the Beowulf Cluster of UAB • Using the Beowulf cluster: • Longer genome sequences can be aligned • Sequence length can be up to 10,000 in the cluster • Limited scalability • Can increase the number of processors up to a certain limit
Results via the GRID at UAB • Submitting genome alignment program using Globus and MPICH • Advantages: • Scalable – can add new clusters to the grid • Easier job submission – no account needed on every node • Scheduling is easier – can submit multiple jobs at one time
Future Work: Genome Alignment • Use MPICH-G2 (instead of MPICH) – use the power of the Grid • Expand the computational resources – combine more clusters across the Grid • Develop program to align multiple genome sequences (rather than two at a time) – requiring more computation resources • Use Georgia State certificate via Bridge CA • Via Shibboleth protected sector CA…?
Plans for May-August 2004 • More resources • Contributed from current sites (others?) • Portal for NMI Testbed GRID • Cf. NPACI Hotpage https://hotpage.npaci.edu/ • Integration of campus authN • UVa Bridge CA • More applications • Utility grid for grad research & education
Plans for May-August 2004… • Documentation • Web site • Application docs and demos • Catalog of Grid Applications • Provide for self service contribution • Develop clustering (SOM), visualization options (“find researchers or projects like X”) • Auto-discovery of Grid researchers & apps based on reference sets (core sites)?
Contact Information • Art Vandenberg • Avandenberg@gsu.edu • NMI Testbed GRID • http://www.gsu.edu/~wwwacs/GRID_Group/NMI.html