geWorkbench caGrid TeraGrid Integration • Scott Oster, Ohio State University – Dept. of Biomedical Informatics • Christine Hung, Columbia University – JCSB/C2B2 • caBIG Architecture Face-to-Face, Salt Lake City, UT, January 2008
Agenda • Overview (5 min) • Introduction to the TeraGrid Workgroup • Background on geWorkbench and the geWorkbench/caGrid/TeraGrid Project • Technology (10 min) • Steps to establish the geWorkbench/caGrid/TeraGrid Interface • Use of caGrid Security (GTS, Grid Grouper, Dorian, CDS) • Workflow and communications between services • Demo (5 min) • Discussion (5 min)
Team Members • geWorkbench (Columbia University) • Christine Hung • Kiran Keshav • caGrid (Ohio State University) • Scott Oster • Stephen Langella • caGrid/TeraGrid (Argonne National Laboratory) • Ravi Madduri • TeraGrid (Argonne National Laboratory) • Stuart Martin • Management • Aris Floratos (Columbia University) • Krishnakant Shanbhag (Argonne National Laboratory) • Michael Keller (Booz Allen Hamilton) • Patrick McConnell (Duke University) • Nancy Wilkins-Diehr (San Diego Supercomputer Center)
Overview • Primary problem to address • Lack of infrastructure and operating procedures to support the high performance computing needs of caBIG • Overarching goals • Regular caGrid services will run as caGrid/TeraGrid gateway services • Virtualize TeraGrid resources (both compute and storage) • Approach: labor divided between domain and technical tasks • Use cases will be drafted to identify the needs of the community • Existing TeraGrid Gateway projects will be surveyed to identify lessons learned and potential technology for reuse • Demonstrate approach through working prototype • Document best practices and develop “cookbook”
TeraGrid Overview “TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource.” • Characteristics: • > 250 teraflops of computing capability • > 30 petabytes of online and archival data storage • High-performance networks • Mechanics: • Prospective users request an allocation of HPC resources from a review committee • Allocations are granted, and credentials are issued • Jobs are run with credentials and resource usage is billed to the allocation
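The billing step in the mechanics above can be pictured as a small ledger. This is only a sketch: the class name `AllocationLedger` is hypothetical, and charging cores multiplied by wall-clock hours is an assumption made for illustration, since the slide says only that usage is billed to the allocation.

```java
/** Hypothetical ledger for one allocation; the unit is an abstract "core hour". */
final class AllocationLedger {
    private double remainingHours;

    AllocationLedger(double grantedHours) {
        if (grantedHours < 0) {
            throw new IllegalArgumentException("grantedHours must not be negative");
        }
        this.remainingHours = grantedHours;
    }

    /**
     * Charges cores x wall-clock hours (an assumed charging rule).
     * Returns false and leaves the balance untouched if the allocation cannot cover the job.
     */
    boolean charge(int cores, double wallClockHours) {
        double cost = cores * wallClockHours;
        if (cost > remainingHours) {
            return false;
        }
        remainingHours -= cost;
        return true;
    }

    /** Core hours still available on this allocation. */
    double remaining() {
        return remainingHours;
    }
}
```

With a grant of 100 core hours, `charge(8, 10.0)` succeeds and leaves 20, after which `charge(4, 10.0)` is refused.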
caGrid Gateway Service Overview • caGrid service running in the caBIG™ environment which acts as a bridge or proxy to TeraGrid resources for a subset of caBIG™ users • Should meet Gold compatibility requirements • Created for a specific scientific scenario: • Abstracts away the details of leveraging TeraGrid for performance-intensive operations • Uses domain-specific operations and data types • Has access to a TeraGrid allocation • Alleviates the need for caBIG™ users to: • Understand the complexities of TeraGrid (or HPC systems) • Obtain TeraGrid accounts/allocations
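The gateway idea can be sketched as a thin facade: callers see one domain-level operation, the service checks that the caller belongs to the permitted subset of users, and the job is submitted under a shared account. Everything named here (`ComputeBackend`, `ExpressionAnalysisGateway`, `analyze`, the plain string identities) is hypothetical and chosen for illustration. It is not the caGrid or TeraGrid API, and the real service would rely on caGrid security services rather than an in-memory set.

```java
import java.util.Set;

/** Hypothetical HPC backend; stands in for job submission to TeraGrid. */
interface ComputeBackend {
    double[] submit(String account, String jobName, double[] data);
}

/** Hypothetical domain-facing gateway: callers see a science operation, not HPC details. */
final class ExpressionAnalysisGateway {
    private final ComputeBackend backend;
    private final Set<String> authorizedUsers; // assumed to be resolved from a group service
    private final String communityAccount;     // shared identity that owns the allocation

    ExpressionAnalysisGateway(ComputeBackend backend,
                              Set<String> authorizedUsers,
                              String communityAccount) {
        this.backend = backend;
        this.authorizedUsers = Set.copyOf(authorizedUsers);
        this.communityAccount = communityAccount;
    }

    /** Domain-specific operation: the caller supplies only an identity and expression values. */
    double[] analyze(String callerIdentity, double[] expression) {
        if (!authorizedUsers.contains(callerIdentity)) {
            throw new SecurityException("Caller is not in the authorized group: " + callerIdentity);
        }
        // The job runs under the community account, never under the caller's own credentials,
        // so the caller needs no HPC account or allocation.
        return backend.submit(communityAccount, "analyze", expression.clone());
    }
}
```

A caller outside the authorized set gets a `SecurityException` before anything reaches the backend, which is the property the "subset of caBIG™ users" bullet describes.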
geWorkbench – a Platform for Integrated Genomics • Integrated genomics analysis application • Support for gene expression data, sequences, pathways, and structure • 50+ visualization and analysis modules • Access to local and remote data sources and analytical services • Integration with biological annotation sources • Development Platform • Open source • Java based • Component architecture • Facilitates customization
geWorkbench – a Platform for Integrated Genomics • Large collection of components • Data parsers: Affy MAS/GCOS (txt and CEL), Genepix, RMA, FASTA, caArray, PDB. • Data Management: Project folders, marker/sequence/array groups. • Visualization: Dendrograms, color mosaics, scatter plots, SOM clusters, BLAST results, dot matrices. • Analyses: Hierarchical clustering, t-test, SVM, ARACNE, MEDUSA, MatrixREDUCE. • 3rd Party components: Cytoscape, GoMiner, GeneWays, GenePattern, MEV. • Complete list at www.geworkbench.org.
geWorkbench – a Platform for Integrated Genomics http://www.geworkbench.org/
geWorkbench – Graphical User Interface (screenshot; labeled regions: Projects Area, Visualization Area, Selection Area, Command Area)
Creating the Gateway Service • Manually stage the binary (jar file) on TeraGrid • Takes in .ser files as input • Produces results also in a .ser file • Used the RAVi plugin for Introduce to create the gateway service • http://www-unix.mcs.anl.gov/~neillm/ravi/ • Gateway gridFTPs input data and parameters from geWorkbench to TeraGrid • geWorkbench passes input to the gateway in geWorkbench’s native format (caDSR compliant) • Gateway serializes the input before gridFTPing to TeraGrid • Gateway invokes the staged binary • Gateway gridFTPs results back to geWorkbench • Gateway deserializes the result file • Gateway returns results to geWorkbench in its native format • Gateway service is a secured caGrid service which in turn invokes TeraGrid with a caBIG community account
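The serialize, invoke, deserialize round trip described on this slide can be sketched with standard Java object serialization. This is a minimal local simulation under stated assumptions. The class and method names (`GatewayRoundTrip`, `writeSer`, `readSer`, `run`, `meanJob`) are hypothetical. The GridFTP transfers and the remote invocation are replaced by a callback that reads and writes files in a temporary directory. The toy job computes a mean rather than running any real geWorkbench analysis.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

/** Hypothetical sketch of the gateway's serialize / invoke / deserialize round trip. */
final class GatewayRoundTrip {

    /** Writes a Serializable object to a .ser file using standard Java serialization. */
    static void writeSer(Serializable value, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads an object back from a .ser file. */
    static Object readSer(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Result class is not on the classpath", e);
        }
    }

    /**
     * Handles one request: serialize the input, hand both file paths to the staged job,
     * then deserialize whatever the job wrote. In the system on the slide the files are
     * moved with GridFTP; here they stay in a local temporary directory.
     */
    static Object run(Serializable input, BiConsumer<Path, Path> stagedJob) {
        try {
            Path dir = Files.createTempDirectory("gateway");
            Path inFile = dir.resolve("input.ser");
            Path outFile = dir.resolve("result.ser");
            writeSer(input, inFile);           // gateway serializes the input
            stagedJob.accept(inFile, outFile); // stand-in for transfer plus remote invocation
            return readSer(outFile);           // gateway deserializes the result
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Toy "staged binary": reads a double[] and writes its mean as a Double. */
    static void meanJob(Path inFile, Path outFile) {
        double[] values = (double[]) readSer(inFile);
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        writeSer(values.length == 0 ? 0.0 : sum / values.length, outFile);
    }
}
```

Calling `GatewayRoundTrip.run(new double[] {1.0, 2.0, 3.0}, GatewayRoundTrip::meanJob)` returns a `Double` holding the mean. Keeping the job behind a callback mirrors the slide's split between what the gateway does (serialization and transfer) and what the staged binary does (the computation).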
caGrid Security (GTS, Grid Grouper, Dorian, CDS) http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main
Special Thanks • caGrid (Security Services) • Scott Oster • Stephen Langella • caGrid (RAVi Plugin, Gateway Service) • Ravi Madduri