Emerging Standards for Interoperable Biological Systems Technology for Life: North Carolina Symposium on Biotechnology and Bioinformatics Dr. Marty McClelland
Standards: Why do we care? • IEEE standards for plugs, outlets and wiring – I can buy an appliance and use it ( most of the time ) • Any international traveler will tell you that standards vary around the world
Without Standards - • Custom builds by experts • Build once – use once • Need expertise in specific domain • Expensive • Most of us – still using candles
Standards and Software • World Wide Web • Plug and play • Plug-in / modular components • XML: Extensible Markup Language • Web Services • Federated Search • Grid Services
My Standards Journey • Middleware to integrate learning systems with enterprise resource planning systems • IMS / IEEE learning technology standards – learning object metadata • National Science Digital Library – STEM LOM repository • NCCU BBRI Cardiovascular Study – similar issues
Bioinformatics Community • Embraced open source • Philosophy of sharing of data and tools • Community involvement yields foundation for standards development
Emerging Standards • tools/middleware – web services for harvesting – federated searches • grid computing • ontologies – developing controlled vocabularies • analysis – standards for sharing results – e.g. microarray analysis • models – Systems Biology – standards for interchange
Sharing Data, Tools, and Middleware • XML, go to http://www.w3.org/XML/ • Specifications for data interchange in biology applications (XML schemas) • Web services • Define WSDL for biology applications
AnatML, CellML, BIOML, GEML, MSAML, GeneXML, MAGE-ML, BSML, CDISC, and HL7 XML for data exchange
Virginia Bioinformatics Institute • toolbus • PathPort • Middleware for web services • query multiple databases • facilitate decision making and data interpretation • http://staff.vbi.vt.edu/pathport/services/
BioMOBY • simple extensible protocols • Web services for interoperable databases • http://biomoby.org/
Grid Computing • user authentication and authorization (like X.509 certificates) • Open Grid Computing Environment (OGCE) portal toolkit • Open Grid Services Architecture (OGSA) • Globus Toolkit
Grid Applications • iNquiry – commercial product • NC BioGrid prototype / planning stages • statewide Bioinformatics Portal being created by the University of North Carolina at Chapel Hill • GridNexus project
Ontologies • Controlled vocabulary • Crosswalks between controlled vocabularies • Interoperability • Browse and search services across disparate repositories • www.geneontology.org
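A crosswalk between two controlled vocabularies can be pictured as a simple term-to-term mapping that lets one query reach repositories indexed under different vocabularies. The sketch below is only an illustration: the vocabularies, record identifiers, and the `federated_search` helper are all hypothetical and do not come from the Gene Ontology or any real repository.

```python
# Hypothetical crosswalk from vocabulary A terms to vocabulary B terms.
CROSSWALK_A_TO_B = {
    "programmed cell death": "apoptosis",
    "sugar breakdown": "glycolysis",
}

# Hypothetical repositories, each indexed by its own vocabulary.
REPOSITORY_A = {"programmed cell death": ["A-001", "A-007"]}
REPOSITORY_B = {"apoptosis": ["B-113"], "glycolysis": ["B-200"]}


def federated_search(term_a):
    """Search both repositories using a vocabulary A term."""
    hits = list(REPOSITORY_A.get(term_a, []))
    # Translate the term through the crosswalk before querying repository B.
    term_b = CROSSWALK_A_TO_B.get(term_a)
    if term_b is not None:
        hits.extend(REPOSITORY_B.get(term_b, []))
    return hits


print(federated_search("programmed cell death"))
```

A term with no crosswalk entry simply returns whatever repository A holds, so the search degrades gracefully when the mapping is incomplete.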
Data Analysis • MIAME: Minimum Information About a Microarray Experiment • http://mged.sourceforge.net/ontologies/index.php
Systems Biology • Historically – many custom, small scale models with little reuse • Goal of Systems Biology is to construct the system with modular models where data can be supplied via web service queries to databases
Model Integration • Systems Biology Workbench (SBW) strives to support model integration through • Systems Biology Markup Language (SBML) – XML to represent biochemical networks – common framework to document models • SBW provides a framework for interoperation across heterogeneous modeling tools http://sbml.org/index.psp
Implications • expose databases with web services • construct queries to locate the data • standards for grid services • community developed XML schemas for sharing biological data
GridNexus
UNCW Grid Initiative: GridNexus • The UNCW Grid Computing Project is a two-year collaborative project among a multi-discipline, multi-investigator core research team at UNCW and several discipline-focused researchers at partner institutions: NCSU, WCU, NCCU, ECU, and CFCC. The research areas and institutional interests of this project are: • Advanced Grid Software Development (UNCW) • Computational Chemistry (UNCW and ECU) • Bioinformatics (UNCW, NCSU, and NCCU) • Combinatorics (UNCW) • Business Computing (UNCW and NCCU) • Education and Training (UNCW, WCU, CFCC) • This project proposes to develop a Grid interface that is easy to use and suits a wide range of applications and users. We have developed an innovative graphical user interface (GUI) for grid applications. In particular, we introduced a new scripting language (JXPL) designed for web-based services and a GUI for creating scripts, and we have demonstrated the use of these tools with grid services.
GridNexus • This initiative grew in part out of a need for HPC resources following the closure of the NCSC in June 2003, coupled with the availability of faculty with software programming expertise and others with computing applications that could benefit from use of a Grid. • The UNC-OP funded UNCW’s proposal for $557,634 over two years to develop Grid portals (GUI middleware to allow users to access software on computers on a Grid).
Resources of UNCW Grid • Beowulf cluster – 16 PIII processors in Computer Sciences Department • Fire and FireDev servers plus disk storage devices • PQS Quantum Cube – 8-CPU cluster with PQS and Gaussian 03 computational chemistry software, plus TCP-Linda environment. • An 8-processor IBM blade cluster with 0.5 TB disk storage will be added soon. • Other computers may be added, including the possibility of using all computing lab computers, or possibly even all faculty/staff computers (when not in use).
GridNexus • The objective is to make accessing HPC resources (wherever they may be located) easy for scientists who are not computer savvy. • Most computation involves doing various mathematical operations on a dataset. • A GUI approach is employed, in which the user, after a single login that checks authentication and authorization, can create a ‘workflow’ of functions/operations graphically by connecting boxes dragged from a series of lists of options, then applying that series of steps to a dataset. • Such a ‘workflow’ can be saved for subsequent application to another dataset.
GridNexus • Job submission: Ideally in a grid, the grid middleware should select the ‘best’ resource – those computers that are available, capable, and have the software needed to handle the job. • The user need not select – nor know – where the computation is taking place. In fact, the job may even be passed from one computer to another for various aspects of the calculation. • The output is returned to the user’s workstation or account, rather than the user having to access and download the output file from a remote computer.
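The selection rule on the job submission slide (available, capable, and has the needed software) can be sketched as a filter followed by a ranking. This is a minimal illustration, not how any particular grid middleware schedules jobs: the `Resource` and `Job` classes, the "most free CPUs wins" tie-break, and the resource names and software sets are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    available: bool
    free_cpus: int
    software: set = field(default_factory=set)


@dataclass
class Job:
    cpus_needed: int
    software_needed: set


def select_resource(job, resources):
    """Return the eligible resource with the most free CPUs, or None."""
    eligible = [
        r for r in resources
        if r.available                          # available
        and r.free_cpus >= job.cpus_needed      # capable
        and job.software_needed <= r.software   # has the software
    ]
    # Assumed ranking: prefer the resource with the most free CPUs.
    return max(eligible, key=lambda r: r.free_cpus, default=None)


# Hypothetical inventory for illustration only.
resources = [
    Resource("cluster-a", True, 16, {"blast"}),
    Resource("cluster-b", True, 8, {"gaussian03", "pqs"}),
    Resource("cluster-c", False, 8, {"gaussian03"}),
]
job = Job(cpus_needed=4, software_needed={"gaussian03"})
chosen = select_resource(job, resources)
print(chosen.name if chosen else "no eligible resource")
```

The caller never names a machine; it only describes the job, which mirrors the point that the user need not know where the computation runs.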
GridNexus • GridNexus is a GUI that allows the user to create/edit/run workflows • Based on Ptolemy II http://ptolemy.eecs.berkeley.edu/ptolemyII. Ptolemy provides the GUI and workflow features. We have extended it to provide the functionality we want (JXPL and GridServices) • Release 1.0.0 download available www.gridnexus.org
Getting Started • The right frame is the palette for building workflows • The upper left frame provides the library of modules • The lower left is a thumbnail of the entire workflow
The Basics • Sources produce data without needing input • Sinks consume data but may have side effects (such as displaying results) • All workflows must start with sources and end with sinks
Simple Example 1 • Click and drag the “Const” source to the workflow. • Click and drag the “JxplDisplay” sink to the workflow
Simple Example 1 • Double click on the Const module • Change its value to 10 • Click commit • The new value is shown on the icon
Simple Example 1 • Input ports are on the left-hand side and output ports are on the right-hand side of each module • Click and drag from the output port of the Const module to the JxplDisplay
Simple Example 1 • A link (or relation) is created between the two modules • The output of Const is consumed by the JxplDisplay
Simple Example 1 • Click on the run button • The JxplDisplay evaluates the input and produces a display window to show the results. • Notice the output is in XML (actually JXPL)
Simple Example 2 • Transformers are modules that take input, transform it, and produce new output • This example computes the expression: (23 + 6) * -2
Simple Example 2 • The Multiplication module takes the result of the addition (its first input) and multiplies that by -2 (its second input) • The result is consumed by JxplDisplay
What's Going On? • The workflow is not actually performing the operations. Instead it is creating a script (JXPL) that, when executed, produces the result • The JxplDisplay is evaluating the script and displaying the results
What's Going On? • Double-click on the JxplDisplay and deselect the “Evaluate Jxpl” parameter • This parameter tells JxplDisplay whether or not to evaluate the script that is generated
What's Going On? • Now when we run it, we see the actual script that is produced by the workflow • The script is written in XML using a language developed at UNCW called JXPL
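The idea that a workflow builds a script rather than computing directly, and that the display sink either evaluates the script or shows it, can be sketched in a few lines. The classes below (`Const`, `BinaryOp`, `Display`) are hypothetical stand-ins for the GridNexus modules, and the nested Python list stands in for the generated script; none of this is the actual GridNexus or JXPL implementation.

```python
import operator

OPERATORS = {"+": operator.add, "*": operator.mul}


class Const:
    """Source: contributes a literal to the script without needing input."""

    def __init__(self, value):
        self.value = value

    def script(self):
        return self.value


class BinaryOp:
    """Transformer: wraps its two inputs in an [operator, left, right] list."""

    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right

    def script(self):
        return [self.symbol, self.left.script(), self.right.script()]


def evaluate(script):
    """Recursively evaluate a script built from nested lists."""
    if not isinstance(script, list):
        return script
    symbol, left, right = script
    return OPERATORS[symbol](evaluate(left), evaluate(right))


class Display:
    """Sink: returns the evaluated result, or the raw script if told not to evaluate."""

    def __init__(self, upstream, evaluate_script=True):
        self.upstream = upstream
        self.evaluate_script = evaluate_script

    def run(self):
        script = self.upstream.script()
        return evaluate(script) if self.evaluate_script else script


# (23 + 6) * -2, wired as in Simple Example 2.
workflow = BinaryOp("*", BinaryOp("+", Const(23), Const(6)), Const(-2))
print(Display(workflow).run())                         # -58
print(Display(workflow, evaluate_script=False).run())  # ['*', ['+', 23, 6], -2]
```

Setting `evaluate_script=False` plays the role of deselecting the “Evaluate Jxpl” parameter: the same workflow yields the script itself instead of its value.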
A Little Bit about JXPL • JXPL is based on LISP • The LISP corresponding to the JXPL on the right looks like: (* (+ 23 6) -2)
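For readers unfamiliar with LISP's prefix notation, the expression (23 + 6) * -2 is written with the operator first inside each pair of parentheses: `(* (+ 23 6) -2)`. The following is a minimal sketch of a reader and evaluator for that one style of expression; it is a toy written for this illustration, supports only integers with `+` and `*`, and is not a LISP or JXPL interpreter.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens and return a nested list (or a single atom)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    # Atoms are integers when they look like one, otherwise symbols.
    return int(token) if token.lstrip("-").isdigit() else token


def evaluate(expr):
    """Apply the leading operator to the evaluated remaining items."""
    if isinstance(expr, int):
        return expr
    func = OPS[expr[0]]
    args = [evaluate(arg) for arg in expr[1:]]
    result = args[0]
    for arg in args[1:]:
        result = func(result, arg)
    return result


tree = parse(tokenize("(* (+ 23 6) -2)"))
print(tree)            # ['*', ['+', 23, 6], -2]
print(evaluate(tree))  # -58
```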
A Little Bit about JXPL • Why? • XML is used to transport data between web/grid services • XML opening/closing tags <-> LISP opening/closing parens • Everything is either an atom or a list (functions, data structures)
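The correspondence between XML tags and LISP parentheses can be made concrete by serializing a nested list to XML and reading it back. In this sketch the element names `list` and `atom` are assumptions chosen for illustration; they are not taken from the JXPL schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET


def to_xml(expr):
    """Map a nested list to XML: each list opens/closes a <list> element."""
    if isinstance(expr, list):
        node = ET.Element("list")
        for item in expr:
            node.append(to_xml(item))
        return node
    node = ET.Element("atom")
    node.text = str(expr)
    return node


def from_xml(node):
    """Inverse mapping: <list> becomes a Python list, <atom> an int or symbol."""
    if node.tag == "list":
        return [from_xml(child) for child in node]
    text = node.text
    return int(text) if text.lstrip("-").isdigit() else text


expr = ["*", ["+", 23, 6], -2]  # (* (+ 23 6) -2)
xml_text = ET.tostring(to_xml(expr), encoding="unicode")
print(xml_text)

round_trip = from_xml(ET.fromstring(xml_text))
print(round_trip == expr)
```

Every opening parenthesis maps to an opening tag and every closing parenthesis to a closing tag, so the structure survives the round trip unchanged.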
GridNexus and JXPL with Grid Services • create workflows that can make use of web and grid services • implement primitives in JXPL that are generic web and grid clients • inspect the WSDL of the service to determine its interface
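Inspecting a WSDL document to discover a service's interface amounts to reading its `portType` and `operation` elements. The sketch below does this for a small WSDL 1.1 fragment embedded as a string; the service, operation, and message names are hypothetical, and this is not how the GridNexus generic client is implemented, only an illustration of the idea.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace; the sample document below is invented for illustration.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

SAMPLE_WSDL = """
<definitions name="SequenceService"
    targetNamespace="http://example.org/sequence"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.org/sequence">
  <message name="FetchRequest"/>
  <message name="FetchResponse"/>
  <portType name="SequencePortType">
    <operation name="fetchSequence">
      <input message="tns:FetchRequest"/>
      <output message="tns:FetchResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return {operation name: (input message, output message)}."""
    root = ET.fromstring(wsdl_text.strip())
    interface = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            interface[op.get("name")] = (
                inp.get("message") if inp is not None else None,
                out.get("message") if out is not None else None,
            )
    return interface


print(list_operations(SAMPLE_WSDL))
```

A generic client could use such a table to decide which operations to offer and which messages each one expects, without any service-specific code.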
GSClient module • GSClient module : whereby the user can specify the factory URL, the instance name of the service, the stub class, and the port type • primitive uses the OGSIServiceGridLocator to find the grid service and invoke the appropriate method with the arguments
GridNexus and OGSA-DAI • OGSA-DAI Grid Data Services are designed so that the output of one can be delivered to another • GridNexus allows non-programmers to create JXPL to control GDS interaction in a graphical environment
Using OGSA-DAI grid service clients
Molecular biology workflow created in GridNexus
Molecular chemistry workflow in GridNexus
Build the Library • Identify tasks in scientific workflows • Investigate existing open source modules for possible integration with GridNexus • Design for reuse incorporating appropriate standards • Implement library module in GridNexus
GridNexus • Release 1.0.0 download available www.gridnexus.org
Acknowledgments • UNC-OP for funding the UNCW Grid Initiative Proposal: “Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid,” Dr. Ron Vetter, PI • Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr. Libero Bartolotti, ECU; Ms. Judy Porter, CFCC. • UNCW Participants: Computer Science: Dr. Ron Vetter, Dr. Clayton Ferner, Dr. David Berman, and Dr. Tom Hudson. Information Technology Systems: Dr. Bob Tyndall and Mr. Bobby Miller. Mathematics and Statistics: Dr. Jeff Brown. Chemistry and Biochemistry: Dr. Ned H. Martin. Biological Sciences: Dr. Ann Stapleton. Information Systems and Operations Management: Dr. Tom Janicki. • UNCW Computer Science students working on the Chemistry portal: Tristan Carland, Jerry Martin, Andrew Martin
Acknowledgments • Grid Computing: Harnessing Underutilized Resources, Dr. Ned H. Martin • GridNexus UNCW GUI for Workflow Management, Dr. Clayton Ferner • GridNexus: A Grid Services Scientific Workflow System, Jeffrey L. Brown, Clayton S. Ferner, Thomas C. Hudson, Ann E. Stapleton, Ronald J. Vetter, Andrew Martin, Jerry Martin, Allen Rawls, William J. Shipman, and Michael Wood