cancer Bioinformatics Infrastructure Objects (caBIO) Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC)
caBIO • The cancer Bioinformatics Infrastructure Objects (caBIO) is an infrastructure that integrates internal and publicly available bioinformatics data spanning multiple scientific disciplines • caBIO objects simulate the behavior of actual bioinformatics components such as genes, chromosomes, sequences, ontologies, trials, and agents • caBIO provides access to a variety of bioinformatics data sources, including UniGene, HomoloGene, LocusLink, RefSeq, BioCarta, GoldenPath (via DAS), and NCICB’s CGAP (Cancer Genome Anatomy Project) and GAI (Genetic Annotation Initiative) data repositories • caBIO is “open source” and provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concern for implementation details and data management
Model Extensions Clinical Protocols • A clinical protocols object model facilitates the integration of clinical data with genomic data Animal Models • An animal models object model supports queries between human and animal models of cancer
MAGE-OM Extension • caBIO is currently being extended to include the MAGE-OM object model • Microarray data from the NCICB Gene Expression Data Portal (GEDP) will be retrieved via the caBIO MAGE API
caBIO APIs • A Java API is available for Java programmers • A Simple Object Access Protocol (SOAP) API is provided for non-Java programmers • An HTTP API is also available • Developers can request XML or HTML (via XSLT)
Powered by caBIO! caBIO Applications: Cancer Molecular Analysis Project (CMAP)
Molecular Targets • A collection of genes organized by pathways can be displayed, facilitating the evaluation of anomalies
Targeted Agents • Researchers can retrieve information about agents linked to multiple targets and contexts
Clinical Trials • Researchers can view detailed information about therapeutic trials associated with histology types and agents • A Clinical Protocols Portal is available to allow researchers to search and submit clinical protocols affiliated with Specialized Programs of Research Excellence (SPOREs)
caBIO Architecture • caBIO was designed using a J2EE architecture with client interfaces, server components, back-end objects and data sources • Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP • Client applications can also communicate with back-end objects via Java RMI (Java applications) • Non-Java based applications can communicate via SOAP or HTTP • Server components communicate with back-end objects via Java RMI • Back-end objects communicate directly with data sources (database, URLs, flat files) • caBIO web services can be advertised to facilitate information sharing • RDF can be used to advertise content to crawlers and agents • A UDDI registry may be configured to advertise services • caBIO services can be advertised via bioMOBY central
caBIO Architecture (diagram) • Layers: Clients, Presentation Layer, Object Layer, Data Sources • Clients: Browsers (HTML/HTTP), Other Apps (XML/HTTP, SOAP), Other Java Apps (RMI) • Presentation Layer: Web Server, Servlet Container, JSPs, Servlets, UI Bean, SOAP Engine, XML Builder, XSLT Engine, XSL Style Sheet • Object Layer: Object Managers, Domain Objects (Chromosomes, Genes, Tissues, Clusters, Agents, Libraries, Sequences, Diseases), Data Access Objects • Data Sources: External Databases (JDBC), EVS, caDSR, URLs, FTP, Flat Files, XML Docs • Other labels: RDF, DTDs
Data Sources (diagram, centered on caBIO) • Groups: External Public Databases, CGAP Database • Source labels: UniGene, RefSeq, GO, BioCarta, CGAP/GAI, CTEP/SPOREs, LocusLink, HomoloGene, UCSC Golden Path DAS • Data labels: Genes, Sequences; Reference Sequences; Chromosomes; Gene Ontology; Pathways; SNPs; Trials; Homologs; Gene Loci, LocusLink Summaries; Gene Annotations
caBIO Benefits • Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concern for implementation details • Lets developers obtain the information they need from a variety of data sources without managing the data themselves • Manages the display of large volumes of data to assist in load balancing • Provides an effective mechanism for performing complex queries that rely on diverse data sources • Facilitates information sharing without managing linkages between multiple data sources
caBIO Usage Facilitates solving Complex Queries such as: Find me the Pathways, with Genes that are expressed in tissues with a particular Histopathology that includes a particular Organ and a particular Disease.
Java Packages • gov.nih.nci.caBIO.bean: Contains domain objects to access genomic and biomedical components • gov.nih.nci.caBIO.util.das: Primary interface to the UCSC DAS; uses JAXB to convert DAS DTDs to objects • gov.nih.nci.caBIO.evs: Provides synonym search and concept-based search to the NCI’s Enterprise Vocabulary System (EVS) • gov.nih.nci.caBIO.webservices: Provides access to caBIO via SOAP • gov.nih.nci.caBIO.servlet: Provides access to caBIO via HTTP • gov.nih.nci.caBIO.util: Provides interface to caBIO utilities
Java API
• Domain objects have companion SearchCriteria objects

    // Search for genes by symbol through the Gene domain object
    Gene myGene = new Gene();
    GeneSearchCriteria criteria = new GeneSearchCriteria();
    criteria.setSymbol("pTEN");
    SearchResult result = myGene.search(criteria);
    Gene[] genes = (Gene[]) result.getResultSet();

• caBIO supports nested SearchCriteria
• SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type
• Complex queries without any SQL
Traverse Relationships in Model • Find me the Pathways, with Genes that are expressed in Tissues with a particular Histopathology that includes a particular Organ and a particular Disease. • Diagram: INPUT (Disease, Organ) -> Histopathology -> Genes -> Pathways (OUTPUT)
findPathway
Input disease, organ; create SearchCriteria objects:

    public Pathway[] findPathway(String disease, String organ) {
        // One criteria object per object type along the traversal
        DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria();
        OrganSearchCriteria organCriteria = new OrganSearchCriteria();
        HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria();
        GeneSearchCriteria geneCriteria = new GeneSearchCriteria();
        PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();
findPathway
Nest the SearchCriteria, then do the search:

        diseaseCriteria.setName(disease);
        organCriteria.setName(organ);
        // Disease and organ constrain the histopathology
        histoCriteria.putSearchCriteria(diseaseCriteria, CriteriaElement.AND);
        histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND);
        // Histopathology constrains genes, and genes constrain pathways
        geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND);
        pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND);
        Pathway myPathway = new Pathway();
        return myPathway.searchPathways(pathCriteria);
    }
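The nesting used by findPathway can be pictured as a tree of criteria, with each level constraining the level above it. The following is a minimal, self-contained sketch of that idea only. `SketchCriteria` and `NestedCriteriaDemo` are hypothetical stand-ins invented for illustration and are not caBIO classes; the `and` method merely mirrors the role that `putSearchCriteria(..., CriteriaElement.AND)` plays in the slides, and the sample disease and organ values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SearchCriteria object; NOT the real caBIO class.
class SketchCriteria {
    private final String objectType;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<SketchCriteria> nested = new ArrayList<>();

    SketchCriteria(String objectType) {
        this.objectType = objectType;
    }

    // Constrain a simple attribute, e.g. name=...
    SketchCriteria set(String attribute, String value) {
        attributes.put(attribute, value);
        return this;
    }

    // Nest criteria of another object type (AND semantics assumed).
    SketchCriteria and(SketchCriteria child) {
        nested.add(child);
        return this;
    }

    // Render the criteria tree as a readable constraint string.
    String describe() {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            parts.add(e.getKey() + "=" + e.getValue());
        }
        for (SketchCriteria child : nested) {
            parts.add(child.describe());
        }
        return objectType + "(" + String.join(" AND ", parts) + ")";
    }
}

class NestedCriteriaDemo {
    // Builds the same nesting order as findPathway: Disease and Organ -> Histopathology -> Gene -> Pathway.
    static String pathwayQuery(String disease, String organ) {
        SketchCriteria histo = new SketchCriteria("Histopathology")
            .and(new SketchCriteria("Disease").set("name", disease))
            .and(new SketchCriteria("Organ").set("name", organ));
        SketchCriteria gene = new SketchCriteria("Gene").and(histo);
        return new SketchCriteria("Pathway").and(gene).describe();
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        System.out.println(pathwayQuery("carcinoma", "breast"));
    }
}
```

The sketch only describes the query shape; in caBIO the nested criteria are evaluated by the object layer, so no SQL is written by the caller.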
Web Services: SOAP http://cabio.nci.nih.gov/soap/services/index.html
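SOAP API Perl Example

    use SOAP::Lite;

    # Point the client at the gene service namespace and the SOAP router
    my $s = SOAP::Lite
        ->uri("urn:nci-gene-service")
        ->proxy("http://cabio.nci.nih.gov/soap/servlet/rpcrouter");

    # Search criteria are passed as a map of attribute names to values
    my %searchCriteria = ();
    $searchCriteria{symbol} = "pTEN";

    my $som = $s->getGenes(SOAP::Data->type(map => \%searchCriteria));
    my $xmldoc = $som->result;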
SOAP output with xlinks

    <?xml version="1.0" encoding="UTF-8" ?>
    <nci-core>
      <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/">
        <name>PTEN</name>
        <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title>
        <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs>
        <Pathway xlink:href="http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" />
        [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation]
      </gov.nih.nci.caBIO.bean.Gene>
      [2 Additional Genes with “PTEN” in their name]
      <searchResult>
        <hasMore>false</hasMore>
        <startsAt>1</startsAt><endsAt>3</endsAt>
      </searchResult>
    </nci-core>
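A client receiving this lightweight XML would typically read the gene's attributes and then follow the xlink references for related objects. The sketch below shows one way to do that with the JDK's built-in DOM parser. It is an illustration under assumptions, not caBIO client code: the embedded sample is a simplified, well-formed fragment modeled on the slide (with `&` escaped as `&amp;`), the host `example.invalid` is a placeholder, and `GeneXmlSketch` is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for reading a caBIO-style XML result; not part of caBIO.
class GeneXmlSketch {
    // Simplified sample modeled on the slide output; the link host is a placeholder.
    static final String SAMPLE =
        "<nci-core>"
      + "<gov.nih.nci.caBIO.bean.Gene id=\"2221\" xmlns:xlink=\"http://www.w3.org/1999/xlink/\">"
      + "<name>PTEN</name>"
      + "<Pathway xlink:href=\"http://example.invalid/GetXML?operation=Pathway&amp;GeneId=2221\"/>"
      + "</gov.nih.nci.caBIO.bean.Gene>"
      + "<searchResult><hasMore>false</hasMore><startsAt>1</startsAt><endsAt>1</endsAt></searchResult>"
      + "</nci-core>";

    static Document parse(String xml) {
        try {
            // The default factory is not namespace-aware, so "xlink:href" is read as a plain attribute name.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or null if absent.
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Returns "id|name|pathway link|hasMore" for the first gene in the document.
    static String summarize(String xml) {
        Element root = parse(xml).getDocumentElement();
        Element gene = (Element) root.getElementsByTagName("gov.nih.nci.caBIO.bean.Gene").item(0);
        Element pathway = (Element) gene.getElementsByTagName("Pathway").item(0);
        return gene.getAttribute("id") + "|" + firstText(gene, "name") + "|"
            + pathway.getAttribute("xlink:href") + "|" + firstText(root, "hasMore");
    }

    public static void main(String[] args) {
        System.out.println(summarize(SAMPLE));
    }
}
```

The `hasMore`, `startsAt`, and `endsAt` elements let a client decide whether to request a further page of results before following any links.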
SOAP with returnHeavyXML
Data is now returned in full. Pathway object snippet:

    <gov.nih.nci.caBIO.bean.Pathway id="92">
      <name>ptenPathway</name>
      <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue>
      <pathwayDiagram>ptenPathway.svg</pathwayDiagram>
    </gov.nih.nci.caBIO.bean.Pathway>
HTTP API • Direct access to XML-formatted data via URLs: http://cabio.nci.nih.gov/servlet/GetXML?operation=Gene&Symbol=pTEN • Callouts on the URL: Method, Parameter, Value, Search Parameter
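Because the HTTP API is driven entirely by the URL, a client mainly needs to assemble the query string correctly. The sketch below builds GetXML URLs of the form shown on the slide and percent-encodes the parameter values. `GetXmlUrlSketch` and its methods are hypothetical names used for illustration; only the base URL, the `operation` parameter, and the `Symbol` search parameter come from the slide, and no request is actually sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical URL builder for the caBIO HTTP API; not part of caBIO.
class GetXmlUrlSketch {
    // Base URL taken from the slide.
    static final String BASE = "http://cabio.nci.nih.gov/servlet/GetXML";

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    // operation selects the object type; searchParams become name=value pairs in insertion order.
    static String build(String operation, Map<String, String> searchParams) {
        StringBuilder url = new StringBuilder(BASE)
            .append("?operation=").append(encode(operation));
        for (Map.Entry<String, String> e : searchParams.entrySet()) {
            url.append('&').append(encode(e.getKey()))
               .append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    // Convenience wrapper reproducing the example on the slide.
    static String geneBySymbol(String symbol) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Symbol", symbol);
        return build("Gene", params);
    }

    public static void main(String[] args) {
        System.out.println(geneBySymbol("pTEN"));
    }
}
```

Encoding matters most when a parameter value itself contains characters such as `&` or `?`, which would otherwise be read as part of the outer query string.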
HTTP API • Direct access to SVG-formatted data via URLs: http://cabio.nci.nih.gov:80/servlet/GetSVG?operation=Pathway&name=g2Pathway&GeneInfoLocation=/servlet/GetXML?operation=Gene&ielikes=.svg • Callouts on the URL: Method, Parameter, Value, Search Parameter
BIOgopher • BIOgopher enables a researcher to perform complex queries against caBIO data sources • Researchers can: • Provide local data • Create custom queries • Design custom reports
Importing Local Data • Researchers can import local experiment data in spreadsheet format • Researchers can leverage imported data during the session • Researchers can include imported data in defining custom queries and reports
Creating a Query • Researchers can create a query or access an existing query within a session • Researchers specify the caBIO object that will be the subject of the query
Specifying Search Criteria • Researchers can dynamically specify search criteria • Attributes of caBIO objects related to the chosen subject can be selected as search criteria • Local data can be fetched for inclusion as search criteria • Researchers can browse caBIO data for inclusion in search criteria values
Creating a Report • Researchers can create and format reports based on the selected search criteria • Reports can be viewed and exported as a spreadsheet
BIOgopher Architectural Details • Leveraged the Model-View-Controller 2 (MVC 2) architecture • Abstracted the presentation layer from spreadsheet manipulation, meta-data retrieval, query design, and report generation • Developed a server-side N-dimensional query builder • An object-cube was leveraged in support of object-mining
Presentation Layer • Leverages the Jakarta Struts Project
Spreadsheet Manipulation • Leverages the Apache POI Project
Meta-Data Layer • Leverages NCICB’s caDSR
Query Design • Leverages Java Swing components for trees, nodes and tables
caBIO Kernel • Facilitates the creation of a federation of caBIO servers to share information between local data sources and the NCICB caBIO server • Leverages the JXTA protocol for peer-to-peer communication • Diagram components: BIOgopher Client, Proxy, Local caBIO Server, NCICB caBIO server, Datamap, Object/DB bridge, Persistence Layer • Diagram steps: 1. Sends query and user info; 2. Parses query and DSI, and authenticates user; 3. Passes query to NCICB server; 4. Parses query and DSI and authenticates user (if any); 5. Queries Persistence Layer; 6. Returns objects to requestor; 7. Queries data map; 8. Queries Persistence Layer; 8. Returns results
Future • caBIO "kernel" • Object-level Security Module (fine grain) • Standard LDAP authentication • Vocabulary Object to extend the EVS API • caDSR objects, API • MAGE-OM, API • Animal Models-OM, API • Extend pathway object model to support KEGG and BioCarta interactions • Analytical tool handling, e.g., BLAST
Future • New Data Sources • Proteins • PDB, PIR, BioJava for protein data • OMIM - For link from proteins to diseases • Agents - Access agent data from EVS and DCP • Pharmacokinetics • Histology, Tissue/Organ - Leverage EVS vocabulary (currently LASH) • PubMed
Acknowledgements • NCICB: Kenneth Buetow, Peter Covitz, Carl Schaefer, Robert Clifford, Mike Edmonson, Frank Hartel, Sherri DeCoronado • SAIC: Scott Gustafson, Mike Connolly, Joshua Phillips • Kevric (documentation): Diane Zimmerman • Visit our new and improved web site: http://ncicb.nci.nih.gov/core/caBIO