470 likes | 697 Views
NCICB Informatics. Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC). NCI Center for Bioinformatics Building common architecture, common tools, and common standards. access portals. participating group nodes. Clinical Trials.
E N D
NCICB Informatics Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC)
NCI Center for BioinformaticsBuilding common architecture, common tools, and common standards accessportals participatinggroup nodes ClinicalTrials MolecularSignatures CMAP caCore • Establish common data elements • Provide data exchange infrastructure • Develop electronic data interfaces • Distribute architecture model • Provide application tool chest • Develop portals CancerGenomics MouseModels GAI CGAP
caBIO • Thecancer Bioinformatics Infrastructure Objects (caBIO) is a standards based set of bioinformatics components • caBIO objects simulate the behavior of actual genomic components such as genes, chromosomes, sequences, libraries, clones, ontologies, etc. • caBIO provides access to a variety of genomic data sources including, Unigene, Homologene, LocusLink, RefSeq, BioCarta, GoldenPath (via DAS), and NCICB’s CGAP (Cancer Genome Anatomy Project) and GAI (Genetic Annotation Initiative) data repositories • caBIO is “open source” and provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details and data management
Model Extensions Clinical Protocols • A clinical protocols object model facilitates the integration of clinical data with genomic data Animal Models • An animal models object model supports queries between human and animal models of cancer
Powered by caBIO! CMAPCancer Molecular Analysis Projecthttp://cmap.nci.nih.gov
Powered by caBIO! Molecular Targets • A collection of genes organized by pathways can be displayed facilitating the evaluation of anomalies
Powered by caBIO! Gene Navigator • The Gene Navigator presents gene-to-gene and enzyme-to-enzyme relationships as nodes in a connected graph • Spring, Rosette, and Clan-based algorithms are represented • Links between related nodes identify the relationships and the source pathways in which they occur • Researchers can drill down to detailed information about genes and pathways
Powered by caBIO! Targeted Agents • Researchers can retrieve information about agents linked to multiple targets and contexts
Powered by caBIO! Clinical Trials • Researchers can view detailed information about therapeutic trials associated with histology types and agents • A clinical protocols portal is available to allow researchers to search and submit clinical protocols affiliated with Specialized Programs of Research Excellence (SPOREs)
caBIO Architecture • caBIO was designed using a J2EE architecture with client interfaces, server components, back-end objects and data sources • Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP • Client applications can also communicate with back-end objects via Java RMI (Java applications) • Non-Java based applications can communicate via SOAP or HTTP • Server components communicate with back-end objects via Java RMI • Back-end objects communicate directly with data sources (database, URLs, flat files) • caBIO web services can be advertised to facilitate information sharing • RDF can be used to advertise content to crawlers and agents • A UDDI registry may be configured to advertise services • caBIO services can be advertised via bioMOBY central
Clients Presentation Layer Object Layer Data Sources Web Server Servlet Container JSPs External Databases HTML/HTTP Data Access Objects Servlets Object Managers Browsers SOAP Engine JDBC EVS XML/HTTP Other Apps RMI caDSR UI Bean Domain Objects SOAP HTTP XML Builder Chromosomes Genes URLs XSLT Engine Tissues Clusters Agents RDF FTP Libraries Sequences DTDs Flat Files XML Docs Diseases XSL Style Sheet Other Java Apps caBIO Architecture
Data Sources External Public Databases CGAP Database UniGene RefSeq Reference Sequences Genes, Sequences Chromosomes caBIO KEGG BioCarta CGAP/ GAI CTEP/ SPOREs Pathways Pathways SNPs Trials Locus Link Homolo Gene UCSC Golden Path DAS Homologs Gene Loci, Locus Link Summaries Gene Annotations Genes, Sequences
caBIO Usage Facilitates solving Complex Queries such as: Find me the Pathways, with Genes that are expressed in tissues with a particular Histopathology that includes a particular Organ and a particular Disease.
caBIO APIs • A Java API is available for Java programmers • A Simple Object Access Protocol (SOAP) API is provided for non-Java programmers • An HTTP API is available • Developers can request XML or HTML (with XSL)
Java Packages • gov.nih.nci.caBIO.bean • Contains domain objects to access genomic and biomedical components • gov.nih.nci.caBIO.util.das • Primary interface to the UCSC DAS • Uses JAXB to convert DAS DTDs to objects • gov.nih.nci.caBIO.evs • Provides synonym search and concept based search to the NCI’s Enterprise Vocabulary System (EVS) • gov.nih.nci.caBIO.webservices • Provides access to caBIO via SOAP • gov.nih.nci.caBIO.servlet • Provides access to caBIO via HTTP • gov.nih.nci.caBIO.util • Provides interface to caBIO utilities
Java API Domain objects have companion SearchCriteria objects Gene myGene = new Gene(); GeneSearchCriteria criteria = new GeneSearchCriteria(); criteria.setSymbol("pTEN"); SearchResult result = myGene.search(criteria); Gene[] genes = (Gene[]) result.getResultSet(); • caBIO supports nested SearchCriteria • SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type. • Complex queries without any SQL
Traverse Relationships in Model Find me the Pathways, with Genes that are expressed in tissues with a particular Histopathology that includes a particular Organ and a particular Disease. INPUT Disease Histopathology Genes Organ Pathways OUTPUT
findPathway Input disease, organ; create SearchCriteria Objects: public Pathway[] findPathway(String disease, String organ) { DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria(); OrganSearchCriteria organCriteria = new OrganSearchCriteria(); HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria(); GeneSearchCriteria geneCriteria = new GeneSearchCriteria(); PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();
findPathway Nest the SearchCriteria, then do the search: diseaseCriteria.setName(disease); organCriteria.setName(organ); histoCriteria.putSearchCriteria(diseaseCriteria,CriteriaElement.AND); histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND); geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND); pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND); Pathway myPathway = new Pathway(); return myPathway.searchPathways(pathCriteria); }
Web Services: SOAP http://cabio.nci.nih.gov/soap/services/index.html
SOAP API Perl Example use SOAP::Lite; $s = SOAP::Lite ->uri(urn:nci-gene-service) ->proxy("http://cabio.nci.nih.gov/soap/servlet/rpcrouter"); my %searchCriteria=(); $searchCriteria{symbol}=“pTEN”; $som=$s->getGenes(SOAP::Data->type(map =>\%searchCriteria)); $xmldoc = $som->result;
SOAP output with xlinks <?xml version="1.0" encoding="UTF-8" ?> <nci-core> - <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/"> <name>PTEN</name> <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title> <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs> <Pathwayxlink:href= "http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" /> [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation] </gov.nih.nci.caBIO.bean.Gene> [2 Additional Genes with “PTEN” in their name] - <searchResult> <hasMore>false</hasMore> <startsAt>1</startsAt><endsAt>3</endsAt> </searchResult> </nci-core>
SOAP with returnHeavyXML Data is now returned in full. Here is a Pathway object snippet: <gov.nih.nci.caBIO.bean.Pathway id="92"> <name>ptenPathway</name> <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue> <pathwayDiagram>ptenPathway.svg</pathwayDiagram> </gov.nih.nci.caBIO.bean.Pathway>
HTTP API Direct access to XML-formatted data via URLs: http://cabio.nci.nih.gov/servlet/GetXML? operation=Gene&Symbol=pTEN Method Parameter Value Search Parameter
caBIO Future • MYcaBIO Application • an user interface to the caBIO API • MYcaBIO Kernel • federation of caBIO servers to share information between local data sources and the NCICB caBIO server • Web Service Advertisement • publishing caBIO services
MYcaBIO • Allows researchers to browse the caBIO objects, map their data with caBIO objects, and retrieve the results of the data query without programming expertise • Researchers can upload MS Excel files and obtain query results in MS Excel format
caBIO Kernel mycaBIO Client • Facilitates the creation of a federation of caBIO servers to share information between local data sources and the NCICB caBIO server • Leverages the JXTA protocol for peer-to-peer communication Proxy NCICB caBIO server Object/DB bridge 5. Queries Persistence layer DSI (Data Source Identifier) 4. Parses query & DSI and authenticates user (in any) . 1. Sends query and user info 8. Returns returns 6.Returns objects to requestor 3. Passes query to NCICB server LocalcaBIOServer 7.Queries data map Datamap 8.Queries Persistence layer Object/DB bridge 2. Parses query & DSI and authenticates user .
Service Registry caBIO and Web Services Locate Publish Service Requestor Service Provider Invoke • caBIO is a “Service Provider” providing web services accessible via SOAP • caBIO services are also described using the Web Services Description Language (WSDL) • caBIO will publish Web Services to a central registry • The Universal Description, Discovery, and Integration (UDDI) registry can be leveraged • Third party applications can publish their services and share information by leveraging caBIO service interfaces identified in the WSDL • Applications can request services by obtaining the appropriate location of the WSDL from the UDDI registry • Applications can invoke the appropriate Web Service identified in the WSDL by issuing a SOAP request
caBIO Clinical Service (J2EE) Clinical TrialsWeb Services Web Server Component Container 2. Invoke Publish Third-Party Clinical Service (J2EE) Clinical Trials Registry Web Server 1. Locate Clinical Trials Application(s) Publish Component Container Publish 2. Invoke Third-Party Clinical Service (.NET) Web Server/ Component Container
bioMOBY and caBIO MOBY Central 2. Results - WSDL Register – MOBY Object 1. Query – Object Type caBIO MOBY Client(s) 3. Transact – URI Query MOBY Server(s)
caBIO Benefits • Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details • Permits access to allow developers to obtain the information they need from a variety of data sources without data management • Manages the display of large volumes of data to assist in load balancing • Provides an effective mechanism for performing complex queries that rely on diverse data sources • Facilitates information sharing without managing linkages between multiple data sources
Lessons learned • Integration • Data • Framework • Vocabulary • Implement upfront
Supporting Technologies Genome Anatomy, Genetic Variants Animal Models Clinical Research Data Gene Expression Data Laboratory Data Molecular Targets, Pathways
Gene Expression Data Portal (GEDP)http://gedp.nci.nih.gov • The Gene Expression Data Portal (GEDP) allows users to submit, search, and analyze microarray experiments (Affy and Spotted Arrays) • The microarray database was designed based on industry standards (MAGEML)
Experiment Details • Researchers can retrieve experiment details, data sets, and MAGEML documents • The GEDP automatically generates MAGEML from submitted experiments
Powered by caBIO! Microarray Analysis • Researchers can retrieve a list of pathways represented in the selected array • Researchers can view each pathway, the genes on the chip, and view expressed genes and the level of gene expression
caLIMShttp://lpglims.nci.nih.gov • caLIMS is an “Enterprise” Web based system designed to automate workflow in the laboratory • Workflow operations include user and laboratory administration, inventory, project creation, project execution, project results, collaboration, and equipment operation • caLIMS can be extended to interface with a variety of laboratory equipment and analysis tools
Cancer Models Database (CMD) http://cancermodels.nci.nih.gov • The Cancer Models Database allows both intramural and extramural researchers to search and submit mouse models • All models submitted by extramural researchers are curated to ensure data integrity
Model Components • Researchers can search for and submit model information including: • General Information • Genetic Descriptions • Carcinogenic Agents • Publications • Histopathology • Therapeutic Approaches • Cell Lines • Images • Microarray Data (via GEDP interface)
Controlled Vocabulary • A generic vocabulary tree browser provides an interface to NCICB vocabulary • The vocabulary browser was designed to provide ontology browsing and the selection of specific concepts • An API is available to facilitate re-use across NCICB applications
caImage http://caimageportal.nci.nih.gov • The NCICB has developed an image portal to allow researchers to search for mouse and human images and annotations • Human and mouse images and annotations were provided by the MMHCC
Image Annotations • Image annotations are represented as XML • Image annotations may include a detailed description, species, organ, diagnosis, strain, and image dimensions across regions of interest • The NCICB is exploring existing standards and interfacing (DICOM-SR) and interfacing with federated image servers (MIRC) • Future annotation tools may leverage MYcaBIO technologies
Summary Building integrative technologies increases the ‘ROIT’ and facilitates innovation caBIO: http://ncicb.nci.nih.gov/core/caBIO caBIO users list: http://list.nih.gov/archives/cabio_users.html caBIO developers list: http://list.nih.gov/archives/cabio_developers.html
NCICB Kenneth Buetow Peter Covitz Carl Schaefer Robert Clifford Mike Edmonson Frank Hartel Sherri DeCoronado SAIC Scott Gustafson Mike Connolly Joshua Phillips Brian Levine Acknowledgements