290 likes | 447 Views
Cancer Bioinformatics Infrastructure Objects. An open-source architecture for Biomedical Informatics. Peter A. Covitz Himanso Sahni Scott Gustafson. NCI Center for Bioinformatics. Established in early 2001 by Kenneth Buetow from the NCI LPG
E N D
Cancer Bioinformatics Infrastructure Objects An open-source architecture for Biomedical Informatics Peter A. Covitz Himanso Sahni Scott Gustafson
NCI Center for Bioinformatics • Established in early 2001 by Kenneth Buetow from the NCI LPG • Folded in several existing programs, launched new ones • Mission to bridge basic and clinical research via bio- medical- informatics
NCICB-supported initiatives • Cancer Genome Anatomy Project • ESTs, SAGE, SNPs, Pathways • Director’s Challenge, Molecular Signatures of Cancer • Microarray • Mouse Models of Human Cancer Consortium • Pathology, microarray • Clinical Trials • Protocols, agents, targets, patient cases
caCORE • Core Infrastructure for biomedical application development • NCI Enterprise Vocabulary Services Thesauri and Ontologies • caBIO Object-modeled middleware and APIs • Cancer Data Standards Repository (caDSR) Metadata
caBIO Overview • caBIO modeled in UML and implemented in J2EE • Genomics, Genetics • Clinical Trials • Access to a variety of data sources hosted at NCICB
Development Process: UML • Use Cases • Class Diagrams • Sequence Diagrams • Iterative Development
Java Packages • gov.nih.nci.caBIO.bean • 45 domain objects • gov.nih.nci.caBIO.util.das • 21 domain objects • Used JAXB to convert DAS DTDs to objects • gov.nih.nci.caBIO.evs • 3 domain objects • gov.nih.nci.caBIO.servlet • gov.nih.nci.caBIO.util • gov.nih.nci.caBIO.webservices
Architecture Clients Presentation Layer Data Sources Object Layer Web Server Servlet Container Databases JSPs HTML/HTTP Data Access Objects Servlets Object Managers Browsers SOAP Engine JDBC XML/ HTTP RMI Other Apps HTTP UI Bean Domain Objects SOAP XML Builder URLs Chromosomes Genes XSLT Engine FTP Tissues Clusters Agents RDF Flat Files Libraries Sequences DTDs XML Docs Diseases XSL Style Sheet Java Apps Other
Data Sources - NCI • Cancer Genome Anatomy Project • EST and SAGE library and expression data • Sequence trace files and verified clones • SNPs • Clinical Trials • Drug Agents • Trials • NCI Enterprise Vocabulary Services • Microarray data via NCI Gene Expression Data Portal
Data Sources - External • Unigene (NCBI) • LocusLink (NCBI) • Homologene (NCBI) • Gene Ontology • Human Genome via DAS (UCSC) • BioCarta Pathways • GIFs re-processed into SVGs to support interactivity
caBIO Powers Applications http://cmap.nci.nih.gov
caBIO APIs • Java • NCICB-hosted data sources via RMI • SOAP • SOAP client in any language/environment can send request to NCICB server for object data • SOAP-XML envelope and payload returned • HTTP-URL • Properly formed URLs in any web browser/client can retrieve XML-formatted object data directly
Java API Domain objects have companion SearchCriteria objects Gene myGene = new Gene(); GeneSearchCriteria criteria = new GeneSearchCriteria(); criteria.setSymbol("pTEN"); SearchResult result = myGene.search(criteria); Gene[] genes = (Gene[]) result.getResultSet();
Nested SearchCriteria • SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type. • Complex queries without any SQL
Complex Queries on Objects Find me the Pathways, with Genes that are expressed, in tissues with a particular Histopathology that includes a particular Organ and a particular Disease.
Traverse Relationships in Model INPUT Disease Histopathology Genes Organ Pathways OUTPUT
findPathway Input disease, organ; create SearchCriteria Objects: public Pathway[] findPathway(String disease, String organ) { DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria(); OrganSearchCriteria organCriteria = new OrganSearchCriteria(); HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria(); GeneSearchCriteria geneCriteria = new GeneSearchCriteria(); PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();
findPathway Nest the SearchCriteria, then do the search: . diseaseCriteria.setName(disease); organCriteria.setName(organ); histoCriteria.putSearchCriteria(diseaseCriteria,CriteriaElement.AND); histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND); geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND); pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND); Pathway myPathway = new Pathway(); myPathwayList = myPathway.searchPathways(pathCriteria); }
SOAP API Perl Example use SOAP::Lite; $s = SOAP::Lite ->uri(urn:nci-gene-service) ->proxy("http:caBIO.nci.nih.gov:5080:/soap/servlet/rpcrouter"); my %searchCriteria=(); $searchCriteria{symbol}=“pTEN”; $som=$s->getGenes(SOAP::Data->type(map =>\%searchCriteria)); $xmldoc = $som->result;
SOAP output with xlinks <?xml version="1.0" encoding="UTF-8" ?> <nci-core> - <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/"> <name>PTEN</name> <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title> <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs> <Pathwayxlink:href= "http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" /> [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation] </gov.nih.nci.caBIO.bean.Gene> [2 Additional Genes with “PTEN” in their name] - <searchResult> <hasMore>false</hasMore> <startsAt>1</startsAt><endsAt>3</endsAt> </searchResult> </nci-core>
SOAP with returnHeavyXML Data is now returned in full. Here is a Pathway object snippet: <gov.nih.nci.caBIO.bean.Pathway id="92"> <name>ptenPathway</name> <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue> <pathwayDiagram>ptenPathway.svg</pathwayDiagram> </gov.nih.nci.caBIO.bean.Pathway>
HTTP API Direct access to XML-formatted data via URLs: http://lpgprot101.nci.nih.gov:5080/CORE/GetXML ?operation=Gene&Symbol=pTEN Method Parameter Value Search Parameter
Status and Future • caBIO/caCORE 1.0, late August 2002 • Updated caBIO.JAR and examples • Technical Guide • Stable data servers • Next Release • More domain objects, standard data sources (especially proteins) • Adaptor to add local data sources • More Web Services protocols (WSDL?) • Whatever you guys contribute!
caBIO Team NCI Center for Bioinformatics NCI Laboratory of Population Genetics Science Applications International Corporation (SAIC)
NCI Kenneth Buetow Carl Schaefer Robert Clifford Mike Edmonson SAIC Himanso Sahni Scott Gustafson Sharon Settnek Mike Connolly Brian Levine Acknowledgements
Info, and mailing lists • caCORE http://ncicb.nci.nih.gov/core • caBIO users list http://list.nih.gov/archives/cabio_users.html • caBIO developers list http://list.nih.gov/archives/cabio_developers.html