Core 2: Bioinformatics CBio-Berkeley
Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Discussion
Berkeley group: genomics • Formerly BDGP (Berkeley Drosophila Genome Project) Informatics • Genome sequencing, analysis and annotation • Genomic application development • Database development • FlyBase • Generic Model Organism Database
Genomics applications • GadFly • analysis and annotation database • pipeline software • BOP • computational analysis integration • CGL • Comparative Genomics Software Library
SO and SOFA • Sequence Ontology for Feature Annotation • Ontology for genomics • Sequence feature classes: • mRNA, intron, UTR, sequence_variant, … • Sequence feature relations • exon part_of transcript • polypeptide derives_from mRNA
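The feature relations on this slide can be read as typed subject/predicate/object triples. The following is a minimal sketch of that reading, not SO's actual distribution format; the `Relation` class and the `related` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One typed link between two sequence feature classes."""
    subject: str
    predicate: str
    obj: str


# The two relations named on the slide (illustrative subset only).
RELATIONS = [
    Relation("exon", "part_of", "transcript"),
    Relation("polypeptide", "derives_from", "mRNA"),
]


def related(subject, predicate, relations=RELATIONS):
    """Return every class that `subject` is linked to via `predicate`."""
    return [r.obj for r in relations
            if r.subject == subject and r.predicate == predicate]
```

For example, `related("exon", "part_of")` looks up what an exon is declared to be part of.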
Chado • Model organism relational database schema • FlyBase, GMOD • Modules • sequence annotations • expression • map • genotype • phenotype • ontology/cv • … • Generic schema • Uses ontologies for strong typing
Berkeley group: GO • Gene Ontology - Informatics • Database, web portal • Ontology editing tools • Ontology QC and integration • OBO
Obol • Problem: large ontologies of composite terms are difficult to manage • Solution: partial automation (reasoners) • Requires logical definitions • how do we obtain them? • Solution: Obol • Parses logical definitions from class names • Logical definitions can be reasoned over • detect errors and automation • Integrates OBO ontologies
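To make "parses logical definitions from class names" concrete, here is a rough sketch using regular expressions in Python. This is a swapped-in technique for illustration and not how Obol itself is implemented; the patterns, the choice of `has_participant` as the linking relation, and the `LogicalDefinition` type are all assumptions.

```python
import re
from typing import NamedTuple, Optional


class LogicalDefinition(NamedTuple):
    genus: str        # the more general class
    relation: str     # relation linking genus to the differentiating class
    differentia: str  # the class that distinguishes this term


# Hypothetical name patterns; a real grammar would be far richer.
PATTERNS = [
    (re.compile(r"^(?P<entity>.+) development$"),
     "development", "has_participant"),
    (re.compile(r"^acidification of (?P<entity>.+)$"),
     "acidification", "has_participant"),
]


def parse_class_name(name: str) -> Optional[LogicalDefinition]:
    """Return a logical definition if the name matches a known pattern."""
    for pattern, genus, relation in PATTERNS:
        match = pattern.match(name)
        if match:
            return LogicalDefinition(genus, relation, match.group("entity"))
    return None  # name is not compositional under these patterns
```

Once names are decomposed this way, the resulting definitions are structured data that a reasoner could check, which is the step the slide describes as detecting errors.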
OBO Relations Ontology • Common relations used across ontologies must mean the same thing • is_a • part_of • derives_from • has_participant • … • OBO relations ontology provides precise definitions • defines class-level relations in terms of their instances • http://obo.sourceforge.net/relationship • collaboration with core5, Manchester & others
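One way to read "defines class-level relations in terms of their instances" is the all-some pattern: class A part_of class B holds when every instance of A is part of some instance of B. The sketch below checks that pattern over toy data; it omits the time indexing that precise definitions require, and the instance names are made up.

```python
def class_level_holds(instances_of_a, instances_of_b, instance_relation):
    """True if every instance of A is related to at least one instance of B.

    `instance_relation` is a set of (a_instance, b_instance) pairs.
    """
    targets = set(instances_of_b)
    return all(
        any((a, b) in instance_relation for b in targets)
        for a in instances_of_a
    )


# Toy instances (hypothetical): two nuclei, three cells, one cell anucleate.
NUCLEI = ["n1", "n2"]
CELLS = ["c1", "c2", "c3"]
PART_OF = {("n1", "c1"), ("n2", "c2")}
HAS_PART = {(whole, part) for part, whole in PART_OF}

# Every nucleus is part of some cell, but not every cell has a nucleus.
NUCLEUS_PART_OF_CELL = class_level_holds(NUCLEI, CELLS, PART_OF)
CELL_HAS_PART_NUCLEUS = class_level_holds(CELLS, NUCLEI, HAS_PART)
```

The asymmetry between the two results shows why a class-level relation and its apparent inverse have to be defined and asserted separately.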
Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Open questions
Core 2 specific aims • Aims • Capture and describe data • Reconcile annotation and ontology changes • Store, view and compare annotations • Link disease genes • First round • phenotypes: Fly and Zebrafish • HIV clinical trial data
Aim 1: Capture and describe data • Phenotype data capture • OBO-Edit plug-ins • Combine classes from multiple ontologies • PATO, anatomical ontologies • NLP tools? • Clinical trial data capture • what are the appropriate tools?
Aim 1: Capture and describe data • Zebrafish, fly • PATO: Phenotype and trait ontology • phenotype ‘primitives’ • ‘Entity-Attribute-Value’ model • Phenotype ontologies • Genetic data • Orthologs • Clinical trial data • generic instance model • what are the appropriate ontologies here?
PATO • An ontology of attributes and attribute values • e.g. morphology, structure, placement • Current status of PATO? • needs work to conform to sound ontology principles • definitions • formalisation of attributes • working with core3-cambridge (Gkoutos) and core5 (Neuhaus)
Phenotype annotation • Entity-attribute structured annotations • Entity term; PATO term • brain FBbt:00005095; fused PATO:0000642 • gut MA:0000917; dysplastic PATO:0000640 • tail fin ZDB:020702-16; ventralized PATO:0000636 • kidney ZDB:020702-16; hypertrophied PATO:0000636 • midface ZDB:020702-16; hypoplastic PATO:0000636 • Pre-composed phenotype terms • Mammalian Phenotype Ontology • “increased activated B-cell number” MPO:0000319 • “pink fur hue” MPO:0000374
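An entity-attribute annotation can be held as a small record pairing an entity term with a PATO term. The sketch below assumes each annotation is written as `entity label ID; attribute label ID`, with the identifier as the last whitespace-separated token on each side; the `EAAnnotation` class and `parse_annotation` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EAAnnotation:
    entity_label: str
    entity_id: str
    attribute_label: str
    attribute_id: str


def parse_annotation(text: str) -> EAAnnotation:
    """Split 'label ID; label ID' into its four components."""
    entity_part, attribute_part = (p.strip() for p in text.split(";", 1))
    # The identifier is the final token; the label may contain spaces.
    entity_label, entity_id = entity_part.rsplit(" ", 1)
    attribute_label, attribute_id = attribute_part.rsplit(" ", 1)
    return EAAnnotation(entity_label, entity_id, attribute_label, attribute_id)
```

Keeping the entity and the attribute as separate identifiers is what distinguishes this style from the pre-composed terms, where a single identifier stands for the whole phenotype.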
Example (Fly) Gene: Jra Allele: Jra[bZIP.Scer\UAS] Allele Description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces….. A481G bZIP
Genotype-Phenotype datamodel • Need to model complex genotypes • Environment • Phenotype • E-A-V is not enough • Relational attributes • Complex phenotypes • Measurements and assays • CSHL 2005 Phenotype meeting
Aim 2: Reconcile annotation and ontology changes • Ontology evolution can trigger annotation changes • Identifiers • all classes and annotations will have stable identifiers • Cores 1 and 2 to decide on identifier model • LSID URNs • OntoTrack
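If LSID URNs were the chosen identifier model (the slide leaves this open), identifiers would follow the general shape `urn:lsid:authority:namespace:object` with an optional trailing revision. A minimal parsing sketch follows; the `LSID` type, the `parse_lsid` function and the example authority are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when no revision component is present


def parse_lsid(urn: str) -> LSID:
    """Parse 'urn:lsid:authority:namespace:object[:revision]'."""
    parts = urn.split(":")
    prefix_ok = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if not prefix_ok or len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {urn!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)
```

The optional revision component is the part relevant to this aim: an annotation can point at a specific version of a class, so a later ontology change can be detected rather than silently inherited.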
Aim 3: Store, view and compare annotations • OBO: ontologies • OBD: data annotated using ontologies • genotype-phenotype • clinical trials • others
OBD: A Database for OBO • Data warehouse • collected from MODs and other sources • Annotation versioning • Generic data model • Any data typed by OBO classes can be stored • Specific annotation data views • Clinical trial data view • Phenotype data view • Chado-compliant • Entity-attribute-(value) model
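As a rough illustration of a generic entity-attribute-(value) store layered on a relational model, the sketch below uses an in-memory SQLite table. The table layout, column names and `source` labels are assumptions made for this example; it is neither the OBD schema nor Chado.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory annotation table and load (entity, attribute, value, source) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " entity_id TEXT NOT NULL,"
        " attribute_id TEXT NOT NULL,"
        " value TEXT,"          # optional, hence entity-attribute-(value)
        " source TEXT)"
    )
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", rows)
    return conn


def entities_with_attribute(conn, attribute_id):
    """Return entity identifiers annotated with the given attribute term."""
    cursor = conn.execute(
        "SELECT entity_id FROM annotation"
        " WHERE attribute_id = ? ORDER BY entity_id",
        (attribute_id,),
    )
    return [row[0] for row in cursor]
```

Because every column holds an ontology identifier rather than a domain-specific field, the same table can hold phenotype or clinical trial annotations, and specific views can be built as queries on top.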
Key technologies • ‘Semantic Web’ database technology • ontology-aware • ontologies are part of meta-model • higher level query languages • SPARQL, SeRQL, … • tool interoperability • Protégé-OWL, Jena, … • SQL compatibility • optionally layered on relational model • Standards? Maturity? • Many implementations • Sesame, Kowari, …
Aim 3: Store, view and compare annotations • Browsing • AmiGO-2 • Advanced visualization • work with core 1 (University of Victoria)
Comparing annotations • process vs state • regulatory processes: • acidification of midgut has_quality reduced rate • midgut has_quality low acidity • development vs behavior • wing development has_quality abnormal • flight has_quality intermittent • granularity (scale) • chemical vs molecular vs cell vs tissue vs anatomical part
Integrating anatomical ontologies • Annotations should be comparable between species • phenotype annotations are composed of anatomical terms • Multiple species-centric anatomical ontologies • Problem: how do we compare across species? • XSPAN (Bard et al): creating mappings • Core 1: ontology mappings
Aim 4: Linking disease genes • Homology data • Orthologous genes • Genomic data • SNPs, sequence variants • Ontologies • Disease ontologies • Semantic similarity • Ontology integration • Obol, XSPAN
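Semantic similarity can be measured in many ways; the slide does not say which is intended. As one simple possibility, the sketch below scores two terms by the Jaccard overlap of their ancestor sets in an is_a hierarchy. The toy hierarchy and all term names are placeholders.

```python
def ancestors(term, parents):
    """Return the term plus everything reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of either term."""
    a = ancestors(term_a, parents)
    b = ancestors(term_b, parents)
    return len(a & b) / len(a | b)


# Toy hierarchy (hypothetical): child -> list of parents.
TOY_PARENTS = {
    "leaf1": ["mid"],
    "leaf2": ["mid"],
    "mid": ["root"],
    "other": ["root"],
}
```

Terms that share a close common parent score higher than terms that meet only at the root, which is the property needed when ranking candidate disease genes by phenotype overlap.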
Linking disease to phenotype • Relationship of phenotype to diseases and disorders • essentialist • statistical • Disease ontologies • OBO disease ontology (Northwestern) • EVOC disease ontology (EVOC) • Others • Disease ontology workshop (core 5) • November 2006
Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Open questions
Software lifecycle • Software is developed in phases • Different phases require interaction with different cores • Iterative “Agile” methodology • fast cycles • involve ‘customer’ (core3) at all phases
Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress
Current progress • Meetings • CSHL November 2005 • Phenotype ontology meeting • Phenotype tools workshop • Berkeley, UVic, Core 3 • OBO-Edit complex class plug-in • Phenotype browser prototype • Genotype-Phenotype datamodel
OBO-Edit complex class plug-in • Combinatorial composition of classes • Current use-cases: • plant anatomical structures • integrating GO and OBO-Cell • Ideal for phenotype classes • extend to make ‘phenotype’ plug-in
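Combinatorial composition can be pictured as a cross-product over terms drawn from two ontologies. The sketch below is illustrative only and says nothing about how the OBO-Edit plug-in works internally; the `compose_classes` function is hypothetical, and `has_quality` is borrowed from the comparison examples earlier in the deck.

```python
from itertools import product


def compose_classes(entities, qualities, relation="has_quality"):
    """Build one composite class expression per (entity, quality) pair."""
    return [f"{entity} {relation} {quality}"
            for entity, quality in product(entities, qualities)]


# Example terms taken from the phenotype annotation slide.
COMPOSITES = compose_classes(["brain", "gut"], ["fused", "dysplastic"])
```

The number of composites grows as the product of the input sizes, which is why composing on demand is attractive compared with pre-composing every phenotype term.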
OBD Progress • Genotype-Phenotype data model defined • Prototype implemented • evaluating technologies
Phenotype browser • Experimental branch of AmiGO code • Allows browsing and querying of combinatorial phenotype annotations • Experimental dataset • Demo • http://yuri.lbl.gov/amigo/obd