
Core 2: Bioinformatics



Presentation Transcript


  1. Core 2: Bioinformatics CBio-Berkeley

  2. Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Discussion

  3. Berkeley group: genomics • Formerly BDGP (Berkeley Drosophila Genome Project) Informatics • Genome sequencing, analysis and annotation • Genomic application development • Database development • FlyBase • Generic Model Organism Database

  4. Apollo

  5. GBrowse

  6. In-situ expression database

  7. Genomics applications • GadFly • analysis and annotation database • pipeline software • BOP • computational analysis integration • CGL • Comparative Genomics Software Library

  8. SO and SOFA • Sequence Ontology for Feature Annotation • Ontology for genomics • Sequence feature classes: • mRNA, intron, UTR, sequence_variant, … • Sequence feature relations • exon part_of transcript • polypeptide derives_from mRNA
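
The feature classes and relations on this slide can be pictured as typed links between features. The following is a minimal Python sketch only: the `Feature` class, the `relate` helper and the feature identifiers are hypothetical and are not part of SO, SOFA or any GMOD library.

```python
from dataclasses import dataclass, field

# Relation names are taken from the slide; everything else is illustrative.
RELATIONS = {"part_of", "derives_from"}

@dataclass
class Feature:
    uid: str      # hypothetical local identifier
    so_type: str  # Sequence Ontology class name, e.g. "exon"
    links: list = field(default_factory=list)  # (relation, target Feature) pairs

def relate(subject: Feature, relation: str, target: Feature) -> None:
    """Record 'subject relation target', rejecting unknown relation names."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    subject.links.append((relation, target))

mrna = Feature("t1", "mRNA")
exon = Feature("e1", "exon")
protein = Feature("p1", "polypeptide")

relate(exon, "part_of", mrna)          # exon part_of transcript
relate(protein, "derives_from", mrna)  # polypeptide derives_from mRNA

# Flatten the links into (subject type, relation, target type) triples.
summary = [(f.so_type, rel, tgt.so_type)
           for f in (exon, protein) for rel, tgt in f.links]
print(summary)
```

Restricting relations to a fixed vocabulary is the point of the sketch: a misspelled or ad hoc relation name fails immediately instead of silently entering the annotation set.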

  9. Chado • Model organism relational database schema • FlyBase, GMOD • Modules • sequence annotations • expression • map • genotype • phenotype • ontology/cv • … • Generic schema • Uses ontologies for strong typing
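
"Uses ontologies for strong typing" can be illustrated with a small relational sketch in which every feature and every relationship points at a controlled-vocabulary term. The table names below are modelled loosely on Chado (`feature`, `cvterm`, `feature_relationship`), but the columns are heavily reduced and the rows are invented for illustration; this is not the real schema.

```python
import sqlite3

# Heavily simplified: real Chado tables carry many more columns and constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Ontology terms type both the features and the relationship between them.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "mRNA"), (2, "exon"), (3, "part_of")])
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(10, "example-mRNA", 1), (11, "example-exon", 2)])
conn.execute("INSERT INTO feature_relationship VALUES (11, 10, 3)")

# Resolve every type_id back to its term name.
rows = conn.execute("""
SELECT s.uniquename, st.name, r.name, o.uniquename, ot.name
FROM feature_relationship fr
JOIN feature s  ON s.feature_id = fr.subject_id
JOIN cvterm  st ON st.cvterm_id = s.type_id
JOIN cvterm  r  ON r.cvterm_id  = fr.type_id
JOIN feature o  ON o.feature_id = fr.object_id
JOIN cvterm  ot ON ot.cvterm_id = o.type_id
""").fetchall()
print(rows)
```

Because types live in a term table and not in table names, new feature classes can be added without changing the schema, which is what makes the design generic.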

  10. Berkeley group: GO • Gene Ontology - Informatics • Database, web portal • Ontology editing tools • Ontology QC and integration • OBO

  11. OBO-Edit (formerly DAG-Edit)

  12. AmiGO and GO Database

  13. Obol • Problem: large ontologies of composite terms are difficult to manage • Solution: partial automation (reasoners) • Requires logical definitions • how do we obtain them? • Solution: Obol • Parses logical definitions from class names • Logical definitions can be reasoned over • enables error detection and automation • Integrates OBO ontologies
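
The idea of parsing a logical definition out of a class name can be sketched in a few lines. This is a toy illustration only and is not how Obol itself is implemented: the vocabularies, the `parse_class_name` function and the "entity followed by genus" naming pattern are all assumptions made for the example.

```python
# Toy vocabularies; a real system would load these from ontologies.
GENUS_TERMS = {"development", "morphogenesis"}
ENTITY_TERMS = {"wing", "midgut", "tail fin"}

def parse_class_name(name: str):
    """Split '<entity> <genus>' into (genus, entity).

    Returns None when the name does not fit the pattern or uses an
    entity that is not in the vocabulary.
    """
    for genus in GENUS_TERMS:
        suffix = " " + genus
        if name.endswith(suffix):
            entity = name[: -len(suffix)]
            if entity in ENTITY_TERMS:
                return (genus, entity)
    return None

parsed = parse_class_name("wing development")
multiword = parse_class_name("tail fin morphogenesis")
unparsed = parse_class_name("wnig development")  # misspelled entity
print(parsed, multiword, unparsed)
```

Names that fail to parse are candidates for manual review, which is one simple form of the error detection mentioned on the slide.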

  14. OBO Relations Ontology • Common relations used across ontologies must mean the same thing • is_a • part_of • derives_from • has_participant • … • OBO relations ontology provides precise definitions • defines class-level relations in terms of their instances • http://obo.sourceforge.net/relationship • collaboration with core5, Manchester & others
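
One common reading of "class-level relations in terms of their instances" is the all-some pattern: class C is part_of class D when every instance of C is part_of some instance of D. The sketch below checks that pattern over a small invented instance set; it ignores time, and the function name and data are hypothetical.

```python
def class_part_of(instances_of_c, instances_of_d, instance_part_of):
    """True if every instance of C is part_of at least one instance of D."""
    return all(
        any((c, d) in instance_part_of for d in instances_of_d)
        for c in instances_of_c
    )

# Invented instance data: each nucleus sits in a cell, cell_3 has no nucleus.
nuclei = {"nucleus_1", "nucleus_2"}
cells = {"cell_1", "cell_2", "cell_3"}
part_of_pairs = {("nucleus_1", "cell_1"), ("nucleus_2", "cell_2")}

nucleus_part_of_cell = class_part_of(nuclei, cells, part_of_pairs)
cell_part_of_nucleus = class_part_of(cells, nuclei, part_of_pairs)
print(nucleus_part_of_cell, cell_part_of_nucleus)
```

The asymmetry matters: the class-level statement holds for nucleus and cell but not in the other direction, even though the same instance-level pairs are used for both checks.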

  15. Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Open questions

  16. Core 2 specific aims • Aims • Capture and describe data • Reconcile annotation and ontology changes • Store, view and compare annotations • Link disease genes • First round • phenotypes: Fly and Zebrafish • HIV clinical trial data

  17. Aim 1: Capture and describe data • Phenotype data capture • OBO-Edit plug-ins • Combine classes from multiple ontologies • PATO, anatomical ontologies • NLP tools? • Clinical trial data capture • what are the appropriate tools?

  18. Aim 1: Capture and describe data • Zebrafish, fly • PaTO: Phenotype and trait ontology • phenotype ‘primitives’ • ‘Entity-Attribute-Value’ model • Phenotype ontologies • Genetic data • Orthologs • Clinical trial data • generic instance model • what are the appropriate ontologies here?

  19. PATO • An ontology of attributes and attribute values • e.g. morphology, structure, placement • Current status of PATO? • needs work to conform to sound ontology principles • definitions • formalisation of attributes • working with core3-cambridge (Gkoutos) and core5 (Neuhaus)

  20. Phenotype annotation • Entity-attribute structured annotations • Entity term; PATO term • brain FBbt:00005095; fused PATO:0000642 • gut MA:0000917; dysplastic PATO:0000640 • tail fin ZDB:020702-16; ventralized PATO:0000636 • kidney ZDB:020702-16; hypertrophied PATO:0000636 • midface ZDB:020702-16; hypoplastic PATO:0000636 • Pre-composed phenotype terms • Mammalian Phenotype Ontology • “increased activated B-cell number” MPO:0000319 • “pink fur hue” MPO:0000374
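
The "Entity term; PATO term" pairs above can be read into a small record type. This sketch assumes a "label ID; label ID" text layout that mirrors the slide; the `EAAnnotation` type and `parse_annotation` function are hypothetical names, not part of any OBO tool.

```python
from typing import NamedTuple

class EAAnnotation(NamedTuple):
    entity_label: str
    entity_id: str
    quality_label: str
    quality_id: str

def parse_annotation(text: str) -> EAAnnotation:
    """Parse 'label ID; label ID' into an entity/quality pair."""
    entity_part, quality_part = (p.strip() for p in text.split(";"))
    # The ID is the last whitespace-separated token; the label may contain spaces.
    entity_label, entity_id = entity_part.rsplit(" ", 1)
    quality_label, quality_id = quality_part.rsplit(" ", 1)
    return EAAnnotation(entity_label, entity_id, quality_label, quality_id)

brain = parse_annotation("brain FBbt:00005095; fused PATO:0000642")
tail_fin = parse_annotation("tail fin ZDB:020702-16; ventralized PATO:0000636")
print(brain)
print(tail_fin)
```

Keeping the entity and the quality as separate identified terms is what distinguishes this style from the pre-composed terms listed at the end of the slide, where a single identifier stands for the whole phenotype.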

  21. Example (Fly) • Gene: Jra • Allele: Jra[bZIP.Scer\UAS] • Allele description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces… • Figure labels: A481G, bZIP

  22. Genotype-Phenotype datamodel • Need to model complex genotypes • Environment • Phenotype • E-A-V is not enough • Relational attributes • Complex phenotypes • Measurements and assays • CSHL 2005 Phenotype meeting

  23. Aim 2: Reconcile annotation and ontology changes • Ontology evolution can trigger annotation changes • Identifiers • all classes and annotations will have stable identifiers • Cores 1 and 2 to decide on identifier model • LSID URNs • OntoTrack
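
LSID URNs follow the layout `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing revision, which is what makes them a candidate for stable, versionable identifiers. The parser below is a minimal sketch; the `LSID` type, the `parse_lsid` function and the `example.org` authority are made up for illustration and do not reflect any identifier scheme the cores actually chose.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into its components; revision is optional."""
    parts = urn.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)

# Hypothetical identifiers: same object, with and without a revision.
unversioned = parse_lsid("urn:lsid:example.org:pato:0000642")
versioned = parse_lsid("urn:lsid:example.org:pato:0000642:2")
print(unversioned)
print(versioned)
```

Two annotations that cite the same authority, namespace and object but different revisions can then be flagged for reconciliation when the ontology changes.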

  24. Aim 3: Store, view and compare annotations • OBO: ontologies • OBD: data annotated using ontologies • genotype-phenotype • clinical trials • others

  25. OBD: A Database for OBO • Data warehouse • collected from MODs and other sources • Annotation versioning • Generic data model • Any data typed by OBO classes can be stored • Specific annotation data views • Clinical trial data view • Phenotype data view • Chado-compliant • Entity-attribute-(value) model

  26. Key technologies • ‘Semantic Web’ database technology • ontology-aware • ontologies are part of meta-model • higher level query languages • SPARQL, SeRQL, … • tool interoperability • Protégé-OWL, Jena, … • SQL compatibility • optionally layered on relational model • Standards? Maturity? • Many implementations • Sesame, Kowari, …

  27. Aim 3: Store, view and compare annotations • Browsing • AmiGO-2 • Advanced visualization • work with core 1 (University of Victoria)

  28. Comparing annotations • process vs state • regulatory processes: • acidification of midgut has_quality reduced rate • midgut has_quality low acidity • development vs behavior • wing development has_quality abnormal • flight has_quality intermittent • granularity (scale) • chemical vs molecular vs cell vs tissue vs anatomical part

  29. Integrating anatomical ontologies • Annotations should be comparable between species • phenotype annotations are composed of anatomical terms • Multiple species-centric anatomical ontologies • Problem: how do we compare across species? • XSPAN (Bard et al): creating mappings • Core 1: ontology mappings

  30. Aim 4: Linking disease genes • Homology data • Orthologous genes • Genomic data • SNPs, sequence variants • Ontologies • Disease ontologies • Semantic similarity • Ontology integration • Obol, XSPAN

  31. Linking disease to phenotype • Relationship of phenotype to diseases and disorders • essentialist • statistical • Disease ontologies • OBO disease ontology (Northwestern) • EVOC disease ontology (EVOC) • Others • Disease ontology workshop (core 5) • November 2006

  32. Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress • Open questions

  33. Software lifecycle • Software is developed in phases • Different phases require interaction with different cores • Iterative “Agile” methodology • fast cycles • involve ‘customer’ (core3) at all phases

  34. Outline • Berkeley group background • Core 2 first round • what: aims, milestones • how: software lifecycle, interaction w/ other cores • Current progress

  35. Current progress • Meetings • CSHL November 2005 • Phenotype ontology meeting • Phenotype tools workshop • Berkeley, UVic, Core 3 • OBO-Edit complex class plug-in • Phenotype browser prototype • Genotype-Phenotype datamodel

  36. OBO-Edit complex class plug-in • Combinatorial composition of classes • Current use-cases: • plant anatomical structures • integrating GO and OBO-Cell • Ideal for phenotype classes • extend to make ‘phenotype’ plug-in
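
Combinatorial composition means building new classes as cross-products of classes from two ontologies. The sketch below shows the idea with plain Python; the term lists are stand-ins invented for the example, and the dictionary layout is an assumption, not the plug-in's data model.

```python
from itertools import product

# Stand-ins for classes drawn from two different ontologies.
processes = ["differentiation", "proliferation"]  # process-like classes
cell_types = ["B cell", "T cell"]                 # cell-type classes

# Each composed class records its name plus the two classes it was built from.
composed = [
    {"name": f"{cell} {proc}", "genus": proc, "differentia": cell}
    for proc, cell in product(processes, cell_types)
]
names = [c["name"] for c in composed]
print(names)
```

Because each composed class keeps references to its parts, the same mechanism could pair an anatomical entity with a PATO quality, which is why the slide calls the approach ideal for phenotype classes.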

  37. OBD Progress • Genotype-Phenotype data model defined • Prototype implemented • evaluating technologies

  38. Phenotype browser • Experimental branch of AmiGO code • Allows browsing and querying of combinatorial phenotype annotations • Experimental dataset • Demo • http://yuri.lbl.gov/amigo/obd
