
Comparative Data Analysis Ontology (CDAO)

CDAO formalizes evolutionary biology knowledge through an ontology for comparative data analysis, facilitating format conversions and automated reasoning. Developed by Prosdocimi, Chisham, Thompson, Pontelli, and Stoltzfus.


Presentation Transcript


  1. Comparative Data Analysis Ontology (CDAO) Francisco Prosdocimi, Brandon Chisham, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

  2. Objectives • Develop a framework to formalize knowledge in the evolutionary biology domain • Formalize an ontology for comparative data analysis • Comparative Data Analysis Ontology (CDAO) • Implement and Evaluate the ontology

  3. Motivation • Interoperation • Ontologies formalize knowledge • Overcome ambiguities in data formats (e.g., the multiple interpretations of NEXUS) • Facilitate provably correct format conversions • Reasoning • Beyond relational queries • Automated generation of format converters • Advanced reasoning required for workflow constructions and validation • Miscellaneous • Guide development of new data formats • Lingua franca for knowledge exchange • …

  4. Development Process

  5. Structure of CDAO • Current Focus: • Taxonomic units • Tree-like networks of relationships • Models of evolutionary changes

  6. Structure of CDAO • Core Components • Representation of Networks and Trees(e.g., NEXUS TREE Block) • Representation of Character Data(e.g., NEXUS CHARACTERS Block) • Imported Components • Amino Acid Ontology • http://www.co-ode.org/ontologies/amino-acid • U. Manchester, 2006 • Nucleotide Ontology • http://www.co-ode.org/ontologies/basic-bio/

  7. CDAO: Core Components • Network/Tree representation • Rooted and Unrooted Trees • Nodes • Edges • Sets of Nodes [Diagram: classes network, rooted tree, unrooted tree, node, edge, directed edge, set of nodes, and lineage; relations is_a, part_of, has_ancestor, has_descendant, has_element, mrca_of, has_annotation; Parent Node and Child Node of a directed edge; a node represents a TU; tree annotations (Tree Procedure, Model…); edge annotations (Transformation, Length…)]

  8. CDAO: Core Components • Representation of a Directed Tree [Figure: example directed tree with nodes A to E, labeled with Rooted_tree, has_root_node, Node, Node (Ancestral), Edge, Directed edge or branch, has_parent_Node, has_child_Node, has_descendant (min 2 Nodes), MRCA_Node, Lineage, Subtree, and Edge Transformation (Character; Ancestor state, Derived state…)]

  9. CDAO: Core Components • Annotations • Edge Annotations • Length • Transformation • Model Description • Gap Cost • Substitution Model • TU Annotation • Taxonomic Link • Tree Annotation • Tree Procedure [Diagram: an Edge Annotation of type character state transformation, with relations transform_character, has_left_node, has_left_state, has_right_node, has_right_state linking it to a character and to character states]

  10. CDAO: Core Components • Character State Data Matrix • Character • Taxonomic Units • Datum • State [Diagram: classes character state data matrix, taxonomic unit, character, character state datum, character state, node, and coordinate system; relations part_of, belongs_to, has datum, has, represented by, has coordinate, is_transformation_of, has annotation; character state subtypes (is_a): compound, amino acid, nucleotide, discrete, continuous; matrix annotations (Alignment procedures…); taxonomic unit annotations (TAXID, DB-XREF…)]

  11. Implementation Details • Formalization • OWL 1.1 • Tools • Protégé 4 [edit] • Swoop 2.3 [validation] • C++ and Perl+Prolog translators • Swoop 2.3 [reasoning] • Pellet [reasoning] • Fact++ [reasoning]

  12. Preliminary Evaluation • We are reaching the stage where concrete evaluation is possible • NEXUS converters • We ran into several stumbling blocks • A good formalization of CDAO requires sophisticated features (OWL 1.1) • Most reasoning engines do not yet support OWL 1.1 (even if they claim to…)

  13. Some Examples • Simple NEXUS file

      #NEXUS
      BEGIN TAXA;
        DIMENSIONS ntax=10;
        TAXLABELS
          Arabidopsis_thaliana_AAD31363.1
          Arabidopsis_thaliana_CAB79970.1
          Oryza_sativa_BAB21282.1
          Dictyostelium_discoideum_AAO51107.1
          Caenorhabditis_elegans_CAA92686.1
          Drosophila_melanogaster_AAF55117.1
          Drosophila_melanogaster_AAF55115.1
          Mus_musculus_BAB61955.1
          Saccharomyces_cerevisiae_AAB68881.1
          Schizosaccharomyces_pombe_CAB16373.1;
      END;
      BEGIN CHARACTERS;
        TITLE dna;
        LINK taxa=PF00137_47;
        DIMENSIONS nchar=10;
        FORMAT datatype=dna gap=- missing=?;
        MATRIX
          Arabidopsis_thaliana_CAB79970.1      gtgtggttgc
          Schizosaccharomyces_pombe_CAB16373.1 tgtatatgct
          Drosophila_melanogaster_AAF55117.1   tgtacttcgt
          Arabidopsis_thaliana_AAD31363.1      gt---gtggc
          Oryza_sativa_BAB21282.1              ct--------
          Saccharomyces_cerevisiae_AAB68881.1  tgtacaagct
          Mus_musculus_BAB61955.1              tctgctacac
          Dictyostelium_discoideum_AAO51107.1  cacttactcc
          Caenorhabditis_elegans_CAA92686.1    tgttttacat
          Drosophila_melanogaster_AAF55115.1   ac------g-
        ;
      END;
      BEGIN TREES;
        TREE con_50_majrule = (((Arabidopsis_thaliana_AAD31363.1:0.004496,Arabidopsis_thaliana_CAB79970.1:0.009539)inode15:0.090479,Oryza_sativa_BAB21282.1:0.043596)inode14:0.219708,(Dictyostelium_discoideum_AAO51107.1:0.341768,(((Caenorhabditis_elegans_CAA92686.1:0.308884,(Drosophila_melanogaster_AAF55117.1:0.128132,Drosophila_melanogaster_AAF55115.1:0.384443)inode20:0.236060)inode19:0.093887,Mus_musculus_BAB61955.1:0.243982)inode18:0.150844,(Saccharomyces_cerevisiae_AAB68881.1:0.235101,Schizosaccharomyces_pombe_CAB16373.1:0.261646)inode21:0.225955)inode17:0.189073)inode16:0.127974)root;
      END;

  14. Some Examples • Node:

      <cdao:Node rdf:ID="node_inode15">
        <cdao:part_of rdf:resource="#Tree"/>
        <cdao:belongs_to_Edge rdf:resource="#edge_inode15_inode14" />
        <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
        <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
        <cdao:belongs_to_Edge_as_Child rdf:resource="#edge_inode15_inode14" />
        <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
        <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
        <cdao:nca_node_of rdf:resource="#set_nca_44"/>
      </cdao:Node>

      • Directed_Edge:

      <cdao:Directed_Edge rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15">
        <cdao:part_of rdf:resource="#Tree"/>
        <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
        <cdao:has_Child_Node rdf:resource="#node_Arabidopsis_thaliana_CAB79970_1"/>
        <cdao:has_Annotation rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15_length"/>
      </cdao:Directed_Edge>
      <cdao:Edge_Length rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15_length">
        <cdao:has_Value rdf:datatype="&xsd;float"> 0.009539 </cdao:has_Value>
      </cdao:Edge_Length>
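The Node and Directed_Edge instances above follow directly from the parenthesized tree in the TREES block. As a rough illustration only (this is not the authors' C++ or Perl+Prolog translator), the sketch below parses the inode14 clade of that tree into child/parent/length tuples and derives identifiers in the `edge_<child>_<parent>` pattern seen on the slide. The function names `parse_newick` and `edge_id` are hypothetical, and the parser assumes every node is labeled and ignores quoting and comments.

```python
import re

# The inode14 clade of the slide's tree (its own branch length dropped).
NEWICK = ("((Arabidopsis_thaliana_AAD31363.1:0.004496,"
          "Arabidopsis_thaliana_CAB79970.1:0.009539)inode15:0.090479,"
          "Oryza_sativa_BAB21282.1:0.043596)inode14;")

STRUCTURAL = {"(", ")", ",", ";"}


def parse_newick(text):
    """Return (root_name, edges), edges being (child, parent, length) tuples."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    edges = []
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
        name, length = "", None
        if pos < len(tokens) and tokens[pos] not in STRUCTURAL:
            label, _, raw = tokens[pos].partition(":")
            name, length = label, (float(raw) if raw else None)
            pos += 1
        # Each child contributes one directed edge pointing away from this node.
        for child_name, child_length in children:
            edges.append((child_name, name, child_length))
        return name, length

    root_name, _ = parse_node()
    return root_name, edges


def edge_id(child, parent):
    """Hypothetical helper: build an ID in the slide's edge_<child>_<parent> style."""
    return "edge_{}_{}".format(child.replace(".", "_"), parent.replace(".", "_"))


root, edges = parse_newick(NEWICK)
for child, parent, length in edges:
    print(edge_id(child, parent), length)
```

Replacing "." with "_" mirrors how the slide's identifiers differ from the NEXUS taxon labels; whether the original translators apply exactly this rule is an assumption.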

  15. Some Examples • TU

      <cdao:TU rdf:ID="Caenorhabditis_elegans_CAA92686_1">
        <cdao:belongs_to_Character_State_Data_Matrix rdf:resource="#Matrix"/>
        <cdao:represented_by_Node rdf:resource="#node_Caenorhabditis_elegans_CAA92686_1"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_0"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_1"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_2"/>
        …
      </cdao:TU>

      • Character

      <cdao:Nucleotide_Character rdf:ID="char_2">
        <cdao:belongs_to_Character_State_Data_Matrix rdf:resource="#Matrix"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Oryza_sativa_BAB21282_1_char_2"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Arabidopsis_thaliana_CAB79970_1_char_2"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Mus_musculus_BAB61955_1_char_2"/>
        …
      </cdao:Nucleotide_Character>

      • Datum

      <cdao:Nucleotide_State_Datum rdf:ID="datum_Caenorhabditis_elegans_CAA92686_1_char_6">
        <cdao:belongs_to_Character rdf:resource="#char_6"/>
        <cdao:belongs_to_TU rdf:resource="#Caenorhabditis_elegans_CAA92686_1"/>
        <cdao:has_Nucleotide_State rdf:resource="#value_a"/>
      </cdao:Nucleotide_State_Datum>

      • State

      <cdao:Nucleotide rdf:ID="value_a">
        <owl:sameAs rdf:resource="#dA"/>
      </cdao:Nucleotide>
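The TU, Character, and Datum instances above correspond to one cell each of the NEXUS MATRIX: a datum ties a taxon row to a zero-based character column and records the state found there. A minimal sketch of that mapping, assuming the naming pattern visible on the slide (`datum_<TU>_char_<i>`), might look like the following. The function `datum_records` is hypothetical and is not part of CDAO or its translators.

```python
def datum_records(tu_label, sequence):
    """Return one (datum_id, character_id, state) tuple per matrix cell.

    Character indices are zero-based, matching the slide's char_0, char_1, ...
    """
    tu_id = tu_label.replace(".", "_")
    return [
        ("datum_{}_char_{}".format(tu_id, i), "char_{}".format(i), state)
        for i, state in enumerate(sequence)
    ]


# Row taken from the NEXUS MATRIX on slide 13.
records = datum_records("Caenorhabditis_elegans_CAA92686.1", "tgttttacat")
print(records[6])
# ('datum_Caenorhabditis_elegans_CAA92686_1_char_6', 'char_6', 'a')
```

The seventh cell of this row is `a`, which agrees with the slide's Datum example pointing at `#value_a` for `char_6`.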

  16. Simple Reasoning Tasks • Determine what TUs contain a gap in their tables: [Fact++] (has_Datum some (has_State value gap)) and TU • Determine the ancestors of a TU in the tree: has_Descendant value node_Drosophila_melanogaster_AAF55115_1
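For comparison, the two queries on this slide can be answered procedurally on the example data. The sketch below is plain Python rather than a description-logic reasoner such as Fact++; it only shows what answers the class expressions should yield for the NEXUS file on slide 13. The `MATRIX` and `PARENT` tables are transcribed by hand from that file (`PARENT` covers only the part of the tree relevant to the query), and the function names are hypothetical.

```python
# Rows of the NEXUS MATRIX, in file order.
MATRIX = {
    "Arabidopsis_thaliana_CAB79970.1": "gtgtggttgc",
    "Schizosaccharomyces_pombe_CAB16373.1": "tgtatatgct",
    "Drosophila_melanogaster_AAF55117.1": "tgtacttcgt",
    "Arabidopsis_thaliana_AAD31363.1": "gt---gtggc",
    "Oryza_sativa_BAB21282.1": "ct--------",
    "Saccharomyces_cerevisiae_AAB68881.1": "tgtacaagct",
    "Mus_musculus_BAB61955.1": "tctgctacac",
    "Dictyostelium_discoideum_AAO51107.1": "cacttactcc",
    "Caenorhabditis_elegans_CAA92686.1": "tgttttacat",
    "Drosophila_melanogaster_AAF55115.1": "ac------g-",
}

# Child -> parent links read off the TREES block (partial).
PARENT = {
    "Drosophila_melanogaster_AAF55115.1": "inode20",
    "Drosophila_melanogaster_AAF55117.1": "inode20",
    "inode20": "inode19",
    "inode19": "inode18",
    "inode18": "inode17",
    "inode17": "inode16",
    "inode16": "root",
}


def tus_with_gap(matrix, gap="-"):
    """TUs having at least one datum whose state is the gap symbol."""
    return [tu for tu, row in matrix.items() if gap in row]


def ancestors(node, parent):
    """All nodes that have `node` as a descendant, nearest first."""
    found = []
    while node in parent:
        node = parent[node]
        found.append(node)
    return found


print(tus_with_gap(MATRIX))
print(ancestors("Drosophila_melanogaster_AAF55115.1", PARENT))
```

Walking the parent links makes the transitivity of has_Descendant explicit; a reasoner would derive the same set from the property's declared characteristics rather than from a loop.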

  17. Simple Reasoning Tasks • Extract the row of a specific TU:

      SELECT ?z, ?y
      WHERE (<base:Arabidopsis_thaliana_AAD31363_1>, cdao:has_Datum, ?x)
            (?x, cdao:has_State, ?y)
            (?x, cdao:belongs_to_Character, ?z)
      USING base FOR <file:/C:/Users/epontell/Documents/Research/Proposals/NEXUS/Research/Perl/inst_matrix.owl#>,
            cdao FOR <http://www.cs.nmsu.edu/~epontell/CURRENT_matrix.owl#>
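The query above is a three-pattern join: find every datum `?x` of the TU, then look up that datum's state `?y` and character `?z`. The following sketch reproduces the join over an in-memory list of triples in plain Python, without any RDF library. The triples are built from the first three cells of the Arabidopsis_thaliana_AAD31363.1 row (`gt-`); the datum identifiers and the use of raw symbols as state values are simplifying assumptions, and `row_of` is a hypothetical name.

```python
TU = "Arabidopsis_thaliana_AAD31363_1"

# Build (subject, predicate, object) triples for the first three matrix cells.
TRIPLES = []
for i, state in enumerate("gt-"):
    datum = "datum_{}_char_{}".format(TU, i)
    TRIPLES += [
        (TU, "has_Datum", datum),
        (datum, "has_State", state),
        (datum, "belongs_to_Character", "char_{}".format(i)),
    ]

# A triple for another TU, which the query must not return.
TRIPLES.append(("Oryza_sativa_BAB21282_1", "has_Datum", "datum_other"))


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def row_of(tu, triples):
    """Join the three patterns of the query and return (character, state) pairs."""
    row = []
    for x in objects(triples, tu, "has_Datum"):
        for y in objects(triples, x, "has_State"):
            for z in objects(triples, x, "belongs_to_Character"):
                row.append((z, y))
    return row


print(row_of(TU, TRIPLES))
# [('char_0', 'g'), ('char_1', 't'), ('char_2', '-')]
```

The nested loops scan the list once per pattern, which is fine for a ten-column example; a real triple store would index subjects and predicates instead.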

  18. Future Work • To facilitate evaluation • Create an OWL 1.0 edition of the ontology (and corresponding NEXUS translator) • Java-level reasoning • Aggregation • Etc. • Large scale NEXUS validation • NeXML Interface • OBO distribution
