Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
Yu Lin, Norihiro Sakamoto
Department of Sociomedical Informatics, Graduate School of Medicine, Kobe University

Agenda
• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion
InterOntology08

Search “Genetic Susceptibility” in UMLS

Scope of “GSF to Diabetes Mellitus”
Those genetic characteristics, and interactions between genetic and environmental factors, that increase the probability of developing diabetes mellitus (DM). If they decrease the probability, the term is “resistance”.
• polymorphism
• linked loci
• SNP
• haplotype
• genotype

Mendelian Disease vs. Complex Disease
Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease. Nature. 2005 Jun 2;435(7042):584-9. Review.

How to confirm the GSF
• By combining family-based linkage studies with population-based association studies
• By combining a genetic (gene-by-gene, function-candidate) association approach with a genome-wide association approach
• By combining statistical studies with biological function studies

Factors Affecting Statistical Power of Confirming GSF
• Number of disease variants
• Allele frequencies among population
• Effect size on disease phenotype
• Odds Ratio (OR)
• Population structure and geography
• Selection bias
• Genotype and phenotype misclassification errors
Ref: [Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.

Can we settle this? No Criteria Established
• There are no established criteria for confirming GSF (Genetic Susceptibility Factors)
  • OR 1.5-2.0?
  • sample size
  • population

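Because the odds ratio is the quantity the proposed thresholds refer to, a small worked calculation may help. The following is a minimal sketch, assuming a simple 2x2 case-control table and Woolf's log-based 95% confidence interval; the function name and the counts are hypothetical and do not come from any study cited in these slides.

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed,
                       control_exposed, control_unexposed, z=1.96):
    """Return (OR, lower, upper) for a 2x2 case-control table.

    OR is the cross-product ratio; the interval is Woolf's
    approximation on the log scale (z=1.96 gives roughly 95%).
    """
    odds_ratio = (case_exposed * control_unexposed) / (
        case_unexposed * control_exposed
    )
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / case_unexposed
        + 1 / control_exposed + 1 / control_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration.
or_value, ci_low, ci_high = odds_ratio_with_ci(120, 80, 100, 100)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

The same OR can come with a wide or a narrow interval depending on sample size, which is one reason a single OR cut-off is hard to agree on.
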
A Knowledge Base is Needed
• The primary idea is to catalog all GSF to Diabetes Mellitus (DM)
• The reality of research on GSF to DM:
  • Different levels of genetic object
  • Different types of study design
  • Inconsistent results
  • Complex phenotypes of DM
• Such heterogeneous datasets demand a knowledge base on this topic

Ontology in General
• Originally from philosophy
• An ontology is a “specification of a shared conceptualization” [Gruber T.]
• Ontology as an approach to “annotation of multiple bodies of data” [Smith B. et al.]
• Widely used in computer science and information science as a form of knowledge representation:
  • artificial intelligence
  • the Semantic Web
  • software engineering
  • biomedical informatics (the Gene Ontology is a successful example)
  • library science
  • information architecture
Ref: http://en.wikipedia.org/wiki/Ontology_%28computer_science%29

Ontology is a Good Tool
• In our case, ontology can help with:
  • Knowledge representation
  • Database design
  • Content-oriented analysis
  • Information retrieval and extraction
  • Information integration
• By setting rules, can we establish criteria for demonstrating either genetic susceptibility or causality in complex disease?

Agenda
• What are the Genetic Susceptibility Factors (GSF)
• How do we confirm genetic susceptibility
• Why do we need an ontology
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
  • Methodology
  • Testing
• Discussion

The Methodology of OGSF-DM
• Specification: specify the domain and scope
• Conceptualization: build the conceptual model
• Integration: reuse and import other ontologies
• Implementation and evaluation: Protégé 3.3.1, OWL, SWRL rules

Step 1. Specification
• Domain: represent the knowledge of GSF to DM and related phenotypes
• Explore relevant literature resources:
  • PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
  • Books:
    • Joslin’s Diabetes Mellitus
    • Human Molecular Genetics 3
• The most fundamental terms:
  i) Human disease: diabetes mellitus and related disorders;
  ii) Phenotypes and observed quantity parameters;
  iii) Genetic concepts;
  iv) Geographical regions;
  v) Disease gene study of the original paper.

Step 2. Conceptualization
• The core conception was generated by analyzing the titles in the corpus
• The conception takes the form of an n-ary relationship

The top level of OGSF-DM
• Adopted terms from BFO (Basic Formal Ontology): Continuant, Occurrent, Independent_Continuant, Dependent_Continuant, Quality

The position of core concepts

Class hierarchy
• Constraints of the class Observed_Relationship

The term ‘Allele’ is polysemous
• Genetics definition: an allele is either of a pair (or series) of alternative forms of a gene that can occupy the same locus on a particular chromosome and that control the same character of the phenotype. (http://www.thefreedictionary.com/allele)
• “Allele” appears in different resources:

Allele CLASS in OGSF-DM
• An abstraction
• Currently, it satisfies the data model
• Needs to be refined in the future

Gene concept has evolved
• 1860s-1900s: Gene as a discrete unit of heredity
• 1910s: Gene as a distinct locus
• 1940s: Gene as a blueprint for a protein
• 1950s: Gene as a physical molecule
• 1960s: Gene as transcribed code
• 1970s-1980s: Gene as ORF sequence pattern
• 1990s-2000s: Gene as annotated genomic entity
• 2007-: Gene as …
Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.

Some definitions of ‘gene’
• Human Genome Nomenclature Organization: “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Wain et al. 2002)
• Rat Genome Database: “the DNA sequence necessary and sufficient to express the complete complement of functional products derived from a unit of transcription” (2003)
• Sequence Ontology Consortium: “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Pearson 2006)
• ENCODE Project Consortium: “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” (Gerstein et al. 2007)
• MeSH: genes are “Specific sequences of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represent functional units of HEREDITY. Most eukaryotic genes contain a set of coding regions (EXONS) that are spliced together in the transcript, after removal of intervening sequence (INTRONS) and are therefore labeled split genes.”

Gene CLASS in OGSF-DM
• A placeholder
• An instance of Gene is the name of the gene as it appears in the research paper

Step 3. Integration
• Importing two ontologies:
  • Ontology of glucose metabolism disorders
    • A slim OBO file was extracted from the Human Disease Ontology
    • The OBO file was converted to an OWL file
    • The class hierarchy was restructured, and new terms from “Joslin’s Diabetes Mellitus” were added
  • Ontology of geographical regions
    • Generated by hand, adopting terms from MeSH 2008 “Geographic Locations [Z01]”

Step 4. Implementation and Evaluation
• Protégé 3.3.1 + OWL
• SWRL rule example (rule hasPopulation-1):
  isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z) → hasPopulation(?x, ?z)
  The rule infers the population (z) of the Observed_Relationship (x); y is a Disease_Gene_Study.

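To make the effect of the hasPopulation-1 rule concrete, here is a minimal sketch in Python that applies the same join to a set of (subject, property, object) triples. It only illustrates the rule's logic and is not how Protégé or a SWRL engine evaluates rules; the function name `infer_has_population` and the triple representation are assumptions made for this example, while the property and individual names are taken from the slides.

```python
def infer_has_population(triples):
    """Apply: isObservedIn(x, y) AND hasStudyPopulation(y, z) -> hasPopulation(x, z)."""
    # Index each study (y) to the populations (z) it investigated.
    study_populations = {}
    for subject, prop, obj in triples:
        if prop == "hasStudyPopulation":
            study_populations.setdefault(subject, set()).add(obj)

    # Join every observed relationship (x) with its study's populations.
    inferred = set()
    for subject, prop, obj in triples:
        if prop == "isObservedIn":
            for population in study_populations.get(obj, ()):
                inferred.add((subject, "hasPopulation", population))
    return inferred


facts = {
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation",
     "an_ashkenazi_jewish_population"),
}
new_facts = infer_has_population(facts)
print(new_facts)
```

The relationship individual never states its population directly; the population is reached through the study, which is why a rule is needed.
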
The example article
• Full text URL: http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134

Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
   ⋂ ∀ hasSupportingEvidence ( ∋ {odds_ratio_OR_1.49} )
   ⋂ ∃ isObservedIn ( ∋ {Disease_Gene_Study_15047632} )
   ⋂ ∃ isObservedRelationshipOf ( ∋ {a_3_intronic_SNP_rs3818247} )
   ⋂ ∃ isRelationshipWith ( ∋ {Type_2_Diabetes} )
This means that the 3’ intronic SNP rs3818247 is associated with Type 2 Diabetes, with supporting evidence of OR 1.49. The relationship is an association, but the study does not state whether the SNP is a susceptibility or a resistance factor.

Asserting individuals 2), 3), 4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
   ⋂ ∀ hasOR ( ∋ {1.49} )
   ⋂ ∀ hasCI95 ( ∋ {1.15-1.90} )
   ⋂ ∃ hasP ( ∋ {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
   ⋂ ∃ hasClassifiedGroup ( ∋ {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ( ∋ {342 int} )
   ⋂ ∀ isPartOf ( ∋ {an_ashkenazi_jewish_population} )
4) Case_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ( ∋ {275 int} )
   ⋂ ∀ isPartOf ( ∋ {an_ashkenazi_jewish_population} )
Together, 2), 3) and 4) mean that the study was a case-control study (case size = 275, control size = 342) in an Ashkenazi Jewish population. Result: odds ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252, uncorrected P = 0.0028).

Asserting individuals 5), 6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
   ⋂ ∀ hasPubMedID ( ∋ {PMID_15047632} )
   ⋂ ∃ hasStudyPopulation ( ∋ {an_ashkenazi_jewish_population} )
   ⋂ ∀ hasURI ( ∋ {http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134} )
6) an_ashkenazi_jewish_population ⊆ Population_Group
   ⋂ ∃ hasPopulationCharacteristic ( ∋ {Jews} )
   ⋂ ∃ hasGeographicalSite ( ∋ {Israel} ⋂ {U.S.} )
5) and 6) mean:
• An Ashkenazi Jewish population was investigated in this study;
• The population belongs to the Jewish ethnic group and is located in Israel and the U.S.;
• The PubMed ID and URL of this paper were collected.

The core conception
• Putting 1)-6) together, the core conception of this one relationship is built: relationship {associated} between {a_3_intronic_SNP_rs3818247} and {Type_2_Diabetes} observed in {an_ashkenazi_jewish_population} from study {PMID_15047632}.

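The core conception above is an n-ary relationship, which can be pictured as a single record whose fields are the participating individuals. The following is a minimal sketch in Python, assuming a flat record is an acceptable simplification of the OWL model; the class name `ObservedRelationship`, its field names, and the method `core_conception` are hypothetical, and only the individual names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedRelationship:
    """One n-ary relationship: factor, disease, population, and source study."""
    relationship: str    # kind of relationship, e.g. "associated"
    genetic_factor: str  # individual such as a SNP
    disease: str         # disease individual
    population: str      # population in which it was observed
    study: str           # identifier of the source study

    def core_conception(self) -> str:
        # Render the slots in the same order as the slide's sentence.
        return (
            f"relationship {{{self.relationship}}} between {{{self.genetic_factor}}} "
            f"and {{{self.disease}}} observed in {{{self.population}}} "
            f"from study {{{self.study}}}"
        )


record = ObservedRelationship(
    relationship="associated",
    genetic_factor="a_3_intronic_SNP_rs3818247",
    disease="Type_2_Diabetes",
    population="an_ashkenazi_jewish_population",
    study="PMID_15047632",
)
sentence = record.core_conception()
print(sentence)
```

Keeping all five slots on one record is what distinguishes this from a set of binary links: the association is only meaningful together with the population and the study in which it was observed.
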
Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
   ⋂ ∃ hasAlleleComponent ( ∋ {DNA_Level_Allele_T} ⋂ {DNA_Level_Allele_G} )
   ⋂ ∃ hasGenomeSite ( ∋ {flanking_3_intronic} )
   ⋂ ∃ isGeneticVariantOf ( ∋ {hepatocyte_nuclear_factor-4_alpha} )
   ⋂ ∃ hasVariantDatabase ( ∋ {HGVBase_SNP002310533} ⋂ {dbSNP_rs3818247} )
This means that the 3’ intronic SNP rs3818247 is an htSNP (haplotype-tagging SNP) of hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic sequence of the gene. The alleles of this SNP are T/G at the DNA level.
Reference database entries:
1) HGVBase: “SNP002310533”
2) dbSNP: “rs3818247”

Discussion
• A hybrid of the middle-out and top-down approaches was used to build our ontology.
• BFO is important for harmonizing the domain ontologies in our case.
• The ontology can be applied to other complex diseases too.
• We anticipate further applications of this ontology:
  • Information retrieval
  • Knowledge base development
  • Establishing logic rules
  • Mapping or linking to other ontologies, such as GO and Mammalian Phenotype