Extraction and Evaluation of Transcription Factor Gene-Disease Association • Warren Cheung • Wyeth Wasserman • Francis Ouellette
Purpose • Quantitatively Integrate Literature Evidence to Predict Gene-Disease Associations • Transcription Factor Genes • Brain Diseases
Thesis Outline • Gene-Disease Associations • Properties of Gene-Disease Associations • Clusters of Genes Related to Disease
Existing Methods • Machine Learning on Sequence Data: DGP (Properties of Disease Genes) • Annotations: G2D (MeSH and GO links) • Text Mining: CAESAR (Key terms from “expert” text) • Integrating Multiple Methods: Endeavour
Goals • Gene-Disease Associations: Mechanisms and Processes involved • Integrate diverse data sources in a quantitative manner • Verifiable supporting evidence: Transparent view of supporting data; User verification and further analysis • Validate results
Core Entities and Example Data Sources • Genes: Entrez Gene • Evidence: PubMed • Disease: MeSH terms
Example Relationships • Gene-Evidence: GeneRIFs • Evidence-Disease: MeSH annotation • Evidence-Evidence: Related Articles
Other Data Sources • Other Annotations: Protein-Protein Interaction, Pathways, Protein Domains • Homology: Annotation in other organisms, Mouse orthologue
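Example [Diagram: three example paths linking a gene to a disease] • Gene → (GeneRIF) → PubMed Article → (MeSH) → Disease • Gene → (GeneRIF) → PubMed Article → (Related Article) → PubMed Article → (MeSH) → Disease • Gene → (Interaction) → Gene → (GeneRIF) → PubMed Article → (MeSH) → Disease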
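The three link paths in this example can be sketched as set operations over lookup tables. This is a minimal illustration only: the table names (`GENERIF`, `RELATED`, `INTERACTS`, `MESH`), the helper functions, and every identifier in the toy data are hypothetical and do not come from Entrez Gene, PubMed, or MeSH.

```python
from typing import Dict, Set

# Hypothetical toy data; all identifiers are invented for illustration.
GENERIF: Dict[str, Set[str]] = {"GENE_A": {"pmid1"}, "GENE_B": {"pmid3"}}   # gene -> articles
RELATED: Dict[str, Set[str]] = {"pmid1": {"pmid2"}}                         # article -> related articles
INTERACTS: Dict[str, Set[str]] = {"GENE_A": {"GENE_B"}}                     # gene -> interacting genes
MESH: Dict[str, Set[str]] = {                                               # article -> disease terms
    "pmid1": {"Disease X"},
    "pmid2": {"Disease Y"},
    "pmid3": {"Disease Z"},
}


def direct_articles(gene: str) -> Set[str]:
    """Articles linked to the gene through a GeneRIF."""
    return set(GENERIF.get(gene, set()))


def with_related(articles: Set[str]) -> Set[str]:
    """The given articles plus their Related Articles (one hop)."""
    expanded = set(articles)
    for article in articles:
        expanded |= RELATED.get(article, set())
    return expanded


def via_interactions(gene: str) -> Set[str]:
    """GeneRIF articles of the genes that interact with the given gene."""
    found: Set[str] = set()
    for partner in INTERACTS.get(gene, set()):
        found |= direct_articles(partner)
    return found


def diseases(articles: Set[str]) -> Set[str]:
    """Disease terms annotated on any of the given articles."""
    terms: Set[str] = set()
    for article in articles:
        terms |= MESH.get(article, set())
    return terms
```

Each path reaches a different set of disease terms for the same gene, which is why the choice of "selected" articles matters for the scoring that follows.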
Scoring • Overrepresentation of terms: Hypergeometric distribution • “Selected” Articles: Gene+GeneRIF; Gene+GeneRIF+Related Article; Gene+Interaction+Related Article
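As a rough sketch of the overrepresentation test, the upper tail of the hypergeometric distribution gives the probability of seeing at least `k` articles annotated with a disease term among the `n` selected articles, when `K` of all `N` articles carry that term. The function name and parameter names are assumptions made for illustration; the slides do not specify an implementation.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N total, K annotated, n selected).

    Assumes 0 <= k and n, K <= N. Uses exact integer arithmetic, which is
    simple but slow for corpus-sized N; a sketch, not a tuned implementation.
    """
    upper = min(K, n)  # cannot observe more hits than annotated or selected articles
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 articles in total, 4 annotated with the disease term,
# 3 articles selected for the gene, 2 of which carry the term.
p_value = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
```

A small p-value means the disease term appears among the gene's selected articles more often than random selection would explain.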
Integrating Scores • Arbitrary Scoring Methods: Average, Product • Combining P-values: Fisher’s Meta-analysis, Z-transform • Weighting: Confidence
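The two p-value combination approaches named here can be sketched with the standard library alone. The sketch assumes independent p-values strictly between 0 and 1, and it assumes the weights stand for per-source confidence; the function names are hypothetical and the slides do not say how weights were chosen.

```python
from math import exp, log, sqrt
from statistics import NormalDist
from typing import Optional, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k degrees of freedom.

    For even degrees of freedom the survival function has a closed form:
    exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


def z_transform_combine(pvalues: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Z-transform (Stouffer): convert each p to a z-score, take a weighted sum."""
    normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    # -inv_cdf(p) is the z-score whose upper-tail probability is p.
    z = sum(w * -normal.inv_cdf(p) for w, p in zip(weights, pvalues))
    z /= sqrt(sum(w * w for w in weights))
    return normal.cdf(-z)
```

With a single p-value both functions return that p-value unchanged, which is a quick sanity check before combining scores from several evidence sources.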
Multiple Testing Correction • Testing gene against all possible diseases • Controlling Type I Error: Bonferroni correction
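A minimal sketch of the Bonferroni correction for one gene tested against many diseases: each raw p-value is multiplied by the number of diseases tested and capped at 1. The function name and the disease labels are hypothetical.

```python
from typing import Dict


def bonferroni(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Adjust each p-value by the number of tests performed (one per disease)."""
    m = len(pvalues)
    return {disease: min(1.0, p * m) for disease, p in pvalues.items()}


# Toy example: one gene tested against three diseases.
adjusted = bonferroni({"Disease X": 0.01, "Disease Y": 0.5, "Disease Z": 0.04})
```

Bonferroni is conservative: with thousands of disease terms, only very small raw p-values remain significant after adjustment.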
Validation • OMIM: Known disease-gene associations, with literature references • Predictive Performance: Results when using databases saved on date X, compared with new gene-disease associations discovered after date X
Sensitivity • Ratio: Number of True Positives Identified / Number of Actual True Positives • Only True Positives are known for certain
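The sensitivity ratio can be computed directly from two sets of gene-disease pairs. This is a small sketch under the assumption that predictions and known associations are both represented as `(gene, disease)` tuples; the identifiers are invented.

```python
from typing import Set, Tuple

Association = Tuple[str, str]  # (gene, disease)


def sensitivity(predicted: Set[Association], known: Set[Association]) -> float:
    """Fraction of known true associations that were also predicted."""
    if not known:
        raise ValueError("sensitivity is undefined without known associations")
    return len(predicted & known) / len(known)


# Toy example: 2 of the 4 known associations are recovered.
predicted = {("G1", "D1"), ("G2", "D2"), ("G3", "D9")}
known = {("G1", "D1"), ("G2", "D2"), ("G4", "D4"), ("G5", "D5")}
score = sensitivity(predicted, known)
```

The unmatched prediction `("G3", "D9")` is not counted as an error here, because an association missing from the known set may simply be undiscovered.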
Beyond Gene-Disease Associations • Properties involved in Gene-Disease Associations: Pathways, Mechanisms • Cluster genes based on disease association
Conclusion • Extract Gene-Disease Associations • Mechanisms • Processes • Quantitative Analysis • Better Understanding of how Genes affect the human condition