Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes. Dennis P. Wall, PhD Center for Biomedical Informatics dpwall@hms.harvard.edu http://wall.hms.harvard.edu
Outline • Rationale & Biological Significance (30 mins) • Present status (5 mins) • Project Plan (25 mins)
Introduction • Polygenic & Multigenic • Many genes have been linked to autism • Few genes have been replicated across studies • Difficult for a single researcher to grasp the complexity of the autism gene landscape
Statistics: U.S. number of cases, 1992-2006 (http://www.fightingautism.org)
Behavioral overlap with other disorders Schizophrenia Angelman Autism Epilepsy Fragile X Seizure Disorder Rett Syndrome Mental Retardation Tuberous Sclerosis Others??
Approach • Build the network of all genes implicated in Autism to date • Conduct large comparative analysis of Autism and other neurological disorders at the level of genes, biological processes, and networks • Leverage existing research on Autism-related disorders to find new genetic leads.
Building Gene Lists for All Neurological Disorders (433) [Figure: a disease source (NINDS) and disease gene databases (OMIM, GeneCards) serve as gene-disease sources for per-disorder gene lists: Ataxia, Epilepsy, Asperger, Fragile X, Tourette’s, OCD, …, Autism]
Autism Cluster [Figure: disorders encoded as binary gene vectors (1100100101…, 1110101011…, 1001010100…, 1001011101…, 1101011101…) and grouped to delineate the Autism Cluster]
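The binary encoding on this slide can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the slides do not say which similarity measure or clustering algorithm was applied, so Jaccard similarity is an assumption here, and the gene names (`G1`, `G2`, …) and list contents are hypothetical toy data.

```python
# Hypothetical toy gene lists; real lists would come from the gene-disease sources.
gene_lists = {
    "Autism": {"G1", "G2", "G3", "G4"},
    "Rett Syndrome": {"G2", "G3", "G5"},
    "Fragile X": {"G1", "G3", "G6"},
    "Ataxia": {"G7", "G8"},
}

def binary_profile(genes, universe):
    """Encode a gene list as a 0/1 vector over a fixed gene ordering."""
    return [1 if g in genes else 0 for g in universe]

def jaccard(a, b):
    """Jaccard similarity of two gene sets (assumed measure, not from the slides)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_to(target, lists):
    """Rank the other disorders by gene-list similarity to the target disorder."""
    others = [d for d in lists if d != target]
    return sorted(others, key=lambda d: jaccard(lists[target], lists[d]), reverse=True)

universe = sorted(set().union(*gene_lists.values()))
profiles = {d: binary_profile(g, universe) for d, g in gene_lists.items()}
ranking = nearest_to("Autism", gene_lists)
```

In this toy data, Ataxia shares no genes with Autism and ranks last, so a similarity cutoff would leave it outside the cluster.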
Network Construction • Data derived from STRING (http://string.embl.de/) • Integration of protein-protein interaction (interactome), co-expression (transcriptome), orthology (orthologome), text (bibliome), and other lines of evidence • Focus on creating networks of possible interactions within a normal cell using classification methods (random forests)
[Figure: Random Forest Decision on whether genes A and B interact. Evidence types: Correlated Expression, Sequence coEvolution, P-P Interaction, and Text (aka Bibliome). Each gene pair is scored across data sources D1-D5, e.g. {1,0,2,1,0}, and decision trees branching on D1-D5 vote Yes or No. Text evidence example: “FXYD1 is identified as a MeCP2 target gene whose de-repression may directly contribute to Rett syndrome neuronal pathogenesis.”] http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030043
Networks for all Autism Cluster (AC) disorders (N = nodes, E = edges) • Fragile X (97N/100E) • Hypoxia (586N/4359E) • Microcephaly (135N/166E) • Rett (48N/74E) • Tuberous Sclerosis (110N/204E) • Angelman (51N/57E) • Inf. Hypotonia (29N/16E) • Mental Retardation (573N/1035E) • Hypotonia (154N/208E) • Autism (145N/164E) • Ataxia (428N/1489E) • Spasticity (62N/40E) • Seizure Disorder (35N/13E) • Asperger (15N/9E) • autworks.hms.harvard.edu
Multi-disorder component of autism (MDAG) • 66 out of 127 genes are involved in at least one other member of the autism cluster • Highly connected component of the autism network
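As a rough sketch of how a multi-disorder component could be extracted, the snippet below keeps the autism genes that also appear in the gene list of at least one other cluster disorder. The gene identifiers and list contents are hypothetical, and the function name `multi_disorder_genes` is introduced only for illustration.

```python
def multi_disorder_genes(autism_genes, cluster_lists):
    """Map each autism gene to the number of other cluster disorders listing it,
    keeping only genes found in at least one of them."""
    counts = {
        gene: sum(gene in genes for genes in cluster_lists.values())
        for gene in autism_genes
    }
    return {gene: n for gene, n in counts.items() if n >= 1}

# Hypothetical toy data, not the talk's gene lists.
autism_genes = {"G1", "G2", "G3", "G4"}
cluster_lists = {
    "Rett Syndrome": {"G2", "G3", "G5"},
    "Fragile X": {"G1", "G3", "G6"},
}
mdag = multi_disorder_genes(autism_genes, cluster_lists)
```

Here three of the four toy autism genes fall in the multi-disorder set, and `G3` is shared with two disorders.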
Significantly enriched MDAG processes • Cell Proliferation (P = 2.7E-02) • CNS Development (P = 3.29E-11) • Ion Transport (P = 7.68E-10) • Synaptic Transmission (P = 2.45E-04) • Method: Fisher’s exact test with Bonferroni adjustment • 14648 biological processes from Gene Ontology tested
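The enrichment test named on this slide can be sketched as a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) followed by a Bonferroni adjustment. The slides do not state whether the test was one-sided or what background gene set was used, so both are assumptions here. The parameter names are illustrative, and the only number taken from the slide is the 14648 processes tested.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: genes in the background, K: background genes annotated to the process,
    n: genes in the set of interest, k: set genes annotated to the process.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)

N_PROCESSES = 14648  # number of Gene Ontology processes tested, from the slide

# Tiny hand-checkable case: all 3 set genes are the 3 annotated genes out of 10.
p_raw = fisher_enrichment_p(k=3, n=3, K=3, N=10)
p_adj = bonferroni(p_raw, N_PROCESSES)
```

With so many processes tested, a raw p-value must be below roughly 3.4E-06 to stay under 0.05 after adjustment.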
Process-Driven Predictions [Figure: Autism Cluster Disorders (Fragile X, Tuberous Sclerosis, Seizure Disorder, Mental Retardation, …) linked through Biological Processes (CNS development, Synaptic Transmission, Ion Transport, Cell Proliferation) to Putative New Genes] • 64 new genes, all of which occur in 2 or more of the Autism Cluster Disorders
Experimental Validation • GEO6575 (from UC Davis M.I.N.D. institute) • White blood cell Affymetrix U133plus2.0 • 17 samples of autistic children without regression • 18 children with regression • 9 children with mental retardation or developmental delay • 12 typically developing children from the general population
Autism without regression (17) Autism with regression (18)
Data-driven approach to FDR detection can be ineffective • Standard data-driven application of false discovery rate (FDR) control yields few genes below an FDR threshold of 0.05 (with these data, only 2 genes survive) • This is common when the signal is weak and the background noise is large (e.g. microarray experiments)
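To make the FDR adjustment concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. The slides do not name the FDR method used, so Benjamini-Hochberg is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative p-values only (not from the expression data set).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

The adjustment depends on the number of hypotheses `m`: testing every gene on the array penalizes each p-value far more than testing a short list of prior-knowledge candidates, which is the contrast the talk draws.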
Results of process-driven search • 43 Process-derived gene predictions had FDR-adjusted p values <0.05 • Highly significant rate of validation -- 65% of predictions confirmed by expression data
Results of network-driven search • 267 predicted genes occurred in 1 autism cluster disorder • 58 occurred in 2 • 17 in 3 • 3 in 4 sibling disorders • A total of 345 new predictions
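The breakdown on this slide is a histogram of predicted genes by the number of cluster disorders in which they occur. A small sketch of that tally follows, with a check that the reported counts sum to the stated total. The function name and the toy mapping are hypothetical; only the four counts and the total come from the slide.

```python
from collections import Counter

def disorder_count_histogram(gene_to_disorders):
    """Count how many genes occur in exactly 1, 2, 3, ... disorders."""
    return Counter(len(disorders) for disorders in gene_to_disorders.values())

# Counts reported on the slide: number of disorders -> number of predicted genes.
reported = {1: 267, 2: 58, 3: 17, 4: 3}
total_predictions = sum(reported.values())

# Hypothetical toy mapping showing the input shape the function expects.
toy = {"G1": {"Rett"}, "G2": {"Rett", "Ataxia"}, "G3": {"Ataxia"}}
toy_hist = disorder_count_histogram(toy)
```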
Results of network-driven search • 301 had FDR-adjusted p values <0.05 • 90% (!) of predictions verified by expression data
Prior knowledge focuses whole-genomic search • 43 process-derived gene predictions had FDR-adjusted p values <0.05 (65% validation rate) • 301 network-derived gene predictions had FDR-adjusted p values <0.05 (90% validation rate) • The rate of validation in both cases is significantly non-random
Top 20 genes occurring in 3 or more Autism Sibling Disorders For many of these candidates, their roles in neurological impairment have been studied in autism cluster disorders, but not in autism.
Molecular Triangulation [Figure: candidate genes (AR, SLC16A2, L1CAM, OPHN1, FXN, MYO5A, SLC6A8, FLNA, PAFAH1B1) shared among Mental Retardation, Fragile X, Hypotonia, Ataxia, Hypoxia, Microcephaly, Rett Syndrome, Spasticity, and Tuberous Sclerosis] • GO biological process enrichment: cytoskeleton organization, cell communication, cell organization/biogenesis, cell motility
Conclusions • Previous research has implicated between 100 and 1500 genes as contributors to the molecular physiology of Autism. • Our knowledge-driven approach provides a logical means to filter the genome-wide search.
Conclusions • Global “ask” swamped by noisy signal • Informed, knowledge-driven “ask” results in biologically significant gene predictions • Comparative analysis of Autism with related neurological disorders provides a focused search for novel gene candidates
Autworks • Autworks is a web-driven navigation system that allows any researcher to view and search through the network of genes implicated in autism and related neurological disorders • Built to aid and abet the role of serendipity and inspiration for researchers working on autism and other complex neuro diseases. • http://autworks.hms.harvard.edu
The Plan • Bring our analytical strategies and Autworks to the cloud • Beef up underbelly using AWS storage and the Amazon “Turkforce” • Scale up comparative network analysis • Enlarge validation database, verify/re-verify computational predictions, robustify the candidates
Aim 1: Build the neurological disease “gene core” of the Autworks relational database * Can be queried with a disease or gene term
Aim 1: Steps (1) Extract the entire set of neurological disorders listed by NINDS (currently 433) to ensure that we can find any and all commonalities to Autism. (2) Mine all databases in the table above that can be searched using a disease term as the query, specifically Online Mendelian Inheritance in Man (OMIM), GeneCards, Chromosomal Variation in Man, the Human Gene Mutation Database (HGMD), and SNPedia. (3) Combine and import the features from each of the online resources into a relational database that will become the backend of Autworks, being careful to remove any redundancies. (4) Cross-reference resources to comprehensively populate the data model.
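Step (3) could look roughly like the sketch below, which loads gene-disease records from several sources into a relational table and lets a uniqueness constraint drop redundant rows. This is an illustration under stated assumptions: the table name, column names, and normalization rule (upper-case gene symbols, lower-case disease names) are hypothetical, the records are toy data, and SQLite stands in for whatever database backs Autworks.

```python
import sqlite3

def build_gene_core(records):
    """Load (gene, disease, source) records, ignoring exact duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE gene_disease (
               gene TEXT NOT NULL,
               disease TEXT NOT NULL,
               source TEXT NOT NULL,
               UNIQUE (gene, disease, source)
           )"""
    )
    # Assumed normalization so trivially different spellings collapse.
    rows = [(g.strip().upper(), d.strip().lower(), s.strip()) for g, d, s in records]
    conn.executemany("INSERT OR IGNORE INTO gene_disease VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def genes_for_disease(conn, disease):
    """Return (gene, number of supporting sources) for one disease term."""
    cur = conn.execute(
        "SELECT gene, COUNT(DISTINCT source) FROM gene_disease "
        "WHERE disease = ? GROUP BY gene ORDER BY gene",
        (disease.strip().lower(),),
    )
    return cur.fetchall()

# Toy records; the source attributions are illustrative only.
records = [
    ("SHANK3", "Autism", "OMIM"),
    ("shank3", "autism", "OMIM"),       # redundant after normalization
    ("SHANK3", "Autism", "GeneCards"),
    ("MECP2", "Rett Syndrome", "OMIM"),
]
conn = build_gene_core(records)
autism_genes = genes_for_disease(conn, "Autism")
```

Counting distinct sources per gene gives a simple cross-reference signal for step (4): a gene supported by several resources is easier to trust than one seen only once.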
Gene-disease data model (“Gene Core”) • This data model will share much in common with the Variome project’s database
[Figure: Medline abstracts are filtered by MeSH Major Topics (MeSH term filtered) and by GeneTagger (candidate gene filtered); an annotator checks accuracy through the BioNotate system; results are Gene-Gene and Gene-Disease corpora (e.g. ABI1-Shank3, Shank3-Autism)] • PMID: 17304222 “We identified an important component for controlled actin assembly, abelson interacting protein-1 (Abi-1), as a binding partner for the postsynaptic density (PSD) protein ProSAP2/Shank3. During early neuronal development, Abi-1 is localized in neurites and growth cones; at later stages, the protein is enriched in dendritic spines and PSDs…” • PMID: 17173049 “SHANK3 (also known as ProSAP2) regulates the structural organization of dendritic spines and is a binding partner of neuroligins; genes encoding neuroligins are mutated in autism and Asperger syndrome. Here, we report that a mutation of a single copy of SHANK3 on chromosome 22q13 can result in language and/or social communication disorders...” • Can we Turkify this process???
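A very rough sketch of the candidate-extraction step is shown below: it finds gene and disease mentions in an abstract by dictionary lookup and emits co-occurring pairs for an annotator to confirm. This is a naive stand-in, not GeneTagger or BioNotate, whose interfaces are not shown in the slides. The lexicons are hypothetical and limited to terms that appear in the two example abstracts.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicons built from the two example abstracts.
GENE_SYNONYMS = {"ABI1": ["Abi-1", "ABI1"], "SHANK3": ["SHANK3", "ProSAP2"]}
DISEASE_TERMS = {"autism": ["autism"], "Asperger syndrome": ["Asperger syndrome"]}

def find_terms(text, lexicon):
    """Return canonical names whose synonyms occur as whole tokens in the text."""
    found = set()
    for canonical, synonyms in lexicon.items():
        for synonym in synonyms:
            pattern = r"(?<!\w)" + re.escape(synonym) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(canonical)
                break
    return found

def cooccurrences(text):
    """Candidate gene-gene and gene-disease pairs mentioned in one abstract."""
    genes = find_terms(text, GENE_SYNONYMS)
    diseases = find_terms(text, DISEASE_TERMS)
    gene_gene = {tuple(sorted(pair)) for pair in combinations(genes, 2)}
    gene_disease = {(g, d) for g in genes for d in diseases}
    return gene_gene, gene_disease

abstract_1 = ("abelson interacting protein-1 (Abi-1), as a binding partner for the "
              "postsynaptic density (PSD) protein ProSAP2/Shank3.")
abstract_2 = ("SHANK3 (also known as ProSAP2) regulates the structural organization "
              "of dendritic spines; genes encoding neuroligins are mutated in autism "
              "and Asperger syndrome.")
gg_1, gd_1 = cooccurrences(abstract_1)
gg_2, gd_2 = cooccurrences(abstract_2)
```

Co-occurrence only proposes candidates. The second abstract mentions SHANK3 and autism in the same passage, but whether a relation is asserted is the judgment the annotator check, or a crowd of workers, would supply.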
Network core / Interaction Core [Figure: evidence sources (GO, Co-Ex, P-P intx, Bibliome, Phylo-profiles) feed a Classifier that produces disorder networks (Ataxia, Mental Retardation, Autism)] • Can we “cloud” it up???
Aim 3: comparative network analysis on the cloud • Find disease-filtered interacting partners • Find shortest paths between candidates • Find minimal subnetworks • Verify and reconstruct networks appropriately [Figure labels: Autism, Schizophrenia]
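One of the listed operations, finding shortest paths between candidates, can be sketched with a breadth-first search over an unweighted interaction network. This is only an illustration: the ABI1-SHANK3 edge reflects the binding reported in the abstract quoted earlier, while `GENE_X`, `GENE_Y`, and `GENE_Z` are hypothetical placeholders, and the slides do not say whether edges carry weights.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for a fewest-edges path in an undirected network.

    Returns the path as a list of nodes, or None if no path exists.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Toy network; only the ABI1-SHANK3 edge is taken from the quoted abstract.
edges = [("ABI1", "SHANK3"), ("SHANK3", "GENE_X"),
         ("GENE_X", "GENE_Y"), ("ABI1", "GENE_Z")]
path = shortest_path(edges, "ABI1", "GENE_Y")
```

Each candidate pair is searched independently, so the computation splits naturally across many machines.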
Genetic Landscape of Autism [Figure labels: Mental Retardation, Rett Syndrome, Angelman Syndrome]
Acknowledgments • Zak Kohane • Matt Huyck • Tom Monaghan • Todd DeLuca • Nieves Mendizabel • Paco Esteban • Joaquin Goni • Alal Eran • Michal Galdzicki • Lou Kunkel • Alexa McCray • Leon Peshkin