Computational tools for disease gene identification Sonia ABDELHAK, PhD Molecular Investigation of Genetic Orphan Disorders Institut Pasteur de Tunis
Summary
• How could we identify genes involved in human disorders?
• Positional cloning in the pre-genomic era.
• Monogenic/multifactorial diseases.
• Computational tools: positional cloning in the post-genomic era.
Monogenic versus Complex Diseases: Genes & Environment
[Figure: diseases positioned according to their "Genetic Component" and "Environmental Effect". Diseases shown: Hemophilia, Cystic Fibrosis, Stroke, Asthma, Lung Cancer, Skin Cancer, Alzheimer’s, Cardiovascular Disease, Motor Vehicle Accident, Schizophrenia, Familial Colon or Breast Cancer, Type 2 Diabetes, Bipolar Disorder.]
Source: S.K. Brahmachari, GENOMED-HEALTH meeting
What could we learn from disease gene identification?
• Better understanding of the underlying biology of the trait in question
• Direct targets for better treatments
• Pharmacogenetics
• Interventions
• Predictions of susceptibility to the disease
• Predictions of the course of the disease
• Knowledge for treatment or prevention
“SIMPLE” MENDELIAN GENETIC DISEASES
• Diseases of simple genetic architecture
• The way the trait is passed on in a family follows a recognizable pattern (Mendelian disease)
• One gene altered per family (with exceptions)
• Usually quite rare in the population (with exceptions)
• “Causative” gene
Some examples of deleterious mutations
Stop codon creation: CAG (Gln) → TAG (Stop)
Modes of inheritance
• X-linked: Duchenne muscular dystrophy
• Autosomal dominant: Huntington disease
• Autosomal recessive: Cystic fibrosis
• Mitochondrial: Leber optic atrophy
Functional cloning versus positional cloning of genes
• Functional cloning: Disease → Function/Protein → Gene → Chromosomal localisation
• Positional cloning: Disease → Chromosomal localisation → Gene → Function/Protein
Position-Independent Methods
• Gene-specific oligonucleotides: the Factor VIII gene in hemophilia A (the most common form of hemophilia, X-linked).
• The clotting factor was purified from pig, and its N-terminal amino acids were sequenced.
• This allowed a group of oligonucleotides to be synthesized.
• These probes were used for colony hybridization against a cDNA library.
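The oligonucleotide design step above can be sketched as back-translation of a short peptide into every DNA sequence that could encode it. This is only an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and the peptides used are toy examples, not the Factor VIII N-terminal sequence.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(amino_acid):
    """Return all codons encoding one amino acid (one-letter code)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    total = 1
    for amino_acid in peptide:
        total *= len(codons_for(amino_acid))
    return total


def probe_set(peptide):
    """Enumerate every candidate oligonucleotide for a (short) peptide."""
    choices = [codons_for(amino_acid) for amino_acid in peptide]
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    # Hypothetical peptide chosen only because Met and Trp have a single codon.
    print(degeneracy("MWKD"), probe_set("MK"))
```

Peptide stretches rich in Met and Trp keep the probe pool small, which is why low-degeneracy regions are usually preferred when choosing where to place such probes.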
Positional cloning of genes
[Figure repeated: functional versus positional cloning pathways linking Disease, Chromosomal localisation, Gene and Function/Protein]
• Identification of informative families
• Genetic mapping
• Physical mapping
• Identification of coding sequences (candidate genes)
• Mutation screening
• Functional analysis
Example:
normal: ...CCT GAG GAG... (...Pro Glu Glu...)
mutant: ...CCT GTG GAG... (...Pro Val Glu...)
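The mutation screening step can be illustrated with a small sketch that compares a normal and a mutant coding fragment codon by codon and labels each substitution. It assumes the standard genetic code and that both fragments are in frame and of equal length; the function names are hypothetical, and codon numbers are relative to the fragment, not to the gene.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame DNA fragment; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_substitutions(normal, mutant):
    """List codon-level differences as (codon_no, ref, alt, ref_aa, alt_aa, kind)."""
    if len(normal) != len(mutant):
        raise ValueError("fragments must have the same length")
    normal, mutant = normal.upper(), mutant.upper()
    changes = []
    usable = len(normal) - len(normal) % 3
    for i in range(0, usable, 3):
        ref, alt = normal[i:i + 3], mutant[i:i + 3]
        if ref == alt:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
        if ref_aa == alt_aa:
            kind = "silent"
        elif alt_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        changes.append((i // 3 + 1, ref, alt, ref_aa, alt_aa, kind))
    return changes


if __name__ == "__main__":
    # Fragments taken from the slide: GAG (Glu) becomes GTG (Val).
    print(translate("CCTGAGGAG"), translate("CCTGTGGAG"))
    print(classify_substitutions("CCTGAGGAG", "CCTGTGGAG"))
```

The same function labels a CAG to TAG change as nonsense, since TAG is a stop codon.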
Genetic mapping
Which markers are used for genetic mapping?
Polymorphisms used in Gene Mapping
• 1980s – RFLP marker maps
• 1990s – microsatellite marker maps
Identification of microsatellite polymorphisms by sequence analysis:

IL-12p35AC (from primer IL-12p35AC F to primer IL-12p35AC R), 78.57%
tggtggcagaaatcattgtctgaaaagtaattgttttacttttattcttttcgtgtgtgtgtgtgt
gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcatgtgccagatttcttgtttgaaaggcaat
gagcttcatccaagtatcaa

IL-12p40AC (from primer IL-12p40AC F to primer IL-12p40AC R), 69.23%
atttcaggtgtgagccactgtgcctggccagaactttttcaatgaatattcaagataattgtatacacattttatatatatatatatatatacacacacacacacacacacatatgtatacacacattatatatataatccatgttatatacatctctacattatatatatccactatatatattttacttatacatatagattttatttttatgaactaggatcaaattgta

[Gel image: lanes 1 to 5; band sizes 174, 170, 166]
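Finding such repeats by sequence analysis can be sketched with a regular expression that looks for a two-base unit repeated in tandem. This is a minimal illustration, not the tool used for the slide; the function name and the `min_repeats` threshold of 8 are assumptions chosen for the example.

```python
import re


def find_dinucleotide_repeats(seq, min_repeats=8):
    """Return (start, unit, n_repeats) for each perfect dinucleotide run.

    start is a 0-based offset into the cleaned sequence. Units made of two
    identical bases (e.g. "aa") are skipped, since those are homopolymers.
    """
    seq = "".join(seq.lower().split())  # drop spaces and line breaks
    pattern = re.compile(
        r"(([acgt])(?!\2)[acgt])\1{%d,}" % (min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


if __name__ == "__main__":
    # Toy sequence with ten "gt" units (an "ac" repeat on the other strand).
    toy = "aaac" + "gt" * 10 + "gcat"
    print(find_dinucleotide_repeats(toy))
```

A run reported as "gt" on one strand corresponds to an "ac"/"ca" repeat on the complementary strand, so marker names such as "AC" do not have to match the unit printed by the scan.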
SNPs in Genetic Analysis
• Abundance: SNPs are numerous
• Position: distributed throughout the genome
• Haplotype patterns: groups of SNPs may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of mutation
Gene mapping: Linkage analysis
Do marker alleles co-segregate with the disease by chance, or are they linked to the underlying gene?
Recombination Fraction ($\theta$)
• $\theta = 1/2$: independent assortment (Mendel)
• $\theta < 1/2$: linked loci
• $\theta = 0$: tightly linked loci (no recombination)
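LOD Score Analysis
The likelihood ratio as defined by Morton (1955):
$$\mathrm{L.R.} = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$$
where $\theta$ represents the recombination fraction and where $0 \le x \le 0.49$.
When all meioses are “scorable”, the L.R. is constructed as:
$$\mathrm{L.R.} = \frac{\theta^{R}\,(1-\theta)^{N-R}}{(1/2)^{N}}$$
with $R$ recombinant meioses out of $N$ scorable meioses.
The LOD score is $z = \log_{10}(\mathrm{L.R.})$.
• $z(\theta)$ is the lod score at a particular value of the recombination fraction
• $z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction
H1: Linkage. H0: Exclusion.
$\theta = 0$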
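For the fully scorable case, the lod score can be computed directly by counting recombinants. The sketch below assumes the usual counting form of the likelihood ratio, theta^R (1 - theta)^(N - R) / 0.5^N, with R recombinants among N scorable meioses; the function names and the grid search over theta are illustrative choices, not part of the slide.

```python
import math


def lod_score(theta, recombinants, meioses):
    """LOD score z(theta) for R recombinants among N fully scorable meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    r, n = recombinants, meioses
    likelihood_ratio = (theta ** r) * ((1 - theta) ** (n - r)) / (0.5 ** n)
    if likelihood_ratio == 0:
        # theta = 0 is impossible once a recombinant has been observed
        return float("-inf")
    return math.log10(likelihood_ratio)


def max_lod(recombinants, meioses, steps=50):
    """Grid search for the theta in [0, 0.5] that maximises the LOD score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(t, recombinants, meioses))
    return best, lod_score(best, recombinants, meioses)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants in 10 scorable meioses.
    theta_hat, z_max = max_lod(2, 10)
    print(f"theta_hat = {theta_hat:.2f}, z_max = {z_max:.3f}")
```

With no recombinants in 10 meioses the maximum falls at theta = 0 and equals 10 × log10(2), about 3.01, which is why roughly ten informative meioses are needed to reach the conventional threshold of 3 in the best case.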
1 to 10 years!
• Identification of informative families
• Cytogenetic anomalies
• Animal model
• Genetic mapping
• Physical mapping
• Identification of coding sequences (candidate genes)
• Functional candidate genes
• Mutation screening
• Functional analysis
Example:
normal: ...CCT GAG GAG... (...Pro Glu Glu...)
mutant: ...CCT GTG GAG... (...Pro Val Glu...)
Branchio-oto-renal syndrome
• Clinical features: deafness, renal anomalies, cervical cysts…
• Mapped to 8q13.
• PAC contig: 11083, 9480, 4405, 10910
• cDNA library screening, cDNA selection and exon trapping
PAC (P1-derived)
• Sonication or partial digestion
• Subcloning in pBCSK+ (T7 and T3 sites)
• Selection of clones
• Sequencing with T7 and T3
• Sequence assembly and analysis
The different steps used for sequence analysis
• Quality assessment
• Elimination of contaminating sequences: blastn against vector, bacteria, yeast… databases
• Assembly using Phred, Phrap, Consed
• Identification of candidate genes by blastx and tblastx
• Gene prediction tool: GRAIL
PAC contig clones: 11083, 9480, 4405, 10910

BLASTX 1.4.7 [19-Dec-94] [Build 07:11:56 Jun 16 1995]
Query= w1g9t7.Seq (743 letters)
Translating both strands of query sequence in all 6 reading frames
Database: ../../databases/fasta/nrprot
244,544 sequences; 71,258,360 total letters.
Searching..................................................done

Sequences producing High-scoring Segment Pairs (Reading Frame, High Score, Smallest Sum Probability P(N), N):
pir|S|A45174 eyes absent (eya) protein (alternatively...   -2   173   5.6e-15   1

>pir|S|A45174 eyes absent (eya) protein (alternatively spliced) - fruit fly (Drosophila melanogaster) >gp||DRONOEYE_
Length = 760
Minus Strand HSPs:
Score = 173 (79.6 bits), Expect = 5.6e-15, P = 5.6e-15
Identities = 29/36 (80%), Positives = 34/36 (94%), Frame = -2
Query: 169 LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG 62
           LCLP GVRGGVDWMRKLAFRYR++K+IYN+Y+ NVG
Sbjct: 586 LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG 621
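The line "Translating both strands of query sequence in all 6 reading frames" describes the step that lets a nucleotide read be compared with a protein database. A minimal sketch of that translation is given below, assuming the standard genetic code; the function names are hypothetical, the frame labels (+1 to +3, -1 to -3) are a simple offset-based convention, and this is not BLASTX's own implementation.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame label (+1..+3, -1..-3) to the translated protein string."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Toy read; a real query would be the 743-letter sequencing read.
    for frame, protein in sorted(six_frame_translations("ATGGCCNNNTAG").items()):
        print(frame, protein)
```

An "X" in a translated query, as in the alignment above, is what a codon containing an uncalled base would produce under this scheme.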
EYA1 gene structure
[Figure: exon map of the EYA1 gene; exon labels not recoverable]
Identification of a new gene family: EYA1, EYA2, EYA3, …
COMPLEX (MULTIFACTORIAL) GENETIC DISEASE
• Diseases of complex genetic architecture
• No clear pattern of inheritance
• Moderate to strong evidence of being inherited
• Common in the population: cancer, heart disease, dementia, etc.
• Involve many genes and the environment
• “Susceptibility” genes
Complex disease loci mapping
• Linkage analysis: large families, small families
• Association studies: family-based, case-control
Study Designs
• Linkage analysis: large families, small families
• Association studies: family-based, case-control
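TDT calculation
[Table: 2 × 2 table of transmitted versus non-transmitted parental alleles (alleles 1 and 2); B and C are the two off-diagonal cells, where the transmitted and non-transmitted alleles differ.]
$$\mathrm{TDT} = \frac{(B - C)^2}{B + C}$$
With > 5 per cell, this follows a $\chi^2$ distribution with 1 df.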
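The TDT statistic and its p-value can be computed in a few lines. This sketch assumes B and C are the counts of heterozygous parents transmitting allele 1 rather than 2, and allele 2 rather than 1; the counts in the example are hypothetical, and the p-value uses the identity that a chi-square variable with 1 df is a squared standard normal.

```python
import math


def tdt(b, c):
    """Return (statistic, p_value) for the transmission disequilibrium test.

    b and c are the off-diagonal counts of the transmitted versus
    non-transmitted table (informative heterozygous parents only).
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 transmissions of allele 1 versus 10 of allele 2.
    stat, p = tdt(30, 10)
    print(f"TDT = {stat:.2f}, p = {p:.4f}")
```

Equal counts give a statistic of 0 and a p-value of 1, matching the null hypothesis that a heterozygous parent transmits either allele with probability 1/2.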
Examples: Alzheimer’s
• Alzheimer’s disease and ApoE
The E4 allele appears to be positively associated with Alzheimer’s disease:
Odds Ratio = (58/16)/(33/55) ≈ 6.04
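The odds ratio arithmetic can be checked with a short sketch. The four counts are passed in the order they appear in the slide's expression, (a/b)/(c/d); the slide does not say which count belongs to which group, so no labels are assumed here. The 95% confidence interval (Woolf's log method) is an addition for illustration and is not given on the slide.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Return (odds_ratio, ci_low, ci_high) for the ratio (a/b)/(c/d).

    The interval uses the standard error of log(OR),
    sqrt(1/a + 1/b + 1/c + 1/d), with z = 1.96 for about 95% coverage.
    """
    estimate = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high


if __name__ == "__main__":
    # Counts from the slide's expression (58/16)/(33/55).
    estimate, low, high = odds_ratio(58, 16, 33, 55)
    print(f"OR = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An interval whose lower bound stays above 1 is consistent with the positive association stated on the slide.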
February 2001
"Finished" sequence
April 1953 – April 2003
• Identification of informative families
• Genetic mapping
• Physical mapping
• Identification of coding sequences (candidate genes)
• Mutation screening
• Functional analysis
Example:
normal: ...CCT GAG GAG... (...Pro Glu Glu...)
mutant: ...CCT GTG GAG... (...Pro Val Glu...)
Past and present tools
• Genetic mapping
• Physical mapping
• Cytogenetic abnormalities
• Animal models
• Positional and functional candidates
• Genome databases and genome browsers
• Comparative Genomic Hybridization
• Comparative genomics
• Microarray analysis
NCBI genome browser: visualize all the genes in an interval
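Listing the genes in a mapped interval amounts to an overlap query on gene coordinates. The sketch below runs that query on an in-memory list; it does not call the NCBI browser or any NCBI API, and the gene names and coordinates are made-up placeholders, not real annotations.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_interval(genes: List[Gene], chrom: str, start: int, end: int) -> List[Gene]:
    """Return genes overlapping chrom:start-end, ordered by start position."""
    hits = [
        g for g in genes
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]
    return sorted(hits, key=lambda g: g.start)


if __name__ == "__main__":
    # Hypothetical annotation table (placeholder names and positions).
    annotation = [
        Gene("GENE_A", "8", 1_000, 5_000),
        Gene("GENE_B", "8", 20_000, 30_000),
        Gene("GENE_C", "8", 45_000, 60_000),
        Gene("GENE_D", "9", 20_000, 30_000),
    ]
    for gene in genes_in_interval(annotation, "8", 25_000, 50_000):
        print(gene.name, gene.start, gene.end)
```

Genes that only partly overlap the interval are kept, because a candidate gene may extend beyond the markers that define the linked region.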
• How to collect and interpret all the data?
• How to choose the best “candidate” gene?
Strategies and adapted tools for gene selection are urgently needed!
• Find candidate genes for the trait (time and cost!)
• WHAT genes are there?
• WHAT do they do?
• HOW could they play a role in the disease?
• = Data mining and integration!
• Visualization of the whole picture
• Global view
• Option to zoom into detail
Disease Gene Finding (Center for Biological Sequence Analysis)
Combining network theory and phenotype associations in an automated large-scale disease gene finding platform
• Networks: deducing functional relationships from network theory
• Phenotype association: grouping disorders based on their phenotype