SNP molecular function, evolution and disease

Md Imtiyaz Hassan, Ph.D.

Effect on molecular function • Structural Biology • Biochemistry • Medical Genetics • Evolutionary Genetics • Phenotype • Natural selection

Predicting the effect of mutations in proteins

Why is this useful?
• Understanding variation in molecular function and structure
• Evolutionary genetics: comparing polymorphism and divergence rates between different functional categories is a robust way to detect selection
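One common way to compare polymorphism and divergence between two functional categories is a 2x2 table of counts, summarized by a neutrality index. The sketch below is only an illustration under assumptions: it takes non-synonymous and synonymous sites as the two categories, and the function name and all counts are hypothetical, not data from these slides.

```python
def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """Ratio of polymorphism to divergence ratios between two categories.

    pn, ps: polymorphic non-synonymous / synonymous sites (within species)
    dn, ds: fixed non-synonymous / synonymous differences (between species)
    """
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("ps, dn and ds must be non-zero")
    return (pn / ps) / (dn / ds)

# Hypothetical counts, for illustration only.
ni = neutrality_index(pn=20, ps=40, dn=10, ds=40)
print(ni)
```

A value above 1 indicates an excess of non-synonymous polymorphism relative to divergence, which is the pattern expected when many segregating variants are mildly deleterious; a value below 1 indicates an excess of non-synonymous divergence.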

Linkage analysis: rare variants

Classical association studies: common variants (disease vs. control)

Quantitative trait
• Mendelists vs. Biometricians
• Forces to maintain variation: selection, mutation

Common disease / Common variant
• Trade-off (antagonistic pleiotropy)
• Balancing selection
• Recent positive selection
• Reversal in the direction of selection
Examples:
• APOE: Alzheimer’s disease
• AGT: Hypertension
• CYP3A: Hypertension
• CAPN10: Type 2 diabetes

Individual human genome is a target for deleterious mutations!
• Frequency of deleterious variants is directly proportional to the mutation rate (q = m/s)
• ~40% of human Mendelian diseases are due to hypermutable sites
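The relation q = m/s can be checked numerically. This is a minimal sketch, assuming m is the per-generation mutation rate toward the deleterious allele and s is the selection coefficient against it; the function name and the example values are hypothetical and chosen only to show the proportionality.

```python
def equilibrium_frequency(m: float, s: float) -> float:
    """Mutation-selection balance as written on the slide: q = m / s."""
    if s <= 0:
        raise ValueError("selection coefficient s must be positive")
    return m / s

# Hypothetical values: a site with twice the mutation rate carries
# twice the equilibrium frequency of deleterious variants.
q_typical = equilibrium_frequency(m=1e-8, s=0.01)
q_hypermutable = equilibrium_frequency(m=2e-8, s=0.01)
print(q_typical, q_hypermutable)
```

Because q scales linearly with m, hypermutable sites are expected to contribute disproportionately to the pool of deleterious variants.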

Multiple mostly rare variants
• Many deleterious alleles in mutation-selection balance
Examples:
• Plasma level of HDL-C
• Plasma level of LDL-C
• Colorectal adenomas

Harmful mutations
• Function: damaging
• Evolution: deleterious
• Phenotype: detrimental
• Advantageous pseudogenization (Zhang et al. 2006)
• Gain-of-function disease mutations
• Sickle Cell Anemia

protein multiple alignment profile

PolyPhen

Prediction rate of damaging substitutions

                     possibly   probably
Disease mutations      82%        57%
Divergence              9%         3%
Polymorphism           27%        15%

10% of PolyPhen false-positives are due to compensatory substitutions

Neutral mutation model
Human       ACCTTGCAAAT
Chimpanzee  ACCTTACAAAT
Baboon      ACCTTACAAAT
Prob(TAC->TGC), Prob(TGC->TAC)
Prob(XY1Z->XY2Z): 64x3 matrix
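The alignment above can be turned into counts for a 64x3 context-dependent matrix: 64 trinucleotide contexts, each with 3 possible changes of the middle base. The sketch below is an illustration under assumptions, not the author's procedure: it polarizes a human-specific change by simple parsimony (chimpanzee and baboon agree, human differs) and takes the flanking bases from the human sequence. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# 64 trinucleotide contexts; each maps to the 3 alternative middle bases.
CONTEXTS = ["".join(p) for p in product(BASES, repeat=3)]

def infer_change(human: str, chimp: str, baboon: str, i: int):
    """Return (ancestral_triplet, derived_triplet) for a human-specific
    change at position i, or None if there is no such change."""
    if i < 1 or i + 1 >= len(human):
        return None  # no complete flanking context at the sequence ends
    if chimp[i] == baboon[i] and human[i] != chimp[i]:
        # Assumption: the flanking bases did not change on this lineage.
        ancestral = human[i - 1] + chimp[i] + human[i + 1]
        derived = human[i - 1:i + 2]
        return ancestral, derived
    return None

def count_changes(human: str, chimp: str, baboon: str):
    """Count inferred changes into a 64 x 3 table keyed by context."""
    counts = {ctx: {b: 0 for b in BASES if b != ctx[1]} for ctx in CONTEXTS}
    for i in range(len(human)):
        change = infer_change(human, chimp, baboon, i)
        if change is not None:
            ancestral, derived = change
            counts[ancestral][derived[1]] += 1
    return counts

counts = count_changes("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
print(counts["TAC"])
```

For the slide's alignment this records one TAC->TGC change. Dividing each row by the number of times its context was observed would give estimates of Prob(XY1Z->XY2Z); that normalization is left out here.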

Strongly detrimental mutations

Effectively neutral mutations

Mildly deleterious mutations

Mildly deleterious mutations
• 54 genes, 757 individuals
• Inflammatory response: 236 genes, 46-47 individuals
• DNA repair and cell cycle pathways: 518 genes, 90-95 individuals

Fitness and selection coefficient
• Wild type: N1 = 4
• New mutation: N2 = 3
• Fitness of the new mutation relative to wild type: N2/N1 = 1 - s, where s is the selection coefficient
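A worked instance of the relation N2/N1 = 1 - s with the slide's numbers. This is a small sketch that assumes N1 and N2 are the numbers of descendants left by the wild type and by the new mutation; the function name is hypothetical.

```python
def selection_coefficient(n_wild: float, n_mutant: float) -> float:
    """Solve N2/N1 = 1 - s for s."""
    if n_wild <= 0:
        raise ValueError("n_wild must be positive")
    return 1.0 - n_mutant / n_wild

relative_fitness = 3 / 4           # N2 / N1
s = selection_coefficient(4, 3)    # 1 - 3/4
print(relative_fitness, s)
```

With N1 = 4 and N2 = 3 the relative fitness is 0.75, so s = 0.25.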


Genetic polymorphism
• Genetic polymorphism: a difference in DNA sequence among individuals, groups, or populations.
• Genetic mutation: a change in the nucleotide sequence of a DNA molecule.
• Genetic mutations are a kind of genetic polymorphism.
Genetic variation:
• Single nucleotide polymorphism (point mutation)
• Repeat heterogeneity

SNP: Single Nucleotide Polymorphisms
• A single nucleotide polymorphism is a source of variance in a genome.
• A SNP ("snip") is a single-base mutation in DNA.
• SNPs are the simplest and most common form of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).
• There are two types of nucleotide base substitutions resulting in SNPs:
  • Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.
  • Transversion: substitution between a purine and a pyrimidine.
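The transition/transversion rule above can be written as a short classifier. A minimal sketch; the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

print(classify_substitution("C", "T"))  # pyrimidine to pyrimidine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```

Of the 12 possible substitutions only 4 are transitions, so their two-thirds share of SNPs is far above what equal substitution rates would give.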

SNP
-----------------------ACGGCTAA
-----------------------ATGGCTAA
• Instead of using restriction enzymes, these are found by direct sequencing
• They are extremely useful for mapping
Markers:
• Classical Mendelian: 100
• RFLPs: 7000
• SNPs: 1.4 x 10^6
• SNPs occur every 300-1000 bp along the 3-billion-bp human genome
• Many SNPs have no effect on cell function

Human Genome and SNPs
• Human genome is (mostly) sequenced; attention is turning to the evaluation of variation
• Alterations in DNA involving a single base pair are called single nucleotide polymorphisms, or SNPs
• Map of ~1.4 million SNPs (Feb 2001)
• It is estimated that ~60,000 SNPs occur within exons

Goals of SNP Initiatives
• Immediate goals:
  • Detection/identification of all SNPs estimated to be present in the human genome
  • Interest also in other organisms, e.g. potatoes(!)
  • Establishment of SNP database(s)

SNPs
Humans are genetically more than 99 percent identical; the differences lie in the remaining tiny percentage. Much of our genetic variation is caused by single-nucleotide differences in our DNA: these are called single nucleotide polymorphisms, or SNPs. As a result, each of us has a unique genotype that typically differs in about three million nucleotides from every other person. SNPs occur about once every 300-1000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population. Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences".
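The spacing and the genome size quoted above imply a range for the total number of SNP sites. The arithmetic below is a rough sketch that assumes a 3-billion-bp genome and treats the spacing as a uniform average; the names are hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome length

def expected_snp_count(spacing_bp: int) -> int:
    """Number of sites if one SNP occurs every `spacing_bp` base pairs."""
    return GENOME_SIZE_BP // spacing_bp

sparse = expected_snp_count(1000)  # one SNP per 1000 bp
dense = expected_snp_count(300)    # one SNP per 300 bp
print(sparse, dense)
```

This gives between 3 million and 10 million sites, and the sparse end matches the figure of about three million differences between two people.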

Longer term goals: Areas of SNP Application
• Gene discovery and mapping
• Association-based candidate polymorphism testing
• Diagnostics/risk profiling
• Response prediction
• Homogeneity testing/study design
• Gene function identification, etc.

Polymorphism
• Technical definition: the most common variant (allele) occurs with less than 99% frequency in the population
• Also used as a general term for variation
• Many types of DNA polymorphisms, including RFLPs, VNTRs, microsatellites
• ‘Highly polymorphic’ = many variants
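The technical definition can be applied directly to allele counts at a site. A minimal sketch; the function name and the counts are hypothetical.

```python
def is_polymorphic(allele_counts: dict, threshold: float = 0.99) -> bool:
    """True if the most common allele has a frequency below the threshold."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    major_frequency = max(allele_counts.values()) / total
    return major_frequency < threshold

# Hypothetical counts from a sample of chromosomes.
print(is_polymorphic({"A": 95, "G": 5}))    # major allele at 95%
print(is_polymorphic({"A": 999, "G": 1}))   # major allele at 99.9%
```

Under this definition the second site is a rare variant, not a polymorphism, even though two alleles were observed.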

SNPs in Genetic Analysis
• Abundance: lots
• Position: throughout the genome
• Haplotype patterns: groups of SNPs may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of mutation
• Recombination patterns, e.g. ‘hot spots’

Coding Region SNPs
• Types of coding region SNPs:
  • Synonymous: the substitution causes no amino acid change in the encoded protein. This is also called a silent mutation.
  • Non-synonymous: the substitution alters the encoded amino acid. A missense mutation changes a codon so that it specifies a different amino acid. A nonsense mutation changes a codon into a premature termination (stop) codon.
• One half of all coding sequence SNPs result in non-synonymous codon changes.
• Occasionally, a SNP may actually cause a disease.
• SNPs within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein.
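The categories above can be assigned mechanically by translating the codon before and after the substitution. This is a sketch using the standard genetic code; the function name is hypothetical, and changes that remove a stop codon, which the slide does not cover, are reported as "other".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a substitution at a 0-based position within a DNA codon."""
    codon, alt = codon.upper(), alt.upper()
    if codon[position] == alt:
        raise ValueError("alt must differ from the reference base")
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "other"  # stop codon lost; not covered on the slide
    return "missense"

print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu
print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG, Glu -> Val
print(classify_coding_snp("TAC", 2, "G"))  # TAC -> TAG, Tyr -> stop
```

The GAG -> GTG example is the glutamate-to-valine change of the kind that underlies sickle cell anemia.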

Intergenic SNPs
• Researchers have found that most SNPs are not responsible for a disease state because they are intergenic SNPs.
• Instead, they serve as biological markers for pinpointing a disease on the human genome map, because they are usually located near a gene found to be associated with a certain disease.
• Scientists have long known that diseases caused by single genes and inherited according to the laws of Mendel are actually rare.
• Most common diseases, like diabetes, are caused by multiple genes. Finding all of these genes is a difficult task.
• Recently, there has been a focus on the idea that all of the genes involved can be traced by using SNPs.
• By comparing the SNP patterns in affected and non-affected individuals (patients with diabetes and healthy controls, for example), scientists can catalog the specific DNA variations that underlie susceptibility to diabetes.
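The comparison of affected and non-affected individuals is often summarized, for one SNP at a time, as a 2x2 table of allele counts. The sketch below computes an odds ratio and a Pearson chi-square statistic for such a table. It is an illustration under assumptions: the function name and all counts are hypothetical, and real studies also need corrections for multiple testing and population structure.

```python
def allele_association(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Odds ratio and Pearson chi-square (1 degree of freedom) for a
    2x2 table of allele counts in cases and controls."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_cases, row_ctrls = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if min(row_cases, row_ctrls, col_a, col_b) == 0 or case_b * ctrl_a == 0:
        raise ValueError("table has an empty margin or a zero cell")
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_cases * row_ctrls * col_a * col_b)
    return odds_ratio, chi2

# Hypothetical allele counts: 100 chromosomes from patients, 100 from controls.
odds_ratio, chi2 = allele_association(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(odds_ratio, chi2)
```

A chi-square value above 3.84 corresponds to p < 0.05 for a single test with one degree of freedom.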

Polymorphic Sites Revealed in Sequencing

Medium- and Low-throughput SNP Genotyping
I. SNP discovery and validation.
   A. Database mining, “resequencing” on microarrays, de novo sequencing of EST libraries.
   B. Genotyping of pooled samples for determining heterozygosity.
II. How many SNPs are to be typed in how many samples?
   A. What degree of multiplexing is possible for the “before-typing” PCR reactions?
   B. What degree of multiplexing is possible for the genotyping reactions?
III. What is the appropriate platform given the size of the project, the budget and the degree of automation desired?

Mapping 100K Coverage: 116,204 SNPs
July 2003, NCBI build 34
[Figure legend: red = at least 1 SNP per 100 kb; black = gaps in genome coverage]
• 92% of genome within 100 kb of a SNP
• 83% of genome within 50 kb of a SNP
• 50% of genome within 15 kb of a SNP
• 25% of genome within 5 kb of a SNP

Chemistry/Demultiplexing/Detection Options in SNP Genotyping
[Diagram with four columns: Enzyme Chemistry, Demultiplexing, Detection Method, Platform/Company]
• Platforms/companies shown: Illumina BeadArray™, Luminex 100, Sequenom iPlex™, ABI SNPlex™, ABI Taqman™, ABI SNaPShot™, Perkin-Elmer FP-TDI
• Other terms shown: Allele-Specific Extend + Ligate; Oligonucleotide Ligation Assay; Single Nucleotide Primer Extension; 5’-Nuclease; Allele-Specific Hybridization; Allele-Specific PCR; Semi-Homogeneous; Homogeneous; Solid phase microspheres; Solid phase microarray; Capillary Electrophoresis; Flow Cytometry; Mass Spectrometry; Fluorescence; Fluorescence Resonance Energy Transfer (FRET); Fluorescence Polarization; Microarray Minisequencing; “DASH”, Amplicon Tm

Enzymatic Options in SNP Genotyping
[Figure: primer and probe schematics for each option]
• PCR only: Tm-shift
• Allele-specific Hybridization
• Single Base Primer Extension, “Minisequencing”
• Allele-specific Primer Extension
• Allele-specific Primer Extension and Ligation
Labels in the figure: Primers; Probes; SBE Primer; LSO; Long GC; Short GC; ddC-biot or ddA-biot; ddA-biot, dATP, dTTP, dGTP

SNP Genotyping on Beads/Microarrays
• Selection of SNPs
• Design of PCR and “Tag” SBE/ASPE primers
• Multiplex PCR
• Cyclic SBE/ASPE with biot(fluor.)-ddNTP/dNTP
• Preparation of beads with “Anti-Tag” primers
• Capture of products on beads
• Signal measurement in flow cytometer/scanner

Single Base Extension (SBE) of Targets on Microarrays Pastinen, et al., Gen. Res. 7, 606, 1997

SBE (Minisequencing) of Target DNA with Glass-immobilized primers

Allele-Specific Extension & Identification in CE: “Minisequencing” (ABI SNaPShotTM)

Degree of Multiplexing Depends on Resolution in CE
[Figure: ABI SNaPshot® on 3130xl; dye labels dR6G, dR110]

Fluorescence Polarization Gen. Res. 9: 492, 1999

SBE (Minisequencing) with Detection by Fluorescence Polarization Gen. Res. 9: 492, 1999

Genotyping by SBE and Mass Spectrometry
• PCR Amplification
• SAP Treatment
• Single Base Extension
• Spot on 384-place Chips
• MALDI-TOF Mass Spec

Allele-specific Primer Extension (ASPE) with Chain Termination
