Single Nucleotide Polymorphisms Arthur M. Lesk Bologna Winter School 2011
What are SNPs and why are they important? • SNP = Single nucleotide polymorphism, an isolated change in a single nucleotide • SNPs are one type of mutation • Some have obvious functional consequences • Sickle-cell haemoglobin: gag→gtg (β6 Glu→Val) • Sickle-cell anaemia: the first “molecular disease” • Some are ‘silent’ • Some are in non-coding regions • affect splice sites? • affect regulatory sites? • some have no known phenotypic effect
What is a SNP? • The genomes of individuals in a population contain a particular base at some position most of the time. • That is, there is a “normal” sequence • A SNP is a deviation from the normal sequence. • Many people require that a variation occur in at least 1% of the population, to be considered a SNP • But: what population? What if two distinct populations have a consistent polymorphism?
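The 1% convention can be made concrete by computing the minor allele frequency (MAF) from genotypes. This is a minimal sketch under two assumptions the slide does not state: that the threshold applies to allele frequency (rather than to the fraction of people carrying the variant), and that each diploid genotype is encoded as 0, 1 or 2 copies of the non-reference allele. The function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Frequency of the less common allele at one site.

    genotypes: iterable of 0, 1 or 2, the number of copies of the
    non-reference allele in each diploid individual (assumed encoding).
    """
    genotypes = list(genotypes)
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    alt_alleles = sum(genotypes)          # copies of the non-reference allele
    total_alleles = 2 * len(genotypes)    # two alleles per diploid person
    freq = alt_alleles / total_alleles
    return min(freq, 1 - freq)            # the minor allele may be either one


def is_snp(genotypes, threshold=0.01):
    """Apply the common 'at least 1%' convention to the minor allele."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical sample: 195 people homozygous for the common base, 5 heterozygous
sample = [0] * 195 + [1] * 5
print(minor_allele_frequency(sample))  # 0.0125 (5 of 400 alleles)
print(is_snp(sample))                  # True
```

Note that the answer depends on which population is sampled, which is exactly the objection raised on the slide.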
SNPs in human genomes • SNPs are about 90% of all inter-human variation • Occur on average once in every 300 bases • 2/3 of SNPs are C→T changes (perhaps because cytosine readily deaminates to uracil, which pairs like thymine)
SNP density varies across human genome • Some high-density patches • Some ‘deserts’ • SNPs in coding regions ~1/3 as many as in non-coding regions • SNP density correlated with recombination rate (which causes which??) • AT microsatellites: long (AT)n repeat tracts tend to appear in regions of low SNP density
Figure 14 SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. J C Venter et al. Science 2001;291:1304-1351 Published by AAAS
What is normal? • Obviously we all differ genomically • Swedes and Chinese have obviously different phenotypes • Most Swedes and Chinese are healthy individuals • Therefore genetic differences do not necessarily cause disease • So it is pointless simply to check for differences from a single ‘reference sequence’ • Of course, many genetic differences are not SNPs
Variation in human and other species • Any two humans are ~99.5% identical in sequence • Chimpanzees, gorillas: twice as variable, despite much smaller population sizes • Implies a prehistoric bottleneck in the human population and a recent common origin • Most common SNPs (frequency > 5%) are shared among human populations from around the world • Most populations (e.g. British) contain 85-90% of all known variation
Variation in human and other species • Some variation is population-specific • In some cases, there is local selective pressure • For example, adult lactose tolerance, malaria resistance • African populations have greatest genetic diversity • Supports ‘Out of Africa’ theory of human origin and migration
Identification of geographical origin, phenotype • A criminal leaves a blood sample at a crime scene • How much can we tell about him or her? • Not perfectly, but: • Ethnic group • Eye and hair colour (hair colour easier to change) • Family name?
Types of SNPs • Transitions: • purine ↔ purine • pyrimidine ↔ pyrimidine (cytosine → uracil) • Transversions: • purine ↔ pyrimidine • Transitions are more common than transversions
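The classification above depends only on whether the two bases belong to the same chemical class, so it is easy to automate. A minimal sketch (the function names are hypothetical) that labels a single-base change and computes the transition/transversion (Ts/Tv) ratio for a list of changes:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError(f"unknown base in {ref}->{alt}")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same chemical class (both purines or both pyrimidines) -> transition
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(changes):
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    kinds = [substitution_type(r, a) for r, a in changes]
    ts = kinds.count("transition")
    tv = kinds.count("transversion")
    return ts / tv if tv else float("inf")


print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "T"))  # transversion (the sickle-cell gag -> gtg change)
print(ts_tv_ratio([("C", "T"), ("G", "A"), ("A", "G"), ("A", "T")]))  # 3.0
```

Of the 12 possible ordered substitutions, only 4 are transitions and 8 are transversions, so if all changes were equally likely Ts/Tv would be 0.5; a ratio well above that is what "transitions are more common" means quantitatively.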
Prevalence of SNPs in human genomes • approximately 1 in 300 bp (~0.3%) • compare the difference between human and chimpanzee genomes: • 4% different (not all SNPs!)
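A quick worked check of what "1 in 300 bp" means, both as a percentage and as a total count. The genome size used here (~3.1 × 10⁹ bp for the haploid human genome) is an approximate figure assumed for illustration; it does not appear in the slides.

```python
GENOME_SIZE_BP = 3.1e9  # assumed haploid human genome size (approximate)
SNP_SPACING_BP = 300    # one SNP per ~300 bp, the figure quoted on the slide


def snp_summary(genome_bp, spacing_bp):
    """Return (fraction of positions that are SNPs, expected SNP count)."""
    fraction = 1 / spacing_bp
    return fraction, genome_bp * fraction


fraction, count = snp_summary(GENOME_SIZE_BP, SNP_SPACING_BP)
print(f"{fraction:.2%} of positions")     # 0.33% of positions
print(f"~{count / 1e6:.1f} million SNPs")  # ~10.3 million SNPs
```

So a 1-in-300 density is roughly a third of a percent of positions, about a tenth of the ~4% human/chimpanzee difference.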
‘Life cycle’ of a SNP • Generation of a mutation • Initial survival, against ‘sampling loss’ • Increase in frequency: survival until the variant becomes homozygous in some individuals; • chance of loss reduced (helped by bottlenecks and founder effects, which depend on population size) • Fixation
Initial survival of a SNP • Suppose a person is heterozygous for a novel, selectively neutral mutation. • Suppose the person has 2 children that survive to reproductive age. Each child inherits the mutation with probability 1/2, so the probability that neither does, and the mutation is lost, is (1/2)² = 25%. • If each descendant has 2 children that survive to reproductive age, probability of loss in 200 years = 94%
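The loss probability can be iterated over generations as a simple branching process. This sketch assumes an idealised model that the slide only partly specifies: every carrier has exactly 2 children reaching reproductive age, each inheriting the allele independently with probability 1/2. It reproduces the 25% first-generation figure; the value reached over a longer period depends on the generation time and offspring model assumed, which the slide does not give. The function name is hypothetical.

```python
def loss_probability(generations, children=2, p_inherit=0.5):
    """Probability that one copy of a neutral allele is lost within `generations`.

    Assumed model: each carrier has exactly `children` offspring, each of whom
    inherits the allele independently with probability `p_inherit`.
    Uses q_n = g(q_{n-1}) with q_0 = 0, where g is the probability generating
    function of the number of carrier children.
    """
    q = 0.0
    for _ in range(generations):
        # g(s) = (1 - p + p*s) ** children for a binomial offspring count
        q = (1 - p_inherit + p_inherit * q) ** children
    return q


print(loss_probability(1))  # 0.25, the first-generation figure above
for n in (2, 4, 8):
    print(n, round(loss_probability(n), 3))
```

The probability of loss keeps rising with every generation: a new neutral allele carried by a single person is far more likely to disappear than to spread.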
Where do SNPs occur in the human genome? • Distributed throughout the genome • 50% in non-coding regions • NOT the same as non-functional!!! • 25% missense mutations (amino acid substitution) • 25% silent (amino acid unchanged) • silent = no change in encoded amino-acid sequence • NOT the same as no phenotypic effect!!! • would be better to call them synonymous SNPs rather than silent SNPs
SNPs in non-human genomes • Of course other species have SNPs • Here we will focus on human SNPs because of relevance to human disease • However, SNPs in pathogens are sometimes associated with antibiotic resistance, and therefore related to human disease • SNPs in some plants give clues to domestication
Organised efforts to collect SNPs • The HapMap is a catalogue of common human genetic variants • HapMap Project = international collaboration among Japan, the United Kingdom, Canada, China, Nigeria, and the United States • NOT Europe • Carry out measurements, provide database • Other projects collect SNPs in other species
HapMap project • International consortium: International HapMap Project • http://hapmap.ncbi.nlm.nih.gov/ • Catalogue of human genetic variants: • What sites? • How distributed – frequency in different populations • Raw material for linking genomics with disease
Origin of samples • Total of 270 people. • The Yoruba people of Ibadan, Nigeria • Japan (Tokyo) • China (Beijing) • U.S. residents with Northern and Western European ancestry
What is a haplotype? • Often, a set of SNPs appear near each other on the same chromosome • In the absence of recombination, they will be inherited in blocks • The pattern of SNPs in a block is called a haplotype • A block may contain many SNPs, but only a few are needed to identify a haplotype • These signature SNPs within a haplotype block are called ‘tag SNPs’
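The idea that "only a few SNPs are needed to identify a haplotype" can be illustrated with a toy selection procedure. This is a simplified greedy heuristic written for illustration, not the method used by HapMap; in practice tag SNPs are usually chosen using linkage-disequilibrium statistics such as r². The haplotype strings and the function name are hypothetical.

```python
def greedy_tag_snps(haplotypes):
    """Greedily choose SNP columns that tell all distinct haplotypes apart.

    haplotypes: list of equal-length strings, one character per SNP site
    (a toy encoding). Returns 0-based column indices. Greedy, so the result
    is not guaranteed to be the smallest possible set.
    """
    distinct = sorted(set(haplotypes))
    if not distinct:
        return []
    n_sites = len(distinct[0])
    if any(len(h) != n_sites for h in distinct):
        raise ValueError("haplotypes must all have the same length")

    def n_patterns(cols):
        # Number of different allele patterns seen at the chosen columns
        return len({tuple(h[c] for c in cols) for h in distinct})

    chosen = []
    while n_patterns(chosen) < len(distinct):
        candidates = [c for c in range(n_sites) if c not in chosen]
        # Add the column that separates the most haplotypes (ties: lowest index)
        best = max(candidates, key=lambda c: n_patterns(chosen + [c]))
        chosen.append(best)
    return chosen


block = ["AACGT", "AACGA", "GTCAA", "GTCAT"]  # hypothetical 5-SNP block
print(greedy_tag_snps(block))  # [0, 4]
```

Here 2 of the 5 SNPs suffice to distinguish all 4 haplotypes, because the other sites are either invariant or perfectly correlated with a chosen one.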
http://www.riken.go.jp/engn/r-world/info/release/news/2003/nov/image/frol_06.gif
http://img.medscape.com/fullsize/migrated/553/400/ncpcard553400.fig1.gif
Guide to SNP databases • SNPlinks: http://www.snpforid.org/snpdata.html • NCBI dbSNP http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp • The SNP Consortium http://snp.cshl.org/ • HapMap http://www.hapmap.org/ • Applied Biosystems Assays-on-Demand http://myscience.appliedbiosystems.com/cdsEntry/Form/assay_search_basic.jsp • Ensembl http://www.ensembl.org/Homo_sapiens/ • HGVBase http://hgvbase.cgb.ki.se/ • SeattleSNPs https://gvs.gs.washington.edu/GVS/
dbSNP database at NCBI • non-redundant dataset • nomenclature: rs number • rs = reference SNP.
General human mutations • Human Gene Mutation Database http://www.hgmd.cf.ac.uk • over 100000 mutations, in 3700 genes • ~16% of the total ~23000 genes • about 10000 new mutations found per year • OMIM (Online Mendelian Inheritance in Man) • database of mutations associated with human disease • OMIA (Online Mendelian Inheritance in Animals)
Databases with important related information • Online Mendelian Inheritance in Man (OMIM) [NCBI] • Comprehensive compendium of human genes and associated phenotypes • Not limited to SNPs • SNPs3D http://www.snps3d.org/ • SNPs3D assigns molecular functional effects to non-synonymous SNPs based on structure and sequence analysis. • SNPper http://snpper.chip.org/ • Retrieve SNPs by position or gene association
Quality of sequence information is important • SNPs appear in the human genome at approximately 1 in 300 bases • Obviously the error rate in resequencing must be substantially lower than this if SNP data are to be meaningful • Measure of DNA sequencing quality: PHRED
PHRED: measure of sequence quality • Phred scores are widely accepted to characterize the quality of DNA sequences • Originally Phred was a base-calling program that assigned accurate quality scores indicating error probabilities • Accepted as a general standard • Phred quality score Q: let P = probability of base error; then Q = −10 log10 P
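The definition Q = −10 log10 P converts directly in both directions. A minimal sketch (function names are illustrative, not from any particular sequencing package):

```python
import math


def phred_from_error(p):
    """Q = -10 log10(P), for a base-call error probability P in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def error_from_phred(q):
    """Inverse of the above: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    # Q=10 -> 1 error in 10 bases, Q=20 -> 1 in 100, Q=30 -> 1 in 1000, Q=40 -> 1 in 10000
    print(f"Q={q}: P={error_from_phred(q):g}")
```

Every 10 Phred units is a tenfold reduction in error probability, which is why quality thresholds are usually quoted as round numbers such as Q20 or Q30.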
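A method that gave an average Phred score Q = 30 (P = 0.001, one error per 1000 bases) would make errors at the same order of magnitude as the SNP rate (one per ~300 bases): nearly as many errors as there are SNPs!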
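To put numbers on the comparison, the sketch below counts expected sequencing errors and expected true SNPs in a resequenced region. It simplifies by assuming every base has the same Phred quality and that SNPs occur uniformly at 1 per 300 bp; the 1 Mbp region and the function names are hypothetical.

```python
import math

SNP_SPACING_BP = 300  # one SNP per ~300 bp, as stated in the slides


def expected_errors(length_bp, q):
    """Expected miscalled bases if every base has Phred quality q."""
    return length_bp * 10 ** (-q / 10)


def expected_snps(length_bp, spacing_bp=SNP_SPACING_BP):
    """Expected true SNPs between a resequenced genome and the reference."""
    return length_bp / spacing_bp


def break_even_phred(spacing_bp=SNP_SPACING_BP):
    """Phred score at which the error rate equals the SNP rate."""
    return -10 * math.log10(1 / spacing_bp)


length = 1_000_000  # a hypothetical 1 Mbp resequenced region
for q in (20, 25, 30, 40):
    # columns: Phred score, expected errors, expected true SNPs
    print(q, round(expected_errors(length, q)), round(expected_snps(length)))
print(round(break_even_phred(), 1))  # 24.8
```

At Q = 30 the errors (about 1000 per Mbp) come to roughly a third of the true SNPs (about 3333), and errors equal SNPs at about Q ≈ 25; only well above that can a single read's mismatch be trusted as a real SNP.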
What can SNPs tell us? • Causes of disease -- dysfunctional protein • Correlation with disease prognosis, success of particular treatment • Useful genetic markers, to locate some gene of phenotypic interest; for instance, a gene correlated with a disease • Characterise individuals • Characterise populations (SNP distribution) • Applications in anthropology -- tracing of migrations, human evolution
Use of SNPs as genetic markers • Before 1980, genetic maps were constructed by measuring recombination frequencies between genes giving measurable phenotypic traits • This goes back at least to Sturtevant and Morgan, if not to Mendel • At that time, phenotypes were the only visible aspect of the genome
Use of SNPs as genetic markers • In 1980, Botstein, Davis, Skolnick & White proposed using polymorphic DNA markers for genetic mapping, even if they had no known phenotypic effect • Example (at the time): restriction sites • SNPs that create or destroy a restriction site → restriction fragment length polymorphisms (RFLPs) • Did linkage mapping with restriction sites • Now we can use SNPs directly
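How a SNP becomes an RFLP can be shown by checking whether a base change creates or destroys a restriction enzyme recognition site; if it does, digestion yields fragments of different lengths in the two alleles. The example uses the EcoRI site GAATTC on a hypothetical DNA fragment with a hypothetical SNP; the function names are illustrative.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    seq, site = seq.upper(), site.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1)
               if seq[i:i + len(site)] == site)


def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position `pos` replaced by `alt`."""
    if not 0 <= pos < len(seq):
        raise IndexError("SNP position outside sequence")
    return seq[:pos] + alt + seq[pos + 1:]


ref = "TTAGAATTCGGA"           # hypothetical fragment with one EcoRI site
alt = apply_snp(ref, 5, "G")   # hypothetical A -> G SNP inside the site
print(count_sites(ref, ECORI_SITE), count_sites(alt, ECORI_SITE))  # 1 0
```

The reference allele is cut by EcoRI and the variant allele is not, so the SNP can be genotyped from fragment sizes without sequencing, which is what made restriction sites usable as markers.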
Traits depending on multiple loci • Use of SNPs to identify traits, including but not limited to diseases, that depend on multiple loci • Single genes for diseases showing simple Mendelian inheritance (for instance, cystic fibrosis) can be isolated • Diseases that depend on interactions among multiple loci can be studied given enough SNP linkage information
SNPs tell us about human history • Development of ability to digest lactose past infancy correlated with domestication of cattle, increased (non-fermented) dairy products in human diet • Source of calcium and calories • Many Asian populations retain adult lactose intolerance • Where do they get calcium? “The soybean is the cow of Asia.”
Ability to digest lactose in adulthood • Digestion of lactose depends on the enzyme lactase-phlorizin hydrolase, which catalyzes hydrolysis of lactose → glucose + galactose
Ability to digest lactose in adulthood • In many people, the ability to digest lactose is a juvenile characteristic • Expression declines after age 2 • varies among individuals • Consistent with lifestyle involving breast feeding until this age, followed by weaning followed by diet not including (non-fermented) milk and other dairy products • To form yoghurt, bacteria cleave lactose
Evolution of adult lactase expression • Domestication of cattle, with the concomitant rise of milk in the diet, led to selective pressure for lactose tolerance • Mutation arose among cattle-raising people: • the Funnel Beaker culture • north-central Europe ~5,000-6,000 years ago • Most common mutations in Europeans: SNPs • C/T-13910 • G/A-22018 • Not surprisingly, these lie in control regions for the lactase gene (the numbers give positions upstream of the gene)
Prevalence of lactose-tolerance SNP Group Study Exchange http://gseorlando.files.wordpress.com/2010/09/j.jpg
Multiple origins of lactose tolerance • Lactose tolerance apparently arose four times, independently • Europe: C/T-13910 and G/A-22018 • Pastoral areas of Africa, three independent mutations: • G/C-14010 East Africa • T/G-13915 North Sudan • C/G-13907 North Kenya
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672153/bin/ukmss-4417-f0002.jpg
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672153/bin/ukmss-4417-f0001.jpg
SNPs in anthropology • Useful in tracing relationships between populations, migration routes • Initially used mitochondrial DNA (16569 bp) • Maternal inheritance only • (Y chromosome gives paternal inheritance only) • Important argument for “out of Africa” theory of human origins and dispersal • Can choose non-selected regions, in contrast to previous work on blood groups, MHC haplotypes
Migration routes into Asia and the Pacific based on SNPs http://i49.tinypic.com/2d0j2py.jpg
DNA sequences and language groups • Proposal by L. L. Cavalli-Sforza • Showed consistency between trees based on genetic markers and trees based on linguistic groupings • Controversial! • In some cases, genomics has confirmed hypotheses of population affinity based on language similarity / dissimilarity • Basques are outliers in both genes and language
Recommended reading Tomasz Kamusella The Politics of Language and Nationalism in Modern Central Europe Palgrave Macmillan, 2008
What happens after invasions? • Hungary invaded by Magyars in 896 AD; the country converted to speaking a Uralic language • Rome fell to Germanic invaders in 476 AD, but they did NOT impose their language. (Perhaps recognising the superiority of Italian culture, which their descendants don’t) • England invaded by Anglo-Saxons in about the 5th century. Anglo-Saxon pushed Celtic languages to the far reaches of the British Isles + Brittany • The Norman invasion of 1066 did NOT entirely replace Anglo-Saxon by French.
Possible effects of SNPs • In protein-coding sequences • silent • missense • sense codon → stop codon • stop codon → sense codon • SNPs can lead to dysfunctional proteins • In splice sites • 15% of disease-causing mutations in the human genome are point mutations in the vicinity of mRNA splice junctions • In regulatory sequences
What are possible effects of SNPs in coding sequences? • Change in amino acid • Example: sickle-cell anaemia • sense codon → stop codon • protein truncated • stop codon → sense codon • protein extended
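The categories above (silent, missense, sense → stop, stop → sense) can be assigned mechanically by translating the reference and variant codons with the standard genetic code. A minimal sketch, assuming codons are given on the coding strand as DNA; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify the effect of a codon change on the encoded protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous (silent)"
    if alt_aa == "*":
        return "nonsense (sense codon -> stop; protein truncated)"
    if ref_aa == "*":
        return "stop-loss (stop -> sense codon; protein extended)"
    return f"missense ({ref_aa} -> {alt_aa})"


print(classify_codon_change("GAG", "GTG"))  # missense (E -> V), sickle-cell β6 Glu→Val
print(classify_codon_change("GAG", "GAA"))  # synonymous (silent)
print(classify_codon_change("GAG", "TAG"))  # nonsense (sense codon -> stop; protein truncated)
print(classify_codon_change("TAA", "CAA"))  # stop-loss (stop -> sense codon; protein extended)
```

This only covers effects on the amino-acid sequence; as noted earlier, a "silent" change can still matter through splicing or regulation, which a codon table cannot see.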