SNP and Variation Ka-Lok Ng Asia University
References • http://www.mun.ca/biology/scarr/4241rm_chapter31.html • http://www.bioinfo.rpi.edu/~bystrc/courses/biol4540/lecture21/lec21.pdf
Introduction • Once genomes have been sequenced, genomics turns to the nature and distribution of variation between individuals • Variation at the DNA level = nucleotide insertions, deletions, and single nucleotide polymorphisms (SNPs), also called small nucleotide polymorphisms • SNP refers to any site where two or more different nucleotides are segregating in a population • A cluster of linked SNPs = haplotype • SNPs and haplotypes are an increasingly important component of biological studies, ranging from ecology and evolution to biomedicine (disease association studies) • These variations are used to characterize population structure and history and to study gene function • They are indispensable for recombination mapping (linkage analysis) and serve as positional markers for physical mapping • SNPs are the most common genetic variations, occurring about once every 100 to 300 bases.
The Nature of Single Nucleotide Polymorphisms Classification of SNPs • Most common = a change from one base to another • This can be either a transversion or a transition • SNPs can also be insertions and deletions, termed “indels” • Some geneticists also count two-nucleotide changes and small insertions/deletions of a few nucleotides as SNPs, so simple-nucleotide polymorphism may be a better description • Microsatellites, longer sequence repeats, and any other molecular polymorphism (transposable element insertions, deletions, chromosome inversions and translocations, and aneuploidy) are not regarded as SNPs • Aneuploidy is an error in cell division that results in the "daughter" cells having the wrong number of chromosomes. In some cases a chromosome is missing, while in others there is an extra one.
Classification of SNP’s • SNP’s classified on nature of affected nucleotide • Noncoding SNP– 5’ or 3’ nontranscribed region (NTR), 5’ or 3’ untranslated region (UTR), intron, or intergenic
3.1 (Part 1) Human promoter SNPs that affect gene expression • Coding SNP – replacement polymorphisms (change the amino acid encoded) or synonymous polymorphisms (change the codon but not the amino acid) • Nonreplacement polymorphisms include both synonymous and noncoding polymorphisms, but they can still affect gene function through effects on transcriptional or translational regulation, splicing, or RNA stability. • This type of polymorphism contributes substantially to genetic variation (Fig 3.1). • Fig. 3.1 – a collection of over 140 human promoter SNPs that have been associated with an effect on gene expression or TF binding, and in many cases, a clinical outcome Fig. 3.1. Human promoter SNPs that affect gene expression. These loci, dispersed throughout the human genome, are ones for which a SNP has been implicated in modulation of transcript levels, either by statistical association or by a biochemical assay in cell lines. The figure shows where some of these nonreplacement polymorphisms lie.
3.1 (Part 2) Human promoter SNPs that affect gene expression Fig. 3.1. Human promoter SNPs that affect gene expression.
SNPs can also be classified as transitions or transversions • Transitions – change a purine to a purine (A ↔ G) or a pyrimidine to a pyrimidine (C ↔ T) • Transversions – change a purine to a pyrimidine or vice versa (A or G ↔ C or T) • Transitions occur at least as frequently as transversions and are in fact more prevalent, even though twice as many transversions are possible • This holds broadly true for both coding and noncoding SNPs • It is in part a result of differences in the mechanisms by which the different types of mutation arise and are repaired • Due to the nature of the genetic code, transitions are less likely than transversions to change the encoded amino acid • Transitions are therefore more likely to leave the protein-coding sequence intact, so (number of transitions)/(number of transversions) > 1 in coding regions
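The transition/transversion classification can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example SNP list are hypothetical and do not come from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_possible_changes() -> dict:
    """Count all 12 ordered base changes by class (4 transitions, 8 transversions)."""
    counts = {"transition": 0, "transversion": 0}
    for ref in BASES:
        for alt in BASES:
            if ref != alt:
                counts[classify_substitution(ref, alt)] += 1
    return counts


def ti_tv_ratio(snps: list) -> float:
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    ti = sum(1 for ref, alt in snps if classify_substitution(ref, alt) == "transition")
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


# Hypothetical SNP list, for illustration only.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(count_possible_changes())
print(ti_tv_ratio(example_snps))
```

Enumerating the ordered base changes shows why a ratio above 1 is notable: there are twice as many possible transversions as transitions.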
Synonymous • TGT → TGC results in Cys → Cys Nonsynonymous: replacement • TGT → TGG results in Cys → Trp • can be conservative or nonconservative • Nonsynonymous: nonsense mutation, introduction of a stop codon • TGT → TGA results in Cys → stop • Nonsynonymous: read-through mutation • TAA → TTA results in stop → Leu
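The four categories can be checked against the standard genetic code. The sketch below builds the standard codon table from its usual compact form; the function name `classify_codon_change` is hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"          # a stop codon is introduced
    if ref_aa == "*":
        return "read-through"      # a stop codon is lost
    return "replacement"           # one amino acid is swapped for another


for ref, alt in [("TGT", "TGC"), ("TGT", "TGG"), ("TGT", "TGA"), ("TAA", "TTA")]:
    print(ref, "->", alt, CODON_TABLE[ref], "->", CODON_TABLE[alt],
          classify_codon_change(ref, alt))
```

In the standard code TTA encodes leucine (L), so the read-through example replaces a stop codon with Leu.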
SNP and disease • Sickle-cell anaemia – a disease caused by a specific SNP: an A → T mutation (GTGAG → GTGTG) in the β-globin gene changes Glu → Val, creating a sticky surface on the haemoglobin molecule that leads to polymerization of the deoxy form SNP and blood groups – A, B and O alleles • A and B alleles differ by four SNP substitutions • They code for related enzymes that add different saccharide (sugar, general formula (CH2O)n) units to an antigen on the surface of red blood cells (rbc)
Allele   Sequence             Saccharide
A        ….gctggtgacccctt     N-acetylgalactosamine
B        ….gctcgtcaccgcta     galactose
O        ….cgtggt-acccctt     --
• The O allele has undergone a mutation causing a frameshift (the "-" in the sequence above) and produces no active enzyme. The rbc of type O contain neither the A nor the B antigen. This is why people with type O blood are universal donors in blood transfusions. The loss of activity of the protein does not seem to carry any adverse consequences. The ABO antigens are terminal sugars found at the end of long sugar chains (oligosaccharides) that are attached to lipids on the red cell membrane. The A and B antigens are the last sugar added to the chain. The "O" antigen is the absence of the A or B antigen; type O cells carry the largest amount of the next-to-last terminal sugar, which is called the H antigen. http://matcmadison.edu/is/hhps/mlt/mljensen/BloodBank/lectures/abo_blood_group_system.htm
In classical population genetic theory, a genetic locus is regarded as polymorphic only if the frequency of the most common allele is < 95%, i.e. the other alleles together make up more than 5% • Most SNPs are first detected in a sample of fewer than 10 individuals, so the frequency criterion is not applied; all single nucleotide changes are described initially as candidate SNPs. • NCBI – dbSNP http://www.ncbi.nlm.nih.gov/SNP/index.html • Seattle SNP http://pga.mbt.washington.edu
From Fig. 3.1, take ‘FY’ on chromosome 1 and run an NCBI search • NCBI SNP keyword: FY AND homo • refSNP ID: rs17851571
Comment - polymorphisms ≠ mutations Confusion arises over the distinction between polymorphisms and mutations, largely due to dual usage of the term “mutation”. All SNPs arise as mutations, in the sense that the conversion of one nucleotide into another is a mutational event. But by the time a sequence variant is observed in a population, the event that created it is usually long past, so the observed SNP is no longer a mutation; it is just a rare sequence variant or a polymorphism. Since the distinction applies to only a small fraction of all SNPs, the term polymorphism is more general.
Distribution of SNPs • The distribution of SNPs lies within the domain of population genetics • The study of the relationship between SNPs and phenotypic variation lies in the domain of quantitative genetics • Application of SNPs: quantitative trait loci (QTL), which are loci that contribute to polygenic phenotypic variation Neutral theory of molecular evolution • Balance between mutation and genetic drift • Rate at which mutations are introduced into a population = rate at which polymorphisms are lost • Most mutations, whether deleterious, advantageous or neutral in effect, are lost within a few generations • The effect of selection – acts to reduce the frequency of slightly deleterious alleles, but on occasion tends to favor a new allele (positive selection) or maintain two or more polymorphisms (balancing selection) at some loci
Three key concepts are important in characterizing SNP variation • Allele frequency distribution • Linkage disequilibrium • Population stratification Aspects of frequency distribution • Population structure - example: a SNP can be more frequent in one population than in another. Migration is a potent source of diversity, and isolation affects the rate at which variation is lost to drift. • Nucleotide diversity - the average fraction of nucleotides that differ between a pair of alleles chosen at random from a population • Humans (Homo sapiens) - lower nucleotide diversity, with an average of one SNP per kb between the chromosomes of any two individuals • Fly and maize - an order of magnitude greater polymorphism, with one SNP every 50-100 bp Linkage Disequilibrium and Haplotype Maps • Linkage disequilibrium (LD) - non-random association of alleles • LD allows mapping of disease loci in large populations • In humans, LD is commonly observed over several tens of kb, and in many cases ~100 kb, on either side of a SNP • LD affects haplotypes, which display a clustered distribution • Broad approximation: genome = tens of thousands of blocks • Each block = up to 100,000 bases, with 3 to 5 common haplotypes • Each haplotype = tens or hundreds of SNPs in LD • International HapMap Project - effort to map all common haplotypes in the human genome Population Stratification - the partitioning of genetic variation among populations within a species
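Nucleotide diversity as defined above can be computed directly from a set of aligned alleles. The following is a minimal sketch that weights every pair of sampled sequences equally; the function name and the four example sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list) -> float:
    """Average fraction of sites that differ between all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every pair, then average over pairs and sites.
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical sample of four aligned alleles, 10 bp each.
sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(sample))
```

On this scale, a value of about 0.001 corresponds to one difference per kb (the human figure quoted above), and 0.01 to 0.02 corresponds to one difference every 50-100 bp.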
3.2 (Part 1) Nucleotide diversity in natural populations Fig. 3.2 Nucleotide diversity in natural populations. (A) Observed and expected SNP frequencies for 874 SNPs from 75 candidate human hypertension loci. Rare alleles are the most frequent, and the number of SNPs in each frequency class declines as the rarer allele becomes more common. In a sample of several hundred alleles, the most common class of SNPs are singletons (which appear only once in the sample), followed by doubletons, tripletons, and so on. Only between 1/3 and 1/2 of all SNPs are “common” in the sense that the rarer allele is present in more than 5% of the individuals.
3.2 (Part 2) Nucleotide diversity in natural populations (B) LD (D’) decays with time (number of generations) in proportion to the recombination rate r. (C) The level of nucleotide diversity is a function of recombination rate, and hence chromosomal position, as in this example for fruit-fly. (B) As the number of generations ↑, recombination separates linked SNPs (no more clustering), so LD ↓ (C) As r ↑, nucleotide diversity ↑
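The LD measure and its decay in panel (B) can be sketched numerically. The code below assumes the standard two-locus definitions, D = p(AB) - p(A)p(B), with |D'| obtained by dividing |D| by its maximum possible value, and the textbook expectation D_t = D_0 (1 - r)^t under random mating. The function names and the input frequencies are hypothetical and are not taken from the figure.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, |D'|) from the AB haplotype frequency and the A and B allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime


def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """Expected D after a number of generations with recombination rate r."""
    return d0 * (1 - r) ** generations


# Hypothetical frequencies: A and B always occur together (complete LD).
d0, d_prime = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d0, d_prime)
for t in (0, 10, 100, 1000):
    print(t, ld_after_generations(d0, r=0.01, generations=t))
```

With a larger r the same loop shows a faster decline, which is the dependence on recombination rate described for panel (B).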
dbSNP accepts submissions for SNP, microsatellite repeats, and small-scale deletion and insertion polymorphisms NCBI – dbSNP http://www.ncbi.nlm.nih.gov/SNP/index.html dbSNP summary for various species
dbSNP • Submitted data • The submitter HANDLE is a short tag that uniquely identifies each submitting laboratory in the database • Each submitted SNP record receives a unique ssSNP identifier, such as ss4923558 (HANDLE = YUSUKE) • Keyword: ss4923558 AND homo • The keyword ss4923558 alone returns multiple records: more than 11 rsSNP records • More than one submitter → more than one ssSNP → these ssSNPs are clustered into a reference SNP identifier (rsSNP)
Alleles: A/G Ancestral Allele: G Handle: YUSUKE, EGP_SNPS, PERLEGEN, ABI Fasta seq.: >gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606|snpclass=1|alleles='A/G'|mol=Genomic|build=126 dbSNP
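The FASTA header shown above is a pipe-delimited list of fields, most of them `key=value` pairs. A small parser for this one header might look like the sketch below; the function name is hypothetical, and the field layout is assumed from this single example rather than from a dbSNP specification.

```python
def parse_dbsnp_fasta_header(header: str) -> dict:
    """Parse a header such as '>gnl|dbSNP|rs...|key=value|...' into a dict."""
    fields = header.strip().lstrip(">").split("|")
    # Assumption: the first three fields are 'gnl', the database name, and the rs ID.
    record = {"database": fields[1], "rs_id": fields[2]}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        value = value.strip("'")
        # Convert purely numeric values (positions, lengths, build) to integers.
        record[key] = int(value) if value.isdigit() else value
    if "alleles" in record:
        record["alleles"] = record["alleles"].split("/")
    return record


header = (">gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606"
          "|snpclass=1|alleles='A/G'|mol=Genomic|build=126")
record = parse_dbsnp_fasta_header(header)
print(record["rs_id"], record["alleles"], record["allelePos"], record["totalLen"])
```

Here `allelePos=301` within `totalLen=601` places the variant in the middle of the sequence, with 300 bases of flank on each side.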
Go to the bottom of the page • JBIC – sample size 1270, Allele frequency of A and G • Other populations have a smaller sample size
Click NCBI Assay ID ss4923558 • Japanese Millennium Genome Project • Measured in a group of East Asian DNA samples • There is no individual genotype data for ss4923558 • Click Handle|Submitter ID • YUSUKE|IMS-JST082810 Allele frequency G : 0.8929 A : 0.1071 Sample Size : 1270 (number of chromosomes)
Entrez SNP search terms • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Snp
Alleles SNP integration in Genome Browsers Ensembl http://www.ensembl.org/index.html rs3737559 • SNP rs3737559 is located in the following transcripts • Genotype and Allele frequencies per population BioMart
The local DNA sequence within 100 kb on either side of the SNP is shown.
The different types of SNPs are color coded by type (e.g. coding, intronic, flanking or other). Deletion and insertion polymorphisms are indicated with a triangle. The letters (K, M, R, S, W, Y) inside the SNP squares indicate the type of SNP using IUPAC ambiguity codes.
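The six letters are the IUPAC ambiguity codes for two-base combinations. A lookup in Python might look like this; the dictionary and function names are hypothetical.

```python
# IUPAC ambiguity codes for the six possible pairs of bases.
IUPAC_TWO_BASE = {
    "R": frozenset("AG"),  # purines
    "Y": frozenset("CT"),  # pyrimidines
    "S": frozenset("CG"),
    "W": frozenset("AT"),
    "K": frozenset("GT"),
    "M": frozenset("AC"),
}
ALLELES_TO_CODE = {alleles: code for code, alleles in IUPAC_TWO_BASE.items()}


def iupac_code(allele_1: str, allele_2: str) -> str:
    """Return the IUPAC ambiguity letter for a biallelic SNP."""
    alleles = frozenset((allele_1.upper(), allele_2.upper()))
    if alleles not in ALLELES_TO_CODE:
        raise ValueError(f"not a biallelic single-base SNP: {allele_1}/{allele_2}")
    return ALLELES_TO_CODE[alleles]


print(iupac_code("A", "G"))  # an A/G SNP such as rs3737559
```

R and Y mark transitions (A/G and C/T); the other four letters mark transversions.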
SNP UCSC Genome browser http://genome.ucsc.edu/cgi-bin/hgGateway BRCA1 gene
NCBI Entrez Gene Gene: BRCA1 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=brca1 SNP GeneView The coding SNPs in the BRCA1 gene. Those that do not change the amino acid are colored green; those that result in a different amino acid are colored red.
SNP association studies Association studies • A case group of people vs. a control group of people • The case group - people who are diagnosed with some disease (e.g. cystic fibrosis), react to some type of medicine, or are especially healthy (e.g. more than 100 years old) • The control group - people who do not exhibit the feature selected for the case group • For case-control studies, a selection of SNPs is genotyped in both the case and control groups • Allele frequency (case group) > allele frequency (control group) → potential markers for the observed phenotype
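One common way to judge whether an allele is more frequent in cases than in controls is a chi-square test on the 2x2 table of allele counts. The slides do not name a specific test, so the sketch below is an assumption chosen for illustration, and the counts are hypothetical.

```python
import math


def allele_association_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> tuple:
    """Pearson chi-square (1 degree of freedom) for a 2x2 table of allele counts.

    Rows are case/control, columns are allele A / allele B.
    Returns (chi2, p_value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if min(row_case, row_ctrl, col_a, col_b) == 0:
        raise ValueError("every row and column must contain at least one observation")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A on 60 of 100 case chromosomes, 40 of 100 control chromosomes.
chi2, p_value = allele_association_chi2(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(chi2, p_value)
```

A real study genotypes many SNPs, so a single small p-value would still need correction for multiple testing before the SNP is treated as a candidate marker.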
SNP and disease • Functional variation – a SNP may be associated with a nonsynonymous substitution in a coding region