Single Nucleotide Polymorphism

Single Nucleotide Polymorphism. Anshu Bhardwaj, Research Fellow, Centre for Cellular & Molecular Biology, Hyderabad, 8th November, 2003. Single Nucleotide Polymorphism: single base-pair differences occurring in a population with a frequency of >1%. ... C C A T T G A C.

Presentation Transcript


  1. Single Nucleotide Polymorphism. Anshu Bhardwaj, Research Fellow, Centre for Cellular & Molecular Biology, Hyderabad. 8th November, 2003

  2. Single Nucleotide Polymorphism: single base-pair differences occurring in a population with a frequency of >1%
     Allele 1: ...CCATTGAC... / ...GGTAACTG...
     Allele 2: ...CCGTTGAC... / ...GGCAACTG...
     (the two alleles differ at a single base pair: A·T in allele 1, G·C in allele 2)

  3. SNPs can be found in:
     • NON-CODING REGIONS: 5′ and 3′ UTRs, introns, splice sites
     • CODING REGIONS: non-synonymous (amino acid substitution) and synonymous (silent)
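A coding SNP is synonymous when the altered codon still encodes the same amino acid, and non-synonymous when it does not. A minimal Python sketch of that check, assuming the reference codon on the coding strand and the SNP's position within it are already known; the function name `classify_coding_snp` is hypothetical, and only the standard genetic code is used:

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change inside a codon (hypothetical helper).

    codon    -- reference codon on the coding strand, e.g. "GAG"
    position -- 0-based position of the SNP within the codon (0, 1 or 2)
    alt_base -- the alternative base at that position
    """
    codon = codon.upper()
    alt_codon = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous (silent)"
    return f"non-synonymous ({ref_aa} -> {alt_aa})"


print(classify_coding_snp("GAG", 1, "T"))  # non-synonymous (E -> V)
print(classify_coding_snp("CTT", 2, "C"))  # synonymous (silent)
```

The first call is the glutamate-to-valine change behind sickle-cell haemoglobin; the second swaps one leucine codon for another, so the protein is unchanged.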

  4. MUTATION vs. POLYMORPHISM: a single base-pair difference is called a polymorphism when it occurs in a population with a frequency of >1%; rarer variants are usually referred to as mutations
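The >1% cut-off can be applied directly to allele counts. A minimal sketch, assuming a biallelic site and one observed allele per sampled chromosome; `classify_variant` and its output labels are illustrative, not a standard tool:

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return (minor_allele, frequency) for a biallelic site.

    alleles -- one observed base per sampled chromosome, e.g. ["A", "G", "A", ...]
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return None, 0.0  # only one allele seen: monomorphic in this sample
    # most_common() sorts by count, so the last entry is the rarer allele
    minor, minor_count = counts.most_common()[-1]
    return minor, minor_count / len(alleles)


def classify_variant(alleles, threshold=0.01):
    """Label a site a polymorphism when its minor allele frequency exceeds 1%."""
    _, maf = minor_allele_frequency(alleles)
    if maf == 0.0:
        return "monomorphic"
    return "polymorphism" if maf > threshold else "rare variant (mutation)"


print(classify_variant(["A"] * 95 + ["G"] * 5))   # polymorphism
print(classify_variant(["A"] * 995 + ["G"] * 5))  # rare variant (mutation)
```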

  5. GENOTYPIC FREQUENCY: the relative distribution of genotypes in a population for a particular locus

  6. ALLELIC FREQUENCY: the relative abundance of an allele of a particular gene with reference to its other alleles
     Let p = f(M) and q = f(N). Thus p = f(MM) + ½ f(MN) and q = f(NN) + ½ f(MN).
     Location    MM (%)   MN (%)   NN (%)   p       q
     Greenland   83.5     15.6     0.9      0.913   0.087

  7. Genotype counts for the Greenland sample:
     Genotype   No. of individuals   Genotypic frequency
     MM         5118                 5118/6129 = 83.5%
     MN         956                  956/6129 = 15.6%
     NN         55                   55/6129 = 0.9%
     Total      6129
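The genotypic and allelic frequencies follow from the raw counts by simple arithmetic. A minimal Python sketch using the Greenland MN counts from this slide (the function names are hypothetical):

```python
def genotype_frequencies(counts):
    """counts: dict mapping genotype ("MM", "MN", "NN") to number of individuals."""
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(counts):
    """p = f(MM) + 1/2 f(MN) and q = f(NN) + 1/2 f(MN)."""
    f = genotype_frequencies(counts)
    p = f["MM"] + 0.5 * f["MN"]
    q = f["NN"] + 0.5 * f["MN"]
    return p, q


greenland = {"MM": 5118, "MN": 956, "NN": 55}
for genotype, freq in genotype_frequencies(greenland).items():
    print(f"{genotype}: {freq:.1%}")  # MM: 83.5%, MN: 15.6%, NN: 0.9%
p, q = allele_frequencies(greenland)
print(f"p = {p:.3f}, q = {q:.3f}")    # p = 0.913, q = 0.087
```

Equivalently, p is the number of M alleles divided by the total number of alleles: (2 × 5118 + 956) / (2 × 6129) = 11192 / 12258 ≈ 0.913.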

  8. WHY SNPs?
     • SNPs are distributed non-randomly throughout the genome
     • On average, one SNP is found for every ~1 kb of the human genome, giving approximately 3 million SNPs
     • Large number
     • Unambiguous assay techniques
     • High levels of polymorphism in the population
     • Most phenotypic differences arise from SNPs in genes, but these form only a small fraction of the total number of SNPs

  9. dbSNP DENSITY DISTRIBUTION IN HUMAN
     • Mean density: 0.001765 SNPs per base (17.652 SNPs per 10 kb)
     • Mean spacing: 566.5118 bases per SNP
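Mean density and mean spacing are two views of the same count: density = number of SNPs / region length, and spacing = region length / number of SNPs. A minimal sketch; the `snp_density` helper and the example positions are hypothetical:

```python
def snp_density(positions, region_length):
    """Return (SNPs per base, SNPs per 10 kb, mean bases per SNP) for one region."""
    n = len(positions)
    if n == 0 or region_length <= 0:
        raise ValueError("need at least one SNP and a positive region length")
    per_base = n / region_length
    return per_base, per_base * 10_000, region_length / n


# Hypothetical 10 kb region containing five SNPs
print(snp_density([120, 2300, 4100, 7000, 9800], 10_000))

# The dbSNP figures are consistent: the reciprocal of the density is the spacing
print(round(1 / 0.0017652, 1))  # 566.5
```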

  10. SNP Discovery
     • SNP discovery refers to the initial identification of new SNPs
     • The established method is DNA sequencing (electrophoresis-based), with subsequent data analysis. Some indirect discovery techniques (e.g., dHPLC, SSCP) only indicate that a SNP (or other mutation) exists
     • DNA sequencing of multiple individuals is used to determine the position and type of the polymorphism
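Comparing sequences from multiple individuals amounts to scanning an alignment column by column for positions where more than one base is observed. A minimal sketch, assuming the sequences are already aligned to equal length; the samples reuse the allele pair from slide 2, and `find_snps` is a hypothetical helper:

```python
from collections import Counter


def find_snps(aligned_seqs):
    """Report every alignment column where more than one base is observed.

    aligned_seqs -- equal-length sequences, one per individual or chromosome.
    Returns a list of (0-based position, Counter of bases).
    Gaps ("-") and ambiguous calls such as "N" are ignored in this sketch.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(length):
        column = Counter(s[pos] for s in aligned_seqs if s[pos] in "ACGT")
        if len(column) > 1:
            snps.append((pos, column))
    return snps


samples = ["CCATTGAC", "CCGTTGAC", "CCATTGAC", "CCATTGAC"]
for pos, bases in find_snps(samples):
    print(pos + 1, dict(bases))  # 3 {'A': 3, 'G': 1}
```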

  11. SNP Validation
     • SNP validation refers to genetic validation: the process of ensuring that the SNP is not due to a sequencing error
     • Confirmation of the SNPs found during discovery
     • Screening larger numbers of individuals to obtain statistical data on the SNP's occurrence in the population

  12. THE EXPERIMENTAL APPROACH
     • RESTRICTION FRAGMENT LENGTH POLYMORPHISM (RFLP)
     • SINGLE-STRANDED CONFORMATIONAL POLYMORPHISM (SSCP)
     • DENATURING HIGH PRESSURE LIQUID CHROMATOGRAPHY (dHPLC)
     • HYBRIDIZATION METHOD
     • MALDI-TOF METHOD
     • SEQUENCING & ALIGNMENT THEREAFTER

  14. IN SILICO SNP PREDICTION
     • POLYBAYES
     • SEAN
     • SNP Prediction Program
     • SNP Finder

  16. Restriction Fragment Length Polymorphisms (Botstein et al., 1980)
     CHANGES IN MIGRATION PATTERNS THAT REPRESENT ALLELIC VARIATION
     [Figure: probe hybridization to homologs 1 and 2 in three individuals. A: homolog 1 gives a 3 kb fragment, homolog 2 gives 1 kb and 2 kb fragments. B: homologs 1 & 2 both give a 3 kb fragment. C: homologs 1 & 2 both give 1 kb and 2 kb fragments.]
     CAN BE USED TO DETECT SNPs DIFFERENTIALLY IN HOMOZYGOUS & HETEROZYGOUS INDIVIDUALS
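The RFLP logic can be simulated in a few lines: a SNP that destroys a restriction site merges two fragments into one longer fragment, so the two homozygotes and the heterozygote give distinct band patterns. A minimal sketch on toy 30 bp sequences (not real data), assuming the EcoRI site GAATTC cut after its first base, and ignoring the probe so that every fragment counts as a visible band:

```python
def find_sites(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def digest(seq, site, cut_offset):
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset -- cut position within the site; 1 models EcoRI (G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def rflp_bands(allele1, allele2, site="GAATTC", cut_offset=1):
    """Distinct band sizes on a gel for one individual carrying two homologs."""
    bands = set(digest(allele1, site, cut_offset)) | set(digest(allele2, site, cut_offset))
    return sorted(bands)


with_site = "A" * 10 + "GAATTC" + "A" * 14     # allele carrying the restriction site
without_site = "A" * 10 + "GAGTTC" + "A" * 14  # A->G SNP destroys the site
print(rflp_bands(with_site, with_site))        # [11, 19]      homozygous, site present
print(rflp_bands(without_site, without_site))  # [30]          homozygous, site absent
print(rflp_bands(with_site, without_site))     # [11, 19, 30]  heterozygous
```

The pattern mirrors the figure: the heterozygote shows both the uncut fragment and the two cut fragments.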

  17. MALDI-TOF METHOD: matrix-assisted laser desorption/ionization time-of-flight (mass spectrometry)

  18. [Figure: MALDI-TOF schematic labelled laser, sample, high-voltage source, drift region and detector]
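In the drift region, every ion carries the same kinetic energy zeU = mv²/2 from the source, so its flight time grows with the square root of its mass: t = L·sqrt(m / (2zeU)). A minimal sketch of that relation; the voltage, drift length and masses below are arbitrary illustrative values, not instrument settings:

```python
import math

DALTON_KG = 1.66053906660e-27        # unified atomic mass unit, kg
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, C


def flight_time(mass_da, charge, voltage, drift_length_m):
    """Time (s) for an ion to cross a field-free drift region.

    All ions receive kinetic energy z*e*U = m*v**2/2 in the source, so
    v = sqrt(2*z*e*U/m) and t = L/v: heavier ions reach the detector later.
    """
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return drift_length_m / velocity


t_light = flight_time(5000, 1, 20_000, 1.0)
t_heavy = flight_time(5016, 1, 20_000, 1.0)
print(f"{t_light * 1e6:.3f} us vs {t_heavy * 1e6:.3f} us")
```

Because t scales with sqrt(m), a fourfold heavier ion takes exactly twice as long, which is how the detector's arrival times are converted back into masses.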

  19. SEQUENCING METHOD:

  20. POLYBAYES: BAYESIAN INFERENCE ENGINE TO CALCULATE THE PROBABILITY THAT A GIVEN SITE IS POLYMORPHIC
     • FRAGMENT CLUSTERING
     • PARALOGUE IDENTIFICATION
     • MULTIPLE ALIGNMENT

  21. SNP DETECTION IN REDUNDANT SEQUENCE DATA: SEQUENCE CLUSTERING → CLUSTER REFINEMENT → MULTIPLE ALIGNMENT → SNP DETECTION

  22. The PolyBayes Approach
     • Use genomic sequence as reference
     • Cluster and align all available sequences
     • Remove repeats/paralogs
     • Use Bayesian statistics to distinguish polymorphic sites from artifacts and to estimate likelihood
     Marth, G.T., Korf, I., Yandell, M.D., Yeh, R.T., Gu, Z., Zakeri, H., Stitziel, N.O., Hillier, L., Kwok, P.-Y., Gish, W.R.: A general approach to single-nucleotide polymorphism discovery. Nature Genet. 1999; 23:452-456.

  23. 1. Known repeat sequences are masked using RepeatMasker.
     2. FRAGMENT CLUSTERING
        (a) WU-BLAST is used to search against dbEST
        (b) Sequence traces are processed with PHRED base-calling quality values
        (c) Distinct groups of matching ESTs are registered as clusters
     3. Each cluster member is pair-wise aligned to the genomic anchor sequence with CROSS_MATCH.
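PHRED quality values are log-scaled error probabilities, Q = -10·log10(P), so Q20 means a 1% chance that the base call is wrong and Q30 means 0.1%. A small sketch of the conversion; summing the per-base error probabilities gives the expected number of errors in a read (the helper names are hypothetical):

```python
import math


def phred_to_error(q):
    """Probability that a base call is wrong: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Inverse mapping: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read with the given PHRED values."""
    return sum(phred_to_error(q) for q in qualities)


read_qualities = [40, 35, 30, 20, 12]  # hypothetical per-base PHRED values
print(expected_errors(read_qualities))
```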

  24. PARALOGUE IDENTIFICATION
     1. Paralogues may give rise to false SNP predictions and point to difficulties during marker development.
     2. Calculate the probability P_NAT that a cluster member is derived from the anchor's genomic region.
     3. Distinguish between:
        - less accurate sequences that nevertheless originate from the same underlying genomic location, and
        - more accurate sequences with high-quality discrepancies that are likely to be paralogous.
     4. Using a threshold value P_NAT,MIN, paralogous cluster members are removed.

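  25. Probability that a cluster member is native (derived from the anchor's genomic region) rather than paralogous:
     $$P(\mathrm{MODEL}_{\mathrm{NAT}} \mid d) = \frac{1}{1 + e^{(D_{\mathrm{NAT}} - D_{\mathrm{PAR}})} \left( D_{\mathrm{PAR}} / D_{\mathrm{NAT}} \right)^{d}}$$
     $$D_{\mathrm{NAT}} = L \cdot P_{\mathrm{POLY},2} + E \qquad (P_{\mathrm{POLY},2} = 0.001)$$
     $$D_{\mathrm{PAR}} = L \cdot P_{\mathrm{PAR}} + E \qquad (P_{\mathrm{PAR}} = 0.02)$$
     where d = number of discrepancies, L = length of the aligned region and E = expected number of sequencing errors.
     Threshold: $P_{\mathrm{NAT,MIN}} = 0.75$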
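The native-vs-paralogue probability reads as a comparison of two Poisson models for the discrepancy count d, with equal prior weight on the native and paralogous models (our reading of the slide, not checked against the PolyBayes source). A minimal sketch that evaluates it in log space and applies the 0.75 threshold; `p_native` is a hypothetical name, and the length and error values are illustrative:

```python
import math


def p_native(length, discrepancies, expected_errors, p_poly2=0.001, p_par=0.02):
    """Posterior probability that a cluster member is native, not paralogous.

    Discrepancy counts are modelled as Poisson with means
        D_NAT = L * P_POLY,2 + E   and   D_PAR = L * P_PAR + E,
    equal priors are assumed, giving
        P(NAT | d) = 1 / (1 + exp(D_NAT - D_PAR) * (D_PAR / D_NAT) ** d).
    """
    d_nat = length * p_poly2 + expected_errors
    d_par = length * p_par + expected_errors
    # log of P(d | PAR) / P(d | NAT); logs avoid overflow for large d
    log_ratio = (d_nat - d_par) + discrepancies * math.log(d_par / d_nat)
    if log_ratio > 0:
        r = math.exp(-log_ratio)
        return r / (1 + r)
    return 1 / (1 + math.exp(log_ratio))


P_NAT_MIN = 0.75
for d in (0, 2, 5, 10):
    p = p_native(length=500, discrepancies=d, expected_errors=0.5)
    print(d, round(p, 4), "keep" if p >= P_NAT_MIN else "remove as paralogue")
```

With 500 aligned bases, a couple of discrepancies are still compatible with a native fragment, while five or more push the posterior below the threshold.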

  26. MULTIPLE ALIGNMENT
     P_SNP, the probability that a site is polymorphic, takes into account:
     • the depth of coverage
     • the base-quality values of the sequences
     • the a priori expected rate of polymorphic sites in the region
     The distribution of P_SNP scores exhibits a high level of specificity.

  27. THRESHOLD VALUE: P_SNP = 0.4
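To make the ingredients of P_SNP concrete (coverage, base qualities, prior polymorphism rate), here is a deliberately simplified two-hypothesis Bayesian calculation for one alignment column. It is not the actual PolyBayes calculation: it assumes a single candidate second allele, reads drawn 50:50 from the two alleles under the polymorphic model, and miscalls spread evenly over the other three bases. The names, prior value and example columns are hypothetical:

```python
import math
from collections import Counter


def _base_prob(observed, true_base, err):
    """P(observed call | true base), with a miscall equally likely to be any other base."""
    return 1 - err if observed == true_base else err / 3


def p_snp(column, prior=0.001):
    """Simplified posterior that an alignment column is polymorphic.

    column -- list of (base, phred_quality) pairs, one per aligned sequence
    prior  -- assumed a priori rate of polymorphic sites
    """
    ranked = [base for base, _ in Counter(b for b, _ in column).most_common()]
    a = ranked[0]  # most common base
    b = ranked[1] if len(ranked) > 1 else next(x for x in "ACGT" if x != a)

    log_mono = math.log(1 - prior)  # every read comes from allele a
    log_poly = math.log(prior)      # reads come from a or b with equal probability
    for base, q in column:
        err = min(10 ** (-q / 10), 0.75)  # cap at the error rate of a random call
        log_mono += math.log(_base_prob(base, a, err))
        log_poly += math.log(0.5 * _base_prob(base, a, err) + 0.5 * _base_prob(base, b, err))
    diff = log_mono - log_poly  # posterior = 1 / (1 + exp(diff)), computed stably
    if diff > 0:
        r = math.exp(-diff)
        return r / (1 + r)
    return 1 / (1 + math.exp(diff))


balanced = [("A", 30)] * 5 + [("G", 30)] * 5
one_low_quality_g = [("A", 30)] * 9 + [("G", 10)]
for name, column in [("balanced", balanced), ("single Q10 G", one_low_quality_g)]:
    score = p_snp(column)
    print(name, round(score, 4), "candidate SNP" if score >= 0.4 else "likely artifact")
```

Five high-quality reads of each base overwhelm the low prior, whereas a single low-quality mismatch is better explained as a sequencing error.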

  28. THE POLYBAYES SOFTWARE

  29. OTHER SNP PREDICTION & SNP FINDING SOFTWARE
     • SEAN: searches for localized SNPs and predicts SNPs (http://zebrafish.doc.ic.ac.uk/Sean/)
     • SNP Finder: analyzes user-submitted trace data (http://gai.nci.nih.gov/)

  30. SIGNIFICANCE OF SNPs
     • IN DISEASE DIAGNOSIS
     • IN FINDING PREDISPOSITION TO DISEASES
     • IN DRUG DISCOVERY & DEVELOPMENT
     • IN DRUG RESPONSES
     • IN INVESTIGATING MIGRATION PATTERNS
     ALL THESE ASPECTS WILL HELP IN TAILORING MEDICATION & DIAGNOSIS TO THE INDIVIDUAL

  31. SNP Screening
     • Two different screening strategies:
        - many SNPs in a few individuals
        - a few SNPs in many individuals
     • Different strategies require different tools
     • Important in determining markers for complex genetic traits

  32. SNP genotyping methods for detecting genes that contribute to susceptibility or resistance to multifactorial diseases and to adverse drug reactions:
     • => case-control association analysis
     [Figure: one ...GCCGTTGAC... and three ...GCCATTGAC... sequences (a G/A SNP) from case and control individuals]
     Cases and controls are compared by:
     • allele frequency: A %, G %
     • genotype frequency: AA %, AG %, GG %
     • haplotype frequency: SNP1, SNP2, SNP3
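A common way to compare allele frequencies between cases and controls is a 2×2 chi-square test with an odds ratio. A minimal sketch with hypothetical genotype counts, using only the standard library; for 1 degree of freedom the chi-square tail probability equals erfc(sqrt(χ²/2)). The function names are hypothetical:

```python
import math


def allele_counts(genotypes):
    """Convert genotype counts {"AA": n, "AG": n, "GG": n} into allele counts."""
    return {
        "A": 2 * genotypes["AA"] + genotypes["AG"],
        "G": 2 * genotypes["GG"] + genotypes["AG"],
    }


def allelic_association(cases, controls):
    """Pearson chi-square (1 df), p-value and odds ratio for allele A vs G.

    Assumes no zero cells (the odds ratio would divide by zero).
    """
    a, b = cases["A"], cases["G"]
    c, d = controls["A"], controls["G"]
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


cases = allele_counts({"AA": 20, "AG": 20, "GG": 10})     # hypothetical counts
controls = allele_counts({"AA": 10, "AG": 20, "GG": 20})  # hypothetical counts
chi2, p, odds = allelic_association(cases, controls)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, OR = {odds:.2f}")  # chi2 = 8.00, p = 0.0047, OR = 2.25
```

Genotype-frequency (AA/AG/GG) and haplotype-frequency comparisons follow the same case-versus-control idea with larger tables.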

  33. HAPLOTYPE: a set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by recombination)

  34. SNP Haplotype and Phenotype
     Individual   Phenotype   DNA sequence
     1            BLACK EYE   GATATTCGTACGGA-T
     2            BROWN EYE   GATGTTCGTACTGAAT
     3            BLACK EYE   GATATTCGTACGGA-T
     4            BLUE EYE    GATATTCGTACGGAAT
     5            BROWN EYE   GATGTTCGTACTGAAT
     6            BROWN EYE   GATGTTCGTACTGAAT
     Haplotypes: AG 2/6 (BLACK EYE); GTA 3/6 (BROWN EYE); AGA 1/6 (BLUE EYE)
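The haplotypes on this slide come from reading off only the alignment columns that vary between individuals. A minimal sketch using the six sequences and phenotypes from the slide; it keeps the gap character, so the slide's "AG" haplotype appears as "AG-". The helper names are hypothetical:

```python
from collections import Counter, defaultdict

# The six aligned sequences and phenotypes from the slide
individuals = [
    ("BLACK EYE", "GATATTCGTACGGA-T"),
    ("BROWN EYE", "GATGTTCGTACTGAAT"),
    ("BLACK EYE", "GATATTCGTACGGA-T"),
    ("BLUE EYE", "GATATTCGTACGGAAT"),
    ("BROWN EYE", "GATGTTCGTACTGAAT"),
    ("BROWN EYE", "GATGTTCGTACTGAAT"),
]


def variable_columns(seqs):
    """0-based alignment columns where the sequences do not all agree."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def haplotype_table(data):
    """Map each haplotype (bases at the variable columns) to a Counter of phenotypes."""
    cols = variable_columns([seq for _, seq in data])
    table = defaultdict(Counter)
    for phenotype, seq in data:
        haplotype = "".join(seq[i] for i in cols)
        table[haplotype][phenotype] += 1
    return cols, table


cols, table = haplotype_table(individuals)
print("Variable columns (1-based):", [c + 1 for c in cols])  # [4, 12, 15]
for haplotype, phenotypes in table.items():
    total = sum(phenotypes.values())
    print(f"{haplotype}: {total}/{len(individuals)} {dict(phenotypes)}")
# AG-: 2/6 {'BLACK EYE': 2}
# GTA: 3/6 {'BROWN EYE': 3}
# AGA: 1/6 {'BLUE EYE': 1}
```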

  35. HAPLOTYPE CORRELATION WITH PHENOTYPE
     • The "haplotype-centric" approach combines the information of adjacent SNPs into composite multilocus haplotypes.
     • Haplotypes are not only more informative but also capture regional linkage disequilibrium (LD) information, which is assumed to be robust and powerful.
     • Associating haplotype frequencies with the frequencies of the desired phenotypes in the population will help exploit the full potential of SNPs as markers.
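Regional LD between two SNPs is usually summarized by D, D′ and r², computed from haplotype and allele frequencies. A minimal sketch, assuming phased two-locus haplotypes for biallelic SNPs; `ld_stats` and the example haplotypes are hypothetical:

```python
def ld_stats(haplotypes, allele1, allele2):
    """Pairwise LD between two biallelic loci from phased two-locus haplotypes.

    haplotypes -- list of 2-character strings, e.g. ["AB", "ab", ...]
    allele1    -- the allele counted at locus 1 (e.g. "A")
    allele2    -- the allele counted at locus 2 (e.g. "B")
    Returns (D, D', r^2).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele1 for h in haplotypes) / n
    p_b = sum(h[1] == allele2 for h in haplotypes) / n
    p_ab = sum(h[0] == allele1 and h[1] == allele2 for h in haplotypes) / n
    if p_a in (0, 1) or p_b in (0, 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


linked = ["AB"] * 50 + ["ab"] * 50
print(ld_stats(linked, "A", "B"))       # (0.25, 1.0, 1.0)
independent = ["AB", "Ab", "aB", "ab"] * 25
print(ld_stats(independent, "A", "B"))  # (0.0, 0.0, 0.0)
```

Perfect LD (r² = 1) means one SNP fully predicts the other, which is why a haplotype can stand in for several adjacent markers.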

  36. ADVANTAGES:
     • SNPs ARE THE MOST FREQUENT FORM OF DNA VARIATION
     • THEY ARE THE DISEASE-CAUSING MUTATIONS IN MANY GENES
     • THEY ARE ABUNDANT & HAVE SLOW MUTATION RATES
     • EASY TO SCORE
     • MAY SERVE AS THE NEXT GENERATION OF GENETIC MARKERS

  37. LIMITATIONS:
     1. EXPERIMENTAL DETECTION OF SNPs REQUIRES IMPLEMENTATION OF EXPENSIVE TECHNOLOGIES
     2. NEED FOR LARGE POPULATION DATASETS FOR ASSOCIATION STUDIES

  38. Some important SNP database resources
     1. dbSNP (http://www.ncbi.nlm.nih.gov/SNP/); LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi)
     2. TSC (http://snp.cshl.org/)
     3. SNPper (http://snpper.chip.org/bio/)
     4. JSNP (http://snp.ims.u-tokyo.ac.jp/search.html)
     5. GeneSNPs (http://www.genome.utah.edu/genesnps/)
     6. HGVbase (http://hgvbase.cgb.ki.se/)
     7. PolyPhen (http://dove.embl-heidelberg.de/PolyPhen/); OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
     8. Human SNP database (http://www-genome.wi.mit.edu/snp/human/)
     (Feb. 25, 2003, SI Hung)
