Sequence Variation in Ensembl

alagan

Presentation Transcript


  1. Sequence Variation in Ensembl

  2. Outline • SNPs • SNPs in Ensembl • Linkage disequilibrium • SNPs in BioMart • DAS sources

  3. Single nucleotide polymorphisms (SNPs) • Two human genomes differ by ~0.1% • Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people • Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide • ~1 out of every 300 bases in the human genome • ~10 million in the human genome
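As a rough consistency check, the sketch below multiplies these figures out. It assumes a haploid human genome of about 3.1 billion bases, a number not given on the slide. The ~0.1% difference between two genomes then corresponds to roughly 3 million differing bases, while one SNP every ~300 bases gives roughly 10 million SNP sites; the second figure is larger because it counts sites that vary anywhere in the population, not just between two individuals.

```python
# Back-of-the-envelope check of the SNP figures on the slide.
# Assumption (not on the slide): haploid human genome of ~3.1 billion bases.
GENOME_SIZE_BP = 3_100_000_000


def expected_differences(genome_size_bp: int, divergence: float) -> float:
    """Number of differing bases between two genomes at a given divergence."""
    return genome_size_bp * divergence


def expected_snp_sites(genome_size_bp: int, bases_per_snp: int) -> float:
    """Number of SNP sites if one site occurs every `bases_per_snp` bases."""
    return genome_size_bp / bases_per_snp


if __name__ == "__main__":
    # ~0.1% divergence between two individuals
    print(f"{expected_differences(GENOME_SIZE_BP, 0.001):,.0f}")
    # ~1 SNP per 300 bases across the population
    print(f"{expected_snp_sites(GENOME_SIZE_BP, 300):,.0f}")
```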

  4. Functional Consequences • SNPs in coding areas that alter the aa sequence: cause of most monogenic disorders, e.g. Hemochromatosis (HFE), Cystic fibrosis (CFTR), Hemophilia (F8) • SNPs in coding areas that don’t alter the aa sequence: may affect splicing • SNPs in promoter or regulatory regions: may affect the level, location or timing of gene expression • SNPs in other regions: no direct known impact on phenotype, useful as markers
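Whether a coding SNP alters the aa sequence comes down to the genetic code: the substitution matters only if the new codon translates to a different amino acid. The sketch below shows this for a single-base substitution inside one codon, using the standard genetic code. It is a simplified illustration, not Ensembl's consequence calculation: it assumes the codon and the substituted base are given on the coding strand and ignores splicing, indels and frameshifts. The function name and return format are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Standard genetic code: codons enumerated in TCAG order, "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, pos_in_codon: int, alt_base: str) -> tuple[str, str]:
    """Classify a single-base substitution inside one codon (coding strand).

    pos_in_codon is 0, 1 or 2. Returns (consequence, peptide change); for a
    synonymous change the second element is just the unchanged amino acid.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous", ref_aa
    if alt_aa == "*":
        return "stop gained", f"{ref_aa}/{alt_aa}"
    if ref_aa == "*":
        return "stop lost", f"{ref_aa}/{alt_aa}"
    return "non-synonymous", f"{ref_aa}/{alt_aa}"
```

For example, `classify_coding_snp("CGG", 0, "T")` turns CGG (arginine) into TGG (tryptophan) and returns `("non-synonymous", "R/W")`, while a change at a third codon position often leaves the amino acid unchanged.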

  5. Practical Applications • Disease diagnosis • Association studies • Pharmacogenomics • Forensic testing • Population genetics and evolutionary studies • Marker-assisted selection

  6. Practical Applications

  7. SNPs in Ensembl • Most SNPs imported from dbSNP (rs……): • Imported data: alleles, flanking sequences, frequencies, …. • Calculated data: position, synonymous status, peptide shift, …. • For human also: • HGVbase • TSC • Affy GeneChip 100K and 500K Mapping Array • Affy Genome-Wide SNP array 6.0 • Ensembl-called SNPs (from Celera reads and Jim Watson’s and Craig Venter’s genomes) • For mouse, rat, dog and chicken also: • Sanger- and Ensembl-called SNPs (other strains / breeds)

  8. dbSNP • Central repository for simple genetic polymorphisms: • single-base nucleotide substitutions • small-scale multi-base deletions or insertions • retroposable element insertions and microsatellite repeat variations • http://www.ncbi.nlm.nih.gov/SNP/index.html • For human (dbSNP build 128): • 34,434,159 submissions (ss#’s) • 11,883,685 RefSNP clusters (rs#’s) • 6,262,709 validated • 737,679 with frequency
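The build 128 counts are easier to compare as ratios. The short calculation below uses only the numbers on this slide: on average each rs# cluster groups roughly three ss# submissions, and about half of the rs#'s are validated, but only around 6% have frequency data.

```python
# Ratios derived from the dbSNP build 128 human counts quoted on the slide.
SUBMISSIONS = 34_434_159       # ss#'s
REFSNP_CLUSTERS = 11_883_685   # rs#'s
VALIDATED = 6_262_709
WITH_FREQUENCY = 737_679


def fraction(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


if __name__ == "__main__":
    print(f"submissions per rs#: {SUBMISSIONS / REFSNP_CLUSTERS:.2f}")
    print(f"validated rs#'s: {fraction(VALIDATED, REFSNP_CLUSTERS):.1%}")
    print(f"rs#'s with frequency data: {fraction(WITH_FREQUENCY, REFSNP_CLUSTERS):.1%}")
```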

  9. SNPs in Ensembl - Types • Non-synonymous: in coding sequence, resulting in an aa change • Synonymous: in coding sequence, not resulting in an aa change • Frameshift: in coding sequence, resulting in a frameshift • Stop lost: in coding sequence, resulting in the loss of a stop codon • Stop gained: in coding sequence, resulting in the gain of a stop codon • Essential splice site: in the first 2 or the last 2 basepairs of an intron • Splice site: 1-3 bps into an exon or 3-8 bps into an intron • Upstream: within 5 kb upstream of the 5'-end of a transcript • Regulatory region: in regulatory region annotated by Ensembl • 5' UTR: in 5' UTR • Intronic: in intron • 3' UTR: in 3' UTR • Downstream: within 5 kb downstream of the 3'-end of a transcript • Intergenic: more than 5 kb away from a transcript
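The upstream, downstream and intergenic categories depend only on where the SNP falls relative to a transcript. The sketch below applies the 5 kb rule from this slide to a single transcript, taking strand into account (on the minus strand the 5' end is the higher coordinate). It is a hypothetical helper, not Ensembl code: it assumes 1-based, inclusive coordinates on the + strand and does not look at exons, UTRs, splice sites or regulatory regions. In practice a SNP is compared with every nearby transcript, so "intergenic" means that no transcript lies within 5 kb.

```python
FLANK_BP = 5_000  # 5 kb window for "upstream" / "downstream"


def classify_position(snp_pos: int, tx_start: int, tx_end: int, strand: int) -> str:
    """Place a SNP relative to one transcript.

    Coordinates are 1-based and inclusive on the + strand of the assembly;
    strand is +1 or -1. Returns "within transcript", "upstream",
    "downstream" or "intergenic".
    """
    if tx_start <= snp_pos <= tx_end:
        return "within transcript"
    # Is the SNP at lower (+ strand) coordinates than the transcript?
    before = snp_pos < tx_start
    distance = tx_start - snp_pos if before else snp_pos - tx_end
    if distance > FLANK_BP:
        return "intergenic"
    # The 5' end is tx_start on the + strand and tx_end on the - strand.
    is_upstream = before if strand == 1 else not before
    return "upstream" if is_upstream else "downstream"
```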

  10. SNPs in Ensembl - Species • Human • Chimp • Mouse • Rat • Dog • Cow • Platypus • Chicken • Zebrafish • Tetraodon • Mosquito

  11. Caveat • For human, mouse and rat, Ensembl defines all SNP alleles relative to the + strand of the genome assembly! (to be able to merge dbSNP data with Sanger resequencing data) • Exceptions: cases where SNPs are shown as part of a sequence
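Because alleles are given relative to the + strand, an allele described for a gene on the minus strand (for example in the literature, where coding changes are often written on the coding strand) appears in Ensembl as its reverse complement. A minimal sketch of that conversion, with hypothetical function names:

```python
# Complement table for nucleotide alleles; "N" and "-" (deletion) map to themselves.
COMPLEMENT = str.maketrans("ACGTacgtN-", "TGCAtgcaN-")


def allele_to_other_strand(allele: str) -> str:
    """Reverse-complement one allele (the reversal matters for multi-base alleles)."""
    return allele.translate(COMPLEMENT)[::-1]


def alleles_to_other_strand(allele_string: str) -> str:
    """Convert an allele string such as "A/C" to the opposite strand."""
    return "/".join(allele_to_other_strand(a) for a in allele_string.split("/"))
```

For instance, `alleles_to_other_strand("A/C")` gives `"T/G"`; the order of the alleles is kept, only the strand changes.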

  12. 5 MINUTE EXERCISE A missense SNP, C1858T, in PTPN22 (Tyrosine-protein phosphatase non-receptor type 22) has been identified as a genetic risk factor for rheumatoid arthritis. This SNP is also referred to as R620W. • Find the SNPView page for this SNP. • Why are the alleles on this page given as A/G? • What is the minor allele of this SNP in Caucasians?
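For the last question, the minor allele is simply the less frequent allele in the population of interest. If a page lists genotype counts rather than allele frequencies, the allele frequencies follow from counting two alleles per diploid genotype, as in this sketch. The function names and the counts used to test it are made up for illustration; they are not PTPN22 data.

```python
from __future__ import annotations

from collections import Counter


def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Allele frequencies from diploid genotype counts such as {"GG": 10, "AG": 5}."""
    alleles: Counter[str] = Counter()
    for genotype, n in genotype_counts.items():
        for allele in genotype:  # each diploid genotype contributes two alleles
            alleles[allele] += n
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}


def minor_allele(genotype_counts: dict[str, int]) -> str:
    """Return the less frequent allele (ties broken alphabetically)."""
    freqs = allele_frequencies(genotype_counts)
    return min(sorted(freqs), key=freqs.get)
```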

  13. SNPs in Ensembl GeneSNPView (1) [Screenshot; labelled tracks: Transcript, InterPro domains, SNP alleles]

  14. SNPs in Ensembl GeneSNPView (2)

  15. SNPs in Ensembl TranscriptSNPView (1) • Shows SNP alleles in different: • Individuals (human): • Celera HuAA, HuCC, HuDD and HuFF, Craig Venter, Jim Watson • Strains (mouse, rat) • Breeds (chicken, dog)

  16. SNPs in Ensembl TranscriptSNPView (2) [Screenshot; labels: different individuals, resequencing coverage, SNP alleles, alleles in different individuals]

  17. SNPs in Ensembl TranscriptSNPView (3)

  18. 5 MINUTE EXERCISE • Find the TranscriptSNPView page for human PTPN22. • Do all individuals (HuAA, HuCC, HuDD, HuFF, Venter and Watson) have resequencing coverage at the position of the C1858T (R620W) SNP? • Do any of the individuals have a higher risk of rheumatoid arthritis based on their genotype at this position? • Is there an individual that is heterozygous at this position?
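The last two questions come down to reading each individual's two alleles. A minimal sketch, assuming genotypes are written as two single-base alleles (e.g. "A/G", "A|G" or "AG") and that the genotype and the risk allele are given on the same strand; since Ensembl reports alleles on the + strand, a risk allele described on the coding strand of a minus-strand gene would first need to be complemented. Both function names are hypothetical.

```python
def is_heterozygous(genotype: str) -> bool:
    """True if a diploid genotype such as "A/G", "A|G" or "AG" has two different alleles."""
    alleles = [a for a in genotype if a not in "/|"]
    return len(alleles) == 2 and alleles[0] != alleles[1]


def risk_allele_copies(genotype: str, risk_allele: str) -> int:
    """Copies (0, 1 or 2) of a single-base risk allele; both must be on the same strand."""
    return sum(1 for a in genotype if a == risk_allele)
```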

  19. Haplotypes and Linkage Disequilibrium • A haplotype is a set of SNPs on a single chromatid that are statistically associated • Linkage disequilibrium describes a situation in which some combinations of SNP alleles occur more or less frequently in a population than would be expected from a random formation of haplotypes from alleles based on their frequencies

  20. Measures of LD • $D = P(AB) - P(A)P(B)$ • D ranges from -0.25 to +0.25 • D = 0 indicates linkage equilibrium • dependent on allele frequencies, and therefore of little use • $D' = D / D_{\max}$, where $D_{\max}$ is the largest value |D| can take given the allele frequencies • D' = 1 indicates perfect LD • estimates of D' are strongly inflated in small samples • $r^2 = D^2 / \left(P(A)P(B)P(a)P(b)\right)$ • $r^2 = 1$ indicates perfect LD • measure of choice
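The three measures can be computed directly from phased two-SNP haplotypes. The sketch below estimates P(AB), P(A) and P(B) by counting haplotypes and then applies the formulas on this slide, taking D_max as Lewontin's bound: min(P(A)P(b), P(a)P(B)) when D > 0 and min(P(A)P(B), P(a)P(b)) when D < 0. It assumes biallelic SNPs and already-phased haplotypes; tools such as HaploView typically estimate haplotype frequencies from unphased genotypes, which this sketch does not attempt. The function name is hypothetical.

```python
from __future__ import annotations


def ld_measures(haplotypes: list[tuple[str, str]], allele_a: str, allele_b: str) -> dict[str, float]:
    """D, |D'| and r^2 for two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2); allele_a and
    allele_b name the alleles A and B used in the formulas.
    """
    n = len(haplotypes)
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    q_a, q_b = 1 - p_a, 1 - p_b  # P(a), P(b): frequencies of the other alleles

    d = p_ab - p_a * p_b
    # Lewontin's D_max depends on the sign of D.
    d_max = min(p_a * q_b, q_a * p_b) if d >= 0 else min(p_a * p_b, q_a * q_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # reported here as |D'|

    denom = p_a * q_a * p_b * q_b
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D'": d_prime, "r2": r2}
```

With haplotypes AB, AB, Ab and ab (no aB), D' is 1 but r² is only about 0.33: D' reaches 1 whenever one of the four haplotypes is missing, which is one reason r² is the more informative measure of how well one SNP predicts the other.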

  21. Linkage Disequilibrium LDView • It is also possible to export SNP information for upload into the HaploView software tool

  22. Linkage Disequilibrium LDTableView

  23. 5 MINUTE EXERCISE Retrieve all non-synonymous SNPs for the human CFTR gene using BioMart and export their id, genomic position, alleles and peptide shift (hint: which dataset should you start with?).
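Besides the web interface, BioMart can take the same query as an XML document sent to its martservice URL. The sketch below only builds such a document and the corresponding GET URL; it does not send anything. Every dataset, filter and attribute name in it is a placeholder (choosing the dataset is the point of the exercise), and the endpoint URL is an assumption to check; the real names can be copied from a query built in the web interface, which can display its own XML.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed address of Ensembl's BioMart web service; check the current URL.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset: str, filters: dict[str, str], attributes: list[str]) -> str:
    """Build a BioMart query document: one dataset, its filters and attributes."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_query_url(query_xml: str) -> str:
    """GET URL that passes the query XML to the martservice as 'query'."""
    return MARTSERVICE_URL + "?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    # Placeholder names only: copy the real ones from the mart itself.
    query_xml = build_query_xml(
        "DATASET_NAME",
        {"GENE_FILTER_NAME": "GENE_ID", "CONSEQUENCE_FILTER_NAME": "VALUE"},
        ["SNP_ID_ATTR", "POSITION_ATTR", "ALLELE_ATTR", "PEPTIDE_SHIFT_ATTR"],
    )
    print(build_query_url(query_xml))
```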

  24. DAS Sources • For human, data from the following DAS sources can be visualised on ContigView: • DGV and DGV loci: Structural variations from the Database of Genomic Variants (CNVs, InDels, inversions etc.) • RedonCNV regions and RedonCNV loci: Copy number variations from the Redon et al. paper • SegDup Washu: Segmental Duplications, University of Washington
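Each of these sources is a separate DAS server that is queried for features in the region being displayed. The sketch below shows the shape of a DAS 1.x "features" request for one segment and how feature positions can be read from the XML (DASGFF) reply. The server address and source name are hypothetical placeholders, and the parser reads only the id, START and END of each FEATURE element.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def das_features_url(das_base: str, source: str, ref: str, start: int, stop: int) -> str:
    """DAS 'features' request for one segment, written as ref:start,stop."""
    return f"{das_base.rstrip('/')}/{source}/features?segment={ref}:{start},{stop}"


def parse_features(dasgff: str) -> list[tuple[str, int, int]]:
    """Extract (feature id, start, end) from a DASGFF features response."""
    root = ET.fromstring(dasgff)
    return [
        (f.get("id"), int(f.findtext("START")), int(f.findtext("END")))
        for f in root.iter("FEATURE")
    ]
```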

  25. Q & A: Questions & Answers
