Predicting effect of SNPs and de novo mutations on splicing. Presented by Alexander Tchourbanov, Biology Department, New Mexico State University
Motivation • High-throughput genotyping methods have recently become available • High-density 500K chips are available for genotyping (Illumina Hap550, Affymetrix 5.0) • Genome resequencing platforms are also available (Applied Biosystems SOLiD, Solexa/Illumina Genome Analyzer, Roche 454 FLX) • Researchers seeking to understand the genetic risk factors contributing to a disorder routinely genotype patients
Motivation • Many SNPs have been associated with predisposition to various diseases (breast cancer, Alzheimer's disease, multiple sclerosis, etc.) • Only a fraction of existing SNPs are genotyped with chips • Some SNPs with significantly low P-values are associated with affected haplotypes only through linkage disequilibrium (LD) • Only a fraction of associated SNPs are causal variants • There is growing evidence that Autism Spectrum Disorder (ASD) could be triggered by de novo mutations absent in both parents
Types of SNPs • Several classes of variants to consider: • Single Nucleotide Polymorphisms (SNPs) • Deletion/Insertion polymorphisms (DIPs) • Simple Tandem Repeat polymorphisms (STRs) • Named polymorphisms (e.g., Alu/- dimorphisms) • Multinucleotide polymorphisms (MNPs)
SNP distribution • ~6 million SNPs are located in human gene loci (dbSNP build 129) • 63% intronic • 11% untranslated region • 1% nonsynonymous • 1% synonymous • 24% within 2 kbp of a gene • <1% splice site • <1% unknown coding variant
What are the common disease-causing variants? • SNPs are defined as former mutations that have reached >1% frequency in the population • According to the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk): • 49,806 mutations are missense/nonsense • 8,548 mutations affect mRNA splicing • Many missense/nonsense mutations are eliminated by purifying selection and never become SNPs
Splicing components • Image credit: Matlin AJ, Clark F, Smith CWJ: Understanding alternative splicing: towards a cellular code. Nature Reviews Molecular Cell Biology 6, 386-398 (May 2005)
Orthologous blocks from the UCSC Genome Browser • 2,333,379 extended exons from 23 Tetrapoda organisms were obtained • Several experimental reports have shown that genes from distantly related Tetrapoda organisms are correctly expressed and post-transcriptionally modified in transgenic animals (Capetanaki Y et al.: Proc Natl Acad Sci USA 1989; Jacobs GH et al.: Science 2007) • Genes encoding well-known RNA-binding proteins involved in splicing regulation are enriched in ultraconserved elements (Bejerano G et al.: Science 2004)
Elements found • Using the orthologous exons available for 23 Tetrapoda organisms, we identified 2,546 unique splicing regulatory elements • Among these elements, 203 (7.97%) 3’SS-supporting and 177 (6.95%) 5’SS-supporting motifs are novel and have not been reported in previous systematic screens for such elements • Of the predicted elements, 51.81% were octamers, 41.08% heptamers, 6.76% hexamers, and 0.35% pentamers
Optimal exon length • Depends on flanking 5’SS and 3’SS strengths
Exon scoring method • LOD scores associated with the 5’SS, 3’SS, exon length, competing splice sites, and enhancer/silencer signals are combined into an overall exon strength
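The slide does not say how the component LOD scores are combined. A minimal sketch, assuming the components are treated as independent so that their log-odds simply add; the function names and all numeric values below are hypothetical and are not taken from SpliceScan II:

```python
import math


def lod(p_signal: float, p_background: float) -> float:
    """Log-odds (base 2) of an observation under a signal model vs. a background model."""
    return math.log2(p_signal / p_background)


def exon_strength(component_lods: dict[str, float]) -> float:
    """Combine component LOD scores by summation (assumes independent components)."""
    return sum(component_lods.values())


# Hypothetical component scores, in bits, for one candidate exon.
components = {
    "5'SS": 3.0,
    "3'SS": 2.0,
    "exon_length": 0.5,
    "competing_SS": -1.0,          # a nearby competing site lowers the score
    "enhancers_silencers": 1.5,
}

total = exon_strength(components)
print(f"exon strength: {total:.1f} bits")
```

Under this additive assumption, a variant that weakens a single component (for example the 5’SS score) lowers the total by the same number of bits.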
IVS2+2delC mutation
>IVS2+2delC
ttcggataagacaaagattttatataatattttgaaaacattaaataatttgtcattcctttatttcctttattttagCTTCGCAGAATCAAGAACGGCTATGTGCGTTTAAAGATCCGTATCAGCAAGACCTTGGGATAG/GTGAGAGTAGAATCTCTCATGAAAATGGGACAATATTATGCTCGAAAG/GTAGCACCTGCTATGGCCTTTGGGAGAAATCAAAAGGGGACATAAATCTTGTAAAACAAGg(c)aagtgatactttccttacctgaaatgactgtgttttatacaattgatatttatctaaaaaggacatgggagtatgttaaaatcctgttcagaaaaacagtgaatttaaaagtgtatatataaagccaggtgtggtggctcatgcctgtaattccagcacttttcgaggctgaggtgggcggatcacttgaggccaggagtttgagaccagcctgggtaataacatggtgaaaccccgt
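To compare splicing predictions before and after the mutation, both alleles are needed as plain sequences. A minimal sketch, assuming the parenthesized base in the sequence above marks the deleted nucleotide; the helper name `expand_deletion` is hypothetical and only a short fragment around the mutation is used:

```python
import re


def expand_deletion(annotated: str) -> tuple[str, str]:
    """Return (wild_type, mutant) for a sequence in which '(x)' marks deleted bases."""
    # Wild type keeps the parenthesized bases; the mutant drops them.
    wild_type = re.sub(r"\(([A-Za-z]+)\)", r"\1", annotated)
    mutant = re.sub(r"\([A-Za-z]+\)", "", annotated)
    return wild_type, mutant


# Fragment spanning the exon end (uppercase) and the start of the intron (lowercase).
wild_type, mutant = expand_deletion("AACAAGg(c)aagtg")
print(wild_type)
print(mutant)
```

Under this reading, the wild-type intron begins with `gc` and the mutant intron begins with `ga`, so the deletion directly disrupts the donor dinucleotide.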
Examples of SpliceScan II predicting effects of mutations • SpliceScan II successfully predicts the effect of the IVS2+2delC mutation, which causes familial pulmonary arterial hypertension (Cogan JD et al.: Am J Respir Crit Care Med 2006) • SpliceScan II also correctly predicts the effect of the IVS10-6del34 microdeletion, which causes gastrointestinal stromal tumors (Chen LL et al.: Oncogene 2005)
Effect of rs849563 (autism-associated SNP) • The annotated exon potential changes here: • rs849563 alters an exon sharing one boundary with annotated exon gi|41872561|ref|NM_201266.1| 2433-2577; the exon score changes 0.60->0.19
Effect of rs885747 (autism-associated SNP) • The annotated exon potential changes here: • rs885747 alters an exon sharing one boundary with annotated exon gi|194097340|ref|NM_002616.2| 1627-1735; the exon score changes 0.30->0.49
SpliceScan II tool • Available at http://splicescan2.lumc.edu/ • More sensitive than existing splicing simulators (NetUTR, ExonScan) • Uses a novel Bayesian sensor for 5’ GC splice sites • Allows prediction of aberrant splicing events associated with genomic variants • ACGMAP companion database: http://www.stritch.luc.edu/node/375
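The two exon score changes reported for the autism-associated SNPs can be compared with a small calculation. This is a sketch only: the 0.1 threshold and the labels are hypothetical choices for illustration, not part of SpliceScan II.

```python
def score_change(before: float, after: float, threshold: float = 0.1) -> tuple[float, str]:
    """Return the change in exon score and a coarse label for its direction."""
    delta = round(after - before, 2)
    if delta <= -threshold:
        label = "weakened"
    elif delta >= threshold:
        label = "strengthened"
    else:
        label = "unchanged"
    return delta, label


# Exon scores (reference allele, variant allele) as reported on the slides.
observations = {
    "rs849563": (0.60, 0.19),
    "rs885747": (0.30, 0.49),
}

results = {snp: score_change(*scores) for snp, scores in observations.items()}
for snp, (delta, label) in results.items():
    print(f"{snp}: {delta:+.2f} ({label})")
```

The two SNPs act in opposite directions: rs849563 lowers the exon score by 0.41, while rs885747 raises it by 0.19.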