This research project aims to identify alternative splicing regulatory elements, characterize alternatively spliced genes, and correlate polymorphic regulatory elements with phenotype. The study explores the complexity of the proteome, mechanisms of alternative splicing, and combinatorial control of enhancers and silencers. By mapping regulatory elements, identifying splicing motifs, and correlating with single nucleotide polymorphisms (SNPs), potential therapeutic implications for human diseases like SMA are proposed.
Identification of alternative splicing regulatory elements, characterization of alternatively spliced genes, and correlation of the role of polymorphic regulatory elements with phenotype • Matt Mailman • October 31, 2002
Background • Mechanisms for increased proteome complexity • Increase in gene number - not consistent • Multi-functional proteins • Somatic cell recombination (e.g., T-cell receptors, Ig) • Multiple transcription start sites • pre-mRNA editing • Post-translational protein modification • Alternative polyadenylation • Alternative splicing - most important • 35-59% of human genes are alternatively spliced • Alternative splicing causes ~15% of human disease
Consensus splice sites and effect of exonic splicing enhancers (ESEs) • [Figure labels: protein-protein interaction; SR protein-RNA interaction; ESE] • TIBS 25:381
Exon recognition and the spliceosome • [Figure labels: 5’ splice site; branch point; poly-pyrimidine tract; 3’ splice site] • Nature 418:236
RESCUE technique (Burge group) to identify ESEs • 4096 possible hexanucleotides • Mapped Refseq cDNAs to intron/exon boundaries • X-axis = intron vs exon • Y-axis = weak vs strong consensus splice signals • Target candidates = 1st quadrant
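The two-axis selection on this slide can be sketched in a few lines. This is a simplified illustration, not the Burge group's implementation: it uses plain frequency differences where a real analysis would use a proper enrichment statistic, and the function names (`count_hexamers`, `freq_diff`, `first_quadrant`) are assumptions made for this example.

```python
from collections import Counter
from itertools import product

K = 6
# All 4**6 = 4096 possible hexanucleotides.
ALL_HEXAMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def count_hexamers(seqs):
    """Count overlapping hexamers, skipping windows with ambiguous bases."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            kmer = s[i:i + K]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def freq_diff(set_a, set_b):
    """Relative frequency of each hexamer in set_a minus that in set_b."""
    ca, cb = count_hexamers(set_a), count_hexamers(set_b)
    ta, tb = sum(ca.values()) or 1, sum(cb.values()) or 1
    return {h: ca[h] / ta - cb[h] / tb for h in ALL_HEXAMERS}

def first_quadrant(exons, introns, weak_exons, strong_exons):
    """Hexamers enriched in exons vs introns (x > 0) AND in exons with
    weak vs strong consensus splice signals (y > 0)."""
    x = freq_diff(exons, introns)
    y = freq_diff(weak_exons, strong_exons)
    return [h for h in ALL_HEXAMERS if x[h] > 0 and y[h] > 0]
```

Hexamers returned by `first_quadrant` correspond to the "target candidates" in the first quadrant of the plot; the sign convention of the axes is assumed here.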
Relative contribution of consensus splice sequences, enhancers, and silencers
Alternative splicing vs. Constitutive splicing • Constitutively spliced exons • Have strong consensus sequences • Rely less on enhancer/silencer elements • Alternatively spliced exons • Weaker consensus sequences • Rely more on enhancer/silencer elements • Combinatorial control of elements • Multiple enhancer/silencer elements • Protein milieu determines tissue-specific / developmental stage-specific use of these elements
Example of tissue-specific splicing regulation • [Figure: exons E1, E2, and E3 with numbered regulatory elements (1, 2, 3); splicing patterns compared between pancreas and brain]
Proposed Plan • Select all genes with multiple assemblies • Determine which exons in which genes are alternatively spliced • Capture 200 bp 5’ and 3’ of the intron/exon junction • Apply method to identify enhancer/silencer motifs • HMM • Neural networks • RESCUE
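The capture step could look roughly like the sketch below. It assumes the genomic sequence is available as a plain string and that the junction is given as a 0-based index of the first base downstream of the boundary; `junction_flanks` is a hypothetical helper, not part of any existing pipeline.

```python
def junction_flanks(genomic_seq, junction, width=200):
    """Return (upstream, downstream) windows around an intron/exon junction.

    `junction` is the 0-based index of the first base after the boundary.
    Windows are clipped at the sequence ends rather than padded.
    """
    if not 0 <= junction <= len(genomic_seq):
        raise ValueError("junction lies outside the sequence")
    upstream = genomic_seq[max(0, junction - width):junction]
    downstream = genomic_seq[junction:junction + width]
    return upstream, downstream
```

For a junction on the minus strand, both windows would additionally need to be reverse-complemented and swapped; that is left out to keep the sketch short.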
Proposed Plan (continued) • Map regulatory elements onto genomic sequence as NAFeatures • Develop comprehensive model of splicing • How do different enhancers and silencers interact at splice sites (combinatorial control) • Are certain combinations of elements common in splicing of genes expressed in certain tissues • Map published SNPs as NAFeatures • Determine where SNPs co-map to splicing elements • Discover cryptic splicing elements • Correlate with human disease/resistance mutations • Suggests therapy
Which exons are alternatively spliced? • DoTS assemblies – alternatively spliced genes yield multiple assemblies
How to display alternative splice products? • Assemblies • Splicing graphs (Pevzner group)
How to choose alternatively spliced exons? • Cluster all exons in the genome that are either never spliced out (constitutive) or alternatively spliced • Problem: incomplete set of data • Solution: choose exons where a minimum number of ESTs or mRNAs are included in the assembly (suggestions?). Representation from a threshold number of tissues may also be required (suggestions?).
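One way the coverage filter might be expressed is shown below. The thresholds (`min_seqs=5`, `min_tissues=2`) are placeholders only, since the slide leaves both values open, and the `ExonEvidence` record is a made-up structure for illustration rather than a DoTS data type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExonEvidence:
    exon_id: str
    n_including: int      # ESTs/mRNAs in the assembly that include the exon
    n_skipping: int       # ESTs/mRNAs that span the region but skip the exon
    tissues: frozenset    # tissues represented among those sequences

def classify_exon(e, min_seqs=5, min_tissues=2):
    """Label an exon, or reject it when coverage is too thin to decide."""
    if e.n_including + e.n_skipping < min_seqs or len(e.tissues) < min_tissues:
        return "insufficient"
    return "alternative" if e.n_skipping > 0 else "constitutive"
```

Exons labelled `"insufficient"` would simply be left out of the training set until more ESTs or mRNAs are available.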
Prediction of regulatory motifs • Method • HMM • Neural Networks • suggestions? • Large training set = all exons of all genes that meet criteria: • Alternatively or constitutively spliced • Sufficient coverage of sequences / tissues
Map splicing regulatory elements to genomic sequence • Multiple motifs will be generated: • Enhancers • exonic • intronic • Silencers • exonic • intronic • BLAT alignment of sequences to genome
Correlation of motifs with public SNP data • Map human/mouse dbSNP entries as NAFeatures • Determine where SNPs co-map with a regulatory feature • Decide whether, and which, polymorphisms strengthen or weaken the match to the consensus sequence • Correlate to disease / tissue (splice element combination)
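The co-mapping step is essentially an interval lookup. A minimal sketch follows, assuming features are stored as half-open `(start, end, name)` tuples on a single chromosome and SNPs as a mapping from identifier to position; these plain structures stand in for NAFeatures, and all identifiers in the usage example are invented.

```python
def comap_snps(snps, features):
    """Map each SNP to the regulatory features whose interval contains it.

    snps:     dict of snp_id -> 0-based position
    features: iterable of (start, end, name), half-open [start, end)
    Returns only SNPs that hit at least one feature.
    """
    hits = {}
    for snp_id, pos in snps.items():
        names = [name for start, end, name in features if start <= pos < end]
        if names:
            hits[snp_id] = names
    return hits

# Usage with invented identifiers and coordinates:
features = [(100, 106, "ESE_1"), (200, 208, "ISS_1")]
snps = {"snp_a": 103, "snp_b": 150, "snp_c": 200, "snp_d": 106}
print(comap_snps(snps, features))
```

The linear scan is adequate for a sketch; at genome scale a sorted index or interval tree would be the more natural choice.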
Example of polymorphic elements causing disease: spinal muscular atrophy (SMA) • Neuromuscular disease of varying severity • Most common recessively inherited cause of death in infants • Two genes result from duplication • SMN1 = all exons included • SMN2 = exon 7 spliced out • 1 base change responsible – doesn’t affect coding sequence • Deletion of SMN1 causes disease despite presence of SMN2 (which has same coding sequence)
SNP abolishes ESE consensus sequences • [Figure legend: SC35 motif score; SF2/ASF motif score]
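How a single base change can push a motif score below threshold can be illustrated with a toy scoring matrix. Everything numeric here is invented for the example: `TOY_PWM` is not a published SC35 or SF2/ASF matrix, the threshold is arbitrary, and the sequence is not from SMN1 or SMN2.

```python
# Toy 4-position log-odds matrix. Values are made up for illustration only.
TOY_PWM = [
    {"A": -1.0, "C": 1.0, "G": 0.5, "T": -1.5},
    {"A": 1.2, "C": -0.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -0.5},
    {"A": 1.0, "C": 0.2, "G": -1.0, "T": -1.0},
]

def best_score(seq, pwm):
    """Highest matrix score over all windows of the sequence."""
    w = len(pwm)
    return max(
        sum(pwm[i][seq[j + i]] for i in range(w))
        for j in range(len(seq) - w + 1)
    )

def snp_effect(ref_seq, pos, alt_base, pwm, threshold):
    """Compare the best motif score before and after a substitution."""
    mutated = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    before, after = best_score(ref_seq, pwm), best_score(mutated, pwm)
    if before >= threshold > after:
        label = "abolished"
    elif after >= threshold > before:
        label = "created"
    else:
        label = "unchanged"
    return before, after, label
```

With a real matrix and a justified threshold, the same comparison would flag SNPs that remove an existing enhancer match or create a cryptic one.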
Questions to answer • Short-term • Which genes are alternatively spliced? • In which tissues are genes spliced into which forms? • Do certain tissues have more splice variants in general? • What GO functions can be ascribed to genes with many splice variants? • Long-term • Can we develop a more comprehensive picture of the process of alternative splicing? • How are regulatory elements affected by known nucleotide variation? • Can we make associations between known disease mutations and up- or down-regulation of splice regulators?