
Regulation of Alternative Splicing

Oral Preliminary Exam (May 7, 2007). Regulation of Alternative Splicing. Jihye Kim. Outline: Alternative Splicing Overview; Goal: investigate the “regulation” of AS; Method: Association Rule Mining; Part I: Finding association rules of cis-regulatory elements involved in alternative splicing


Presentation Transcript


  1. Oral Preliminary Exam (May 7, 2007) Regulation of Alternative Splicing Jihye Kim

  2. Outline • Alternative Splicing Overview • Goal: investigate the “regulation” of AS • Method: Association Rule Mining • Part I: Finding association rules of cis-regulatory elements involved in alternative splicing • Part II: Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing • Summary • Future Work

  3. Splicing • Introns are removed and flanking exons are concatenated • Spliceosome - snRNPs and other proteins [image from http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.11.spliceosome.jpg]

  4. Splice Sites • Recognized by the spliceosome • Splice-site signals alone are too weak to predict intron locations accurately [figure: 5’ and 3’ splice sites; image from http://web-books.com/MoBio/Free/Ch5A4.htm]

  5. Splicing Factors and Binding Sites • Assist the spliceosome in identifying splice sites • Splicing factors: SR (serine/arginine-rich) proteins • Exonic and intronic enhancers and silencers (cis-acting): ESE (A/G-rich motifs), ESS (hnRNP), ISE (G triples, UGCAUG), ISS [figure: exon-intron-exon diagram with binding sites] [Source: Katherina Kechris, Rocky’05 Conference]

  6. Alternative Splicing [figure: pre-mRNA → mRNA → protein] • Over 70% of human genes are alternatively spliced • Major mechanism for generating protein diversity • Highly relevant to disease • 15% of disease-causing mutations affect splicing [Krawczak 1992] [Krawczak 1992] Krawczak, M., Reiss, J., and Cooper, D.N., 1992 Hum. Genet. 90:41-54

  7. Types of Alternative Splicing [figure: types of AS, including the cassette exon] [Source: Cartegni et al. 2002]

  8. Investigating Alternative Splicing • Traditionally: align ESTs and mRNAs to genomic sequences • Recently: microarray technology (splice arrays) • Splice arrays measure exon skipping • Other types of AS are hard to measure this way

  9. Previous Work on AS Regulation • Most methods • use only sequence data • focus on the effect of individual motifs • Brain-specific exon skipping [Brudno 2001] • 25 brain-specific cassette exons from the literature • Over-representation of UGCAUG in the downstream intron • RESCUE-ESE [Fairbrother 2002] • Hexamers frequent in exons flanked by weak splice sites • 10 ESE motifs showed enhancer activity in experiments [Brudno 2001] Brudno, M., Gelfand, M.S., et al., 2001 NAR 29(11):2338-2348 [Fairbrother 2002] Fairbrother, W.G., et al., 2002 Science 297(5583):1007-1013

  10. What We Have Done So Far • Investigated cis-regulatory motifs that influence the amount of AS or tissue-specific AS [Jihye Kim, Sihui Zhao, Steffen Heber, “Finding association rules of cis-regulatory elements involved in alternative splicing”, Proceedings of the 45th Annual Southeast Regional Conference (ACM-SE), pp. 232-237] [Jihye Kim, Sihui Zhao, Steffen Heber, “Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing”, 7th Workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)] • Used mouse splice array data • Applied association rule mining • Investigated motif combinations involved in tissue-specific AS

  11. Dataset (Aim I-I: representing data context): AS Datasets in Mouse • Splice array [Pan 2004] with 6 probes • 3126 exon-skipping genes in mouse • %ASex: percentage of exon skipping, measured in 10 tissues [Pan 2004] Pan, Q., et al., 2004 Mol Cell 16(6):929-942

  12. Association Rule Mining • Introduced by Agrawal et al. in 1993 • Initially used for market basket analysis • An association rule X => Y is a pattern stating that when X occurs, Y occurs with a certain probability • X: antecedent (left-hand side, lhs); Y: consequent (right-hand side, rhs) • Goal: find all rules that satisfy a user-specified minimum support (minsup) and minimum confidence (minconf)

  13. Rule Strength Measures • Given a rule X => Y, • Support = Pr(X∧Y) • Confidence = Pr(Y | X) = Pr(X∧Y) / Pr(X) • Lift = Pr(X∧Y) / (Pr(X)Pr(Y)) • Lift measures the dependency between lhs and rhs • lhs and rhs are positively dependent if lift > 1.0 and independent if lift = 1.0
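
The three measures can be checked on the five shopping carts used in the ARM example (slides 14-21). The sketch below is illustrative only. The helper names `support`, `confidence` and `lift` are hypothetical. `Fraction` keeps the ratios exact, so values such as 3/4 are not blurred by floating-point rounding.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Pr(itemset): fraction of transactions containing every item of itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions):
    """Pr(rhs | lhs) = Pr(lhs and rhs) / Pr(lhs)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Pr(lhs and rhs) / (Pr(lhs) * Pr(rhs)); > 1 means positive dependency."""
    return support(lhs | rhs, transactions) / (
        support(lhs, transactions) * support(rhs, transactions))


carts = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]

x, y = {"Beer"}, {"Diaper"}
print(support(x | y, carts), confidence(x, y, carts), lift(x, y, carts))  # 3/5 3/4 5/4
```

For Beer => Diaper this gives support 3/5, confidence 3/4 and lift 5/4, a mild positive dependency. Beer => Jam, by contrast, has lift 5/6 < 1.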

  14. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam

  15. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent itemset = itemset whose support ≥ 0.5

  16. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support)

  17. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support) Bread (2/5 = 0.4 < 0.5, so Bread is not frequent)

  18. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support) Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6)

  19. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support): Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6) Association Rules (confidence):

  20. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support): Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6) Association Rules (confidence): Beer => Jam (2/4 = 0.5 < 0.7, rejected; {Beer, Jam} also has support 0.4 < 0.5)

  21. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam Min supp = 0.5 Min conf = 0.7 Frequent Itemsets (support): Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6) Association Rules (confidence): Beer => Diaper (3/4 = 0.75), Diaper => Beer (3/3 = 1.0)

  22. Apriori Algorithm • Most popular ARM algorithm • Two steps: • Find all itemsets that satisfy min_supp (frequent itemsets) • Any subset of a frequent itemset is also frequent, so candidates with an infrequent subset can be pruned • Find all frequent 1-itemsets, then all frequent 2-itemsets, and so on • Generate rules • A => B is an association rule if Confidence(A => B) ≥ min_conf
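
The two Apriori steps can be sketched compactly as below. This is an unoptimized illustration, not the implementation used in this work, which the slides do not name. `apriori` and `generate_rules` are hypothetical helpers, and supports are kept as exact `Fraction`s.

```python
from fractions import Fraction
from itertools import combinations


def apriori(transactions, min_supp):
    """Level-wise search for frequent itemsets; returns {frozenset: support}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def supp(itemset):
        return Fraction(sum(1 for t in tsets if itemset <= t), n)

    frequent = {}
    # Level-1 candidates: every individual item.
    candidates = {frozenset([item]) for t in tsets for item in t}
    k = 1
    while candidates:
        # Keep the candidates of the current level that meet min_supp.
        level = {c: supp(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_supp}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates, then prune any
        # candidate with an infrequent (k-1)-subset (downward closure).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_conf):
    """Split each frequent itemset into lhs => rhs; keep rules with conf >= min_conf."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # lhs is frequent because itemset is
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


carts = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]
frequent = apriori(carts, Fraction(1, 2))
for lhs, rhs, conf in generate_rules(frequent, Fraction(7, 10)):
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

With min supp 0.5 and min conf 0.7, it finds the frequent itemsets Beer, Jam, Diaper and {Beer, Diaper}. It reports two rules, Beer => Diaper (3/4) and Diaper => Beer (1). The two rules may print in either order, because set iteration order is not fixed.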

  23. Part I: Finding association rules of cis-regulatory elements involved in alternative splicing [Proceedings of the 45th Annual Southeast Regional Conference (ACM-SE), Winston-Salem, North Carolina, pp. 232-237]

  24. Aim I-I: representing data context. K-mers Around the Cassette Exon (items) • Pre-mRNA sequences • Transcripts from NCBI • BLAT used to align transcripts to the mouse genome • 200-bp sequences from 7 regions around the cassette exon • 2565 genes in total • Items (6-mers): AAAAAA to TTTTTT in region 1 … 7
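
Turning a gene into a transaction needs one item per (region, hexamer) pair. The sketch assumes the 7 region sequences have already been extracted; the transcript gives their length but not their exact coordinates. It uses the `<region>_<hexamer>` item naming seen on later slides, such as `7_TGAAGA`. `region_kmer_items` is a hypothetical helper.

```python
def region_kmer_items(regions, k=6):
    """Return the set of region-tagged k-mer items present in one gene.

    regions: sequences for regions 1..7 in order (index 0 is region 1).
    Items look like '7_TGAAGA'; only presence matters, not the count.
    """
    items = set()
    for region_no, seq in enumerate(regions, start=1):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other codes
                items.add(f"{region_no}_{kmer}")
    return items


# Toy input with only two short regions, for illustration.
print(sorted(region_kmer_items(["ACGTACGT", "TTTTTTT"])))
# -> ['1_ACGTAC', '1_CGTACG', '1_GTACGT', '2_TTTTTT']

# Size of the item universe: 4**6 hexamers in each of 7 regions.
print(7 * 4 ** 6)  # -> 28672
```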

  25. ARM for Finding AS Motif Rules • Items: all possible hexamers (motifs) • Transactions: 2565 AS genes • Goal: find motif association rules in AS genes (e.g., AGGATA => TTAGCT) • Using the Apriori algorithm [Agrawal 1993]: find all frequent hexamers, then generate hexamer rules [Agrawal 1993] Agrawal, R., Imielinski, T., Swami, A.N., 1993 SIGMOD Record 22(2):207-216

  26. ARM Example [Example] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7

  27. ARM Example [Example] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7 - Frequent 3-mer sets (support) AGG (0.8)

  28. ARM Example [Example] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7 - Frequent 3-mer sets (support) AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6)

  29. ARM Example [Example] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7 - Frequent 3-mer sets (support) AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6) - Rules (confidence) AGG => GAT: conf = 2/4 = 0.5 < minconf (rejected; {AGG, GAT} also has support 0.4 < minsup)

  30. ARM Example [Example] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7 - Frequent 3-mer sets (support) AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6) - Rules (confidence) AGG => TAG (0.75), TAG => AGG (1.0)
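
The 3-mer example can be reproduced by treating each sequence as a transaction of its distinct 3-mers. This is a brute-force sketch, not Apriori, and it covers only one-item rules, which is all the example needs. It checks a rule only when the pair {X, Y} is itself frequent, so AGG => GAT is never scored: {AGG, GAT} has support 0.4.

```python
from fractions import Fraction
from itertools import permutations

seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]
K, MIN_SUPP, MIN_CONF = 3, Fraction(1, 2), Fraction(7, 10)

# Each sequence is one transaction: the set of distinct 3-mers it contains.
transactions = [{s[i:i + K] for i in range(len(s) - K + 1)} for s in seqs]


def supp(itemset):
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


# Frequent single 3-mers.
frequent = sorted(m for m in set().union(*transactions) if supp({m}) >= MIN_SUPP)

# One-to-one rules X => Y whose pair {X, Y} is itself frequent.
rules = []
for x, y in permutations(frequent, 2):
    pair_supp = supp({x, y})
    if pair_supp >= MIN_SUPP and pair_supp / supp({x}) >= MIN_CONF:
        rules.append((x, y, pair_supp, pair_supp / supp({x})))

print(frequent)  # ['AGG', 'GAT', 'TAG']
for x, y, s, conf in rules:
    print(f"{x} => {y}: supp={s}, conf={conf}")
# AGG => TAG: supp=3/5, conf=3/4
# TAG => AGG: supp=3/5, conf=1
```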

  31. Aim I-II: finding motif association rules for all AS genes. Motif Association Rules from AS Genes [figure: regions 1-7 around the cassette exon] • Frequent 6-mers (minsup = 0.05, i.e., 129 genes) - 7_TGAAGA, 7_GAAGAA (ASF/SF2, SRp55 binding sites) - 6_TTTTCT, 6_AATAAA, … - Among 6,000 6-mers, 1/3 are in AEDB - Candidate regulatory motifs • Association rules (minconf = 0.4) - 7_AAAAAT => 7_TGAAGA, 7_AAAGGA => 7_AGAAGA - 7_GAAAAA => 7_AAGAAG, 7_CTGCCT => 7_CTGGAG - 7_AGGAAA => 7_AAGAAG, 7_AATAAA => 7_AAGAAG - Candidate regulatory motif combinations for AS

  32. Aim I-III: finding motif association rules for clusters. Clustering by AS Pattern in 10 Tissues • Hypothesis: motif combinations “cause” the AS profile • Cluster genes by their AS profiles, using • Euclidean distance or correlation • Average-linkage clustering • Frequent 6-mers in a cluster are motif candidates
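
Average-linkage clustering of %ASex profiles can be sketched in plain Python. The profiles below are made-up toy values. On the real 2565-gene data, an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average", metric="correlation")` would be more practical. The transcript does not say where the tree was cut, so the `n_clusters` parameter is an assumption. Correlation distance is taken here as 1 - Pearson r.

```python
import math


def correlation_distance(a, b):
    """1 - Pearson correlation; undefined (division by zero) for a constant profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)


def average_linkage(profiles, n_clusters, dist=correlation_distance):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns sorted lists of gene indices.
    """
    n = len(profiles)
    d = {(i, j): dist(profiles[i], profiles[j]) for i in range(n) for j in range(i + 1, n)}

    def cluster_dist(c1, c2):
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        p, q = min(((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
                   key=lambda pq: cluster_dist(clusters[pq[0]], clusters[pq[1]]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return sorted(clusters)


# Hypothetical %ASex values for 4 genes in 3 tissues.
profiles = [[10, 20, 30], [12, 22, 35], [30, 20, 10], [33, 21, 9]]
print(average_linkage(profiles, 2))                  # [[0, 1], [2, 3]]
print(average_linkage(profiles, 2, dist=math.dist))  # Euclidean: [[0, 1], [2, 3]]
```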

  33. Aim I-III: finding motif association rules for clusters. Association Rules from Clusters [figure: regions 1-7; correlation-based clusters] • Lift(X => Y) > 2.0 • Compared with genes outside the cluster (p-value < 2.13e-10) • These association rules are candidate motif combinations for the corresponding AS pattern
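
The transcript reports a p-value for the comparison with genes outside the cluster but does not name the test. The sketch assumes a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. `cluster_enrichment` and the toy gene sets are hypothetical. The slide's lift > 2.0 filter would be applied to the rule separately.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K carrying the motifs, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def cluster_enrichment(cluster, outside, motifs):
    """One-sided test: do cluster genes carry all `motifs` more often than expected?

    cluster, outside: lists of per-gene item sets; motifs: the rule's motif items.
    Returns (hits in cluster, hits overall, p-value).
    """
    motifs = set(motifs)
    k = sum(1 for g in cluster if motifs <= g)
    K = k + sum(1 for g in outside if motifs <= g)
    N = len(cluster) + len(outside)
    return k, K, hypergeom_sf(k, N, K, len(cluster))


# Toy data: 4 of 5 cluster genes vs 1 of 15 other genes carry both motifs.
cluster = [{"7_AAAAAT", "7_TGAAGA"}] * 4 + [{"7_AAAAAT"}]
outside = [{"7_AAAAAT", "7_TGAAGA"}] + [{"3_GGGGGG"}] * 14
print(cluster_enrichment(cluster, outside, {"7_AAAAAT", "7_TGAAGA"}))
```

In this toy case, 4 of the 5 motif-carrying genes fall in the 5-gene cluster. That gives p = 76/15504 ≈ 0.0049.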

  34. Part II: Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing [7th Workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)]

  35. Aim II-I: finding motif association rules for tissue-specific AS. Finding Motifs Involved in Tissue-Specific AS • Items: • hexamers in gene regions and • exon skipping rates in tissues • Transactions: • 2565 genes from Pan’s data set • Goal: find associations such as AGGATA in cassette exon => High exon skipping in Brain • We focus on complex rules, e.g., {AGGATA in cassette exon, CCTGCG in downstream intron} => High exon skipping in Brain

  36. AS profile items • Use quartiles to convert numeric %ASex values into categorical AS profile items • BrainLow: the first (lowest) %ASex quartile in Brain • BrainHigh: the last (highest) %ASex quartile in Brain
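
The quartile conversion can be sketched with `statistics.quantiles`. Several details are assumptions: genes in the two middle quartiles get no item for that tissue, boundary values count as Low or High, and Python's default quantile method is used. `profile_items` and the %ASex values are hypothetical.

```python
from statistics import quantiles


def profile_items(asex_values, tissue):
    """Map each gene's %ASex in one tissue to an AS profile item.

    Genes at or below the first quartile get '<tissue>Low', genes at or above
    the third quartile get '<tissue>High', and the rest get no item.
    """
    q1, _median, q3 = quantiles(asex_values, n=4)
    items = []
    for v in asex_values:
        if v <= q1:
            items.append({f"{tissue}Low"})
        elif v >= q3:
            items.append({f"{tissue}High"})
        else:
            items.append(set())
    return items


# Hypothetical %ASex values for 8 genes in brain.
brain = [40, 5, 80, 20, 10, 60, 30, 50]
print(profile_items(brain, "Brain"))
```

Here the two lowest values (5, 10) become BrainLow and the two highest (60, 80) become BrainHigh. The other four genes get no brain item.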

  37. Motif Combination ARM Example [Sequence] Seq 1 : ACGATTAGG Seq 2 : GAATAGG Seq 3 : TGCAGG Seq 4 : GGATTAGG Seq 5 : CAGAT Min support = 0.5 Min confidence = 0.7 [AS profile] Seq 1: BH, HH Seq 2: BH, HL Seq 3: BH, HH Seq 4: BL, HH Seq 5: BH, HL (BH: BrainHigh, BL: BrainLow, HH: HeartHigh, HL: HeartLow)

  38. Motif Combination ARM Example
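
The slide 37 toy example can be worked through in code by adding each sequence's AS profile items to its 3-mers. Rules are restricted so the lhs holds only motifs (one or two) and the rhs is a single AS profile item. This is an illustrative brute-force sketch with hypothetical names; its output is what the sketch computes, not a reconstruction of the original slide.

```python
from fractions import Fraction
from itertools import combinations

seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]
profiles = [{"BH", "HH"}, {"BH", "HL"}, {"BH", "HH"}, {"BL", "HH"}, {"BH", "HL"}]
K, MIN_SUPP, MIN_CONF = 3, Fraction(1, 2), Fraction(7, 10)

# One transaction per gene: its distinct 3-mers plus its AS profile items.
transactions = [{s[i:i + K] for i in range(len(s) - K + 1)} | p
                for s, p in zip(seqs, profiles)]
profile_items = set().union(*profiles)
motif_items = set().union(*transactions) - profile_items


def supp(itemset):
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


rules = []
for size in (1, 2):  # a single motif, or a two-motif combination
    for lhs in combinations(sorted(motif_items), size):
        lhs = set(lhs)
        if supp(lhs) < MIN_SUPP:
            continue
        for rhs in sorted(profile_items):
            s = supp(lhs | {rhs})
            if s >= MIN_SUPP and s / supp(lhs) >= MIN_CONF:
                lift = s / (supp(lhs) * supp({rhs}))
                rules.append((sorted(lhs), rhs, s, s / supp(lhs), lift))

for lhs, rhs, s, conf, lift in rules:
    print(lhs, "=>", rhs, f"supp={s} conf={conf} lift={lift}")
# ['AGG'] => BH supp=3/5 conf=3/4 lift=15/16
# ['AGG'] => HH supp=3/5 conf=3/4 lift=5/4
```

At these thresholds no two-motif rule survives, because {AGG, TAG} co-occurs with each profile item in only 2 of 5 sequences. AGG => BH passes the confidence cut but has lift < 1. That is the kind of rule a minimum-lift threshold removes.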

  39. Aim II-I: finding motif association rules for tissue-specific AS. Tissue-Specific AS Motif Combinations • With strict thresholds • Min_supp = 0.01, Min_conf = 0.5, Min_lift = 1.2 • Minimum lhs length = 2 (for complex rules) • Rule form • lhs: hexamers, rhs: AS profile items • 197 association rules found in total • 27 complex rules found • lhs: combinations of 34 frequent hexamers; rhs: AS profile items in tissues • All rules have lift > 1.9 • 23 rules combine motifs from different regions
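
The rule filter described above can be sketched as follows. The `Rule` record and helper names are hypothetical. Treating each threshold as inclusive (≥) and spotting AS profile items by their `Low`/`High` suffix are assumptions. `spans_regions` mirrors the count of rules whose motifs come from different regions. Only the first rule's items appear in the slides; all statistics are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: frozenset      # region-tagged hexamers, e.g. "5_TTTTTA"
    rhs: frozenset      # AS profile items, e.g. "HeartHigh"
    support: float
    confidence: float
    lift: float


def is_profile_item(item):
    # AS profile items are named <Tissue>Low / <Tissue>High; hexamer items are not.
    return item.endswith(("Low", "High"))


def complex_tissue_rules(rules, min_supp=0.01, min_conf=0.5, min_lift=1.2, min_lhs=2):
    """Keep rules with >= min_lhs hexamers on the lhs and only AS profile items on the rhs."""
    return [r for r in rules
            if r.support >= min_supp and r.confidence >= min_conf and r.lift >= min_lift
            and len(r.lhs) >= min_lhs
            and not any(is_profile_item(i) for i in r.lhs)
            and r.rhs and all(is_profile_item(i) for i in r.rhs)]


def spans_regions(rule):
    """True if the lhs hexamers come from more than one region (prefix before '_')."""
    return len({item.split("_", 1)[0] for item in rule.lhs}) > 1


# Hypothetical statistics for illustration.
r1 = Rule(frozenset({"5_TTTTTA", "7_AGAGGA"}), frozenset({"HeartHigh"}), 0.012, 0.55, 1.95)
r2 = Rule(frozenset({"7_AGAGGA"}), frozenset({"HeartHigh"}), 0.05, 0.60, 1.30)  # lhs too short
r3 = Rule(frozenset({"7_AAAAAT", "7_TGAAGA"}), frozenset({"BrainLow"}), 0.02, 0.45, 2.0)  # low conf
r4 = Rule(frozenset({"3_AAAAAT", "3_TGAAGA"}), frozenset({"BrainHigh"}), 0.02, 0.60, 2.0)

kept = complex_tissue_rules([r1, r2, r3, r4])
print([(sorted(r.lhs), spans_regions(r)) for r in kept])
# [(['5_TTTTTA', '7_AGAGGA'], True), (['3_AAAAAT', '3_TGAAGA'], False)]
```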

  40. Aim II-I: finding motif association rules for tissue-specific AS [figure: regions 1-7 around the cassette exon] Example rule: {5_TTTTTA, 7_AGAGGA} => {HeartHigh}

  41. Aim II-II: analyzing motif combinations. AS Profile of Motif Combinations [figure: regions 1-7 around the cassette exon]

  42. Summary of Graphs • In some cases, genes containing only one of the motifs show an AS profile no different from that of all AS genes • However, genes containing all motifs of a combination often show significantly changed exon-skipping levels • Combinations of cis-regulatory motifs can influence the AS profile across tissues

  43. Comparison with AEDB • AEDB at EBI • Transcript regulatory sequences from the literature • 292 enhancers and silencers • >60% of extracted frequent hexamers are part of AEDB motifs • >97% of hexamers involved in complex rules are part of AEDB motifs
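
The AEDB comparison can be sketched as a substring test. The transcript does not define "part of", so the sketch assumes a hexamer counts when it occurs inside at least one AEDB sequence. It first strips the region prefix and converts U to T. The motif list below is a hypothetical stand-in, not AEDB content, and `fraction_in_known_motifs` is a hypothetical helper.

```python
def fraction_in_known_motifs(hexamer_items, known_motifs):
    """Share of distinct hexamers occurring inside at least one known motif."""
    # Convert RNA motifs (U) to the DNA alphabet (T) used by the hexamer items.
    known = [m.upper().replace("U", "T") for m in known_motifs]
    # Drop region prefixes such as '7_' and deduplicate the bare hexamers.
    hexamers = {item.split("_")[-1].upper() for item in hexamer_items}
    hits = {h for h in hexamers if any(h in m for m in known)}
    return len(hits) / len(hexamers), hits


# Hypothetical inputs for illustration only.
known = ["GAAGAAGAAG", "UGCAUG"]
frequent = ["7_TGAAGA", "7_GAAGAA", "6_TTTTCT", "1_TGCATG"]
share, hits = fraction_in_known_motifs(frequent, known)
print(share, sorted(hits))  # 0.5 ['GAAGAA', 'TGCATG']
```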

  44. Summary • Association rule mining (ARM) applied • Finding motif association rules for AS • Finding motif association rules for AS clusters • Finding motif combinations for tissue-specific AS

  45. Future Work Improve method • Improve motif representation, e.g. • variable motif length, gapped k-mers • results from motif finding tools • Improve AS profile representation • Add more features, e.g. • position and distance between motifs • splice site • exon / intron length • conservation, gene information • Statistical analysis • Thresholds • Multiple testing

  46. Future Work • Systematic analysis of simple and complex motifs • Other data sources • Human splice array [Johnson 2003] • ESTs • Investigate discovered motifs • Apply motif discovery tools • Analyze genome occurrence • Analyze gene and protein structure • Build a predictive model and apply it (if time permits) • Experimental verification [Johnson 2003] Johnson, J.M., et al., 2003 Science 302(5653):2141-2144

  47. Acknowledgements • Dr. Steffen Heber • Dr. Eric A. Stone • Dr. Zhao-Bang Zeng • Dr. Barbara Sherry • Sihui Zhao • Li Zhang • Hyunmin Kim THANK YOU
