A computational screen detected thousands of new A-to-I RNA hyper-editing sites

Shai Carmi. Lab meeting, January 2011. Together with: Itamar Borokhov (Compugen), Erez Levanon. Thanks to: Gilad Finkelstein, Khen Khermesh, Nurit Paz-Yaacov.

Presentation Transcript


  1. A computational screen detected thousands of new A-to-I RNA hyper-editing sites
     Shai Carmi
     Lab meeting, January 2011
     Together with: Itamar Borokhov (Compugen), Erez Levanon
     Thanks to: Gilad Finkelstein, Khen Khermesh, Nurit Paz-Yaacov

  2. A-to-I RNA editing
     • RNA editing is a post-transcriptional change in the pre-RNA.
     • It alters the RNA sequence encoded by the DNA in a single-nucleotide, site-specific manner.
     • Adenosine is converted to inosine.
     • Inosine is read as guanosine during translation and sequencing.
     • The ADAR (Adenosine Deaminase Acting on RNA) family of dsRNA-binding proteins catalyzes A-to-I editing.
     • ADAR deficiency is embryonically lethal; editing is also related to brain diseases.
     Gommans, Mullen & Maas. Bioessays 31, 1137 (2009)

  3. Hyper-editing
     • Editing occurs mostly near Alu inverted repeats.
     • Many editing sites in each repeat.
     • Hyper-editing is also observed.
     • Only one known target (biochemical screen) (Morse, Aruscavage & Bass. PNAS 99, 7906 (2002)).
     Nishikura. Annu. Rev. Biochem. 79, 321 (2010)
     Global effect:
     • “Inosine-containing dsRNA binds a stress-granule-like complex and downregulates gene expression in trans” (Scadden. Mol. Cell 28, 491 (2007)).
     • “Double-stranded RNAs containing multiple IU pairs are sufficient to suppress interferon induction and apoptosis” (Vitali & Scadden. Nat. Struct. Mol. Biol. 17, 1043 (2010)).

  4. Detecting hyper-editing
     Typical editing is detected by aligning the RNA to the genome and searching for A→G mismatches.
     “Too edited” RNAs will not align to the genome at all!
     How to detect such editing events?

  5. An algorithm
     • Collect all sequences that were rejected from alignment to the genome (by the UCSC genome browser).
     • Transform the sequences: change every “A” to “G” in the human genome and in the RNA sequences.
     • Re-align to the genome.
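The transformation step can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the talk; the point is only that once every "A" is replaced by "G" in both sequences, A-to-G mismatches no longer count against the alignment.

```python
def to_three_letter(seq: str) -> str:
    """Replace every A with G, leaving a three-letter (C, G, T) sequence."""
    return seq.upper().replace("A", "G")


def count_mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy example (made up): the read differs from the genome by three A-to-G changes.
genome = "CATTAGACA"
read = "CGTTGGACG"

before = count_mismatches(genome, read)
after = count_mismatches(to_three_letter(genome), to_three_letter(read))
print(before, after)
```

In the three-letter alphabet the read matches the genome exactly, so an aligner that rejected the original read can place it. The original sequences are then needed again to see which mismatches were really A-to-G.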

  6. How hard is it?
     • About 500k ESTs do not align to the genome, each about 500 bp long. The genome is 3 Gbp.
     • The 3-letter genome has lower complexity.
     • All 4 strand combinations (DNA[+/-] x RNA[+/-]) must be generated before the 3-letter transformation.
     • A→C, G→C and A→T must also be checked as controls.
     • Two years on a new laptop.
     Solution: use the cloud and parallelize.
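Why the strand combinations have to be generated before the transformation can be seen in a small Python sketch. All names here are hypothetical and the sequences are made up; this is an illustration under those assumptions, not the pipeline used in the talk.

```python
# Complement table for reverse-complementing DNA/RNA-as-DNA sequences.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def to_three_letter(seq: str) -> str:
    """Replace every A with G."""
    return seq.upper().replace("A", "G")


def strand_combinations(dna: str, rna: str):
    """Yield (label, dna_3letter, rna_3letter) for DNA[+/-] x RNA[+/-].

    The strand is chosen first and the A-to-G replacement is applied last.
    """
    for dna_label, dna_seq in (("+", dna), ("-", reverse_complement(dna))):
        for rna_label, rna_seq in (("+", rna), ("-", reverse_complement(rna))):
            yield (dna_label + rna_label,
                   to_three_letter(dna_seq),
                   to_three_letter(rna_seq))


combos = list(strand_combinations("CATTAG", "CGTTGG"))
labels = [label for label, _, _ in combos]
print(labels)
```

The two operations do not commute: transforming the reverse complement of "AT" gives "GT", while reverse-complementing the transformed "AT" gives "AC". Applying the replacement first would therefore turn an A-to-G screen on the opposite strand into a different mismatch type.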

  7. Overview of the procedure
     • EST downloading and filtering
     • EST and genome preprocessing
     • Blast
     • Original sequence reconstruction
     • Examination of mismatches
     • (Filtering results)
     • Properties of hyper-edited RNAs
     Computing times:
     • 2-3 weeks on local server
     • 1 day
     • 2-3 months on Amazon cloud and partly on local server; ~2000 paid computer hours; $500
     • 2-3 days
     • 2-3 hours
     • Minutes for each operation

  8. Results
     • Final editing criteria: >12 A-to-G mismatches, >6% of sites edited, >90% of mismatches are A-to-G.
     • Number of clusters of each type. Note that A-to-G also represents T-to-C, G-to-A also represents C-to-T, A-to-C also represents T-to-G, … The A-to-T run is not finished yet.
     • Quality scores, which could explain the non-A-to-G clusters, are not available for these ESTs.
     • We have some understanding of G-to-A clusters (APOBEC, sequencing errors).
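The three final criteria can be written as a small filter. This is a sketch under stated assumptions: the function name is hypothetical, the two sequences are assumed to be gap-free and already aligned position by position, and ">6% of sites edited" is read here as the fraction of genomic adenosines in the aligned region that appear as G in the EST. The slide does not state the denominator, so that reading is an assumption.

```python
from collections import Counter


def passes_editing_criteria(genome_seq: str, est_seq: str,
                            min_sites: int = 12,
                            min_fraction: float = 0.06,
                            min_purity: float = 0.90) -> bool:
    """Apply the three thresholds to one gap-free, position-matched alignment."""
    # Tally mismatches by (genome base, EST base), ignoring N and gap symbols.
    mismatches = Counter(
        (g, e) for g, e in zip(genome_seq.upper(), est_seq.upper())
        if g != e and g in "ACGT" and e in "ACGT"
    )
    n_ag = mismatches[("A", "G")]
    n_total = sum(mismatches.values())
    n_adenosines = genome_seq.upper().count("A")  # assumed denominator
    if n_total == 0 or n_adenosines == 0:
        return False
    return (n_ag > min_sites
            and n_ag / n_adenosines > min_fraction
            and n_ag / n_total > min_purity)


# Made-up sequences: 20 genomic adenosines, 13 of them read as G in the EST.
genome = "A" * 20 + "C" * 80
edited = "G" * 13 + "A" * 7 + "C" * 80
print(passes_editing_criteria(genome, edited))
```

The comparisons are strict, following the ">" signs on the slide, so an EST with exactly 12 A-to-G mismatches is rejected.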

  9. Examples
     chr16:23457836-23458381, DA103871 (-+)
     Query 1    AGATATTTTTAGGCTTGGCATTGTGGATCACACTTGTAATCCCAGCATTTTGGGAGGCCT  60
     Sbjct 1    .............................G.G.....G......................  60
     Query 61   AGCCAGGCAGGTCCCTTGAGCCCAGGAGTTTGAGACCAGCCTGGGCAACATGGTGAAACT  120
     Sbjct 61   G.................G.........................................  120
     Query 121  GTCTCTGCAAAATATATAAAAATTATTCAGTCCTGGTGGTGTGTGCCTGTAGTCCCACCT  180
     Sbjct 121  .........G........................................G.........  180
     Query 181  ACTTGAGAGGCTGAGGTGGGAGGATCACCTGAGACCAGGAGGTTGAGGTTGCAGTGAGCT  240
     Sbjct 181  G....G....................G.............................G...  240
     Query 241  GTGATTTCACCACTGCACTCCAGTCTGGGCAACCGAGTGAGACCCTGTCTCAAAAATAAT  300
     Sbjct 241  ........G..G...................G............................  300
     Query 301  TTTAAAATAGGCCGGGCCTGGTGGCTCATGCCTGTATTCCCAGCACTTTGGGAGCCCAAG  360
     Sbjct 301  .........................................G...............G..  360
     Query 361  GCGGGTGGATCACCTGAGGTCAGGGGTTCAAGACCAGCCTGGCCAACATGGTGAAACCCC  420
     Sbjct 361  .....................G......................................  420
     Query 421  GTCTCTACTGAAAATACAAAAAATTAGCCAGGCGGGTGGCGGGCGCCTATAAAACCAGCT  480
     Sbjct 421  ................................................G.GG....G...  480
     Query 481  ACTCAGGAGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCC  540
     Sbjct 481  G...G..G.....G...G........G.....G..........G........G.......  540
     Query 541  GAGATT  546
     Sbjct 541  ......  546
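In these example alignments a dot in the Sbjct row means identity with the Query row, and a letter marks a mismatch. A minimal Python sketch for listing the mismatches of one row follows; the function name is hypothetical, and the input is the first row of the chr16 example above.

```python
def decode_dot_alignment(query: str, sbjct: str):
    """Return (1-based position, query base, subject base) for each non-dot column."""
    if len(query) != len(sbjct):
        raise ValueError("rows must have equal length")
    return [(pos, q, s)
            for pos, (q, s) in enumerate(zip(query, sbjct), start=1)
            if s != "."]


# First row of the chr16 example (DA103871).
query_row = "AGATATTTTTAGGCTTGGCATTGTGGATCACACTTGTAATCCCAGCATTTTGGGAGGCCT"
# Same dot pattern as on the slide, built piecewise to keep the length explicit.
sbjct_row = "." * 29 + "G.G" + "." * 5 + "G" + "." * 22

mismatches = decode_dot_alignment(query_row, sbjct_row)
print(mismatches)  # [(30, 'A', 'G'), (32, 'A', 'G'), (38, 'A', 'G')]
```

All three mismatches in this row are A in the Query row and G in the Sbjct row, which is the pattern the screen looks for.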

  10. Examples
      chr15:68482051-68482545, DA105809 (++)
      Query 1    ATGTGTATTCCACACACAAATGGCTGAGTTATAGTCATAAAACAATTTGCAATAAAAAAA  60
      Sbjct 1    ............................................................  60
      Query 61   AAACCAAAACAGATTGTCAGTTAACCAGGAAACAGTTAATGTTTTTTAATGAATCTGGCA  120
      Sbjct 61   .....................................G..............G.......  120
      Query 121  TTATAGTGAGCAAATGTCGTATTAATTTAGGCTAATTTCTAATAC-TACCATAATTTGTG  179
      Sbjct 121  ..G.G......G................G.....G.....G....N.G....GG......  180
      Query 180  TCTAAATTTCTGTTGGGGTAGAAATTACTAAAATTGTGGGGAGTTTTTTCTGATTTTTAC  239
      Sbjct 181  .......................G..G..GG.....................G.....G.  240
      Query 240  ATTGCTTTAGGAAACATTTTTACTAATTCAGCTGTCTTAGGTAAAATGAATAGTTTTCTT  299
      Sbjct 241  ........G...GG...........G............G...GG.G...G.G........  300
      Query 300  CCTGTTTTTTTATGTGTCATTGTTAGTGGTCTCAGAATTCTGATCAGTAACTTTGTGTAT  359
      Sbjct 301  ...........G............G........G..G...........GG........G.  360
      Query 360  GATGCTGAATTACAAACCGTTTGAATGATCCAGTTGAAAACGTATCCCTCTACTTTCTTC  419
      Sbjct 361  ...........G..G............G........GG.G...G.......G........  420
      Query 420  AGTTGTAGAAAAGGTTAATTTCCCTCAGTGTCCCACATTATACCAACCTAAGAGAAGAAC  479
      Sbjct 421  ......G.........G.........G............G..........G.........  480
      Query 480  AGGTAATAGGGAGAA  494
      Sbjct 481  ....GG.........  495
      No Alu.

  11. Examples
      chr16:29383242-29383699, BM703103 (++)
      Query 1    AGAACTAATGAGCACAGAACTAAGAAAGCCCAGGCACAGTGGCTCACATCAGTAATTCTA  60
      Sbjct 1    ............................................................  60
      Query 61   GGGCCTTGGGAGGCAAGACAAGAGAATCACTTGAGGCCATGAGTTCAAGGGCAGCCTAGG  120
      Sbjct 61   ....................G............G............G.....G....G..  120
      Query 121  CAACATAGTGGGACCCTATCTCCACAAAAATAATAATATTATTATTATTAAATAAAATAA  180
      Sbjct 121  ............................................................  180
      Query 181  AAGGAAGAGACAGCCATGAAGATAACTAGCTGAGGCCAGGTACAGTGGCTCATGCCTATA  240
      Sbjct 181  ...........................................G.............G..  240
      Query 241  ATCCCAACACTTTGGGAGGTTGAGGTGGACAGATTGCTTGAGGTCAGAAGTTCCAGACCA  300
      Sbjct 241  G....G.......................................G.GG.......G..G  300
      Query 301  GACTGAACAACATAGCAAAACCCCATCCCTACTAAAAATACAAAAATTAGCTGGGCGTGG  360
      Sbjct 301  .............G..GG......G.....G..G...G..........G...........  360
      Query 361  TGGCAGGCACCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCACCTGAACCT  420
      Sbjct 361  ....G...G.....G.........G...................G.....G.....G...  420
      Query 421  GGGAGGCGGAGGATGCAGTGCGCTGAGATCATGCCACT  458
      Sbjct 421  ............G...G.............G.......  458

  12. Examples
      chr12:56344120-56344654, DB160834 (++)
      Query 1    TTGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCAATCTCGGCTCACTGCAAGCTCCACC  60
      Sbjct 1    ..........G...G...................GG..............GG.....G..  60
      Query 61   TCCTGGGTTCACGCCATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGACTATAGGTGCCC  120
      Sbjct 61   ...............G............G.....................G.........  120
      Query 121  ACCACCACGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTAGC  180
      Sbjct 121  ...G......................G.....G..G.....................G..  180
      Query 181  CAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTACTGG  240
      Sbjct 181  .G...................G.......G....................GGG..G....  240
      Query 241  GATTACAGGTGTGAGCCACTGATACCTGGCCAATTTTTATATTTGTTGTAGAGATGAGGT  300
      Sbjct 241  ....G.......................................................  300
      Query 301  TTTGCCATATTGTCCAGGCTGGTCTCAAACTCCTGGTCTCAAGGGATCACCCGCCTCAGC  360
      Sbjct 301  ........G................................................G..  360
      Query 361  CTCCCAAAGTGCTGGGACTACAGGAGTGAGCCACTGTGCCTGGCCTTGTTTGTTTGTTTT  420
      Sbjct 361  .......G....................................................  420
      Query 421  TTGAGATGGGGTCTCACTATGTTGGCCAGGCTGGTCTCGAACTCCTGGGTTTGAGCAATC  480
      Sbjct 421  ..................G.........................................  480
      Query 481  CTCCTGCCATGTAGCTGGGATTATAGAGGCTACCATGTCCGTCTAGTTTTAAATT  535
      Sbjct 481  .......................................................  535

  13. Examples
      chr11:65608125-65608668, DA221841 (++)
      Query 2    TTAGCCAGGCATGGTGGCAGATGCCTGTAGTCCCAGCTACTCAGGAGGCTGAAGTGGGAG  61
      Sbjct 2    ............................G.....G...G...G........GG.......  61
      Query 62   GATCCCTTGAGCCTGGGAGTTCAAGGCTGCCATGAGCCAAGATGGCACTACCACACTCCA  121
      Sbjct 62   .G.......G............GG.......G..G....G............G.......  121
      Query 122  TCCTGGATGACAGAGCAAGACCCTGTCTC-AAAAAAAAAAAAAGAATCTACAAACGATTA  180
      Sbjct 122  ................G............A..............................  181
      Query 181  AATTAATAAGTGAGTTCAGCAAGATATTTTAAAAAATTATTAAAATTAACAAGTAAATTT  240
      Sbjct 182  ............................................................  241
      Query 241  GTGGGGACCAAGGTAAATATATAAAAATCTATTATGGTTTTTTTTTCTTTCTTTCTTTCT  300
      Sbjct 242  ............................................................  301
      Query 301  TTTTTTTCTGAGATGGAGTTTCACTCTTGTCACCCAGGCTGGAGTGCAATGGTGCGATCT  360
      Sbjct 302  ................G.....G............G...........GG.......G...  361
      Query 361  TGTTTCACCGCCACCTCTGCCTCCGGGTTCAAGGGATTTTCCTGCCTCAGCCTCCTGAGT  420
      Sbjct 362  ...............................G................G...........  421
      Query 421  AGCTGGGATTACAAGCGCCCCCCACCACACCTGGCTAATTTTTGTATTTTTAGCAGAGAC  480
      Sbjct 422  G.........G..G..............G........G.......G.....G..G.....  481
      Query 481  GGGGTTTTACCATGTTGACCAGCCTGGTCCTCGAACTCCTGAGCTCAGGTGATCCACCCG  540
      Sbjct 482  ...........G......................G...........G........G....  541
      Query 541  CCTC  544
      Sbjct 542  ....  545

  14. Examples
      chr17:73090375-73090779, DB352453 (+-)
      Query 1    TTCCCTGGAGGTGCTGGGAGCTGGGAAATGTATGCGGCTGTGAATTATTAATATTTTGGA  60
      Sbjct 405  ...................N.............N..........................  346
      Query 61   GACCCTCACTAGGGCAGGGAGTGGCTTCAGGATAGGAAAGGGGACGCAAGGAAGACACCA  120
      Sbjct 345  ............................................................  286
      Query 121  GGAATGGCCGGGCGCGATGGCTTACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGG  180
      Sbjct 285  .............N..G......G.......GG...........................  226
      Query 181  TCAGATCACCTGAGGTCGGGAGTTTGAGACCAGCCTGACCAACATGGAGAAACCCTGTCT  240
      Sbjct 225  ..G....G....G.......G.....G....G.....G..GG.....G.G.G........  166
      Query 241  CTACTGAAAATACAAAATTAGCTGGGCGTGGTGGCGGGTGCCTGTAATCCCAGCTACTCA  300
      Sbjct 165  ...............GG..G.........................G.........G...G  106
      Query 301  GGAGGCTGAGGCAGGAGAATCTCTTGAACCCAGGAGGCAGAGGTTGCGGTGAGCTGAGAT  360
      Sbjct 105  ........G...G.....G........G...G......G.....................  46
      Query 361  GGTGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTGTCTC  405
      Sbjct 45   .........................G...................  1

  15. Novelty and more
      • Total number of edited ESTs: 807.
      • Total number of edited sites: 16184.
      • Number of novel sites: 15362 (more than any previous screen).
      • Number of novel ESTs: 700.
      • Number of novel hyper-edited ESTs (known <= 5): 749.
      • Number not in Alu: 76.
      • Number of edited regions supported by multiple ESTs: 74 (169 ESTs).
      • Number of sites overlapping with a (transition) SNP: 250 (99 non-cDNA). 60% cDNA SNPs vs. only 0.3% expected.
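A few ratios follow directly from the counts on this slide. The short Python sketch below only does that arithmetic; the variable names are hypothetical, and reading the "60% cDNA SNPs" figure as (250 - 99) / 250 is an assumption about how the slide's number was obtained.

```python
# Counts copied from the slide.
edited_ests = 807
edited_sites = 16184
novel_sites = 15362
novel_ests = 700
snp_overlap_sites = 250
snp_overlap_non_cdna = 99

# Derived ratios.
sites_per_est = edited_sites / edited_ests
novel_site_fraction = novel_sites / edited_sites
novel_est_fraction = novel_ests / edited_ests
# Assumed reading: SNP-overlapping sites whose SNP is cDNA-derived.
cdna_snp_fraction = (snp_overlap_sites - snp_overlap_non_cdna) / snp_overlap_sites

print(f"{sites_per_est:.1f} sites per edited EST")
print(f"{novel_site_fraction:.1%} of sites are novel")
print(f"{novel_est_fraction:.1%} of edited ESTs are novel")
print(f"{cdna_snp_fraction:.1%} of SNP-overlapping sites are cDNA SNPs")
```

Under that reading the last ratio is 151 / 250 = 60.4%, which agrees with the 60% quoted on the slide.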

  16. Editing signature
      Panels: Hyper-editing; Li et al. Science 324, 1210 (2009).
      ADAR2 motifs absent in our data.

  17. Tissues and health states
      Top tissues:
      Tissue                            #ESTs   Fraction edited (x10^-4)
      liver                             316     12.0038
      brain                             121     0.907091
      lung                              39      0.964747
      thymus                            29      3.19151
      prostate                          28      0.830392
      eye                               26      1.0526
      muscle                            22      1.63777
      uncharacterized tissue            22      0.54209
      uterus                            16      0.572324
      kidney                            15      0.598217
      intestine                         15      0.462488
      testis                            14      0.369041
      spleen                            12      1.9836
      bone                              10      1.22326
      pancreas                          10      0.413158
      Top health states:
      State                             #ESTs   Fraction edited (x10^-4)
      normal                            599     1.50345
      lung tumor                        12      0.605776
      head and neck tumor               12      0.504392
      colorectal tumor                  11      0.496144
      glioma                            9       0.717595
      soft tissue/muscle tissue tumor   9       0.618
      gastrointestinal tumor            8       0.496873
      kidney tumor                      7       0.638465
      germ cell tumor                   7       0.241261
      uterine tumor                     6       0.508863
      chondrosarcoma                    5       0.583676
      pancreatic tumor                  5       0.424452
      Human liver regeneration after partial hepatectomy. About 40% of the new sites.

  18. Secondary structure
      Are hyper-edited RNAs double-stranded?
      Consider the genomic sequence 10 kbp flanking the edited EST.
      Three measures of “double-strandedness”:
      1. The maximal length of dsRNA according to RNAfold (2 kbp region).
      2. The total number of aligned bases when blasting against the reverse complement.
      3. The number of (+) and (-) Alus in the region.
      Regions shown: Chr16:68283493-68285759, Chr4:373100-375422.
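Two of these measures can be approximated in a few lines of Python. This sketch does not run BLAST or RNAfold: the k-mer count is a deliberately crude stand-in for aligning a region against its own reverse complement, and the function names, the value of k and the toy inputs are all assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def revcomp_kmer_hits(region: str, k: int = 12) -> int:
    """Count positions whose k-mer also occurs in the region's reverse complement.

    A rough proxy for self-complementarity; k-mers that are their own
    reverse complement are counted as hits as well.
    """
    region = region.upper()
    rc = reverse_complement(region)
    rc_kmers = {rc[i:i + k] for i in range(len(rc) - k + 1)}
    return sum(region[i:i + k] in rc_kmers for i in range(len(region) - k + 1))


def alu_strand_counts(annotations):
    """Count Alu repeats per strand from (repeat_name, strand) pairs."""
    plus = sum(1 for name, strand in annotations
               if name.startswith("Alu") and strand == "+")
    minus = sum(1 for name, strand in annotations
                if name.startswith("Alu") and strand == "-")
    return plus, minus


# Toy inverted repeat: an arm, a spacer, and the arm's reverse complement.
arm = "GATTACAGGCTT"
inverted = arm + "TTTTT" + reverse_complement(arm)
print(revcomp_kmer_hits(inverted), revcomp_kmer_hits("A" * 40))
print(alu_strand_counts([("AluSq", "+"), ("AluSz", "-"), ("AluY", "-"), ("L1", "+")]))
```

A region with Alus on both strands, or with many k-mers shared with its reverse complement, has the ingredients for a long intramolecular duplex. A region without any scores zero on the k-mer proxy.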

  19. Function
      • 14 hyper-edited coding sequences: USP1, MARCKSL1, ILF2, GSK3B, TNXB, E2F5, AS3MT, STARD10, OLR1, LDHB, ATF1, LIPC, CALML4, ASB16.
      • 186 UCSC genes.
      • 120 RefSeq genes.
      • Functional annotation of UCSC genes (http://david.abcc.ncifcrf.gov/): generation of precursor metabolites and energy; cellular lipid catabolic process; hexose metabolic process; secondary metabolic process; monosaccharide metabolic process; immune response; mutagenesis site; carbohydrate catabolic process; domain:Leucine-zipper; zinc finger region:RanBP2-type; hdl; low-density lipoprotein binding; response to xenobiotic stimulus; Zinc finger, RanBP2-type; high-density lipoprotein particle; NAD metabolic process; ZnF_RBZ; coiled coil; immune response; lipid transport; lysosome; calmodulin; glucose metabolic process; re-entry into mitotic cell cycle; lipid metabolism.

  20. Evolution • 28 human-specific elements in the hyper-edited regions. • Conservation scores (primate, 2kbp region):

  21. Validation
      Chr   1    GTTTCCAAGTTTCCCTCTCCCTTCTTTGACTTCTGACAGCTTCCGAAGTGTGCACACAGC  60
      RNA   1    ............................................................  60
      Chr   61   CTCTTGTCAGCACTGTTTGGTACCTGCATCTAAAAATGAGATCACAGTCCTTCCGCTCCG  120
      RNA   61   ............................................................  120
      Chr   121  CAAACCCTGACAGAGACAGAATACAGAGTGGGCTTGTAGACTTGAAGTATAAAACTTTTG  180
      RNA   121  ............................................................  180
      Chr   181  GCCAGTCCTGGTGGCTCACACCTGTAATCCCAGCACTTTGAGAGGCCGAGGTGGGCGGAT  240
                 X X X
      RNA   181  .................G.G....................G.......G...........  240
      Chr   241  CACCTGAGGTCAGGAGTTCGAGACAAGCCTGGCCAACCTTGTGAAACCCCGTCTCAACTA  300
                 X X X X X X
      RNA   241  ......G....G.............G..............................G..G  300
      Chr   301  AAAATACAAAAACTAGCCGGGCATGGTGGCATGTGCCTGTAATCCCAGCTACTCAGGAGG  360
                 XX X X XXXX X X X
      RNA   301  ..G..G.GG.....G.......G...............................G.....  360
      Chr   361  CGGAGGCGTGAGAATCACTTGAACCTGGGAGGTGTAGGTTGCAGTGAGCCAAGATCGCAC  420
                 X XX X XX X X
      RNA   361  ..........G..G..G....G.............G..........G...G.........  420
      Chr   421  CACTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCCATCTCAAAAAAAAAAGACAGAAA  480
                 X X XX
      RNA   421  ............N...........G......G............................  480
      Chr   481  ACCTTTGGAGG  491
      RNA   481  ...........  491
      6 candidates: 1 not amplified, 2 not sequenced, 2 not edited (beyond known), 1 edited!
      EST: DA364252 (normal brain)
      Genome: chr2:242643522-242644012
      G: editing according to the EST
      X: experimentally validated editing

  22. Validation
      AluSq (+); AluSz (-), 385 bp upstream; AluY (-), 593 bp downstream.
      ING5 (a tumor suppressor protein that can interact with TP53).
      None of the sites is a SNP.
      RNA editing was not previously known in this gene.

  23. Validation

  24. Thank you
      CGACAAGAGTGTACGATGACGTC
      |||||*||||||*|||||*||||
      CGACCGGAGTGTGCGCTGGCGTC
