320 likes | 400 Views
PREditor P redictive R NA Editor for Plant Mitochondrial Genes. Jeff Mower. What is RNA Editing? A process that alters the RNA sequence Nt insertion, deletion, or conversion Does not include RNA maturation processes. RNA Editing in Plants Occurs in mitochondria and chloroplasts
E N D
PREditorPredictive RNA Editor for Plant Mitochondrial Genes Jeff Mower
What is RNA Editing? • A process that alters the RNA sequence • Nt insertion, deletion, or conversion • Does not include RNA maturation processes
RNA Editing in Plants • Occurs in mitochondria and chloroplasts • C to U and U to C conversions • Mechanism is not known
RNA Editing in Plants • In seed plants (conifers, flowering plants, etc.) • Widespread in mitochondrion • Rare in chloroplast • Predominantly C to U • In non-seed plants (mosses, ferns, etc.) • Frequent in mitochondrion and chloroplast • Both C to U and U to C are common
AUG CGU AAU UCU GUA GGC UGC CAA UGA AGACGGUC UGGC M R N S V G C Q *
AUG CGU AAU UCU GUA GGC UGC CAA UGA AGACGGUC UGGC M R N S V G C Q * AUG GUC AUG CGU AAU UCU GUA GGC UGC CAA UGA AG UGGC M V M R N S V G C Q * Creation of new start codon
AUG CGU AAU UCU GUA GGC UGC CAA UGA AGACGGUC UGGC M R N S V G C Q * AUG GUC AUG UGU AAU UUU GUA GGC UGC CAA UGA AG UGGC M V M C N F V G C Q * Creation of new start codon Alteration of protein sequence
AUG CGU AAU UCU GUA GGC UGC CAA UGA AGACGGUC UGGC M R N S V G C Q * AUG GUC AUG UGU AAU UUU GUA GGU UGC CAA UGA AG UGGC M V M C N F V G C Q * Creation of new start codon Alteration of protein sequence No effect on protein sequence
AUG CGU AAU UCU GUA GGC UGC CAA UGA AGACGGUC UGGC M R N S V G C Q * AUG GUC AUG UGU AAU UUU GUA GGU UGC UAA AG UGAUGGC M V M C N F V G C * Creation of new start codon Creation of new stop codon Alteration of protein sequence No effect on protein sequence
Identifying Edit Sites • Determine experimentally • Need to isolate and reverse transcribe RNA • Need multiple reads (editing is not always complete)
Identifying Edit Sites • Determine experimentally • Need to isolate and reverse transcribe RNA • Need multiple reads (editing is not always complete) • Predict based on sequence context • Upstream and downstream regions are important • Unambiguous motifs have not been identified
Identifying Edit Sites • Determine experimentally • Need to isolate and reverse transcribe RNA • Need multiple reads (editing is not always complete) • Predict based on sequence context • Upstream and downstream regions are important • Unambiguous motifs have not been identified • Predict based on protein conservation • Proteins are more conserved after editing • Editing tends to “correct” amino acid sequences
PREditor Methodology • Aligned sequence database (ASD) Construction • 363 DNA sequences • RNA editing information is known • Organisms do not perform RNA editing • Proteins were translated using the editing information • Homologous proteins were aligned • 42 different alignments • All known mt proteins are covered
PREditor Methodology • Input Sequence Manipulation • Accept a protein-coding DNA sequence as input • Translate input sequence • Align translation to homologous proteins in ASD
PREditor Methodology • The Underlying Principle • ASD sequences translated from edited RNA • Input sequence translated from unedited DNA • “Where can RNA editing in the input sequence increase conservation to the ASD sequences?”
Performance Analysis • Remove one protein sequence from the database • Use the unedited DNA sequence as input • Calculate statistics • Accuracy = (TP + TN) / (TP + FP + TN + FN) • Sensitivity = TP / (TP + FN) • Specificity = TN / (TN + FP) • Repeat for each sequence
Performance Analysis • Total # C’s = 58,982 • True edited sites = 3,548 (6.0%) • TP = 2,922 • FN = 626 • True non-edited sites = 55,434 • TN = 54,829 • FP = 605
Performance Analysis • Sensitivity = 82.4% • Proportion of true edited sites that were predicted correctly • Increases to 94.6% if you ignore missed silent edited sites • Specificity = 98.9% • Proportion of true non-edited sites that were predicted correctly • Accuracy = 97.9% • Proportion of all sites that were predicted correctly • Increases to 98.7% if you ignore missed silent edited sites
Limitations • Cannot predict editing at silent sites • 458 of 626 FN are at silent sites
Limitations • Cannot predict editing at silent sites • 458 of 626 FN are at silent sites • Not a major problem in practice • Silent editing sites do not affect the protein sequence • Many silent sites are only occasionally edited • Only 13% of editing sites are silent (expect ~38%)
Limitations • Aligned sequence database is skewed Angiosperms 266 (74%) Gymnosperms 2 (1%) Origin of RNA editing Ferns 0 Horsetails 0 Hornworts 0 Mosses 0 Liverworts 32 (9%) Charophytes 59 (16%) Chlorophytes―
Limitations • The skewed database effect ASD1 Z ASD2 Z ASD3 Z ASD4 Z ASD4 X Input X or Z?
Ongoing Work • Reducing the skewed database effect • Weighted sequences and phylogenetics ASD1 Z ASD2 Z ASD3 Z ASD4 Z ASD4 X Input X!
Ongoing Work • Making the online resource more appealing • Making the online resource more user-friendly
Future Directions • Increase diversity in the ASD • Analyze sequence context using the ASD • Apply methodology to editing in chloroplasts • Apply methodology to U to C editing
Thanks • Jeff Palmer • Sun Kim • Danny Rice and other members of the Palmer lab