UTR Modeling, Part III
Sam Gross, Randy Brown
Old 5' UTR Model
[Figure: state diagram of the old model; states Inter, Prom, Utr5, Einit, Esngl; labels ATG, Coding Exon]
New 5' UTR Model
[Figure: state diagram of the new model; states Inter, Prom, Ep, Epa, Ea, Enc, Inc, Einit, Esngl; labels ATG, Coding Exon]
New Data From DBTSS 3.0
• Extended RefSeq 5' UTR information by mapping full-length cDNAs from DBTSS with est2genome
• The old training/testing set had 1500 genes; the new set has 6400 genes
• The data set is now large enough to move to directly estimated length distributions for the Ep, Epa, Ea, and Enc states
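A directly estimated length distribution is the normalized histogram of observed lengths, as opposed to, for example, the geometric distribution implied by a single self-looping HMM state. The sketch below illustrates that idea only; the function name, the `max_len` cutoff, and the optional pseudocount are hypothetical choices, not taken from the slides.

```python
from collections import Counter


def empirical_length_distribution(lengths, max_len=None, pseudocount=0.0):
    """Estimate P(length = L) for L = 0..max_len directly from observed lengths.

    Lengths above max_len are ignored. A pseudocount, if given, is added to
    every length so that unseen lengths keep a small nonzero probability.
    """
    if not lengths:
        raise ValueError("need at least one observed length")
    if max_len is None:
        max_len = max(lengths)
    counts = Counter(n for n in lengths if 0 <= n <= max_len)
    total = sum(counts.values()) + pseudocount * (max_len + 1)
    return [(counts[n] + pseudocount) / total for n in range(max_len + 1)]


# Example: hypothetical exon lengths observed in a training set
p = empirical_length_distribution([3, 3, 5], max_len=5)
print(p[3], p[5])
```

A larger training set matters here because every length bin is estimated separately; with few genes many bins would be empty or noisy, which is why smoothing or a parametric form is often used instead.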
Separate UTR “coding” models for Ep/Epa and Ea/Enc
• Tried UTR “coding” models of 3rd, 4th, and 5th order, with and without division by isochore
• The best model was 4th order with isochore division
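A k-th order “coding” model of this kind can be sketched as a Markov chain that predicts each base from the k preceding bases, with one chain trained per isochore. In this minimal sketch the isochore is assigned from the sequence's own G+C content using hypothetical boundaries (0.43, 0.51, 0.57), and add-one smoothing is an assumed choice; a real gene finder would more likely take the isochore from the surrounding genomic region.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def isochore_of(seq, boundaries=(0.43, 0.51, 0.57)):
    """Hypothetical isochore assignment: bin a sequence by its G+C fraction."""
    gc = sum(c in "GC" for c in seq) / max(len(seq), 1)
    return sum(gc >= b for b in boundaries)  # bin index 0..len(boundaries)


class MarkovModel:
    """k-th order Markov chain over DNA with add-one smoothing."""

    def __init__(self, order):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> next base -> count

    def train(self, seqs):
        k = self.order
        for s in seqs:
            for i in range(k, len(s)):
                window = s[i - k:i + 1]
                if set(window) <= set(ALPHABET):  # skip windows containing N etc.
                    self.counts[window[:-1]][window[-1]] += 1
        return self

    def log_prob(self, seq):
        """Log-likelihood of seq; the first k bases are treated as given context."""
        k = self.order
        lp = 0.0
        for i in range(k, len(seq)):
            ctx = self.counts.get(seq[i - k:i], {})
            total = sum(ctx.values()) + len(ALPHABET)
            lp += math.log((ctx.get(seq[i], 0) + 1) / total)
        return lp


def train_by_isochore(seqs, order=4):
    """Train one model per isochore bin, each on the sequences that fall in it."""
    groups = defaultdict(list)
    for s in seqs:
        groups[isochore_of(s)].append(s)
    return {iso: MarkovModel(order).train(group) for iso, group in groups.items()}
```

Scoring a candidate UTR exon then means picking the model for its bin, e.g. `models[isochore_of(seq)].log_prob(seq)` (the bin must have had training data). Comparing held-out log-likelihoods for orders 3, 4, and 5, with and without the isochore split, is one way to run the kind of comparison described on the slide.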
Dual coding model (CpG-related/not CpG-related) for the Ep, Epa, and Einit states was not very effective
• Still working on modeling CpG islands
• The initial coding exon hexamer distribution depends more on whether the UTR is spliced or unspliced than on whether the gene is associated with a CpG island
• This suggests splitting the Einit state into two states, each with its own coding parameters
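One way to quantify which grouping matters more is to build a hexamer distribution for each half of a split of the initial coding exons and measure how far apart the two halves are. The sketch below uses Jensen-Shannon divergence; the slides do not say which statistic was used, and the record fields `seq`, `utr_spliced`, and `cpg` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_freqs(seqs, step=3, pseudocount=0.5):
    """Smoothed hexamer frequencies.

    step=3 counts in-frame hexamers (codon pairs), assuming each sequence
    starts at a codon boundary such as the ATG; step=1 counts every
    overlapping hexamer. Hexamers containing non-ACGT characters are ignored.
    """
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    total = sum(counts[h] for h in HEXAMERS) + pseudocount * len(HEXAMERS)
    return {h: (counts[h] + pseudocount) / total for h in HEXAMERS}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, at most 1)."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}

    def kl_to_m(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def split_effect(exons, key):
    """JS divergence between hexamer distributions of exons with key True vs False."""
    a = [e["seq"] for e in exons if key(e)]
    b = [e["seq"] for e in exons if not key(e)]
    return js_divergence(hexamer_freqs(a), hexamer_freqs(b))
```

Under this sketch, `split_effect(exons, lambda e: e["utr_spliced"])` coming out larger than `split_effect(exons, lambda e: e["cpg"])` would correspond to the observation on the slide, and would support giving each group its own Einit parameters.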
EinitS/EinitU 5' UTR Model
[Figure: state diagram of the EinitS/EinitU model; states Inter, Prom, Ep, Epa, Ea, Enc, Inc, EinitS, EinitU, Esngl; labels ATG]
Future Directions
• Test performance of the EinitS/EinitU model
• Try directly estimated length distributions for the 5' UTR states
• Conservation sequence models for the 5' UTR states
• CpG island model