
Trace recalling on MGC traces

Trace recalling on MGC traces. Dec 21 2005. Why? There are almost certainly alternatively spliced targets in the MGC set that we would like to find. Trace recalling might also yield more hits and confirmed hits, because it aligns the ambiguity sequence to the genome.

robyn

Presentation Transcript


  1. Trace recalling on MGC traces Dec 21 2005

  2. Why? • There are almost certainly alternatively spliced targets in the MGC set that we would like to find • Trace recalling might yield more hits and confirmed hits, because it aligns the ambiguity sequence to the genome

  3. How? • The pipeline begins with the .blat files generated by Mike • These are the result of BLATing each MGC trace (or the assembled fwd/rev reads) against the human genome • Each file represents the set of loci from which the trace sequence could have originated

  4. How? • Extract the BLAT-aligned sequence plus 1000 bp of flanking sequence from the human genome • Run trace recalling between each trace and each of its corresponding extracted genomic loci • Adjust the score of the first alignment (ambiguity sequence to genome) by adding back the intron penalties • This lessens the bias toward processed pseudogenes, which lack introns and so incur no intron penalty
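The flank extraction and score adjustment on this slide can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the 0-based half-open coordinates, the choice to add the 1000 bp flank on each side, and the representation of intron penalties as positive magnitudes are all assumptions made for the example.

```python
FLANK_BP = 1000  # flank size from the slide; applying it to both sides is an assumption


def flanked_region(aln_start, aln_end, chrom_length, flank=FLANK_BP):
    """Widen a BLAT-aligned interval by `flank` bp, clamped to the chromosome.

    Coordinates are assumed to be 0-based, half-open.
    """
    return max(0, aln_start - flank), min(chrom_length, aln_end + flank)


def adjusted_score(raw_score, intron_penalties):
    """Add back the penalties that the aligner charged for introns.

    `intron_penalties` holds one positive magnitude per intron (a hypothetical
    representation); an intronless alignment gets no adjustment.
    """
    return raw_score + sum(intron_penalties)


# Hypothetical numbers: a spliced locus with three introns versus an
# intronless processed pseudogene that scores higher before adjustment.
spliced = adjusted_score(880, [20, 20, 20])
pseudogene = adjusted_score(900, [])
print(spliced, pseudogene)
```

With these made-up scores the pseudogene wins on raw score, but the spliced locus wins once the intron penalties are added back, which is the bias the adjustment is meant to reduce.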

  5. How? • Select as the “correct locus” the locus that aligns with the highest adjusted score • For the rest of the analysis this is the only locus considered • Apply the hit criteria to the first-alignment file for the correct locus: • Spliced alignment (at least 1 intron) • > 60% of splice sites have at least 8 matches in a 10 bp window around the splice site • Overall percent identity > 75%
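Locus selection and the three hit criteria can be written down compactly. The sketch below assumes a hypothetical `LocusAlignment` record in which the per-splice-site match counts (out of the 10 bp window) have already been computed; the field and function names are illustrative and do not come from the pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocusAlignment:
    locus: str                       # label for the candidate genomic locus
    adjusted_score: float            # score after adding back intron penalties
    n_introns: int                   # introns in the first alignment
    splice_window_matches: List[int]  # matches in the 10 bp window, one per splice site
    percent_identity: float          # overall percent identity, 0-100


def pick_correct_locus(alignments):
    """Return the candidate with the highest adjusted score."""
    return max(alignments, key=lambda a: a.adjusted_score)


def is_hit(aln):
    """Apply the three hit criteria listed on the slide."""
    if aln.n_introns < 1 or not aln.splice_window_matches:
        return False  # not a spliced alignment
    good = sum(1 for m in aln.splice_window_matches if m >= 8)
    frac_good = good / len(aln.splice_window_matches)
    return frac_good > 0.60 and aln.percent_identity > 75.0


# Hypothetical candidates for one trace.
candidates = [
    LocusAlignment("locus_A", 940.0, 3, [10, 9, 8, 7, 10, 10], 92.0),
    LocusAlignment("locus_B", 900.0, 0, [], 99.0),
]
best = pick_correct_locus(candidates)
print(best.locus, is_hit(best))
```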

  6. How? • This last step classifies each trace as a hit or a non-hit • Lift the alignment coordinates from the extracted genomic fragment back to genomic coordinates • A hit becomes confirmed if it overlaps the targeted predicted gene by at least 1 bp
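The coordinate lift and the 1 bp overlap test amount to simple interval arithmetic. The sketch below assumes 0-based half-open intervals, a forward-strand fragment, and that the alignment and the predicted gene are on the same chromosome; the function names are hypothetical.

```python
def lift_to_genome(fragment_start, aln_start, aln_end):
    """Convert fragment-relative coordinates to genomic coordinates.

    `fragment_start` is the genomic position of the first base of the
    extracted fragment (forward strand assumed).
    """
    return fragment_start + aln_start, fragment_start + aln_end


def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify(hit, aln_interval, gene_interval):
    """Return 'non-hit', 'hit', or 'confirmed hit' for one trace."""
    if not hit:
        return "non-hit"
    if overlap_bp(aln_interval, gene_interval) >= 1:
        return "confirmed hit"
    return "hit"


# Hypothetical example: fragment extracted starting at genomic position 49000.
aln = lift_to_genome(49000, 1000, 3500)
print(aln, classify(True, aln, (52000, 60000)))
```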

  7. How? • As part of trace recalling, each read/genomic fragment pair is flagged if an alternate splice is observed • The alignment of the ambiguity sequence is compared with the alignment of the recalled sequence to determine whether there is an alternate splice • Example
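The slides do not say how the two alignments are compared. One plausible way, shown here purely as an assumption, is to derive the introns implied by each alignment's exon blocks and report the introns that appear in only one of them. The function names and the half-open genomic block representation are hypothetical.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exon blocks as (start, end) introns."""
    exons = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        if next_start > prev_end
    ]


def differing_introns(ambig_exons, recalled_exons):
    """Introns seen only in the ambiguity alignment, and only in the recalled one."""
    ambig = set(introns_from_exons(ambig_exons))
    recalled = set(introns_from_exons(recalled_exons))
    return sorted(ambig - recalled), sorted(recalled - ambig)


# Hypothetical example: the recalled sequence skips the middle exon.
only_ambig, only_recalled = differing_introns(
    [(0, 100), (200, 300), (400, 500)],
    [(0, 100), (400, 500)],
)
print(only_ambig, only_recalled)
```

Under this assumption, a read would be flagged whenever either list is non-empty; classifying the event type (retained intron, alternate exon, and so on) would need further rules that are not given in the slides.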

  8. How? • The analysis splits at this point into: • Comparing the hit, confirmed hit, and non-hit status of reads with the original pipeline • Trying to find alternate splices in the whole set of traces

  9. Results • Comparison of hit, confirmed hit, non-hit status* • 120 experiments went from non-hit to confirmed hit • 37 experiments went from non-confirmed hit to confirmed hit • * This part isn’t quite done, and some of the non-hit → confirmed hit cases look a little funny

  10. Results • Finding alternate splices • Trace recalling identifies 622 alternate splicing events in the MGC set • Retained intron: 148 • Alt 3’ ss: 40 • Alt 5’ ss: 36 • Alt splice both sides: 56 • Alternate exon: 103 • Clean alternate exon: 189 • Mutex exon: 26 • Clean mutex exon: 23

  11. Results • Finding alternate splices • Trace recalling identifies 622 alternate splicing events in the MGC set • Retained intron: 148 • Alt 3’ ss: 40 • Alt 5’ ss: 36 • Alt splice both sides: 56 • Alternate exon: 103 • Clean alternate exon: 189 • Mutex exon: 26 • Clean mutex exon: 23 • The projector in Bryan 509 working: priceless

  12. Results • 288 of these are what I consider the “hard” altsplices to get (clean alt exon, clean mutex exon, individual 3’ or 5’ splice sites) • I wanted to validate these predictions somehow • We would normally go back to the known gene, but if there were a known gene it wouldn’t be an MGC target!
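As a quick arithmetic check, the per-type counts from the preceding results slides can be tallied directly. The dictionary below copies the counts as transcribed; the variable names are only for illustration.

```python
# Counts per alternate-splice type, as listed on the results slide.
flagged = {
    "Retained intron": 148,
    "Alt 3' ss": 40,
    "Alt 5' ss": 36,
    "Alt splice both sides": 56,
    "Alternate exon": 103,
    "Clean alternate exon": 189,
    "Mutex exon": 26,
    "Clean mutex exon": 23,
}

# The "hard" categories named on this slide.
hard_types = ["Clean alternate exon", "Clean mutex exon", "Alt 3' ss", "Alt 5' ss"]

total_listed = sum(flagged.values())
hard_total = sum(flagged[t] for t in hard_types)
print(total_listed, hard_total)
```

The four hard categories sum to the 288 quoted here. The eight listed categories sum to 621, one fewer than the 622 total quoted on the slide, so one event is not accounted for in the per-type list as transcribed.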

  13. Results • Look at cases where the same type of altsplice is observed on both reads • There were a total of 72 experiments in which the same altsplice was observed on both reads (high confidence altsplices) • Example

  14. Exact same alternate exon represented in both reads
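The both-reads validation can be sketched as a set intersection over the events flagged on the forward and reverse reads. Requiring identical genomic coordinates as well as identical type is an assumption here (the slides say only that the same altsplice is seen on both reads), and the `AltSplice` record and function names are hypothetical.

```python
from typing import NamedTuple


class AltSplice(NamedTuple):
    kind: str   # e.g. "clean alternate exon"
    start: int  # genomic start of the event
    end: int    # genomic end of the event


def shared_altsplices(fwd_events, rev_events):
    """Events flagged identically (type and coordinates) on both reads."""
    return sorted(set(fwd_events) & set(rev_events))


def is_high_confidence(fwd_events, rev_events):
    """True if at least one altsplice is confirmed by both reads."""
    return bool(shared_altsplices(fwd_events, rev_events))


# Hypothetical experiment: both reads cover the same alternate exon.
fwd = [AltSplice("clean alternate exon", 1200, 1350)]
rev = [AltSplice("clean alternate exon", 1200, 1350), AltSplice("retained intron", 4000, 4300)]
print(shared_altsplices(fwd, rev), is_high_confidence(fwd, rev))
```

A stricter or looser match (for example, same type with overlapping rather than identical coordinates) would change which experiments count as validated.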

  15. Results • Breakdown of validated altsplices by type • Clean alternate exon: 39 • Alternate exon: 6 • Retained intron: 16 • Alt 5’ ss: 3 • Alt 3’ ss: 6 • Clean mutex exon: 2

  16. Results • Flagged altsplices that were not validated (low confidence altsplices) could be: • Mistakes • Reads didn’t overlap • Didn’t see both sides of an alternate splice • One good read and one read that totally failed • Might be slightly different types (e.g. a clean alternate exon and an alternate exon) • Examples

  17. A slight misalignment causes one read to be flagged as an “alternate exon” and the other to be flagged as a “clean alternate exon”… the black one is probably right

  18. Recalled sequence picks up where you would expect if the single trace part were corrupted by noise • 12 bp alternate splice site

  19. Results • Looked at 40 examples of low confidence hits • 26 of them looked like they fell into one of the last 3 categories from before • 14 looked like actual miscalled alternate splices

  20. To do • Modification to trace recalling which might clean up the alignments a bit more • Define something like the hit criteria for MGC alignments to take into account the number of matches in the trace recalling alignments (look at old E-value stuff)
