Regular Meeting
December 22, 2008
Mark Borodovsky, Ivan Antonov
Topics
• What has been done
• FSMark HMM implementation
• Answers to the questions from the previous meeting
• Future work
GATech
What has been done
• The HMM implementation in FSMark has been changed
• Some questions from the previous meeting have been answered
Current HMM implementation
• For a given position i, we now look backward at the 2 preceding nucleotides instead of looking forward
• FSMark starts examining the sequence only from the 3rd position (i=2), so the emission string is always complete (starting from the 1st position gives strange results)
• Because FSMark starts at i=2, a gene without a frame shift will have state 2
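The look-back scheme above can be sketched in Python. This is only an illustration, not FSMark's code: the function name `emission_strings` is hypothetical, and the sketch assumes that the emission string at position i is the nucleotide at i together with the 2 nucleotides before it, with positions counted from 0.

```python
def emission_strings(seq):
    """Yield (i, emission) pairs, where emission is seq[i-2..i].

    Assumption (not taken from FSMark): the emission string is the
    current nucleotide plus the 2 preceding ones, positions 0-based.
    """
    # Start at i=2 (the 3rd nucleotide) so that two preceding
    # nucleotides always exist and every emission string is complete.
    for i in range(2, len(seq)):
        yield i, seq[i - 2:i + 1]


pairs = list(emission_strings("ATGCA"))
print(pairs)  # [(2, 'ATG'), (3, 'TGC'), (4, 'GCA')]
```

Under this assumption, starting at i=0 or i=1 would leave fewer than 2 preceding nucleotides, so the emission string would be incomplete.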
FSMark prediction depends on the FS letter
• A test was run on a sample gene by inserting different letters in the middle of the gene. The FSMark-GM hmm_def file was used.
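A minimal sketch of how such test sequences could be generated is shown here. The helper names and the sample sequence are made up for illustration, and "middle" is assumed to mean the position `len(gene) // 2`; the FSMark run itself is not shown.

```python
def insert_in_middle(gene, letter):
    """Return the gene with one letter inserted at its midpoint."""
    mid = len(gene) // 2  # assumed definition of "the middle"
    return gene[:mid] + letter + gene[mid:]


def frameshift_variants(gene, letters="ACGT"):
    """Build one frame-shifted variant per inserted letter."""
    return {letter: insert_in_middle(gene, letter) for letter in letters}


# Hypothetical toy sequence, not the sample gene used in the test.
variants = frameshift_variants("ATGAAATAA")
for letter, variant in variants.items():
    print(letter, variant)
```

Each variant is one nucleotide longer than the input, so everything after the insertion point is shifted by one frame regardless of which letter was inserted.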
Control
Genome without frame shifts
• GeneMark: 417 overlaps
• FSMark-GM: 118 frame shifts
Experiment
Genome with frame shifts in 400 genes
• GeneMark: 599 overlaps, 171 of them caused by frame shifts
• FSMark-GM: 325 frame shifts
Questions to answer
• Look at the distribution of overlap lengths in the GeneMark output
• Understand why GeneMark predicts a gene overlap for fewer than 50% of the genes with frame shifts. There are two possible reasons:
  • The short part is missed, i.e. GeneMark predicts only one gene
  • GeneMark predicts two genes, but they don’t overlap
• Try to understand why we got more false positives in the experiment than in the control
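For the first question, one possible way to tabulate overlap lengths is sketched here. It assumes the predicted genes are already available as `(start, end)` pairs with 1-based inclusive coordinates; parsing the GeneMark output is not shown, strand is ignored, and only genes that are neighbours after sorting by start are compared. The function name and the coordinates are hypothetical.

```python
from collections import Counter


def overlap_lengths(genes):
    """Return the overlap length for each pair of neighbouring genes.

    genes: iterable of (start, end) tuples, 1-based inclusive.
    Only consecutive genes (after sorting by start) are compared,
    so a gene spanning several later genes is counted once.
    """
    ordered = sorted(genes)
    lengths = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        overlap = min(e1, e2) - s2 + 1  # shared positions, if any
        if overlap > 0:
            lengths.append(overlap)
    return lengths


# Made-up coordinates for illustration only.
genes = [(1, 100), (91, 200), (300, 400), (395, 500)]
lengths = overlap_lengths(genes)
print(lengths)           # [10, 6]
print(Counter(lengths))  # how often each overlap length occurs
```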
GeneMark analysis
• Why does GeneMark so rarely predict overlaps for genes with a frame shift?
• In my GeneMark output there are 357 typical genes (out of 400).
• Am I perhaps using the wrong GeneMark option?
GeneMark output statistics
Genome with frame shifts in 400 genes
• 599 gene overlaps
• 4,388 genes
• 171 overlaps caused by fs
• 22 genes with fs are missing
• 335 genes with fs
• fs in 164 genes didn’t cause an overlap
• 163 decreased their lengths
• 4 fs caused a new gene downstream of the initial gene
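The counts above can be cross-checked with a few lines of arithmetic. The variable names are only labels for this check, and reading 22 + 335 as the 357 typical genes from the GeneMark analysis slide is an interpretation, although the numbers are consistent with it.

```python
genes_with_fs_inserted = 400
typical_genes_found = 357  # from the GeneMark analysis slide
missing_with_fs = 22
found_with_fs = 335
overlaps_from_fs = 171
fs_without_overlap = 164

# The found and missing genes add up to the 357 typical genes.
assert missing_with_fs + found_with_fs == typical_genes_found
# The found genes split into those with and without an overlap.
assert overlaps_from_fs + fs_without_overlap == found_with_fs

# Share of frame-shifted genes for which an overlap was predicted.
share_of_400 = overlaps_from_fs / genes_with_fs_inserted
share_of_357 = overlaps_from_fs / typical_genes_found
print(f"{share_of_400:.1%} of 400, {share_of_357:.1%} of 357")
```

With either denominator the share stays below 50%, which matches the question raised at the previous meeting.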
Conclusions
• I need to check how to run GeneMark so that it returns the same 400 typical genes
• The small chunk in the shifted frame seems to be too short for GeneMark to predict a new gene
Time Table