RNA Assembly Using an Extending Method. Wei Xueliang 2010-04-07
Overview • Why abandon de Bruijn. • Why abandon extended de Bruijn. • Introduction to the current method. • Handle the old problem. • The new problem. • Todo
Why abandon de Bruijn. • Advantages and disadvantages of the de Bruijn graph: • Very fast. • The coverage distribution and the K value strongly affect the result. • Key: in RNA assembly, coverage is not uniformly distributed. • There is no best K value.
Why abandon de Bruijn. • The length of the red part is 27.
Why abandon de Bruijn. • Key: in RNA assembly, coverage is not uniformly distributed. • There is no best K value. • Could we run the program many times with different K values? • That is not the job of de novo assembly. • Time. • Provide highly accurate contigs within limited time. • Scaffolding programs.
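To make the K trade-off concrete, here is a minimal sketch (all function names are hypothetical) that builds a de Bruijn graph from reads and checks whether a transcript's path through it is unbroken. In the toy low-coverage region, two reads overlap by only 5 bases, so the path survives only while K is at most 6 (overlap + 1).

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_dbg(reads, k):
    """De Bruijn graph: each (k-1)-mer maps to the (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for km in kmers(read, k):
            graph[km[:-1]].add(km[1:])
    return graph

def path_is_connected(transcript, reads, k):
    """True if every k-mer of the transcript is an edge of the graph, i.e. the path is unbroken."""
    graph = build_dbg(reads, k)
    return all(km[1:] in graph[km[:-1]] for km in kmers(transcript, k))

# Toy low-coverage region: two reads that overlap by only 5 bases.
transcript = "ATGGCGTACCTTAGGCAATCG"
reads = [transcript[:12], transcript[7:]]

for k in (4, 6, 7, 8):
    print(k, path_is_connected(transcript, reads, k))  # 4 True, 6 True, 7 False, 8 False
```

A highly covered region would tolerate a much larger K, while a small K tends to merge repeats (not shown here), which is one way to see why uneven RNA coverage leaves no single best K.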
Why abandon extended de Bruijn. • My extended de Bruijn method: • Uses two or more K values at the same time.
Why abandon extended de Bruijn. • Coverage changes faster than I expected, so many K values are needed. • Converting between different K values is difficult. • Memory problem for large K: when K > 32, each K-index needs > 50G (with a 10G data set). • Throw K away.
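The slides do not say why memory grows so sharply past K = 32. One plausible contributing factor, offered here only as an assumption, is that with a 2-bit code per base a 32-mer fills exactly one 64-bit word, so every key for K = 33 needs a second word. The sketch below (hypothetical helper names) packs k-mers this way and counts words per key.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base

def encode_kmer(kmer):
    """Pack a k-mer into one integer, 2 bits per base, first base in the high bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value

def words_per_key(k, word_bits=64):
    """Machine words needed to store one 2-bit-packed k-mer key (ceiling of 2k / word_bits)."""
    return (2 * k + word_bits - 1) // word_bits

print(encode_kmer("ACGT"))                  # 27
print(words_per_key(32), words_per_key(33))  # 1 2
```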
Introduction to the new method • Based on Pramila’s genome assembly method. • Start from any tag and correct it. • If the correction succeeds, continue.
Introduction to the new method • Find all tags that overlap by at least 24 bp (a magic number). • Use these overlapping tags to extend the sequence base by base, adding more tags as it grows.
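A minimal sketch of the extension step, assuming that "overlap" means a tag's prefix matching the end of the current contig with at most one mismatch, and that the next base is chosen by majority vote among the overlapping tags. Both are assumptions, not details from the slides, and `next_base` and `extend` are hypothetical names. It scans every tag at each step, so it shows the logic rather than the speed.

```python
import random
from collections import Counter

MIN_OVERLAP = 24  # the slide's "magic number"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def next_base(contig, tags, min_overlap=MIN_OVERLAP, max_mismatch=1):
    """Majority vote for the base after contig, over tags overlapping its end by >= min_overlap."""
    votes = Counter()
    for tag in tags:
        # Longest overlap first; the tag must reach at least one base past the contig end.
        for ov in range(min(len(contig), len(tag) - 1), min_overlap - 1, -1):
            if hamming(contig[-ov:], tag[:ov]) <= max_mismatch:
                votes[tag[ov]] += 1
                break
    return votes.most_common(1)[0][0] if votes else None

def extend(contig, tags, max_len=10_000, **kw):
    """Append voted bases one at a time until no tag overlaps the end (or max_len is reached)."""
    while len(contig) < max_len:
        base = next_base(contig, tags, **kw)
        if base is None:
            break
        contig += base
    return contig

# Toy data: 30-bp error-free tags sampled every 3 bases from a random 60-bp sequence.
rng = random.Random(7)
seq = "".join(rng.choice("ACGT") for _ in range(60))
tags = [seq[p:p + 30] for p in range(0, 31, 3)]
print(extend(tags[0], tags) == seq)
```

The `max_len` cap is a safeguard added for the sketch, since a repeat could otherwise keep the loop extending.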
Introduction to the new method • How can overlapping tags be found quickly while allowing mismatches? • Index and union: {Tag3}, {Tag2, Tag3}, {Tag3, Tag4} Union => {Tag1, Tag2, Tag3, Tag4}
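One common way to make "index and union" tolerate mismatches, assumed here rather than taken from the slides, is the pigeonhole principle: split the 24-bp overlap into 4 pieces, so any overlap with at most 3 mismatches has at least one error-free piece. Looking up each piece and taking the union of the hit sets yields every true candidate, which is then verified by Hamming distance. `build_seed_index` and `find_overlapping` are hypothetical names, and for simplicity only each tag's first 24 bases are indexed, so this finds tags overlapping the contig by exactly 24 bp.

```python
from collections import defaultdict

def build_seed_index(tags, overlap=24, pieces=4):
    """Map (piece number, piece string) -> ids of tags whose first `overlap` bases contain that piece."""
    size = overlap // pieces
    index = defaultdict(set)
    for tid, tag in enumerate(tags):
        for j in range(pieces):
            index[(j, tag[j * size:(j + 1) * size])].add(tid)
    return index

def find_overlapping(suffix, tags, index, pieces=4, max_mismatch=3):
    """Union the index hits of every piece of `suffix`, then keep candidates within max_mismatch.

    Pigeonhole guarantee: with max_mismatch < pieces, every true match appears in the union.
    """
    size = len(suffix) // pieces
    hits = set()
    for j in range(pieces):
        hits |= index.get((j, suffix[j * size:(j + 1) * size]), set())
    return sorted(
        t for t in hits
        if sum(a != b for a, b in zip(suffix, tags[t][:len(suffix)])) <= max_mismatch
    )

tags = [
    "AAAAAACCCCCCGGGGGGTTTTTTAC",
    "AAAAAACCCCCCGGGGGGTTTTTAGG",  # one mismatch in the last piece
    "TTTTTTGGGGGGCCCCCCAAAAAATT",
]
idx = build_seed_index(tags)
print(find_overlapping("AAAAAACCCCCCGGGGGGTTTTTT", tags, idx))  # [0, 1]
print(find_overlapping("AAAAAGCCCCCAGGGGGCTTTTTT", tags, idx))  # [0]: only piece 3 is exact
```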
Introduction to the new method • How can the next overlapping tags be found quickly while allowing mismatches? • V1 <= U3 • V2 <= (U1 << 1) + 0 • V3 <= (U2 << 1) + 0
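The slide's U/V update is not explained beyond these three lines. It appears to derive the next keys by shifting existing ones instead of recomputing them from scratch. The sketch below swaps in the standard 2-bit rolling k-mer key (shift left by two bits, OR in the new base, mask to 2K bits). It illustrates an O(1) per-step key update and is not a reconstruction of the slide's scheme; `rolling_keys` is a hypothetical name.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base

def rolling_keys(seq, k):
    """Yield the 2-bit integer key of every k-mer in seq, updating the key in O(1) per base."""
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2k bits)
    key = 0
    for i, base in enumerate(seq):
        key = ((key << 2) | ENCODE[base]) & mask  # drop the oldest base, append the new one
        if i >= k - 1:
            yield key

print(list(rolling_keys("ACGTA", 3)))  # [6, 27, 44]
```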
Handle the old problem. • What if the overlap is shorter than 24 bp?
Handle the old problem. • Check the tags one by one, in descending order of overlap length.
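For overlaps shorter than 24 bp, a sketch of the descending-order search: try the longest candidate overlap first and accept the first tag whose overlap has a small enough mismatch rate. The lower bound `min_len=12` and the 10% mismatch rate are hypothetical values chosen for illustration, not the author's, and `best_short_overlap` is a hypothetical name.

```python
def best_short_overlap(contig, tags, min_len=12, max_len=23, max_mismatch_rate=0.1):
    """Return (overlap length, tag) for the longest acceptable overlap below 24 bp, or None."""
    for ov in range(min(max_len, len(contig)), min_len - 1, -1):  # longest overlap first
        for tag in tags:
            if len(tag) <= ov:
                continue  # the tag must add at least one base past the contig end
            mismatches = sum(a != b for a, b in zip(contig[-ov:], tag[:ov]))
            if mismatches <= max_mismatch_rate * ov:
                return ov, tag
    return None

contig = "CAGTTGACCATGCAAGTCCG"
tag1 = contig[-13:] + "AAAA"  # overlaps the contig end by 13 bp
tag2 = contig[-16:] + "TT"    # overlaps the contig end by 16 bp
print(best_short_overlap(contig, [tag1, tag2]))  # (16, 'TGACCATGCAAGTCCGTT')
```

Because lengths are tried in descending order, the 16-bp overlap of `tag2` wins even though `tag1` is listed first.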
Handle the old problem. • Degree of approximation.
Handle the old problem. • Fewer tips. • No bubbles. • Because overlaps are computed with mismatches allowed. • Whole tags are used.
The new problem. • Speed. • The tail of a tag often has more errors. • Reverse extending problem.
Todo • Handle the reverse extending problem. • Speed. • Finish the comparison between the de Bruijn method (Velvet) and my method. • Paired-end tags.