Reconstruction of DNA sequencing by hybridization

Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang (ZHANGroup@aporc.org), Institute of Applied Mathematics, AMSS, CAS.

Presentation Transcript


  1. Reconstruction of DNA sequencing by hybridization Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang ZHANGroup@aporc.org Institute of Applied Mathematics, AMSS, CAS

  2. Bioinformatics • Human Genome Project • Large molecule data in biology, such as DNA and protein • Knowledge of mathematics, computer science, information science, physics, system science, management science as well as biology • Genomics • DNA sequencing • Gene prediction • Sequence alignment

  3. DNA Sequencing …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

  4. DNA Sequencing (shotgun) • Target DNA is cut many times at random • Forward-reverse linked reads (~500 bp each) separated by a known distance

  5. DNA Sequencing (SBH) • DNA array (DNA chip) with 4^3 = 64 probes • Target DNA: AAATGCG

  6. Sequencing by Hybridization • Hybridize target to array containing a spot for each possible k-tuple (k-mer) • The spectrum of a sequence • multi-set of all its k-long substrings (k-tuples) • Goal • reconstruct the sequence from its spectrum • Pevzner (1989): reconstruction is polynomial • But …
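The spectrum defined on slide 6 can be computed directly from a known sequence. The following is a minimal sketch in Python; the function name `spectrum` is an illustrative choice, not something taken from the slides. It uses the target AAATGCG from slide 5 with k = 3.

```python
from collections import Counter


def spectrum(sequence, k):
    """Return the multiset of all k-long substrings (k-tuples) of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Target DNA from slide 5, probed with all 3-tuples.
spec = spectrum("AAATGCG", 3)
print(sorted(spec.elements()))  # ['AAA', 'AAT', 'ATG', 'GCG', 'TGC']
```

A `Counter` is used rather than a `set` because the slide defines the spectrum as a multi-set: a k-tuple that occurs twice in the target is counted twice.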

  7. Uniqueness of Reconstruction • Different sequences can have the same spectrum • Example: the spectrum {ACT, CTA, TAC} is shared by • ACTAC • TACTA • Non-uniqueness Probability
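The non-uniqueness on slide 7 can be checked by brute force for small cases. The sketch below enumerates every DNA string of the right length and keeps those whose spectrum matches; the helper names are hypothetical, and the exhaustive search is only practical for very short sequences.

```python
from collections import Counter
from itertools import product


def spectrum(sequence, k):
    """Multiset of all k-long substrings of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def reconstructions(spec, k):
    """Brute force: every sequence over ACGT whose k-spectrum equals `spec`."""
    n = sum(spec.values()) + k - 1  # a length-n sequence has n - k + 1 k-tuples
    candidates = ("".join(p) for p in product("ACGT", repeat=n))
    return [s for s in candidates if spectrum(s, k) == spec]


print(reconstructions(Counter(["ACT", "CTA", "TAC"]), 3))
# ['ACTAC', 'CTACT', 'TACTA']
```

Besides the two sequences listed on the slide, the enumeration also finds CTACT, which is the third rotation of the same cycle of overlaps.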

  8. Experiment Errors • Hybridization experiments are error prone • False negative error • k-tuple appears in target DNA but does not appear in its measured spectrum • Repetition of k-tuple • False positive error • k-tuple does not appear in target DNA but does appear in its measured spectrum

  9. Sequencing by Hybridization • Target DNA: ……TTTTACGC…… → Spectrum • Errors: positive (misread) / negative (missing, repetition) • Ideal case: TTT TTT TTA TAC ACG CGC • With errors: TTT TTT TTA TAC ACG CGC TGA
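Given an ideal and a measured spectrum, the two error types from slides 8 and 9 fall out of multiset differences. This is a small sketch; the `measured` spectrum below is a hypothetical readout constructed for illustration (one TTT lost to repetition, TAC missing, TGA spurious), because the markings on the original slide did not survive in the transcript.

```python
from collections import Counter


def spectrum_errors(ideal, measured):
    """Split the difference between two spectra into the two error types."""
    false_negatives = ideal - measured   # in the target, absent from the readout
    false_positives = measured - ideal   # in the readout, absent from the target
    return false_negatives, false_positives


ideal = Counter(["TTT", "TTT", "TTA", "TAC", "ACG", "CGC"])
# Hypothetical measured spectrum (an assumption, not data from the slides).
measured = Counter(["TTT", "TTA", "ACG", "CGC", "TGA"])

negatives, positives = spectrum_errors(ideal, measured)
print(dict(negatives))  # {'TTT': 1, 'TAC': 1}
print(dict(positives))  # {'TGA': 1}
```

`Counter` subtraction drops non-positive counts, so a repeated k-tuple that the array reports only once shows up as a single false negative.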

  10. SBH Reconstruction Problem • In the case of error-free SBH experiments • A desired solution of SBH is a feasible solution that includes all k-tuples in the spectrum • For the general case • There is no additional information except the spectrum and the length of the target DNA • A feasible solution composed of a maximum-cardinality subset of the spectrum is a reasonable desired solution

  11. SBH Reconstruction Problem • Ideal case (without repetitions and errors) • Equivalent to finding an Eulerian path in a corresponding graph (Pevzner, 1989) • A linear-time algorithm exists (Fleischner, 1990) • The general case is an NP-hard problem • Branch and bound • Heuristics • Extensions • PSBH (Positional SBH) • SBH with length error
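The Eulerian-path formulation for the error-free case can be sketched as follows. Each k-tuple becomes an edge from its (k-1)-prefix to its (k-1)-suffix, and a path using every edge once spells out a sequence. The sketch uses Hierholzer's algorithm, which is a standard linear-time choice and not necessarily the procedure the slide's citation refers to; the function name is illustrative, and no error handling beyond a final length check is attempted.

```python
from collections import defaultdict


def reconstruct(kmers):
    """Rebuild one sequence from an error-free spectrum via an Eulerian path."""
    graph = defaultdict(list)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)   # one edge per k-tuple
        out_deg[prefix] += 1
        in_deg[suffix] += 1

    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise the graph is balanced and any edge's source node will do.
    nodes = set(out_deg) | set(in_deg)
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), kmers[0][:-1])

    # Hierholzer's algorithm with an explicit stack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not admit an Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct(["AAA", "AAT", "ATG", "TGC", "GCG"]))  # AAATGCG
```

When several Eulerian paths exist, as for the spectrum {ACT, CTA, TAC}, this returns only one of them; which one depends on edge order.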

  12. Motivations • Give criteria that determine the most likely k-tuples at both ends and in the middle of all possible reconstructions of the target DNA • These criteria greatly reduce ambiguities in the reconstruction of DNA • Transform the negative errors into positive errors • This enables us to handle both types of errors easily • Separate the repetitions from both types of errors

  13. Methods • Estimate the number of k-tuples that do not occur in a solution • Adjacency matrix (connection matrix) • Give a lower bound on the number of k-tuples that do not occur in any solution from k-tuple i to k-tuple j
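The slides do not define the adjacency (connection) matrix in detail. One plausible reading, shown here only as an assumption, is a 0/1 matrix over the distinct k-tuples of the spectrum in which entry (i, j) is 1 when k-tuple j can directly follow k-tuple i, that is, when they overlap in k-1 bases. The lower-bound computation itself is not reproduced, since the transcript gives no formula for it.

```python
def adjacency_matrix(kmers):
    """0/1 matrix: entry [i][j] is 1 if kmers[j] can directly follow kmers[i].

    Assumed definition (not from the slides): a (k-1)-base overlap between
    the suffix of kmers[i] and the prefix of kmers[j].
    """
    return [[1 if u[1:] == v[:-1] else 0 for v in kmers] for u in kmers]


kmers = ["TTT", "TTA", "TAC", "ACG", "CGC"]
for kmer, row in zip(kmers, adjacency_matrix(kmers)):
    print(kmer, row)
```

The diagonal entry for TTT is 1 because TTT can follow itself, which is how a repetition such as TTTT appears under this definition.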

  14. Methods • Determine the most likely k-tuples at both ends • Reconstruct from the most likely end pairs to get an upper bound for the SBH problem • Purge the end pairs that cannot yield a better solution than the current upper bound

  15. Methods • Transform the negative errors into positive errors • Artificial k-tuple • Fill in all the possible gaps due to false negative errors • Negative error level • The maximal number of consecutively missing k-tuples that is allowed • Reduce the number of artificial k-tuples
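One way to read the gap-filling idea on slide 15 is sketched below. It is an interpretation, not the authors' procedure: if `gap` consecutive k-tuples are missing between two measured k-tuples u and v, then u and v still overlap in k-1-gap bases, and the missing k-tuples can be read off the merged string. The function name and the parameter `level` (standing in for the negative error level) are hypothetical.

```python
def artificial_ktuples(kmers, level):
    """Candidate k-tuples that would bridge gaps of up to `level` missing k-tuples."""
    k = len(kmers[0])
    present = set(kmers)
    artificial = set()
    for u in kmers:
        for v in kmers:
            for gap in range(1, min(level, k - 1) + 1):
                overlap = k - 1 - gap
                # u's last `overlap` bases must equal v's first `overlap` bases.
                if u[k - overlap:] == v[:overlap]:
                    bridged = u + v[overlap:]
                    # Windows 1..gap of the merged string are the missing k-tuples.
                    for i in range(1, gap + 1):
                        artificial.add(bridged[i:i + k])
    return artificial - present


# Spectrum of TTTTACGC with TAC missing (one false negative), level 1.
print(sorted(artificial_ktuples(["TTT", "TTA", "ACG", "CGC"], 1)))
# ['GCG', 'TAC']
```

The true missing k-tuple TAC is recovered, but so is the spurious candidate GCG. Under this reading each artificial k-tuple behaves like a potential false positive, which matches the slide's point about keeping their number small.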

  16. Computational Experiments • 109 DNA sequences from GenBank • Simulate the SBH experiments • Error models • Random errors (probabilistic model) • Systematic errors (one-base-mismatch model)
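A simulated SBH experiment with random errors might look like the sketch below. The slide names only a "probabilistic model", so the specifics here are assumptions: the array reports a set (repetitions are lost), each true k-tuple is dropped independently with probability `p_negative`, and a fixed number `n_positive` of random spurious k-tuples is added. All parameter names and default values are illustrative.

```python
import random


def simulate_sbh(sequence, k, p_negative=0.05, n_positive=2, seed=0):
    """Return (true k-tuples, measured k-tuples) for one simulated experiment."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    true_kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    if n_positive > 4 ** k - len(true_kmers):
        raise ValueError("not enough distinct k-tuples left for false positives")

    # False negatives: drop each true k-tuple with probability p_negative.
    measured = {kmer for kmer in sorted(true_kmers) if rng.random() >= p_negative}

    # False positives: add random k-tuples that do not occur in the target.
    added = 0
    while added < n_positive:
        candidate = "".join(rng.choice("ACGT") for _ in range(k))
        if candidate not in true_kmers and candidate not in measured:
            measured.add(candidate)
            added += 1
    return true_kmers, measured


true_kmers, measured = simulate_sbh("TTTTACGCAAATGCG", 4, p_negative=0.2)
print("missing: ", sorted(true_kmers - measured))
print("spurious:", sorted(measured - true_kmers))
```

The one-base-mismatch model from the slide could be approximated by drawing false positives from k-tuples that differ from a true k-tuple in a single base instead of drawing them uniformly.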

  17. Conclusions • The ideal case (without repetitions and errors) can be solved in polynomial time (Pevzner, 1989) • The general case is an NP-hard problem • Design efficient algorithms • Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003. • Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese) • Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.
