
Sequencing By Hybridization – A Simulation Study of Performance on Genomic Sequences




Purpose
• Quantify the performance of SBH on genomic sequences.
• Propose and evaluate Markov models for simulating data.

Methods
• Simulation on genomic data was done as follows: sequences were taken from random positions in the genomic sequence. Unless specified otherwise, the genomic origin was chromosome II of S. cerevisiae.
• Random data were generated by one of two methods: a uniform independent model, or a Markov model whose parameters were derived from the genomic data.
• All signatures are assumed to be perfect.

Failure mechanism of SBH
Repeats cause failure. Genomic data is more repetitive than random data, so SBH fails on it more often.
Interleaved repeats: a α b β c α d β e
Triple repeat: a α b α c α d
Consecutive repeats: a α* b β* c

[Figure, panels k=8, k=10, k=12: Frequency of at least one interleaved repeat or one triple repeat in a sequence, comparing genomic (yeast) and randomly generated sequences.] The bias in the frequency of triple repeats in genomic sequences may be attributed to regulatory motifs.
[Figure: Rate of unique reconstruction by complete k-mer arrays, comparing sequences of various genomic origins to randomly generated sequences.]
[Figure, panels k=8, k=10, k=12: Rate of unique reconstruction by complete k-mer arrays, comparing genomic sequences to sequences generated randomly using Markov models of various orders.]
[Figure: Estimated average number of consistent sequences reconstructed from complete k-mer arrays, comparing genomic and random sequences.]

Conclusions
• Simulation of SBH using uniformly drawn sequences does not represent genomic reality.
• Using Markov models, even as simple as order 0, greatly improves the relevance of simulation studies designed to evaluate SBH and other measurement technologies.
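The poster does not include code for its simulations. As a minimal sketch, assuming the genome is available as a plain A/C/G/T string (the chromosome II data itself is not reproduced here), the three data sources described under Methods could be generated in Python as follows. The function names, the choice of a random observed context as the starting point, and the uniform fallback for unseen contexts are hypothetical choices, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"

def genomic_sample(genome, length, rng):
    """Take a window of the given length from a random position in the genome."""
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]

def uniform_sequence(length, rng):
    """Uniform independent model: every base is drawn with probability 1/4."""
    return "".join(rng.choice(BASES) for _ in range(length))

def fit_markov(genome, order):
    """Count each base conditioned on the `order` preceding bases.

    With order=0 the only context is "", so this is the base composition.
    """
    counts = defaultdict(Counter)
    for i in range(order, len(genome)):
        counts[genome[i - order:i]][genome[i]] += 1
    return counts

def markov_sequence(counts, order, length, rng):
    """Sample a sequence from a fitted Markov model of the given order."""
    # Hypothetical start: a randomly chosen context observed in the genome.
    seq = list(rng.choice(list(counts)))
    while len(seq) < length:
        context = "".join(seq[len(seq) - order:])  # "" when order == 0
        dist = counts.get(context)
        if not dist:
            # Context never followed by a base in the training data
            # (hypothetical fallback): draw a base uniformly.
            seq.append(rng.choice(BASES))
        else:
            bases, weights = zip(*dist.items())
            seq.append(rng.choices(bases, weights=weights)[0])
    return "".join(seq[:length])
```

Fitting with order=0 reproduces only the genome's base composition, which the Conclusions single out as already a large improvement over uniform draws; higher orders also capture dinucleotide and longer context effects at the cost of more parameters.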
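The interleaved and triple repeat patterns named under "Failure mechanism of SBH" can be checked mechanically. The sketch below assumes (this is not stated on the poster) that α and β are repeated substrings of length k-1, one less than the probe length, and that a complete, error-free array reports the set of k-mers present. The function names are hypothetical, and this is a simplified check for the two patterns, not a full test of unique reconstructibility.

```python
from collections import defaultdict

def spectrum(seq, k):
    """Set of k-mers a complete, error-free array would report (presence only, assumed)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def repeated_words(seq, length):
    """Map each substring of the given length occurring at least twice to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - length + 1):
        positions[seq[i:i + length]].append(i)
    return {w: p for w, p in positions.items() if len(p) >= 2}

def _is_subsequence(pattern, labels):
    """True if `pattern` occurs in `labels` as a (not necessarily contiguous) subsequence."""
    it = iter(labels)
    return all(symbol in it for symbol in pattern)

def has_triple_repeat(seq, k):
    """Some (k-1)-mer occurs at least three times: a α b α c α d."""
    return any(len(p) >= 3 for p in repeated_words(seq, k - 1).values())

def has_interleaved_repeat(seq, k):
    """Two distinct (k-1)-mers occur in the order α ... β ... α ... β."""
    reps = list(repeated_words(seq, k - 1).values())
    for i, pos_a in enumerate(reps):
        for pos_b in reps[i + 1:]:
            # Label each occurrence by its word and read the labels left to right.
            merged = sorted([(p, "A") for p in pos_a] + [(p, "B") for p in pos_b])
            labels = [label for _, label in merged]
            if _is_subsequence("ABAB", labels) or _is_subsequence("BABA", labels):
                return True
    return False

def at_risk(seq, k):
    """Interleaved or triple repeat present (the event counted in the first figure)."""
    return has_interleaved_repeat(seq, k) or has_triple_repeat(seq, k)
```

Under these assumptions, swapping b and d in a α b β c α d β e yields a different sequence with the same k-mer set, because every k-mer lies inside one of the overlapping pieces aα, αbβ, βcα, αdβ, βe, and the swap only reorders those pieces. For example, with k = 4, GACGCTTATACGGGTTAC and GACGGGTTATACGCTTAC have identical spectra, so the array cannot distinguish them.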
Doron Lipson, Ziv Nevo, Ari Frank, Dolev Dotan, Zohar Yakhini
Computer Science Department, Technion, Haifa, Israel
