
Efficient de novo assembly of single-cell bacterial genomes from short-read data sets

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. CS 502, Ankita Bhatia, March 20th 2014. Overview: Basic Definitions; Why Study Single Cells?; Need for Modified Assemblers for SC; De Bruijn Graph Approach; Velvet vs Velvet-SC; Results.


Presentation Transcript


  1. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets CS 502 Ankita Bhatia March 20th 2014

  2. Overview • Basic Definitions • Why Study Single Cells? • Need for Modified Assemblers for SC • De Bruijn Graph Approach • Velvet vs Velvet-SC • Results • Future Scope

  3. Basic Definitions Sequencing: Genome sequencing is determining the order of the DNA nucleotides in a genome, that is, the order of the A, C, G, and T bases that make up an organism's DNA. Assembly: The process of putting the sequenced fragments back together using overlap information, e.g., like piecing together the pages of a shredded book.

  4. Why Study Single Cells? Studying the genomes of single cells helps to track changes that occur in DNA over time or changes associated with exposure to different conditions. This has great significance in medicine, e.g., antibiotic discovery would benefit greatly from single-cell sequencing.

  5. Some Terms Related to Assembly • Read: the sequence of a single fragment. • Contig: a combination of overlapping fragments that form a much longer sequence, e.g., ATGCTA and CTATGC overlap on CTA and combine to form the longer sequence ATGCTATGC. • Coverage: the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as N * L / G. E.g., a hypothetical genome of 2,000 base pairs (G) reconstructed from 8 reads (N) with an average length of 500 nucleotides (L) has a coverage of 8 * 500 / 2,000 = 2.
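The two worked examples on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function names `coverage` and `merge_by_overlap` are made up here, and the merge simply joins two reads on their longest suffix/prefix overlap, which is far simpler than what a real assembler does.

```python
from typing import Optional


def coverage(num_reads: int, avg_read_len: float, genome_len: int) -> float:
    """Average coverage, computed as N * L / G."""
    return num_reads * avg_read_len / genome_len


def merge_by_overlap(left: str, right: str, min_overlap: int = 1) -> Optional[str]:
    """Join two reads on the longest suffix of `left` that is a prefix of `right`.

    Returns None when no overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


print(coverage(8, 500, 2000))                # 2.0
print(merge_by_overlap("ATGCTA", "CTATGC"))  # ATGCTATGC
```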

  6. Need for Modified Assemblers for SC Current assemblers such as Velvet use a fixed coverage cutoff threshold to prune low-coverage contigs. In normal multicell assembly, coverage is fairly uniform, so this method works well. In single-cell assembly, however, coverage is highly non-uniform, and low-coverage regions can represent correct contigs. The authors therefore developed a modified version of Velvet, called Velvet-SC.

  7. De Bruijn Graph

  8. Simplification of De Bruijn Graph Whenever a node A has only one outgoing arc, pointing to a node B that has only one incoming arc, the two nodes can be merged into a single node that carries the information of both.
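The merge rule can be sketched on a toy graph. The code below is an assumption-laden illustration, not Velvet's implementation: nodes are plain k-mers, an arc joins k-mers that are adjacent in some read, reverse complements are ignored, and the names `build_graph` and `simplify` are hypothetical.

```python
def build_graph(reads, k):
    """Nodes are k-mers; an arc joins two k-mers that are adjacent in a read."""
    nodes, succ, pred = set(), {}, {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ.setdefault(a, set()).add(b)
            pred.setdefault(b, set()).add(a)
    return nodes, succ, pred


def simplify(nodes, succ, pred):
    """Collapse every chain of mergeable nodes into one contig string.

    A -> B is mergeable when A has exactly one outgoing arc and B has
    exactly one incoming arc. Isolated cycles have no chain start and
    are skipped in this sketch.
    """
    def only(items):
        return next(iter(items))

    def can_merge(a, b):
        return len(succ.get(a, ())) == 1 and len(pred.get(b, ())) == 1

    contigs = []
    for start in sorted(nodes):
        preds = pred.get(start, ())
        if len(preds) == 1 and can_merge(only(preds), start):
            continue  # interior of a chain; it is reached from the chain start
        seq, node = start, start
        while len(succ.get(node, ())) == 1:
            nxt = only(succ[node])
            if not can_merge(node, nxt) or nxt == start:
                break
            seq += nxt[-1]  # consecutive k-mers overlap by k - 1 bases
            node = nxt
        contigs.append(seq)
    return sorted(contigs)


nodes, succ, pred = build_graph(["ACGTTG", "GTTGCA"], k=3)
print(simplify(nodes, succ, pred))  # ['ACGTTGCA']
```

With a third read that forks the path, such as `"ACGTAG"`, the node `CGT` gains two outgoing arcs and the chain is split there instead of being merged through.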

  9. Simplified graph

  10. Error Removal - Tips • A node is considered a tip and should be erased if it is disconnected on one of its ends.
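A minimal sketch of tip removal on an already simplified graph follows. The slide only says that a tip is disconnected at one end; the length limit `max_tip_len` is an assumption added here so that the genuine ends of long contigs are not erased, and the function name and toy data are hypothetical.

```python
def remove_tips(lengths, arcs, max_tip_len):
    """Drop short nodes that are connected at exactly one end.

    lengths: dict mapping node name -> node length in bases
    arcs:    set of (source, target) pairs
    """
    has_in = {target for _, target in arcs}
    has_out = {source for source, _ in arcs}
    tips = {
        node
        for node, length in lengths.items()
        # Exactly one end is attached, and the node is short.
        if (node in has_in) != (node in has_out) and length < max_tip_len
    }
    kept = {node: length for node, length in lengths.items() if node not in tips}
    kept_arcs = {(u, v) for u, v in arcs if u not in tips and v not in tips}
    return kept, kept_arcs


# Toy graph: A -> B is the real path, T is a 4-base dead end hanging off A.
lengths = {"A": 100, "B": 120, "T": 4}
arcs = {("A", "B"), ("A", "T")}
print(remove_tips(lengths, arcs, max_tip_len=10))
```

Here `A` and `B` are also open at one end, but they survive because they are longer than the limit; only `T` is erased.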

  11. Error Removal - Bubbles Bubbles form when two paths start and end at the same nodes. They are normally caused by sequencing errors or biological variants. They are removed with the Tour Bus algorithm, a Dijkstra-like breadth-first search that detects the best path to keep and determines which alternative paths should be erased.

  12. Low Coverage Cutoff: Velvet

  13. Low Coverage Cutoff: Velvet-SC
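The contrast between the two cutoff strategies can be sketched on a toy graph. The transcript does not spell out the Velvet-SC procedure, so the loop below is an assumption: the threshold starts at 1 and rises one step at a time, and after each removal chains are re-merged and their coverage re-averaged by length. All function names and the toy numbers are hypothetical, and node lengths are simply added, ignoring k-mer overlaps.

```python
def remove_low_coverage(nodes, edges, threshold):
    """Drop nodes whose coverage is below the threshold, plus their arcs."""
    kept = {n: d for n, d in nodes.items() if d["cov"] >= threshold}
    return kept, {(u, v) for u, v in edges if u in kept and v in kept}


def merge_chains(nodes, edges):
    """Merge u -> v whenever u has one outgoing arc and v one incoming arc."""
    nodes = {name: dict(data) for name, data in nodes.items()}
    edges = set(edges)
    merged = True
    while merged:
        merged = False
        for u, v in sorted(edges):
            out_u = sum(1 for a, _ in edges if a == u)
            in_v = sum(1 for _, b in edges if b == v)
            if u == v or out_u != 1 or in_v != 1:
                continue
            a, b = nodes.pop(u), nodes.pop(v)
            total = a["len"] + b["len"]
            joined = u + "+" + v
            # Coverage of the merged node is the length-weighted average.
            nodes[joined] = {
                "len": total,
                "cov": (a["cov"] * a["len"] + b["cov"] * b["len"]) / total,
            }
            edges = {
                (joined if x in (u, v) else x, joined if y in (u, v) else y)
                for x, y in edges
                if (x, y) != (u, v)
            }
            merged = True
            break
    return nodes, edges


def fixed_cutoff(nodes, edges, cutoff):
    """One pruning pass with a single, fixed coverage cutoff."""
    nodes, edges = merge_chains(nodes, edges)
    nodes, edges = remove_low_coverage(nodes, edges, cutoff)
    return merge_chains(nodes, edges)


def increasing_cutoff(nodes, edges, max_cutoff):
    """Assumed variant: raise the cutoff step by step, re-merging each time."""
    nodes, edges = merge_chains(nodes, edges)
    for threshold in range(1, max_cutoff + 1):
        nodes, edges = remove_low_coverage(nodes, edges, threshold)
        nodes, edges = merge_chains(nodes, edges)
    return nodes, edges


# Toy graph: A -> B -> C is the true path, but B is poorly covered.
# E1 and E2 are erroneous side branches that block merging around B.
toy_nodes = {
    "A": {"len": 100, "cov": 30}, "B": {"len": 50, "cov": 3},
    "C": {"len": 100, "cov": 25}, "E1": {"len": 20, "cov": 1},
    "E2": {"len": 20, "cov": 1},
}
toy_edges = {("A", "B"), ("A", "E1"), ("B", "C"), ("E2", "C")}

print(sorted(fixed_cutoff(toy_nodes, toy_edges, 5)[0]))       # ['A', 'C']
print(sorted(increasing_cutoff(toy_nodes, toy_edges, 5)[0]))  # ['A+B+C']
```

In this toy case the fixed cutoff of 5 deletes the correct but poorly covered node `B` and leaves `A` and `C` as separate contigs. With the rising threshold, the coverage-1 branches disappear first, `B` merges with its well-covered neighbours, and the merged contig's averaged coverage keeps it above every later threshold.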

  14. Velvet vs Velvet-SC

  15. Results Comparison of assemblies of known genomes (for contigs >110 bp). N50 is a length-weighted median of contig sizes: contigs of length N50 or longer together contain at least half of the assembled bases.
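N50 is easy to confuse with the plain median contig length, so a small sketch may help. The function name `n50` and the contig lengths below are made up for illustration; the statistic is computed over the total assembled length.

```python
from statistics import median


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


lengths = [80, 70, 50, 40, 30, 20]  # 290 bases in total
print(n50(lengths))     # 70, because 80 + 70 = 150 covers half of 290
print(median(lengths))  # 45.0, the plain median for comparison
```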

  16. Results Comparison of contigs generated by Velvet versus EULER+Velvet-SC

  17. Results Contigs in blue or green match between the assemblies. Contigs in red or orange differ between the assemblies

  18. Results Single-cell assembly of an uncultured Deltaproteobacterium

  19. Future Scope This emerging technology will drive studies of uncultured organisms from the human microbiome (including pathogens) and from marine and soil environments (including bacteria producing antibiotics and bacteria with potential for biofuel production). The cost-effective approach demonstrated here should contribute to the exploration of microbial taxonomy and evolution and facilitate the mining of environmental organisms for genes and pathways of interest to biotechnology and biomedicine. We also envision further development of EULER+Velvet-SC and applications in metagenomics and transcriptome sequencing projects, which are also characterized by highly nonuniform coverage.

  20. Thank You!!
