Cross_genome: Assembly Scaffolding using Cross-species Synteny

  1. Cross_genome: Assembly Scaffolding using Cross-species Synteny. Zemin Ning, High Performance Assembly

  2. Can synteny help? And how? Scaffolding; contig gap closure.

  3. RACA - Reference-assisted chromosome assembly

  4. Lattice of Target vs. Reference (figure: Scaffolds 1, 2 and 3 of the target sequence plotted against the reference). Lattice key: $Q = \mathrm{scaff}(i) \times 2^{32} + \mathrm{contig\_loci}(j)$
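
A minimal sketch of the lattice key in Python, assuming the multiplier on scaff(i) is $2^{32}$, so that the contig locus occupies the low 32 bits and the scaffold index the bits above. The helper names `encode_q` and `decode_q` are hypothetical and are not taken from the Cross_genome code.

```python
# Hypothetical helpers for the packed key Q = scaff(i) * 2^32 + contig_loci(j).
SHIFT = 32
LOW_MASK = (1 << SHIFT) - 1  # 0xFFFFFFFF, selects the low 32 bits


def encode_q(scaff_index: int, locus: int) -> int:
    """Pack a scaffold index (high bits) and a locus (low 32 bits) into one key."""
    if scaff_index < 0 or not 0 <= locus <= LOW_MASK:
        raise ValueError("scaffold index must be >= 0 and locus must fit in 32 bits")
    return (scaff_index << SHIFT) + locus


def decode_q(q: int) -> tuple[int, int]:
    """Recover (scaffold index, locus) from a packed key."""
    return q >> SHIFT, q & LOW_MASK


q = encode_q(3, 1_250_000)
print(decode_q(q))  # (3, 1250000)
```

Because the scaffold index sits in the high bits, sorting the keys orders hits by scaffold first and by position within a scaffold second. In a fixed-width language the key would need a 64-bit unsigned integer.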

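  5. After noise cleaning: $\mathrm{Gap\_size} = Y - X$, where X and Y are the reference coordinates marked at the facing ends of Scaffold 2 and Scaffold 3 (figure: Scaffolds 1, 2 and 3 of the target sequence against the reference).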
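
The gap rule can be sketched in Python as below. The noise filter (dropping anchor hits more than `max_dev` bases from a scaffold's median reference position) is an assumption standing in for the unspecified noise cleaning, and the sketch also assumes every scaffold maps to a single reference sequence in forward orientation. `clean_hits` and `scaffold_gaps` are hypothetical names.

```python
from statistics import median


def clean_hits(positions, max_dev):
    """Assumed noise filter: keep hits within max_dev bases of the median position."""
    if not positions:
        return []
    centre = median(positions)
    return [p for p in positions if abs(p - centre) <= max_dev]


def scaffold_gaps(hits, max_dev=1_000_000):
    """Order scaffolds along the reference and report Gap_size = Y - X.

    hits maps a scaffold name to the reference positions of its anchor hits.
    X is the last cleaned hit of the left scaffold and Y the first cleaned hit
    of the right scaffold; a negative gap means the two scaffolds overlap.
    """
    spans = []
    for name, positions in hits.items():
        kept = clean_hits(positions, max_dev)
        if kept:  # skip scaffolds whose hits were all discarded as noise
            spans.append((min(kept), max(kept), name))
    spans.sort()  # order by first reference position
    return [
        (left, right, y - x)
        for (_, x, left), (y, _, right) in zip(spans, spans[1:])
    ]


hits = {
    "Scaffold 1": [100, 5_000, 9_000],
    "Scaffold 2": [20_000, 30_000, 90_000_000],  # last hit is an outlier
    "Scaffold 3": [45_000, 60_000],
}
print(scaffold_gaps(hits))
# [('Scaffold 1', 'Scaffold 2', 11000), ('Scaffold 2', 'Scaffold 3', 15000)]
```

Using the median rather than the mean keeps a single far-away hit from dragging a scaffold's placement; the threshold value here is purely illustrative.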

  6. Cases that shouldn't be joined (figure: two Reference vs. Target panels, each showing Scaffold 1 and Scaffold 2; one panel marks a Gap_size).

  7. GAGE: Human Chr14 and RACA using Orangutan

  8. Scaffold N50 for Other Genome Assemblies

     Species            Original   Cross_g   References
     Panda              1.3 Mb     25 Mb     Dog, Human
     Tibetan Antelope   2.6 Mb     42 Mb     Cattle, Dog, Human
     Tasmanian Devil    1.8 Mb     6.8 Mb    Opossum

     Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/
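
Scaffold N50, the metric compared in this table, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that standard definition (not code from the presentation):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


print(n50([100, 80, 50, 30, 20, 10]))  # 80
```

The same function gives contig N50 when fed contig lengths instead of scaffold lengths.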

  9. Improve gorilla assembly using human reference. Workflow: Gorilla Assembly + Human Reference → Contig gap size re-estimation → Combined Gorilla-Human Assembly → Read Alignment → Pair-wise/Multiple Read Clustering → Local Assembly → Final Gorilla Assembly

  10. Re-estimate contig gap sizes from the reference; local assembly based on clustered reads (figure: a gap in the target sequence, with the reference sequence inserted to give a new gap size).
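
One way to picture the "ref seq inserted" step is sketched below in Python. It assumes the left contig's alignment ends at reference coordinate `x` (0-based, exclusive) and the right contig's alignment starts at `y`, so the new gap size is `y - x` and `reference[x:y]` is the segment inserted as a template. The function names and the lower-case marking of reference-derived bases are hypothetical conventions for illustration; in the described workflow those bases would then be replaced by local assembly of clustered reads.

```python
def reestimate_gap(reference, x, y):
    """Return (new_gap_size, reference_segment) for the gap between two contigs.

    x: reference coordinate just past the left contig's alignment (exclusive end)
    y: reference coordinate where the right contig's alignment starts
    """
    new_gap = y - x
    segment = reference[x:y] if new_gap > 0 else ""
    return new_gap, segment


def fill_gap_with_reference(left, right, reference, x, y):
    """Hypothetical join: insert the reference segment, lower-cased, between contigs.

    Overlapping contigs (negative gap) would need trimming, which this sketch
    does not attempt, so it returns None for the joined sequence in that case.
    """
    new_gap, segment = reestimate_gap(reference, x, y)
    if new_gap < 0:
        return None, new_gap
    return left + segment.lower() + right, new_gap


ref = "ACGTACGTTTGGCCAA"
print(fill_gap_with_reference("AAA", "CCC", ref, 8, 12))  # ('AAAttggCCC', 4)
```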

  11. Assemblies using Synteny-guided Method

  12. Gorilla: merge with other de novo assemblies (*Fermi assembler: https://github.com/lh3/fermi/; +MaSuRCA assembler: http://www.genome.umd.edu/masurca.html)

  13. Genome size estimate: $G_s = (K_n - K_s)/D \approx 4.5 \times 10^{9}$, where $K_n = 125.4 \times 10^{9}$ is the total number of kmer words, $K_s = 2.4 \times 10^{9}$ is the number of single-copy kmer words, and $D = 27$ is the depth of kmer occurrence.
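
The slide's arithmetic can be checked with a few lines of Python. This reproduces only the formula as written; how $K_n$, $K_s$ and $D$ were derived from the reads (kmer length, counting tool) is not stated, so they are plain inputs here and `genome_size` is a hypothetical name.

```python
def genome_size(kn, ks, depth):
    """Gs = (Kn - Ks) / D: kmer words minus single-copy ones, divided by kmer depth."""
    return (kn - ks) / depth


KN = 125.4e9  # total number of kmer words
KS = 2.4e9    # number of single-copy kmer words
D = 27        # depth of kmer occurrence

gs = genome_size(KN, KS, D)
print(f"Gs ~ {gs / 1e9:.2f} x 10^9")  # Gs ~ 4.56 x 10^9
```

The exact quotient is 123 × 10^9 / 27, about 4.56 × 10^9, which the slide reports as 4.5 × 10^9.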

  14. Original Contig (query) against New Assembly after Contig Break

  15. Alignment Inconsistency

  16. Original Contig (query) against New Assembly after Contig Break

  17. Alignment Inconsistency

  18. The Gorilla Assemblies

      Metric                      Original   New
      Total number of contigs     464,875    285,139
      N50 contig size             11.7 kb    23.9 kb
      Largest contig (bp)         191,556    322,733
      Average contig size (bp)    6,085      9,928
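
Multiplying each contig count by its average contig size gives the implied total assembled length. Both assemblies work out to roughly 2.83 Gb, so the drop in contig count does not come with a loss of assembled sequence, at least at the precision of these rounded averages. A short Python check (the function name is hypothetical):

```python
# Contig counts and average contig sizes (bp) from the table.
assemblies = {
    "Original": (464_875, 6_085),
    "New": (285_139, 9_928),
}


def implied_total_bp(n_contigs, mean_bp):
    """Approximate assembled length; approximate because the mean is rounded."""
    return n_contigs * mean_bp


for name, (n_contigs, mean_bp) in assemblies.items():
    total = implied_total_bp(n_contigs, mean_bp)
    print(f"{name}: ~{total / 1e9:.2f} Gb in {n_contigs:,} contigs")

reduction = 1 - assemblies["New"][0] / assemblies["Original"][0]
print(f"Contig count reduced by {reduction:.0%}")  # Contig count reduced by 39%
```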

  19. Acknowledgements:
      • Hannes Ponstingl
      • Frank Liu, Nanjing University of Information Technology (NUIT)
      • Yan Li (NUIT)
      • Gorilla genome sequencing data
      • BGI: Panda and Tibetan Antelope assemblies
