Cross_genome: Assembly Scaffolding using Cross-species Synteny. Zemin Ning High Performance Assembly. Can synteny help? And How?. Scaffolding. Contig gap closure. RACA - Reference-assisted chromosome assembly. Lattice of Target - Reference. Scaffold 3. Scaffold 2. Scaffold 1. Reference.
Cross_genome: Assembly Scaffolding using Cross-species Synteny
Zemin Ning
High Performance Assembly
Can synteny help? And how?
• Scaffolding
• Contig gap closure
Lattice of Target - Reference
Q = scaff(i) * 2^32 + contig_loci(j)
[Figure labels: Scaffold 1; Scaffold 2; Scaffold 3; Reference; Target sequence]
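The key Q on this slide appears to pack a scaffold index and a contig locus into one integer, so that a single sort orders hits by scaffold first and by position second. The following is a minimal sketch of that idea, assuming the multiplier is 2^32 and that loci fit in 32 bits; the names `pack`, `unpack` and the sample `hits` are hypothetical and are not taken from Cross_genome itself.

```python
from __future__ import annotations

# Assumed multiplier: 2**32 leaves the low 32 bits for the contig locus.
SHIFT = 2 ** 32


def pack(scaffold_index: int, contig_locus: int) -> int:
    """Combine scaffold index and locus into one sortable key Q."""
    if scaffold_index < 0:
        raise ValueError("scaffold index must be non-negative")
    if not 0 <= contig_locus < SHIFT:
        raise ValueError("contig locus must fit in 32 bits")
    return scaffold_index * SHIFT + contig_locus


def unpack(q: int) -> tuple[int, int]:
    """Recover (scaffold_index, contig_locus) from a key Q."""
    return divmod(q, SHIFT)


# Hypothetical hits as (scaffold index, contig locus) pairs.
hits = [(2, 500), (1, 90_000), (1, 1_200)]

# Sorting the packed keys orders by scaffold, then by locus.
keys = sorted(pack(s, locus) for s, locus in hits)
ordered = [unpack(k) for k in keys]
print(ordered)
```

Sorting plain integers is cheaper than sorting pairs, which is one plausible reason for this kind of encoding, although the slide does not state the motivation.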
After Noise Cleaning
Gap_size = Y - X
[Figure labels: X; Y; Scaffold 1; Scaffold 2; Scaffold 3; Reference; Target sequence]
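The relation Gap_size = Y - X can be illustrated with a short sketch. It assumes that X is the reference coordinate where one scaffold's alignment ends and Y is the reference coordinate where the next scaffold's alignment begins; the slide does not spell this out, and the `Placement` type, the `gap_sizes` helper and the coordinates below are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    """A scaffold placed on the reference (hypothetical record layout)."""
    name: str
    ref_start: int  # reference coordinate where the alignment begins
    ref_end: int    # reference coordinate where the alignment ends


def gap_sizes(placements: List[Placement]) -> List[Tuple[str, str, int]]:
    """Return (left, right, Y - X) for each pair of neighbouring scaffolds."""
    ordered = sorted(placements, key=lambda p: p.ref_start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        x = left.ref_end     # assumed meaning of X
        y = right.ref_start  # assumed meaning of Y
        gaps.append((left.name, right.name, y - x))
    return gaps


# Made-up coordinates, for illustration only.
placements = [
    Placement("scaffold_2", 15_000, 30_000),
    Placement("scaffold_1", 0, 12_000),
    Placement("scaffold_3", 29_000, 50_000),
]

for left, right, gap in gap_sizes(placements):
    print(left, right, gap)
```

A negative value means the two placements overlap on the reference, which a real tool would probably treat as a case needing separate handling rather than as a gap.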
Cases Shouldn’t Join
[Figure labels, two panels: Reference; Target; Scaffold 1; Scaffold 2; Gap_size]
Scaffold N50 for Other Genome Assemblies

Species            Original   Cross_g   References
Panda              1.3Mb      25Mb      Dog, Human
Tibetan Antelope   2.6Mb      42Mb      Cattle, Dog, Human
Tasmanian Devil    1.8Mb      6.8Mb     Opossum

Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/
Improve gorilla assembly using human reference
[Flowchart labels: Gorilla Assembly; Human Reference; Contig gap size re-estimation; Combined Gorilla-Human Assembly; Read Alignment; Pair-wise/Multiple Read Clustering; Local Assembly; Final Gorilla Assembly]
Re-estimate Contig Gap Sizes from Reference
Local assembly based on clustered reads
[Figure labels: Gap size; Target sequence; Ref seq inserted; Reference sequence; New gap size]
Gorilla - Merge with other De novo Assemblies
* Fermi assembler: https://github.com/lh3/fermi/
+ MaSuRCA assembler: http://www.genome.umd.edu/masurca.html
Gs = (Kn - Ks)/D = 4.5 x 10^9
Kn = 125.4 x 10^9: total number of kmer words
Ks = 2.4 x 10^9: number of single copy kmer words
D = 27: depth of kmer occurrence
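The arithmetic on this slide can be checked with a few lines of code. This is only a sketch of the stated formula, reading Gs as the estimated genome size in bases (an assumption; the slide does not define the symbol). The function name `estimate_genome_size` is hypothetical.

```python
def estimate_genome_size(total_kmers: float,
                         single_copy_kmers: float,
                         depth: float) -> float:
    """Apply Gs = (Kn - Ks) / D from the slide."""
    if depth <= 0:
        raise ValueError("kmer depth must be positive")
    return (total_kmers - single_copy_kmers) / depth


# Values quoted on the slide.
kn = 125.4e9  # total number of kmer words
ks = 2.4e9    # number of single copy kmer words
d = 27        # depth of kmer occurrence

gs = estimate_genome_size(kn, ks, d)
print(f"Gs is roughly {gs / 1e9:.2f} x 10^9")
```

With the slide's inputs the result comes out near 4.56 x 10^9, in line with the 4.5 x 10^9 quoted.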
Original Contig (query) against New Assembly after Contig Break
The Gorilla Assemblies

                           Original   New
Total number of contigs:   464,875    285,139
N50 contig size:           11.7kb     23.9kb
Largest contig:            191,556    322,733
Averaged contig size:      6085       9928
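For readers unfamiliar with the statistics in this comparison, the sketch below computes them from a list of contig lengths using the usual definition of N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The helper names and the toy lengths are hypothetical and are not the gorilla data.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover half the total."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def contig_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarise an assembly with the four figures used in the table."""
    return {
        "contigs": len(lengths),
        "n50": n50(lengths),
        "largest": max(lengths),
        "average": sum(lengths) / len(lengths),
    }


# Toy contig lengths, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
stats = contig_stats(toy_lengths)
print(stats)
```

In the toy example the total is 290, so the running sum passes the halfway mark of 145 at the second contig and N50 is 70.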
Acknowledgements:
• Hannes Ponstingl
• Frank Liu – Nanjing University of Information Technology (NUIT)
• Yan Li – (NUIT)
• Gorilla genome sequencing data
• BGI – Panda and Tibetan Antelope assemblies