Whole Genome Shotgun Assembly
Mike Ge, Enping Hong, Ariana Minot, Owen Astrachan
Department of Computer Science, Duke University

Computer Science and WGSA
Computer science is essential to the workings of WGSA in several ways:
• Efficient and powerful algorithms. Algorithms greatly reduce the time required to process the vast amounts of information generated and needed by WGSA, allowing rapid comparison of data, location of reads, contig assembly, and intermarker assembly against known marker sites in the genome.
• Databases. Computer science also allows for the creation of databases in which to store and organize the information generated by WGSA. These databases support the algorithms in their tasks and let the assembled data be stored and used in other studies.
• Data presentation. The complex information produced by WGSA has to be presented and interpreted in some way. Computer algorithms help to organize and present the data generated, allowing scientists to interpret that data and draw conclusions from it far more effectively.

Introduction
The Whole Genome Shotgun Assembly (WGSA) method of genomic sequencing was a sequencing technique developed and applied by Celera Genomics to sequence the human genome. Instead of cloning and sequencing individual segments of DNA, as conventional methods do, shotgun sequencing fragments the entire genome into smaller pieces and searches those pieces for overlaps. If an overlap is found and confirmed to be correct, the fragments can then be pieced together to form a longer sequence. By looking at many of these small sequences at the same time (the approach from which the method gets its name), genome sequencing is greatly accelerated, with what is hoped to be a minimal effect on accuracy.

Methodology
• The entire genome of an organism is randomly broken up into millions of small pieces of DNA (usually 2, 10, or 50 kb fragments).
• These segments are then subcloned into appropriate plasmid vectors in order to create highly redundant sequence coverage across the whole genome.
• The segments are sequenced from both ends by the chain termination method in order to obtain reads.
• The reads are then assembled into sequence contigs by computational methods that match the nucleotides of one read to the nucleotides of another.
• The contigs are then linked together into scaffolds by connecting mate pairs and matching up the reads at the ends of contigs.
• The individual scaffolds are then aligned to form the genome map of the source organism with the help of a library or database of pre-defined genetic landmarks, such as SNPs, genetic markers, and genes, that are found in genetic databases.

History of WGSA
First developed by Eugene Myers in 1999, WGSA was a revolutionary method introduced by Celera Genomics (founded by Dr. Craig Venter in 1998) to compete with the government-funded Human Genome Project in sequencing the human genome. Far faster than the BAC cloning method employed by NIH (but potentially less accurate), the WGSA method was crucial to Celera’s success in the sequencing effort. WGSA depends on powerful computer hardware and algorithms to piece the genome together, and its techniques and algorithms have been continuously improved since its invention.

Benefits
• Able to sequence an entire genome at a much faster rate than conventional methods.
• Can initiate sequencing without having to create a clone-based physical map of the genome first.
• Instead of generating thousands of sequence reads per clone, WGS can generate millions of sequence reads, providing much more material for assembly.
• Takes advantage of computer algorithms to help assemble the fragments of DNA.
• Cost per nucleotide sequenced is lower than that of conventional methods.

Literature cited
• Wikipedia: Shotgun Sequencing.
Retrieved from http://en.wikipedia.org/wiki/Whole_genome_shotgun on November 8, 2006.
• Green, Eric D. (2001). Strategies for the Systematic Sequencing of Complex Genomes. Nature Reviews, 2, pp. 537-538.
• Shreeve, J. (2004). The Genome War: How Craig Venter tried to capture the Code of Life and change the world. New York, NY: Alfred A. Knopf.

For further information
Please refer to www.duke.edu/~eh39/ or contact any of the following:
mike.ge@duke.edu
enping.hong@duke.edu
ariana.minot@duke.edu
ola@cs.duke.edu
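
Illustrative sketch: assembling reads into contigs
The Methodology step that assembles reads into contigs by matching the nucleotides of one read to those of another can be illustrated with a small greedy merge. This is a minimal sketch, not the assembler used by Celera: the function names (overlap_length, greedy_assemble) and the min_overlap parameter are hypothetical, the reads are assumed to be error-free and from a single strand, and repeats and mate-pair information are ignored.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Hypothetical toy assembler: assumes error-free, same-strand reads.
    Returns the list of contigs left when no overlaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge
        # Join the two sequences, keeping the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    reads = ["ATGCGT", "CGTACC", "ACCGGA"]
    print(greedy_assemble(reads))
```

Each pass compares every pair of sequences, so the sketch is quadratic in the number of reads and only practical for toy inputs. Assembling millions of reads requires the far more efficient algorithms and the handling of sequencing errors and repeats that the poster attributes to computer science.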