Software Development For High-Throughput DNA Sequencing

Yaron Butterfield, Ran Guin, Ursula Skalska, Duane Smailus, Angelique Schnerch, Kevin Teague, Jacquie Schein, Marco Marra, Steven Jones, and the Genome Sciences Centre

Abstract

We have established a bioinformatics pipeline to handle the large amount of DNA sequence data generated in our laboratory. The Genome Sciences Centre generates an average of 1.75 Mb of raw sequence data per day. To facilitate the laboratory processes, we have created a laboratory information system in which data is stored in a central MySQL database. Communication with the relational database is accomplished primarily using Perl, which drives a comprehensive web-based interface as well as a number of automated scripts.

As a participating group in the Mammalian Gene Collection (MGC) initiative (http://mgc.nci.nih.gov/), one of our major projects has involved the sequencing of a large number of full-length cDNA clones. We have developed an efficient, high-throughput method for the accurate DNA sequencing of entire cDNA clones that relies heavily on bioinformatics. Sequencing is accomplished through the insertion of the Mu transposon into pools of cDNAs, followed by sequencing reactions primed with Mu-specific sequencing primers. We have designed software that uses BLAT to detect chimeric cDNA clones early in the sequencing pipeline. Algorithms allow for proportional representation of each cDNA clone in the pool. Clone end reads, transposon reads, and primer walk sequences are assembled using Phred, Phrap, and Consed to yield the full-length cDNA sequence. Additionally, sequence editing and other finishing activities are performed as required to resolve sequence ambiguities. Sequence assembly information is placed into a MySQL database and visualized through the web. Our bioinformatic approach allows for the automated identification of cDNA clones from the pool, recognition of completed clones, quality assessment, restriction fragment detection, and polyA site detection.

We are currently in our second year of the MGC project and have used this method to generate more than 8.7 Mb of finished sequence from 4 650 candidate full-length cDNAs. We have also been able to analyse 22 785 sequenced Mu transposon insertion events into the cDNA clones. This analysis revealed a weak sequence preference for Mu insertion, though the insertion pattern deviates only slightly from random and does not adversely affect the efficacy of our method.

Full Length cDNA Sequencing

Overview of cDNA sequencing process

The project relies heavily on the LIMS to track clone processing and sequencing through the pipeline. All sequence data is then assembled, and the results are stored in a separate cDNA database. This database stores the results of all Phrap assemblies as contained in the ACE file generated by Phrap, along with the results of extra calculations on those assemblies. Code development is primarily done in Perl, and various packages use the two databases to obtain information on clones for display on web pages and for use by analysis and processing scripts. Where appropriate, BioPerl is also used to work with clone sequence data. Consed is used where editing and finishing are required.
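The abstract lists polyA site detection among the automated analysis steps but does not describe the algorithm. The following is a minimal sketch in Python of one simple way such a check could work; it is not the authors' Perl implementation. It scans a finished sequence from its 3' end and reports where a terminal run of A bases begins. The function name find_polya_start and the parameters min_len and max_mismatches are hypothetical, chosen only for this illustration.

```python
from typing import Optional


def find_polya_start(seq: str, min_len: int = 10,
                     max_mismatches: int = 1) -> Optional[int]:
    """Return the 0-based index where a 3' terminal polyA run starts.

    Hypothetical helper for illustration only. The scan walks backwards
    from the 3' end, tolerating up to `max_mismatches` non-A bases, and
    returns None if the resulting tail is shorter than `min_len`.
    """
    seq = seq.upper()
    mismatches = 0
    tail_start = len(seq)  # leftmost 'A' seen so far within the tolerated run

    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            tail_start = i
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break  # too many non-A bases: the tail ends here

    tail_len = len(seq) - tail_start
    return tail_start if tail_len >= min_len else None


if __name__ == "__main__":
    clone = "ATGCGTACGT" + "A" * 12
    print(find_polya_start(clone))
```

A read taken from the opposite strand would show the same feature as a leading run of T bases, so a real pipeline would likely check both orientations. The thresholds here are assumptions and would need tuning against real clone data.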
Laboratory Information Management System (LIMS)

The GSC LIMS has been specifically designed for use in a genome sequencing laboratory. It has been developed to store detailed information related to standard sample preparation procedures as well as final sequence data.
Also, by maintaining this detailed information in a highly structured format, the LIMS can perform a number of automatic procedures, such as stock monitoring, error checking, and diagnostic analysis, and can generate real-time messages for users or regular email notifications to administrators. This helps to ensure the integrity of recorded data and may prevent time-consuming errors by flagging them or identifying possible problems.

The database is implemented in MySQL and comprises about 75 tables. The largest is the clone_sequence table, with close to 1 million records and a size of 4 GB (roughly 4 KB per read × 1 million reads). The remainder of the database is 60 MB.

Database Schema

Web front end and barcoding

An interface allows users to interact with the database via a barcode scanner during regular lab processes, and a suite of report-generating and data visualization tools gives lab administrators the means to quickly and effectively evaluate results and monitor status on a regular basis.
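The size quoted for the clone_sequence table in the LIMS description can be checked with simple arithmetic. The short Python sketch below reproduces the estimate and shows how much of the database that one table accounts for. It assumes decimal units (1 KB = 1 000 bytes, 1 MB = 10^6 bytes), which the poster does not specify, and the variable names are illustrative only.

```python
# Figures taken from the text; decimal units are an assumption.
BYTES_PER_READ = 4_000          # "roughly 4k/read"
READS = 1_000_000               # "close to 1 million records"
REST_OF_DATABASE = 60 * 10**6   # "the remainder of the database is 60 MB"

clone_sequence_bytes = BYTES_PER_READ * READS
total_bytes = clone_sequence_bytes + REST_OF_DATABASE
share = clone_sequence_bytes / total_bytes

print(f"clone_sequence table: {clone_sequence_bytes / 1e9:.1f} GB")
print(f"share of whole database: {share:.1%}")
```

Under these assumptions the sequence table alone makes up nearly all of the stored data, which suggests that storage planning for a LIMS like this one is driven almost entirely by read volume.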