Genome Biology for Programmers Lecture Series: Illumina Sequencing Chris Daum JGI Illumina Group Lead April 1, 2011
Outline • Workflow Overview • Process Science • Sample Prep & qPCR quantification • Cluster Generation • Sequencing • Sequencer instruments: GA & HiSeq • Illumina Developments • Illumina quality & continuous improvement
Illumina Workflow • Sample Preparation • Sample Quantification • Clustering • Sequencing • Analysis
Sample Preparation Library Preparation – Main Goals: • Prepares sample nucleic acids for sequencing • Many library types and creation procedures exist • However, all preparation results in the same general template structure: • Double-stranded DNA flanked by two different adapters • Variables include: • Sequencing application & starting material (e.g. gDNA, mRNA, Mate Pair, Active Chromatin, ChIP-Seq) • Insert size • Adapter type • Index for multiplexing
Example Sample Prep Workflow: TruSeq Paired-end Library [Figure panels: RNA, DNA]
Library Quantification - qPCR • Real-time qPCR allows accurate quantification of DNA templates: • qPCR is based on the detection of a fluorescent reporter molecule whose signal increases as PCR product accumulates with each cycle of amplification • By using primers specific to the Illumina universal adapters in a qPCR reaction containing library template, only cluster-forming templates will be amplified and quantified
Library Quantification - qPCR • Cq (Cycle of Quantification): the cycle at which an amplicon's fluorescence crosses the threshold • Plot a standard curve using controls and determine the concentration of the library • Phases of qPCR: Geometric phase – amplicons double every cycle; greatest precision & accuracy for quantitation • [Figure: standard curve, cycle threshold (Cq) vs. log initial concentration] • Take home: qPCR mimics what happens on the surface of the flowcell during cluster generation and allows optimal loading concentrations to be determined.
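For the programmers in the audience, the standard-curve step can be sketched in a few lines of Python. This is a minimal illustration, not the JGI pipeline: the function names, the control concentrations, and the Cq values are all made up for the example. It fits Cq against log10 of the control concentrations, then inverts the fit to estimate a library concentration from its Cq.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate a concentration from a Cq."""
    return 10 ** ((cq - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical control dilution series (pM) and measured Cq values.
standards_pm = [20.0, 2.0, 0.2, 0.02]
standards_cq = [10.0, 13.3, 16.6, 19.9]

slope, intercept = fit_standard_curve(standards_pm, standards_cq)
library_pm = concentration_from_cq(15.0, slope, intercept)
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.2f}, "
      f"library ~ {library_pm:.2f} pM")
```

A slope near -3.32 corresponds to perfect doubling each cycle, which is why the geometric phase is the one used for quantitation.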
Cluster Generation • Process occurs on cBot instrument: • Aspirates DNA samples into flow cell • Automates the formation of amplified clonal clusters from the DNA single molecules • 1000x amplification generates clusters • Hybridizes sequencing primer(s)
Illumina cBot • Cluster Generation 2.0 • Automated system significantly reduces workload for generation of flowcells • Compact design saves lab space • Reagent cartridge reduces prep time
Cluster Generation Prep • Prepare reagents and denature & dilute library: • The goal is to reach the cluster density that maximizes yield (bp); this is achieved via optimized loading concentrations as determined by qPCR • Considerations: • Too low density: Fewer clusters, less sequence generated • Too high density: Overlapping clusters, removed by analysis filters, poor quality
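The dilution step is ordinary C1·V1 = C2·V2 arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the stock concentration, target loading concentration, and volume are invented numbers, not recommended values.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) from C1 * V1 = C2 * V2.

    Concentrations must share one unit and volumes another.
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical numbers: qPCR says the library is 2000 pM, and we want
# 1000 uL at an (assumed) loading concentration of 8 pM.
stock_ul, diluent_ul = dilution_volumes(2000.0, 8.0, 1000.0)
print(f"{stock_ul:.1f} uL library + {diluent_ul:.1f} uL diluent")
```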
Cluster Generation Chemistry • Cluster generation Chemistry: • Hybridization • Amplification • Linearization • Blocking • Primer hybridization
Cluster Generation Chemistry • Hybridize Sample fragments & extend:
Cluster Generation Chemistry • Bridge Amplification:
Cluster Generation Chemistry • Linearization, Blocking & Sequencing Primer Hybridization:
Sequencing • Main Goals: • Translate the chemical information of the nucleotides into fluorescence information which can be captured optically • The optical information is then transformed into text, which can be searched, aligned, or otherwise mined for biologically relevant data
Sequencing by Synthesis • Clustered Flowcell is loaded on Illumina sequencer:
Sequencing Chemistry: First Cycle Base Incorporation • To initiate the first sequencing cycle, add all 4 fluorescently labeled reversible terminators and DNA polymerase enzyme to the flowcell. • The complementary nucleotide will be added to the first position of each cluster. • A laser is then used to excite the attached fluorophore.
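To make the "fluorescence to text" idea concrete, here is a deliberately simplified base caller. It assumes each cycle yields one intensity per base for a cluster and calls the brightest channel. The channel order and the intensity values are hypothetical, and production base callers also correct for effects this toy ignores.

```python
BASES = "ACGT"  # assumed channel order for this sketch


def call_bases(intensities):
    """Call one base per cycle by picking the brightest of four channels.

    intensities: list of (A, C, G, T) intensity tuples, one per cycle.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for three cycles of a single cluster.
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.1),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(cluster_signal))
```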
Sequencing Read 2 • Resynthesis of second strand for Read 2 occurs on sequencer without removing flowcell:
Index for Multiplex Sequencing • Sample multiplexing involves 3 reads: • A: Sample Read 1 is sequenced • B: Read 1 product is removed and the Index Read is sequenced • C: Template strand is used to generate the complementary strand, and sample Read 2 is sequenced • Analysis software identifies the index sequence from each cluster so that sample reads 1 & 2 can be assigned to a single sample
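The assignment step at the end can be sketched as a small demultiplexer. This is not Illumina's analysis software: the index sequences and sample names are made up, and the one-mismatch tolerance is an assumption chosen for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, sample_by_index, max_mismatches=1):
    """Return the sample whose index matches, or None if none or ambiguous."""
    matches = [
        sample
        for index, sample in sample_by_index.items()
        if len(index) == len(index_read)
        and hamming(index, index_read) <= max_mismatches
    ]
    return matches[0] if len(matches) == 1 else None


def demultiplex(clusters, sample_by_index, max_mismatches=1):
    """Group (read1, read2) pairs by sample using each cluster's index read."""
    bins = {}
    for read1, index_read, read2 in clusters:
        sample = assign_sample(index_read, sample_by_index, max_mismatches)
        bins.setdefault(sample or "undetermined", []).append((read1, read2))
    return bins


# Hypothetical indexes and clusters: (Read 1, Index Read, Read 2).
sample_by_index = {"ATCACG": "sampleA", "CGATGT": "sampleB"}
clusters = [
    ("AAAA", "ATCACG", "TTTT"),
    ("CCCC", "CGATGA", "GGGG"),  # one mismatch from sampleB's index
    ("ACGT", "GGGGGG", "TGCA"),  # matches neither index
]
for sample, pairs in demultiplex(clusters, sample_by_index).items():
    print(sample, pairs)
```

Clusters whose index read matches no sample, or more than one, land in an "undetermined" bin instead of being guessed.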
Illumina HiSeq2000 Sequencer Nifty Lights
HiSeq2000 Fluidics • Fluidics were the Achilles heel of the GA, and the HiSeq has 2X the fluidics
HiSeq: Temperature control • 3 mechanisms: • Heat extraction via liquid coolant • Flow cell temperature control via Peltier • Maintain reagent temperature via cooled compartment • Reagent Chiller: • All reagents cooled at 4 °C • Condensation pump runs every 4 min for 30 sec • Flow cell sits on Peltier blocks and is water cooled (heat extraction from underneath)
Cost & Throughput Comparison • Notes: • Throughput metrics are averages from runs performed in FY11 for each of the run types to date • Italicized HiSeq Bases & Reads throughput metrics are estimates based on the 2x100 run type, since we have limited data on other run types • Only vendor reagent costs are shown here; library creation and overhead costs are not included, but are roughly equal and mostly independent of run type • Cost per million reads goes up with the longer run types, but the read length increases as well, which makes each read more valuable for some assembly applications • The HiSeq 2x150 run type is not yet supported, and the current HiSeq chemistry has worse quality beyond 80-100 bases than the GA • The HiSeq platform is still new and we are experiencing a higher number of hardware failures than on the GA; Illumina does replace reagents for failed runs, and we rerun failed flowcells immediately whenever possible.
HiSeq Development Coming in early Summer:
Providing Quality Sequence • Incident Reporting & Resolution (JIRA) • Troubleshooting Procedures • Throughput Goals & Metrics • Continuous Improvement - Lean Six Sigma • Failure Tracking & SPC Charts; RQC • Instrument Status & real-time run monitoring • Instrument Utilization & Efficiency
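As a rough idea of what an SPC chart computes, the sketch below derives a center line and 3-sigma control limits from baseline runs and flags later runs that fall outside them. It is a simplified illustration, not JGI's tracking system: the metric values are invented, and real charts often estimate sigma differently (for example from moving ranges).

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return (lower, center, upper) using mean +/- 3 sample std devs."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - 3 * spread, center, center + 3 * spread


def out_of_control(values, limits):
    """Indexes of values falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical per-run metric (any unit) from a stable baseline period.
baseline_runs = [10, 11, 9, 10, 10, 11, 9, 10]
limits = control_limits(baseline_runs)

new_runs = [10, 13, 7, 11]
print("limits:", limits)
print("flagged run indexes:", out_of_control(new_runs, limits))
```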
LLNL – Six Sigma Training • Tools and methodologies to: • Improve work quality • Improve process efficiencies & eliminate waste • Improve employee and customer satisfaction • Lean Six Sigma is about: • Eliminating waste and improving process flow • Focusing on reducing variation and improving process yield by following a problem-solving approach using statistical tools
What is Six Sigma? • A Six Sigma process is literally one that’s statistically 99.99966% successful. • This is not always cost effective to achieve, so as a methodology it’s about gaining control of a process and implementing improvements.
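The 99.99966% figure corresponds to 3.4 defects per million opportunities. A short sketch of that arithmetic follows. The function names are hypothetical, and the 1.5-sigma shift is the usual Six Sigma convention, stated here as an assumption rather than something taken from the slide.

```python
from statistics import NormalDist


def defects_per_million(yield_fraction):
    """Defects per million opportunities for a given success fraction."""
    return (1.0 - yield_fraction) * 1_000_000


def sigma_level(yield_fraction, shift=1.5):
    """Sigma level for a yield, using the conventional 1.5-sigma shift."""
    return NormalDist().inv_cdf(yield_fraction) + shift


six_sigma_yield = 0.9999966  # 99.99966% successful
print(f"DPMO: {defects_per_million(six_sigma_yield):.1f}")
print(f"sigma level: {sigma_level(six_sigma_yield):.2f}")
```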
What is Six Sigma? • Six Sigma is a data-driven problem-solving approach where process inputs (Xs) are identified and optimized to impact the output (Y) • The output is a function of the inputs and process: Y = f(X) • Y: Output • f: function • X: variables that must be controlled to consistently predict Y