
Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops. www.bioinformatics.ca. Module 1 Introduction to next-gen sequencing. FRANCIS OUELLETTE Informatics on High Throughput Sequencing Data July 2009. Overview. “next-gen” or “next-next-gen”: why are we here? What kinds of sequencing are we doing?



Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  2. Module 1: Introduction to next-gen sequencing. FRANCIS OUELLETTE. Informatics on High Throughput Sequencing Data, July 2009

  3. Overview • “next-gen” or “next-next-gen”: why are we here? • What kinds of sequencing are we doing? • How does DNA sequencing work? • Trying to stay away from vendor-specific challenges, but can we really? • Where next?

  4. History of DNA Sequencing (adapted from Eric Green, NIH; adapted from Messing & Llaca, PNAS (1998)). Timeline figure:
     • 1870: Miescher discovers DNA
     • 1940: Avery proposes DNA as ‘Genetic Material’
     • 1953: Watson & Crick, double helix structure of DNA
     • 1965: Holley sequences yeast tRNA-Ala
     • 1970: Wu sequences λ cohesive end DNA
     • 1977: Sanger, dideoxy chain termination; Gilbert, chemical degradation
     • 1980: Messing, M13 cloning
     • 1986: Hood et al., partial automation
     • 1990: Cycle sequencing; improved sequencing enzymes; improved fluorescent detection schemes
     • 2002: (no event labelled)
     • 2009: Next generation sequencing; improved enzymes and chemistry; new image processing
     Efficiency axis (bp/person/year), tick labels in increasing order: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000.

  5. Why are we sequencing? • Before Next-generation: • Reductionist perspective on life • DNA, RNA, (proteins), (populations), sampling, averages, consensus • Problems: sampling, averages, consensus. • After Next-generation: • We are still reductionist, but better • Genome sequence and structure • Less cloning/PCR • Single molecules (for some)

  6. Basics of the “old” technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that are different by 1 nucleotide. • Separate mixture on some matrix. • Detect fluorochrome by laser. • Interpret peaks as string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a “whole”.

  7. Differences between the various platforms: • Nanotechnology used. • Resolution of the image analysis. • Chemistry and enzymology. • Signal to noise detection in the software • Software/images/file size/pipeline • Cost $$$

  8. Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk Next Generation DNA Sequencing Technologies

  9. Next-gen sequencers (from John McPherson, OICR). Plot of bases per machine run (1 Mb to 100 Gb) against read length (10 bp to 1,000 bp): • AB/SOLiDv3, Illumina/GAII short-read sequencers (10+ Gb in 50-100 bp reads, >100M reads, 4-8 days) • 454 GS FLX pyrosequencer (100-500 Mb in 100-400 bp reads, 0.5-1M reads, 5-10 hours) • ABI capillary sequencer (0.04-0.08 Mb in 450-800 bp reads, 96 reads, 1-3 hours)
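The ranges quoted on this slide can be sanity-checked with simple arithmetic, since the yield of a run is roughly the number of reads times the read length. The sketch below uses only the figures from the slide; the dictionary layout and the function name are illustrative and not part of the lecture, and ">100M reads" is treated as exactly 100M, which is an assumption.

```python
# Rough yield check: bases per run ~= number of reads x read length.
# All numbers are taken from the slide; names and layout are illustrative only.
PLATFORMS = {
    "ABI capillary": {"reads": (96, 96), "read_len_bp": (450, 800)},
    "454 GS FLX": {"reads": (500_000, 1_000_000), "read_len_bp": (100, 400)},
    # The slide says ">100M reads"; 100M is used here as a lower bound (assumption).
    "SOLiDv3 / GAII": {"reads": (100_000_000, 100_000_000), "read_len_bp": (50, 100)},
}


def yield_range_bp(platform):
    """Return (low, high) estimated bases per run for one platform."""
    spec = PLATFORMS[platform]
    low = spec["reads"][0] * spec["read_len_bp"][0]
    high = spec["reads"][1] * spec["read_len_bp"][1]
    return low, high


for name in PLATFORMS:
    low, high = yield_range_bp(name)
    print(f"{name}: {low / 1e6:,.2f} to {high / 1e6:,.2f} Mb per run")
```

For the capillary sequencer this gives 43,200 to 76,800 bases, which matches the 0.04-0.08 Mb on the slide. The estimates for the other two platforms land in the same order of magnitude as the quoted yields, which is about as much as this kind of back-of-envelope check can show.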

  10. 2009/10 Promises? (from John McPherson, OICR). Same axes: bases per machine run (1 Mb to 100 Gb) against read length (10 bp to 1,000 bp): • AB SOLiDv3: 120 Gb, 100 bp reads • Illumina GAII: 90 Gb, 175 bp reads • 454 GS FLX Titanium: 0.4-0.6 Gb, 100-400 bp reads • ABI capillary sequencer: 0.04-0.08 Mb, 450-800 bp reads

  11. http://tinyurl.com/nk9rkm

  12. Solexa-based Whole Genome Sequencing Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk

  13. Illumina (Solexa)

  14. Illumina (Solexa)

  15. Illumina (Solexa)

  16. From Debbie Nickerson, Department of Genome Sciences, University of Washington, http://tinyurl.com/6zbzh4

  17. AB SOLiD: file management

  18. SOLiD color space

  19. SOLiD color space

  20. SOLiD color space

  21. SOLiD color space

  22. SOLiD color space

  23. SOLiD color space

  24. AB SOLiD

  25. SOLiD color space

  26. SOLiD color space

  27. SOLiD color space

  28. SOLiD color space

  29. http://solidsoftwaretools.com/gf/project/dh10bfrag/

  30. Sample AB data Lab
      >443_1087_001_F3
      T12111121313231331100020021211112211
      >443_1087_002_F3
      T01121100201303232033213132212320123
      >443_1087_003_F3
      T21333200110101330330011101121132111
      >443_1087_004_F3
      T21322103331203331001002121021323111
      >443_1088_005_F3
      T32311301011311231133321301012223110
      >443_1088_006_F3
      T13211113031122103020002220012122101
      >443_1088_007_F3
      T21112301301221022023212000311310313
      >443_1088_008_F3
      T12133033210200001231010301011012031
      >443_1088_009_F3
      T23330012121212103111123012012320300
      >443_1088_010_F3
      T10213330331021322130123311011312110
      • Get sequence assignment from instructor • Work with people at your table. • Use info from lecture notes (Panel E) • BLAST sequence at NCBI • What is it?
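For checking a hand translation of the reads above, here is a minimal sketch of SOLiD color-space decoding. It relies on the standard di-base encoding, where each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), and it assumes the leading letter of each record is the last base of the primer rather than part of the read. The function names are illustrative and do not come from any SOLiD software package.

```python
# SOLiD di-base ("color space") decoding sketch.
# Each color encodes the transition between two adjacent bases. With the
# 2-bit codes A=0, C=1, G=2, T=3, the color is the XOR of the two codes.
BASES = "ACGT"


def decode_colors(csfasta_read):
    """Translate e.g. 'T0123' into bases, dropping the leading primer base.

    Only handles colors 0-3; a missing call such as '.' raises ValueError.
    """
    previous = BASES.index(csfasta_read[0])
    decoded = []
    for color in csfasta_read[1:]:
        previous ^= int(color)  # next base code = previous code XOR color
        decoded.append(BASES[previous])
    return "".join(decoded)


def encode_colors(primer_base, sequence):
    """Inverse operation: turn a base sequence back into a color-space read."""
    previous = BASES.index(primer_base)
    colors = []
    for base in sequence:
        current = BASES.index(base)
        colors.append(str(previous ^ current))
        previous = current
    return primer_base + "".join(colors)


print(decode_colors("T12111121313231331100020021211112211"))
```

Because each base depends on all the colors before it, a single miscalled color changes every base after it in this naive translation. That is one reason color-space reads are usually aligned in color space rather than translated first.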

  31. Module 1 lab

  32. Roche / 454 : GS FLX • Also known as “pyrosequencing” • http://www.454.com/products-solutions/system-features.asp • 500 million bp/run • 10 hr run • 400-500 bp/read & > 1 M reads

  33. Roche / 454 : GS FLX • Made for de novo sequencing. • Too expensive for resequencing. • For example, this platform will be used a lot by laboratories sequencing new bacterial genomes. • The Baylor genome center was involved in the sea urchin, bee, and platypus genomes; they have a number of 454 machines.

  34. Roche / 454 : GS FLX

  35. Roche / 454 : GS FLX

  36. Roche / 454 : GS FLX

  37. It’s more complicated! • Get files with quality scores • Get files with mismatches • Need to align them to a reference genome • Multiple tools do this today … and there will be more later. • What do you do? Do it all!
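To make "align to a reference and count mismatches" concrete, here is a deliberately naive sketch. It slides a read along a reference without gaps and keeps the offset with the fewest mismatches. It is not how production aligners work (they use indexes, allow gaps, and take quality scores into account), and the function names and toy sequences are illustrative assumptions.

```python
# Brute-force ungapped alignment: try every offset and count mismatches.
# Illustration only; real aligners index the reference and use quality scores.


def count_mismatches(read, window):
    """Number of positions where the read and a same-length window differ."""
    return sum(1 for a, b in zip(read, window) if a != b)


def best_ungapped_hit(read, reference):
    """Return (offset, mismatches) of the best placement, or None if the
    read is longer than the reference. Ties go to the leftmost offset."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = count_mismatches(read, window)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


# Toy example: the read matches at offset 4 with one mismatch (last base).
print(best_ungapped_hit("CGTT", "AAAACGTAAAA"))
```

The cost grows with reference length times read length for every read, which is why this approach does not scale to hundreds of millions of short reads against a whole genome.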

  38. Pacific Biosciences (PacBio), July 2008

  39. Pacific Biosciences (PacBio)

  40. Things to keep in mind • Everyone is learning: if you don’t know, ask. They probably won’t know either, and you can figure it out together! • The technology is changing: this workshop next year will be totally different! • We can only do so much in two days: you will need to find things, find people who can help you, and you will need to teach your friends!

  41. Other factors • Changing technology • New and disappearing companies? • Changing price structure • Cost of machine • Cost of operation (reagents/people) • Service from the company • 1 machine vs (2 or 3 machines) vs 40 machines. • Changing software and processing
