High Throughput Profiling of Prokaryotic Species • Joachim De Schrijver • joachim.deschrijver@ugent.be • Vakgroep Wiskundige Modellering, Statistiek en Bio-informatica
Overview • Sequencing technology • Roche/454 GS-FLX (‘454’) • Illumina • Prokaryotic profiling • De novo genome sequencing • Metagenomics • SNP profiling • Species quantification • Viral profiling • De novo genome sequencing
Sequencing technology Classic chain-terminator sequencing Dye chain-terminator sequencing Next-generation sequencing
Sequencing technology • Next-gen sequencing principle • Massive parallel • Add ACTGs • Catch a signal
Sequencing technology • Roche/454 GS-FLX+ (‘454’) • Pyrosequencing • Problems with homopolymers (e.g. AAAAAA) • Long-read sequencing: 500-1000 bp • Variable sequencing length • 1 million reads/run • 1 Gb/run • Sequencing speed: ~1 day/run • Next-next generation: IonTorrent PGM/Proton
Sequencing technology • Illumina • Sequencing by synthesis • Short-read sequencing: 36, 72, …, 150 bp • Fixed sequencing length • 1 billion reads/run • 100 Gb/run (= 33 x human genome!) • Sequencing speed: 3-10 days/run, depending on read length • SOLiD • Short-read sequencing (similar to Illumina)
Sequencing technology • 454 • Illumina
Sequencing technology • Price per run: $10,000/run • Price per machine: $200,000-500,000 • Supporting IT hardware • Peripheral devices such as fragmentation instrument, PCR equipment … • Negotiating power… • Use service centers! • Nxtgnt (BE), GATC (EU), Baseclear (NL), BGI … • No overhead cost, no maintenance etc. • Cheaper
Sequencing technology • Next-generation sequencing has become 2nd generation sequencing • Next-next-generation sequencing is almost there: 3rd generation sequencing • Helicos: True Single Molecule Sequencing • IonTorrent/Life: Cheap and fast • Nanopore: Unlimited read size • …
Sequencing technology • Evolution of sequencing technology goes hand in hand with evolution of • IT infrastructure/hardware • Analysis software • Hardware • 1 Illumina run ~ 100 Gb text file ~ 5-million-page book • Processing power and storage are an issue! • Software • Mapping to a human genome: ‘couple of hours’
Prokaryotic profiling • Prokaryotic genomics 101 • Prokaryotes = bacteria + archaea • Prokaryotic genomes • Large circular genome (0.5 – 10 Mb) ‘chromosome’ • Small plasmids (1-1000 kb) (virulence factors, antibiotic resistance …) • (Almost) no introns • Easy ORF annotation
Prokaryotic profiling: de novo genome sequencing • 1953: Watson/Crick discover the DNA double helix • 1977: First complete genome: bacteriophage φX174 • 1995: First genome of a free-living organism: H. influenzae • 2001: First draft of the human genome • 2006: >200 complete bacterial genomes • 2012: An uncountable number of bacterial genomes have been sequenced using next-gen sequencing
Prokaryotic profiling: de novo genome sequencing • Complete bacterial genomes used to be • Expensive • Difficult to obtain • ‘Nature’ or ‘Science’ work • Remained complex until the invention of next-generation sequencing
Prokaryotic profiling: de novo genome sequencing • Using next-generation sequencing, de novo sequencing has become • Relatively easy • Relatively cheap • Routine research • Already >10 complete bacterial genomes published in 2012 • More than just an assembly!
Prokaryotic profiling: de novo genome sequencing • Practical • Get some DNA from an isolated species of interest • Sequence: long or short reads (1-10 days) • Obtain your sequences • Assemble (1h) • Pure de novo assembly • Guided assembly • Annotate the genome (days-weeks)
Prokaryotic profiling: de novo genome sequencing • Assembly: Multiple ‘short’ reads → 1 long sequence • Existing software • Velvet • SSAKE • Newbler • … Source: Nature 2009, MacLean et al.
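To make the "many short reads, one long sequence" idea concrete, here is a toy greedy overlap merger. It is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats, no read fully contained in another) and is not how Velvet, SSAKE, or Newbler are implemented. The function names `overlap` and `greedy_assemble` and the example reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads sampled from the sequence ATGGCGTACCTGA
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real assemblers also have to cope with sequencing errors, reverse complements, and repeats, which is why dedicated software is used in practice.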
Prokaryotic profiling: de novo genome sequencing • Relatively cheap • Sequencing cost: depending on coverage • Illumina, 30x, 5 Mb genome: $10-$100 • 454, 30x, 5 Mb genome: $1000-$5000 • Equipment • IT infrastructure, sequencing equipment, people … • Relatively easy • Need for IT support • No out-of-the-box standard solution for everything • Several different software packages for assembly
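The order of magnitude of these prices can be checked with a back-of-the-envelope calculation. The sketch below assumes a typical bacterial genome of about 5 Mb (within the 0.5-10 Mb range given earlier), the run price of $10,000 and the run yields quoted on the sequencing-technology slides (100 Gb for Illumina, 1 Gb for 454), and cost that scales linearly with the fraction of a run used. The helper name `sequencing_cost` is made up for illustration.

```python
def sequencing_cost(genome_mb: float, coverage: float,
                    run_price_usd: float, run_yield_gb: float) -> float:
    """Cost of the raw bases needed, assuming you pay per fraction of a run."""
    bases_needed_mb = genome_mb * coverage           # e.g. 5 Mb * 30x = 150 Mb
    price_per_mb = run_price_usd / (run_yield_gb * 1000)
    return bases_needed_mb * price_per_mb


illumina_usd = sequencing_cost(5, 30, 10_000, 100)  # about $15
roche_454_usd = sequencing_cost(5, 30, 10_000, 1)   # about $1500
print(f"Illumina: ${illumina_usd:.0f}, 454: ${roche_454_usd:.0f}")
```

Both figures fall inside the ranges on the slide; library preparation, labour, and service-center margins are not included.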
Prokaryotic profiling:Metagenomics • De novo genome assembly • Study of 1 single species • Need for species isolation • Metagenomics analysis • Study of a community of species • No need for isolation (culturing bias!) • Study the collective gene pool and function of the community/ecology • No need for individual functions
Prokaryotic profiling:Metagenomics • Practical • Get bacterial DNA or RNA from a sample • Soil • Gut/Fecal • Ocean water (e.g. Craig Venter) • … • Sequence: long or short reads (1-10 days) • Obtain your sequences • Map on a database of known genes (1 day) • Annotate/analyse the community (weeks)
Prokaryotic profiling:Metagenomics • 2010: Giant Panda genome (2nd carnivore) • No umami taste receptor -> no meat affinity • The panda is more a dog than a bear • The panda is a carnivore eating bamboo!
Prokaryotic profiling:Metagenomics • Still 2010 !: Panda ‘microbiome’ • Gut microbiome of the panda reveals the presence of bamboo/cellulose degrading pathways
Prokaryotic profiling:Metagenomics • A clinical example: gut microbiome can predict diabetes and malnourishment • PLoS ONE (2011), Brown et al. • PLoS ONE (2010), Valladares et al. • Gut Pathogens (2011), Gupta et al.
Prokaryotic profiling:SNP profiling • Classical SNP analysis - practical • Design PCR primers • Generate amplicons • Re-sequence using long read sequencing • Conserve ‘SNP blocks’ • Detect SNPs • Correlate SNPs to drug resistance, severity of symptoms …
Prokaryotic profiling:SNP profiling • Amplicon resequencing is the same for human, prokaryotic, viral analyses • Many standardized out-of-the-box solutions available • Very simple analysis • Watch out for the overkill… • Don’t use a bazooka to kill a fly! • Throughput can be too high
Prokaryotic/Viral profiling:SNP profiling • Profile the coding region of hepatitis C Lauck et al. 2012
Prokaryotic profiling:SNP profiling • Use next-generation sequencing to predict the optimal HIV therapy Thielen et al. 2012
Prokaryotic profiling:Species quantification • Imagine the following research questions • Which (known) species/groups are present in a certain sample • Does this composition alter given a certain treatment, change of conditions, patients etc. • No need for de novo genome sequencing • No metagenomics: species instead of functions
Prokaryotic profiling:Species quantification • Prokaryotes have the gene 16S rDNA, coding for ribosomal RNA • The 16S rDNA region is 1.5 kb long • 16S rDNA is specific for each species/strain • Theoretical: 4^1500 ≈ 10^903 possibilities • In practice: 16S rDNA sequence known for millions of species
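The theoretical number of distinct 1.5 kb sequences follows from four possible bases at each of 1500 positions. A quick check of the exponent, as a sketch with illustrative variable names:

```python
import math

length_bp = 1500                         # length of the 16S rDNA region
log10_count = length_bp * math.log10(4)  # log10 of 4**1500, about 903.09
exponent = math.floor(log10_count)

print(f"4^{length_bp} is roughly 10^{exponent}")  # 4^1500 is roughly 10^903

# Python integers are arbitrary precision, so the exact value can be checked too:
digits = len(str(4 ** length_bp))        # a number near 10^903 has 904 digits
```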
Prokaryotic profiling:Species quantification • 16S rDNA can be isolated in different species using universal PCR primers • Isolate/amplify different regions using the same primers • Compare the isolated sequences against a database of known sequences
Prokaryotic profiling:Species quantification • Practical procedure • Sample an environment and isolate DNA • Do a universal PCR amplification • Sequence using long read sequencing: the longer the better! • Obtain sequences • Map sequences against a reference database • Annotate the data
Prokaryotic profiling:Species quantification • Example: The Antarctica project • Which parameters determine the composition of bacterial communities in Antarctic lakes? • 20 different samples/lakes • Sequence 16S rDNA genes • 1 x 454 run (1 million 500 bp sequences) • Map all sequences back to the RDP database
Prokaryotic profiling:Species quantification • Analyse the data using computing power • Compare different locations • Is species A present in location1, location2,… • Assess the distribution in a single location • How dominant is the most dominant species in location 1 • How many species are in location 1 • … • Visualize !
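The questions above reduce to simple counting once every read has been assigned to a species. The following minimal sketch assumes that assignment has already been done (for example by mapping against a reference database) and that each location is represented as a list with one species label per read. The labels, the read counts, and the helper names `summarize` and `shared_species` are hypothetical.

```python
from collections import Counter


def summarize(assignments: list[str]) -> dict:
    """Richness and dominance for one location (one species label per read)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    species, top_count = counts.most_common(1)[0]
    return {
        "richness": len(counts),           # how many species are present
        "dominant": species,               # the most abundant species
        "dominance": top_count / total,    # its share of all reads
    }


def shared_species(a: list[str], b: list[str]) -> set[str]:
    """Species detected in both locations."""
    return set(a) & set(b)


# Made-up read assignments for two locations
location1 = ["species_A"] * 6 + ["species_B"] * 3 + ["species_C"]
location2 = ["species_B"] * 5 + ["species_D"] * 5

print(summarize(location1))
print(shared_species(location1, location2))  # {'species_B'}
```

With real data, read counts per species would normally be corrected for sequencing depth before locations are compared.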
Prokaryotic profiling:Species quantification • Analyse different samples on different taxonomic levels • Include the taxonomic tree of life of bacteria • Use a ‘taxonomy browser’
Prokaryotic profiling:Species quantification • Analyse a single location
Prokaryotic profiling:Species quantification • Compare different locations
Viral profiling • Viral profiling = prokaryotic profiling, but… • Cheaper • Faster • Easier • De novo genome sequencing = OK • Don’t spend $10,000 on a 100 kb genome! • Multiplexing/pooling capacity is limited!
Viral profiling • Watch out for the overkill • An Illumina run can be split into 8 lanes • >20 samples per lane can be combined • Still >100 Mb per sample…
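The overkill can be quantified with the numbers from the slides. This sketch assumes the 100 Gb run yield quoted for Illumina, an even split across lanes and samples, and the 100 kb viral genome mentioned earlier. The helper name `per_sample_yield_mb` is made up for illustration.

```python
def per_sample_yield_mb(run_yield_gb: float, lanes: int,
                        samples_per_lane: int) -> float:
    """Sequence yield per sample in Mb, assuming an even split."""
    return run_yield_gb * 1000 / (lanes * samples_per_lane)


yield_mb = per_sample_yield_mb(100, 8, 20)  # 625.0 Mb per sample
genome_kb = 100                             # viral genome size from the slide
coverage = yield_mb * 1000 / genome_kb      # 6250x coverage

print(f"{yield_mb:.0f} Mb per sample, about {coverage:.0f}x coverage")
```

Even at 100 samples per lane the same calculation gives 125 Mb per sample, far more than a 100 kb genome needs at 30x.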
Questions joachim.deschrijver@ugent.be