Genotyping-by-Sequencing: what is it and what is it good for? Keith R. Merrill, NCSU – Crop Science
GBS vs. RAD-Seq: The ultimate throw down! (of acronyms)
• GBS: Genotyping-by-Sequencing
• RAD-Seq: Restriction-site associated DNA sequencing
The Concept
Reduce the Genome → Pool Samples → Sequence Combined Pool → Assign sequences to individuals → Call Variants between individuals
The Concept: It’s all about probability [Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, Ind_n]
The Concept: Reduce the genome and increase the probability of overlap [Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, Ind_n]
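The probability argument can be made concrete with a rough sketch. It assumes reads land uniformly at random across loci (a Poisson approximation) and that individuals are independent, which real libraries only approximate. The function names and all numbers are illustrative and not from the talk.

```python
import math

def p_locus_covered(n_reads, n_loci):
    """Probability that one individual has at least one read at a given locus,
    assuming reads fall uniformly at random over n_loci (Poisson approximation)."""
    return 1.0 - math.exp(-n_reads / n_loci)

def p_shared_by_all(n_reads, n_loci, n_individuals):
    """Probability that the same locus is covered in every individual,
    assuming individuals are sampled independently."""
    return p_locus_covered(n_reads, n_loci) ** n_individuals

# Illustrative numbers only: same sequencing effort, smaller target.
reads_per_individual = 1_000_000
whole_genome = p_shared_by_all(reads_per_individual, 10_000_000, 5)
reduced_genome = p_shared_by_all(reads_per_individual, 200_000, 5)
print(f"whole genome:   {whole_genome:.6f}")
print(f"reduced genome: {reduced_genome:.6f}")
```

With the same number of reads, shrinking the set of loci raises the chance that a locus is seen in every individual, which is the point of reducing the genome.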
How it works: Tags (AKA Barcodes, MID Barcodes, etc.)
• Tag1 (Ind1) = GGATA
• Tag2 (Ind2) = CACCA
• Tag3 (Ind3) = CAGATA
• Tag4 (Ind4) = GAAGTG
• Tag5 (Ind5) = TAGCGGAT
• TagN (IndN) = …
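A minimal sketch of how tags let pooled reads be assigned back to individuals. The five tags are the ones on the slide. The reads, the `TAGS` dictionary and the `assign_read` helper are made up for illustration, and only exact matches are handled (no error tolerance).

```python
# Tags copied from the slide; the mapping structure itself is an assumption.
TAGS = {
    "GGATA": "Ind1",
    "CACCA": "Ind2",
    "CAGATA": "Ind3",
    "GAAGTG": "Ind4",
    "TAGCGGAT": "Ind5",
}

def assign_read(read, tags=TAGS):
    """Return (individual, read_without_tag), or (None, read) if no tag matches.
    Exact prefix matching only; sequencing errors in the tag are not handled."""
    for tag, individual in tags.items():
        if read.startswith(tag):
            return individual, read[len(tag):]
    return None, read

# Made-up pooled reads: tag followed by some insert sequence.
pooled_reads = ["GGATACTGCAAT", "TAGCGGATCTGCAGG", "CAGATACTGCATT", "TTTTTCTGCA"]
for read in pooled_reads:
    print(assign_read(read))
```

Simple prefix matching is unambiguous here because none of these tags is a prefix of another, even though their lengths differ.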
How it works (The One Enzyme Method) [Figure: each individual's fragments carry that individual's tag: Tag1 on Ind1, Tag2 on Ind2, Tag3 on Ind3, Tag4 on Ind4, Tag5 on Ind5, TagN on IndN]
How it works: Size Selection
Base-pair range selected
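Size selection can be mimicked in silico. The sketch below cuts a toy sequence at a recognition site and keeps fragments inside a base-pair range. The PstI site (CTGCAG, cut after CTGCA) is used only as an example enzyme; the talk does not name one, and the sequence, helper names and size range are all made up.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; the cut falls
    `cut_offset` bases into the site. Returns the fragments in order."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

def size_select(fragments, min_bp, max_bp):
    """Keep fragments whose length is within [min_bp, max_bp]."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

# Toy sequence with two example sites (assumed enzyme: PstI, CTGCA^G).
toy = "AAAA" + "CTGCAG" + "TTTTTTTTTT" + "CTGCAG" + "CC"
fragments = digest(toy, site="CTGCAG", cut_offset=5)
print([len(f) for f in fragments])
print(size_select(fragments, min_bp=10, max_bp=20))
```

Narrowing the selected range keeps fewer fragments, which is one of the knobs for how far the genome is reduced.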
How it works: Pooling
Size Selection (optional if using two enzymes)
[Figure: tagged fragments from Ind1, Ind2, Ind3, Ind4, Ind5, IndN (Tag1 to TagN) combined into one pool]
Why Pool Samples?
• On the Illumina HiSeq 2000:
  • 8 lanes of sequencing, each capable of giving 374 million reads.
  • You can’t partition a lane.
  • Sequencing is expensive ($1500 - $3000 per lane).
  • You don’t need/want 374 million reads per individual.
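The arithmetic behind pooling, as a quick sketch. The reads per lane and the lane cost come from the slide; the plexing levels are hypothetical, and reads are assumed to split evenly across samples, which real pools do not do.

```python
READS_PER_LANE = 374_000_000   # from the slide (HiSeq 2000)
LANE_COST_USD = (1500, 3000)   # from the slide, low and high estimate

def per_sample(n_pooled):
    """Average reads and cost per individual when n_pooled samples share one lane,
    assuming reads are split evenly (real pools are uneven)."""
    reads = READS_PER_LANE / n_pooled
    cost_low, cost_high = (c / n_pooled for c in LANE_COST_USD)
    return reads, cost_low, cost_high

for n in (1, 48, 96, 384):     # hypothetical plexing levels
    reads, low, high = per_sample(n)
    print(f"{n:>3} samples/lane: {reads:,.0f} reads each, "
          f"${low:,.2f} to ${high:,.2f} per sample")
```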
A Word About Tags
• Hamming vs. Edit Distance
• Sequence errors may result from things other than sequencing.
• n-1 errors are the most common error encountered during oligo synthesis.
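A small sketch of why the distinction matters for tag design. Hamming distance only counts substitutions between equal-length strings, while edit (Levenshtein) distance also counts insertions and deletions, so it captures n-1 synthesis errors. Tag4 is from the earlier slide; the erroneous versions and the trailing base are made up.

```python
def hamming(a, b):
    """Number of mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute or match
            ))
        previous = current
    return previous[-1]

tag = "GAAGTG"          # Tag4 from the slide
substituted = "GAAGTC"  # one substitution (made-up error)
n_minus_1 = "GAGTGC"    # tag missing one A, so the next base (C, made up) shifts in
print(hamming(tag, substituted), edit_distance(tag, substituted))
print(hamming(tag, n_minus_1), edit_distance(tag, n_minus_1))
```

A single missing base shifts every later position, so the Hamming distance overstates the error while the edit distance stays small. A tag set chosen only on Hamming distance may therefore be less robust to n-1 errors than it looks.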
Analysis: It’s about time… and money… and time
Key Considerations:
• Time
• Computing power available
• Amount of sequence data (back to time)
• Availability of a reference genome
Key Considerations
• Study goals
• Availability of a reference genome
• Expected degree of polymorphism
• Choice of restriction enzyme
• DNA sample preparation
• Adaptor design
• PCR amplification
• Sequencing
• Pooling individuals
• Analysis
Analysis: It’s about time… and money… and time
A Few Options:
• Stacks
  • For use with bi-parental mapping populations
  • Takes a lot of time
  • Looks at entire reads
  • Reference genome optional
  • Designed to work nicely with MySQL
  • More memory intensive
• UNEAK
  • For use with species without a reference genome
  • Uses only 64 bp of each read
  • MUCH faster than Stacks
  • Less memory intensive
• TASSEL
  • For use with species with a reference genome
  • Uses only 64 bp of each read
  • MUCH faster than Stacks
  • Less memory intensive
• Custom scripts
  • Completely flexible (hence the ‘custom’)
  • Requires significant knowledge about programming (or knowing someone who does and is willing to help)
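As a taste of the custom-script route, here is a minimal sketch of trimming reads to 64 bp and collapsing identical ones into counted tags. It is not how UNEAK or TASSEL are implemented; the helper name, the made-up reads and the choice to drop short reads are assumptions for illustration.

```python
from collections import Counter

TAG_LENGTH = 64  # from the slide: UNEAK and TASSEL use only 64 bp of each read

def collapse_to_tags(reads, tag_length=TAG_LENGTH):
    """Trim each read to tag_length bases and count identical tags.
    Reads shorter than tag_length are dropped here, which is a simplification."""
    return Counter(r[:tag_length] for r in reads if len(r) >= tag_length)

# Made-up reads: two share their first 64 bp, one is too short.
reads = ["ACGT" * 20, "ACGT" * 16 + "TTTT", "ACGT" * 10]
tags = collapse_to_tags(reads)
for tag, count in tags.items():
    print(len(tag), count)
```

Storing one fixed-length tag plus a count instead of every full read is one plausible reason a 64 bp approach needs less memory than looking at entire reads.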
Does it work? Note: This is with hexaploid wheat and no reference genome
The Good
• No ascertainment bias
• Random distribution throughout the genome
• May be useful for species without a reference genome
• Useful with genomic selection
• May provide a large number of SNPs
• Relatively low per sample cost
The Good (cont)
GBS is extremely flexible
• Number of individuals per lane/flowcell
• Choice of enzymes
  • Cut sites
  • Methylation sensitivity
• Size of fragments selected
The Bad
• Poor reproducibility between runs
• In species without a reference genome, missing data *cannot* be inferred
• Often dealing with large amounts of missing data
• Difficult to filter out false SNPs in non-mapping populations, unless you have a reference genome, and even then…
• In my opinion: this would be nigh impossible to use for association studies in species without a reference genome UNLESS you sequence to very high coverage to virtually eliminate missing data (alternatively, you could drastically reduce the genome through your choice of enzymes, but this may be bad if your expected degree of polymorphism is low)
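One common way to cope with missing data is to drop SNPs that are called in too few individuals. The sketch below shows the idea on a made-up genotype table; the data layout, helper names and the 25% threshold are assumptions, not something taken from the talk or from any particular pipeline.

```python
def missing_rate(calls):
    """Fraction of individuals with no genotype call (None) at one SNP."""
    return sum(c is None for c in calls) / len(calls)

def filter_snps(genotypes, max_missing):
    """Keep SNPs whose missing-data rate is at most max_missing.
    genotypes maps a SNP name to a list of per-individual calls."""
    return {snp: calls for snp, calls in genotypes.items()
            if missing_rate(calls) <= max_missing}

# Made-up calls for four individuals; None marks a missing genotype.
genotypes = {
    "snp_a": ["A/A", "A/G", None, "G/G"],
    "snp_b": [None, None, "C/T", None],
    "snp_c": ["T/T", "T/T", "T/C", "T/T"],
}
kept = filter_snps(genotypes, max_missing=0.25)
print(sorted(kept))
```

A stricter threshold leaves cleaner data but fewer SNPs, which is the trade-off behind sequencing deeper or reducing the genome further.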
• TASSEL-GBS: www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119
• GBS_Document: www.maizegenetics.net/tassel/docs/TasselPipelineGBS.pdf