1. PROTOCOLS FOR THE HIGH-VOLUME ASSEMBLY OF DNA BARCODES
Mehrdad Hajibabaei
Barcode of Life Initiative
University of Guelph
Canada
<rd>background w specimens, molecules, etc. ? associate DNA/Biodiversity, suggest methodology</rd>
<rd>Added some headers to content to indicate subject structure</rd>
2. Our work aims to set the stage for an effort to gather barcodes for all animal species. This task will be substantial; it will require some 100 million analyses if we seek 10 barcode records for each of the expected 10 million species.
<rd>illustrate the diversity ? amount of work ahead</rd>
<rd>Collage to be finished</rd>
3. These 100 million sequences are approximately twice the current size of GenBank.
<rd>Let's put this volume of genetic information in perspective:</rd>
<rd>Some colour work could be done here</rd>
4. Since achieving barcode closure is going to be a significant task, it's important to consider how long it might take. The answer depends, of course, on the number of sequence records that are generated each year. If we are ambitious and would like closure within a decade, we'll need to analyze 10M sequences a year. Of course, this work won't be done by a single lab. If all 200 people in this room rushed home and established a barcode lab, this production rate would be met if each lab generated 50K barcode sequences a year. This number is the production goal for our lab this year.
<rd>emphasize 10M per year and what this means logistically</rd>
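The arithmetic behind these targets can be checked in a few lines. This is only a back-of-envelope sketch using the figures quoted in the talk; the function name `per_lab_target` is made up for illustration.

```python
# Back-of-envelope throughput targets, using only figures quoted in the talk.
SPECIES = 10_000_000       # expected number of animal species
RECORDS_PER_SPECIES = 10   # barcode records sought per species
YEARS_TO_CLOSURE = 10      # "closure within a decade"
LABS = 200                 # one lab per person in the room


def per_lab_target(species, records_per_species, years, labs):
    """Return (total, per_year, per_lab_per_year) sequence counts."""
    total = species * records_per_species
    per_year = total // years
    per_lab = per_year // labs
    return total, per_year, per_lab


total, per_year, per_lab = per_lab_target(
    SPECIES, RECORDS_PER_SPECIES, YEARS_TO_CLOSURE, LABS
)
print(f"{total:,} total; {per_year:,} per year; {per_lab:,} per lab per year")
```

With these inputs the per-lab figure comes out at 50,000 sequences a year, which matches the production goal stated above.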
5. <rd>Define the problems/questions that the remainder of the presentation will address</rd>
6. The Barcoding Process
Before I discuss the protocols that we employ in our work, let me remind you of the analytical chain. Barcode analysis is straightforward, but it does involve a series of steps, each of which consumes time and money.
7. At some point, before too long, sequencing will be so simple and inexpensive that it will be just part of the standard toolkit of any lab with interests in biodiversity. However, at present, sequence analysis is an expensive venture. There are two sorts of organizational structures that are appropriate at this point if one wishes to gather large numbers of barcodes.
The first involves an initiative built around a core DNA Analysis Facility. In this case, specimen fragments are submitted to a core facility that carries out all stages of the DNA analysis chain. The cost to establish a core facility capable of processing up to 100K samples per year is about $1M.
9. Our barcode facility at Guelph is based on the first organizational model. Our collaborators seek barcodes, but aren't keen to oversee a DNA lab. We are currently involved in an effort to increase our production rate from 5K to 50K sequences per year. This exercise has forced us to abandon protocols that were working quite nicely, but simply wouldn't scale easily to the new production target. We've had to move from an analytical chain based on single specimens to one based on batches of 96. We now aim to complete each step of the workflow in blocks of this size.
<rd>memory trigger: incorporate image to recall the first model</rd>
<rd>Customize this to brand GAN</rd>
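To give a feel for what working in blocks of 96 means at 50K sequences per year, here is a minimal sketch that counts plates and maps a running sample number to a plate and well. It assumes a standard 8 x 12 plate filled row by row with every well holding a sample (no control wells). Those choices and the function names are assumptions for illustration, not a description of the Guelph workflow.

```python
import math
import string

PLATE_ROWS = string.ascii_uppercase[:8]          # rows A to H
PLATE_COLS = 12                                  # columns 1 to 12
WELLS_PER_PLATE = len(PLATE_ROWS) * PLATE_COLS   # 96


def plates_needed(n_samples):
    """Number of 96-well plates needed to hold n_samples."""
    return math.ceil(n_samples / WELLS_PER_PLATE)


def well_position(sample_index):
    """Map a 0-based sample index to (plate_number, well), filling row by row."""
    plate, offset = divmod(sample_index, WELLS_PER_PLATE)
    row, col = divmod(offset, PLATE_COLS)
    return plate + 1, f"{PLATE_ROWS[row]}{col + 1:02d}"


print(plates_needed(50_000))   # plates per year at the 50K target
print(well_position(0))        # first sample on the first plate
print(well_position(96))       # first sample on the second plate
```

Under these assumptions the 50K target works out to 521 plates a year, or roughly ten plates a week.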
10. A critical first step in barcoding involves the assembly of specimens for analysis. With enough analytical capacity, one could simply analyze every specimen collected. However, because of the huge variation in species abundances, this is a very inefficient approach for gaining barcode coverage for species. There is a far better approach: establish collaborations with taxonomists. In this case, one can initially direct equal analytical effort to each species or OTU and then ramp up sampling intensity as complexities are encountered. Our work is being advanced through collaborations of this sort with many individuals and institutions holding identified specimens.
What about bioblitz and sampling done by us as well?
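A toy calculation shows why equal effort per species is so much cheaper than analyzing everything. The specimen counts below are invented for illustration, and `analyses_needed` is a hypothetical helper; the only figure taken from the talk is the target of 10 records per species.

```python
def analyses_needed(abundances, target_per_species):
    """Compare two sampling strategies.

    Returns (analyze_everything, targeted):
    - analyze_everything: one analysis per specimen in the collection
    - targeted: at most target_per_species analyses per identified species
    """
    analyze_everything = sum(abundances.values())
    targeted = sum(min(n, target_per_species) for n in abundances.values())
    return analyze_everything, targeted


# Hypothetical specimen counts for four species in one collection.
collection = {
    "species_a": 10_000,
    "species_b": 500,
    "species_c": 12,
    "species_d": 3,
}

everything, targeted = analyses_needed(collection, target_per_species=10)
print(f"analyze everything: {everything:,} analyses")
print(f"equal effort per species: {targeted:,} analyses")
```

In this made-up collection, both strategies cover the same four species, but the targeted approach needs 33 analyses instead of 10,515, because almost all of the untargeted effort goes into the one abundant species.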
11. <rd>composite image ? Sample Submission for high-throughput Sequencing. Key benefits: individually traceable, compatible with a range of tissue preservation methods, long-term tissue storage solutions, and the format transfers directly to high-throughput methods.</rd>
12. The amount of tissue required for barcode analysis is very small: a single leg from the smallest insect, the tip of a bird's feather or a tiny tissue sample will all suffice if the DNA is well preserved.
Note: DNA can be harvested during genitalic preps
<rd> examples of extractions overall</rd>
13. Destructive tissue sampling is the rule
Any loss of tissue is a concern in some cases. If organisms are very small, DNA extraction using conventional methods can mean that there is no voucher specimen left. <rd>Consumptive tissue</rd> sampling is also undesirable for type specimens. However, there are now varied DNA extraction protocols that allow the organism to remain intact.
14. Non-destructive methods are available
Ideal for small organisms and valuable specimens.
15. DNA Extraction
There are varied DNA extraction methods that typically show a tradeoff between cost and efficiency. We employ two different approaches depending upon specimen condition. If we are working on dried specimens that are less than a year old or on frozen tissues, Chelex is great, but otherwise a move to membrane-based methods is critical. Both are equally rapid and both are available in 96-well format.
<rd>thumbnail to illustrate?</rd>
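The choice between the two extraction approaches can be written down as a simple rule. The sketch below is one reading of the sentence above: it assumes frozen tissue qualifies for Chelex regardless of age, and it lumps every other preservation method into a hypothetical `OTHER` category. The names are illustrative only.

```python
from enum import Enum


class Preservation(Enum):
    DRIED = "dried"
    FROZEN = "frozen"
    OTHER = "other"   # hypothetical catch-all for any other preservation method


def choose_extraction(preservation, age_years):
    """Pick an extraction method from specimen condition.

    Assumption: frozen tissue is suitable for Chelex at any age; dried
    specimens only when less than a year old. Everything else goes to
    membrane-based extraction.
    """
    if preservation is Preservation.FROZEN:
        return "Chelex"
    if preservation is Preservation.DRIED and age_years < 1:
        return "Chelex"
    return "membrane-based"


print(choose_extraction(Preservation.DRIED, 0.5))   # recent dried specimen
print(choose_extraction(Preservation.DRIED, 12))    # older dried specimen
```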
16. Primer Design and Optimized Reaction
The next step in the barcode chain involves amplification of the target region.
Primer design is critical and minor adjustments can have large impacts on barcode recovery. The first phase of any study on a new group should involve a serious effort to identify optimal primers. Where we have done this, we typically gain very high success in barcode recovery. The 2 primer sets that we employ for lepidopterans recover the barcode region from more than 99% of species and our 2 primer sets for fishes have about 97% success. To drop PCR costs, it's important to lower reaction volumes. Standard PCR reactions are often 50 µl in volume, but we now use 10 µl, lowering costs by nearly 80%.
<rd> basic thumbnail illustration</rd>
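The 80% figure follows directly from the change in reaction volume if one assumes that reagent cost scales linearly with volume. That assumption is mine, and it is probably why the saving is described as "nearly" 80%: plates, tips and labor do not shrink with the volume. A minimal sketch:

```python
def reagent_saving(old_volume_ul, new_volume_ul):
    """Fractional reagent saving, assuming cost is proportional to volume."""
    return 1 - new_volume_ul / old_volume_ul


def plate_volume_ul(reaction_volume_ul, wells=96):
    """Total reaction volume needed to fill one plate."""
    return reaction_volume_ul * wells


print(f"reagent saving: {reagent_saving(50, 10):.0%}")
print(f"per 96-well plate: {plate_volume_ul(50)} ul -> {plate_volume_ul(10)} ul")
```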
17. <rd> quick substitution</rd>
18. An optimized PCR reaction should yield sharp bands with few impurities. Using this strategy, we have removed the need for a PCR clean-up step.
19. Screening PCR Products
Following PCR, especially when one is working on a new group, it is critical to screen the reactions for product. This has traditionally been a laborious task involving gel casting and the loading of individual reaction products into the gel. We have explored two options to reduce this time and increase efficiency.
20. Screening PCR Products
However, there is another solution: pre-cast agarose gels that screen 96 samples simultaneously. They are fast, have no capital or maintenance costs and have a reasonable operating cost.
21. Screening PCR Products
The high-technology approach involves microfluidic devices that are able to sip small volumes of the PCR product from a 96-well plate and determine both the size and concentration of the PCR product. These devices have several drawbacks: they are expensive, have high operating costs, are relatively slow, and their maintenance contracts are expensive.
22. Screening PCR Products
23. Protocols here are quite standard, but costs can be significantly reduced both by lowering the concentration of key reagents, especially Big Dye, and by lowering reaction volumes to as little as 2 µl.
As concentrations drop, it is useful to make up large-volume master mixes. These can be aliquoted into 96-well plates and stored frozen until use.
There are a variety of methods for the cleanup of sequencing products, but we simply employ a column-based approach that is available in 96-well format.
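Scaling a per-reaction recipe up to a master mix for a stack of plates is simple multiplication, sketched below. The per-reaction volumes are placeholders chosen only so that the example runs; they are NOT a recommended recipe and are not taken from the talk. The 10% overage for pipetting loss is likewise an assumption.

```python
WELLS_PER_PLATE = 96

# Placeholder per-reaction volumes in microlitres. These numbers are
# hypothetical and are NOT a recommended recipe.
EXAMPLE_RECIPE_UL = {
    "dye_terminator_mix": 0.25,
    "buffer": 0.5,
    "primer": 0.25,
    "water": 1.0,
}


def master_mix(recipe_ul, n_plates, overage=0.10):
    """Scale a per-reaction recipe to n_plates 96-well plates.

    overage is the extra fraction prepared to cover pipetting loss
    (10% here is an assumption, not a figure from the talk).
    Returns a dict of component name -> total volume in microlitres.
    """
    reactions = n_plates * WELLS_PER_PLATE * (1 + overage)
    return {name: round(vol * reactions, 2) for name, vol in recipe_ul.items()}


for name, volume in master_mix(EXAMPLE_RECIPE_UL, n_plates=10).items():
    print(f"{name}: {volume} ul")
```

Making the mix once for many plates means each small pipetting error is spread over hundreds of reactions rather than repeated in each one, which matters more as volumes and concentrations drop.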
25. We now regularly gather bidirectional sequences. This has two advantages. Firstly, it allows us to generate full-length barcode sequences because one avoids the problems of signal deterioration that often occur near the end of a read.
Secondly, it has allowed us to create specialized software that generates a consensus sequence for the 2 reads and determines the PHRED score for each position.
Slide could show Janet or Angela by sequencer
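The idea of combining two reads into a consensus with a per-position quality score can be sketched as follows. This is not the lab's software. It is a deliberately simple heuristic that assumes the two reads cover exactly the same region with no indels, so that the reverse-complemented reverse read lines up base for base with the forward read. Where the reads agree, the qualities are added; where they disagree, the higher-quality base wins and keeps the difference.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq, quals):
    """Reverse-complement a read and reverse its quality scores to match."""
    return seq.translate(COMPLEMENT)[::-1], quals[::-1]


def consensus(fwd_seq, fwd_quals, rev_seq, rev_quals):
    """Merge a forward and a reverse read into (sequence, qualities).

    Assumes both reads span the same region with no indels.
    """
    rc_seq, rc_quals = reverse_complement(rev_seq, rev_quals)
    if len(fwd_seq) != len(rc_seq):
        raise ValueError("reads must cover the same region")
    bases, quals = [], []
    for fb, fq, rb, rq in zip(fwd_seq, fwd_quals, rc_seq, rc_quals):
        if fb == rb:            # agreement: confidence adds up
            bases.append(fb)
            quals.append(fq + rq)
        elif fq > rq:           # disagreement: trust the better call
            bases.append(fb)
            quals.append(fq - rq)
        elif rq > fq:
            bases.append(rb)
            quals.append(rq - fq)
        else:                   # equal quality, different bases: unresolved
            bases.append("N")
            quals.append(0)
    return "".join(bases), quals


# Forward read degrades at its far end; the reverse read is strong there.
fwd, fwd_q = "ACGGA", [40, 40, 40, 30, 5]
rev, rev_q = "ACCGT", [40, 40, 40, 30, 10]
print(consensus(fwd, fwd_q, rev, rev_q))
```

In the example, the last base of the forward read is a low-quality miscall, and the start of the reverse read, which covers the same position at high quality, corrects it. This is the first advantage of bidirectional sequencing described above.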
26. One of the primary challenges that we have encountered is the need for improved information management capabilities to track 50,000 specimens, their DNA extracts, their varied PCR products, and so on until the finished sequence is deposited in BOLD and ultimately in GenBank. There are commercial LIMS, but they are expensive. Installation at a single site can cost $50K. As a result, we are developing a barcode LIMS that will be available for distribution to high-volume analytical labs.
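At its core, the tracking problem is a record per sample that moves through a fixed series of stages. The sketch below is a toy illustration of that idea, not the barcode LIMS under development. The stage names follow the chain described above, but the class and field names are hypothetical, and the model ignores the fact that one extract can yield several PCR products.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    SPECIMEN = 0
    DNA_EXTRACT = 1
    PCR_PRODUCT = 2
    SEQUENCE = 3
    IN_BOLD = 4
    IN_GENBANK = 5


@dataclass
class SampleRecord:
    specimen_id: str
    plate: int
    well: str
    stage: Stage = Stage.SPECIMEN
    history: list = field(default_factory=list)

    def advance(self, note=""):
        """Move the sample to the next stage and log the step."""
        if self.stage is Stage.IN_GENBANK:
            raise ValueError("sample is already at the final stage")
        self.stage = Stage(self.stage + 1)
        self.history.append((self.stage.name, note))


def stage_summary(records):
    """Count how many samples currently sit at each stage."""
    return Counter(r.stage.name for r in records)


# Hypothetical specimen identifiers, for illustration only.
records = [SampleRecord(f"SPEC-{i:05d}", plate=1, well=f"A{i + 1:02d}") for i in range(3)]
records[0].advance("extracted")
records[0].advance("amplified")
records[1].advance("extracted")
print(stage_summary(records))
```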
27. <rd>Where we are now</rd>
28. If we want to see fast progress on the assembly of animal barcodes, it is critical that we develop the capability to obtain barcode records from museum collections. The primary challenge in this exercise lies in the fact that these specimens are old and have been exposed to varied and often unknown histories of preservation. However, these collections are immense. There are more than a billion specimens awaiting analysis and their replacement value is huge. An average replacement cost for each bird in museum collections has been estimated at $500.
Slide of the SI birds in drawers
29. Full Barcodes versus Mini-Barcodes
Efforts to recover a full-length barcode sequence from specimens more than a decade old using standard protocols are rarely successful. The usual solution to this problem lies in the amplification of short sequences and their concatenation to generate a full-length product, an approach we have used with some success.
30. However, there are two interesting new analytical options.
31. Whole genome amplification protocols allow the 1000-fold amplification of genomic DNA. This means that even if a small quantity of intact barcode DNA is present in an extract, it can be amplified. One can follow up this process with a standard PCR of the barcode region.
32. A second analytical approach involves the use of PCR cocktails that include repair enzymes that can fix strand breaks or other DNA damage, allowing restitution of full-length templates where none existed. Sigma markets one of these products under the name Restorase, with the rather cute logo of CPR for PCR.
Augmented PCR cocktails could play a key role in aiding the analysis of museum specimens. It's worth recalling that there is a very large suite of repair enzymes. Current cocktails are rather narrow in their capacity to repair damage, but future blends will be more powerful.
33. We have done a small amount of work with Restorase. Our results with Restorase 1 showed some promise, but there were two downsides: the reaction mix was expensive, and its use was labor-intensive and required a lot of optimization.
34. We have now had a chance to examine the effectiveness of the next version of Restorase. Its cost is unlikely to decline, but it is much easier to use. Moreover, we have managed to recover barcodes from a number of 70-year-old specimens that failed to reveal any product with other methods.
Clearly these are early times, but there are hopeful signs that approaches such as this will allow museum specimens to contribute in a very powerful way to the development of a comprehensive barcode library.
35. In closing, I'd like to consider the intrusion of barcodes into routine biodiversity assessment. These applications will likely place two demands on the process: one for speed and one for production volume. In this regard, it is worth noting that fast DNA extraction protocols, fast PCR and fast sequencing reactions/cleanup currently allow the move from organism to finished barcode within 3 hours.
36. On the Horizon
Current technologies will also make it possible to ramp up production volumes. Robotic intervention is possible throughout the analytical chain at all points bar acquiring the sample for analysis. The barcode labs at the SI and at Guelph are currently injecting such automation into their analytical chains. If they integrate successfully, it will be reasonable to think of facilities that generate a million sequences per year.
38. Acknowledgments
In closing, I would like to acknowledge the deep involvement of the whole DNA barcode group at Guelph in honing the protocols that I have discussed. If you have questions, I will try to answer them.